What is a Data Warehouse?

by Garrett Alley  
5 min read  • 1 Jun 2018

Today’s enterprise relies on the effective collection, storage, integration, and analysis of data. These activities have moved to the heart of revenue generation, cost containment, and profit optimization. As such, it’s no surprise that the amounts of data generated — as well as the number and types of data sources — have exploded.

Data-driven companies require heavy-duty solutions for managing and analyzing large quantities of data across their organizations. These systems must be scalable, reliable, and secure enough for regulated industries, as well as flexible enough to support a wide variety of data types. The requirements go way beyond the capabilities of any traditional database. That’s where the data warehouse comes in.

Breaking it down: what is a data warehouse, anyway?

A data warehouse is a large-capacity repository that sits on top of multiple databases. Whereas the conventional database is optimized for a single data source, such as payroll information, the data warehouse is designed to handle a variety of data sources, such as sales data, data from marketing automation, real-time transactions, SaaS applications, SDKs, APIs, and more.

There are other differences, as well. For example, single-source databases are built for speed, employing online transactional processing (OLTP) to insert and edit small transactions. However, due to their structure, they do not lend themselves to advanced analytics. In contrast, a data warehouse uses online analytical processing (OLAP), which is designed for fast, sophisticated analysis.

Databases and data warehouses do have some similarities, however. Besides the fact that they are both repositories for large amounts of data, both can be queried. And they both have the ability to store data in tables (although databases only store data in two-dimensional tables; data warehouses contain multidimensional tables with layers of columns and rows).

On-premise and in the cloud

Companies are increasingly moving away from on-premise data warehouses to the cloud, leveraging the cost savings and scalability managed services can provide. The architecture of these cloud-enabled data warehouses differs from that of their traditional, on-premise counterparts.

Traditional data warehouse architecture is separated into three tiers: one for the database server that extracts data from multiple data sources, one for the OLAP server (which transforms the data), and one for the client level.

Cloud-based data warehouses are an entirely different animal. Their architecture varies tremendously among vendors. For example, Amazon’s Redshift is essentially a cloud-based representation of on-premise data warehouses. BigQuery is serverless so it manages computing resources dynamically and hides resource-management decisions from the user.

The cloud offers some distinct advantages:

  • It’s managed. Instead of hiring your own data-warehousing team, a cloud data warehouse lets you outsource the management hassle to professionals who must meet service level agreements (SLAs).
  • It outperforms on-premise data warehouses. Cloud-based solutions offer superior reliability and speed. They are generally more secure than on-premise data warehouses, making them a good choice for the enterprise.
  • It’s built for scale. Cloud-based data warehouses are elastic, so you can instantly add capacity.
  • It’s more cost effective. With cloud, you pay for what you use. Some providers charge by throughput. Others charge per hour per node. In every case, you avoid the mammoth costs incurred by an on-premise data warehouse that runs 24 hours a day, seven days a week.

Check out our guide on selecting the right cloud-based data warehouse for your environment.

Do you need a data warehouse?

Some businesses and industries require more data analysis than others. For example, Amazon uses real-time data to adjust prices three or four times a day. Insurance companies track policies, sales, claims, payroll, and more. They also use machine learning to predict fraud. Gaming companies must track and react to user behavior in real time to enhance the player’s experience. Data warehouses make all of these activities possible.

If your organization has or does any of the following, you’re probably a good candidate for a data warehouse:

  • Multiple sources of disparate data
  • Big-data analysis and visualization — both asynchronously and in real time
  • Custom report generation/ad-hoc analysis
  • Data mining
  • Machine learning/AI
  • Data science

These activities and assets require more than the traditional single-source database can provide. They require an “industrial-strength” data warehouse.

Alooma can help

Of course, scaling your data infrastructure requires more than a data warehouse. The missing link is a modern ETL solution, such as Alooma, which was purpose-built for today’s cloud-based data warehouse. Alooma extracts data from hundreds of data sources, including SaaS applications, cloud storage, APIs, SDKs, traditional databases, and more. It transforms your data, formatting it to match the requirements of your data warehouse or providing the ability to enrich the stream before it’s loaded. Alooma enables real-time data analysis and unprecedented business intelligence.

Alooma can also assist you in planning and executing a flawless migration to the cloud. Our seasoned experts take the guesswork out of the process, so you can focus on data analysis, instead of logistics. The Alooma platform provides enterprise-grade security: it is 100% SOC 2 Type II, ISO27001, HIPAA, and GDPR compliant. Alooma’s supported cloud service providers also meet the strictest standards in the industry.

If you’re ready to learn more about how to choose a data warehousing solution and migrate your infrastructure to the cloud, we’re ready to help! Contact us to get started.

Like what you read? Share on

Get your data flowing

Contact us to start using Alooma for free

Get Started

This might interest you as well

Schedule a free demo!

We'll show you how Alooma can integrate all of your data sources in minutes.