If your company has a data warehouse, you are likely using ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) to get your data from different sources into your data warehouse. These are common methods for moving volumes of data and integrating the data so that you can correlate information from different sources, store the data securely in a single place, and enable members of your company from different departments to view the data.
The difference between the two terms has to do with the order in which these processes take place. Each of these methods is well-suited to different situations. Let’s look a little closer.
What is ETL?
ETL is a process that involves extracting data from disparate sources (sometimes using a staging table) and transforming it, for example by changing data types or applying calculations. Then, after the data is transformed, it is loaded into the target database, which is typically a data warehouse. When you perform ETL, the ETL software does the bulk of the heavy lifting: the transforming.
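To make the sequence concrete, here is a minimal sketch of the extract, transform, load steps in Python. The table, column names, and the tax calculation are purely illustrative, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical extracted rows, e.g. from a CSV export or an API.
# Note that everything arrives as strings.
source_rows = [
    {"order_id": "1", "amount": "19.99", "currency": "USD"},
    {"order_id": "2", "amount": "5.00", "currency": "USD"},
]

def transform(row):
    # Transform step: cast data types and apply a calculation
    # (an assumed 8% tax) before the data reaches the warehouse.
    amount = float(row["amount"])
    return (int(row["order_id"]), amount, round(amount * 1.08, 2))

# Load step: write the already-transformed rows into the target
# (an in-memory SQLite database standing in for a data warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER, amount REAL, amount_with_tax REAL)"
)
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [transform(r) for r in source_rows],
)
```

The key point is that the typed, calculated values are produced outside the warehouse; the target only ever sees finished rows.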
ETL is often used in the following cases:
- The source and target database are different and use different data types
- The volumes of data are moderate or small
- The transformations are compute-intensive
- The data is structured
To learn more about ETL, see What is ETL?
What is ELT?
ELT is a process that involves extracting the data, loading it into the target warehouse, and then transforming it after it is loaded. In this case, the work of transforming the data is done by the target database. ELT typically occurs with NoSQL databases, big data infrastructures like Hadoop clusters, or cloud installations.
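The same toy scenario looks different under ELT: raw data lands in the target first, and the warehouse engine itself does the transforming with SQL. As before, this is an illustrative sketch with an in-memory SQLite database standing in for the warehouse, and made-up table names.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Load step: raw data lands in the warehouse as-is, strings and all.
warehouse.execute("CREATE TABLE raw_events (event_id TEXT, amount TEXT)")
warehouse.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("1", "19.99"), ("2", "5.00")],
)

# Transform step: the warehouse engine does the work after the load,
# casting types with SQL rather than in pipeline code.
warehouse.execute("""
    CREATE TABLE events AS
    SELECT CAST(event_id AS INTEGER) AS event_id,
           CAST(amount AS REAL) AS amount
    FROM raw_events
""")
total = warehouse.execute("SELECT SUM(amount) FROM events").fetchone()[0]
print(total)
```

Because the raw table stays in the warehouse, the transformation can be rerun or changed later without re-extracting anything from the sources.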
ELT is often used in the following cases:
- The source and target database are the same type (e.g., an Oracle source and an Oracle target)
- There are large volumes of data
- The target database engine is well-adapted to handling large volumes of data
- The data is unstructured
To learn more about ELT, see What is ELT?
ETL and ELT comparison
These descriptions might leave you wondering which approach is better. The truth is each of these methods has advantages over the other in different circumstances, and the best solution depends on your situation.
The following table summarizes some of the key differences between ETL and ELT.

| | ETL | ELT |
| --- | --- | --- |
| Maturity | ETL has been around for 20 years and is specifically designed to work with relational databases, structured and unstructured data, and very large data volumes. There are numerous experts and best practices to guide you in using ETL, and many ETL tools to choose from. | ELT is not as widely adopted as ETL because it is not specifically designed to work with the relational databases that have dominated the market for the last 20 years. |
| Flexibility | Older ETL tools are well suited to relational databases but often less geared toward unstructured data. In addition, with ETL tools you must map out the data that will be moved to the target database, and any change to that plan requires restructuring the mapping and reloading all of the data. | ELT tools can often handle a combination of structured and unstructured data. In addition, ELT tools usually move all of your data to the target, so the resulting data set is more flexible. |
| Hardware requirements | Many traditional ETL tools require specific hardware and have their own engines to perform transformations on the data, though modern ETL platforms run in the cloud. | ELT tools take advantage of the compute power of existing hardware to perform transformations on the data. |
| Use cases | Structured data • Smaller volumes of data and complex computations • On-premise relational databases | Unstructured data • Large volumes of data and less complex computations • Cloud environments • Data lakes |
An ETLT approach
Looking at this list of pros and cons, you likely see features in both camps that are appealing to you and relevant to the problems you want to solve. But, what if you have both structured and unstructured data? What if you want to perform complex computations and you want to load all your source data efficiently to your target warehouse? What if you want to use both ETL and ELT?
Instead of making you choose between ELT and ETL, Alooma is a modern data pipeline with the flexibility to let you do both. Alooma's preferred approach is to perform some transformations on the fly, as the data is streaming, and then take advantage of the strengths of the target data warehouse to do the things it does best.
For example, you may want to enrich the data using timestamps or geolocations in the pipeline, but then, after the load, enable the target database to perform complex transforms that may require database-heavy tasks such as joins, which it does efficiently.
Consider a scenario where you want to join source data to existing target data in the data warehouse. With an ETLT (Extract, Transform, Load, Transform) approach, you don't need to pull the data out of the existing target database just to join it to the data you will load. The resulting pipeline is one that just makes sense: let each of your tools do the things they do best for an efficient, flexible solution.
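The two-stage flow can be sketched as follows. This is not Alooma's implementation, just an illustration of the pattern: a lightweight enrichment (here, a load timestamp) happens in the pipeline while the data streams, and the join against data already in the target happens in the warehouse. An in-memory SQLite database again stands in for the warehouse, and all names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Data that already lives in the target warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
warehouse.executemany("INSERT INTO customers VALUES (?, ?)",
                      [(1, "EMEA"), (2, "APAC")])

# First transform, in the pipeline: enrich each event with a load
# timestamp while it streams toward the warehouse.
events = [{"customer_id": 1, "amount": 42.0}]
enriched = [
    (e["customer_id"], e["amount"], datetime.now(timezone.utc).isoformat())
    for e in events
]
warehouse.execute(
    "CREATE TABLE events (customer_id INTEGER, amount REAL, loaded_at TEXT)"
)
warehouse.executemany("INSERT INTO events VALUES (?, ?, ?)", enriched)

# Second transform, in the warehouse: join the newly loaded events to
# data already in the target, with no need to extract it first.
row = warehouse.execute("""
    SELECT c.region, e.amount
    FROM events e JOIN customers c ON c.customer_id = e.customer_id
""").fetchone()
print(row)  # ('EMEA', 42.0)
```

The expensive, data-heavy work (the join) runs where the data already is, while the streaming enrichment never touches the target's existing tables.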
Why choose Alooma?
Alooma is the enterprise data platform built for the cloud. Alooma is format-agnostic and flexible enough to integrate new data sources, both structured and unstructured, and is designed to handle massive amounts of data, moving them in real time so that analysis is near-instantaneous. Because Alooma is cloud-based, it is flexible, allowing you to scale up or down based on your unique business needs. And lastly, you can relax knowing that security is a cornerstone of the platform.