Modern ETL, and more...
Learn more about ETL
ETL is commonly understood as the process of moving data from one format or store to another.
ETL stands for "Extract, Transform, Load", and is the common paradigm by which data from multiple systems (typically developed and supported by different vendors, departments, or stakeholders) is combined into a single database, data store, or warehouse for legacy storage or analytics.
Extraction is the process of pulling data out of various data sources. Transformation converts that data into the proper format for storage, query, and analysis. Finally, loading writes the transformed data into the target database, data store, data mart, or warehouse.
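The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the CSV contents, the `customers` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API, or database.
RAW_CSV = """id,name,signup_date,amount
1, Alice ,2021-03-01,10.50
2,Bob,2021-03-02,7
3,  carol,2021-03-02,12.25
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and title-case names, convert ids and amounts to numbers."""
    return [
        (int(r["id"]), r["name"].strip().title(), r["signup_date"], float(r["amount"]))
        for r in rows
    ]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", records)
    conn.commit()

def run_etl(text):
    """Run extract, transform, and load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn

conn = run_etl(RAW_CSV)
total = conn.execute("SELECT SUM(amount) FROM customers").fetchone()[0]
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]
```

Keeping each step in its own function means a new source or a new target only touches one of them, which is the main appeal of the paradigm.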
Traditional ETL tools are typically homespun, on-premises, and built for batch processing. Historically, teams would run nightly ETL and data consolidation jobs using free compute resources during off-hours.
These tools typically come with a number of shortcomings. First, their homespun nature means an organization must absorb the cost of maintaining its own data engineering team, and knowledge is lost when team members leave or when code (or configuration) goes undocumented. Since a homemade ETL solution is a one-off, standard best practices as well as security and scalability planning may be underserved, and the pipeline is only as good as the team that implemented it.
The on-premises nature of traditional ETLs brings its own costs: scaling vertically and horizontally, losses from downtime, and even power consumption and facilities.
Additionally, an ETL's batch-processing nature means that updates to the data set (and related insights) only appear periodically. Batch processing can also go wrong; it's less costly to troubleshoot a small number of records in real time than to lose an entire day's worth of ingested data.
While the traditional ETL process includes extract, transform, and load, there is more to it than that.
In the traditional process, data was extracted, in batch, from an OLTP database, and transformed in a staging area for consumption by BI teams. But the modern process can be much more complicated.
These days, data ingestion must work in real time, so users can run queries and see the present picture at any time. Can the ETL handle the full variety of data sources and streams, with new ones being added all the time?
ETLs must be fault-tolerant, secure, scalable, and accurate along the entire pipeline, with the ability to configure error messages, reroute faulty events, and enrich data programmatically on the fly.
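As a rough sketch of what rerouting faulty events and enriching data on the fly can look like, the Python below validates each event individually, enriches the good ones, and sends the bad ones to a dead-letter list along with an error message. The event fields, the `COUNTRY_BY_CODE` lookup, and the function names are assumptions made for illustration; they do not describe any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    loaded: list = field(default_factory=list)        # events that passed validation
    dead_letters: list = field(default_factory=list)  # faulty events plus error text

# Hypothetical lookup table used to enrich events on the fly.
COUNTRY_BY_CODE = {"US": "United States", "DE": "Germany"}

def transform_event(event):
    """Validate and enrich a single event; raise ValueError on bad input."""
    if "user_id" not in event:
        raise ValueError("missing user_id")
    code = event.get("country_code")
    if code not in COUNTRY_BY_CODE:
        raise ValueError(f"unknown country_code: {code!r}")
    # Enrichment: add the full country name without mutating the input event.
    return {**event, "country": COUNTRY_BY_CODE[code]}

def process(events):
    """Process events one at a time so a bad record never blocks the rest."""
    result = PipelineResult()
    for event in events:
        try:
            result.loaded.append(transform_event(event))
        except ValueError as exc:
            # Reroute the faulty event, keeping the error message for later review.
            result.dead_letters.append({"event": event, "error": str(exc)})
    return result

events = [
    {"user_id": 1, "country_code": "US"},
    {"country_code": "DE"},
    {"user_id": 3, "country_code": "ZZ"},
]
result = process(events)
```

Because failures are captured per event, the faulty records can be inspected, fixed, and replayed later while valid data continues to flow.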
With modern cloud data warehouses like Amazon Redshift, Google BigQuery, and Snowflake, you can perform transformations directly on massive datasets without the need for a dedicated staging area.
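The idea of transforming data inside the warehouse can be illustrated with a small sketch. Here an in-memory SQLite database is used purely as a stand-in for a cloud warehouse (it is not Redshift, BigQuery, or Snowflake), and the `raw_orders` and `customer_totals` tables are hypothetical: raw records are loaded first, and the transformation then runs as SQL where the data already lives.

```python
import sqlite3

# Stand-in for a warehouse connection; a real warehouse would use its own client.
conn = sqlite3.connect(":memory:")

# Load: raw records land in the warehouse untransformed.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "alice", 1050), (2, "bob", 700), (3, "alice", 1225)],
)

# Transform: performed in SQL inside the warehouse, with no separate staging area.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer,
           SUM(amount_cents) / 100.0 AS total_amount,
           COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY customer
    """
)

totals = conn.execute(
    "SELECT customer, total_amount, order_count FROM customer_totals ORDER BY customer"
).fetchall()
```

Since the raw table is kept, the transformation can be changed and rerun later without extracting the data from its sources again.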
Doing your ETL in batches makes sense when you do not need your data in real time. But many companies have disparate and complicated warehousing systems, and incompatibility between those systems, together with knowledge lost due to turnover, results in spiraling costs and more time required to consolidate data.
Modern ETL tools like Alooma are cloud-based, fully managed, and support batch as well as real-time data ingestion. Alooma's enterprise platform provides a format-agnostic, streaming data pipeline to simplify and enable real-time data processing, transformation, analytics, and business intelligence.