What is the difference between a data pipeline and an ETL pipeline?
They are related but distinct terms, though some people do use them interchangeably.
An ETL pipeline refers to a set of processes that extract data from one system, transform it, and load it into a database or data warehouse.
A data pipeline is a slightly more generic term: it refers to any set of processing elements that moves data from one system to another, possibly transforming the data along the way.
The term ETL pipeline usually implies that the pipeline works in batches (for example, the job runs once every 12 hours), while a data pipeline can also run as a streaming computation, meaning every event is handled as it occurs.
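To make the batch idea concrete, here is a minimal sketch of an ETL job in Python. All the function names and the sample data are illustrative, not a real framework; a scheduler such as cron or an orchestrator would invoke `run_etl` on a fixed cadence.

```python
def extract():
    # Stand-in for reading a batch of rows from a source system
    # (an API, a file drop, an OLTP database, ...).
    return [
        {"user": "alice", "amount": "19.99"},
        {"user": "bob", "amount": "5.00"},
    ]

def transform(rows):
    # Clean and reshape the raw rows: here, parse amounts into numbers.
    return [{"user": r["user"], "amount": float(r["amount"])} for r in rows]

def load(rows, warehouse):
    # Stand-in for writing the transformed rows into a database
    # or data warehouse; here the "warehouse" is just a list.
    warehouse.extend(rows)

def run_etl(warehouse):
    # The whole batch runs as one job, e.g. scheduled every 12 hours.
    load(transform(extract()), warehouse)

warehouse = []
run_etl(warehouse)
```

The key point of the batch shape is that extract, transform, and load each operate on a whole dataset at once; a streaming pipeline would instead apply the same logic to one event at a time as it arrives.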
Another type of data pipeline, closely related to ETL, is an ELT pipeline: you load all of your data into the data warehouse first, and transform it only later.
Additionally, a data pipeline doesn't have to end with loading the data into a database or a data warehouse. It can, for example, trigger business processes by calling webhooks on other systems.
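As a sketch of that last point, here is a pipeline whose final stage fires a notification instead of writing to a warehouse. To keep the example self-contained, the "webhook" is a plain Python callback; in practice it would be an HTTP POST to another system's URL. The event types and payload fields are made up for illustration.

```python
def pipeline(events, notify):
    # Move events through the pipeline, transforming along the way,
    # and trigger a downstream action for each event that matters.
    fired = []
    for event in events:
        if event["type"] == "signup":
            payload = {"user": event["user"], "source": "pipeline"}
            # In a real system this would be something like:
            #   requests.post(webhook_url, json=payload)
            notify(payload)
            fired.append(payload)
    return fired

calls = []
pipeline(
    [{"type": "signup", "user": "alice"}, {"type": "login", "user": "bob"}],
    calls.append,
)
```

Here only the signup event triggers the downstream call; the login event flows through without side effects, which is the sense in which the pipeline "ends" in an action rather than a load.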
Further reading: how to build a data pipeline.
Published on Quora. See the original question here.