What are the pitfalls to avoid when implementing an ETL (Extract, Transform, Load) tool?
Before deciding on an ETL tool, you need to define your data pipeline requirements.
- What types of data sources do you use now, and which do you expect to add in the near future?
- Are you using unstructured data?
- How often do you need to update the data warehouse (DWH): in real time or in batches?
- What are your error-handling requirements? Check out the Restream Queue, a cool feature we implemented at Alooma.
- How often does your schema change?
- If you are considering a cloud service (which in most cases I think you should), does it comply with security and privacy requirements such as SOC 2 Type II, HIPAA, GDPR, and the EU-US Privacy Shield Framework?
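
To make the error-handling and schema-change questions concrete, here is a minimal sketch of one common approach: events that fail the transform step are parked rather than dropped, then replayed once the mapping is fixed. This is only an illustration of the general idea, not how Alooma's Restream Queue is actually implemented. The `Pipeline` class, the `transform_v1` and `transform_v2` functions, and the `user_id`, `uid`, and `amount` fields are all hypothetical names invented for the example.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, List

Event = Dict[str, Any]


class Pipeline:
    """Toy loader that parks failed events instead of dropping them."""

    def __init__(self, transform: Callable[[Event], Event]) -> None:
        self.transform = transform
        self.loaded: List[Event] = []        # stands in for the data warehouse
        self.parked: Deque[Event] = deque()  # failed events kept for replay

    def process(self, events: Iterable[Event]) -> None:
        for event in events:
            try:
                self.loaded.append(self.transform(event))
            except (KeyError, TypeError, ValueError):
                # Keep the raw event so it can be retried later.
                self.parked.append(event)

    def replay(self) -> int:
        """Retry every parked event; return how many were recovered."""
        pending = list(self.parked)
        self.parked.clear()
        self.process(pending)
        return len(pending) - len(self.parked)


def transform_v1(event: Event) -> Event:
    # Original mapping: expects a 'user_id' field.
    return {"user_id": int(event["user_id"]), "amount": float(event["amount"])}


def transform_v2(event: Event) -> Event:
    # Updated mapping: also accepts the renamed 'uid' field.
    user_id = event["user_id"] if "user_id" in event else event["uid"]
    return {"user_id": int(user_id), "amount": float(event["amount"])}


pipeline = Pipeline(transform_v1)
pipeline.process([
    {"user_id": "1", "amount": "9.50"},
    {"uid": "2", "amount": "3.00"},  # upstream schema change: field renamed
])

# The second event is parked, not lost. Fix the mapping, then replay.
pipeline.transform = transform_v2
recovered = pipeline.replay()
```

The point of the sketch is the design choice, not the code: if a tool drops events it cannot map, every schema change becomes a data-loss incident, whereas a tool that holds them lets you fix the mapping and reload.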
In this blog post you can read about the differences between traditional and modern ETL: ETL Process: Traditional vs. Modern
Published at Quora. See Original Question here