What is a data lake in the context of big data?
A data lake is a single repository of raw data, stored as-is with no predefined schema or structure.
It had real advantages when the concept emerged in the late 2000s. Back then, data warehouses were rigid and heavy: the loading process was complicated, and any schema change required hours of engineering.
Today this is no longer the case - cloud data warehouses like Amazon Redshift, Google BigQuery, and others are flexible and robust.
The convenience of not having to define structure at collection time pales in comparison to the massive headache you'll face when you try to analyze that data. Every consumer of the lake has to rediscover and re-handle the schema on every read, and you'll be sad to find your glorious data lake has turned into a data swamp.
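To make the trade-off concrete, here's a minimal sketch (the event records and the `load_with_schema` helper are hypothetical, just for illustration): with schema-on-write, a warehouse-style loader validates each record once at load time and rejects the drifted ones, instead of letting every downstream analysis discover the mess on its own.

```python
import json

# Hypothetical raw events as they might land in a data lake: nothing
# enforces a schema, so field names and types drift over time.
raw_events = [
    '{"user_id": 1, "amount": 9.99}',
    '{"userId": "2", "amount": "free"}',  # renamed key, wrong type
    '{"user_id": 3}',                     # missing field
]

def load_with_schema(line):
    """Schema-on-write: validate once at load time, reject bad rows."""
    rec = json.loads(line)
    if not isinstance(rec.get("user_id"), int):
        raise ValueError(f"bad or missing user_id: {rec}")
    if not isinstance(rec.get("amount"), (int, float)):
        raise ValueError(f"bad or missing amount: {rec}")
    return rec

clean, rejected = [], []
for line in raw_events:
    try:
        clean.append(load_with_schema(line))
    except ValueError:
        rejected.append(line)

# Only one record survives; the drift is caught at the door,
# not six months later in an analyst's query.
print(len(clean), len(rejected))  # → 1 2
```

In a lake, all three records land happily, and the `userId`/`user_id` split becomes every analyst's problem forever.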
I’d pick data warehouse over data lake any day of the week.
The reason I'm so adamant about it is that I meet dozens of data teams struggling with this exact problem. To be honest, MongoDB-to-BigQuery migration is one of the main use cases I see here at Alooma.
Good luck, choose right :O)