Alooma: Solving your ETL problems, one question at a time
How can data integrity be achieved, and what are some examples?
Maintaining data integrity is like cleaning your house. It’s a repetitive, iterative process of carefully examining each piece of data and making sure it is not skewed, missing, or duplicated. It’s a never-ending process, especially in our era of abundant data sources generating millions of records daily.
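As a rough illustration of the "missing or duplicated" checks mentioned above, here is a minimal sketch in plain Python. The function name `check_integrity`, its parameters, and the sample field names are all assumptions made for this example; they do not come from Alooma or any particular library, and detecting skewed values is out of scope here.

```python
from collections import Counter


def check_integrity(records, key, required):
    """Report rows with missing required fields and keys that appear more than once.

    `records` is a list of dicts; `key` and `required` are hypothetical
    field names chosen by the caller.
    """
    # Indices of rows where any required field is absent, None, or empty.
    missing = [
        i for i, row in enumerate(records)
        if any(row.get(field) in (None, "") for field in required)
    ]
    # Key values that occur in more than one row.
    counts = Counter(row.get(key) for row in records)
    duplicates = sorted(k for k, n in counts.items() if k is not None and n > 1)
    return {"missing_rows": missing, "duplicate_keys": duplicates}


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 1, "email": "c@example.com"},
    ]
    print(check_integrity(sample, key="id", required=["email"]))
```

In practice a check like this would run on every batch or stream window, since the point above is that integrity work is never finished.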
How do I go about building a business intelligence app in Python?
The goal is to use Python to build the ETL as well as the reporting applications.
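To make the ETL-plus-reporting idea concrete, here is a small sketch using only the Python standard library (`csv` and `sqlite3`). The `sales` table, the `region` and `amount` columns, and the function names are hypothetical, chosen purely for illustration; a real setup would likely target a proper warehouse rather than SQLite.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize the region name."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append((row["region"].strip().lower(), float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a (hypothetical) sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """Report: total amount per region, sorted by region name."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    raw = "region,amount\nEU ,10.5\nus,4\neu,1.5\nus,\n"
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw)))
    for region, total in report(conn):
        print(region, total)
```

Keeping extract, transform, load, and report as separate functions makes each stage easy to test and to swap out later, for example replacing the SQLite load step with a warehouse loader.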
What are the best web based ETL tools?
Before we dive into the specific details of Alooma, we can briefly talk about trends in the ETL world, which has moved from batch-based to stream-oriented solutions.
What are the reasons that Data warehouses are moving to the Cloud?
As a business you want to minimize the amount of effort you need to put into aspects which are not your main line of business, and unless you are Amazon, Google or Snowflake, for example, that is most likely not building or maintaining data warehouses.
What is a data lake in the context of big data?
Does a data lake include raw data only, or transformed data as well?
What is Spark and how to use MongoDB?
Spark is an open source processing engine built around speed, ease of use, and analytics. It’s good for processing large amounts of data with low latency.
What is the best way to load data into Amazon Redshift from MySQL?
This is actually one of the problems we are trying to address at Alooma - building a robust data pipeline that can take your inputs and reliably move the data into Redshift (in real time, without any data loss, and performing advanced transformation along the way).
What is the best way to move my data from an AWS Postgres to Redshift?
There are a few ways to address this problem, and it mostly depends on what the requirements are and where the server is hosted.
What is the difference between a data pipeline and an ETL pipeline?
They are two related, but different terms, and I guess some people use them interchangeably.
What is the difference between Amazon Redshift and Amazon Redshift Spectrum and Amazon Aurora?
Amazon Aurora is a relational database engine. It’s designed to be compatible with MySQL 5.6, so that existing MySQL applications and tools can run without requiring modification. It’s good for production usage for lots of applications but not necessarily for complex data analytics.
What is your suggested SaaS analytics stack for measuring KPIs and actionable metrics?
There are two different approaches to analytics: custom and off-the-shelf. I’m a personal fan of custom analytics, so for me the main ingredients of a stable, strong analytics stack are
What's the most tedious part of building ETLs and/or data pipelines?
Keeping your data pipeline working properly. It is an ever-growing, never-ending job.
Why should I use an existing ETL vs writing my own in Python?
A major factor here is that companies that provide ETL solutions do so as their core business focus, which means they will constantly work on improving their performance and stability while providing new features (sometimes ones you can’t foresee needing until you hit a certain roadblock on your own).