What is Data Ingestion?

by Garrett Alley  
5 min read  • 14 Aug 2018

Companies rely on data to make all kinds of decisions: to predict trends, forecast the market, plan for future needs, and understand their customers. But how do you get all your company's data in one place so you can make the right decisions? Data ingestion moves your data from multiple sources into one place so you can see the big picture hidden in it.

Data ingestion defined

Data ingestion is a process by which data is moved from one or more sources to a destination where it can be stored and further analyzed. The data might be in different formats and come from various sources, including RDBMS, other types of databases, S3 buckets, CSVs, or from streams. Since the data comes from different places, it needs to be cleansed and transformed in a way that allows you to analyze it together with data from other sources. Otherwise, your data is like a bunch of puzzle pieces that don't fit together.
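As a rough illustration of that cleansing and transforming step, here is a minimal Python sketch that maps records from two differently shaped sources onto one common schema. The sample data, the field names, and the function names (from_csv, from_json, ingest) are all hypothetical and chosen only for this example; a real pipeline would read from actual files, databases, or streams.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for two different sources.
CSV_SOURCE = "customer_id,signup_date,country\n17,2018-08-01,us\n42,2018-08-03,DE\n"
JSON_SOURCE = '[{"id": "99", "signedUp": "2018-08-05", "countryCode": "fr"}]'


def from_csv(text):
    """Map rows of the CSV source onto the common schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer_id"]),
            "signup_date": row["signup_date"],
            "country": row["country"].upper(),
        }


def from_json(text):
    """Map objects of the JSON source onto the same common schema."""
    for obj in json.loads(text):
        yield {
            "customer_id": int(obj["id"]),
            "signup_date": obj["signedUp"],
            "country": obj["countryCode"].upper(),
        }


def ingest(csv_text, json_text):
    """Combine both sources into one list of uniformly shaped records."""
    return list(from_csv(csv_text)) + list(from_json(json_text))


records = ingest(CSV_SOURCE, JSON_SOURCE)
```

After this step every record has the same three fields with the same types, so the pieces from both sources can be analyzed together.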

You can ingest data in real time, in batches, or in a combination of the two (this is called lambda architecture). When you ingest data in batches, data is imported at regularly scheduled intervals. This is useful when you have processes that run on a schedule, such as reports that run daily at a specific time. Real-time ingestion is useful when the information is very time-sensitive, such as data from a power grid that must be monitored moment to moment. A lambda architecture attempts to balance the benefits of both modes: batch processing provides comprehensive views of the accumulated data, while real-time processing provides views of the most recent, time-sensitive data.
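To make the combination of the two modes more concrete, here is a minimal Python sketch of the lambda idea: a batch view rebuilt from stored events, a real-time view updated as new events arrive, and a query that merges both. The event shape (meter, kwh) and the names build_batch_view, SpeedLayer, and query are assumptions made for this example, not part of any particular product.

```python
from collections import Counter


def build_batch_view(events):
    """Batch side: recompute totals from the full set of stored events."""
    view = Counter()
    for event in events:
        view[event["meter"]] += event["kwh"]
    return view


class SpeedLayer:
    """Real-time side: running totals for events not yet in a batch view."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["meter"]] += event["kwh"]

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.view.clear()


def query(batch_view, speed_layer, meter):
    """Serve a complete answer by merging the batch and real-time views."""
    return batch_view[meter] + speed_layer.view[meter]


# Hypothetical events: three already stored, one that just arrived.
history = [
    {"meter": "a", "kwh": 5},
    {"meter": "b", "kwh": 2},
    {"meter": "a", "kwh": 1},
]
batch_view = build_batch_view(history)
speed = SpeedLayer()
speed.on_event({"meter": "a", "kwh": 3})
total_a = query(batch_view, speed, "a")
```

The batch view is comprehensive but only as fresh as the last scheduled run; the real-time view covers the gap since then. Merging them at query time is what gives the combined picture.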

Data ingestion challenges

Slow. Back when ETL tools were created, it was easy to write scripts or manually create mappings to extract, cleanse, and load data. But data has become much larger, more complex, and more diverse, and the old methods of data ingestion just aren’t fast enough to keep up with the volume and scope of modern data sources.

Complex. With an explosion of new and rich data sources like smartphones, smart meters, sensors, and other connected devices, companies sometimes find it difficult to get value from that data. This is, in large part, due to the complexity of cleansing data, such as detecting and removing errors and schema mismatches.

Expensive. A number of different factors combine to make data ingestion expensive. The infrastructure needed to support the different data sources and proprietary tools can be very expensive to maintain over time, and maintaining a staff of experts to support the ingestion pipeline is not cheap. Not only that, but real money is lost when business decisions can’t be made quickly.

Insecure. Security is always an issue when moving data. Data is often staged at various steps during ingestion, which makes it difficult to meet compliance standards throughout the process.
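The cleansing work described under "Complex" above, detecting errors and schema mismatches, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expected schema, the sample sensor records, and the function names find_problems and cleanse are hypothetical.

```python
# Hypothetical target schema for sensor readings: field name -> required type.
EXPECTED_SCHEMA = {"device_id": str, "reading": float, "ts": int}


def find_problems(record, schema=EXPECTED_SCHEMA):
    """Return a list of schema mismatches found in one record."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def cleanse(records):
    """Split records into clean ones and rejected ones with their problems."""
    clean, rejected = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


# Hypothetical raw input: one good record and two with mismatches.
raw = [
    {"device_id": "m-1", "reading": 20.5, "ts": 1534204800},
    {"device_id": "m-2", "reading": "n/a", "ts": 1534204860},
    {"device_id": "m-3", "ts": 1534204920, "fw": "1.2"},
]
clean, rejected = cleanse(raw)
```

Keeping the rejected records together with the reasons they failed, rather than silently dropping them, makes it possible to fix the source or the mapping later.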

Well-designed data ingestion: Alooma’s solution

When data ingestion goes well, everyone wins.

Fast and flexible. When you need to make big decisions, it's important to have the data available when you need it. With an efficient data ingestion pipeline such as Alooma’s, you can cleanse your data or add timestamps during ingestion, with no downtime. And you can ingest data in real time, in batches, or using a lambda architecture.

Less complex. While you may have a variety of different sources with different data types and schemas, a well-designed data ingestion pipeline should help take the complexity out of bringing these sources together. With Alooma, you can import data from hundreds of data sources into your cloud data warehouse. Alooma can help translate from an on-premises schema, such as one in Oracle, to whatever schema you're using in your data warehouse. Alooma can even infer the schema from the structure of the data. Once Alooma determines the schema, it can start streaming immediately.

Cost efficient. Well-designed data ingestion should save your company money by automating some of the processes that are costly and time-consuming. In addition, data ingestion can be significantly cheaper if your company isn’t paying for the infrastructure to support it. With Alooma’s Cloud platform, you save money by reducing infrastructure costs and by automating the data ingestion process, so you can make the business decisions that save your company money in a timely manner.

Secure. Moving data is always a security concern. But security is baked into the DNA of the Alooma platform and is an area where we shine. Alooma is SOC 2 Type II, HIPAA, GDPR, and EU-US Privacy Shield Framework compliant and supports OAuth 2.0. Data is encrypted in motion and at rest.
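To give a feel for what inferring a schema from the structure of the data (mentioned under "Less complex" above) can look like, here is a generic Python sketch. It is not Alooma's implementation, which this article does not describe; the function name infer_schema and the sample records are hypothetical.

```python
def infer_schema(records):
    """Return {field: sorted list of type names seen} across all records."""
    seen = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical sample records with an optional field and a nullable field.
sample = [
    {"id": 1, "name": "Ada", "score": 9.5},
    {"id": 2, "name": "Lin", "score": None},
    {"id": 3, "name": "Sam", "active": True},
]
schema = infer_schema(sample)
```

A field that shows more than one type (here, score is sometimes a float and sometimes missing a value) is a hint that the destination column needs to be nullable or that the source needs cleansing first.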

Getting started with Alooma

Alooma is a real-time data ingestion solution designed to take the headaches out of data ingestion by helping automate and simplify the process.

Are you ready to get started? Contact Alooma today to see how we can help.
