Using Real-Time ETL to Improve Decisions

by Jason Lim  
7 min read  • 14 Nov 2018

In order to compete and succeed, a business needs to make good decisions fast. And we all know that the best decisions are informed by good data. But simply having data isn’t enough: in many cases, data is most useful when you can get it in real time.

Take Alooma customer Bringhub, a contextual commerce ad platform that helps publishers monetize their digital content by surfacing relevant ads. To optimize their platform’s performance, they capture clickstream data from their clients’ high-traffic websites, tracking loads, impressions, and clicks. Using Alooma to replicate this clickstream into Snowflake in real time lets Bringhub better understand how their users behave and improve the overall experience of their offerings.

Consider the importance of real-time data for security monitoring. In light of recent major data breaches, such as those at Equifax and Yahoo, it’s crucial that companies can monitor their data in real time to detect anomalies and mitigate security risks. The fallout from failing to detect, fix, and communicate such breaches can be devastating.

Real-time data needs other real-time data to play with

The best decisions come from linking data together to discover and understand the relationships between your data sources. Too often, data is siloed in different places: usage data might live in your SDK, marketing data in a third-party tool, and customer data in a CRM. Referencing each of these sources in decision making is a good place to start, but linking them together gives you the full picture.

For real-time decision making, it’s essential to have all data sources linked in one place — especially if you rely on data to understand the health of your business.

“Real-time data helps us identify when certain areas of our app aren’t performing well,” said James Draper, Director of Analytics at Chatbooks.

Chatbooks is a company that lets users automatically make books of Instagram photos. The company has defined user milestones that let the Product and Design team see drop-off rates at each milestone as they release new versions. By comparing each new version against older ones, the team can decide whether to quickly adjust the UI, roll back to an older version, or go back to the drawing board entirely.
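A minimal sketch of how per-milestone drop-off rates could be computed and compared across app versions. It assumes each event is a `(user_id, app_version, milestone)` tuple; the milestone names, the event shape, and the `drop_off_rates` helper are hypothetical, since the source does not describe Chatbooks’ actual milestones or schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical milestone order for illustration; the source does not name Chatbooks' milestones.
MILESTONES = ["app_open", "photos_selected", "book_previewed", "order_placed"]


def drop_off_rates(
    events: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Return {app_version: {milestone: drop-off rate from the previous milestone}}.

    Each event is a (user_id, app_version, milestone) tuple.
    """
    # Distinct users who reached each milestone, grouped by app version.
    reached: Dict[str, Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for user_id, version, milestone in events:
        reached[version][milestone].add(user_id)

    rates: Dict[str, Dict[str, Optional[float]]] = {}
    for version, by_milestone in reached.items():
        rates[version] = {}
        for prev, curr in zip(MILESTONES, MILESTONES[1:]):
            before = len(by_milestone[prev])
            after = len(by_milestone[curr])
            # Fraction of users lost between consecutive milestones (None if nobody reached `prev`).
            rates[version][curr] = None if before == 0 else 1 - after / before
    return rates
```

Comparing the rates for a new version with those for an older one, milestone by milestone, is the kind of check that would flag a regression worth rolling back.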

Chatbooks also uses real-time data to detect problems. Once, a server rebooted and stopped their order processor. Using data from Alooma, they saw that their revenue had flatlined well before any developer had noticed the error, so they were able to investigate the problem and start collecting orders again right away.
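A flatline check like this can be expressed as a small piece of streaming state: remember when the last order arrived, and raise a flag when too much time passes without a new one. The sketch below assumes order events carry a timestamp; the `FlatlineDetector` class and the 15-minute window are hypothetical choices for illustration, not a description of how Chatbooks or Alooma built their alert.

```python
from datetime import datetime, timedelta
from typing import Optional


class FlatlineDetector:
    """Flags when no order has been seen for longer than `window`.

    Hypothetical helper: the source does not say how Chatbooks built its alert.
    """

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self.last_order_at: Optional[datetime] = None

    def on_order(self, ts: datetime) -> None:
        # Called for each order event as it streams in; keep the latest timestamp,
        # tolerating events that arrive slightly out of order.
        if self.last_order_at is None or ts > self.last_order_at:
            self.last_order_at = ts

    def is_flatlined(self, now: datetime) -> bool:
        # No orders at all, or the most recent one is older than the window.
        return self.last_order_at is None or now - self.last_order_at > self.window


detector = FlatlineDetector(window=timedelta(minutes=15))  # assumed alert window
```

Because the detector only keeps the latest timestamp, checking it is cheap enough to run continuously against the live order stream.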

Real-time data is better in the cloud

The advantages of operating in the cloud include flexible cost, scalability, and speed. Compared to data housed on-premises, cloud infrastructure keeps pace with new technology as it is developed. To capture the right data from all sources in real time in the cloud, a modern ETL tool is the optimal solution.

This is especially true for large, traditional organizations spread across geographies, with fragmented business units using different systems. If data is stuck on-premises, consolidating it in one place can be slow. With cloud storage and a cloud ETL tool, that process becomes streamlined. Imagine a traditional manufacturing company that needs to link machine data, production data, and financial data to measure equipment efficiency in real time and adapt its manufacturing processes. With the cloud, this is now achievable.

Real-time data relies on stream processing

Alooma moves large volumes of data from different sources in real time using stream processing. This means that each data point is streamed, that is, processed as it arrives, independently of other data points. This lets a user see a continuously updated state of a system and its data, rather than the periodic snapshots that a batch processing method would produce.
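A toy illustration of the difference, not Alooma’s implementation: the streaming function updates a running count of event types (loads, impressions, clicks) and emits a fresh view after every event, while the batch function only emits a snapshot each time a fixed-size batch fills. The function names and the `batch_size` parameter are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def stream_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream processing: update state per event and emit the new state each time."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield dict(counts)  # a continuously updated view after every event


def batch_counts(events: Iterable[str], batch_size: int) -> Iterator[Dict[str, int]]:
    """Batch processing: accumulate events and emit a snapshot only when a batch fills."""
    counts: Dict[str, int] = {}
    pending: List[str] = []
    for event in events:
        pending.append(event)
        if len(pending) == batch_size:
            for e in pending:
                counts[e] = counts.get(e, 0) + 1
            pending.clear()
            yield dict(counts)  # a periodic snapshot
    # Events in a final, partially filled batch are not reported until the next run.


events = ["click", "load", "click", "impression", "click"]
```

With these five events, the streaming version reports the latest click count immediately, while the batch version (with `batch_size=2`) never reports the fifth click, because its batch never fills.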

Stream processing is very complex to achieve, which is why a real-time ETL tool is difficult to build. Under the hood, the key to high-throughput streaming is a distributed system, which splits data into smaller pieces that are processed in parallel across multiple servers.
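A minimal sketch of that splitting, assuming events are partitioned by a key (here a user ID) so that all events for one key always land in the same partition, similar in spirit to keyed partitions in Kafka. The helper names are hypothetical, and a local thread pool stands in for the separate servers a real deployment would use.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (user_id, value); hypothetical event shape


def partition(events: List[Event], num_partitions: int) -> List[List[Event]]:
    """Split events by hashing the key, so each key always lands in the same partition."""
    parts: List[List[Event]] = [[] for _ in range(num_partitions)]
    for key, value in events:
        # crc32 is deterministic across processes, unlike Python's built-in str hash.
        idx = zlib.crc32(key.encode("utf-8")) % num_partitions
        parts[idx].append((key, value))
    return parts


def process_partition(events: List[Event]) -> Dict[str, int]:
    """Per-partition work: sum values per key. Keys never span partitions."""
    totals: Dict[str, int] = {}
    for key, value in events:
        totals[key] = totals.get(key, 0) + value
    return totals


def process_in_parallel(events: List[Event], num_workers: int = 4) -> Dict[str, int]:
    parts = partition(events, num_workers)
    merged: Dict[str, int] = {}
    # Threads stand in for separate servers in this single-process sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for result in pool.map(process_partition, parts):
            merged.update(result)  # safe: each key appears in exactly one partition
    return merged
```

Partitioning by key is what lets the workers run without coordinating: no two workers ever hold partial results for the same key, so merging is a simple union.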

Alooma’s stream processing technology is built on a combination of open-source tools: Apache Kafka, Apache Storm, Redis, and more. Apache Kafka is a distributed, fault-tolerant queue that can handle large volumes of data and passes messages from one endpoint to another. Apache Storm is a distributed real-time processing platform that supports scalable manipulation of streaming data. Storm, along with its Trident extension, enables Alooma to guarantee that every event is processed at least once, by tracking and retransmitting failed events, and to avoid duplicates through idempotency: replaying an already processed event overwrites its earlier result instead of adding a second copy. Together, these achieve exactly-once processing, minimizing data loss and corruption.
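A simplified, single-process illustration of the general technique described above (at-least-once delivery with retries, plus idempotent writes keyed by event ID). This is not Storm or Trident code: the `IdempotentSink` class, the simulated lost acknowledgments, and the event IDs are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (event_id, amount); hypothetical event shape


class IdempotentSink:
    """Stores results keyed by event ID, so writing the same event twice is harmless."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}
        self.write_calls = 0  # counts every attempt, including duplicates

    def write(self, event: Event) -> None:
        event_id, amount = event
        self.write_calls += 1
        self.rows[event_id] = amount  # overwrite rather than append: a replay changes nothing


def deliver_at_least_once(
    events: List[Event], sink: IdempotentSink, ack_loss_rate: float, rng: random.Random
) -> None:
    """Resend each event until its acknowledgment arrives (at-least-once delivery).

    A lost acknowledgment means the sender cannot tell whether the write succeeded,
    so it resends an event the sink may already hold; idempotent writes absorb it.
    """
    if not 0.0 <= ack_loss_rate < 1.0:
        raise ValueError("ack_loss_rate must be in [0, 1) so delivery eventually succeeds")
    for event in events:
        while True:
            sink.write(event)
            if rng.random() >= ack_loss_rate:  # simulated acknowledgment received
                break
```

Even when `write_calls` exceeds the number of events because some were resent, `rows` holds exactly one entry per event, which is the effectively exactly-once outcome the paragraph describes.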

[Figure: Real-time stream processing]

Understanding how data is flowing in real time is hard. That’s why we developed Alooma Live, a visualization of your data flowing in real time from all of your input sources.

Making sense of real-time data requires BI tools

Imagine if every decision every team makes could be informed by real-time data. BI tools make data straightforward and easy to absorb. Tools like Looker, Tableau, and Periscope Data create visualizations that illustrate patterns, data relationships, and historical trends. They can also be good for morale! With the right BI tool, it’s much easier for teams to understand how their day-to-day work impacts the business overall. An increasingly popular application of this is a dashboard on a large, visible TV in the office, where teams can monitor product usage metrics, customer sentiment, or digital marketing. This helps rally teams around making quick decisions when it matters most.

Start thinking about applications for real-time data

When you’re in a fast-paced industry and need to make quick, data-informed decisions, the only answer is to have access to your data in real time. We encourage you to imagine what you could do with this data and how it could improve your business. Once you get used to working with real-time data, it’s very hard to go back.
