Three Ways to Ingest Data Without a Native Connector

by Max Zuckerman  
5 min read  • 30 Oct 2017

"How do I ingest data if there isn’t a native connector in Alooma?"

This is a really common question we hear. The good news is, there are three ways to ingest almost any data via Alooma.

Although we’re proud to natively support many popular data sources, we inevitably can’t offer a native connector for every single one. But we’re not content to just leave it at that. As we say, "No data gets left behind!"

In an effort to make sure you can still ingest all of the data you need, even if Alooma does not yet have a native connector, we have developed three powerful methods to ingest data without you having to build a totally custom solution from the ground up. As a bonus, you still get to take full advantage of Alooma’s Code Engine, Restream Queue, and Mapper with these non-native integrations.

So let’s go over these three options:

  1. Push to Alooma’s REST Endpoint / Webhooks
  2. Pull from your data source using Alooma’s REST API Reader
  3. Automatically load from a periodically dumped file (e.g. CSV, JSON lines)

Push to Alooma’s REST Endpoint or leverage Webhooks

Pushing events directly to Alooma’s REST endpoint is a simple and incredibly fast way to ingest JSON events to your data warehouse. Our endpoint can handle high volumes of data (some of our customers push billions of events per day this way), and it still allows you to use the Code Engine for transformations on the fly. This can be particularly important because you may have little control over how the data is formatted when it arrives via a webhook, so basic operations such as renaming fields, discarding fields, and more advanced transformations may be desirable.

For real-time use cases, this is your new best friend. It has been especially key for customers in industries like mobile apps and gaming, or for anyone looking to implement highly customizable real-time analytics. Events pushed to Alooma’s REST endpoint are loaded to your data warehouse in seconds, not hours or days. Segment, JIRA, and Stripe are examples of data sources that work well with this method.

Chatbooks is an example of a customer ingesting data via a webhook. Chatbooks gets customer NPS (Net Promoter Score) data from Delighted, then integrates it with customer questions and feedback from HelpShift and with sales data. This makes it possible to analyze how much customers with a high NPS score spend versus those who give a low score.

To push events to Alooma’s endpoint, you can leverage our SDKs; if your source is a SaaS application with webhook capabilities, setup will take you less than five minutes.


Even if webhooks are not available, or you’ve already implemented a method of pushing JSON events that you’d prefer to keep using, that’s no problem. Alooma will provide you with a token, and you retain total control over how events are pushed, all in real time.
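If you are rolling your own push, a minimal sketch in Python (standard library only) might look like the following. The endpoint URL and the event fields are hypothetical placeholders; use the URL and token Alooma provides for your input:

```python
import json
import urllib.request

# Hypothetical endpoint URL with a token placeholder; substitute the
# values Alooma provides for your input.
ALOOMA_ENDPOINT = "https://inputs.alooma.com/rest/<your-token>"

def build_event(user_id, action, **extra):
    """Assemble a flat, JSON-serializable event."""
    event = {"user_id": user_id, "action": action}
    event.update(extra)
    return event

def push_event(event, endpoint=ALOOMA_ENDPOINT):
    """POST a single JSON event to the REST endpoint."""
    data = json.dumps(event).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because each event is just an HTTP POST of a JSON body, this works from any language or platform that can make web requests.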

If you want to learn more about how Webhooks work, you can read this blog post.

Use Alooma’s REST API Reader to pull data from your sources

Sometimes our customers have data sources that are not commonly requested, like Salesforce DMP, or they want to ingest their own data through internal APIs, so those sources may not be on our roadmap quite yet. Many of these customers have large data teams and have asked, "Can we build an Alooma integration on our own?"

Well, it depends. If the data source has a REST API that can be queried, then most likely yes. The main limitation is APIs that require additional authentication steps or request/response handling beyond a simple query. In those situations, contact us to see if we can figure out a good solution together.

Bankrate, an online financial rate comparison site, is an example of a customer using Alooma’s REST API Reader to pull data.

Setting up the REST API Reader requires a few basic fields: the URL of the API you wish to pull from, any specific parameters the API expects, and how frequently you wish to query it.
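To make the shape of that configuration concrete, here is a sketch of those fields as a plain dictionary. The URL, parameters, and field names below are hypothetical illustrations; the actual configuration happens in the Alooma UI:

```python
# Hypothetical illustration of the fields the REST API Reader asks for;
# real configuration is entered in the Alooma UI, not in code.
rest_input_config = {
    "url": "https://api.example.com/v1/orders",   # API to pull from
    "params": {                                   # parameters the API expects
        "updated_since": "2017-10-01",
        "page_size": 100,
    },
    "headers": {"Authorization": "Bearer <api-key>"},
    "frequency_minutes": 15,                      # how often to query the API
}
```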


Similar to pushing events to our REST endpoint, this method may leave you at the mercy of the data source as to how the data is formatted when you receive it. As before, the Code Engine comes to the rescue, allowing you to transform and enrich every single event on the fly. Keep in mind that this occurs within Alooma’s hosted platform before the event is loaded to your data warehouse, so there is no negative performance impact on your warehouse, and your data arrives formatted exactly how you need it.
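As a sketch of that kind of on-the-fly cleanup, here is a Code Engine-style transform. The `transform(event)` signature matches the convention used later in this post, but the field names are hypothetical examples:

```python
# A sketch of a Code Engine transform; the transform(event) signature
# follows Alooma's convention, while the field names are hypothetical.
def transform(event):
    # Rename a vendor-specific field to your warehouse convention
    if "usr" in event:
        event["user_id"] = event.pop("usr")
    # Discard fields you never want loaded
    event.pop("internal_debug", None)
    # Enrich: derive a new column before the event is loaded
    if "amount_cents" in event:
        event["amount_usd"] = event["amount_cents"] / 100.0
    return event
```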

Automatically load from a periodically dumped file (e.g. CSV, JSON lines)

Finally, if your data source is not natively supported by Alooma and does not have a way to push or pull JSON events using a REST endpoint, a file-based approach could be an option.

Alooma automatically detects any new files in a file store such as Amazon S3, Google Cloud Storage, an FTP server, or Box. It can detect fields and create a schema for you in your data warehouse completely automatically if your files have headers.
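Conceptually, header-based detection works the way Python’s own `csv.DictReader` does: the first row supplies the column names, and every subsequent row becomes an event keyed by those names. The sample data below is made up for illustration:

```python
import csv
import io

# Made-up sample file contents: the first row is the header.
sample = "user_id,plan,mrr\n42,pro,99\n43,basic,19\n"

def rows_to_events(csv_text):
    """Turn CSV text with a header row into a list of dict events."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)

events = rows_to_events(sample)
```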

Rover, an online network of pet sitters and dog walkers, is an example of a customer using CSV files to pull in search analytics data. TrendKite, a PR analytics customer, uses CSVs to pull in Google Drive data. Bankrate, mentioned earlier, also uses JSON lines to pull in data from S3 buckets.

No headers? No problem. You can define headers in the Code Engine before the events hit the Mapper, like this:

import csv
import io  # use the StringIO module instead on Python 2

# The field names, in the order they appear in the CSV file
headers = ["field_1", "field_2", "field_3"]

def transform(event):
  # The raw CSV line arrives as the event's message
  raw_line = event['message']
  metadata = event['_metadata']
  # Parse the line with the csv module so quoting is handled correctly
  reader = csv.reader(io.StringIO(raw_line), delimiter=',')
  fields = next(reader)
  event = {}
  # Pair the known headers with the parsed values
  event['data'] = dict(zip(headers, fields))
  # Track field counts so mismatched rows are easy to spot downstream
  event['num_fields'] = len(fields)
  event['expected_fields'] = len(headers)
  event['_metadata'] = metadata
  return event

Keep in mind that this method works both for periodic file dumps and for one-time historical data loads. Once you’ve loaded your historical data, you can continue to append new events with the Webhooks or REST API options.

Get Started

Now that you know you’re not limited to just our native input connectors, it’s exciting to think of all the possibilities for data ingestion. At Alooma, we want to empower customers with the flexibility and control to manage their data, just the way they need. So if you don’t see your source on our Integrations List, contact us to see if we can figure out how to leverage one of these three alternatives to get you started!
