Export Application Insights to Amazon Redshift with Alooma

by Itamar Weiss  
14 min read  • 16 Nov 2015

Visual Studio Application Insights is a monitoring and diagnostics solution for web applications in production. It helps you detect and diagnose application failures, availability and performance issues, and understand what users actually do with your app. It's designed for developers, to help you continuously improve the performance, reliability and usability of your application.

Application Insights works with apps on a wide variety of platforms: .NET or J2EE, hosted on-premises or in the cloud. The SDK for each platform includes a range of modules that monitor the app straight out of the box. In addition, you can code your own telemetry for more detailed and tailored analytics. Telemetry data collected from your application is stored and analyzed in the Azure Portal, enabling intuitive views and powerful tools for fast diagnosis and analysis.

More value from your telemetry data

Application Insights collects a wealth of data about your application and enables you to analyze it. However, in many cases, you want to do more to realize value captured in this data. For example:

  • Run custom queries and build custom dashboards to answer questions that are specific to your business.
  • Use the business intelligence and visualization tools of your choice.
  • Build and train machine learning models.
  • Join your telemetry data with other data sources, such as billing data, marketing data and transactional databases to get a holistic view of your business.

These scenarios (and more) can be enabled by moving your application telemetry into your own data warehouse. Application Insights enables exporting the data it collects, but you still need a pipeline to process, aggregate and load it. Alooma enables just that.

About Alooma

Alooma lets you build data pipelines in minutes.

Alooma's SaaS service allows you to create scalable, fault-tolerant real-time connections from every data source you have today or will have in the future, directly into data warehouses such as Amazon Redshift. We believe you should be able to leverage your data without worrying about how to manage the data.

With Alooma, collect and transmit data from any source, in any format, at any scale into Amazon Redshift and other data warehouses:

  • Easily integrate data sources - Provide credentials for the data source and your data will start flowing in seconds. Use one of our dozens of built-in connectors, or use our APIs to build your own. New connectors are added on a regular basis to make sure all of the popular data sources are covered.
  • Transform your data on the fly - In-stream transformations allow you to filter, clean, fix, and enrich your data so that it lands in your Redshift ready for analysis, all in real-time.
  • Control how your data is stored - Let Alooma automap your structured and semi-structured data to Redshift tables for you, or use our data exploration tools to guide you to the optimal mapping.
  • Intelligently handle changing schemas - Get notified when a new field appears in your data, or when an existing field appears in a different format. Events from the new schema will be kept for you until you decide how you would like them to load to Redshift - no data gets lost.
  • Forget about infrastructure - Our fault-tolerant distributed architecture ensures you will never lose any data or have duplicates, even in cases of failure.
  • Transparent monitoring - Monitor the performance of your pipeline with our built-in dashboards or build your own over our metrics API.

In this article:

This article describes how to move your telemetry from Visual Studio Application Insights into the Amazon Redshift data warehouse by using Continuous Export and Alooma.

Continuous export moves your telemetry into Azure Storage in JSON format. We’ll use Alooma to continuously read from Azure Storage, clean the JSON objects using in-stream Python code, and load them to Redshift in real-time.

Continuous Export is the way to do your own analysis of the telemetry your apps send to Application Insights. You could adapt this example to do other things with the exported telemetry, such as joining the Application Insights data with data from other sources for analysis.

We’ll start with the assumption that you already have the app you want to monitor, and the Redshift cluster that you would like to use.

In this example, we will be using the page view data, but the same pattern can easily be extended to other data types such as custom events and exceptions.

Set up continuous export to Azure Storage

Microsoft's documentation walks through setting up continuous export of Application Insights data. In a nutshell, you need to:

  1. Create a "classic" storage account in your subscription
  2. Create a storage container to hold exported telemetry
  3. Configure Continuous Export in Application Insights to export specific telemetry types to the Azure Storage container
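
Once export is enabled, telemetry blobs should start appearing in the container within a few minutes. A quick way to verify this is to list the blobs yourself; here is a minimal sketch using the azure-storage-blob Python package (the connection string and container name are placeholders):

from azure.storage.blob import ContainerClient

# Placeholders: use your storage account's connection string and the
# container you created for the export
container = ContainerClient.from_connection_string(
    '<storage-account-connection-string>',
    container_name='telemetry-export')

# Exported telemetry is organized by type, e.g. <app_directory>/PageViews/
for blob in container.list_blobs():
    print(blob.name)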

Create a Redshift cluster

If you don’t already have a Redshift cluster, you can find instructions in the Amazon Redshift Getting Started Guide.
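
If you prefer to script it, a cluster can also be created programmatically. Here is a minimal sketch using boto3 (the names, node type, and region are illustrative placeholders):

import boto3

# All identifiers below are placeholders for illustration
redshift = boto3.client('redshift', region_name='us-east-1')
redshift.create_cluster(
    ClusterIdentifier='app-insights-dw',
    ClusterType='multi-node',
    NodeType='dc1.large',
    NumberOfNodes=2,
    DBName='analytics',
    MasterUsername='admin',
    MasterUserPassword='<choose-a-strong-password>',
)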

Connect Alooma to your Redshift

When you first log in to your Alooma system, you will receive a message asking you to connect to your Redshift. Enter the configuration you defined when creating your Redshift cluster and click ‘Finish’.
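
If you want to double-check those connection details first, a quick connectivity test from Python with psycopg2 might look like this (the endpoint and credentials are placeholders):

import psycopg2

# Placeholders: use your cluster's endpoint, database, and credentials
conn = psycopg2.connect(
    host='app-insights-dw.abc123xyz.us-east-1.redshift.amazonaws.com',
    port=5439,
    dbname='analytics',
    user='admin',
    password='<your-password>',
)
with conn.cursor() as cur:
    cur.execute('SELECT 1')
    print(cur.fetchone())  # (1,) means the connection works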


Add a new input on Alooma

On the Plumbing screen, click on “Add new input”.

Choose the Azure input.

Name your input (e.g. App_Insights).

Provide your Azure Storage credentials.

If your data is found in a specific directory inside the container or has a specific prefix, you can specify the directory and/or prefix in the File Prefix text field.

You can also choose between importing only new files and importing all files. When importing new files, only files added to the container after the input was created will be streamed. When importing all files, all the files in the container will be imported, and new files will be streamed as they arrive.

In this example, we will only import the page view data from Application Insights, therefore we will choose the <app_directory>/PageViews/ folder as the file prefix.


After saving your new input you will see a new box on your Plumbing screen and your data will start streaming into the system.

Note: If you have many files in your container it may take a few minutes before the system starts streaming your data.

Update the transform code

Each Alooma data pipeline has a component called “Transform” that allows you to write Python code that runs on every event in your event stream, at scale. Using the transform code, you can filter, clean, and enrich the events in your data stream.

To map the Application Insights events to Redshift tables effectively, we first need to simplify their structure. The original Application Insights events have fields that contain arrays of length 1. We will write simple Python code that flattens these arrays into plain objects.
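
For illustration, a heavily trimmed page view event might look something like this (real exported events carry many more fields, and the exact layout can vary):

{
  "view": [
    { "name": "Home Page", "url": "http://example.com/", "count": 1 }
  ],
  "context": {
    "data": { "eventTime": "2015-11-16T10:00:00.000Z" },
    "user": { "anonId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" },
    "custom": {},
    "application": {}
  }
}

Note that "view" is an array holding a single object; this is the array the transform below flattens.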

Click on the “Transform” icon on the left sidebar, paste the code below, and click on Deploy.

def transform(event):
    # Flatten the single-element 'view' array into a plain object
    event['view'] = event['view'][0]
    # Drop the 'custom' and 'application' context fields for simplicity;
    # pop() avoids a KeyError when a field is absent from an event
    event['context'].pop('custom', None)
    event['context'].pop('application', None)
    return event

The transform function runs on every event in the stream; in this case, it flattens an array of length 1 and removes both the custom and application fields for simplicity. Avoiding the arrays allows us to map the event fields to Redshift columns more easily in the next step. If you are using custom attributes or need your ‘application’ data (e.g. version), you should flatten the custom and application fields as well rather than removing them.


You can test how the code runs on your own events by clicking on ‘Get sample’ and then ‘Run Test’.
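
You can also sanity-check the logic locally; here is a minimal sketch using a hypothetical sample event:

# Hypothetical sample event with the same shape as the trimmed example above
sample = {
    'view': [{'name': 'Home Page', 'url': 'http://example.com/'}],
    'context': {
        'data': {'eventTime': '2015-11-16T10:00:00.000Z'},
        'user': {'anonId': 'a1b2c3d4'},
        'custom': {},
        'application': {},
    },
}

result = transform(sample)
assert result['view']['name'] == 'Home Page'  # array flattened to an object
assert 'custom' not in result['context']      # helper fields removed
assert 'application' not in result['context']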

Map

In this step we will create a mapping from the page view events to a Redshift table. We will map each field in the event to a Redshift column.

Click on the ‘Mapper’ icon on the left sidebar to go to the mapping screen.


Map fields to Redshift columns

Click on the event type ‘App_Insights’ and then on “Auto mapping”. Auto mapping automatically chooses Redshift-compatible column names for each field and uses the field statistics to determine the optimal column type.

All the event fields should now appear in green (mapped).


Create a new Redshift table

To finalize your mapping, you need to connect your mapping to an actual table. Click on ‘Select table’ and then on ‘Create a table’. The ‘Create new table’ dialog will open. Note that when creating a new table, it is important to carefully choose the distribution key and sort key. You can find more details in Amazon’s Redshift best practices documentation.

Provide a name for your table (e.g. app_insights_pageviews). A suggested distribution key is ‘context_user_anon_id’ and a suggested sort key is ‘context_data_event_time’.
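
For reference, the table Alooma creates is roughly equivalent to DDL like the following (the column list is illustrative; the actual columns come from your mapping):

# Illustrative only: Alooma derives the real column list from your mapping
ddl = """
CREATE TABLE app_insights_pageviews (
    view_name                VARCHAR(512),
    view_url                 VARCHAR(2048),
    context_user_anon_id     VARCHAR(64),
    context_data_event_time  TIMESTAMP
)
DISTKEY (context_user_anon_id)
SORTKEY (context_data_event_time);
"""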

Click on ‘Create table’ to create the table in your Redshift cluster. Then, click on ‘Apply’ to apply the mapping we created in the previous step.

Your pipeline is now ready. All new page view events will automatically load to your Redshift.


Monitor your pipeline

Click on the ‘Dashboard’ sidebar icon to go to Alooma’s dashboard. Here you can monitor the real-time health of your pipeline, by watching its throughput and latency, as well as its historical health and performance, by watching the events chart.


Restream

The “restream queue” contains events that either did not have a mapping or encountered an error in the Python transform.

Go to the Plumbing screen by clicking on the ‘Plumbing’ icon on the left sidebar. In the upper right corner you will see that you have events waiting to restream. In this example, the events in the restream queue are events that arrived in the system before your mapping was applied. Now that a mapping exists, they can be re-sent through the pipeline. Click on ‘Start Restream’ and the events will flow through the system and load into Redshift.


Analyze

Now that your data is flowing to your Redshift, you can connect the business intelligence tool of your choice to analyze your data and build real-time dashboards.

Here are a few simple examples of dashboards we built on top of the page-view data.
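
For instance, the ‘events per hour’ chart below boils down to a query like this one, shown here from Python with psycopg2 (connection details are placeholders, as before):

import psycopg2

# Placeholders: reuse the endpoint and credentials from the earlier check
conn = psycopg2.connect(host='<cluster-endpoint>', port=5439,
                        dbname='analytics', user='admin',
                        password='<your-password>')
with conn.cursor() as cur:
    # Count page views per hour using the sort key column
    cur.execute("""
        SELECT date_trunc('hour', context_data_event_time) AS hour,
               COUNT(*) AS page_views
        FROM app_insights_pageviews
        GROUP BY 1
        ORDER BY 1
    """)
    for hour, page_views in cur.fetchall():
        print(hour, page_views)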

Example dashboards: events per hour, visitors per country, and new vs. returning visitors.

That's it! Your Microsoft Application Insights data is now streaming to your AWS Redshift cluster. Plumb away!
