How to Load MongoDB Data Into Snowflake

by Garrett Alley  
6 min read  • 28 Sep 2018

Getting your MongoDB data into your Snowflake data warehouse is the first step in setting up a powerful analytical workflow and getting valuable insights from your data. Typically, having that data together with data from your various other sources in Snowflake delivers a compounding effect. The more data you have, the better your analysis, the better your insights.

The problem

Let's say you want to take advantage of the power and scalability of the cloud to improve the performance of your MongoDB queries. At first, moving data from MongoDB into Snowflake may sound straightforward: use mongoexport to push your data out into a .json file, then load that file into Snowflake. But what if you have huge MongoDB collections with gigabytes of data? You'll need a secure and robust way to pipe that data into Snowflake, and that's likely to require a lot of time and specialized resources.
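
If you go the DIY route, the one-off version of that export-and-load step might look something like the sketch below. It uses mongoexport to dump a collection to a newline-delimited JSON file and the Snowflake Python connector to stage and load it; the database, collection, table, and credential names are hypothetical placeholders, not anything prescribed by MongoDB or Snowflake.

```python
# Minimal DIY sketch: export one MongoDB collection and load it into Snowflake.
# All names and credentials below are illustrative placeholders.
import subprocess
import snowflake.connector  # pip install snowflake-connector-python

# 1. Dump the collection to newline-delimited JSON with mongoexport.
subprocess.run(
    [
        "mongoexport",
        "--uri=mongodb://localhost:27017/shop",  # hypothetical database
        "--collection=orders",                   # hypothetical collection
        "--out=/tmp/orders.json",
    ],
    check=True,
)

# 2. Stage the file and copy it into a VARIANT column in Snowflake.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="LOAD_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS raw_orders (doc VARIANT)")
cur.execute("PUT file:///tmp/orders.json @%raw_orders OVERWRITE = TRUE")
cur.execute("COPY INTO raw_orders FROM @%raw_orders FILE_FORMAT = (TYPE = 'JSON')")
conn.close()
```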

And usually around this point you realize that you'll want a way to repeat this task, perhaps a script that calls the export and runs on a cron schedule. But that job keeps getting bigger and more complicated. Whether you want to perform the same migration into Snowflake periodically, or you want to add different collections (or even different data sources besides MongoDB), you'll need someone to build a method for scheduling, tracking, and logging the process. And it will need to be scalable. Oh, and you'll need to make sure you have a way of catching and handling any errors that occur along the way.
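
As a rough sketch of that scheduled wrapper, you might end up with something like the script below, invoked from cron, with basic logging and error handling. The run_export_and_load function is a hypothetical stand-in for the export-and-load steps sketched earlier, and the paths and schedule are assumptions.

```python
#!/usr/bin/env python3
# Hypothetical scheduled wrapper for the export-and-load steps. A crontab
# entry such as the following would run it nightly at 2 AM:
#   0 2 * * * /usr/bin/python3 /opt/etl/mongo_to_snowflake_job.py
import logging
import sys

from mongo_to_snowflake import run_export_and_load  # hypothetical module with the steps above

logging.basicConfig(
    filename="/var/log/mongo_to_snowflake.log",  # assumed log location
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def main() -> int:
    try:
        logging.info("Starting MongoDB -> Snowflake export")
        run_export_and_load()
        logging.info("Export finished successfully")
        return 0
    except Exception:
        # Any failure (network, auth, malformed documents) lands here; a real
        # pipeline would also alert someone and decide whether to retry.
        logging.exception("Export failed")
        return 1

if __name__ == "__main__":
    sys.exit(main())
```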

You may also notice an opportunity to scrub PII (personally identifiable information) or enrich the data with things like geolocation data or currency conversion before the data is uploaded.
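
As a hedged sketch of what that pre-load scrubbing could look like in a DIY pipeline, a small function applied to each exported document might mask or drop sensitive fields. The email and ssn field names here are hypothetical examples, not fields your collections necessarily have.

```python
import hashlib

def scrub_pii(doc: dict) -> dict:
    """Mask or drop PII fields before a document is loaded into the warehouse."""
    if "email" in doc:
        # Replace the address with a stable, non-reversible token so joins still work.
        doc["email"] = hashlib.sha256(doc["email"].encode("utf-8")).hexdigest()
    doc.pop("ssn", None)  # drop fields that should never reach the warehouse
    return doc
```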

It's not an easy task, and unless you already have a seasoned team in place, you will need to train or hire someone who has the expertise to do the work. In reality, by the time you factor in security concerns, headcount, training, and technical complexity, you realize that you are, in essence, building your own ETL platform, just to extract your MongoDB data.

The solution: Alooma

We recommend that you don't build a custom ETL tool and take on all of the technical challenges and resource costs. The better solution is to use a modern ETL platform designed to move data from MongoDB (and other sources) into Snowflake and make strategic transformations along the way.

Alooma is the enterprise data platform built for the cloud. With built-in support for MongoDB and Snowflake, and bolstered by enterprise security and scalability, it's the ideal solution.

Importing your MongoDB data into Snowflake

Getting your MongoDB data into Snowflake is incredibly simple with Alooma. Let's break down the process.

Before you can create your MongoDB input in Alooma, you'll need to make sure that MongoDB is writing to the oplog (which requires running it as a replica set) and that Alooma has read access to it. See this article for more information: https://support.alooma.com/hc/en-us/articles/360000714652-MongoDB-Setup
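
As a quick sanity check that the account you plan to hand to Alooma can actually read the oplog, a pymongo snippet along these lines (the connection string is a placeholder) should return the most recent operation:

```python
from pymongo import MongoClient

# Placeholder connection string; use the host and credentials you intend to
# give Alooma. The user needs read access to the 'local' database.
client = MongoClient("mongodb://alooma_user:password@mongo-host:27017/?replicaSet=rs0")

oplog = client.local["oplog.rs"]
latest = oplog.find().sort("$natural", -1).limit(1).next()
print("Latest oplog entry timestamp:", latest["ts"])
```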

Once that's done, you can create your input in Alooma.

On the Plumbing page, click "Add new input" and select MongoDB from the list of integrations.

[Screenshot: the Alooma Plumbing screen, where you add a new MongoDB input]

Name your input, and then enter your connection information:

[Screenshot: the MongoDB input configuration screen in Alooma]

That's all there is to it. If you have more MongoDB databases in your cluster, create an input for each database you want to import. Once you save your input, assuming your credentials are correct, your MongoDB data will automatically begin importing into Snowflake. After the initial snapshot is loaded into Snowflake, Alooma uses Change Data Capture (CDC) to replicate the data from your MongoDB cluster into your target data warehouse by tailing the MongoDB oplog. See our MongoDB documentation for more information.

[Diagram: data flowing from MongoDB through Alooma into Snowflake]
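
Alooma handles the CDC for you, but to illustrate what "tailing the oplog" means in practice, here is a hedged sketch of the general technique using a pymongo tailable cursor. This is not Alooma's code; the connection string is a placeholder, and a production pipeline would persist the last-seen timestamp so it can resume after restarts.

```python
import time
from pymongo import CursorType, MongoClient

client = MongoClient("mongodb://mongo-host:27017/?replicaSet=rs0")  # placeholder
oplog = client.local["oplog.rs"]

# Start from the newest entry; a real pipeline would checkpoint this timestamp.
last_ts = oplog.find().sort("$natural", -1).limit(1).next()["ts"]

while True:
    # A tailable cursor stays open and yields new operations as they are written.
    cursor = oplog.find({"ts": {"$gt": last_ts}}, cursor_type=CursorType.TAILABLE_AWAIT)
    for op in cursor:
        last_ts = op["ts"]
        # op["op"] is the operation type ('i' insert, 'u' update, 'd' delete, ...),
        # op["ns"] the namespace, and op["o"] the document or change description.
        print(op["ns"], op["op"])
    time.sleep(1)  # the cursor was invalidated; pause briefly and re-create it
```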

Of course, there's a lot more you could do along the way:

  • You could use the Code Engine to transform, enrich, or cleanse data as it flows from MongoDB to Snowflake (see the sketch after this list).
  • You could change how the schema is mapped, via the Mapper; however, most of the time Alooma's powerful auto-mapping works just fine.
  • You could click on the Live tab for your MongoDB input and monitor the data flow. Or click the Samples tab to see examples of the actual data being loaded.
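
For instance, a Code Engine transformation is essentially a Python function applied to each event as it streams by. The sketch below is a hypothetical example of enriching MongoDB events with a currency conversion; the field names, rate table, and exact function signature are illustrative assumptions rather than anything copied from Alooma's documentation.

```python
# Hypothetical Code Engine-style transformation: add a USD amount to each
# event before it lands in Snowflake. Field names and rates are assumptions.
USD_RATES = {"EUR": 1.16, "GBP": 1.30}  # in practice, keep these up to date

def transform(event):
    amount = event.get("amount")
    currency = event.get("currency")
    if amount is not None and currency in USD_RATES:
        event["amount_usd"] = round(amount * USD_RATES[currency], 2)
    return event  # returning the event keeps it flowing toward Snowflake
```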

What's next?

Put Your Data to Work: Now that you have your MongoDB data in Snowflake, you can take advantage of the scaling and processing power of having your data in the cloud, boosting your query performance so you can get more out of your data. And this is just the beginning.
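
For example, if the documents landed in a VARIANT column as in the earlier DIY sketch, you can aggregate over nested fields directly with Snowflake's colon-and-dot notation. The table, field, and connection details below are the same hypothetical placeholders used in that sketch.

```python
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="LOAD_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()

# Aggregate straight over the semi-structured documents loaded from MongoDB.
cur.execute("""
    SELECT doc:customer.country::string   AS country,
           COUNT(*)                       AS orders,
           SUM(doc:amount::number(12, 2)) AS total_amount
    FROM raw_orders
    GROUP BY 1
    ORDER BY orders DESC
""")
for country, orders, total_amount in cur:
    print(country, orders, total_amount)
conn.close()
```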

Bust Data Silos: Don't just work with data from MongoDB. Perform an information census and look for data silos within your company. Integrating multiple data sources into Snowflake is straightforward and simple, and each new source — whether it's a stream, a database, a file, etc. — potentially increases the usefulness and impact of your analysis.

Automate the Process: Using an enterprise data platform means you can automate data extraction and transformation from multiple sources without having to build out your infrastructure.

Benefits

Enterprise scalability and performance: The Alooma platform provides horizontal scalability, handling as many events from as many data sources as you need.

Security at the core: The Alooma platform is built around a robust and flexible security architecture, providing full visibility and control over your data. Alooma is SOC 2 Type II, HIPAA, EU-US Privacy Shield, and GDPR compliant, does not store any data permanently, and encrypts all data in motion.

Guaranteed data integrity and reliability: The Restream Queue, Alooma's intelligent data integrity engine, is your safety net against data loss. The Restream Queue collects all the events that were not loaded into Snowflake, for whatever reason, making them easy to fix and enabling you to "restream" them into Snowflake later.

Flexible data enrichment: The Code Engine, a stateful, Python-based processing engine, enables on-the-fly data enrichment for sophisticated use cases, such as real-time alerts, sessionization, anomaly detection, and more. Customize your data exactly how you want by writing real code to transform data on the stream.

Simple yet powerful data management: The Mapper automatically infers schemas, maps schema changes, or enables customization of mappings to your liking, ensuring you meet all your data governance requirements.

Cost effective: You won't need to hire or train staff to build the process, saving time and money. You won't need to buy more machines or processing power as your data grows, and adding new data sources to import into Snowflake is a breeze.

Get your MongoDB data into Snowflake today

Ultimately, you want the process of getting insights from your data, regardless of the source or structure, to be as simple as possible. The fewer the steps, the lower the cost, the better. And if you can add data from other sources without custom coding or processes, you're even further ahead of the game. Taking advantage of the power and scalability of the cloud to store and process that data is the natural next step.

Alooma was designed and built for the cloud. We enable businesses to use all of their data to make better data-driven decisions, providing Data Scientists and Data Engineers the ability to integrate, cleanse, enrich, and bring together batch or streaming data from various data silos at any time to any destination.

Alooma makes the whole process of getting your MongoDB data into Snowflake simple and affordable.

Ready to get started? Alooma is here to help. Contact Alooma today to learn more about how a MongoDB and Snowflake integration solution can benefit your business.

About MongoDB

MongoDB is a cross-platform, highly scalable, document-oriented NoSQL database used by thousands of organizations, from startups to the Fortune 100.

About Snowflake

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.
