Getting your MongoDB data into your Redshift data warehouse is the first step in setting up a powerful analytical workflow and getting valuable insights from your data. Typically, having that data together with data from your various other sources in Redshift delivers a compounding effect. The more data you have, the better your analysis, the better your insights.
Let's say you want to take advantage of the power and scalability of the cloud to improve the performance of your MongoDB queries. At first, the idea of moving data from MongoDB into Redshift may sound straightforward. You could use mongoexport to push your data out into a .json file, then load that into Redshift. But what if you have huge MongoDB collections with gigabytes of data? You'll need a secure and robust way to pipe that data into Redshift. That's likely to require a lot of time and specialized resources.
And usually around this point you realize that you might want to set up a way to repeat this task, so maybe you put together a script to call the export and schedule it via cron. But that job is getting bigger and more complicated. Whether you want to perform the same migration into Redshift periodically, or you want to add different tables (or even different data sources besides MongoDB), you'll need someone to build a method for scheduling, tracking, and logging the process. And it will need to be scalable. Oh, and you'll need to make sure you have a way of catching and handling any errors that occur along the way.
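The do-it-yourself version of that scheduled job might look something like the sketch below. Every database, bucket, role, and path name here is hypothetical:

```shell
#!/bin/sh
# nightly_export.sh -- a naive nightly MongoDB-to-Redshift load.
# Scheduled from cron with something like:
#   0 2 * * * /opt/etl/nightly_export.sh

# 1. Dump the collection as newline-delimited JSON.
#    Note: mongoexport emits Extended JSON ($oid, $date, ...), which
#    usually needs cleanup before Redshift will accept it.
mongoexport --db mydb --collection users --out /tmp/users.json

# 2. Stage the file in S3, where Redshift's COPY command can reach it.
aws s3 cp /tmp/users.json s3://my-staging-bucket/users.json

# 3. Load it into Redshift.
psql "$REDSHIFT_URL" -c "COPY users FROM 's3://my-staging-bucket/users.json' \
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role' \
    FORMAT AS JSON 'auto';"
```

Each of those steps can fail independently, which is exactly where the scheduling, logging, and error-handling work starts to pile up.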
You may also notice an opportunity to scrub PII (personally identifiable information) or enrich the data with things like geolocation data or currency conversion before the data is uploaded.
It's not an easy task, and unless you already have a seasoned team in place, you will need to train or hire someone who has the expertise to do the work. In reality, by the time you factor in security concerns, headcount, training, and technical complexity, you realize that you are, in essence, building your own ETL platform, just to extract your MongoDB data.
The solution: Alooma
We recommend that you don't build a custom ETL tool and take on all of the technical challenges and resource costs. The better solution is to use a modern ETL platform designed to move data from MongoDB (and other sources) into Redshift and make strategic transformations along the way.
Alooma is the enterprise data platform built for the cloud. With built-in support for MongoDB and Redshift, and bolstered by enterprise security and scalability, it's the ideal solution.
Importing your MongoDB data into Redshift
Getting your MongoDB data into Redshift is incredibly simple with Alooma. Let's break down the process.
Before you can create your MongoDB input in Alooma, you'll need to configure MongoDB so that it writes to its oplog, and grant Alooma access to it. See this article for more information: https://support.alooma.com/hc/en-us/articles/360000714652-MongoDB-Setup
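The linked article is the authoritative guide; the underlying MongoDB prerequisite is that the oplog only exists on replica set members, so a standalone mongod has to be running as a (possibly single-node) replica set. In outline:

```shell
# In mongod.conf, enable replication so the oplog (local.oplog.rs) exists:
#
#   replication:
#     replSetName: rs0
#
# Restart mongod, then initialize the replica set once from the mongo shell:
#
#   rs.initiate()
#
# Finally, give Alooma a user with read access to the oplog; the exact
# roles and connection details are covered in the setup article above.
```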
Once that's done, you can create your input in Alooma.
On the Plumbing page, click "Add new input" and select MongoDB from the list of integrations.
Name your input, and then enter your connection information.
That's all there is to it. If you have multiple MongoDB databases in your cluster, create an input for each database you want to import. Once you save your input, assuming your credentials are correct, your MongoDB data will automatically begin importing into Redshift. After the initial snapshot is loaded into Redshift, Alooma uses Change Data Capture (CDC) to replicate the data from your MongoDB cluster into your target data warehouse by tailing the MongoDB oplog. See our MongoDB documentation for more information.
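Conceptually, oplog-based CDC means reading an ordered log of every insert, update, and delete, and replaying each entry against the warehouse. Here is a toy sketch of that routing step; the `op`, `ns`, `o`, and `o2` fields are real oplog entry fields, but the logic is illustrative, not Alooma's actual implementation:

```python
def route_oplog_entry(entry):
    """Map one MongoDB oplog entry to an (action, table, payload) triple.

    Illustrative only: a real CDC pipeline also tracks its resume position,
    handles schema drift, and batches loads into the warehouse.
    """
    table = entry["ns"].split(".", 1)[1]   # "mydb.users" -> "users"
    op = entry["op"]                       # 'i' = insert, 'u' = update, 'd' = delete
    if op == "i":
        return ("insert", table, entry["o"])
    if op == "u":
        # "o2" identifies the document (usually by _id); "o" holds the change.
        return ("update", table, {"match": entry.get("o2"), "changes": entry["o"]})
    if op == "d":
        return ("delete", table, entry["o"])
    return ("skip", table, None)           # e.g. 'n' no-op entries
```

Because Alooma keeps a cursor on this log, changes in MongoDB flow into Redshift continuously, instead of via repeated full exports.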
Of course, there's a lot more you could do along the way:
- You could use the Code Engine to transform/enrich/cleanse data as it flows from MongoDB to Redshift.
- You could change how the schema is mapped, via the Mapper; however, most of the time Alooma's powerful auto-mapping works just fine.
- You could click on the Live tab for your MongoDB input and monitor the data flow. Or click the Samples tab to see examples of the actual data being loaded.
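For example, a Code Engine transform that scrubs PII and tags each event might look like this. This is a minimal sketch: the `transform(event)` entry point follows the Code Engine's convention, but the field names are hypothetical:

```python
import hashlib

def transform(event):
    # Scrub PII: replace the raw email with a stable one-way hash,
    # so analysts can still join on it without seeing the address.
    if "email" in event:
        event["email_hash"] = hashlib.sha256(event["email"].encode("utf-8")).hexdigest()
        del event["email"]
    # Enrich: tag the event with the pipeline it came through.
    event["source"] = "mongodb"
    return event   # returning None would drop the event from the stream
```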
Put Your Data to Work: Now that you have your MongoDB data in Redshift, you can take advantage of the scale and processing power of the cloud, boosting your query performance so you can get more out of your data. And this is just the beginning.
Bust Data Silos: Don't just work with data from MongoDB. Perform an information census and look for data silos within your company. Integrating multiple data sources into Redshift is straightforward and simple, and each new source — whether it's a stream, a database, a file, etc. — potentially increases the usefulness and impact of your analysis.
Automate the Process: Using an enterprise data platform means you can automate data extraction and transformation from multiple sources without having to build out your infrastructure.
Enterprise scalability and performance: The Alooma platform provides horizontal scalability, handling as many events from as many data sources as you need.
Security at the core: The Alooma platform is built around a robust and flexible security architecture, providing full visibility and control over your data. Alooma is SOC 2 Type II, HIPAA, EU-US Privacy Shield, and GDPR compliant; it does not store any data permanently and encrypts all data in motion.
Guaranteed data integrity and reliability: The Restream Queue, Alooma's intelligent data integrity engine, is your safety net against data loss. It collects any events that were not loaded into Redshift, for whatever reason, making them easy to fix and enabling you to "restream" them into Redshift later.
Flexible data enrichment: The Code Engine, a stateful, Python-based processing engine, enables on-the-fly data enrichment for sophisticated use cases, such as real-time alerts, sessionization, anomaly detection, and more. Customize your data exactly how you want by writing real code that transforms data on the stream.
Simple yet powerful data management: The Mapper automatically infers schemas, maps schema changes, or enables customization of mappings to your liking, ensuring you meet all your data governance requirements.
Cost effective: You won't need to hire or train staff to build the process, saving time and money. You won't need to buy more machines or processing power as your data grows, and adding new data sources to import into Redshift is a breeze.
Ultimately, you want the process of getting insights from your data, regardless of the source or structure, to be as simple as possible. The fewer steps, the lower the cost, the better. And if you can scale up to get data from other sources thrown in without requiring custom coding or processes, you're even further ahead of the game. Taking advantage of the power and scalability of the cloud to store and process that data is the natural next step.
Alooma was designed and built for the cloud. We enable businesses to use all of their data to make better data-driven decisions, providing Data Scientists and Data Engineers the ability to integrate, cleanse, enrich, and bring together batch or streaming data from various data silos at any time to any destination.
Alooma makes the whole process of getting your MongoDB data into Redshift simple and affordable.
Ready to get started? Alooma is here to help. Contact Alooma today to learn more about how a MongoDB and Redshift integration solution can benefit your business.
MongoDB is a cross-platform, highly scalable, document-oriented NoSQL database used by thousands of organizations, from startups to the Fortune 100.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using SQL and your existing BI toolset.