Companies today have access to more data than ever before. And that data is growing, in both volume and variety, at a breakneck pace. Consolidating all that information, from all those sources, into a cloud repository is crucial to the business, whether the goal is traditional analytics or cutting-edge machine learning and artificial intelligence.
Migrating your data to the cloud is like moving to a new house. You need to know why you're leaving, what you're taking with you, and where you're going before you can plan the move.
What do you hope to get from the new place? Are you outgrowing your current setup? Looking to better fuel your data analysis or machine learning stack?
Are you looking to take advantage of all the cloud has to offer? Leveraging its architectural benefits? Its scalability and elasticity? Its security? Cost savings?
What are you bringing? Everything? Do a census: you may be surprised at just how much data — from so many sources — you have.
Of course, a move that complicated isn't something you start on a whim. It takes significant planning and the right set of tools to be successful. Before we build our migration checklist, let's go through some common considerations.
In a world where data privacy is becoming the focal point of good data management practices, it’s important to fully understand which data is shareable and which isn’t. Making sure that your PII (personally identifiable information) is not exposed to the wrong parties is crucial. Therefore, review the entire pipeline, along with your database's capabilities, for securing data in motion and at rest. Knowing how data is staged, which systems can access it, and how long it's retained is critical to fully understanding your exposure.
Additionally, just as with traditional on-prem solutions, consider how your applications secure objects and data. For instance, does the platform provide role-based access control (RBAC), and can data be secured at the appropriate granularity (row or column level)? Are deidentification and masking of data prerequisites for the platform, and is delegated user administration a key factor for your operation?
In the end, knowing which data objects in your inventory of assets require special handling for regulatory or contractual purposes is a must. Ensuring your platform and applications can support those requirements is critical when starting your migration journey and planning process.
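As one illustration of deidentification and masking, here is a minimal Python sketch that masks PII columns before records are staged for migration. The column names, the secret key, the use of a keyed hash (HMAC-SHA-256) for pseudonymization, and the email-masking rule are all assumptions for illustration, not features of any particular platform. In practice the key would come from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical set of columns flagged as PII during the security review.
PII_COLUMNS = {"email", "ssn"}
# Placeholder key (assumption); a real deployment would load it from a secrets manager.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed-hash token (HMAC-SHA-256).

    The same input always yields the same token, so joins on the column
    still work after masking, but the raw value is never staged.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(value: str) -> str:
    """Hide the mailbox name but keep the domain for aggregate analysis."""
    _, _, domain = value.partition("@")
    return "***@" + domain if domain else "***"


def deidentify(record: dict) -> dict:
    """Return a copy of the record with its PII columns masked or tokenized."""
    out = dict(record)
    for column in PII_COLUMNS.intersection(out):
        if column == "email":
            out[column] = mask_email(out[column])
        else:
            out[column] = pseudonymize(out[column])
    return out


row = {"id": 7, "email": "ana@example.com", "ssn": "123-45-6789", "plan": "pro"}
print(deidentify(row))
```

A keyed hash is used instead of a plain hash so that low-entropy values such as SSNs cannot simply be recovered by hashing every possible input. Whether tokenization, masking, or both is appropriate depends on your regulatory and contractual requirements.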
Having your data available, and as current as possible, is vital when migrating to the cloud, because data analysis can only provide a business advantage when the data behind it is available and up to date. On-prem solutions have often been lovingly hand-crafted and engineered specifically to meet the business’s UX requirements and SLAs. When migrating to the cloud, regardless of your approach, it’s vital to fully vet and test the performance of your pipeline and database under a representative production workload. This is the only way to understand where performance bottlenecks can occur and how they can be resolved.
Establishing specific benchmarks and candidate workloads for continuous testing and monitoring is a best practice. You can do this with cloud monitoring services and canary queries: small queries with known answers that run on a schedule and verify that the platform returns correct results within expected response times.
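A canary query can be as simple as a known query, an expected answer, and a latency budget. The Python sketch below uses an in-memory SQLite database as a stand-in for the cloud warehouse. The table, queries, and thresholds are hypothetical. Against a real platform you would use its DB-API driver, catch that driver's Error class instead of sqlite3.Error, and publish each result to your cloud monitoring service.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class CanaryResult:
    name: str
    ok: bool
    elapsed_ms: float
    detail: str = ""


def run_canary(conn, name, sql, expected, max_ms):
    """Run one canary query and check both its answer and its latency budget."""
    start = time.perf_counter()
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:  # with another driver, catch its DB-API Error class
        elapsed = (time.perf_counter() - start) * 1000
        return CanaryResult(name, False, elapsed, f"error: {exc}")
    elapsed = (time.perf_counter() - start) * 1000
    if rows != expected:
        return CanaryResult(name, False, elapsed, f"unexpected result: {rows!r}")
    if elapsed > max_ms:
        return CanaryResult(name, False, elapsed, f"slow: {elapsed:.1f} ms > {max_ms} ms")
    return CanaryResult(name, True, elapsed)


# In-memory SQLite stands in for the cloud warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])

# (name, query, expected rows, latency budget in ms): all hypothetical.
checks = [
    ("orders_row_count", "SELECT COUNT(*) FROM orders", [(2,)], 500),
    ("missing_table", "SELECT COUNT(*) FROM order_archive", [(0,)], 500),
]
for check in checks:
    print(run_canary(conn, *check))
```

In production, a scheduler would run this loop periodically and turn each failed CanaryResult into an alert. Checking the answer as well as the latency catches silent data problems, such as a partially loaded table, that a pure uptime probe would miss.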
Managing current and future capacity is both an art and a science in the world of on-prem solutions. It goes hand in hand with maintaining a consistent level of performance for the consumers of your platform. One of the major draws of migrating to the cloud is realizing the benefits of elasticity and auto-scaling as your workload and capacity demands evolve.
When designing your future cloud architecture, it is vital to build the engineering required to respond to peak demand into the platform up front. Know what your peak needs will be, and make sure the cloud platform can handle them. It’s also critical to ensure you can continually test and monitor these peak demand scenarios.
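To make "know what your peak needs will be" concrete, here is a minimal Python sketch of one simple sizing rule. It assumes you have historical load samples from your current platform. The growth and headroom factors, and the split between a baseline near the 95th percentile and a burst ceiling that auto-scaling must reach, are illustrative assumptions rather than a prescribed method.

```python
import math
import statistics


def capacity_plan(samples, growth=0.2, headroom=0.3):
    """Size capacity from observed load samples (hypothetical planning rule).

    samples:  observed load per interval, e.g. concurrent queries per hour
    growth:   expected load growth over the planning horizon (0.2 = 20%)
    headroom: safety margin on top of the projected peak (0.3 = 30%)
    """
    peak = max(samples)
    # 95th percentile; "inclusive" keeps the estimate within the observed range.
    p95 = statistics.quantiles(samples, n=100, method="inclusive")[94]
    return {
        # Baseline: capacity that covers roughly 95% of intervals after growth.
        "baseline": math.ceil(p95 * (1 + growth)),
        # Burst ceiling: projected peak plus headroom; auto-scaling covers the gap.
        "burst": math.ceil(peak * (1 + growth) * (1 + headroom)),
    }


hourly_load = [40, 42, 38, 55, 60, 120, 65, 50]
print(capacity_plan(hourly_load))  # {'baseline': 121, 'burst': 188}
```

The gap between baseline and burst is the range your auto-scaling configuration has to cover, and the burst figure is a natural target for the peak-demand tests described above.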
Take advantage of the monitoring and reporting services native to cloud platforms. Orient your DevOps teams toward these technologies, and build this competency up front rather than as an afterthought to your migration.
Assets and inventory census
Before executing your cloud migration strategy, it’s imperative that you capture a full inventory of your data assets, their dependencies, and the upstream and downstream applications that feed and consume them. Creating and maintaining this inventory serves as the basis for all planned activities during the migration phase. Without it, you cannot fully understand the impact, risk, and cost of moving each component in your stack to the cloud.
Additionally, going through this inventory and assessment exercise often uncovers opportunities to refactor or retire redundant data sources and services. This saves you from wasting resources on items that have limited or no future value to your organization. You may also discover new data sources to migrate.
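An inventory with dependencies is essentially a graph, and even a small one can answer practical questions. The Python sketch below records hypothetical assets with their upstream dependencies. It derives a migration order in which every asset moves after the assets it reads from, and it flags non-report assets that nothing reads from as candidates to review for retirement. The Asset fields and the "nothing consumes it" heuristic are assumptions for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Asset:
    name: str
    kind: str  # e.g. "table", "view", "report"
    upstream: set = field(default_factory=set)  # assets this one reads from
    contains_pii: bool = False  # flags items needing special handling


def migration_order(inventory):
    """List assets so that each one comes after everything it reads from."""
    graph = {name: asset.upstream for name, asset in inventory.items()}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:  # a dependency nobody recorded means the census is incomplete
        raise ValueError(f"dependencies missing from inventory: {sorted(unknown)}")
    # Raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


def retirement_candidates(inventory):
    """Non-report assets that nothing reads from: review before migrating."""
    consumed = set().union(*(a.upstream for a in inventory.values()))
    return sorted(
        name for name, a in inventory.items()
        if name not in consumed and a.kind != "report"
    )


inventory = {a.name: a for a in [
    Asset("crm.customers", "table", contains_pii=True),
    Asset("erp.orders", "table"),
    Asset("legacy.orders_backup", "table"),
    Asset("sales_by_customer", "view", {"crm.customers", "erp.orders"}),
    Asset("weekly_sales_report", "report", {"sales_by_customer"}),
]}
print(migration_order(inventory))
print(retirement_candidates(inventory))  # ['legacy.orders_backup']
print(sorted(n for n, a in inventory.items() if a.contains_pii))  # ['crm.customers']
```

Refusing to produce an order when a dependency is missing is deliberate: an unrecorded upstream source is exactly the kind of gap the census is meant to surface before the move.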
The focus for this guide is on data migration to the cloud, but you'll likely need to migrate applications as well. You can choose from several different approaches for migrating your applications. Your own particular needs and goals will determine which is best for you.
Do you want to move most of your applications to the cloud but avoid making changes to them? This is typically a variation of the "lift and shift" approach, in which your existing applications run as-is in virtual machines hosted in the cloud, rather than being customized or replaced. It's faster and less disruptive, but it doesn't take full advantage of all the cloud has to offer.
Another approach is to fully migrate your applications to a cloud-native environment. This costs more in both money and time, since your applications may need to be customized or even replaced, but in the end you'll have a more robust system that can take full advantage of what the cloud offers.
In summary, know your assets and their value. Know your security requirements, and evaluate the exposure and risk against your planned target architecture. Ensure that you can deliver the same or better levels of service to your customers. Finally, take advantage of native cloud services for monitoring and capacity management by building the instrumentation you need into your migration strategy, so the new platform can take your business to the next level.
Cloud migration checklist
Moving requires tracking a lot of details, and it helps to have a list of the things you need to remember to do. Did you rent a truck? Did you change your mailing address? When you're migrating to the cloud, you'll want a checklist too.
- Establish team roles and define success factors. What does a successful migration look like for your situation, and who will be helping make that happen?
- Know what you’re moving, what the dependencies are, and what the impact is on your application consumers. This is where you'll be glad you performed that data census, and where you can use the migration to do some spring cleaning and reduce the cruft you carry over. This is also when you'll start to get a picture of the capacity you need now and what you might need in the future.
- Perform a security assessment. Ensure data is protected and meets compliance or regulatory standards.
- Establish benchmarks and standards for performance before flipping the switch.
- Determine in advance how your DevOps teams will support your applications in the cloud and what cloud-native instrumentation and tooling you will be using.
- Update your SLAs to reflect the new environment and configuration.
- Before you cut over, ensure that your applications take advantage of native cloud auto-scaling capabilities and that those capabilities have been tested.
Of course, this is just a jumping-off point, a place for you to get started. Your actual implementation may be considerably more complex, or even simpler. Now that you know the basics, you can consider adding a few more steps to the process.
Options along the way
While migrating the data is the main concern, you can also use the migration to perform any data transformation you need.
Because your data will be coming from disparate sources with different schemas, the migration is the ideal time to remap it to a common schema. You may also want to enrich or cleanse it to make further analysis easier.
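Here is a minimal Python sketch of remapping records from two sources onto one common schema, with a little cleansing and enrichment along the way. The source names, field names, and date formats (MM/DD/YYYY text in one system, Unix epoch seconds in the other) are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical per-source mappings: common field -> source field.
FIELD_MAPS = {
    "crm": {"customer_id": "CustID", "email": "EmailAddr", "signup_date": "Created"},
    "webshop": {"customer_id": "user_id", "email": "email", "signup_date": "signup_ts"},
}


def parse_date(source, value):
    """Normalize each source's date format to an ISO 8601 date string."""
    if value is None:
        return None
    if source == "crm":  # assumed: CRM exports dates as MM/DD/YYYY text
        return datetime.strptime(value, "%m/%d/%Y").date().isoformat()
    # assumed: the web shop stores Unix epoch seconds
    return datetime.fromtimestamp(value, tz=timezone.utc).date().isoformat()


def to_common(source, record):
    """Remap one source record onto the common schema, cleansing and enriching it."""
    row = {target: record.get(src) for target, src in FIELD_MAPS[source].items()}
    # Prefix IDs so keys from different systems cannot collide.
    row["customer_id"] = f"{source}:{row['customer_id']}"
    # Cleanse: trim whitespace and lowercase emails so duplicates match.
    row["email"] = row["email"].strip().lower() if row["email"] else None
    row["signup_date"] = parse_date(source, row["signup_date"])
    # Enrich: record lineage so analysts know where each row came from.
    row["source"] = source
    return row


print(to_common("crm", {"CustID": 1042, "EmailAddr": " Ana@Example.com ", "Created": "03/15/2019"}))
print(to_common("webshop", {"user_id": "u-77", "email": "bo@example.org", "signup_ts": 1552608000}))
```

Keeping the mappings as data rather than code makes it straightforward to add another source during the migration without touching the transformation logic.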
Most data pipelines offer some sort of transformation support, whether the transformation is performed before loading into the cloud (ETL: extract, transform, load) or after (ELT: extract, load, transform).
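The difference between ETL and ELT is easiest to see side by side. In this Python sketch, an in-memory SQLite database stands in for the cloud warehouse, and the raw events and table names are hypothetical. ETL cleanses rows in the pipeline before loading. ELT loads the raw rows unchanged and then transforms them with SQL inside the warehouse. Both end up with the same purchases table.

```python
import sqlite3

# Hypothetical raw events; amounts arrive as text, and one is unparseable.
raw_events = [("u1", "9.99"), ("u2", "n/a"), ("u1", "20.01")]


def etl(conn, rows):
    """ETL: cleanse in the pipeline, then load only the transformed rows."""
    clean = []
    for user, amount in rows:
        try:
            clean.append((user, float(amount)))
        except ValueError:
            continue  # drop unparseable amounts before they reach the warehouse
    conn.execute("CREATE TABLE purchases (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", clean)


def elt(conn, rows):
    """ELT: load raw rows as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE raw_purchases (user_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_purchases VALUES (?, ?)", rows)
    # Crude filter: keep amounts that start with a digit, then cast them.
    conn.execute("""
        CREATE TABLE purchases AS
        SELECT user_id, CAST(amount AS REAL) AS amount
        FROM raw_purchases
        WHERE amount GLOB '[0-9]*'
    """)


TOTALS = "SELECT user_id, ROUND(SUM(amount), 2) FROM purchases GROUP BY user_id ORDER BY user_id"
for load in (etl, elt):
    conn = sqlite3.connect(":memory:")
    load(conn, raw_events)
    print(load.__name__, conn.execute(TOTALS).fetchall())
```

One tradeoff the sketch makes visible is that ELT keeps the untouched raw_purchases table in the warehouse. This lets you rerun or change transformations later using the warehouse's compute, at the cost of storing unvalidated data there.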
Alooma fully supports transforming your data during or after loading. This way you can enrich or cleanse your data in the pipeline before it lands in the warehouse, and then take advantage of the compute power the cloud offers to perform joins or other transformations within the cloud data warehouse. Want help getting started migrating your data to its new home in the cloud? Contact Alooma today!