The Oracle data warehouse is the market leader in enterprise data warehouses. Even as cloud adoption continues to grow and enterprises begin to consider migrating to the cloud, Oracle will remain a leader in on-prem data warehouses for years to come. Enterprise cloud migrations are huge projects, and migrating an Oracle data warehouse is a major challenge. In this two-post series, we will first focus on Oracle: a brief history of its data warehousing technology and its internal change-capture mechanism, the LogMiner. In the second post, we will cover how Alooma facilitates the first steps of migrating your data to the cloud.
Oracle has been the leader in data warehousing solutions for the past three decades, with numerous achievements under its belt: it released the first commercially available RDBMS (beating even IBM to it); it invented PL/SQL, making database automation productive and widespread; and it was the first database to make the leap to a 64-bit architecture. Until today's cloud era, there were very few alternatives for storing gigabytes of historical business data. But now the landscape has changed. The problem has grown from gigabytes to terabytes, from weekly reports to real-time decisions, and from business data to any kind of data: ad tracking, event tracking, application logs, and CRM data, just to name a few. This evolution is driving enterprises to seek other solutions.
In 2013, AWS debuted its cloud-based columnar data warehouse, Redshift, at roughly a tenth of the cost of competing non-cloud data warehouses. It had the fastest adoption rate of any AWS product at the time. Redshift marked a big change, making cloud data warehouse migration very attractive: fully managed infrastructure, zero downtime, petabyte-scale analytics within a minute's reach, and all at a price any SMB could afford. Many enterprises started reevaluating their data warehouses and continue to do so as new data warehouses are released by the major cloud providers. Oracle may be late to the cloud game, but it is definitely planning to catch up with both a cloud data warehouse service and a cloud data integration service. From our perspective, 2017 marked the year when real enterprises began to consider migrating to the cloud, despite the numerous difficulties.
Migrating a data warehouse to the cloud is still a very difficult process, and before committing to it, questions regarding TCO, data variety, and data velocity need to be addressed. We described many aspects of these decisions at length in a previous post. In short, the migration is not a one-time data transfer. It includes:
- Setting up a continuous synchronization between the on-prem data warehouse and the cloud data warehouse, often requiring multiple schema adjustments
- Migrating the BI platform, including existing queries, dashboards, and reports
- Migrating, rewriting or replacing all of the custom in-house data applications
- Training all employees on using and maintaining the new infrastructure
- Migrating the remaining ETL processes and retiring the legacy data warehouse
Depending on how large your enterprise is, this migration can take up to a year.
The first step, setting up a continuous synchronization process, is probably the hardest. This is where Alooma can help, and it is what this blog post is all about. We recently released a completely new version of our Oracle data warehouse integration.
Alooma’s new Oracle integration reads directly from the Oracle LogMiner, which provides a view into the database change log: inserted, updated, and deleted records, as well as schema changes. Reading from the LogMiner enables Alooma to continuously replicate all changes to your Oracle data, including updates and deletes, providing you with an exact replica on a cloud data warehouse. To understand how this works, let's dive deeper into how the Oracle LogMiner works, and how the Alooma integration leverages it to build stable foundations for a cloud migration.
How Oracle LogMiner works
The Oracle LogMiner can be used for various needs: tracing data corruption back to the offending transaction, restoring a table to a past state, auditing changes made to data and schema, and, of course, replicating data. The LogMiner mechanism is composed of several subcomponents. Let's go over them:
- The Redo Logs are files that record all changes made to the source database. When starting a LogMiner session (see the stored procedures below), you may specify a set of redo logs to read. By default, the redo logs do not contain the actual row data, so Supplemental Logging must be enabled for the LogMiner to be useful. In addition, the redo logs record schema information only as table and column IDs, not the actual names.
- The LogMiner Dictionary is a map translating the schema, table, and column IDs in the redo logs to the actual names (and column types). The dictionary can be configured to be read dynamically, as the LogMiner view is accessed, or statically from a previous dump to the redo logs. In both modes, keeping the dictionary in sync across schema changes is a challenge.
- The Mining Database is where the LogMiner sessions actually run. Most of the time this will be your primary data warehouse, but in cases where resources are limited, you may set up an Oracle replica and use it as the Mining Database.
- The LogMiner Contents View is a special view, `V$LOGMNR_CONTENTS`, which exposes the information in the redo logs in a queryable format. This view is, in effect, the LogMiner itself. Its key columns include:
- OPERATION - the type of operation (insert, update, etc.)
- SCN - the System Change Number at which the change was made
- CSCN - the SCN indicating when the change was committed (only available when reading committed transactions only)
- RS_ID, SSN - the record set identifier and the SQL sequence number, which together, uniquely identify a row in the view
- SEG_OWNER, SEG_NAME - the schema and table name (or IDs, if no dictionary was loaded) of the modified object
- USERNAME - the user who performed the operation
- SQL_REDO, SQL_UNDO - SQL statements to redo or undo the operation. Note that to replicate the changes to a non-Oracle warehouse, these SQL statements need to be parsed and adjusted.
- The LogMiner Stored Procedures are used for starting and ending a LogMiner session, as well as for specifying various configurations like what kind of dictionary to use. The stored procedures are:
- DBMS_LOGMNR.START_LOGMNR - starts a LogMiner session and enables reading from the contents view
- DBMS_LOGMNR.ADD_LOGFILE - manually specify which redo logs to include in the LogMiner session
- DBMS_LOGMNR_D.BUILD - build a LogMiner dictionary and store it to a redo log file
- DBMS_LOGMNR.END_LOGMNR - finish a LogMiner session
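Putting these pieces together, a minimal LogMiner session might look like the sketch below. The redo log path and the `SALES` schema are illustrative placeholders, and the exact options and required privileges vary by Oracle version and configuration:

```sql
-- One-time setup: ensure the redo logs capture full row data
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Register a redo log and start a session, resolving IDs via the
-- online catalog and returning committed transactions only
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    LOGFILENAME => '/u01/oracle/redo/redo01.log',  -- illustrative path
    OPTIONS     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(
    OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG
             + DBMS_LOGMNR.COMMITTED_DATA_ONLY);
END;
/

-- Read the captured changes for an illustrative SALES schema
SELECT scn, operation, seg_owner, seg_name, sql_redo
  FROM v$logmnr_contents
 WHERE seg_owner = 'SALES';

-- Close the session when done
BEGIN
  DBMS_LOGMNR.END_LOGMNR;
END;
/
```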
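On the consumer side, the `SQL_REDO` statements must be parsed before the changes can be applied to a non-Oracle warehouse. As a toy illustration only, here is a regex-based sketch in Python that handles simple single-row inserts; the `INSERT_RE` pattern and `parse_insert` helper are our own hypothetical names, and real redo SQL is far more varied (quoting, data types, updates, deletes):

```python
import re

# Hypothetical, simplified parser: turns a LogMiner SQL_REDO INSERT
# statement into a structured change record. Not production-grade.
INSERT_RE = re.compile(
    r'insert into "(?P<owner>\w+)"\."(?P<table>\w+)"'
    r'\s*\((?P<cols>[^)]*)\)\s*values\s*\((?P<vals>.*)\);?\s*$',
    re.IGNORECASE,
)

def parse_insert(sql_redo: str) -> dict:
    """Parse a SQL_REDO insert into {schema, table, values}."""
    m = INSERT_RE.match(sql_redo.strip())
    if not m:
        raise ValueError("not a recognized INSERT statement")
    # Strip the double quotes around identifiers and the single
    # quotes around values
    cols = [c.strip().strip('"') for c in m.group("cols").split(",")]
    vals = [v.strip().strip("'") for v in m.group("vals").split(",")]
    return {
        "schema": m.group("owner"),
        "table": m.group("table"),
        "values": dict(zip(cols, vals)),
    }

record = parse_insert(
    """insert into "SALES"."ORDERS"("ID","STATUS") values ('42','NEW');"""
)
```

A real replication pipeline would also need the SCN and row identifiers from the contents view to order and deduplicate changes, but the shape of the problem is the same.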
As you can see, the LogMiner is a powerful mechanism, but it requires a skilled DBA to operate. Oracle has great documentation for diving deeper into how to use it for various use cases. In 2009, Oracle acquired the company behind the GoldenGate product. GoldenGate provides data replication between Oracle, DB2, MySQL, SQL Server, and Teradata. It can be used to migrate data between any pair of those databases, and Oracle also provides it as a service in the Oracle cloud.
In sum, Oracle has been a pioneer and leader in data warehouses (not to mention MySQL and Java). We cannot be certain, but with Oracle's strong cloud initiatives, it may catch up and regain leadership once again. If you are heavily invested in Oracle software and are planning to migrate to the Oracle cloud, it is probably best to check out Oracle GoldenGate. If not, you might want to read part 2, describing Alooma's solution for replicating your Oracle data warehouse to the cloud data warehouse of your choice.