January 6, 2007

Find out how to achieve change data capture for Oracle 9i database without adding triggers on the source table

By: Milind Zodge

In a data warehousing project you need to pull data from different environments. The sources can be different databases, or even different kinds of data source, such as a combination of databases and flat files. Even when every source is a database, the source and target often run different database versions, or different products altogether, such as SQL Server and Oracle. In this article I focus on getting data from an Oracle 9i database.

This article describes another way of pulling changed data, one that requires neither modifying the source table structure nor adding triggers on the source table. It is meant for any database developer, data warehouse developer, data warehouse architect, data analyst, manager, ETL architect, or ETL designer who needs to pull changed data for a project.

This article does not cover the details of how to create a materialized view log and a materialized view, nor the fundamentals of how they work. It only explains these objects briefly and shows how they are used in this solution. You can get more information on materialized views and materialized view logs from Oracle's web site.

Overview
Consider a case with Oracle 9i as the source database and Oracle 10g as the target database, where we want to pull only the changed records from a source table. There are two common ways to do this. First, add modified-date and inserted-date columns to the source table and use them in the ETL script to fetch and process the data incrementally. Second, add DML triggers on the source table that insert a record into a stage table. In both cases you need to modify the table object.

If you pull data from several systems, this can turn into a time-consuming effort. Modifying a table structure or adding triggers may set off a series of meetings, because different departments in a company usually have their own schedules for developing applications and releasing new features. Since the change alters the object layout, it has to be prioritized and go through the standard project lifecycle, including impact analysis. All of these activities take time, which affects your project. If you are in a fix and want to get changed data without modifying the existing table structure or adding triggers on the existing table, you will find this article helpful.
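For reference, the two conventional options described above can be sketched roughly as follows. This is only an illustration, not code from the project: the table orders, its columns order_id and last_modified_date, the control table etl_control, and the stage table orders_changes_stg are all hypothetical names.

```sql
-- All object names here are hypothetical, for illustration only.

-- Option 1: incremental fetch driven by an audit column
-- that has been added to the source table.
SELECT o.*
  FROM orders o
 WHERE o.last_modified_date >
       (SELECT c.last_extract_date
          FROM etl_control c
         WHERE c.table_name = 'ORDERS');

-- Option 2: a row-level DML trigger on the source table
-- that writes one row per change into a stage table.
CREATE OR REPLACE TRIGGER orders_cdc_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
DECLARE
  v_dml_type CHAR(1);
BEGIN
  IF INSERTING THEN
    v_dml_type := 'I';
  ELSIF UPDATING THEN
    v_dml_type := 'U';
  ELSE
    v_dml_type := 'D';
  END IF;

  -- :NEW is empty for deletes, so fall back to :OLD for the key.
  INSERT INTO orders_changes_stg (order_id, dml_type, change_date)
  VALUES (NVL(:NEW.order_id, :OLD.order_id), v_dml_type, SYSDATE);
END;
/
```

Both options touch the source table itself (a new column in the first case, a new trigger in the second), which is exactly the kind of change that needs sign-off from the team that owns the source system.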

We needed to pull data from different databases into the data warehouse. These databases ran different versions, so the asynchronous CDC package feature of 10g was not an option. Adding triggers was a huge effort, as it would affect the online transaction processing system. So the challenge was to find a way to build an incremental load process for the data warehouse, which would save tremendous processing time.
To overcome this problem we had two solutions. The first was to store the data in stage1, read a snapshot of the data from the source system, compare it with stage1, and load the changed or new records into stage2, then use stage2 to transform and load the data into the data warehouse. This was again a costly effort and not a scalable solution: the processing time grows as more data gets loaded into the system.
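The snapshot comparison described above could look something like the sketch below. It assumes a hypothetical database link source_db and hypothetical tables orders, orders_stage1 and orders_stage2 with identical column lists. It is not the code we actually ran.

```sql
-- Hypothetical names: source_db (database link), orders,
-- orders_stage1 (copy kept from the previous load), orders_stage2.

-- Rows that are new or changed since the previous load:
-- everything in the current source snapshot that has no
-- identical row in stage1.
INSERT INTO orders_stage2 (order_id, customer_id, amount)
SELECT order_id, customer_id, amount
  FROM orders@source_db
MINUS
SELECT order_id, customer_id, amount
  FROM orders_stage1;
```

The MINUS has to read every source row and compare it against stage1 on every run, which is why the cost keeps growing with the table. Written this way it also does not detect deleted rows; that would need a second comparison in the opposite direction.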

The other solution was to use a materialized view log. Oracle populates this log automatically as DML is performed on the table, and the log can then be used by materialized views. It is a three-step process: the first step is performed in the source database and the other two in the target database.

Step 1: Creating a Materialized View Log in the source database
Create a materialized view log on the desired table. A materialized view log must be in the source database, in the same schema as the table. A table can have only one materialized view log defined on it.
There are two ways to define this log: on rowid or on primary key. The log is backed by an underlying table named MLOG$_table_name. It can hold primary keys, rowids, or object IDs, and it can also carry other columns to support the fast refresh option of the materialized view that will be built on this log.
When DML changes are made to the master table, Oracle records them in the materialized view log as defined. The function of this log is to record the DML activity performed on that table.
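As a concrete illustration of this step, here is a minimal sketch that assumes a hypothetical source table orders with a primary key on order_id. The table and column names are placeholders, not objects from the original project.

```sql
-- Hypothetical table: orders, with primary key order_id.
-- Run in the source database, in the schema that owns the table.
CREATE MATERIALIZED VIEW LOG ON orders
  WITH PRIMARY KEY;

-- Alternative for a table without a primary key
-- (a table can have only one log, so use one form or the other):
-- CREATE MATERIALIZED VIEW LOG ON orders WITH ROWID;

-- The log is an ordinary table named MLOG$_<table_name>.
-- Besides the key column it carries Oracle-maintained columns
-- such as DMLTYPE$$ ('I', 'U' or 'D') and SNAPTIME$$.
SELECT order_id, dmltype$$, old_new$$, snaptime$$
  FROM mlog$_orders;
```

Creating the log adds a new object next to the table but leaves the table definition itself unchanged, and no user-defined trigger has to be written or maintained.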
E.g. CREATE MATERIALIZED VIEW LOG ON

WITH