BeyeBLOGS | BeyeBLOGS Home | Get Your Own Blog

« How to create Personal Dashboard using Oracle Portal | Main | Design technique for Date type columns in fact table for maximum performance »

May 5, 2007

CDC Technique for dimension table which is based on a multi-table query

By: Milind Zodge

In the Data warehousing project you have dimension and fact tables. Usually, if data is coming from a single table, we can use the approach what I have presented in the last article "Change data capture for Oracle 9i database without adding triggers on the source table".

There are also plenty of other options available like CDC, using timestamp etc. However the problems comes when you have a dimension table which is constructed based on a multi-table query. In this case none of the above approach can work directly.

Overview
Consider a case of Sales Representative dimension, this dimension is based several attributes like area, login etc. These attributes are coming from different tables. Now we will see what we can use to have an incremental update of this table.

The examples shown in this article is for Oracle database however same concept can be used for other database engines.

Step 1: Creating a Function which will return hash value
We will be using a hash value technique to compare the rows. Well we really have one more option, compare each field and see if any one of them is changed and that way determine the changed row.

However hash value method is faster than the above approach and code also become manageable with less conditional statements. Both methods are same though.

Create a function such that it will read a value as a text parameter and will return a hash value for it.

e.g.
FUNCTION salesrep_hashvalue (p_input_str VARCHAR2)
RETURN VARCHAR2 IS
l_str VARCHAR2(20);
BEGIN
l_str := dbms_obfuscation_toolkit.md5(input_string => p_input_str);
RETURN l_str;
END salesrep_hashvalue;

Step 2: Add a new column in the dimension table to hold a hashvalue
Create a new column "hashvalue" in the dimension table. And update its value by using the above created function using the required columns.

Make sure you use the same set of columns and in the same sequence in the ETL logic to create a hash value for a new row.

Step 3: Write ETL code
In the ETL code read the records from this multi-table SQL in a cursor loop. For each record find out the hash key value. Get the old hash key value by selecting the record from the dimension table using a key. If no record exist then insert the record. If record exists compare these tow hash keys if it is different update the record otherwise skip it.

Conclusion
This way you can achieve change data capture for a multi-table select statement query used for creation of a dimension table.

Posted by Milind Zodge at May 5, 2007 10:30 PM

Comments

It is excellent article ! I just love'd this !

Posted by: swarovskiemu at May 15, 2011 1:36 PM

of course like your web-site but you need to check the spelling on quite a few of your posts. Many of them are rife with spelling issues and I find it very troublesome to tell the truth nevertheless I will definitely come back again.

Posted by: Heathrow escorts at May 16, 2011 7:42 PM

Thank you for the auspicious writeup. It in fact was a amusement account it. Look advanced to far added agreeable from you! However, how could we communicate?

Posted by: escort London girls at May 16, 2011 11:15 PM

There is obviously a lot to identify about this. I believe you made various nice points in features also.

Posted by: Allegro.pl at May 17, 2011 10:14 AM

Post a comment




Remember Me?