<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Milind&apos;s Blog on BI and DW</title>
<link>http://www.beyeblogs.com/milinddwbi/</link>
<description>Blog for Data Warehouse and Business Intelligence architecture related topics.</description>
<language>en</language>
<copyright>Copyright 2008</copyright>
<lastBuildDate>Mon, 09 Jul 2007 21:00:00 -0700</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.33</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 


<item>
<title>What is metrics and what are the different types of metrics</title>
<description><![CDATA[<p>By: Milind Zodge</p>

<p><B><U>Overview</U></B><br />
Any BI application's main role is to show information based on some measurements. These measurements are metrics. E.g. you measure how much sale you have done, so total sales revenue is your metrics. In this article I am focusing on the types of metrics and when and how to use a proper one in the application.</p>

<p><B><U>Details</U></B><br />
There are three main types of metrics you can use in your application:</p>

<p>1.<U>Leading Indicators:</U> If you want to measure activities like how many touches are required to convert a prospect into a customer; leading indicator metrics are used, which measures activities. Generally these indicators will show how many calls/activities you need to do to achieve you goal.</p>

<p>2.<U>Lagging Indicators:</U> If you want to measure any business financial amounts like sales revenue; lagging indicators are used, which measures outcome of the activities. Generally these indicators will show where you stand currently.</p>

<p>3.<U>Key Performance Indicators (KPI):</U> If you want to see how is your performance and where you stand, is it good or bad then Key Performance Indicators are used which measure the performance. E.g. If you want to see how is sales revenue with respect to sales quota</p>

<p><B><U>Conclusion</U></B><br />
A proper metrics is used based on which application you are designing for. If it is a BAM-Business Activity Monitoring then Lagging Indicators or KPI will convey the information. </p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/07/what_is_metrics_and_what_are_t.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/07/what_is_metrics_and_what_are_t.php</guid>
<category></category>
<pubDate>Mon, 09 Jul 2007 21:00:00 -0700</pubDate>
</item>

<item>
<title>Design technique for Date type columns in fact table for maximum performance</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p><B><U>Overview</U></B><br />
In the Data warehousing project you have dimensions and facts tables. And when you design a Data warehouse or Data Mart you come across many Date data type attributes. In this article I have pointed out a design technique for Date columns which gives highest performance.</p>

<p><B><U>Design</U></B><br />
Consider a data mart having a fact table "Order" with many columns like Order Number, Order Date, Shipped Date and Amount and a "Time" dimension which have an entry for each day. You use "time_id" for "Order Date" however most of the time "Shipped Date" is kept as a Date column. </p>

<p>Consider in your reporting system you want to design a report to report number of orders shipped in a particular year. Now you will have format the shipped date column so that you can compare its year portion to get result. If you have a massive fact table this query is going to take more time as it will not be using any index, well you can create index to solve this problem.</p>

<p>Now consider you have a report which report number of orders shipped in a particular month, day, quarter etc. To speed up this operation you will have create index probably more than one. However if we use the id column and index on that column then you can avoid the above problem</p>

<p>Add shipped_date_id column along with the Shipped Date column in the fact table. Derive the value by using Time dimension. So whenever you query you always use index.</p>

<p><B><U>Conclusion</U></B><br />
This way you can achieve maximum performance without adding more indexes. You can just go to your time dimension get the required ids and join it with your fact table which will use index defined on "shipped_date_id" column. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/06/design_technique_for_date_type.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/06/design_technique_for_date_type.php</guid>
<category></category>
<pubDate>Wed, 20 Jun 2007 19:30:00 -0700</pubDate>
</item>

<item>
<title>CDC Technique for dimension table which is based on a multi-table query</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p>In the Data warehousing project you have dimension and fact tables. Usually, if data is coming from a single table, we can use the approach what I have presented in the last article "Change data capture for Oracle 9i database without adding triggers on the source table". </p>

<p>There are also plenty of other options available like CDC, using timestamp etc. However the problems comes when you have a dimension table which is constructed based on a multi-table query. In this case none of the above approach can work directly.</p>

<p><B><U>Overview</U></B><br />
Consider a case of Sales Representative dimension, this dimension is based several attributes like area, login etc. These attributes are coming from different tables. Now we will see what we can use to have an incremental update of this table.</p>

<p>The examples shown in this article is for Oracle database however same concept can be used for other database engines.</p>

<p><U>Step 1: Creating a Function which will return hash value </U><br />
We will be using a hash value technique to compare the rows. Well we really have one more option, compare each field and see if any one of them is changed and that way determine the changed row.</p>

<p>However hash value method is faster than the above approach and code also become manageable with less conditional statements. Both methods are same though.</p>

<p>Create a function such that it will read a value as a text parameter and will return a hash value for it.</p>

<p>e.g.<br />
FUNCTION salesrep_hashvalue (p_input_str VARCHAR2) <br />
RETURN VARCHAR2  IS<br />
   l_str VARCHAR2(20);<br />
BEGIN<br />
    l_str := dbms_obfuscation_toolkit.md5(input_string => p_input_str);<br />
    RETURN l_str;<br />
END salesrep_hashvalue;</p>

<p><U>Step 2: Add a new column in the dimension table to hold a hashvalue</U><br />
Create a new column "hashvalue" in the dimension table. And update its value by using the above created function using the required columns. </p>

<p>Make sure you use the same set of columns and in the same sequence in the ETL logic to create a hash value for a new row. </p>

<p><U>Step 3: Write ETL code</U><br />
In the ETL code read the records from this multi-table SQL in a cursor loop. For each record find out the hash key value. Get the old hash key value by selecting the record from the dimension table using a key. If no record exist then insert the record. If record exists compare these tow hash keys if it is different update the record otherwise skip it.</p>

<p><B><U>Conclusion</U></B><br />
This way you can achieve change data capture for a multi-table select statement query used for creation of a dimension table. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/05/cdc_technique_for_dimension_ta.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/05/cdc_technique_for_dimension_ta.php</guid>
<category></category>
<pubDate>Sat, 05 May 2007 22:30:00 -0700</pubDate>
</item>

<item>
<title>How to create Personal Dashboard using Oracle Portal</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p><B><U>Overview</U></B><br />
I recently worked on creating a dashboard application which show contents based on who is logged in. In short that person will have his/her own personal dashboard.<br />
There are various ways to do this. One can use Virtual Private Databases technique to achieve this. What I am focusing in this article is a case where you don't want to create hundreds of database users which is a maintenance problem, rather using your company's existing  security structures. Let's consider for this purpose we have an active directory structure or Oracle Portal's Internet Directory structure. We have one database user and hundreds of internet users.</p>

<p><B><U>Design</U></B><br />
1. Create a table which will hold loginname, user role like "Regional Manager", "Area Manager", "Sales Rep", "Analyst" etc. and particular code like region code, area code, salesrep code<br />
2. Create a welcome page which will prompt for entering a password<br />
3. Once the user entered their network credential, capture the entered user id and call other page by passing the user id as a parameter as shown by the below MOD-PL/SQL code:<br />
HTML<br />
BODY<br />
ORACLE<br />
DECLARE<br />
  l_user_info varchar2(50) := sys_context('USERENV', 'CLIENT_INFO');<br />
  l_user varchar2(50):= SUBSTR(l_user_info, 3, INSTR(l_user_info, ',') - 3); -- This will extract the userid<br />
BEGIN<br />
  IF UPPER(l_user) = 'PUBLIC' THEN<br />
    NULL;<br />
  ELSE<br />
    IF UPPER(l_user) = 'PORTAL' THEN<br />
      NULL;<br />
    ELSE<br />
      Here you can call other page and pass the user id as a parameter<br />
    END IF;<br />
  END IF;<br />
END;<br />
/ORACLE<br />
/BODY<br />
/HTML<br />
4. Now you can use this parameter and table created in the first step to determine the level of the user and how to display the data using proper region code or area code or other ids</p>

<p><B><U>Conclusion</U></B><br />
This way you can create Personal Dashboards for your users without creating hundreds of database users. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/04/how_to_create_personal_dashboa.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/04/how_to_create_personal_dashboa.php</guid>
<category></category>
<pubDate>Wed, 11 Apr 2007 22:15:00 -0700</pubDate>
</item>

<item>
<title>Different types of Dashbords, how to choose a required one</title>
<description><![CDATA[<p>By: Milind Zodge</p>

<p><B><U>Overview</U></B><br />
There are many times a question surface, which dashboard should we use. In this article I am focusing on what are the different types of Dashboards and how to choose a right one. For more detail reading please read <B>"Performance Dashboards"</B> book by "Wayne W. Eckerson".</p>

<p><B><U>Details</U></B><br />
There are two ways one can deliver the information. First more current snapshot of the data, second trends of data over a period of time window like 12 months.</p>

<p>There are following well known types of Dashboards: <br />
1. <U>Operational Dashboard:</U> This is simply known as Dashboard. If there is a need to report more current information like, where do I stand now with respect to my quota, this type of dashboard can be used. These dashboards usually have gauges to indicate the KPIs or even stop lights to indicate the current state.</p>

<p>2. <U>Strategic Dashboard:</U> This type is known as Scorecard. If there is a need to monitor a performance of certain activity over the period of several months/ day etc, these dashboards are well suited. These dashboards usually show graphs for a period of time which indicates the trend. E.g. A scorecard to report OTD% , here it doesn’t make any sense if you say OTD% is 80 this month. The question comes is, how is 80, good or bad, whether this is following a natural wave hence even if it is low it is fine. To answer these questions you need have trends instead of just current snapshot.</p>

<p>3. <U>Tactical Dashboard:</U> It is a combination of above which is delivered through a portal interface, which provides much flexibility of adding and removing parts. E.g. you want to show a trend as well also want to have alerts for some current activities.<br />
Most of the time you end up having two or more as your BI application, means combination of option1, 2 and 3 as an application.</p>

<p><B><U>Conclusion</U></B><br />
Information is effective if it is delivered in right format. Choosing a right dashboard type is crucial for the success of the BI system.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/03/deciding_which_type_of_dashboa.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/03/deciding_which_type_of_dashboa.php</guid>
<category></category>
<pubDate>Sat, 03 Mar 2007 22:00:00 -0700</pubDate>
</item>

<item>
<title>How to use Oracle&apos;s Metadata package for impact analysis</title>
<description><![CDATA[<p>By: Milind Zodge</p>

<p><B><U>Overview</U></B><br />
Business is always changing and you have to make some changes based on the business change. </p>

<p>Before doing any change you want to perform an impact analysis. Most of the data modeling tools have provision to do it. I am focusing in the article how you perform this task if you don't have a tool.</p>

<p><B><U>Details</U></B><br />
Consider a case, we have Oracle database and wants to alter a column width and would like to see wherever this column is used/ referenced. </p>

<p>We can use Oracle's metadata package as indicated below</p>

<p>SET pagesize 0<br />
SET long 90000<br />
SET feedback off<br />
SET echo off </p>

<p>SELECT DBMS_METADATA.GET_DDL('TABLE',ut.table_name)<br />
     FROM USER_TABLES ut;</p>

<p>This will give DDL scripts for all the tables. Now you can use any text tool like Notepad to search for the required column and find out the references.</p>

<p><B><U>Conclusion</U></B><br />
There are various ways to do it. This is one of them. This will help you determining the impact exposure.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/02/how_to_use_oracles_metadata_pa.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/02/how_to_use_oracles_metadata_pa.php</guid>
<category></category>
<pubDate>Sat, 24 Feb 2007 20:45:00 -0700</pubDate>
</item>

<item>
<title>Data Staging Strategy considerations</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p><B><U>Overview</U></B><br />
Whenever we start designing a Data Mart/ Data Warehouse environment first thing comes is staging area. In this article I am focusing on what different types of Data Staging Techniques are there and how to define a Data Staging Strategy.</p>

<p><B><U>Detail</U></B><br />
While defining the strategy you will have to focus on <br />
What technique you will be using? <br />
What kind of data load it will be, full data or incremental?<br />
Where the staging data will reside?<br />
Where should aggregation be performed?</p>

<p>There are following well-known techniques are available:<br />
1. Store and Forward: In this technique, a data is stored in staging area and then used for transformation and loading into Data Warehouse environment<br />
2. Direct Database insert/update: In this technique, a data is directly read from ODS and directly will be inserted or updated in the Data Warehouse environment</p>

<p>There are following Data Load types:<br />
1. <U>Full Data:</U> Here you use all the rows and update Data Warehouse environment with the data. This is time consuming process and the processing time will gradually increase because of data growth rate<br />
2. <U>Delta or Incremental:</U> Here you only get the changed/new records and you process these records so that information is passed to Data Warehouse environment</p>

<p>Types Staging Data Stores:<br />
1. <U>File:</U> Data can be placed in File. If more sorting operation needs to be performed then storing data in this format is beneficial<br />
2. <U>Table:</U> Staging data can be stored in the table either permanently or for some time till it gets published to the Data Warehouse area</p>

<p>Where to perform aggregation: If the aggregation is required by the Data Warehouse process. It can be either performed while loading the stage area or loading Data Warehouse area. Decide where you want to perform this.</p>

<p><B><U>Conclusion</U></B><br />
Thus considering all the sides you can prepare a good Data Staging Strategy.</p>

<p>Please refer to "Data Strategy" book by Sid, Larissa and Majid for more details.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/02/data_staging_strategy_consider.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/02/data_staging_strategy_consider.php</guid>
<category></category>
<pubDate>Sat, 17 Feb 2007 20:45:00 -0700</pubDate>
</item>

<item>
<title>How to design if many to many relationship exists between dimension and fact table</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p><B><U>Overview</U></B><br />
While designing a data mart/ warehouse sometimes you  get cases where you find many-to-many relationship exists between dimension and fact table. Usually you have one-to-many relationship between dimension and fact table which is best for good OLAP/ cubes. In this article I am focusing on such M2M cases and how to design to resolve these cases.</p>

<p><B><U>Design</U></B><br />
Let’s consider a case. We have a time dimension, which have regular attributes, like year, month, date, day, WOY, DOW etc. And we have a fact table which have following date columns, schedule date, shipped date, order date and  promised date. <br />
As per the regular design we can either store these dates as it is or can store particular time_id from a Time dimension. If we store a time dimension id (surrogate key) we will create many to many relationship. If we don’t store ids we will not be able to use these columns easily for data analysis.<br />
To resolve this we can use a "Role Based Dimension" concept.  <br />
1. Create following views "schedule_date_vw", "shipped_date_vw" and "promised_date_vw" using the SELECT * FROM dim_time. This will create different roles of time dimension<br />
2. Now use these views as dimensions in your schema<br />
3. Add dim_time dimension table in the design <br />
4. Use surrogate keys from the above created views/ dimension along with dim_time and store it in the fact table</p>

<p><B><U>Conclusion</U></B><br />
This will resolve a problem of many to many relationship and can easily use in OLAP and cubes for better data analysis. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/02/how_to_design_if_many_to_many.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/02/how_to_design_if_many_to_many.php</guid>
<category></category>
<pubDate>Sat, 03 Feb 2007 21:45:00 -0700</pubDate>
</item>

<item>
<title>How to design fact table for multicurrency column</title>
<description><![CDATA[<p>By: Milind Zodge </p>

<p><B><UOverview</U</B><br />
Most of the time we have companies doing business in more than one country. If you have KPIs which demands to do a comparison in original currency you will need to have a provision in your design. In this article I am focusing on this case. </p>

<p><B><U>Design</U></B><br />
There are two ways you can design this. <br />
1.	Add a column which will hold the conversion rate of that day/ date. So your fact table will have following columns:  local currency amount column, conversion rate, and can also use USD amount column which will hold the converted amount. This will solve the purpose of recording the conversion rate. However if you want to slice and dice the data based on the different currencies and different point in time, you can design using the next approach.<br />
2.	Create a Currency Dimension which will hold the conversion rates for every day. Now in fact table store the respective surrogate key along with other amount column. Now you can design an analysis cube to use this dimension. </p>

<p><B><U>Conclusion</U></B><br />
This way you can achieve maximum performance for your queries or reports or cubes to analyze the data with respect to a daily changing rate. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/01/how_to_design_fact_table_for_m.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/01/how_to_design_fact_table_for_m.php</guid>
<category></category>
<pubDate>Sat, 27 Jan 2007 22:45:00 -0700</pubDate>
</item>

<item>
<title>Find out how to achieve change data capture for Oracle 9i database without adding triggers on the source table</title>
<description><![CDATA[<p>By: Milind Zodge</p>

<p>In the Data warehousing project you need to pull the data from different environments. The source can be different databases or even different data sources like combination of database with flat file. If the source is purely database chances are that the source and target database have different database versions even different kinds of databases like SQL Server, Oracle etc. In this article I am focusing on getting data from Oracle 9i database. </p>

<p>This article will help you in giving another way of pulling changed data without modifying the source table structure or without adding triggers on the source table. This article is meant for any Database developer, Data Warehouse developer, Data Warehouse Architect, Data Analysts, Managers or even ETL Architect, and, ETL Designer who wants to pull the changed data for their project. </p>

<p>This article is not covering the details of how to create materialized view log and materialized view and not covering the fundamentals of how materialized view and log works, it just explain in brief about these objects and how it is used in this solution. You can get more information on materialized view and materialized log from Oracle's web Site. </p>

<p><B><U>Overview</U></B><br />
Consider a case of having an Oracle 9i as a source database and Oracle 10g as a target database and we want to pull only changed records from the source table. There are two ways we can do this, first, add modified and inserted date on the source table and use it in the ETL script to incrementally fetch and process the data. Second, add DML triggers on the source table to insert a record into a stage table. In both these cases you need to modify the table object. If you want to pull data from different systems, sometimes it can turn into a time consuming effort. What I mean by this, is, it may trigger series of meetings if you are going to modify the table structure or going to add triggers like in on the tables as most of the time, different departments in the company have their own schedule for developing the application or even for releasing new features. Since this is going to modify the object layout, it needs to be prioritized, and go thro the standard lifecycle of the project like impact analysis etc. All these required activities may take time, which will affect your project. Now if you are in fix and wants to get a changed data with out modifying the existing table structure or even don't want to add any triggers on the existing table then you will find this article helpful.</p>

<p>We needed to pull the data from different databases into Data Warehouse. All these databases had different versions so using Asynchronous CDC package feature of 10g was not an option. Adding triggers was a huge effort as its going to affect the online transaction processing system. So challenge was to figure out a way to so that an incremental load process can be developed for data warehouse load which will save tremendous processing time.<br />
To overcome this problem we had two solutions, one to store the data in stage1, read the snapshot of data from the source system, compare it with the stage1 and load the changed or new records in stage2. Then use stage 2 to transform and load the data into Data Warehouse. This was again a costly effort and was not a scalable solution. The processing time with this solution will be more as more data gets loaded in the system.</p>

<p>Another solution was using materialized view log. This log will be populated by the transaction log and can be used in materialized views.  It is a three step process. First step was performed in the source database and other two were performed on the target database.</p>

<p><U>Step 1: Creating a Materialized Log in the source database </U><br />
Create a materialized log on the desired table. A materialized view log must be in the source database in the same schema as the table. A table can have only one materialized view log defined on it. <br />
There are two ways you can define this log, either on rowid or primary key. This log's name will be MLOG$_table_name which is an underlying table. This log can hold primary key, row ids, or object ids can also have other columns which will support a fast refresh option of materialized view which will be created based on this log.<br />
When data changes are made to master table data, Oracle will pull these changes to the Materialized log as defined. The function of this log is to log the DML activities performed on the used table. <br />
E.g. CREATE MATERIALIZED VIEW LOG ON <table name> WITH <option like OBJECT ID, PRIMARY KEY or ROWID</p>

<p><U>Step 2: Creating a Materialized View in Target Database <br />
        using this log</U></p>

<p>Create a Materialized view based on the above created materialized log. Materialized view is a replica of the desired table. This is like a table and needs to be refreshed periodically. You can define the needed refresh frequency to fast refresh this view based on the materialized log in the target database. <br />
Whenever a DML operation is performed, on the defined table that activity will be recorded in the log which is in the Oracle 9i database, in the source system. Now we have a materialized view defined on this log in our system, Oracle 10g, which is target system. This view will only pull in the changes as defined in the log. These changes will be applied to the rows. One can define a desired frequency of refreshing this view. This process doesn't create any physical trigger, however there is a little overhead, as database has to store the row in the defined log table whenever a commit is issued. </p>

<p><U>Step 3: Writing triggers on Materialized View</U></p>

<p>As we know materialized views are like a table, hence we can write triggers on it. In prior two steps we saw how the changed data is pulled from the source system and will be loaded in the materialized view defined in the target system. Now the question is how to use this view to determine the changes. For this purpose we will write database triggers on this materialized view, triggers like after insert, after update and after delete. <br />
These triggers will capture which operation was performed on the row. Now we will define a new table having same structure as of staging/target table with few additional columns. First, an indicator of which operation is done, whether it is insert/update or delete. Then a sequence number, this is important as you may get a row which is a new row and also got modified in the same time window. This time the sequence number will tell the sequence of the activity.</p>

<p>Now whenever a DML operation is performed on the table, the log will get refreshed by the new information based on the defined frequency, then materialized view will be refreshed with the new information based on the information available in materialized log. Appropriate trigger will be fired based on the operation performed on the data row. This trigger will create a new record in the staging table with appropriate operation mode like: I for Insert, U for Update and D for Delete with the activity sequence number.</p>

<p><B><U>How this works</U></B><br />
Whenever a data is changed or added to the source table, a materialized log captures that information. Based on the refresh frequency, materialized view will be refreshed using the log. During refresh, it will insert new records in the view and will update the existing records. During this DML operation, DML triggers will be activated and will insert a row into the stage table, which can be further used to transfer the data into Data Warehouse or Data mart. </p>

<p><B><U>Conclusion</U></B><br />
No matter what you do, there will be some overhead on the database. The discussed solution has some overhead too; however it is a nice handy alternative solution to pull the data.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/milinddwbi/archive/2007/01/find_out_how_to_achieve_change_1.php</link>
<guid>http://www.beyeblogs.com/milinddwbi/archive/2007/01/find_out_how_to_achieve_change_1.php</guid>
<category></category>
<pubDate>Sat, 06 Jan 2007 22:30:00 -0700</pubDate>
</item>


</channel>
</rss>