BeyeBLOGS | BeyeBLOGS Home | Get Your Own Blog

« January 2007 | Main | March 2007 »

February 24, 2007

How to use Oracle's Metadata package for impact analysis

By: Milind Zodge

Overview
Business is always changing and you have to make some changes based on the business change.

Before doing any change you want to perform an impact analysis. Most of the data modeling tools have provision to do it. I am focusing in the article how you perform this task if you don't have a tool.

Details
Consider a case, we have Oracle database and wants to alter a column width and would like to see wherever this column is used/ referenced.

We can use Oracle's metadata package as indicated below

SET pagesize 0
SET long 90000
SET feedback off
SET echo off

SELECT DBMS_METADATA.GET_DDL('TABLE',ut.table_name)
FROM USER_TABLES ut;

This will give DDL scripts for all the tables. Now you can use any text tool like Notepad to search for the required column and find out the references.

Conclusion
There are various ways to do it. This is one of them. This will help you determining the impact exposure.

Share: del.icio.us Digg Furl ma.gnolia Netscape Newsvine reddit StumbleUpon Yahoo MyWeb  

Posted by Milind Zodge at 8:45 PM | Comments (3)

February 17, 2007

Data Staging Strategy considerations

By: Milind Zodge

Overview
Whenever we start designing a Data Mart/ Data Warehouse environment first thing comes is staging area. In this article I am focusing on what different types of Data Staging Techniques are there and how to define a Data Staging Strategy.

Detail
While defining the strategy you will have to focus on
What technique you will be using?
What kind of data load it will be, full data or incremental?
Where the staging data will reside?
Where should aggregation be performed?

There are following well-known techniques are available:
1. Store and Forward: In this technique, a data is stored in staging area and then used for transformation and loading into Data Warehouse environment
2. Direct Database insert/update: In this technique, a data is directly read from ODS and directly will be inserted or updated in the Data Warehouse environment

There are following Data Load types:
1. Full Data: Here you use all the rows and update Data Warehouse environment with the data. This is time consuming process and the processing time will gradually increase because of data growth rate
2. Delta or Incremental: Here you only get the changed/new records and you process these records so that information is passed to Data Warehouse environment

Types Staging Data Stores:
1. File: Data can be placed in File. If more sorting operation needs to be performed then storing data in this format is beneficial
2. Table: Staging data can be stored in the table either permanently or for some time till it gets published to the Data Warehouse area

Where to perform aggregation: If the aggregation is required by the Data Warehouse process. It can be either performed while loading the stage area or loading Data Warehouse area. Decide where you want to perform this.

Conclusion
Thus considering all the sides you can prepare a good Data Staging Strategy.

Please refer to "Data Strategy" book by Sid, Larissa and Majid for more details.

Share: del.icio.us Digg Furl ma.gnolia Netscape Newsvine reddit StumbleUpon Yahoo MyWeb  

Posted by Milind Zodge at 8:45 PM | Comments (0)

February 3, 2007

How to design if many to many relationship exists between dimension and fact table

By: Milind Zodge

Overview
While designing a data mart/ warehouse sometimes you get cases where you find many-to-many relationship exists between dimension and fact table. Usually you have one-to-many relationship between dimension and fact table which is best for good OLAP/ cubes. In this article I am focusing on such M2M cases and how to design to resolve these cases.

Design
Letís consider a case. We have a time dimension, which have regular attributes, like year, month, date, day, WOY, DOW etc. And we have a fact table which have following date columns, schedule date, shipped date, order date and promised date.
As per the regular design we can either store these dates as it is or can store particular time_id from a Time dimension. If we store a time dimension id (surrogate key) we will create many to many relationship. If we donít store ids we will not be able to use these columns easily for data analysis.
To resolve this we can use a "Role Based Dimension" concept.
1. Create following views "schedule_date_vw", "shipped_date_vw" and "promised_date_vw" using the SELECT * FROM dim_time. This will create different roles of time dimension
2. Now use these views as dimensions in your schema
3. Add dim_time dimension table in the design
4. Use surrogate keys from the above created views/ dimension along with dim_time and store it in the fact table

Conclusion
This will resolve a problem of many to many relationship and can easily use in OLAP and cubes for better data analysis.

Share: del.icio.us Digg Furl ma.gnolia Netscape Newsvine reddit StumbleUpon Yahoo MyWeb  

Posted by Milind Zodge at 9:45 PM | Comments (7)