March 21, 2008
Data Modeling in the BI World
One of the key enablers of successful Business Intelligence programs is the ubiquitous, hard-working "Data Model". The data model is the heart of any software system and, at a fundamental level, provides the placeholders in which data elements reside.
Business Intelligence systems, with all their paraphernalia (Data Warehouses, Marts, Analytical & Mining systems etc.), typically deal with the largest volume of data in any enterprise, and hence data models are highly venerated in the Data Warehousing world.
At a high level, a good Data Warehouse data model has the following goals (corollary: if you are looking for a data modeler, look for the following traits):
1) Understand the business domain of the organization
2) Understand at a granular level the data generated by the business processes
3) Realize that business data is an ever-changing commodity: the placeholders provided by the data model should be relevant not only for the present but also for the future
4) Be describable at a conceptual and logical level to all relevant stakeholders
5) Allow for straightforward conversion to the physical world of databases or data repositories that are manipulated by software systems.
Extensible data models address all five points mentioned above and, more specifically, have future-proofing as one of their main stated goals. Such extensible models should also be "consumption agnostic", i.e. they should provide comparable levels of performance irrespective of the way the data is consumed.
Entity-Relationship and Dimensional modeling (http://www.rkimball.com) have been the lingua franca of BI data modelers operating at the conceptual and logical levels. Newer techniques such as Data Vault (http://www.danlinstedt.com/) also offer some interesting ideas for building better logical models for Data Warehouses.
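To make dimensional modeling a little more concrete, here is a minimal star-schema sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are hypothetical, chosen only for illustration: a central fact table holds the numeric measures, the surrounding dimension tables hold descriptive attributes, and a typical analytical query joins the two and groups by dimension attributes.

```python
import sqlite3

# Hypothetical star schema: one fact table (sales) keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20080321
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,            -- measure
    amount      REAL                -- measure
);
"""


def build_star(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few illustrative (made-up) rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20080301, 2008, 3), (20080401, 2008, 4)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gizmo", "Hardware"),
                      (3, "Manual", "Books")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20080301, 1, 10, 100.0), (20080301, 3, 2, 30.0),
                      (20080401, 2, 5, 75.0)])


def sales_by_category_and_month(conn: sqlite3.Connection):
    """Typical dimensional query: join facts to dimensions, group by their attributes."""
    sql = """
    SELECT p.category, d.year, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    GROUP BY p.category, d.year, d.month
    ORDER BY p.category, d.year, d.month
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in sales_by_category_and_month(conn):
        print(row)
```

New analytical questions usually become new dimension attributes or new fact columns rather than a redesign, which is one reason dimensional models are considered relatively easy to extend.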
At the physical implementation level, both relational (ROLAP) and multi-dimensional (MOLAP) databases form the backbone of the BI infrastructure. Each of these techniques has its own strengths and weaknesses, so BI data modelers need to be aware of their capabilities to ensure that the right decisions are made when physicalizing the logical models.
Even among relational OLAP vendors, a space traditionally dominated by row-major databases such as Oracle and SQL Server, column-major relational databases such as Sybase IQ and Vertica are gaining a lot of popularity, with claims of being built from the ground up for data warehousing. The physical layer is also seeing a lot of action with the entry of data warehousing appliance vendors such as Netezza and DATAllegro (http://www.dmreview.com/article_sub.cfm?articleId=1009168).
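The row-major versus column-major distinction can be illustrated with a toy example. The sketch below assumes a hypothetical three-column sales table and uses plain Python lists only to mimic the two layouts; it says nothing about how Sybase IQ, Vertica or any other product actually stores data on disk. The point is that an aggregate over one column has to walk every full record in the row layout, but only a single array in the column layout.

```python
from typing import Dict, List, Tuple

# Hypothetical "sales" table: (region, product, amount), stored row by row.
Row = Tuple[str, str, float]

rows: List[Row] = [
    ("East", "Widget", 100.0),
    ("West", "Gizmo", 75.0),
    ("East", "Gizmo", 30.0),
]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-major records into column-major arrays (assumes at least one row)."""
    region, product, amount = (list(col) for col in zip(*rows))
    return {"region": region, "product": product, "amount": amount}


def total_row_major(rows: List[Row]) -> float:
    # Every whole record is visited even though only 'amount' is needed.
    return sum(r[2] for r in rows)


def total_column_major(cols: Dict[str, list]) -> float:
    # Only the 'amount' array is scanned; 'region' and 'product' are never touched.
    return sum(cols["amount"])


if __name__ == "__main__":
    cols = to_columns(rows)
    print(total_row_major(rows), total_column_major(cols))
```

Both functions return the same answer; the difference a column store exploits is how much data has to be read to get it, which matters when a warehouse table has dozens of columns and a query touches only a few.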
The intent of this post can be summed up as follows. BI practitioners should:
a) Understand the BI/analytical goals of the enterprise before deciding on data modeling techniques, and make the model extensible and future-proof
b) Understand the current techniques that help envisage and build data models
c) Be on the lookout for new developments in the data modeling and database world: there is a lot of interesting action happening in this area right now!
Data Modeling is a fascinating area that combines functional knowledge with technology skills, and a good data model goes a long way toward ensuring the success of enterprise-wide BI initiatives.
Thanks for reading. Please do share your views / thoughts.
Posted by Karthikeyan Sankaran at 5:45 AM | Comments (0)
March 8, 2008
"Right" Time Data Integration - How "Real" can it get?
Data Integration, in the BI sense, is all about extracting data from multiple source systems, transforming it using business rules and loading it into data repositories built to facilitate analysis, reporting, mining etc.
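The extract-transform-load cycle described above can be sketched in a few functions. This is a minimal illustration assuming two hypothetical source systems (crm and erp), each exposed as an in-memory SQLite database with an orders table, and two made-up business rules (reject non-positive amounts; normalize customer names and country codes). It is not how Informatica or any particular ETL tool works internally.

```python
import sqlite3
from typing import Iterable, List, Tuple

Order = Tuple[str, str, float]  # (customer, country, amount)


def extract(sources: Iterable[sqlite3.Connection]) -> List[Order]:
    """Extract: pull raw order rows from each (hypothetical) source system."""
    rows: List[Order] = []
    for src in sources:
        rows.extend(src.execute("SELECT customer, country, amount FROM orders").fetchall())
    return rows


def transform(rows: List[Order]) -> List[Order]:
    """Transform: apply illustrative business rules (assumed, not from any real system)."""
    cleaned: List[Order] = []
    for customer, country, amount in rows:
        if amount <= 0:
            continue  # rule 1: drop refunds / invalid amounts
        # rule 2: standardize names and country codes
        cleaned.append((customer.strip().title(), country.strip().upper(), round(amount, 2)))
    return cleaned


def load(target: sqlite3.Connection, rows: List[Order]) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    target.execute("CREATE TABLE IF NOT EXISTS fact_orders (customer TEXT, country TEXT, amount REAL)")
    target.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    target.commit()


def make_source(rows: List[Order]) -> sqlite3.Connection:
    """Helper that fakes a source system holding an 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    crm = make_source([(" alice ", "us", 120.0), ("bob", "in", -5.0)])
    erp = make_source([("carol", " uk", 80.5)])
    dw = sqlite3.connect(":memory:")
    load(dw, transform(extract([crm, erp])))
    print(dw.execute("SELECT * FROM fact_orders").fetchall())
```

Run as a nightly job, this is the batch pattern discussed below; the "right time" question is essentially how often this cycle can, and needs to, run.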
Given that the raw data has to be converted into a different form (subject-oriented rather than process-oriented) that is more amenable to analysis and decision-making, there are two basic questions to be answered:
1) From a business standpoint, how fast should the "data-information" conversion happen?
2) From a technology standpoint, how fast can the "data-information" conversion happen?
The first question relates to the concept of "Right-Time" BI, while the second deals with "Real-Time" data integration. You can get a feel for this topic at the link below: http://www.tdwi.org/research/display.aspx?ID=7095
Traditionally, since BI was used mostly for strategic decision-making, we were happy with batch-mode data integration running once a day or even less frequently. Increasingly, however, the business demands that the data-to-information conversion happen much faster, and that technology support it.
Since the business's answer to the first question above is fast becoming "as fast as possible", the focus has shifted to the technology side. Some solutions to the problem are highlighted below:
1) Enterprise Information Integration (EII): the paradigm here is to "leave the transaction data where it resides". Business Intelligence reporting, query and analytical tools fetch data from the OLTP systems through a semantic layer that defines the required analytical relationships. This is probably as real-time as you can get!
2) Active Data Warehousing: the best-known proponent of this approach is Teradata. This is the concept of "BI on the fly". By intelligently combining hardware and software power, platforms such as Teradata and other DW appliances can deliver analytical output from transactional data with terrific performance.
3) BI with EAI Architecture: in the traditional approach to DW construction, where multiple sources are integrated through ETL tools, one area where I foresee a lot of activity is the close interaction of EAI tools such as IBM WebSphere MQ and TIBCO with data integration tools such as Informatica. At this point in time, though the technology is available, there aren't many places where messaging is embedded into the BI architectural landscape.
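The EII idea in item 1 ("leave the transaction data where it resides") can be sketched as a semantic layer that maps business-friendly metric names to SQL run directly against the operational tables at query time. The metric names, tables and sample data below are hypothetical, and a real EII product would federate across many heterogeneous sources rather than a single SQLite database; the sketch only shows why results reflect the latest transactions without any copy into a warehouse.

```python
import sqlite3

# Hypothetical semantic layer: business metric names -> SQL on the live OLTP tables.
SEMANTIC_LAYER = {
    "revenue_by_region": (
        "SELECT c.region, SUM(o.amount) "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ),
    "open_order_count": "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'",
}


def query_metric(oltp: sqlite3.Connection, metric: str):
    """Resolve a business metric through the semantic layer and run it on live data."""
    try:
        sql = SEMANTIC_LAYER[metric]
    except KeyError:
        raise ValueError(f"unknown metric: {metric}") from None
    return oltp.execute(sql).fetchall()


def make_oltp() -> sqlite3.Connection:
    """Fake operational database with made-up customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             amount REAL, status TEXT);
        INSERT INTO customers VALUES (1, 'East'), (2, 'West');
        INSERT INTO orders VALUES (1, 1, 50.0, 'OPEN'), (2, 2, 20.0, 'SHIPPED'),
                                  (3, 1, 25.0, 'OPEN');
    """)
    return conn


if __name__ == "__main__":
    oltp = make_oltp()
    print(query_metric(oltp, "revenue_by_region"))
    # A new transaction is visible on the very next query; nothing is reloaded.
    oltp.execute("INSERT INTO orders VALUES (4, 2, 10.0, 'OPEN')")
    print(query_metric(oltp, "revenue_by_region"))
```

The trade-off is that every analytical query now runs against the transactional system, which is why EII is usually weighed against the load it places on OLTP performance.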
The bottom line is that significant value is gained by ensuring that the BI infrastructure transforms raw business data into information as fast as possible, within the limits prescribed by business imperatives. The best explanation I have come across of the value of information latency is the article by Richard Hackathorn (http://www.tdan.com/view-articles/5132).
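Item 3 in the list above, embedding messaging in the BI architecture, can be sketched as a trickle feed: source systems publish each business event as a message, and the integration layer drains the queue in small batches into the warehouse. In this sketch Python's queue.Queue is only an in-process stand-in for an EAI bus such as IBM WebSphere MQ or TIBCO (none of their actual APIs are used), and the message fields (order_id, region, amount) are hypothetical.

```python
import json
import queue
import sqlite3


def publish(bus: queue.Queue, event: dict) -> None:
    """Source-system side: emit each business event as a JSON message."""
    bus.put(json.dumps(event))


def drain_to_warehouse(bus: queue.Queue, dw: sqlite3.Connection, max_batch: int = 100) -> int:
    """Integration side: consume up to max_batch messages, transform and load them.

    Returns the number of messages loaded in this micro-batch.
    """
    batch = []
    while len(batch) < max_batch:
        try:
            msg = bus.get_nowait()
        except queue.Empty:
            break  # nothing more has arrived yet
        event = json.loads(msg)
        # Illustrative transformation: upper-case region, coerce amount to float.
        batch.append((event["order_id"], event["region"].upper(), float(event["amount"])))
    if batch:
        dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", batch)
        dw.commit()
    return len(batch)


def make_warehouse() -> sqlite3.Connection:
    """Hypothetical warehouse with a single fact table."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE fact_orders (order_id INTEGER, region TEXT, amount REAL)")
    return dw


if __name__ == "__main__":
    bus: queue.Queue = queue.Queue()
    dw = make_warehouse()
    publish(bus, {"order_id": 1, "region": "east", "amount": "50.0"})
    publish(bus, {"order_id": 2, "region": "west", "amount": 20})
    print(drain_to_warehouse(bus, dw))
```

Because each drain call loads whatever has arrived since the previous one, data latency is governed by how often the consumer runs rather than by a nightly batch window.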
Thanks for reading. Please do share your thoughts.
Posted by Karthikeyan Sankaran at 4:45 AM | Comments (1)
