March 13, 2009
It's the tool, stupid! Or is it?
Every couple of months or so, they run an article on Business Intelligence in one of the Sunday business papers here in Dublin. The tenor of these articles goes like this: Just pick the right BI tool, sit back, and relax. It will all sort itself out. This reminds me of the Fast = On database switch a lot of people are looking for when they are doing performance tuning. I am always amused by the sales tone of these articles (In fairness there are also some valid points in the article, but a lot of it is just boring sales pitch). In reality, the right tool only contributes about 5-10% to a successful DW/BI project. At the end of the day, the big three BI tool vendors (Oracle/Siebel, SAP/Business Objects, IBM/Cognos) offer pretty much the same functionality. Their offerings only differ in nuances.
Don't get me wrong. Of course I appreciate the presence of Business Intelligence in the mainstream media. However, this just gets across the wrong message, sets the wrong expectations, and eventually is damaging to the BI industry.
So then, what is actually important for a successful BI project?
From a technical point of view, this is without a shadow of a doubt the Data Warehouse. You may get away without a Data Warehouse in small pilot type BI projects that involve low volumes of data and just run against one data source. Howver, once it gets just a little bit more complex the BI tools run into all sorts of issues. Think of performance, data integration, data quality, concurrency, complexity (did you ever have to deal with the spider-web like chaos of a Business Objects Universe run against an OLTP system?), no or limited historical data, non-replicable data queries, performance (did we have that before?), and performance (again). Also tell me, how are you going to report against a recursive hierarchical relationship in a transactional system? Not easy, not easy.
In the words of Ralph Kimball:
"Periodically, there's distracting noise in the industry about the feasibility of bypassing dimensional modeling and data warehouse databases to simply query and analyze operational data directly. Vendors promise magical middleware that hides the complexity of the source system so that business users can theoretically query the system [...] eliminating the costly and time consuming extract, transformation, and load processing. Though middleware may be able to mask the underlying data structures, it doesn't address the inherent problems surrounding the performance [...].You may find middleware solutions are only capable of relatively light-on-thefly data transformations, leaving your data integration requirements dangling."
Ralph Kimball, The Data Warehouse Lifecycle Toolkit, p.238.
From a business point of view, the single most important criteria for a successful BI project is to align the BI initiative with the overall business strategy. BI has to support the core business processes, resulting either in cost reductions or increased profits.
"For any given company in any given industry, we should systematically evaluate its industry, strategy, and business design as a means of identifying potential BI opportunities".
Nancy & Steve Williams, The profict impact of Business Intelligence.
Of course, there are a lot of other factors that determine the success of a BI project (skill sets, implementation methodology, training etc.). Get the above two wrong though, and you are out.
February 2, 2009
One pass SCD2 load: How to load a Slowly Changing Dimension Type 2 with one SQL Merge statement in Oracle
The other day I came across a useful new feature in the Merge statement for SQL Server 2008. This new feature outputs merged rows for further processing, something which up until now (Oracle 188.8.131.52) is not possible in Oracle. This extra functionality can be used to load a Slowly Changing Dimension Type 2 in one SQL statement. As a result you have only one pass over the data, less logical I/O, and as a result improved performance. If you are interested how exactly the Merge statement can be used in SQL Server 2008 to load an SCD2 have a look at this article. Recently, this has also featured in Kimball's design tip 107. The whole thing was wrecking my head a bit: how can we achieve the same thing in Oracle? Finally, I came up with a solution. I have to admit it is not perfect, as this will only work if you can identify a changed record at source as either an Insert or an Update. So you need to know if the record is a new one or was just modified. In the vast majority of cases you should be able to have this information available at extract time. So if you extract your information from source via change data capture, timestamp created/timestamp modified, audit records , or Oracle Total Recall you should be good to go.
The full test case can be found at my Business Intelligence Quotient blog
January 9, 2009
Macro Environmental Business Intelligence: web mining, data mining, and text mining of external data sources with Oracle. Part I.
In my opinion, one of the trends for Business Intelligence in 2009 (and the years to come) will be the integration of externally available data (data not found within the organisation itself, e.g. data in magazines, the web, libraries etc.) into the data warehouse and into an organisationís business processes. Using BI to monitor the external environment that an organisation operates in, will grow in importance for decision making.
"Decision makers [...] need information about what is going on outside the organization as well as inside.[...] Macroenvironmental analysis [...] examines the economic, political, social, and technological events that influence an industry".
From: Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales p.4.
However, this is not fully understood by the wider Business Intelligence community, as can be seen from the quote below. (This is a quote from an article on BI in one of the local business weeklies here in Dublin):
"BI tools are fundamentally about using data which an organisation already has - whether in databases, CRM systems, financial and accounting packages, ERP systems or elsewhere".
This perspective is too narrow. While it is fundamental to use BI to mine and analyse data that an organisation owns, it is as important to integrate data from external sources such as the web to optimize the internal decision-making process. Organisations that understand this requirement will have the edge over their competitors. For executives to make informed decisions they need to be able to look at intra-organisational events as well as the competitive environment.
"Strategic management is the art and science of directing companies in light of events both inside and outside the organization. In addition to understanding their own operations, managers must understand the rest of the industry. For example, should a company try to be a low-cost producer or a best-cost producer? How can a company differentiate its product line? Should the focus be on the entire market or on a niche? Without understanding what others are doing, making decisions about these types of issues leads to unexpected results."
From: Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales.
Web mining, data mining and text mining techniques will be of fundamental importance to implement this new breed of BI.
In this series we will have a look at all three areas. In today's article I will show you, how we can implement web mining techniques with Oracle. In part two of this series we will then look at how we can use data mining techniques in general and survival analysis in particular to analyse macro environmental data from the web. Finally, in the third part we will look at how we can use text mining to classify and cluster the extracted data.
So, what we will do today, is harvest macro environmental business intelligence of real estate data. I thought it might be interesting to look at property related data because of the recent bursting of the property bubble. The site we will extract data from is property.ie.
The information we harvest can be used to (amongst other things)
- Identify areas where houses sell the quickest (have a short survival rate).
- Identify features of houses that sell the quickest.
- Find properties that are near other properties
- Create a taxonomy/classification to browse properties by features
- Monitor price increases or decreases.
- Use a combination of all of the above.
You can find the rest of the article and the Oracle case study at BI Quotient.