January 9, 2009

Macro Environmental Business Intelligence: web mining, data mining, and text mining of external data sources with Oracle. Part I.

In my opinion, one of the trends for Business Intelligence in 2009 (and the years to come) will be the integration of externally available data (data not found within the organisation itself, e.g. data in magazines, on the web, in libraries etc.) into the data warehouse and into an organisation’s business processes. Using BI to monitor the external environment in which an organisation operates will grow in importance for decision making.

"Decision makers [...] need information about what is going on outside the organization as well as inside.[...] Macroenvironmental analysis [...] examines the economic, political, social, and technological events that influence an industry".
From: Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales p.4.

However, this is not fully understood by the wider Business Intelligence community, as the following quote from an article on BI in one of the local business weeklies here in Dublin shows:

"BI tools are fundamentally about using data which an organisation already has - whether in databases, CRM systems, financial and accounting packages, ERP systems or elsewhere".

This perspective is too narrow. While it is fundamental to use BI to mine and analyse data that an organisation owns, it is just as important to integrate data from external sources such as the web to optimize the internal decision-making process. Organisations that understand this requirement will have the edge over their competitors. To make informed decisions, executives need to be able to look at intra-organisational events as well as the competitive environment.

"Strategic management is the art and science of directing companies in light of events both inside and outside the organization. In addition to understanding their own operations, managers must understand the rest of the industry. For example, should a company try to be a low-cost producer or a best-cost producer? How can a company differentiate its product line? Should the focus be on the entire market or on a niche? Without understanding what others are doing, making decisions about these types of issues leads to unexpected results."
From: Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales.

Web mining, data mining and text mining techniques will be of fundamental importance for implementing this new breed of BI.

In this series we will have a look at all three areas. In today's article I will show you how we can implement web mining techniques with Oracle. In part two of this series we will then look at how we can use data mining techniques in general, and survival analysis in particular, to analyse macro environmental data from the web. Finally, in the third part we will look at how we can use text mining to classify and cluster the extracted data.

So, what we will do today is harvest macro environmental business intelligence from real estate data. I thought it might be interesting to look at property related data because of the recent bursting of the property bubble. The site we will extract data from is property.ie.

The information we harvest can be used to (amongst other things):

- Identify areas where houses sell the quickest (have a short survival time, i.e. spend little time on the market).
- Identify features of houses that sell the quickest.
- Find properties that are near other properties.
- Create a taxonomy/classification to browse properties by features.
- Monitor price increases or decreases.
- Use a combination of all of the above.
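To make the first and fifth items a little more concrete, here is a minimal Python sketch of what could be computed once listings have been harvested. It is not the Oracle implementation from the case study. The record layout, the function names and the sample rows are all invented for illustration, and it assumes that a listing disappearing from the site means the property was sold, which is a simplification.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Invented sample records, not real property.ie data:
# (listing_id, area, first_seen, last_seen, first_price, last_price)
LISTINGS = [
    ("a1", "Dublin 4", date(2008, 10, 1), date(2008, 10, 31), 500000, 480000),
    ("a2", "Dublin 4", date(2008, 10, 5), date(2008, 12, 4), 650000, 600000),
    ("b1", "Dublin 15", date(2008, 9, 1), date(2008, 12, 30), 300000, 270000),
    ("b2", "Dublin 15", date(2008, 11, 1), date(2008, 12, 1), 320000, 320000),
]


def days_on_market_by_area(listings):
    """Median number of days between first and last sighting, per area.

    Assumption: last_seen marks the sale. Withdrawn listings are not
    distinguished from sold ones in this sketch.
    """
    durations = defaultdict(list)
    for _id, area, first_seen, last_seen, _p0, _p1 in listings:
        durations[area].append((last_seen - first_seen).days)
    return {area: median(days) for area, days in durations.items()}


def price_change_pct(listings):
    """Percentage change between first and last observed asking price."""
    return {
        listing_id: round(100.0 * (p1 - p0) / p0, 1)
        for listing_id, _area, _first, _last, p0, p1 in listings
    }


if __name__ == "__main__":
    print(days_on_market_by_area(LISTINGS))
    print(price_change_pct(LISTINGS))
```

A plain median ignores listings that are still on the market when the data is collected. Handling those properly is what the survival analysis in part two of the series is for.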

You can find the rest of the article and the Oracle case study at BI Quotient.

Posted by Uli Bethke at 12:00 PM