
July 27, 2008

Maximizing Data Warehouse ROI - Keep the Most Detailed Data

This is a common-sense approach to getting the maximum out of your Data Warehouse. As you invest in your Data Warehouse environment, you can enhance your ROI by using it for diverse applications. Having the most granular (or detailed) transaction level data is core to broad-basing the Data Warehouse applications.

The traditional use of the Data Warehouse environment purely for back-room analytics no longer holds. It can now serve as a single reference source for any of your BI-related information needs. Some examples are:

  1. Summary Analytics
  2. Enterprise Reporting (needs transaction level data)
  3. Performance Management
  4. Data Mining (mostly needs transaction level data)
  5. Business Modeling (sophisticated models need transaction level data)
  6. Operational BI like single customer view for telemarketers, customer support (needs transaction level data)
  7. Ad-hoc queries by operational staff (needs transaction level data)
  8. Root Cause investigation for issues, where you need to drill down to problem areas (needs transaction level data)
  9. Business Applications with Embedded BI modules (may need transaction data)

There are reasons for using Data Warehouse as a single reference information source. This helps you to:

  1. Maintain consistency: If your summary MIS/analysis and enterprise reports come from different sources, you will struggle to keep the numbers in sync.
  2. Single point of data-fix: If your production data needs an offline fix (like standardizing customer and product IDs), it's better to do that data-fix in one place. If you have separate enterprise reporting and analysis platforms, you will need to do that data transformation in two places instead of one.
  3. Data Auditability: A single information reference point holding detailed data will provide a good audit-trail of your summary transactions/analysis.
  4. ETL synergy: If you have diverse systems and want some level of information integration, it's better to do it in one place. Doing ETL for a summary data warehouse and a detailed reporting database will almost double your efforts.
  5. Overall platform ease: You maintain only one information infrastructure (administration, scheduling, publishing, performance tuning...).
  6. Ease of Change Management: Any change in your information requirements, or changes in your source systems will be managed and done at one place.
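The consistency argument above can be sketched in code: if every summary is derived on demand from one granular store, the numbers cannot drift apart. A minimal Python illustration (the transaction records and field names are hypothetical):

```python
from collections import defaultdict

# Hypothetical granular transactions: one row per sale.
transactions = [
    {"customer": "C1", "product": "P1", "amount": 100.0},
    {"customer": "C1", "product": "P2", "amount": 250.0},
    {"customer": "C2", "product": "P1", "amount": 75.0},
]

def summarize(rows, key):
    """Derive a summary on demand from the single granular store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

# Both the MIS summary and the enterprise report aggregate the
# same detail rows, so their grand totals always reconcile.
by_customer = summarize(transactions, "customer")
by_product = summarize(transactions, "product")
assert sum(by_customer.values()) == sum(by_product.values()) == 425.0
```

With separate summary and reporting databases, the same reconciliation would depend on two ETL pipelines staying in step; here it holds by construction.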

The question then is: if this approach is so good, why do many companies use the Data Warehouse only for summary data? Keeping granular data in the data warehouse has its own challenges and demands:

  1. Brings forth the real issues with transactional data: In summary data warehouses, you can ignore some transaction level data issues and do some patch-work to ensure that the aggregated data has an acceptable level of quality. Bringing in granular data will need more incisive surgery on your data issues. This will extend the implementation time.
  2. ETL efforts go up: This is related to the first point. Your key plumbing task in the DW will become larger and more complex.
  3. Existing robust and stable reporting and querying platforms: Why fix what ain't broken? etc...

My answer to the above is that while you can be flexible for functional-level data marts, you should go for granular data from Day 1 for your enterprise data warehouse. If you are in a hurry, create a quick-fix data mart on the side. For an enterprise-level data warehouse, you can start with a few high-priority business themes, but ensure that they are designed for long-term usage (i.e. granular data).

You can refer to field tips for BI and Data Warehouse in my portal, Business Intelligence and Performance Management.


Posted by Rajan Gupta at 10:30 PM | Comments (16)

July 22, 2008

Universal Data Warehouse Dimensions - Are they possible?

I have talked a lot about having universal and foundation dimensions in the data warehouse in my portal, Business Intelligence and Performance Management Institute. In brief, a Data Warehouse typically uses a dimensional model, which is different from the relational/OLTP model. In a dimensional model, dimensions and their attributes sit in their own tables, linking through a central 'Fact Table' which carries all the numbers (or measures).

Now, the idea is that one should have one standard dimension (and set of attributes) for each business entity. Let's say there are 20 star-schemas (or cubes in OLAP lingo) in a data warehouse, and ten of them use a customer dimension. The concept of a foundation dimension is that all ten cubes should use an identical customer dimension. This helps in doing analytics across multiple cubes and significantly reduces change and development effort.

One reality check is that for a large organization, it is extremely difficult to have a single customer dimension, because it may operate in very diverse markets or products. Getting all the business heads to agree to a single dimension may take years. Many of my clients have asked whether these 'standard' dimensions are practically possible.

My answer to them is that yes, it's possible, if we apply the following techniques:

Create Super-Sets: Take the convenient path of creating a 'super-set' dimension which can absorb diverse entities. For example, create a super-set customer dimension which can absorb retail, corporate and group customers.

Create different foundation dimensions: You do not need to create a single customer dimension for the whole organization. You can create a 'retail customer' dimension, a 'corporate customer' dimension, etc. The two dimensions will be different, but each will be universal within itself. The assumption is that you will not be required to cross-navigate between retail and corporate customers.

Start with high-priority dimensions: Instead of trying to make universal dimensions for all entities, identify the top 5-6 which are most critical. The Pareto principle applies equally here. The critical dimensions can be customer, vendor, product, location, sales channel, etc.
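The idea of a shared foundation dimension can be sketched in code: two fact tables carry only a customer key, and both resolve attributes through one dimension table, so cross-cube analysis by any attribute stays consistent. A minimal Python sketch (all table contents and field names are hypothetical):

```python
# One foundation customer dimension shared by every star-schema.
customer_dim = {
    "C1": {"name": "Acme Ltd", "segment": "Corporate"},
    "C2": {"name": "J. Smith", "segment": "Retail"},
}

# Two fact tables (two 'cubes') referencing the same dimension key.
sales_fact = [{"customer_key": "C1", "revenue": 500.0}]
support_fact = [{"customer_key": "C1", "tickets": 3}]

def enrich(fact_rows, dim):
    """Join fact rows to the shared dimension's attributes."""
    return [{**row, **dim[row["customer_key"]]} for row in fact_rows]

# Because both cubes use the identical dimension, analysis by
# 'segment' gives the same answer regardless of which cube you start from.
sales = enrich(sales_fact, customer_dim)
support = enrich(support_fact, customer_dim)
assert sales[0]["segment"] == support[0]["segment"] == "Corporate"
```

If each cube kept its own copy of the customer attributes, a change (say, a segment re-classification) would need updating in ten places instead of one.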

Posted by Rajan Gupta at 1:15 AM | Comments (0)

July 20, 2008

Data Warehouse vs. BI

Data Warehouse and BI are often considered synonymous. The reality is that the Data Warehouse is just one component of BI. Many more components need to be in place for an actionable BI platform from an IT perspective. These elements are:

OLAP server- This is a multi-dimensional database which provides extensive analytical capabilities. It sits between the data warehouse and the end-user tools. It picks data from the Data Warehouse, summarizes it and stores it in its multi-dimensional OLAP database, which is designed to support analytics and other BI functions. OLAP has a wide range of pre-built analytical functions, which can be used by the users or applications accessing it.

Enterprise Reporting- This set of tools provides enterprise-level (mostly scheduled) reporting. These tools take their data from OLAP or directly from the Data Warehouse. OLAP typically holds summary data; when detailed transaction level data is needed, it has to come from the Data Warehouse. In the past, BI was typically used only for analysis. However, as the Data Warehouse and OLAP combination expands its use, enterprise reporting tools have started using it as their source.

Query and Analytics Tools- These tools enable you to do a wide range of analysis. They typically deal with summary data (you would not look for individual transactions in your analytics). Many BI platforms enable you to drill down to the transaction level if you need to investigate the details. This means that in the back-end, you move from the OLAP (summary) database to the detailed data warehouse database.
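That drill-down behaviour can be sketched in a few lines: the analysis tool serves summaries from a pre-aggregated (OLAP-style) structure, and only a drill-down fetches matching rows from the detailed warehouse table. A rough Python illustration with hypothetical data:

```python
# Detailed warehouse table (transaction grain).
detail = [
    {"region": "North", "order_id": 1, "amount": 120.0},
    {"region": "North", "order_id": 2, "amount": 80.0},
    {"region": "South", "order_id": 3, "amount": 60.0},
]

# Summary held at the OLAP layer (pre-aggregated by region).
summary = {"North": 200.0, "South": 60.0}

def drill_down(region):
    """Move from the OLAP summary to the warehouse detail rows."""
    return [row for row in detail if row["region"] == region]

# The analyst sees the summary first; drilling down lists the
# transactions that make up the summarized figure.
assert summary["North"] == sum(r["amount"] for r in drill_down("North"))
```

In a real platform the summary would live in the OLAP server and the detail query would hit the warehouse, but the hand-off between the two layers follows this shape.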

Performance Management Tools- This is the world of dashboards, scorecards, setting standards and goals, reporting on performance variance, etc. This breed of tools enables you to manage functional or enterprise performance and link it to analytics and enterprise reporting. Therefore, if you have a dip in performance, you can do analytics to find the root cause, and use reporting to list the transactions contributing to it.

Data Mining Tools- While the above three end-user tool sets (reporting, analytics and performance management) are core to an organization's BI capability, Data Mining is the next level of sophistication. These are knowledge discovery tools, which generate patterns, trends and correlations from the data.

Business Modeling tools- These tools enable you to create models (like pricing models, actuarial models, business planning models, sales projection models...).

Therefore, when your vendor states that they offer a BI solution, do check which components they are offering. While there are end-to-end BI platforms like SAS and Business Objects, many vendors provide competitive individual components.

For details around end-to-end BI, you can refer to my portal, Business Intelligence and Performance Management Institute.

Posted by Rajan Gupta at 9:45 PM | Comments (5)

July 15, 2008

Data Warehouse Infrastructure- Consider all layers

Estimating the Data Warehouse infrastructure is tricky. One can rarely take an intelligent guess at the number of users, the kinds of queries, and the kinds of usage the Data Warehouse will be put to.

The Data Warehouse is one among many components of a business intelligence platform. The other important components are:

  1. ETL tools (which extract and load data into the Data Warehouse)
  2. OLAP Server (which picks data from the Data Warehouse and loads it in an analysis-friendly multi-dimensional form)
  3. End-user tools (like enterprise reporting, analytics tools, data mining tools...), which sit over OLAP and the Data Warehouse to make use of the data stored in them. In other words, they create information out of the data.

The Data Warehouse is core, as it provides sanitized, integrated and consistent data to the end-user tools. These end-user tools generally access this data through OLAP, and go to the data warehouse only when they need transaction level data. The summary level data is generally available at the OLAP level itself.

Given this background, one has to understand that not all query processing in business intelligence happens at the Data Warehouse level. Out of 1000 users accessing the business intelligence platform, only 100 may be accessing the data warehouse. The rest may be accessing and querying the OLAP database or the enterprise reports repository.

As you decide upon your Data Warehouse infrastructure needs (including licensing), you have to consider the ETL load and the volume of data you will store. However, for infrastructure related to the number of users, query load etc., one needs to take into account the entire architecture of OLAP and end-user tools.
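A back-of-the-envelope sizing along these lines might look as follows. All figures are hypothetical and only illustrate the point that warehouse-level concurrency is a fraction of the total BI user base:

```python
# Hypothetical user population and where their queries land.
total_bi_users = 1000
share = {"reports_repository": 0.6, "olap": 0.3, "warehouse": 0.1}

users_by_layer = {layer: round(total_bi_users * s)
                  for layer, s in share.items()}

# Only the warehouse slice drives warehouse query-concurrency and
# user licensing; ETL load and data volume are sized separately.
assert users_by_layer["warehouse"] == 100
assert sum(users_by_layer.values()) == total_bi_users
```

Sizing the warehouse for all 1000 users would over-provision it; sizing the OLAP and reporting layers for zero users would under-provision them.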

More details are available in my recent field tip, Data Warehouse Infrastructure Estimate, in my portal Business Intelligence and Performance Management.

Posted by Rajan Gupta at 10:15 PM | Comments (4)

Data Warehouse can wait- Start with Data-Marts

An organization, however big or small it is, can wait for an enterprise data warehouse. It should start with a few critical Data-Marts. The reasons are:

An Enterprise Data Warehouse is a long-term commitment: There are many imperatives (or foundations) which are key for a Data Warehouse. Examples are foundation or conformed dimensions, fine-grained granular data, comprehensive star-schemas, etc. These foundations need a high level of readiness and investment to build. Though great for data marts as well, they can be compromised for the initial set of Data-Marts.

Business Learning- The initial set of data-marts will provide great learning, less on the IT side and more on the business side. Here are some learnings from the business side:

  • Creating business themes
  • Building Data-Mart Business Requirements
  • Building Dimensional Model
  • Testing of Data-Mart
  • Taking business decisions around the extraction and transformation
  • Generating the information out of the Data-Mart through end-user tools (like reporting and analytics application)

Examples of IT Learnings:

  • Extraction, Transformation and Loading design
  • Processing Load Management
  • Handling Data Explosion (data goes up exponentially as you add sparse fields- where most of the records are blank)
  • Change Management (end-to-end impact analysis if you make a change in the Data Mart Model)

Show-case for sponsors: A successful Data-Mart makes sponsorship of a Data Warehouse much easier.

Quick-hit: A Data-mart is a quick hit and gives earlier gratification.

Non-Disruptive: It does not take away the attention of an organization from other big things.

You can refer my portal Business Intelligence and Performance Management for more details.

Posted by Rajan Gupta at 3:15 PM | Comments (30)

July 11, 2008

IBM Data Governance Prediction- My Anxiety

I came across a news item talking about IBM's top 5 predictions for Data Governance. These predictions come out of the IBM Data Governance Council, comprising 50 top-notch companies from a wide range of sectors.

While the predictions are interesting and point to a brighter future for data governance, this one raised some level of dissonance. As a general disclaimer, I beg pardon if I have mis-interpreted the text of this prediction. These are only initial thoughts, and I do plan to dig out more. Please cascade this blog post to your network, as it can generate some healthy discussion.

The text of that prediction is:

'The role of the Chief Information Officer (CIO) will change making this corporate officer responsible for reporting on data quality and risk to the Board of Directors. The CIO will have the mandate to govern the use of information and report on the quality of the information provided to shareholders.'

My Dissonance- I don't agree that the CIO role should take ownership of reporting on information quality as well as governing the use of information. It may lead to role conflict and de-focus the CIO role. These are the questions I will seek clarification on to address my dissonance:

  1. The CIO's core role is to be a strategic internal service partner to the business and operations, to 'make it happen'. Should we be mixing the role of a service provider with a quasi-governance and oversight role?
  2. Will we not create confusion among the roles of Audit, CFO, Internal Control and Data Steward (if you have one)? I feel that there are enough roles to oversee and audit. The issue today is more about owning and delivering on the tasks related to data governance.
  3. A CIO cannot have the mandate to report on the quality of the information provided to shareholders. This may set him/her up for failure. This has to be a business role, which encompasses not only IT but also the business processes, manual controls, and compliance and regulatory checks outside of systems. Aren't we conflating this role with that of a CFO and CEO? The CIO may be responsible for certifying that the data lying in the systems is consistent. However, how can the CIO take ownership of governing the manual adjustment figures entered by finance at period-end processing?
  4. If something is wrong with the information provided to shareholders, where will the buck stop? Will it be the CIO or the CFO?
  5. A CIO cannot have the mandate to govern the use of information. The use of information is defined by the user access matrix and the distribution lists of the various reports and outputs made by the system. This access matrix has to be defined by the Business and Internal Control. How can a CIO decide which groups and which functions can use that information? Secondly, not all information is in the systems; a lot of it is manual.
  6. From my point of view, data governance and quality need more business ownership at all levels, as Data Governance goes much beyond system boundaries. Much of the data issues are due either to faulty data entered in the systems or to a lack of robust business specifications for IT systems. Doesn't the prediction seem to be recommending a move in the other direction?

Looking forward to your comments. This post should be taken as an invitation for comments and discussion from readers. I will add more to this post. You may also refer to my portal; our main theme has been that data quality and data governance are much more a business issue than an IT issue.

Posted by Rajan Gupta at 9:30 PM | Comments (0)

Business-owned applications are a reality- Manage them

A real-life medium to large organization will have hundreds (if not thousands) of small to medium-sized 'applications' which are owned by the business and are not on IT's radar. The key reason is that IT is not able (rightly so) to meet all business demands within its time and money constraints. Therefore, working units in the business create their own applications, which may range from Excel-based tools to full-fledged IT platforms. Many a time, these business units have their own 'captive' IT units.

Many of these systems, over time, grow, spread and become an important link within the business processes. While critical, they don't have the level of robustness and reliability inherent in IT-owned systems. This generates financial, operational and compliance risk.

These applications also become an important part of your data quality and BI agenda, because they carry important and business-critical information. In my experience, a fair proportion of the effort on any enterprise-level data quality or BI initiative goes into mapping, extracting and transforming the data from these sets of apps.

The response of an organization may range from 'fight' to 'flight'. My recommendation is to accept the reality, formalize it and manage it. The informal business applications are here to stay, and you cannot take away the reasons which led to their existence. Here are the steps one can follow:

  • Step 1- First of all, one needs sponsorship from the owning business functions to open up their world and let in the teams working on the data mapping or data quality program.
  • Step 2- Create a quick inventory list of all informal applications, and do a first-level prioritization.
  • Step 3- Do a more detailed analysis of the inventory list using a standard questionnaire. Some of the questions would be:
    • Will key business processes come to a stand-still if the application does not work for one day, one week, or one month?
    • Does this application store or process financial data?
    • Does this application store or process data covered by privacy laws, like credit card numbers or personal contact details?
    • Does this application have disaster recovery in place?
  • Step 4- Short-list the applications where you want to have the first go. Make a road-map to bring the critical applications into the IT fold.
  • Step 5- Issue guidelines on the management of informal applications. As part of these guidelines, you can include:
    • What can be part of the informal applications and what can't be.
    • A procedure for periodic checks on the inventory.
    • A procedure for aligning with IT principles and architecture for a given class of applications.
    • Sign-off from IT on controls and quality-related areas, etc.
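The questionnaire-based prioritization in the steps above lends itself to a simple scoring sketch. The questions, weights and inventory entries below are illustrative, not a standard:

```python
# Illustrative risk questions from the questionnaire, each with a
# weight reflecting its criticality.
QUESTIONS = {
    "business_stops_if_down": 3,
    "holds_financial_data": 2,
    "holds_personal_data": 2,
    "no_disaster_recovery": 1,
}

def risk_score(answers):
    """Sum the weights of the questions answered 'yes'."""
    return sum(w for q, w in QUESTIONS.items() if answers.get(q))

# Hypothetical inventory of business-owned applications.
inventory = {
    "sales_tracker.xls": {"business_stops_if_down": True,
                          "holds_financial_data": True},
    "team_roster.xls": {"no_disaster_recovery": True},
}

# Highest-risk applications surface first for the IT road-map.
ranked = sorted(inventory, key=lambda a: risk_score(inventory[a]),
                reverse=True)
assert ranked[0] == "sales_tracker.xls"
```

Even a crude score like this makes the short-listing defensible to the owning business functions, instead of relying on gut feel.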

There are multiple benefits of this approach:

  • Business and IT can work collaboratively.
  • Awareness of risk is half the battle won. Once you know the soft spots, you can work on them.
  • Your Data Quality, Data Integration and BI initiatives will be smoother and more efficient.

In other words, formalize this reality and you will be able to manage the risk much better. For more details on this subject, you may refer to Business Applications are a reality- Manage it in my portal.

Posted by Rajan Gupta at 8:15 AM | Comments (4)

July 10, 2008

BI Service Providers- Big may not be the Best

Dear Readers! I have included many links to my portal here, as I am struggling to cover a big subject in a single post.

As BI has picked up pace, many IT service providers who have grown big through the OLTP systems business have started their own 'BI practice'. Many a time, the 'OLTP' and 'ERP' DNA becomes a significant barrier to building a true-blue BI capability. Therefore, 'Big' may not be 'Best' here.

As I have mentioned in several places in my portal, BI and Data Warehouse (as a key part of BI) require a different mind-set and capability-set to manage. Some of the reasons are as follows:

  • Business requirements are fluid and constantly changing.
  • Business Requirements are difficult to articulate and capture.
  • Dimensional Model is pretty different from OLTP data modeling
  • Needs in-depth domain expertise to model and design.
  • Load Management is unpredictable.
  • Testing is vastly different
  • The DW modeling has to be extensible and flexible, even if the business does not ask for it.
  • Short attention span from stakeholders, as life can go on without BI (unlike an ERP).
  • Storage space and infrastructure needs are less predictable.
  • BI is 80% business and 20% IT (disclaimer- I am not short-changing IT, but emphasizing upon the criticality of business stake-holding)
  • Etc... Etc..

An OLTP-based vendor has to understand these unique aspects and bring a fundamental shift (as an economist would say, 'macro-economic restructuring') in skills and mind-sets.

I have had to struggle to find a service provider which can provide a mix of business, IT, process and modeling skills under one roof. The few I was able to find were too exorbitant to afford. Finally, I had to resort to a combination of two to three service providers to complete the skill-basket.

You can refer to Data Warehouse has unique challenges and Business Intelligence Vendor Evaluation to complete the picture.

Posted by Rajan Gupta at 1:15 PM | Comments (0)

July 9, 2008

Simple and Effective- Periodic Reports Rationalization

There are more simple steps an organization can take to manage its information than complex, technology-driven ones (though both are necessary). These simple steps do not need massive IT investments. They not only boost effectiveness, but also create a strong foundation for your IT-based BI platforms. Here is one such step:

On a periodic basis, rationalize your reports and views in terms of:

  • Reports not getting used
  • Duplicate or nearly duplicate reports
  • Reports having mismatching formulae

You can address the above by deactivating the unused reports, creating super-set reports, fixing formulae, etc.
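The first two rationalization checks can be sketched against a report-usage log. The catalogue entries and field names below are hypothetical; a real BI platform's repository metadata would supply the equivalents:

```python
# Hypothetical report metadata from the BI platform's catalogue.
reports = [
    {"name": "Daily Sales", "runs_last_90d": 45,
     "columns": ("region", "revenue")},
    {"name": "Sales by Region", "runs_last_90d": 2,
     "columns": ("region", "revenue")},
    {"name": "Old Inventory", "runs_last_90d": 0,
     "columns": ("sku", "stock")},
]

# Check 1: reports not getting used (candidates for deactivation).
unused = [r["name"] for r in reports if r["runs_last_90d"] == 0]

# Check 2: duplicate candidates (same column set, candidates for
# merging into a super-set report).
seen, duplicates = {}, []
for r in reports:
    if r["columns"] in seen:
        duplicates.append((seen[r["columns"]], r["name"]))
    else:
        seen[r["columns"]] = r["name"]

assert unused == ["Old Inventory"]
assert duplicates == [("Daily Sales", "Sales by Region")]
```

The third check, mismatching formulae, needs a human eye on the report definitions, but even this crude pass shrinks the list a reviewer has to walk through.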

Key points to note are:

  • It takes little time. I have seen people rationalize 200 reports in a single day, after a few months of experience.
  • You don't need perfection; 70-75% achievement is good.
  • This does not require funding, high-level sponsorship or a go-ahead. It can be driven by the CIO along with an assigned IT and business analyst.

The critical success factor is regularity. If you do it well, when you go for your BI investments, you will have much smarter (and leaner) business requirements in place. If you have not been doing this, please try it once and share your feedback.

More details on this subject are in Periodic Rationalization & Prioritization of Information has multiple benefits in my portal.


Posted by Rajan Gupta at 10:30 PM | Comments (2)

Open Source Business Intelligence- A fit, with caution

You can reduce your BI costs by gradually testing and adopting open-source BI in a select set of areas. There is a difference between open source and commercial open source. In this post, we are talking about commercial open source, which is not free of cost but of minimal cost, with an adequate support and services infrastructure. Pentaho and Jasper are examples of commercial open source.

I recommend the evaluation and use of open-source BI for non-core BI areas like presentation and visualization tools, business modeling tools, and web-based query and reporting tools (not enterprise reporting tools). In other words, open-source BI is not recommended for the core production engines of your BI environment.

While big players like Pentaho and Jasper have bagged some large contracts for core BI platforms, they will need many years of satisfactory field reports to generate wide-spread confidence.

The 'non-core' approach will give you a great cost advantage, as it saves the cost of large-scale viewer licenses. It is also a way to test the robustness and support capability of these tools and vendors. Some of the open-source end-user tools are fairly competitive in terms of features. When you acquire commercial open source, be diligent on service and support factors.

In case you need more detail on this subject, you may refer to Open Source BI- A cautious fit in your BI plans in my portal.

Posted by Rajan Gupta at 11:15 AM | Comments (0)