« Hedge Fund Problem -2 | Main | Bad dimension choice complicates measures and trend analysis »
August 20, 2008
Microsoft BI and CubeFilesystem
This is my real experience analyzing several Microsoft BI implementations – especially in Financial Institutions.
Programmers have been used to using file system as a place where they store and retrieve data in raw form, and they do significant logic in the code in (c , Java etc) on top of it to get the required functionality. When given a new toy, MDX they used it exactly the same way – as a filesystem where they can build the logic on; instead of business-dimensional space where they should build the logic in! It is the structure of the dimensional space, and in its various measurements, and in definition of scales, in logic that is embedded in moving measures from scale to scale and in defining measures when all you have is data points that they should be concentrating on!
Not that its wrong. This approach though inefficient is a perfectly accurate way of accomplishing the task. But why do you need to write a program in MDX. You may as well use Java, and aggregate data structures landed to disk if that is how you want to proceed.
Update: Aug 21
The last few blogs have led to some emails from friends asking me if what they are doing isn't apt use of the tool what then is proper use of the tool. I refered them to by previous blogs but promissed them to summarize them here
0) Define the space you are going to work in ie next 123
1) Identify dimensions as entities holding business variables. Is the problem I am looking to measure / model contain this variable as an independent parameter - not is this artifact of data representation. So lookups, data processing artifacts of what ever kind are not dimensions. Next question is are all these variables independent or can they be grouped as abstract entities thereby decreasing my dimensionality - keep in mind world is not 100 dimensional
2) Ensure your dimensions are orthogonal - ie there are no relationships between dimensions. Only relationships allowed are hierarchies within dimensions, and relationships between attributes (e.g. code, desc)
3) Further abstract your dimensions at entity level to reduce dimensionality. It is OK if some attributes are empty. You are not trying to reduce space here.
4) Now you have defined an n dimensional space with multiple scales for the problem. Place the data points in the space.
5) Define measures that represent the observed data at the dimensional hierarchy level of data points.
6) Define the relationship between measures at higher levels of hierarchy and lower / observed levels of hierarchy. These could be simple addition of values in each of the dimensional cells or much more complex formulae.
Now you have defined and placed the data properly in MS BI Cubes. Remember you still need to verify results against data points since error caluculation and propagation is not part of above scheme. You may incorporate it into above but it may be non trivial.
You use MDX to model the data on top of the measures. ie when you need to calulate Mx**2 y Z**3. Not for finding x, y and z because you have defined your dimensions wrong and need to correct by making coordinate transformations on the fly....or in programming speak you have business logic not incorporated into data structures.
Enjoy.
Posted by Vijay at August 20, 2008 11:45 PM
