BeyeBLOGS | BeyeBLOGS Home | Get Your Own Blog

December 11, 2009

Columnar DBMS and DW-BI-Analytics

Is any customer happy with the query performance of current BI Applications?
Does any DBA tunes the DB enough to meet Reporting performance?
As per my experience, final Answer is a BIG NO ( Not referring to cubes)...................Tough to accept it, but it is the reality of today's DW BI world!
Let us discuss about the need of Column oriented RDBMS and how does it fit for DW BI.
Firstly, Current Row-oriented RDBMS designed for OLTP and used for DW BI as well
> Effectively handles the transaction with fewer rows of insert/update/deletes but In DW it requires fewer columns of data and vast number of rows in terms of aggregations to address business queries.
> it must read entire row to access the column data hence to address the business intelligence queries it creates huge I/O burdens.
> it is designed to handle numerous numbers of transactions that affects fewer rows in turn to address BI needs it further burdens the I/O sub systems in terms of Joins and huge number of rows. In addition DBA's create indexes, materialized views pre aggregated data and cubes to address the different needs that increases processing time and consumes more storage.
In DW each transaction involves more numbers of rows (in millions) but fewer columns. Hence we need the RDBMS that is designed for this purpose.
I believe that the answer is column-oriented RDBMS, how does it help in improving fetching performance?
> Data is organized and stored as columns
> it can fetch only the columns required for a query and instead of requiring separate indexes for querying the data, the data values in each column themselves forms an indexes reduces I/O, rapid access to data and simultaneously dramatically improves the query performance
> Simple column based approach seamlessly designed for Business intelligence needs.
> Better performance is automatically achievable through aggregation functions and the intelligent query engine at each column level instead of using aggregation tables and materialized views/views for this purpose.
> Size of the overall DW database comes down dramatically by eliminating column value duplication, mapping the description values to integers ( e.g. 123 - ABC corp. AB)
> Query performance increases dramatically because
> it fetches only the required columns ( it does not require to read whole row)
> uses fewer indexes ( no special indexes required out here) and hence reduce in size and maintenance
How does it helps in Loading (ETL)
> Loads as columns, eliminates repeated storage
> uses less/no indexes hence there is no need of handling indexes before loading it.
> Partitioning and parallelism helps to reduce the load time window
How does it address the other requirements of DW?
> current DW size heading towards thousands of terabytes in few years in Row-oriented DBMS, overall data size will be reduced at rest and on fly as well using the Column-oriented DBMS, In turn less network traffic and lesser I/O.
> Backup/ recovery - becomes comparatively simple due to lesser size of DB and parallelism
> Archive/Purge - Again the size of the database dramatically reduced and easily handled when compare to current row based RDBMS
It sounds great for DW, emerges as the best alternative against the generic RDBMS in coming years.
Definitely I believe that the column based RDBMS is the answer for all performance related issues in DW, Great relief for end users. There are few questions need to be addressed before it becomes selling point
1. Do the vendors ready to support EDW?
2. Are the vendors matured enough to address advance requirements?
3. Do vendors support all data types and advance functions required to address complex calculations?
4. do they have any case study or real-time implementation to support the benefits of column oriented RDBMS?
6. How do we address the challenges related to move of existing Data warehouse and all the referring BI applications?
7. what is the impact in the business organization ( as most of the big organization have done considerable work and investment in DW BI).
8. Still people could argue with support of well matured row based RDBMS as column based RDBMS is yet to mature...

Posted by Jay Chinnaswamy at 6:00 PM | Comments (3)