DATA-Beat A DATAllegro Blog on Data Warehouse Appliances

Site Map
DATAllegro Data Warehouse Appliances (877) 499-3282
From Terabytes to Petabytes

Blog Home

May 2008 Archives

May 1, 2008

Columns & Rows

Over the last couple of years, there's been a lot of talk about the advantages of column-oriented databases for data warehousing, with Michael Stonebraker from Vertica being particularly vociferous and bold in his claims that row-based databases are going to be completely replaced. As a vendor of a row-based database, I obviously have a vested interest in refuting his claims, but I'm going to try my best to be even-handed in discussing this issue.

The Claims
If you were new to database technology and read some of Stonebraker's articles, you might be forgiven for thinking that column-oriented databases were a completely new invention and were set to sweep row-oriented databases from the data warehousing market.

He claims that column-oriented databases are 10-50x faster than traditional row-oriented systems and offer significantly higher compression ratios, thereby bringing down the cost. Benchmarks against Oracle are usually put forward to back up these claims.

The Reality
The fact is that column-oriented databases have been around for some time. In the data warehousing market, long-established (but not very successful) examples include Sybase IQ and Sand.

There are some advantages of column-orientation for DW workloads. For example, data compresses slightly better when stored in columns (DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1). Also, some queries (i.e. those that only access a few columns) will perform better.

However, in most real-world implementations, these advantages don't make a great deal of difference.

At the end of the day, column orientation is just one approach to limiting the amount of data read for a given query. In effect, it's an extreme form of vertical partitioning of the data. In modern row-oriented systems such as DATAllegro, we use sophisticated horizontal partitioning to limit the number of rows read for each query. We're also working on clever usage of materialized view technology to limit the number of columns we need to read. The end result is very similar performance to that claimed by Stonebraker i.e. 10 to 50x that of traditional databases such as Oracle.



Whether you have a few terabytes of user data or hundreds, DATAllegro’s data warehouse appliances deliver a fast, flexible and affordable solution that allows a company’s data to grow
at the pace of its business. Based in Aliso Viejo, California, DATAllegro has offices throughout
the US as well as in Europe.

Learn More:
Data Warehouse Appliances | Data Warehousing Solutions | Contact Us: (949) 680-3000
______________________________________________________________________________________________
© 2008 DATAllegro Inc. All rights reserved. DATAllegro is a trademark of DATAllegro, Inc. All other companies are the trademark or registered trademark of their respective owners.