BeyeBLOGS | BeyeBLOGS Home | Get Your Own Blog

« Who I am and why I'm here. | Main | Netezza »

May 1, 2008

Columns & Rows

Over the last couple of years, there's been a lot of talk about the advantages of column-oriented databases for data warehousing, with Michael Stonebraker from Vertica being particularly vociferous and bold in his claims that row-based databases are going to be completely replaced. As a vendor of a row-based database, I obviously have a vested interest in refuting his claims, but I'm going to try my best to be even-handed in discussing this issue.

The Claims
If you were new to database technology and read some of Stonebraker's articles, you might be forgiven for thinking that column-oriented databases were a completely new invention and were set to sweep row-oriented databases from the data warehousing market.

He claims that column-oriented databases are 10-50x faster than traditional row-oriented systems and offer significantly higher compression ratios, thereby bringing down the cost. Benchmarks against Oracle are usually put forward to back up these claims.

The Reality
The fact is that column-oriented databases have been around for some time. In the data warehousing market, long-established (but not very successful) examples include Sybase IQ and Sand.

There are some advantages of column-orientation for DW workloads. For example, data compresses slightly better when stored in columns (DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1). Also, some queries (i.e. those that only access a few columns) will perform better.

However, in most real-world implementations, these advantages don't make a great deal of difference.

At the end of the day, column orientation is just one approach to limiting the amount of data read for a given query. In effect, it's an extreme form of vertical partitioning of the data. In modern row-oriented systems such as DATAllegro, we use sophisticated horizontal partitioning to limit the number of rows read for each query. We're also working on clever usage of materialized view technology to limit the number of columns we need to read. The end result is very similar performance to that claimed by Stonebraker i.e. 10 to 50x that of traditional databases such as Oracle.

Posted by DATAllegro at May 1, 2008 6:33 PM

Trackback Pings

TrackBack URL for this entry:
http://www.beyeblogs.com/mt/mt-tb.cgi/124

Comments

Not a very in-depth analysis of the column-oriented database here at all. There's more to it than compression. You can't draw many educated conclusions from this article.

Posted by: Jeff at May 27, 2008 1:31 PM

Post a comment




Remember Me?