January 28, 2010

Why We’re So Positive

Regular readers know that the Sybase IQ blog is one of the most upbeat industry sites you will find anywhere. There’s a simple reason for that — we who contribute share an impossible-to-conceal enthusiasm for Sybase IQ and the possibilities it opens up for data warehousing and analytics users in a wide variety of business settings around the world. Sybase IQ powers some of the most sophisticated and challenging analytics environments on earth, and delivers unprecedented (and often unexpected) business value to just about everyone who uses it.

So, yes, there’s plenty to be excited about.

This morning, as I listened to Sybase Chairman, CEO, and President John Chen on his quarterly earnings webcast, I was pondering all the reasons we’re so positive. If you’re tired of fear and worry and economic doldrums, and if you’d like a refreshing shot of good news, go ahead: give it a listen.

Let me just point out a few key items from the CEO’s report:

- Q4 2009 was the best quarter in Sybase history
- It was the company’s ninth consecutive record quarter
- It completed Sybase’s third consecutive record year

Our overall database license revenue grew 22% in 2009. In Q4, database license revenue was up a “mere” 9% over Q4 2008, but only because we did so spectacularly well in Q4 2008 that it was pretty hard to top! Sybase IQ, along with Sybase RAP, made analytics a major driver of database license revenue both for the quarter and the year. Mr. Chen reports that our analytics business enjoyed solid double-digit growth in 2009.

Sybase IQ signed on 66 new customers in Q4. They join the ranks of our more than 1700 customers worldwide, the largest customer base (by far) of any specialty analytics server.

Q4 also saw the beta release of a new analytic solution for the telecom space, designed to operate in a cloud computing environment. Built on Sybase IQ, this solution is designed to provide operators with deeper insight into out-of-network messaging traffic patterns and customer behavior, as well as overall network performance. Sybase is uniquely positioned to enter this market because we already have the raw data in our hubs — more than 97% of SMS data is currently handled by Sybase. Moreover, Sybase IQ is designed specifically for high-performance analytics applications like this.

The only hint of a dark cloud on the horizon is the reality-check question that the CEO raises himself: how long can Sybase sustain numbers like these? In response, he provides some impressive numbers from industry analysts. Their forecasts indicate continued double-digit growth for the overall analytics market for some time to come.

And when John Chen says he expects to continue to beat that rate of growth…well, it’s hard to argue with a track record like his.

To that analysis I would add these two salient facts, as reported on this site a couple of months ago:

According to a recent report from the Data Warehousing Institute, 38% of organizations surveyed are practicing advanced analytics today, and 85% say they’ll be practicing it within three years.

Over that same three years, 48% of those organizations plan to completely replace their current data warehouse platform.

So it looks like big changes are coming, and soon. I’ll leave predicting the future of the market to the industry analysts and our CEO, but there’s one prediction I’m perfectly comfortable making:

For the foreseeable future, the contributors to (and readers of) this blog are going to have plenty to smile about.

Posted by Sybase IQ at 5:48 PM

January 6, 2010

Compression v Enumeration

… And the winner is …

Over the holiday last month, I came across a well-written article on column-oriented databases by a BI consultant named Bojan Ciric. It is an excellent summary, one you might want to read here, if you’re new to the topic.

In the article, the author cautions that column-oriented databases pay a penalty for decompression each time data is read. The presumption seems to be that if data is compressed in size, there must concomitantly be a de-compression step.

This is not exactly the case with Sybase IQ. More importantly, this discussion provides an opportunity to delve a bit into the “how” of sophisticated column-oriented databases, using Sybase IQ as the example: how it achieves both storage space reduction and query performance improvements without incurring a noticeable penalty to decompress data.

Sybase IQ is the industry’s leading column-oriented database, and, as Mr. Ciric describes, stores and retrieves by column, permitting very selective retrieval in response to traditional SQL. But beneath the covers, there’s more to it. Sybase IQ enumerates column data into index structures prior to storage. This is different from compression, and more effective.

The resulting indices, combined with a smart, index-aware SQL query processor, yield dual benefits: a large reduction in the database storage footprint, and a net reduction in analytic and reporting query times, as compared to traditional row-based databases.

How’s this done?

Internal to Sybase IQ, and invisible to the query, most data types are enumerated, during load, insert or update, into one or more of a rich family of index structures. Sybase IQ offers indices optimized for low- and high-cardinality data, for use in aggregations, for use in “like” operations, and for specific data types like date and time.

Invisibly, at least from the query’s perspective, enumeration reduces the size of the required structure quite dramatically, by anywhere from 10 to 90% depending on data type, cardinality and width. This directly shrinks the storage footprint, hence the confusion about whether the result is achieved by compression or, more accurately, by enumeration.
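To get a feel for why enumeration alone can shrink a column, here is a rough back-of-the-envelope sketch in Python. It is not Sybase IQ’s actual storage format, which this post doesn’t spell out. It assumes a simple scheme in which each distinct value is stored once in a lookup list and every row keeps only a small fixed-width code. The function name `column_sizes` and the fixed value width are made up for illustration.

```python
import math

def column_sizes(values, value_width_bytes):
    """Estimate bytes for a column stored raw vs. enumerated.

    Assumed (hypothetical) scheme: one lookup entry per distinct value,
    plus a fixed-width code per row just wide enough to tell the
    distinct values apart.
    """
    if not values:
        return 0, 0
    n_rows = len(values)
    n_distinct = len(set(values))
    # At least 1 bit per row, even when there is a single distinct value.
    bits_per_code = max(1, math.ceil(math.log2(n_distinct)))
    raw_bytes = n_rows * value_width_bytes
    lookup_bytes = n_distinct * value_width_bytes
    code_bytes = math.ceil(n_rows * bits_per_code / 8)
    return raw_bytes, lookup_bytes + code_bytes

if __name__ == "__main__":
    # 1,000 rows, 4 distinct values, each stored raw in a 20-byte field.
    states = ["NY", "CA", "TX", "WA"] * 250
    raw, enumerated = column_sizes(states, value_width_bytes=20)
    print(raw, enumerated)
```

In this toy case the saving is far larger than a real table would usually see, because the column is very wide and has only four distinct values. As cardinality rises, the codes get wider and the lookup list grows, which is one way to read the “depending on data type, cardinality and width” caveat above.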

The benefits of enumeration go beyond reduced reliance on de-compression. Most importantly, the resulting data structures can be selectively retrieved: if a query predicate, whether explicit or deduced by the query engine, affects only a fraction of a column, then only a fraction of that column’s index need be retrieved.

For low-cardinality data, the most commonly used index structures consist of: a) a list of the values occurring within the column, which points to b) a series of data structures that identify the occurrences of those values within the column. Using only enumeration, these create far smaller structures than the source data.
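The two-part structure just described can be sketched as a plain bitmap index. To be clear, this is a generic textbook illustration, not Sybase IQ’s internal layout; the class name `BitmapIndex` and its methods are hypothetical. Part (a) is the set of dictionary keys, and part (b) is one bitmap per value, held here as a Python integer whose bit `r` is set when row `r` holds that value.

```python
from collections import defaultdict

class BitmapIndex:
    """Toy low-cardinality index: each distinct value maps to one bitmap."""

    def __init__(self, column):
        self.n_rows = len(column)
        bitmaps = defaultdict(int)
        for row, value in enumerate(column):
            bitmaps[value] |= 1 << row  # mark this row in the value's bitmap
        self.bitmaps = dict(bitmaps)

    def _rows(self, bits):
        # Turn a bitmap back into a sorted list of row numbers.
        return [r for r in range(self.n_rows) if (bits >> r) & 1]

    def rows_equal(self, value):
        # Only the bitmap for this one value is touched.
        return self._rows(self.bitmaps.get(value, 0))

    def rows_in(self, values):
        # OR together just the bitmaps named by the predicate.
        bits = 0
        for value in values:
            bits |= self.bitmaps.get(value, 0)
        return self._rows(bits)

    def count_equal(self, value):
        # A count can be answered without materializing any rows.
        return bin(self.bitmaps.get(value, 0)).count("1")

if __name__ == "__main__":
    colors = ["red", "blue", "red", "green", "blue", "red"]
    index = BitmapIndex(colors)
    print(index.rows_equal("red"))
    print(index.rows_in(["blue", "green"]))
    print(index.count_equal("blue"))
```

Notice that a predicate such as `color = 'red'` reads one bitmap and ignores the rest. That is the selective-retrieval idea in miniature: the parts of the index that the predicate doesn’t touch never need to be fetched.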

But what about high cardinality data, you ask?

For high-cardinality data, whether textual or binary, Sybase IQ also enumerates these columns into index types that store these data efficiently, and explicitly use compression, with efficient de-compression, to reduce storage size. [If you find yourself asking if “efficient de-compression” is an oxymoron, rest assured it’s not.]

Did I say compression? Yes I did. Sybase IQ does use low-cost LZW compression for all pages written to storage, achieving an additional benefit. Weighed against storage latency, LZW compression reduces the number of pages sufficiently that, in most cases, the cost of LZW de-compression is made up for by improvements in I/O performance.

To be specific, we measure only about 4 milliseconds per 128 KB page to decompress on a typical modern CPU core using LZW. Yet, by doing so, we reduce the number of pages stored and retrieved enough that the storage latency avoided by reading less data more than makes up for the cost of the de-compression step.
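For readers who haven’t met LZW before, here is a minimal textbook version in Python. It is only a sketch of the general algorithm, under the assumption of an unbounded code table and codes kept as plain integers. It says nothing about how Sybase IQ packs codes into pages or sizes its tables, and the function names are made up for illustration.

```python
def lzw_compress(data):
    """Textbook LZW: encode bytes as a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}  # all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Inverse of lzw_compress; rebuilds the table while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the sequence being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        pieces.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(pieces)

if __name__ == "__main__":
    page = b"ABABABABABAB" * 100
    codes = lzw_compress(page)
    print(len(page), "bytes ->", len(codes), "codes")
```

Both directions are a single pass with dictionary lookups, which is why LZW is usually considered cheap on CPU. The decoder never needs the table shipped to it, because it rebuilds the same table as it goes.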
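The trade-off can be put into rough numbers with a small Python sketch. Only the 4 milliseconds per page comes from the measurement above. The 10 ms per page of I/O latency and the 2:1 compression ratio in the example are assumptions picked purely for illustration, as is the simplification that the decompression cost is paid once per stored (compressed) page. The function names are hypothetical.

```python
import math

def read_times_ms(uncompressed_pages, compression_ratio,
                  io_ms_per_page, decompress_ms_per_page=4.0):
    """Return (time reading uncompressed, time reading compressed) in ms.

    Assumes every stored page costs one I/O, and that each compressed
    page additionally costs one decompression step.
    """
    plain_ms = uncompressed_pages * io_ms_per_page
    stored_pages = math.ceil(uncompressed_pages / compression_ratio)
    compressed_ms = stored_pages * (io_ms_per_page + decompress_ms_per_page)
    return plain_ms, compressed_ms

def break_even_io_ms(compression_ratio, decompress_ms_per_page=4.0):
    """I/O latency per page above which compression is a net win."""
    if compression_ratio <= 1:
        raise ValueError("compression_ratio must be greater than 1")
    return decompress_ms_per_page / (compression_ratio - 1)

if __name__ == "__main__":
    # Assumed figures: 1,000 pages, 2:1 compression, 10 ms per page of I/O.
    plain, compressed = read_times_ms(1000, 2.0, 10.0)
    print(plain, compressed)
    print(break_even_io_ms(2.0))
```

Under these assumed figures, reading half as many pages saves more time than decompressing them costs. In this simple model, compression pays off whenever per-page I/O latency exceeds the decompression cost divided by the compression ratio minus one.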

The takeaway points:

Sybase IQ combines enumeration, an intelligent query processor and modest compression to achieve multiple benefits, without a “visible” de-compression cost:

- Enumeration, like compression, can greatly reduce the size of stored data, but without concomitant de-compression costs.
- Through enumeration, index structures are created that are directly accessible by the query processor and optimized to provide the query processor with the fastest-possible query plans.
- The indexes into which column cells are enumerated can, by their design, permit selective retrieval, fetching only the pages and parts of the index that are relevant to the particular query.
- Compression, while used, employs only a low-cost LZW algorithm, applied to all pages, at a cost in CPU time that is more than made up for by the reduction in overall I/O operations.

So, in reality, Mr. Ciric is right: there is a cost due to de-compression in Sybase IQ and other column databases, but that cost is typically so small as to be washed out by the I/O latency improvements it makes possible. It is the enumeration, not compression, that dramatically reduces the overall I/O, makes selective retrieval possible, speeds query processing time and slashes storage footprint.

Until next time,

Bill

Posted by Sybase IQ at 9:06 PM