Netezza Community
Thoughts from Inside the Box

April 28, 2008
Issue 19: The Compress Engine - The Netezza Philosophy



"‘To be is to do.’ - Immanuel Kant
"‘To do is to be.’ - Jean Paul Sartre
"‘Do-be-do-be-do’ - Frank Sinatra"

— Kurt Vonnegut, Jr. (Nov 1922 - Apr 2007)


In the news today: the Compress Engine
In 1783 Immanuel Kant wrote, "David Hume woke me up from my dogmatic slumbers," and revolutionized the way humanity thinks about metaphysics. Almost 220 years later, Netezza set out to achieve a similar goal: to redefine analytics. When the first NPS® data warehouse appliance was introduced, the market roused itself from yet another dogmatic slumber and realized that there is a different, better way to do data warehousing; a way without compromise, a way without limits.

Netezza has helped to reenergize the data warehouse market by creating and leading the data warehouse appliance category.

  • "Every time you turn around you see another industry that's facing a tidal wave of data and they need to understand what this data is saying. Many of them have data volumes in this range that they haven't been able to afford to analyze, as much as they'd like to. ... [Netezza] can deliver that analytic capability, and at a very attractive price." - Richard Winter, Winter Corporation, from Netezza will scale its appliance to petabyte range, InfoWorld (January 2008)

  • "This is what Netezza has done in the data warehousing market: it has totally changed the way that we think about data warehousing... So the bottom line is not just that Netezza’s entry into the market was a black swan event but that that event has not ceased to unfold." - from Netezza: a black swan by Philip Howard, Bloor Group (October 2007)

  • "Appliances are here to stay and are revolutionizing the data warehouse industry." - from Business Analytics Appliances Are Here to Stay, by Dan Vesset, IDC (June 2006)

  • "The term data warehouse appliance was coined by Netezza, and this vendor has blazed a trail by proving the concept and educating the market." - from Defining the Data Warehouse Appliance, by Philip Russom, TDWI (August 2005)

Since 2002, Netezza has been repeatedly breaking the latency barrier and challenging the boundaries of data analytics. Since our first release, we have been continuously refuting the alleged mutual dependencies that became the building blocks of the industry’s dogmatic misconceptions; namely the expensive nature of performance, the necessary complexity of the analytics architecture and the unavoidable limits of scalability. With today’s announcement of the Compress Engine, Netezza disproves yet another myth — the inverse relationship between data compression and query performance.

The architectures of traditional data warehouses, steeped in a legacy of serving OLTP applications, were not designed to handle the ever-growing amounts of data, combined with the larger and more complex user workloads and shrinking data latency requirements, that characterize the modern enterprise. Regulatory compliance, electronic commerce and the need to process and analyze all data in a matter of seconds have pushed the capabilities of traditional data warehouse systems to their limits. In reaction to the data capacity pressures, vendors introduced compression, not as an enhancement but as a compromise solution that allows for further data growth at the cost of processing performance.

Traditional compression approaches, used by several of the competing data warehouse vendors, typically pay for the space they save with degraded query performance. Compress Engine, Netezza’s addition to the FPGA-Accelerated Streaming Technology (FAST) Engines framework, uses our innovative streaming architecture™ not only to increase the system’s storage capacity by 2-4X but also to boost overall streaming query performance by about 2X (a 100% increase). All this is achieved without requiring any tuning or administration; in fact, a software-only upgrade is all it takes to enable Compress Engine on the Netezza appliance.

It’s actually really cool technology, obviously something we love to rave about. Late last year, I wrote about FAST Engines in this blog. We’ll use that as a starting point and dig a level deeper into how Compress Engine works. I’m sure it will tickle the fancy of the geek in you!

Compress Engine Columns

The NPS system employs a patent-pending method for compiling (yes, compiling) columnar data in all the tables of the database as it is being written to disk, e.g., during load, insert or update operations. The process converts row-based data into column streams that are independently compiled to replace the original data in the columns with a stream of "instruction sets" for the FPGA. The "instructions" themselves are much smaller in size than the data they replace, resulting in a highly compressed data stream emerging from the process.

While the compression occurs on columnar data because of the inherent compressibility within database columns, the compressed data is reassembled in rows before being written to disk. Row-wise storage of tables avoids the data scan complexity associated with columnar stores and ensures that scanned data can be efficiently parsed and processed without the need to reconstitute it from multiple sources. The compressed data uses disk much more efficiently and increases the data density of NPS systems by 2-4X - in some cases substantially higher - allowing customers to scale their NPS data warehouse systems into the hundreds of terabytes of user data.
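For the geeks who like to see an idea in code, here is a toy sketch of the flow described above: split rows into columns, "compile" each column into a short list of instructions, then replay those instructions to get the original rows back. To be clear about the assumptions: this is not Netezza's patent-pending algorithm, which isn't spelled out here. The REPEAT/EMIT instruction names, the function names and the sample rows are all made up for illustration, and simple run-length encoding stands in for the real compiler.

```python
from itertools import groupby


def compile_column(values):
    """Turn one column into a list of (op, value, count) 'instructions'.

    Hypothetical stand-in for the real compiler: runs of equal values
    become a single REPEAT, everything else is an EMIT.
    """
    program = []
    for value, run in groupby(values):
        count = sum(1 for _ in run)
        op = "REPEAT" if count > 1 else "EMIT"
        program.append((op, value, count))
    return program


def run_program(program):
    """Replay the instructions to restore the original column values."""
    column = []
    for _op, value, count in program:
        column.extend([value] * count)
    return column


def compress_rows(rows):
    """Split rows into columns and compile each column independently."""
    return [compile_column(column) for column in zip(*rows)]


def decompress_rows(programs):
    """Run every column program and stitch the columns back into rows."""
    columns = [run_program(program) for program in programs]
    return [tuple(row) for row in zip(*columns)]


# Made-up sample data: 4 rows x 3 columns = 12 stored values.
rows = [
    ("2008-04-28", "MA", 10),
    ("2008-04-28", "MA", 10),
    ("2008-04-28", "NH", 10),
    ("2008-04-28", "NH", 12),
]

programs = compress_rows(rows)
instruction_count = sum(len(program) for program in programs)
print(f"{len(rows) * len(rows[0])} values -> {instruction_count} instructions")
assert decompress_rows(programs) == rows
```

The point of the sketch is only that values within a column tend to repeat, so a column-at-a-time encoder has more to work with than a row-at-a-time one, while the restored output is still ordinary rows.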

But if data compression brought the NPS system’s performance to its knees, or severely limited the speedup the system could deliver (as it does on many of those other systems), it wouldn’t be so great, would it? The beauty of the Netezza way of providing data compression is that not only does it have no negative impact on performance, but it actually increases query performance by up to 100%!

Compress Engine Flow

As the compressed data is read off the disk, it is passed through the Compress Engine, which applies the instructions embedded in the data stream to restore it to its original form. Our compilation algorithm ensures that this decompression process can be performed entirely in silicon, at wire speeds. Each physical block scanned from the disk can mushroom into 2 to 4 or more times its size in memory without incurring any overhead in processing time; i.e., 2 to 4 times as much data is scanned in the same amount of time without any increase in system hardware! Our internal benchmark testing, reflecting real customer configurations and workloads, has shown an overall 2.2X increase in streaming query performance through the use of Compress Engine.
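The reasoning here is simple arithmetic: if decompression adds no time to a disk read, the rate at which user data streams off the disk is the raw disk rate multiplied by the compression ratio. A back-of-the-envelope sketch follows. The disk rate and table size are hypothetical round numbers chosen for easy math, not Netezza specifications, and the function names are made up for this example.

```python
def effective_scan_rate(disk_mb_per_s, compression_ratio):
    """User-data scan rate when decompression adds no time to the read."""
    return disk_mb_per_s * compression_ratio


def scan_seconds(user_data_mb, disk_mb_per_s, compression_ratio=1.0):
    """Seconds to stream a table of the given uncompressed size off disk."""
    return user_data_mb / effective_scan_rate(disk_mb_per_s, compression_ratio)


DISK_MB_PER_S = 50.0   # hypothetical per-disk rate, not a Netezza figure
TABLE_MB = 100_000.0   # hypothetical 100 GB table (uncompressed size)

for ratio in (1.0, 2.0, 4.0):
    seconds = scan_seconds(TABLE_MB, DISK_MB_PER_S, ratio)
    print(f"{ratio:.0f}X compression: {seconds:,.0f} s to scan")
```

With these made-up numbers the scan drops from 2,000 s uncompressed to 1,000 s at 2X and 500 s at 4X. Real queries do more than scan, so treat this as intuition for where the speedup comes from rather than a prediction of any particular benchmark result.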

This software-only enhancement, enabled by our unique architecture, is only the beginning. As we continue to develop our platform, we are investigating further enhancements to the Compress Engine or the addition of new FAST engine(s), aimed at directly increasing streaming performance on the NPS system.

Our philosophy and aim is to continue to shake the industry out of its dogmatic slumbers by extending the price/performance advantages of our products, showing that there’s a different way to do data warehousing and advanced analytics. One where performance and scalability are the result of neither expense nor complexity, where you can get more performance from compression, where you do have the power to question everything™ ...


Posted by Phil Francisco at 8:30 AM | Comments (0)

April 21, 2008
Issue 18: Teradata's "Me-too" Model 2500 – welcome to the Data Warehouse Appliance club ...finally



"Imitation is the sincerest of flattery."
Charles Caleb Colton (1780-1832), from his Lacon, Vol. I, published in 1820

Welcome to the Data Warehouse Appliance club -
another validation of an important, growing market segment

Well, well, well! "Only" eight years after Netezza coined the term and invented the market segment, Teradata today finally officially entered the Data Warehouse Appliance market. Though it’s a bit late, and certainly behind a number of other vendors, perhaps today’s entry will put an end to Teradata’s vacillating over whether they 'invented' the concept or not, were an appliance or not, or whatever. In the past couple of years, it seems Teradata spokespeople have gone out of their way to say their product was simultaneously a data warehouse appliance and absolutely not one — even booking appearances on panels of data warehouse appliance "vendors". Certainly their announcement is another validation that the role of Data Warehouse Appliances is an important and growing one not only in the current market, but for the future as well.

Derivative Marketing and a "Repackaged, Warmed-over" Product?

Teradata is positioning this new product as being "simple, powerful and cost-effective", which to our way of thinking sounds more than a little derivative of Netezza’s long-standing value proposition: "Performance, Value and Simplicity". I’ll leave it to the reader to decide. To our reading, the Teradata announcement sounds like just another larger vendor "repackaging" what it already has in response to the competition. Like IBM and Oracle before it, Teradata appears with the 2500 model to have done nothing more than cobble together a collection of elements from the company’s model 5500 systems, repackaged and sold as an appliance.

Powerful. Um, How’s That Again?

And while anyone who is serious about the appliance segment of the data warehouse market (like Netezza) has focused on delivering systems that can scale to highly complex, enterprise-wide, high-performance deployments, we think the 2500 will struggle to deliver even modest performance for just 6 TB in a single equipment rack.

While Teradata is quoting just over 6 TB of user capacity per two nodes in this new system, let’s remember that they have been advising customers for the past year not to put more than 1.5 TB against each of those same dual-core CPU nodes. Which is it? Is the 2500 underpowered for its 6 TB data capacity per dual-node rack, or has Teradata been advising its model 5500 customers to pay at least 2X too much for their data warehouse systems for the past year?
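For anyone who wants to check the "2X" math, here it is spelled out. The only inputs are the figures quoted above, with "just over 6 TB" rounded down to 6; the variable names are mine.

```python
system_capacity_tb = 6.0     # "just over 6 TB" of user capacity, rounded down
nodes = 2                    # nodes in the rack, as quoted above
advised_tb_per_node = 1.5    # the per-node guidance cited above

tb_per_node = system_capacity_tb / nodes
load_factor = tb_per_node / advised_tb_per_node
print(f"{tb_per_node} TB per node is {load_factor}X the advised load")
```

That works out to 3 TB per node, twice the advised 1.5 TB, which is where the "at least 2X" comes from.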

Time will tell whether Teradata has made other compromises to the 2500 model in an attempt to limit its impact on its flagship products (5500 and the new 5550). Beyond its underpowered nodes, has it sacrificed anything else, like workload management or system availability, or even the system's ability to handle highly interactive, operational applications? As the coming days and weeks lift the shroud covering the model 2500, we’ll know more. For now though, it just feels like "me-too" imitation.


Posted by Phil Francisco at 9:30 AM | Comments (2)


© 2007 Netezza Corporation