May 19, 2008

Netezza

Netezza was the first disruptive data warehouse appliance vendor to hit the market.

When we started out in 2003, we positioned ourselves as very similar to Netezza. Over the last few years, however, our strategies have diverged. I therefore think it would be useful to explain the differences between the two products, to help potential customers make a more informed choice.

Key Differences between Netezza & DATAllegro

Over the course of the next few months, I'll be going into detail on the differences between DATAllegro and Netezza. To kick things off, here's a brief summary:

The above spider chart rates DATAllegro and Netezza against other products in the market. The outer edge of the chart is "best in class". You might think that I've been self-serving in choosing the categories, but this is my blog after all! Let's go through each category so I can explain my reasoning:

Non-Proprietary HW

Netezza's snippet processing units (SPUs) are entirely proprietary, since they consist of a general purpose CPU, an FPGA and a hard disk drive on a custom blade. Netezza has occasionally made the argument that the underlying components are standard, off-the-shelf parts. While that's undoubtedly true, it's also true for pretty much any other piece of proprietary hardware. The only pieces of non-proprietary hardware in a Netezza appliance are the head units (standard HP servers) and the GigE network switches. As a result, Netezza scores very low in this category.

In contrast, a DATAllegro appliance uses completely standard servers (from Dell or Bull), storage (from EMC) and networking (from Cisco and Qlogic). Hence we get a best in class score in this area.

"DATAllegro: This firm's open, hardware-independent approach to the data warehouse appliance is catching on." Intelligent Enterprise, "36 Companies to Watch," January 2008.

Non-Proprietary SW

Again, Netezza fares poorly in this category. While they started out by leveraging the Postgres open source database code, they've effectively rewritten most of the system at this point. The SPUs don't even run a mainstream OS such as Linux (although the head units do). Finally, Netezza’s software that runs on the head units is entirely proprietary.

Footprint

Netezza has basically one product line, with each rack holding 112 SPUs and 12.5TB of user data (with optional compression, this goes up to around 25TB).

DATAllegro takes a slightly different approach, with two different options in our range of data racks. Our DR1530 offers up to 30TB per rack and is therefore slightly better in terms of user data footprint than Netezza. We also offer a DR200 for very large-scale systems that offers 200TB of user data per rack. We're therefore better on this metric.
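To make the footprint comparison concrete, here is a minimal Python sketch that turns the per-rack capacities quoted above into rack counts for a given amount of user data. The figures are the approximate ones from this post ("up to" 30TB, "around" 25TB); the function name and the 200TB example size are purely illustrative assumptions.

```python
import math

# Approximate user data per rack, in TB, as quoted in this post.
USER_TB_PER_RACK = {
    "Netezza (uncompressed)": 12.5,
    "Netezza (with compression)": 25.0,
    "DATAllegro DR1530": 30.0,
    "DATAllegro DR200": 200.0,
}


def racks_needed(user_data_tb: float, tb_per_rack: float) -> int:
    """Return the number of whole racks needed to hold user_data_tb."""
    return math.ceil(user_data_tb / tb_per_rack)


if __name__ == "__main__":
    warehouse_tb = 200  # hypothetical warehouse size, for illustration only
    for product, capacity in USER_TB_PER_RACK.items():
        print(f"{product}: {racks_needed(warehouse_tb, capacity)} rack(s)")
```

Real configurations also depend on head units, spares and growth headroom, which this sketch deliberately ignores.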

Watts per TB

Our DR1530 is slightly worse than Netezza on power consumption per TB, but our DR200 is significantly better, so I'll score us even on this point.

Price per TB

The latest information we have is that Netezza is list priced at around $100k per TB, which puts them in the mid-range of comparable offerings. Depending on the data racks used, our list prices are between $8k and $50k per TB, which is clearly substantially cheaper than Netezza.

Install Time

Netezza used to have a significant advantage over us in this area. However, our v3 product, which became available a little more than a year ago, closed the gap and can now be installed in just a few hours.

Physical Design

Physical database design is very straightforward in Netezza. The DBA simply has to decide which column to use to distribute each table across the SPUs. As data is loaded, the Netezza system automatically populates zone maps, which are effectively statistics for each 3MB page in the table. The query optimizer uses the zone maps to decide which pages it needs to read to satisfy each query.
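For readers unfamiliar with the idea, the following Python sketch illustrates how per-page statistics can let a query skip pages. It assumes the statistics are simply the minimum and maximum of one column per page, which this post does not spell out, and every name in it is hypothetical. It is not Netezza's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ZoneMapEntry:
    """Assumed per-page statistics: min and max of a single column."""
    page_id: int
    min_value: int
    max_value: int


def build_zone_map(values: Sequence[int], rows_per_page: int) -> List[ZoneMapEntry]:
    """Split a column into fixed-size pages and record min/max per page."""
    entries = []
    for page_id, start in enumerate(range(0, len(values), rows_per_page)):
        page = values[start:start + rows_per_page]
        entries.append(ZoneMapEntry(page_id, min(page), max(page)))
    return entries


def pages_to_scan(zone_map: List[ZoneMapEntry], low: int, high: int) -> List[int]:
    """Return only the pages whose [min, max] range overlaps [low, high]."""
    return [
        entry.page_id
        for entry in zone_map
        if entry.max_value >= low and entry.min_value <= high
    ]


if __name__ == "__main__":
    column = list(range(100))          # data loaded in sorted order
    zone_map = build_zone_map(column, rows_per_page=10)
    print(pages_to_scan(zone_map, 25, 34))  # only two of the ten pages qualify
```

The pruning works best when the data arrives roughly ordered on the filtered column (dates, for example), because each page then covers a narrow range of values.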

There's no doubt that we lag behind Netezza a little in this area, although we are catching up fast and are already ahead of all of the other contenders. Like Netezza, we don't generally require indexes (although they are available). Also like Netezza, the DBA must choose a distribution column to spread each table across the nodes. In addition, the DBA must decide how to set up multi-level partitioning on our system. This is usually very straightforward and familiar to any experienced DBA. For example, in most installations, the fact tables will be range partitioned by date and then hash partitioned by the foreign key of the largest dimension table. Other tables will typically just be hashed on their primary key. The process is simple and usually takes just a couple of hours. Changing the partitioning scheme is also generally very straightforward and very fast.
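As an illustration of the fact-table scheme described above, the sketch below assigns a row to a two-level partition: first by date range, then by a hash of a dimension foreign key. The monthly range granularity, the bucket count, the CRC32 hash and all of the names are assumptions made for the example. They are not DATAllegro's actual settings or syntax.

```python
import zlib
from datetime import date
from typing import Tuple


def partition_for(sale_date: date, customer_id: int, hash_buckets: int = 8) -> Tuple[str, int]:
    """Map a fact row to (date-range partition, hash sub-partition).

    Level 1: range partition by calendar month (assumed granularity).
    Level 2: hash partition on the foreign key of the largest dimension,
             here a hypothetical customer_id.
    """
    range_key = f"{sale_date.year:04d}-{sale_date.month:02d}"
    # CRC32 is used only because it is deterministic across runs.
    bucket = zlib.crc32(str(customer_id).encode("utf-8")) % hash_buckets
    return range_key, bucket


if __name__ == "__main__":
    print(partition_for(date(2008, 5, 19), customer_id=42))
    print(partition_for(date(2008, 6, 2), customer_id=42))  # same bucket, next month
```

A query that filters on a date range only has to touch the matching range partitions, while rows for the same dimension key always land in the same hash bucket.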

ETL & BI Compatibility

Both products are essentially the same in this area - i.e. they work with all mainstream BI and ETL tools.

Scalability

This is an area where DATAllegro has a big advantage. Whereas Netezza currently maxes out at 200TB, we already have production systems that exceed 400TB. In addition, our DR200 racks can be used to build data warehouses of more than 10PB of user data at very low cost.

Sequential Performance

Due to the way our compression code works, DATAllegro's current products are optimized for performance under heavy concurrency. As a result, we don't use the full power of the platform when running one query at a time. This can be a problem in proofs of concept against Netezza, since the first results people often look at are simple sequential query runs. There's no doubt that Netezza is very strong in this area. However, we don't feel this reflects real-world workloads.

Mixed Workload

In contrast to our performance under simple sequential query runs, our platform performs extremely well under a complex mixed workload with heavy concurrency.

There are several reasons for this:

  1. The compression code in our platform is optimized for use under concurrency.
  2. The InfiniBand backbone in our appliances uses minimal CPU power for data movements, compared with the GbE connections used in rival appliances such as those from Netezza. As a result, there's more CPU power left over for running queries.
  3. InfiniBand can also move data around 10 times faster than GbE, which is a huge advantage for some complex queries.
  4. Our sophisticated use of multi-level partitioning and clever workload management allows us to run a mixture of short queries and long queries very efficiently and with very consistent query times as seen by users.
  5. Netezza's FPGAs run out of silicon real estate at around 16-20 concurrent queries. The end result is less consistency in query run times as the system comes under load and starts queuing.

Taken together, these factors mean that we're consistently beating other platforms when running a complex workload. Even Teradata can't compete with us on this metric.

"One [differentiator where DATAllegro] does well is in situations of mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database." Andy on Enterprise Software, "A Lively Data Warehouse Appliance," February 2008.

Load Speeds

We've experienced consistently faster load performance than Netezza in all recent POCs, especially in near-real-time scenarios.

Distributed DW

In most large enterprises, data warehousing is a distributed problem - business units need to create data marts that meet their own requirements and SLAs while fitting in with data governance and other requirements that cut across the entire enterprise. Until recently, building an effective, large-scale, distributed DW to match the shape and needs of the business was impossible, due to the low speed of data movement between centralized hubs and business-unit-specific spokes.

Last year, DATAllegro introduced its grid technology that provides centralized metadata management for a collection of appliances, together with very high speed, parallel data movement between the appliances. The technology has already been deployed successfully in a number of very large-scale enterprise DW implementations.

No other DW vendor has an answer to this challenge.

"All together, the hub-and-spoke grid approach is a concept that puts DATAllegro on a different playing field than other database vendors. Rather than trying to build the single fastest database system, this approach focuses on building the most effective enterprise data management infrastructure, which is ultimately more important than the single fastest system." Tom Briggs, Full Table Scan, "Getting to Know DATAllegro, Part II," May 2008.

In-DBMS Analytics

Netezza stirred up the market last year when it announced the availability of in-database analytics - effectively allowing third-party code to run on the SPUs, thereby taking advantage of the massively parallel architecture.

The amusing part of this is that user-defined functions (UDFs) have been available in most database products for years - even in MPP systems such as DATAllegro. Hence, Netezza's 'innovation' here was more a matter of marketing than of actual technology. Having said that, they have backed up their UDFs with some interesting tools and partnerships, so I'll give them a lead in this area.

Checking it out for yourself

The above analysis is obviously somewhat subjective and yes, I'll admit it, biased. However, there's an easy way to find out how the two products stack up against your requirements and that's to run a proof-of-concept using your own data and queries.

Adding DATAllegro to an existing POC adds very little effort to the overall process from the customer's perspective, since we do all of the work. Also, we've seen Netezza respond to competition from DATAllegro with some very heavy discounts, so you might save yourself a lot of money, no matter whom you choose in the end!

For materials related to concepts discussed in this blog entry, click the following links:

White Paper: Hub-and-Spoke: Getting the Data Warehouse Wheel Rolling
White Paper: Data Warehouse Appliances: The Benefits of an Open, Non-Proprietary Platform
White Paper: Using Grid Technology to Build a Hub-and-spoke Data Warehouse Architecture

Posted by DATAllegro at May 19, 2008 10:21 PM

Comments

Congratulations on your Blog.

Some key concepts reviewed are non-proprietary SW and HW plus TCO related to performance.

Moore's Law, although 40 years old, continues to fuel innovation and technology competition from companies like yours.

Keep up the good work. Looking forward to version 3.2.

Posted by: Lee Martin at May 23, 2008 3:33 PM

As you say, very biased indeed!

I'm not sure I would ever trust one vendor's comparison of another's technology :-)

Surely the proof of the pudding is in the success of the technology in the marketplace. DATAllegro has been around for a while now, and while I agree the technology has appeal, I don't see this materialising into actual customers.

I only ever see TEOCO mentioned as a DATAllegro customer, yet Netezza has a long list of blue chip customers. Do you have any more customers? If so, how many, and when are you going to publicise your list of customers so that you can gain some market credibility?

Posted by: Harry at May 24, 2008 10:13 AM

Stuart,

DATAllegro has always frustrated me with its lack of customer information or references. In stark contrast, the Netezza website is full of customer lists, quotes and testimonials.

Posted by: Brian Ganly at May 28, 2008 9:25 AM

Many of DATAllegro's customers consider our product a competitive advantage, and therefore request not to be disclosed. Here's one exception.

http://www.dbms2.com/2008/05/23/data-warehouse-appliance-power-user-teoco/

Posted by: Stuart Frost at June 11, 2008 5:40 PM
