The Global Analytic Appliance Leader

 





Welcome to the

Netezza Community
Thoughts from Inside the Box


January 29, 2007
Issue 7: Partner User Conference Update

by Vishal Daga - Netezza, Director of Partner Marketing

"Give me golf clubs, fresh air and a beautiful partner, and you can keep the clubs and the fresh air." - Jack Benny, comedian, author and actor (1894-1974)

There are a few partner user conferences and groups coming up, which prompted me to reflect on ones we had attended at the end of 2006. Netezza participated in the Business Objects and SPSS annual user conferences and both of these events were quite successful - they sparked conversations and fostered introductions. Events like these are really beneficial because they provide a forum that lets us build greater awareness within our partner communities. Ultimately, this results in the development of stronger alliances and accelerates the development of more compelling joint value propositions that leverage the performance and/or simplicity of Netezza in new ways. I have provided some highlights from these events below:

  • Business Objects Insight Americas 2006: We had one of our appliances at this event and showcased a concept solution that demonstrated what an integrated BI solution could look like. The solution we demoed packaged Business Objects Reporting and Analysis Applications with Netezza's data warehouse appliance, delivering a complete, pre-integrated BI environment in one system that was capable of addressing the needs of many mid-market customers. In addition, Durgesh Das, BI Manager at CompuCredit, presented a compelling case study on why his organization selected Netezza. Durgesh shared many user anecdotes about performance improvements and administrative simplicity that highlight the real impact of Netezza.

    The links below lead to video vignettes of CompuCredit users in different roles - Business Analyst, DBAs, IT managers - talking about the value of Netezza from their individual perspectives. Pretty compelling stuff!

  • SPSS Directions 2006: According to many, the next big thing in BI will be data mining and predictive analytics - not just reporting on data but actually using historical data to predict what will happen in the future. Clementine, SPSS's data mining product, delivers optimized connectivity to Netezza today, so Clementine customers can take full advantage of Netezza's performance to tackle their predictive analytic needs. At this event we had many engaging conversations with both Netezza and SPSS customers who were looking to develop and deploy deeper, more effective predictive analytic capabilities. Netezza and SPSS continue to work closely together on tighter integration that will help put a new level of predictive analytic capability within reach of enterprise organizations.
On to the next ones!

Vishal Daga


Posted by Phil Francisco at 11:30 PM | Comments (0)

January 15, 2007
Issue 6: Spotlighting FPGAs, last of 3

Performance Multipliers for Data Stream Processing

Yes, star crossed in pleasure the stream flows on by
Yes, as we're sated in leisure, we watch it fly.

And time waits for no one, and it won't wait for me
And time waits for no one, and it won't wait for me.

Time can tear down a building or destroy a woman's face
Hours are like diamonds, don't let them waste.

Time waits for no one, no favours has he
Time waits for no one, and he won't wait for me.


- The Rolling Stones, Time Waits for No One (Jagger/Richards),
from the album, "It's Only Rock 'n Roll" (1973)

We've dedicated the last several postings to the Field Programmable Gate Array (FPGA) - a key performance multiplier in the NPS® system architecture. Last time out I talked about the market growth of FPGAs as a mainstream technology in multiple applications settings outside of data warehousing.

This is the last of a three-part series on FPGAs, spanning the following topics:
  • "So, What Is an FPGA?" - aimed at providing a most-basic introductory primer of the technology, its capabilities and its promise (posted 28th November).
  • "FPGAs in the Mainstream & Some of Their Practical Uses" - a look at the use of FPGA technology across a broad swathe of market applications. (posted 20th December)
  • "OK - How Does Netezza Get a Performance Edge from FPGAs & What Does the Future Hold?" - linking FPGA capabilities to the benefits it brings to the NPS system and possible future directions it could enable.

Today, we'll dig into how FPGAs enable high performance at low cost in the NPS appliance, and what types of applications the technology may enable for the NPS system in the future.

OK - So how does Netezza get a performance edge from FPGAs?
A critical element of Netezza's architecture is the implementation of direct-attach storage in a massively parallel array of query processing elements. Called Snippet Processing Units (SPUs), these query processing elements collocate CPU, memory and FPGA with each disk drive. The SPUs are arranged in an array that can be as small as several dozen or as large as nearly a thousand in today's NPS systems.

A critical component of overall data warehouse performance lies in the disk bandwidth that can be applied to a given problem and in turn, the level of processing horsepower that can be applied to that data. In short-hand terms, Netezza refers to its architectural approach as "bringing the query to the data." Rather than moving vast amounts of data across high-speed interconnecting (and sometimes non-blocking) networks as other systems do, the NPS system reduces the data to the information essential to the query as close to the disk source as possible.
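To make "bringing the query to the data" more concrete, here is a minimal Python sketch of the general idea: each processing unit filters and summarizes its own slice of a table, and only the small partial results are combined. This is an illustration under stated assumptions, not Netezza's implementation; the function names, the row layout and the merge step are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def spu_partial_counts(local_rows: List[Row], zip_code: int) -> Counter:
    """Work done next to one disk: filter, then count by (state, gender)."""
    return Counter(
        (row["state"], row["gender"])
        for row in local_rows
        if row["zip"] == zip_code
    )


def merge_partials(partials: List[Counter]) -> Counter:
    """Combine the small per-unit results into the final answer."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical slices of one table, one slice per processing unit.
spu_slices = [
    [{"state": "FL", "gender": "F", "zip": 32605},
     {"state": "FL", "gender": "M", "zip": 33101}],
    [{"state": "FL", "gender": "F", "zip": 32605},
     {"state": "FL", "gender": "M", "zip": 32605}],
]

partials = [spu_partial_counts(rows, 32605) for rows in spu_slices]
result = merge_partials(partials)
print(dict(result))
```

The point of the sketch is the shape of the data flow: full rows never leave the unit that holds them, and only a handful of counts are passed along.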

The focus of the architecture is to enable streaming processing of the data: eliminating unneeded data as early as possible and processing the rest as rapidly as it can be read from the disk drives. That's where the FPGA comes in. The FPGA in a Netezza SPU has two primary roles.

In the first, it acts as the disk controller, controlling all of the disk read and write activities on the SPU.

In the second, the FPGA efficiently applies low-level database primitives, offloading significant work from other processing elements in the system. As table data streams from the disk on the SPU, the FPGA applies the transaction visibility list (only transactions that were current in the database at the start of the query are visible to it) and then applies the appropriate column projection and row restriction rules. Then only data that satisfies the visibility, projection and restriction rules is sent from the FPGA to the memory and CPU on the SPU for additional processing, if necessary.
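The order of operations just described (visibility, then restriction, then projection) can be modeled in a few lines of Python. This is only a software analogy for what the text says happens in FPGA hardware; the record layout, the `txn_id` field and the function name `fpga_filter` are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence, Set, Tuple

Row = Dict[str, object]


def fpga_filter(
    rows: Iterable[Row],
    visible_txns: Set[int],
    restrict: Callable[[Row], bool],
    project: Sequence[str],
) -> Iterator[Tuple[object, ...]]:
    """Stream rows, applying visibility, then restriction, then projection."""
    for row in rows:
        # 1. Visibility: skip rows written by transactions the query cannot see.
        if row["txn_id"] not in visible_txns:
            continue
        # 2. Restriction: skip rows that fail the WHERE-style predicate.
        if not restrict(row):
            continue
        # 3. Projection: pass along only the requested columns.
        yield tuple(row[col] for col in project)


# Hypothetical records streaming off one disk.
disk_rows = [
    {"txn_id": 1, "state": "FL", "zip": 32605, "age": 7, "notes": "a"},
    {"txn_id": 1, "state": "GA", "zip": 30301, "age": 7, "notes": "b"},
    {"txn_id": 9, "state": "FL", "zip": 32605, "age": 6, "notes": "c"},
]

survivors = list(
    fpga_filter(
        disk_rows,
        visible_txns={1, 2},
        restrict=lambda r: r["zip"] == 32605,
        project=("state", "age"),
    )
)
print(survivors)
```

Because the function is a generator, each row is examined once as it streams past and nothing is buffered, which is the property the streaming architecture depends on.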

Adding to the performance boost provided by the FPGA in general, another important system feature known as "Zone Map" is realized in a software module of the NPS system known as the storage manager. We think of Zone Maps as an anti-Index in Netezza, telling the system what data not to read. For each numerical column, the Zone Map can take advantage of any natural ordering of the data in the table (e.g., date, customer number, order number, etc.) and reduce the number of data blocks read in response to a query to only those required. For example, if a query were looking for information about transactions that took place between the beginning of September and end of October, the Zone Map function of the storage manager would direct the FPGA to read only those data blocks containing records from September or October, thereby eliminating the need to perform a full disk scan for each query.
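A small Python sketch may help show why this works. It assumes the Zone Map keeps the smallest and largest value of a column for each disk block, which is a common way to build such a structure; the block granularity, the class name `BlockRange` and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class BlockRange:
    """Smallest and largest value of one column within one disk block."""
    block_id: int
    min_value: date
    max_value: date


def blocks_to_read(zone_map: List[BlockRange], low: date, high: date) -> List[int]:
    """Return the blocks whose [min, max] range overlaps the query range."""
    return [
        b.block_id
        for b in zone_map
        if b.max_value >= low and b.min_value <= high
    ]


# Hypothetical table loaded in date order, one block per two months.
zone_map = [
    BlockRange(0, date(2006, 5, 1), date(2006, 6, 30)),
    BlockRange(1, date(2006, 7, 1), date(2006, 8, 31)),
    BlockRange(2, date(2006, 9, 1), date(2006, 10, 31)),
    BlockRange(3, date(2006, 11, 1), date(2006, 12, 31)),
]

# Query: transactions between the start of September and the end of October.
needed = blocks_to_read(zone_map, date(2006, 9, 1), date(2006, 10, 31))
print(needed)
```

The benefit depends on the data being naturally ordered: if dates were scattered at random across blocks, every block's range would overlap the query and nothing could be skipped.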

The FPGA implements the read of the appropriate disk blocks and additionally filters and projects only data relevant to the query. This can improve query-processing rates by two or more orders of magnitude.

FPGA as performance multiplier: an example
As an example, consider the following simple SQL query:

SELECT state, gender, age, COUNT(*)
FROM eight_billion_row_table  -- placeholder name for the 8-billion-row table
WHERE dob < '04/01/2000' AND dob > '12/31/1999' AND zip = 32605
GROUP BY state, gender, age;

In this example, the storage manager and FPGA would first use Zone Maps to limit the disk read to only those disk extents with dates of birth in the three-month period of January through March 2000, rather than the full table. Then, as the data was read from the disk, the FPGA would further restrict the rows returned to those records within the three-month range and with a zip code matching the query. Finally, the column data projected to the memory and CPU would be limited to only the state, gender and age information of each record. If the table in question contained 100 or more columns for each record, this could represent 3% or less of the column data. If one assumes the table contained birthdate information for just the last seven years, this would dramatically reduce the row count of data delivered to memory/CPU as well - specifically by more than 25:1, or 3 months out of 84.

Overall, for this example, the combination of Zone Maps with FPGA projecting and filtering of the data would result in just 0.1% of the full table data being sent to the memory and CPU for additional processing.
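The arithmetic behind that figure can be checked with a short Python calculation. It uses the numbers from the example and adds two simplifying assumptions of its own: all columns are the same width, and birthdates are spread evenly across the seven years. The additional reduction from the zip code match is left out.

```python
from fractions import Fraction

# Figures taken from the example in the text.
total_columns = 100      # "100 or more columns"
projected_columns = 3    # state, gender, age
months_of_data = 7 * 12  # seven years of birthdates
months_selected = 3      # January through March 2000

column_fraction = Fraction(projected_columns, total_columns)
row_fraction = Fraction(months_selected, months_of_data)
combined = column_fraction * row_fraction

print(f"columns kept: {float(column_fraction):.1%}")
print(f"rows kept:    {float(row_fraction):.1%} (1 in {months_of_data // months_selected})")
print(f"combined:     {float(combined):.2%}")
```

Under these assumptions the combined fraction is 3/100 × 3/84, or roughly 0.1% of the table, before the zip code restriction trims the rows any further.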

From this, you can see how the FPGA acts as a Performance Multiplier for query processing. Before a single CPU cycle or memory location has been used, the FPGA has reduced the data required for processing by up to several orders of magnitude.

And what does the future hold?
As suggested by Keith Underwood of Sandia National Labs, the price-performance and power efficiency of FPGAs look set to enjoy an order-of-magnitude advantage over the 'x86' CPU technology roadmaps by the end of the decade. Building on the technology's performance and I/O advantages, FPGA vendors are already able to embed CPU core technology (Xilinx - "Embedded Processing" & DSP-FPGA.com - "FPGAs - Poised to play in embedded applications") directly inside an FPGA.

[Figure: Projected FPGA Roadmap Capabilities. Source: Composite of FPGA Vendors' Historical & Roadmap Data]

We at Netezza fully expect the FPGA advantage to increase over time. Based on suppliers' and research technology roadmaps, by the end of the decade we are anticipating 5X enhancements in each of the following areas:

  • cost
  • available logic
  • functionality per unit of power
  • speed

[Figure: Xilinx' Powerful Virtex2Pro FPGA. Source: JPL/NASA Tech Brief, p. 12]

The result will be extended, differentiated functionality introduced into current and/or future versions of FPGA technology, further increasing the price/performance and capability advantages of the NPS data warehouse appliance. Possibilities for expanded functionality include, but are certainly not limited to: in-line, streaming data compilation or encoding; advanced filtering and analytic logic operations ("Legacy FPGA Designs Can Be Migrated to Achieve Better Performance"); and even much more powerful pre-processing of query data by embedding CPU processing capabilities directly within the device ("FPGA Advances Pave The Way Toward True SoC Solutions"). If, how and when these may appear in the Netezza technology roadmap is still to be seen. However, on the strength of the FPGA technology roadmap and the technology's significant benefit to the streaming processing needs of data warehousing, it's clear to us that the FPGA will continue to play a major role with Netezza for the foreseeable future.

The technology trend for high-performance systems is clear. In more and more industry domains ("In Praise of FPGAs"), low-power programmable logic devices are going to act as either performance accelerators or even the primary performance engine. By offering high performance, low power requirements and highly flexible reprogrammability, the use of FPGAs promises to continue as a strong industry trend.

In short, we believe that the advantages that FPGA technology brings to the NPS system have 'legs'. We plan to continue to exploit those advantages for the benefit of our customers and don't intend to hide them "under a bushel" any longer.


Posted by Phil Francisco at 11:00 PM | Comments (0)

January 1, 2007
Issue 3: What Is a DWA, Anyway?


Data Warehouse Appliances: definition & evolving place in the market

"It depends on what the meaning of the word 'is' is." - Bill Clinton, President of the USA (1993-2001)

As our first act of commenting on the industry, we'd like to address a topic that has seemingly stirred up quite a bit of emotion and controversy of late: just what IS a Data Warehouse Appliance (or "DWA" for the acronym-inclined)?

But first, the punch line: it's not the definition of what a DWA is that matters, but - taking things a bit further - what deploying a DWA will mean to customers who use one in their analytics and BI scenarios.

Plenty of opinions to go around
In the world of BI and data warehousing, if there's one area that's nearly become an industry segment unto itself these days, it is the field of those industry analysts, pundits and other experts trying to define just what a "Data Warehouse Appliance" really is.

It's no wonder. Over the past three-plus years, the Data Warehouse Appliance market has blossomed, indeed. It has become a significant and growing segment of the data warehouse systems market. Since Netezza's initial entry & coining of the terminology for this space in 2002/2003, a number of new entrants (from industry behemoths to the smallest, new start-ups) have tried to stake their claims to it. Enter the industry pundits, to help us all by defining and making sense of things.

A growing market segment that's "here to stay"
Claims from long-standing incumbent data warehouse systems providers notwithstanding, I would hazard that in analyzing clippings from before 2002, you would be hard-pressed to find any references to a DWA in the media or analysts' market predictions about the future. Certainly, companies have built systems expressly for use as data warehouses in the past but my searches have not revealed any claims on the notion of an "appliance" in this space before the dawn of the 21st century.

Now we have an established and growing market category for data warehouse systems. According to IDC's Dan Vesset, "IDC expects the market for DW appliances to grow at a CAGR of 70% over the next 5 years from the estimated 2005 level of $75 million."
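To put that forecast in perspective, the short Python calculation below compounds the quoted growth rate. It assumes the 70% rate is applied for five full years starting from the $75 million 2005 estimate; the quote does not spell out the exact start and end years, so treat the year labels as an assumption.

```python
base_year = 2005
base_market_musd = 75.0   # estimated 2005 market, in millions of US dollars
cagr = 0.70               # 70% compound annual growth rate
years = 5


def project(base: float, rate: float, n_years: int) -> float:
    """Compound `base` at `rate` per year for `n_years` years."""
    return base * (1.0 + rate) ** n_years


for n in range(years + 1):
    print(base_year + n, round(project(base_market_musd, cagr, n)))

final_musd = project(base_market_musd, cagr, years)
```

On these assumptions the market would pass the $1 billion mark by the fifth year, roughly a fourteen-fold increase over the starting figure.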

How does Netezza see the definition?
Today, the definition of a data warehouse appliance is seemingly crying out for clarity, with a growing number of vendors' marketing claims making things more hazy. As the pioneer and the recognized global leader in the DWA market with over 75 paying customers under our belt, when it came to defining just what a DWA is we felt, "Who is more qualified than us?" - so we decided to weigh in with our views.

We define data warehouse appliances as follows:
  • Purpose-built for performance - from a single vendor; combining server, database, storage and network in an architecturally-integrated system built specifically for high-performance data warehousing. This includes dedicated hardware for processing large data volumes faster than any other data warehouse solution in the market.
  • Simple to use - like a kitchen appliance, this should be dramatically easier than traditional systems. Easy to install, deploy and maintain - with installation in hours and the ability to have a large DW up and running in a day or so. No tuning, indexing, partitioning, aggregations, etc. required.
  • Low acquisition and ongoing costs - appliances are just less costly to own and maintain - even for a large EDW implementation of 100 terabytes or more.
  • Enterprise compatibility - high availability; plug n' play integration; standards-based interfaces; fully integrated with all major Data Integration, Business Intelligence and advanced Data Analytics vendors.
  • Low power, cooling and space consumption - delivering high-performance in a compact footprint without blowing your data center's budget for electrical power and without forcing your IT director to implement "skip-a-row" equipment patterns to manage the data center cooling.

The key operative point here is that a DWA is fundamentally performance-driven. It allows businesses to have more clarity and more depth of analysis across ALL of their data much faster than they have been able to in the past. The fact that a DWA also delivers simplicity and economy puts that performance well within reach for most enterprises.

Simply put, a true data warehouse appliance will put the high performance of a supercomputer into an enterprise's data center at a cost-effective price point. And it will do so with an ease of installation, use and maintenance that makes possible much more powerful analyses and more rapid development of ideas than other systems can provide.

DWA Measures - according to the experts
A recent TDWI survey indicated a majority of members surveyed understand that a data warehouse appliance is defined as server hardware and database software built specifically for data warehousing - not just a bundle of commodity hardware and generic software - and that the benefits of this approach are greater performance and lower cost. But there have been many attempts to define and measure the "goodness" of DWAs. The table below contains but a few.

Furthermore, Robin Bloor and Philip Howard of the Bloor Group have set off down a path to make the definition and benefits of various DWA approaches more clear - aiming to do so even more completely in early November.

Source: Philip Russom, TDWI
Characteristics - Survey Results:
  • Server h/w & DBMS s/w built specifically to be a DW platform (53%)
  • Any server h/w & DBMS s/w bundled to create a DW platform (14%)
  • Either definition (13%)
  • Don't know (19%)
Benefits:
  • Pre-tuned for DW Use
  • Fast Query Performance
  • Reduced System Integration
  • Fast Installation

Source: Dan Vesset, IDC
Characteristics - Two Primary Types for Data Warehouses:
  • Complete Stack DWA (combined h/w & DBMS s/w)
  • Virtual DWA (DBMS s/w bundled with clustered commodity h/w)
Benefits:
  • High Performance & Scalability
  • Lower Total Cost of Ownership
  • Lower Maintenance Costs
  • Highly Scalable Business Analytics Platform

Source: Dan Linstedt, TDWI/Myers-Holum (Mar06) & TDWI/Myers-Holum (Sep06)
Characteristics and Benefits - multiple entries, but most recently:
  • Web-based Thin client GUI admin
  • API for reporting, logging, admin, etc.
  • Embed s/w at h/w & firmware levels
  • Capable of transformation, data mining, loading & reporting
  • Notify admins & end-users of suspected security breaches
  • Web-enabled firmware updates
  • Truly plug & play
  • NOT part of a cluster, IS part of grid
  • Self-contained
  • Nine 9's uptime
  • (Near) linear scalability
  • High availability
  • Fast loading
  • Compression & Encryption
  • Plug & Play MPP units
  • SQL query interfaces
  • Super fast data access
  • Low cost per TB options
  • Plug & play fail-over
  • Automatic self-updating
  • Remote monitoring
  • Compliance for data

Source: Charles Garry, DMReview
Characteristics:
  • Combined price/performance of...processors, open-source software and low cost disk storage in a single cabinet
  • Purpose-built with massive #s of CPUs to handle analysis against terabytes of data quickly and simply
Benefits:
  • Total Cost of Ownership: The Key Differentiator
  • Faster time-to-production & time-to-value
  • Easier maintenance with "Load and Go" simplicity - with no required physical db design, tuning, hints or indexes

Source: Kim Stanick, Baseline Consulting
Characteristics:
  • Packaged solution of h/w & s/w that is pre-configured to perform DW workloads consistently well, out of the box
  • Acquired as a single unit rather than a collection of components to be assembled
  • Communicates via open standards (i.e., ODBC & SQL-92)
Benefits:
  • "Pre-integrated high performance": engineered for optimal performance on typical DW workloads
  • Enables enterprise IT group to offload engineering & tuning burden to the DWA vendor's design
  • "Data warehousing hitting its stride" means DWAs appeal to a broader set of companies
  • Just like the evolution of the auto: "You can build your own car, but most people don't because they are readily available, affordable and get the basic job done."

Source: Mike Schiff, TDWI/MAS Strategies
Characteristics:
  • Pre-integrated h/w, DBMS s/w & storage
  • Optimized for very rapid query & retrieval
Benefits:
  • High performance
  • Low-cost
  • Quick to implement
  • Ease of use
  • Reduced DBA Support Requirements
  • "A proven offering"

Why not bundles or "balanced" blade-servers?
Simply grouping multiple systems in loose affiliations won't really answer the mail here. The inefficient movement of huge blocks of data for analysis adversely limits performance; and the complexity of managing disparate systems and each of their upgrade and compliance paths alone will make this approach difficult to manage. But so will the fact that the systems will evolve independently and not necessarily in alignment with one another.

Unless it is rearchitected to specifically address data warehousing, a "shrink-wrapped" bundling of products from among a major player's broader suite of systems will be similarly limited in both performance and operation. And it too will have to deal with the effects of each product's evolution pulling in a different direction.

Where is this all going & why does the DWA definition even matter?
The real issue of course, is that, to enterprise customers, the "true definition" of a DWA doesn't really matter at all; what matters is the impact that taking a DWA approach to their data warehousing needs can have on their businesses.

What we've seen from customers' use of the NPS product family is that DWAs are changing the way businesses use their warehouse data today and in the near term, including the following -

  • enabling deep, unconstrained analytics on all of their business data, even in extremely busy mixed-workload scenarios;
  • changing the way they think about the staffing to support it and opening up the development of whole new advanced analytics applications;
  • changing the way they purchase data warehouse infrastructure; and
  • helping mid-tier businesses solve critical data warehouse needs in compact, fully-contained business solution appliances.

In the longer term, DWAs will fundamentally change the way people operate their businesses.

Look for us to provide more on this, and on our other views about the future of DWAs, in upcoming postings.


Posted by Phil Francisco at 1:30 PM | Comments (0)


© 2007 Netezza Corporation | Legal | Privacy | Safe Harbor | Site Map