The Global Analytic Appliance Leader

 





Welcome to the Netezza Community
Thoughts from Inside the Box


December 20, 2006
Issue 5: Spotlighting FPGAs, part 2 of 3

Performance Multipliers for Data Stream Processing

“Time is but the stream I go a-fishing in.” Henry David Thoreau (1817-1862), from Walden (1854)

We’re taking a little time to discuss an important facet of the NPS® system architecture – the Field Programmable Gate Array (FPGA), which is a critical performance multiplier in the context of the NPS appliance. Last time out I spent some time discussing what an FPGA is and what its general role is in stream-based processing.

This is the second in a three-part series on FPGAs, spanning the following topics:
  • “So, What Is an FPGA?” – aimed at providing a most-basic introductory primer of the technology, its capabilities and its promise (posted 28th November).
  • “FPGAs in the Mainstream & Some of Their Practical Uses” – a look at the use of FPGA technology across a broad swathe of market applications.
  • “OK – How Does Netezza Get a Performance Edge from FPGAs & What Does the Future Hold?” – linking FPGA capabilities to the benefits it brings to the NPS system and possible future directions it could enable.

Today, I’d like to look at other mainstream applications of the FPGA and its growing role in the market.

Recent News About FPGA Adoption
The applications space for FPGA technology is expanding rapidly. Nearly concurrently with my last posting, William Fellows of The 451 Group (subscription required) published an article on a segment of the FPGA applications market very different from data warehousing, one that virtually makes the case for this series of postings. Here are a few snippets from Mr. Fellows’ 28th November article (emphasis mine):

[snip]
“At a high level, FPGAs offer massive parallelism, have a high GFLOPs potential, and their technology curve exceeds Moore's Law. FPGAs offer application-specific acceleration. They can be (re)programmed at the factory or (as the name implies) in the field at a customer site. They are essentially specialized functional units that solve a specific problem, and are typically implemented as an algorithm-specific compute device – as a coprocessor to a conventional CPU motherboard.

[snip]
“In one sense, FPGAs are very much a commodity item and already have a range of uses. The ability to reprogram FPGAs has led them to be widely used by hardware designers for prototyping ASICs, for software-designed radio, aerospace and defense systems, medical imaging, bioinformatics and more. And while FPGAs are inherently about 10 times slower than CPUs (clock speeds are typically 150-200MHz), they can offer up to 100 times performance improvements on calculations optimized to run on them.

[snip]
“So why FPGAs, and why now? One view is that more sophisticated technology means financial services organizations are more susceptible to being arbitraged, and FPGAs can help prevent this, or trade to take advantage of it. UK consultant Detica has a couple of FPGA engagements at the very earliest stages of evaluation. Its pitch is that FPGAs can handle algorithmic trading processing requirements on the order of 150,000 transactions per second. It argues FPGAs are also a natural way of handling streaming environments. For example, they are already widely used for voice and video streaming.

[snip]
“[Chris] Swan [VP, IT R&D at Credit Suisse] argues FPGAs could be a tempting solution because, by encoding straight to dedicated hardware rather than running on general-purpose CPUs, they make possible performance levels that are three orders of magnitude better. In addition, because this applies more to performance/watt than performance/chip, it could potentially collapse a 1,000-node grid into an expansion card within a trader's workstation.”
[snip]

FPGAs in the Mainstream
It’s important to note that this is no small, specialized technology market, but one that is very much in the high-performance and consumer mainstream. Bryan Lewis, a vice president and chief analyst at Gartner Dataquest, recently projected (“Gartner: FPGA/PLD market to grow 14 percent in '06”) that the market for FPGAs and PLDs will grow from $3.2B in 2005 to $6.7B by 2010, including 14% growth in 2006 and another 18% in 2007. Overall, Gartner projects a compound annual growth rate of approximately 16% through the end of the decade.

“FPGAs have had the fastest growth for the last five years and that will continue for the foreseeable future” – Bryan Lewis, Gartner.
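
As a quick sanity check on the Gartner figures quoted above, the compound annual growth rate implied by the two endpoints ($3.2B in 2005, $6.7B in 2010) can be worked out in a few lines. This is only a sketch of the standard CAGR arithmetic; the function name `cagr` is my own and is not taken from the Gartner report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Endpoints from the projection quoted above, in billions of US dollars.
    rate = cagr(3.2, 6.7, 2010 - 2005)
    print(f"Implied CAGR, 2005-2010: {rate:.1%}")
```

The result lands very close to the 16% figure, so the endpoints and the stated growth rate are consistent with each other.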

Some of the Mainstream/Practical Uses for FPGAs
The FPGA has blossomed in recent years to take on a key role in driving cost-effective high performance in a broad sweep of applications (Altera, Lattice Semiconductor & Xilinx). Here are just a few:

And with the NPS data warehouse appliance from Netezza, FPGAs are also playing an important role in delivering high performance in low-power, cost-effective data warehouse systems.

Common Traits and Common Benefits
What the Netezza implementation and the great majority of the uses mentioned earlier share is that they deal with data streams, filtering or performing functions on data as it streams through the device at high speed. The FPGA is able to execute its functionality on the data without interrupting the flow of data through it – in effect, adding an in-line performance boost. In this way, FPGAs can greatly accelerate performance in these applications over “brute force” CPU-based processing.

Conversely, these systems tend not to rely on the FPGA for recursive algorithms or for processing that requires non-sequential access to data in cache, memory or disk, work that may be better suited to CPU technologies.
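
To make the streaming pattern described above a little more concrete, here is a small software analogy in Python. It is not how an FPGA (or the NPS system) is actually programmed; it only illustrates the access pattern: each record is examined once, in arrival order, filtered and trimmed in-line, and passed along without being buffered or revisited. The names `stream_filter` and `sample_stream`, and the sample records, are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def stream_filter(records: Iterable[Record],
                  keep: Callable[[Record], bool],
                  columns: List[str]) -> Iterator[Record]:
    """Yield only the wanted columns of the records that pass `keep`.

    Records are handled one at a time, strictly in arrival order, so the
    flow of data is never interrupted and nothing is stored for later.
    """
    for record in records:
        if keep(record):
            yield {name: record[name] for name in columns}


def sample_stream() -> Iterator[Record]:
    """Hypothetical source standing in for rows arriving from disk or a network."""
    yield {"id": 1, "region": "EU", "amount": 120.0}
    yield {"id": 2, "region": "US", "amount": 45.5}
    yield {"id": 3, "region": "EU", "amount": 310.5}


if __name__ == "__main__":
    for row in stream_filter(sample_stream(),
                             lambda r: r["amount"] > 100.0,
                             ["id", "amount"]):
        print(row)
```

Note what the sketch deliberately does not do: it never looks back at an earlier record or jumps around in memory. That is the kind of sequential, single-pass work the paragraphs above describe as a good fit for in-line processing.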

Another trait many of these FPGA implementations have in common is that the very “field programmability” of the FPGA device gives the implementations themselves a high degree of design agility – allowing for easy and fast reprogramming of the functionality of the devices. In addition, all of these applications lean on FPGA technology for performance at low power – making possible highly scalable high-performance solutions without breaking power and cooling budgets in the process. This was evident in the commentary by Chris Swan of Credit Suisse in the above 451 Group story quotes and in the discussion of the 1000-node MPP RAMP project in my previous posting. Particularly in networking, HPC and data warehousing, the FPGA provides speed and low power consumption combined with design agility – essentially supporting reprogramming of the functionality of the FPGA at start-up time.

FPGAs and Their High Performance Punch
But one of the most important uses of FPGA technology is as an application-specific performance enhancer, and in HPC that is typically the role it plays. SGI®, for example, touts its reconfigurable RASC™ module as a performance accelerator (emphasis mine).

"SGI® RASC™ (Reconfigurable Application Specific Computing) technology leverages the power of FPGAs which utilize gate array technology that can be reconfigured by the user for optimal performance on a specific algorithm. Unlike traditional processors, which are serial in nature, FPGAs are inherently parallel, allowing multiple functions to be performed simultaneously. Therefore, users whose applications spend a majority of their run time working on a set of specific algorithms can dramatically reduce application run time by custom configuring the RASC module to accelerate application run-time. This reconfigurable technology is particularly beneficial when running data-intensive applications critical to oil and gas exploration, defense and intelligence, bioinformatics, medical imaging, broadcast media, and other data-dependent industries."

Here’s a report on the effectiveness of FPGAs for supercomputing from an April 2005 FPGA Journal article discussing the new Cray XD1:


The Cray XD1, one of the latest innovations from the world's best-known supercomputer manufacturer, leverages Xilinx's FPGA technology to provide massive algorithm acceleration through hardware-based implementation of compute-intensive algorithmic tasks. While we in the editorial community were idly debating whether FPGAs might be useful as reconfigurable computing engines after all, Cray was busy at work back in the lab building the thing.

"We are continually researching new ways to gain greater application performance for our customers," says Geert Wenes, business manager responsible for emerging markets at Cray. "With the Cray XD1 direct connect architecture combined with the new generation of FPGAs, we saw an opportunity to gain orders of magnitude speed-up for some of our customers' most challenging applications. Applications that are highly parallel on a fine-grained level and spend much of their computation time on integer and fixed point calculations, such as adaptive optics simulations, seismic imaging, or even molecular docking applications in life sciences stand to gain 10 times or more overall application performance improvement with FPGA application acceleration. In many cases, such speed-ups are necessary to make the application a viable one for our customers."

And there are numerous others, even including Advanced Micro Devices and its use of accelerating co-processors based on FPGA technology with the current AMD64 line of processor boards. Here are a few snippets of a 19th June Electronic Engineering Times article on the topic ("Programmable chips rev critical algorithms"):

"The first two companies to offer socket-compatible coprocessors for AMD64 Opteron processor sockets, DRC Computer Corp. and XtremeData Inc., are delivering programmable solutions that can accelerate time-critical algorithms. These coprocessors leverage the flexibility of Xilinx and Altera FPGAs, so that they can be configured to accelerate graphics, XML, floating-point, video transcoding and other applications.

"Although the latest AMD64 processors offer topnotch performance, when it comes to specialized operations such as graphics, XML operations and video transcoding, they deliver good, but less than stellar, performance. To achieve improved system performance, Advanced Micro Devices Inc. has opened its processor socket interface as part of the just-released Torrenza platform to allow companies like DRC, XtremeData and others to develop and deploy application-specific coprocessors to work alongside AMD64 CPUs in multisocket processor systems.

[snip]
"In one possible scenario, an FPGA-based hardware accelerator used in medical CT imaging might run the overall application 10 times faster when each 3GHz AMD Opteron processor is coupled with an FPGA. The result is significant system-level savings for power, space and cost. 'The key to acceleration is parallelism of the algorithm implementation in the FPGA, so that even when the FPGA operates in the subgigahertz range, it can outperform a multigigahertz CPU,' said [XtremeData CEO Ravi] Chandran."
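
Mr. Chandran's point about parallelism can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a benchmark: it assumes throughput is simply clock rate times results produced per cycle, that the CPU produces one result per cycle, and that each parallel FPGA lane also produces one result per cycle. Those assumptions and the names `throughput` and `lanes_needed` are mine. The 3 GHz CPU clock and the 10x speed-up come from the article quoted above, and the 200 MHz FPGA clock from the 451 Group excerpt earlier in this posting.

```python
import math


def throughput(clock_hz: float, results_per_cycle: float) -> float:
    """Idealized results per second: clock rate times results per cycle."""
    return clock_hz * results_per_cycle


def lanes_needed(cpu_hz: float, cpu_results_per_cycle: float,
                 fpga_hz: float, target_speedup: float) -> int:
    """Parallel FPGA lanes (one result per cycle each, by assumption)
    needed to beat the CPU by `target_speedup` in this simplified model."""
    target = target_speedup * throughput(cpu_hz, cpu_results_per_cycle)
    return math.ceil(target / fpga_hz)


if __name__ == "__main__":
    lanes = lanes_needed(cpu_hz=3e9, cpu_results_per_cycle=1,
                         fpga_hz=200e6, target_speedup=10)
    print(f"Lanes needed at 200 MHz for a 10x speed-up over 3 GHz: {lanes}")
```

Under these simplified assumptions, the slower clock is offset once the algorithm can be spread across enough lanes working at the same time. Real designs are also bounded by memory bandwidth, I/O and available logic, none of which this model captures.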

Next Time: FPGAs Move Into Data Warehousing and the NPS Data Warehouse Appliance
The FPGA has expanded its reach greatly in recent years, taking on a key role in driving cost-effective high performance in a broad sweep of applications. I’ve also suggested in this and my previous posting that the same sort of technology is a key performance multiplier in the NPS data warehouse appliance. In coming days, I’ll be posting the third and final installment of this series on FPGAs, discussing the benefits that accrue to the Netezza system and what additional benefits the FPGA might make possible as we move the product forward.

Posted by Phil Francisco at 6:00 PM | Comments (0)

