February 26, 2010

Sybase IQ Data Loading – a Multitude of Speedy Options

We tout the high performing query processing and analytics capabilities in Sybase IQ, but Sybase IQ also excels when it comes to loading data.  Here is a picture that shows the wide variety of options that are available to you:

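[Image: Sybase IQ loading methods, positioned above the data sources they can load from and colored by relative speed]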

Along the bottom of the picture are the various data sources you can load data from.  Above the data sources are the different loading methods.  If a method is positioned above a data source, then it can load data from that source.  For example, the “INSERT...LOCATION” method is designed to load data directly from databases, while ETL can load data from either files or databases.  The color of the box indicates the relative speed of the loading method: red is suitable for smaller tables, yellow is faster for larger data sets, green is very fast, and blue is the fastest.  There are a few options that are slow, and that you should avoid if you have performance requirements.  “INSERT...VALUES” is a SQL insert statement that loads data a row at a time.  Sybase IQ’s columnar architecture lends itself to threaded loads that load columns individually and in parallel.  “INSERT...VALUES”, “LOAD TABLE (sequential)”, and direct trickle loads from RepServer are row-based operations and do not take advantage of CPU threading.

“LOAD TABLE” is the suggested loading approach for fast batch loading from flat files.  With “LOAD TABLE”, Sybase IQ can sustain 15 to 30 GB of raw data processed per CPU core per hour.  At that rate, an 8-core machine should average 120 to 240 GB/hour of raw data loaded.
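To size a load window from the per-core figures above, here is a small Python sketch. It assumes throughput scales linearly with core count, which is what the 8-core estimate implies; the function names are made up for illustration and are not part of any Sybase tool.

```python
def load_rate_gb_per_hour(cores, per_core_low=15, per_core_high=30):
    """Return the (low, high) raw-data load rate in GB/hour.

    Assumes linear scaling across cores, using the 15 to 30 GB
    per core per hour range quoted for LOAD TABLE.
    """
    return cores * per_core_low, cores * per_core_high


def hours_to_load(data_gb, cores):
    """Return the (best case, worst case) hours needed to load data_gb."""
    low, high = load_rate_gb_per_hour(cores)
    return data_gb / high, data_gb / low


print(load_rate_gb_per_hour(8))   # (120, 240)
print(hours_to_load(1200, 8))     # (5.0, 10.0)
```

Real rates depend on disk, data types, and indexes, so treat the result as a planning range rather than a guarantee.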

“INSERT...LOCATION” is another recommended method that performs very well.  It works like a SQL insert command that opens an Open Client connection to a remote server and inserts data based on select criteria.  It is not as fast as “LOAD TABLE”, but it does not require the disk space and time needed to export data to files prior to loading.

ETL (Extract, Transform, and Load) is a good option if you need to transform data prior to loading.  Many commercial ETL/ELT products work with Sybase IQ using standard interfaces.  Sybase ETL is specifically tailored to loading Sybase IQ from heterogeneous sources, utilizing the high-performing bulk “LOAD TABLE” command.

To load data incrementally as it changes, RepServer is the recommended option.  RepServer is Sybase’s mature data replication technology.  When combined with a staging area, micro-batch loading can be employed to make this method both powerful and fast.
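The micro-batch idea can be sketched in a few lines of Python. This is only an illustration of the pattern (stage incoming changes, then load them in bulk once enough have accumulated); it is not RepServer's API, and the class name, `max_rows` threshold, and `flush` callback are all hypothetical.

```python
class MicroBatchLoader:
    """Stage incoming rows and hand them to a bulk loader in batches."""

    def __init__(self, bulk_load, max_rows=1000):
        self._bulk_load = bulk_load   # callable that loads a list of rows
        self._max_rows = max_rows     # batch size that triggers a load
        self._staged = []             # the staging area

    def add(self, row):
        """Stage one changed row; load the batch when it is full."""
        self._staged.append(row)
        if len(self._staged) >= self._max_rows:
            self.flush()

    def flush(self):
        """Load whatever is staged, if anything, and empty the staging area."""
        if self._staged:
            batch, self._staged = self._staged, []
            self._bulk_load(batch)


# Usage: collect the batches in a list instead of calling a real loader.
batches = []
loader = MicroBatchLoader(batches.append, max_rows=3)
for change in range(7):
    loader.add(change)
loader.flush()      # load the final partial batch
print(batches)      # [[0, 1, 2], [3, 4, 5], [6]]
```

In practice the callback would run a bulk load from the staging area, and a timer would usually trigger a flush as well so that quiet periods do not delay data.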

Finally, for those ultra low latency scenarios that exist particularly within the financial services industry, there is Sybase RAP – Real-Time Analytics Platform.  With RAP, data is captured from real time data streams and batch loaded into the RAPStore, which is based on Sybase IQ.  RAP has been load tested at 1,000,000 messages/second on IBM Power6 8-core servers.

Customers tell us that over half the development effort for an analytics server is in setting up the processes to load and maintain the state of the data in the store.  With a broad array of options, this task should become a smaller fraction of the effort, allowing you to focus on gathering the business intelligence that you value.

Until later,

Courtney Claussen

Posted by Sybase IQ at 11:36 PM

February 11, 2010

Sybase IQ 15.1 proves its mettle

In yet another display of industry-leading performance, Sybase IQ Analytics Server, the world’s #1 column-based DBMS, came out on top in the 1TB category of the industry standard TPC-H benchmark. Sybase IQ 15.1 clocked 102,375 queries per hour (QphH) on an HP ProLiant DL 785 G6 server running Red Hat Enterprise Linux 5.3 on 6-core AMD Opteron processors as outlined in the TPC-H report. This is the #1 performance number in the 1 TB category among non-clustered servers on the Linux platform.

Sybase IQ 15.1 has been architected from the ground up to harness the power of the latest generation of multi-core chip technologies to deliver industry-leading performance. The state-of-the-art query processor inside Sybase IQ 15.1 looks for every opportunity to parallelize all operations in a query plan. The parallelization techniques cover both horizontal data flows, which would include scans and joins, and vertical data flows, which would include flows into consumer nodes such as groupings from producer nodes such as joins. In this benchmark, Sybase IQ took full advantage of the 48-core AMD chip set in the industry-leading HP DL 785 servers.

This result, along with the strength of the Sybase and HP brands and partnership, clearly demonstrates a differentiated value proposition in the data warehousing and analytics market. I believe that the Sybase and HP joint reference solution is yet another testament to the strength of an open, flexible, best-of-breed approach that can not only perform well, but also, unlike appliance and other fixed-configuration options, enable customers to apply new technologies quickly and easily to assure that their analytic servers grow with their requirements.

Posted by Sybase IQ at 5:24 PM

In a “Stream” of Good News: Sybase acquires Aleri Technology

Well, the good news is coming faster than snowflakes in Washington, DC. Last week, Sybase announced the acquisition of the assets of Aleri, Inc., to bolster Sybase’s capabilities and underscore our commitment to real-time analytics and BI. The added capabilities will be immediate good news to our users in financial services and will play a role in the growth of real-time BI across the intelligence, communications, marketing, retail, energy, healthcare and other sectors.

As many of you know, Sybase offers a stream processor as an option to Sybase RAP – our real-time analytics product. That option, Sybase CEP or Complex Event Processor, is based upon technology from Coral8, which recently merged with Aleri. The combination of Aleri and Coral8 formed a powerhouse in the CEP arena and those products are now part of the Sybase analytics portfolio.

Low-latency analysis is the hallmark of CEP technology, and slashes the “time to knowledge”. Using a CEP, users can run time-critical event filtering, analysis, aggregation and selective persistence over data streams as they arrive, rather than having to delay the analysis until after the event streams have been captured, transformed and loaded.

In addition to slashing latency, CEP can also help to reduce warehouse storage requirements and help tame problems that were heretofore too costly to solve. By conducting analysis of a stream as it arrives, much of the needed analysis and aggregation can be done without persisting details to the warehouse at all. If your data has no retention requirements, other than your analysis needs, you may be able to use CEP to greatly reduce your warehouse costs.
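To make the idea concrete, here is a minimal Python sketch of filtering and aggregating a stream as it arrives, so that only the running totals need to be stored. It is a generic illustration, not the Sybase CEP or Aleri API; the event shape `(symbol, quantity)` and the `min_qty` filter are assumptions made up for the example.

```python
from collections import defaultdict


def aggregate_stream(events, min_qty=0):
    """Filter and aggregate (symbol, quantity) events as they arrive.

    Only the per-symbol totals are kept; the individual events are
    never stored, which is where the warehouse savings come from.
    """
    totals = defaultdict(lambda: {"count": 0, "qty": 0})
    for symbol, qty in events:
        if qty < min_qty:        # event filtering: skip small events
            continue
        entry = totals[symbol]   # aggregation: update running totals
        entry["count"] += 1
        entry["qty"] += qty
    return dict(totals)


# Any iterable works as the stream, including a generator over a live feed.
stream = [("IBM", 100), ("HPQ", 5), ("IBM", 50)]
print(aggregate_stream(stream, min_qty=10))
# {'IBM': {'count': 2, 'qty': 150}}
```

A real CEP engine adds time windows, joins across streams, and selective persistence, but the trade-off is the same: analyze on arrival and keep only what you need.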

As society and the systems that support it create information at an ever-accelerating pace, BI requirements will grow commensurately. This means increasing demands for algorithmic trading, measurement of risk, tracking of behaviors and transactions, detection and analysis of network outages, health care trend tracking, and detection of terrorist threats, all of which can be met, often in near real time. We suspect few BI teams will escape increasingly stringent latency requirements in the near future, and Aleri CEP products will help Sybase deliver on your needs for low-latency, high-volume analytics.

For more information on the acquisition and its significance, here are four sources you might want to review:
- Sybase’s announcement of the acquisition
- Neil McGovern’s blog on risk analytics, with coverage of the announcement
- Sybase RAP product pages
- Aleri’s web site, which provides a broad range of information about the products, markets and applications, and is a great interim reference as we move to integrate Aleri’s web content into Sybase.com

Watch Sybase.com and this blog, where we’ll keep you up to date on Sybase’s progress integrating Aleri and enhancing the real-time capabilities of the Sybase analytics product family.

Bill

Posted by Sybase IQ at 4:52 PM

February 3, 2010

Deriving Business Intelligence with BIRT

Access to your data and confidence in its integrity is vital. Without the ability to derive intelligence from your data, however, it is just data. Rich visualization tools, such as BIRT, are critical for getting the most from the data you collect and manage daily. And you want those tools to be accessible and easy to use. Recently, I created a common type of BI report, called a Master/Detail report, using BIRT. I was impressed by the capabilities of the tool, and how quickly I was able to build my report.

BIRT is an open source, Eclipse-based reporting system, with features such as report layout, data access and scripting. Sybase continues to stay true to its goal of providing a complete and integrated tooling platform with Sybase WorkSpace. If you have WorkSpace Data Analytics or WorkSpace Data Analytics Enterprise licensing, the BIRT report designer is automatically installed when you install Sybase WorkSpace 2.5. You can also download and install it from the Eclipse Web site.

My goal was to visualize and understand some order processing data stored in my Sybase IQ database. I wanted to create a bar chart of order quantities categorized by country. Then, I wanted to view the details about the orders for a particular country by clicking on a bar in my bar chart, and bringing up a table of order breakdowns by year/quarter and product category:

[Image: master bar chart and detail table shown together]

I started by creating my master report – a bar chart showing order totals by country. To accomplish this, I created a data source (a connection to my Sybase IQ database), a data set (a SQL query), and added a bar chart element from the BIRT palette to my report canvas. Then I associated data elements delivered from my query with the axes of my bar chart. I wanted countries to show up on the X axis, and order totals on my Y axis. A simple drag and drop of data items onto the bar chart skeleton was all I needed to do:

[Image: master report, a bar chart of order totals by country]

Then I created my detail report. My detail report is like a spreadsheet, with time along the Y axis, and product categories along the X axis. The cells in the chart contain the product totals. To build this report, I used a report element called a dynamic cross tab. This is like a pivot table in Excel. It is not a static chart, but is built upon a data cube. The data cube in turn is based upon a SQL query or stored procedure call. Once you have the data in the cube, you can dynamically move data elements from the cube into various locations in the report. You can easily change what is displayed along the different axes, and what appears in the cells:

[Image: detail report, a cross tab of product totals by year/quarter and product category]
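If you have not worked with cross tabs before, this small Python sketch shows what one does with the rows a query returns: it groups them by time period and product category and sums the quantities into cells. It is not BIRT code, and the row layout `(period, category, quantity)` and the sample values are invented for illustration.

```python
from collections import defaultdict


def cross_tab(rows):
    """Pivot (period, category, quantity) rows into {period: {category: total}}."""
    cells = defaultdict(lambda: defaultdict(int))
    for period, category, qty in rows:
        cells[period][category] += qty   # each cell holds a product total
    return {period: dict(cats) for period, cats in cells.items()}


# Hypothetical rows, as a SQL query over order data might return them.
rows = [
    ("2009 Q1", "Shirts", 10),
    ("2009 Q1", "Hats", 4),
    ("2009 Q1", "Shirts", 5),
    ("2009 Q2", "Hats", 7),
]
print(cross_tab(rows))
# {'2009 Q1': {'Shirts': 15, 'Hats': 4}, '2009 Q2': {'Hats': 7}}
```

Swapping which field is used as the outer key and which as the inner key is the code equivalent of dragging a data element to a different axis in the report designer.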

Now that I had the master and detail reports, I linked them together by configuring “interactivity” on the bars of the bar chart in the master report.  I specified “drill down on mouse click”, passing a parameter of country name:

[Image: interactivity settings linking the master report to the detail report]

Once I had configured interactivity, I tested the reports within the tool.  A click on a bar brought up the correct detail report for the particular country.  This was all very easy to accomplish, and in no time, I had a valuable BI report.  So, spice up your analytics with Sybase WorkSpace and BIRT, and derive intelligence from your data now. 

You can read more about BIRT in this blog from Ray Gans, Community Manager for the BIRT Community:

http://blogs.sybase.com/sybaseiq/2010/02/guest-blogger-ray-gans-community-manager-for-the-birt-community/

Courtney Claussen

Posted by Sybase IQ at 8:25 PM

February 2, 2010

Guest Blogger: Ray Gans, Community Manager for the BIRT Community

Hello Sybase community!

My name is Ray Gans. I am the Community Manager for the BIRT community, and employed by Actuate Corporation – the primary developer of BIRT. My role is to support the BIRT developer and IT professional community by acting as their liaison to Actuate and ensuring the community’s voice is heard.

Ray Gans, Community Manager for BIRT Exchange

Background on BIRT
Actuate, along with the Eclipse Foundation, founded BIRT in 2004 as the Business Intelligence and Reporting Tools project, which today is one of the most successful projects in the Eclipse world. Actuate continues to staff the majority of BIRT’s open source contributors and thought leaders as the technology continues to evolve. BIRT has been downloaded well over 6.5 million times (through 2008) and we estimate that 500,000 developers have used BIRT over the last 5 years.

Actuate is very pleased that Sybase chose to include BIRT in their Sybase Workspace for Data Analytics as the reporting plug-in for Sybase IQ analytics server! These two great technology products should complement each other very well and I expect to see some great innovative data visualizations from developers, as well as from the sales and professional services teams who use these products. We have formed a partnership with Sybase and launched a series of initiatives including listing their product for purchase on our BIRT Exchange Marketplace (more details below), as well as exposing each other’s community of users to share and tap into their knowledge and information.

BIRT Exchange
In 2007 Actuate launched BIRT Exchange as a community site to address the needs of BIRT users, i.e., software developers and IT professionals, who utilize BIRT in their work as Web 2.0 application developers, report writers, ISVs, consultants and system integrators. Since its inception, BIRT Exchange has attracted over 25,000 registered members and is now the largest on-line BIRT community. Actuate and BIRT Exchange offer software products that leverage and enhance open source BIRT through its data visualization designers, viewers and deployment tools available as free-trial downloads from the site, as well as services and support offerings for BIRT-related and other technologies.

The BIRT Exchange community site hosts a community blog, newsletter, active BIRT discussion and support forums as well as DevShare, a large and growing collection of user contributed code, tutorials, webinars, whitepapers and various tips & techniques, that are freely available to all BIRT users.

A new feature on BIRT Exchange is the Open Marketplace. This site allows independent developers, ISVs and other software vendors to sell and distribute their BIRT-related applications, components, templates and other materials for free or purchase. In fact, Sybase Workspace Data Analytics is now available for purchase through our Marketplace!

Our plans are to make BIRT Exchange a one-stop-shop for all things BIRT. To that end, we are very excited about our partnership with Sybase and the Sybase IQ product. I am very impressed by the good integration of BIRT in Sybase IQ and I look forward to a great relationship between our two user communities.

Sybase IQ developers and users are more than welcome to visit and make use of the materials, forums and other assets found on BIRT Exchange. We hope to see you there so don’t hesitate to ask for assistance and/or share your experiences with other developers.

Ray Gans
Community Manager for the BIRT Community

Posted by Sybase IQ at 10:00 PM

Complexity as the Root of Failure? Hear! Hear!

I just read a whitepaper that I rather liked called “The IT Complexity Crisis: Danger and Opportunity” by Roger Sessions, the CTO of ObjectWatch. I haven’t met Mr. Sessions, but I like his theme. He offers the paper, and just as interesting, many of the comments that emerged across the blog-o-sphere, on this blog page.

While written from the perspective of a software development company, perhaps one undertaking a large-scale software development such as large SOA-based applications, the basic thoughts apply more broadly, particularly Mr. Sessions’s reiteration of Glass’s law – that a 25% increase in functionality yields a 100% increase in complexity, and that complexity “seems to track nicely with system failure.”
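To see how quickly that rule compounds, here is a short Python sketch. It assumes the 25%/100% relationship applies repeatedly, which gives a power law in which complexity grows as functionality raised to an exponent of about 3.1. That extrapolation is an assumption made for this example, and the function name is made up.

```python
import math

# Exponent k chosen so that 1.25 ** k == 2, i.e. a 25% increase in
# functionality doubles complexity.
GLASS_EXPONENT = math.log(2) / math.log(1.25)


def relative_complexity(functionality_ratio):
    """Complexity multiplier for a given functionality multiplier.

    Assumes the 25% -> 100% rule compounds as a power law.
    """
    return functionality_ratio ** GLASS_EXPONENT


print(round(GLASS_EXPONENT, 2))              # 3.11
print(round(relative_complexity(1.25), 2))   # 2.0
print(round(relative_complexity(2.0), 1))
```

Under this assumption, doubling the functionality of a system makes it roughly 8.6 times as complex, which is why small additions of scope can cost far more than they appear to.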

Management of complexity has long been one of the very toughest issues in our industry. It’s always been true that “engineering exuberance” for our shiny new creations often blinds us to how complex they are, or may become. Moreover, only the most disciplined can fathom how opaque they’ll be to someone other than ourselves and our teams in years hence.

In tough times, the siren’s song of fast results and quick ROI becomes even more irresistible. Downturns press IT for immediate results, often at the expense of durable, long-lasting architectures. As such, the trap is “sprung” when temptation leads to bad architectures and those decisions become “baked into” the infrastructure.

When evolving BI infrastructure, a common version of this trap arises around the complexity that seems to appear inevitably when using “universal” databases. While lots and lots of organizations chose to base BI on such databases years ago, others find that, while these databases are a tempting platform because they are already owned, they often don’t scale well beyond certain natural limits. Some of this is a complexity issue.

Use of universal databases as BI platforms often sets into motion a series of layered investments [read: complexities] into the BI infrastructure in the form of additional servers, complex transformations, cumbersome ETL processes and large post-processing workloads before analytics and reporting can be run. Dependence on very specific data schemas to avoid performance issues results in the maintenance of numerous materialized views, summary tables and aggregates to deliver solutions. As user needs expand in many directions, pre-computed objects are used in turn for extraction into ever-more-complex OLAP cubes to further assure reasonable BI performance for more complex queries. Wow, that’s a lot of layers, er… complexity.

History amply demonstrates that we *can* conceive and build vastly complex systems, ones where complexity, and hence poor comprehensibility, clobbers maintainability and becomes the Achilles’ heel, sowing the seeds of an “epic fail” as my teenagers would say.

If you’re seeing long backlogs for changing or adding BI applications, a rapidly-growing footprint in the SAN or on the server floor, exhaustion of batch windows for loading and aggregation, or an inability to deliver low-latency analytics to your business users, complexity may indeed be sowing the seeds of future trouble.

Sybase IQ and our ecosystem of BI partners offer an alternative – a straightforward analytics-optimized infrastructure that delivers a flexible data schema, minimizes use of pre-aggregations, runs fast queries and complex analytics together within the database server, and runs on commodity hardware. By eliminating the many interlocking layers of functionality that live atop the universal database when optimized for BI, Sybase IQ reduces complexity and thereby greatly enhances maintainability. For many of our customers, Sybase IQ has meant the difference between a quick implementation followed by a tortured existence, and a quick implementation leading to continued scalability and success.

Give Mr. Sessions’s paper a read – while the geo-political implications are a long way from what we [or at least I] do in BI, the direct relationship between functionality, complexity and failure is a stark reminder of the value of simplicity.

Bill

Posted by Sybase IQ at 9:17 PM