« Sybase IQ 15.1 proves its mettle | Main | Sybase in Magic Quadrant for Data Warehouse Database Management Systems »
February 26, 2010
Sybase IQ Data Loading – a Multitude of Speedy Options
Sybase IQ Data Loading – A Multitude of Speedy Options
We tout the high performing query processing and analytics capabilities in Sybase IQ, but Sybase IQ also excels when it comes to loading data. Here is a picture that shows the wide variety of options that are available to you:

Along the bottom of the picture are the various data sources you can load data from. Above the data sources, are the different loading methods. If a method is positioned above a data source, then it can load data from that source. For example, the ”INSERT...LOCATION” method is designed to load data directly from databases. ETL can load data from either files or databases. The color of the box indicates the relative speeds of the loading method – red is suitable for smaller tables, yellow is faster for larger data sets, green is very fast, and blue is the fastest. There are a few options that are slow, and that you should avoid if you have performance requirements. ”INSERT...VALUES” is a SQL insert statement that loads data a row at a time. Sybase’s columnar architecture lends itself to threaded loads that load columns individually and in parallel. ”INSERT...VALUES”, ”LOAD TABLE (sequential)”, and direct trickle loads from RepServer are row based operations, and do not take advantage of CPU threading.
”LOAD TABLE” is the suggested loading approach for fast batch loading from flat files. With ”LOAD TABLE”, Sybase IQ can sustain 15 – 30GB of raw data processed per CPU core per hour. An 8 core machine should average 120 – 240GB/hour of raw data loaded.
”INSERT...LOCATION” is another recommended method that performs very well. This is like a SQL insert command that opens an Open Client connection to a remote server, and inserts data based on select criteria. It is not as fast as ”LOAD TABLE”, but does not require the disk space and time to export data to files prior to loading.
ETL (Extract Transform and Load) is a good option if you need to transform data prior to loading. Many commercial ETL/ELT products work with Sybase IQ using standard interfaces. And Sybase ETL is specifically tailored to loading Sybase IQ from heterogeneous sources, utilizing the high performing bulk ”LOAD TABLE” command.
To load data incrementally as it changes, RepServer is the recommended option. RepServer is Sybase’s mature data replication technology. When combined with a staging area, micro-batch loading can be employed to make this method both powerful and fast.
Finally, for those ultra low latency scenarios that exist particularly within the financial services industry, there is Sybase RAP – Real-Time Analytics Platform. With RAP, data is captured from real time data streams and batch loaded into the RAPStore, which is based on Sybase IQ. RAP has been load tested at 1,000,000 messages/second on IBM Power6 8-core servers.
Customers tell us that over half the development effort for an analytics server is in setting up the processes to load and maintain state of the data in the store. With a broad array of options, this task should become a smaller fraction of the effort, allowing you to focus on gathering the business intelligence that you value.
Until later,
Courtney Claussen
Posted by Sybase IQ at February 26, 2010 11:36 PM
