BeyeBLOGS


February 1, 2007

Ab Initio Profiler

I read the documentation for the Profiling tool from Ab Initio. I've worked with their ETL tool before, but I haven't had the chance to use the Profiler. They may be coming for a demo in the next month or so, and if it looks useful, we will try to convince the people using Ab Initio to pick up the Profiler as well.

Apparently, it not only collects statistics but also analyzes dependencies and correlations within and across data sets, and it can generate transformation code to use for validation.

Looking through the docs, the only thing I can think of that it's missing is trend analysis, and that may just be because I missed it.

Added Feb. 1, 2007:

We had the demo; see the next entry. I realized that I didn't say much about the Profiler in that entry, and it seems more relevant here.

The Ab Initio Data Profiler is better than others I have seen. You can profile an entire dataset, or a sample thereof (with parameters), or you can profile a single run of a graph against that data; allowing you to schedule the heavy load of a full profile, and then keep it up to date piece by piece.
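To make the full-versus-sample distinction concrete, here is a minimal sketch in Python of what a column profiler does; the function name, the statistics chosen, and the sampling parameter are my own illustration, not Ab Initio's implementation.

```python
import random

def profile_column(values, sample_rate=None, seed=0):
    """Collect simple profile statistics for one column.

    sample_rate=None profiles the full column; a float in (0, 1]
    profiles a random sample, trading accuracy for a lighter load.
    """
    if sample_rate is not None:
        rng = random.Random(seed)  # fixed seed so reruns are comparable
        values = [v for v in values if rng.random() < sample_rate]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```

The scheduling idea above maps onto the same split: run the heavy full profile once off-hours, then rerun against a sample or just the new data to keep it current.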

We did run into problems with our initial profile runs on some data, but that was because the server we run the Profiler from is in a different location from the data, and we were saturating the pipe between them. We were able to throttle the process to keep from hogging the pipe, but that makes things run quite slowly.

What was really impressive was that we got a preview of the demo about three or four weeks prior, and my manager asked about analyzing trends between runs of the incremental profile, and using that trending to determine, in the map, whether we wanted to load the data to the target. They weren't able to do it at the time, but when they brought the tool in for the demo this week, they had a way to accomplish that task.

I know, it was probably something they were already working on, but it was impressive nonetheless.
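As a rough sketch of the kind of trend-based load gate my manager asked about, assuming profile runs that report record and null counts (the threshold and field names here are hypothetical, not anything Ab Initio showed us):

```python
def should_load(prev, curr, max_null_jump=0.05):
    """Gate a load on the trend between two profile runs:
    refuse to load if the null rate jumped by more than
    max_null_jump since the previous run. (Hypothetical rule
    and field names, purely for illustration.)
    """
    def null_rate(profile):
        return profile["nulls"] / profile["count"] if profile["count"] else 0.0
    return null_rate(curr) - null_rate(prev) <= max_null_jump
```

With a gate like this, a run whose null rate climbed from 1% to 20% would be held back for review instead of loaded to the target.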


Posted by RDM at 9:45 AM | Comments (5)

Ab Initio Demo

The folks from Ab Initio were here Tuesday to give us a demo of their product.

It was a great demo. I have worked with Ab Initio before, and all the great things I remember liking about it were still there, plus a few new features.

After working with both Informatica and Ab Initio, I have to say that I prefer Ab Initio for several reasons.
Ab Initio is easier to work with.
Informatica has the PowerCenter Designer, where you put together the mappings of the data from source to target and enter the business rules for transformation. But to define a source, there is another "tab" you have to switch to; targets are defined on another tab; and the reusable pieces of maps (called mapplets) are on yet another tab. Then the connection between a logical source definition in the tool and the actual table or file on the machine is in a completely separate tool, called the Workflow Manager. Then, when you run the thing, there is yet another application that you use to monitor the execution. If you want to see the data you are operating upon, you have to use some other tool to get to it (I use TOAD to see the data we have on Oracle, Teradata SQL Assistant, and QMF for our DB2 source). If you want to see what each individual component is up to? You are out of luck. *If* you can get the debugger to run, you might be able to track what the components are up to, but beware: if you have too many components on the map, the debugger won't even load. In Informatica, parallelism is left to the physical hardware implementation at the network level. (That is, to get parallelism in Informatica, you need more than one server to run it on.)

With Ab Initio, it is all in one place. You drag and drop the components in the window of the Graphical Development Environment (GDE); there are database table components, file components, and myriad transformation components. Then, to define the input columns, you can import DDL, or double-click on the component and use a text edit mode to enter it. Same thing for the output definitions. Plus, you can put a URL or the database connection information into the component and actually browse the data you are defining the DDL for. The tool gives you a visual indication when the information in the component is not complete enough for the graph to execute. Then, when you are ready to run it, click on a button and it starts to execute; no window switching. Plus, you can see the record counts as each component processes, so you can see which components are working as expected, or whether there are bottlenecks in your process. And that's not even mentioning the debugger, which is a quantum leap beyond the execution data. Another huge advantage of Ab Initio is that parallelism is built in at the graph level. There are components to "Partition" and "Departition" a data flow, which allow the programmer to insert parallelism into the process at the graph level. And between graphs, or between checkpoints within graphs, you can land your data flows to "multifile" data sets on disk.
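The partition/transform/departition pattern can be mimicked in plain Python. This is only an analogy to show the shape of the idea, not Ab Initio's runtime; the round-robin split and the thread pool are my own choices for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_round_robin(records, ways):
    """Split one flow into `ways` flows, like a Partition component."""
    flows = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        flows[i % ways].append(rec)
    return flows

def run_graph(records, transform, ways=4):
    """Partition a flow, transform each partition in parallel,
    then departition (gather) back into one ordered flow."""
    flows = partition_round_robin(records, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        done = list(pool.map(lambda flow: [transform(r) for r in flow], flows))
    # Departition: interleave the flows back into the original order.
    return [done[i % ways][i // ways] for i in range(len(records))]
```

In the real tool the degree of parallelism and the landing of intermediate flows to multifiles are properties of the graph itself; here they are just function parameters.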

There is even more, especially when it comes to metadata and the tools related to it, but I'm afraid my posting would stray too far toward a rant at that point.

We will not be going whole hog to Ab Initio, because we have significant sunk cost in Informatica. But we will be using Ab Initio as much as we can (sharing an implementation at the Corporate office); especially for metadata.


Posted by RDM at 9:45 AM | Comments (0)