November 24, 2009
A practitioner's view on data integration worst practices
In my last blog, I spoke about some of the best practices in data integration. Today, I'd like to briefly talk about some of the worst practices. These insights come again from Phil Watt, who was the keynote speaker at our Nov 19 webinar on affordable, end-to-end data integration lifecycle management.
Phil stated that constantly changing requirements can make project delivery very difficult. He also highlighted the issue of ignoring reality, meaning that everybody who is actually involved in the project should be seen as a key stakeholder and their opinion should be valued. He said he has often seen people ignore what the delivery team has been saying and insist, "well we got to deliver in 3 months come what may." He went on to discuss a recent situation with a client whose company spent a high proportion of its budget on a traditional data integration tool and then had no budget left to implement an important suggestion that would have made the project much more valuable. So again, it is important to pick an affordable technology that enables you to deliver the most value to your business within your budget constraints.
Phil also mentioned that he has seen projects fail to involve the business early on in the process. He highlighted the fact that sometimes people on the project obsess over trivial details that have little or no impact on the project. And lastly, one of the biggest mistakes one can make is failing to deliver early and often, since regular deliverables keep the project stakeholders interested.
Click here to play back our archived webinar on end-to-end data integration lifecycle management.
- Michael Waclawiczek, VP Marketing
Posted by expressor software at 12:30 PM | Comments (1)
A practitioner's view on data integration best practices
As part of last week's webinar on end-to-end data integration lifecycle management, Phil Watt from Emunio Consulting spoke about best practices in managing a data integration lifecycle based on his experiences as a senior data integration consultant who has worked on many ETL/DI projects over the years.
Phil said that it is essential to have strong business sponsorship and involvement on any meaningful data integration project. He also emphasized the importance of strong governance and stated that one doesn't have to be overly bureaucratic when managing a project but felt that an appropriate level of governance is essential for every "successful" project. In this context, he also underscored the importance of understanding and communicating the roles and responsibilities throughout the team and making sure that the business understands their responsibilities as well.
He went on to say that data profiling and discovery are absolutely crucial; time and time again he has seen companies forgo this step and then discover the problems in their data only when they show their work to the business. He also referred to the need for strong metadata management and the need to adopt best-practice design patterns. He also talked about the typical 80/20 rule, where 80% of the data integration effort can often be accomplished in 20% of the time, so long as the project team focuses on the high-value business requirements first and delivers on less important capabilities as time permits.
Click here to play back the entire archived webinar.
Michael Waclawiczek
VP, Marketing
Posted by expressor software at 12:15 PM | Comments (0)
November 19, 2009
changing the rules of the game
Over the past two weeks we briefed analysts at Gartner Group, Forrester Research and TDWI about our business coming off a strong quarter. End user customers looking for data integration solutions are fortunate to have these analysts. They are smart, work hard and really make an effort to understand IT challenges, particularly as they relate to being responsive to the business. So I asked them all a simple question: "What is the biggest problem out there? Is it handling exploding volumes of data? Is it processing complex data? Is it providing tools that enable best practices? Is it providing usable solutions for non-technical users?" I was hoping for a few yeses, since expressor addresses all of these pain points. I got a different answer - "it's cost." Data integration vendors charge too much for their software!
You'd think there would be an easy answer to that. Let's just drop the price and we'll sell more volume. But software vendors don't think that way. Take the recent dialogue between our VP of marketing and his equivalent at Talend: http://blog.expressor-software.com/data-integration/expressor-and-talend-debate-pricing-models-transparency/. Talend wants to compete with the big guys and play by their rules. After all, data integration is really hard, and customers like to do many POCs over many months and blah, blah, blah. Where's the innovation here?
All vendors should be committed to changing buying behavior by making data integration simple and extremely usable by making the hard things easy and the impossible things possible. And then charge a fair price. My hunch is that more customers will buy and they'll buy quicker.
-- Bob Potter, president and CEO
Posted by expressor software at 12:15 PM | Comments (0)
November 17, 2009
Is your company's process of collecting and analyzing clinical trial study data too manual and time consuming?
Clinical trials play an important role in medical research to answer specific questions about vaccines or new therapies or new ways of using known treatments. They are being conducted to determine whether new drugs or treatments are both safe and effective and are seen as being the fastest and safest way to find treatments that work in people. It is therefore critical for pharmaceutical companies and their partners to be able to efficiently collect and analyze the data resulting from various trial phases as well as data collected from similar studies. Moreover, companies need to be able to integrate this data with detailed patient data to minimize the drug's risks and optimize its usage as well as its commercial success.
Having worked with several prospective customers in this space, it is astonishing to us that the process of collecting data from clinical trials and related studies remains very labor-intensive and time consuming, even though the likes of Informatica have been selling into this market for years. What we see is that most companies are still relying on home-grown and sub-optimal vendor ETL solutions to integrate clinical study and patient data from a variety of sources for reporting and analysis purposes. Due to architectural limitations, converting source data in XML documents and other complex data formats remains a daunting task for today's ETL tools. Users tell us that mapping the XML from sources to targets and delivering the application take an unacceptable amount of time. They also say that the data mappings are very brittle, making it difficult to incorporate additional clinical data sources and to reuse mapping rules within or across different mapping projects. And just ask them whether they are satisfied with the XML data processing performance of their incumbent ETL tool!
Over the past year we have demonstrated to several clients that, unlike other ETL and data integration tools, our innovative semantic data integration system is well suited to overcome many of the data integration challenges they face. We have shown in a number of proofs of concept that our parallel data processing engine can process even the most demanding XML documents at impressive speed. Adding new data sources can be done quickly with our smart semantics approach. And expressor allows them to seamlessly integrate their existing operational systems and databases.
Michael Waclawiczek
VP, Marketing and Product Management
Posted by expressor software at 8:15 AM | Comments (0)
November 11, 2009
after purchasing, please read the documentation
My first job out of college was working for a company that made what they called the 'bag.' The 'bag' was a cellular/GPS device designed to track parolees and warn when they might be doing something they should not.
It was 1995 and I could not have asked for a better first job. The system was completely real-time. The operating system was Windows NT 3.51. We were writing our own 'bag' operating system and here I was right out of school.
Then reality hit. The GPS record format.
Nowadays it might be easier (I have not kept up on it), but 15 years ago it was a binary fiasco. The format used 3-byte and 5-byte integers for altitude, time, latitude and longitude information, among other fields, to reduce the size of the data frame. They did not teach this stuff in school!
For those of you not intimately familiar with integer sizes and classes, no common computer language natively handles odd-length integers longer than 1 byte. That is, 1, 2, 4 and 8 bytes are handled natively, while 3, 5 and 7 bytes must be handled with conversion code. In fact, only two ETL vendors that I know of can handle this format, expressor software being one of them.
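To give a feel for the conversion code involved, here is a minimal sketch in Python. It is not the 'bag' code, and the frame layout is made up for illustration: it assumes big-endian, signed (two's complement) fields, with a 3-byte altitude followed by a 5-byte latitude. The names `decode_int` and `decode_int24_manual` are hypothetical. The first helper leans on Python's arbitrary-length `int.from_bytes`; the second shows the by-hand approach of sign-extending to a native 4-byte width first.

```python
import struct


def decode_int(raw: bytes, signed: bool = True, byteorder: str = "big") -> int:
    """Decode an integer of any byte length, including 3, 5 and 7 bytes."""
    return int.from_bytes(raw, byteorder=byteorder, signed=signed)


def decode_int24_manual(raw: bytes) -> int:
    """Decode a 3-byte big-endian signed integer by hand.

    The value is sign-extended to 4 bytes so that a native 4-byte
    unpack can be used, which is the usual trick in languages that
    only support 1, 2, 4 and 8 byte integers.
    """
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    # If the top bit is set the value is negative: pad with 0xFF, else 0x00.
    pad = b"\xff" if raw[0] & 0x80 else b"\x00"
    return struct.unpack(">i", pad + raw)[0]


# Hypothetical frame (assumed layout): 3-byte altitude, then 5-byte latitude.
frame = bytes([0x00, 0x01, 0x2C]) + bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x9C])

altitude = decode_int(frame[0:3])
latitude = decode_int(frame[3:8])

print(altitude)                            # 300
print(latitude)                            # -100
print(decode_int24_manual(frame[0:3]))     # 300
```

The point of the manual version is that the odd-length field never maps directly onto a native type; something has to widen it and take care of the sign bit, for every such field in every record.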
The reality is that most data integration tools can only handle relational, XML (barely), and delimited/fixed flat files. Sure, they can handle queues and SalesForce but these are all just hacks. If you read the documentation it is easy to see the limitations, and there are many caveats when not using RDBMS and flat files.
In fact, it is not until you actually buy the product that you begin to discover all the things you simply cannot do even though you were told you could. My favorite example, and this is so true, is that most data integration vendors include a statement such as, 'If you need to sort, perform the sort in the RDBMS as it is faster than our sort.' Really? You're kidding right? Well what about this flat file? What about this XML? What about my 3-byte integers? What about my complex HL7, EDI or legacy data?
-------
Dear Customer,
Sorry we were not completely honest.
Regards,
Your Data Integration Software Vendor
p.s. Please come to *****World in the Fall!
--------
Actually, there is really no need to worry about formats like HL7, EDI, and SalesForce data. The reason is that these are all add-on products for which you need to pay even more. And when you buy them and read the documentation you find the truth: they really don't work as advertised.
- John Russell, chief scientist and co-founder
Posted by expressor software at 8:30 AM | Comments (0)
November 10, 2009
expressor uniquely positioned to address 'big data' challenges of SQL Server community
As Michael's observations indicate, PASS Summit 2009 was a very successful trip for expressor. Microsoft SQL Server continues to deliver on the needs of the mid-market. As a technologist, I have always been impressed with Microsoft's commitment to innovation and improvement of its own products. Other attendees and exhibitors at PASS Summit 2009 shared in this enthusiasm.
During the event, several members from the SSIS team 'wandered' over to the expressor booth to learn more. With several patents pending we were unafraid to assist them in this endeavor! As I demonstrated the product they became more intrigued and began using the word 'innovative.' We thank them for their confidence in us!
Many of the attendees we met had similar challenges: growing data volumes, increasing data complexity, increasing metadata management requirements, complex transformations, shrinking processing windows. Despite the fact that many attendees were from the under-served SMB market, many were encountering these 'big data' challenges. These users have outgrown simpler integration tools and have either propagated messy and costly development practices or have even been compelled to license Informatica or DataStage at significant cost. Many have been using SQL Server for transactional systems but are just beginning to develop data warehouses and analytical capabilities. Many are challenged with integrating non-Microsoft sources and targets and are forced to build sub-optimal solutions because native connectivity does not exist in their tools. Many need to write complex transformations but are burdened with the development and maintenance costs associated with business rules that are not reusable. They have a strong desire to manage their data and applications from a semantic layer but don't have the tools to do so. Many require the ability to build flexible and dynamic data integration or ETL applications because the source of data being processed is flexible and dynamic. And they prefer to process their data integration or ETL applications from non-Windows platforms such as Linux but have not been able to find an affordable yet powerful integration platform. While SQL Server is meeting their RDBMS needs, these users are faced with significant and costly challenges in integrating with SQL Server.
Most encouraging is that expressor is uniquely positioned to address the end-user needs of SQL Server (and other RDBMSs) as described above. Our core engine was built for high performance and complex data handling. The types of data volumes described to us at PASS would not represent any challenge to expressor. Our pre-built transformation operators are very powerful and provide many 'record level' data transformation capabilities missing from homegrown solutions and many tools. Our fully-integrated transformation language, expressor datascript, is built for speed and powerful custom transformations at the field level. Our semantic integration model is core to the product and facilitates the re-use of data mappings by using a business dictionary for data integration development; it also allows business rules to be defined and managed independent of any data application but easily re-used wherever needed. And our out-of-the-box repository captures design-time and run-time metadata, allowing for complete visibility into data integration applications and deployments, including impact analysis and data lineage. Most of these inherent features simply do not exist in mid-market ETL tools and exist only as poorly-integrated bolt-ons to enterprise-priced ETL products. Most important to the SQL Server users and attendees at the PASS Summit 2009, these are all core features in the expressor solution, are designed to work together, and are affordable.
We are extremely excited about our ability to address ETL and data integration needs with SQL Server customers.
Steve Frechette, VP engineering
Posted by expressor software at 7:30 AM | Comments (0)
November 6, 2009
SSIS users positive about expressor
We've just returned from a very successful trip to the PASS Summit 2009, where we were gold sponsors and exhibitors at this premier Microsoft SQL Server conference. Overall we were very impressed with the quality of the audience and excited about speaking with dozens of SSIS users during our two-day exhibit. What we've found is that a sizable percentage of Microsoft SSIS users (30 - 40% in our estimate) could greatly benefit from our software solution as they encounter performance, data transformation complexity and/or data connectivity issues with SSIS. We gave many expressor product demos and felt very good about the positive response we got from various users, including a couple of Microsoft SSIS developers, who swung by our booth to check us out. We've also learned that none of these SSIS users would ever go back to hand coding their ETL jobs, nor would they consider using any of the high-priced traditional ETL/DI systems such as Informatica or DataStage. This opens up a great opportunity for our affordable, high-performance expressor system to enable SSIS customers to augment their data integration tasks with expressor where appropriate or upgrade to expressor if one of their existing applications is 'running out of gas.' Either way, expressor is the perfect solution for SSIS customers to turbocharge their ETL applications in a Microsoft-friendly development and deployment environment.
For more on expressor for SSIS users, download the brochure we prepared for this event.
- Michael Waclawiczek, VP marketing
Posted by expressor software at 12:45 PM | Comments (0)
November 4, 2009
A new edition of Flow-Based Programming is in the works
Yesterday I learned that J. Paul Morrison has begun work on a new edition of his 1994 book Flow-Based Programming. Morrison began working with flow-based programming systems (FBPs) at IBM around 1969. He has developed several FBPs, including Advanced Modular Programming System (AMPS, 1969-70) and Data Flow Development Manager (DFDM, late 1980s, with Wayne Stevens). Flow-Based Programming touches on nearly all of the issues important to FBPs, such as parameterization, deadlock avoidance and buffering, checkpointing and transactional behavior and much more. The book does miss a few topics; data partitioning isn't mentioned, and data modeling is not given the attention it deserves. But overall it's a terrific book.
Morrison's book is a great reminder that the ideas behind today's FBPs have been around for a long time - because they work!
The full text of the first edition of the book is available on Morrison's web site, in PDF and HTML formats.
-- Jerry Callen, engineering
Posted by expressor software at 2:30 PM | Comments (0)
