December 15, 2009
Semantic rationalization blog series: part 3 - the specifics of abstraction
In my last blog, I finished up my thoughts on the importance of involving non-ETL-developers in the data integration process. Now let me talk more about how expressor addresses the challenges involved in accomplishing this goal.
We have patented a number of concepts related to our solution, describing a business abstraction layer where the steward, analyst, and developer can work together and contribute activities that utilize their individual skill sets, all through graphical interfaces. The curve ball here is that although expressor is based on this forward-thinking architecture, we must also allow a single individual to perform all these activities in the event that his or her organization is not ready for 'pipelined' development or needs to execute 'quick and dirty' projects.
The remainder of this blog focuses on the functionality we believe is necessary to achieve the 'abstraction layer' described above.
Fundamental to this approach is the need to refer to data items by their business names rather than their physical names. This capability allows us to lower the 'communication impedance' between the technical staff involved in developing the solution and the business staff involved in defining the requirements, validating the implementation and using the resulting solution.
expressor provides multiple ways for the steward role to introduce business definitions into our metadata repository. The expressor initiator, our metadata bulk loading tool, provides mechanisms to either bulk load definitions from a logical model or organically grow the business's ontology through the available database schemas. The expressor administrator, a Web-based GUI tool, provides an interface to enter definitions and terms in a transactional model. Regardless of the approach chosen, the system 'learns' how the organization defines its important data and, over time, is able to recognize new external data by its business definition rather than its physical description.
There are a number of internal mechanisms involved in this process - from simple matching to Soundex-type analysis, to combinatorial analysis, to proximity searching. They are used to identify possible correlations between the external names and business definitions, to evaluate the likelihood that the names are related, and to recommend which relationships are most likely the correct ones. The true power of this functionality becomes evident as more data is added to the system in subsequent projects, but it is also evident after the initial project.
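To make the idea of name matching a little more concrete, here is a minimal Python sketch of how a physical column name might be scored against business terms. It is not expressor's implementation: the function names (`soundex`, `score`, `recommend`), the equal weighting of string and phonetic similarity, and the sample names are all assumptions made for illustration. It combines only two of the techniques mentioned above, simple matching and a standard Soundex code.

```python
import re
from difflib import SequenceMatcher

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code for a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


def normalize(name):
    """Lower-case a name and turn underscores/punctuation into spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def score(physical, business):
    """Hypothetical likelihood (0..1) that two names refer to the same item."""
    p, b = normalize(physical), normalize(business)
    if p == b:  # simple matching
        return 1.0
    string_sim = SequenceMatcher(None, p, b).ratio()
    p_codes = {soundex(t) for t in p.split()}
    b_codes = {soundex(t) for t in b.split()}
    phonetic = len(p_codes & b_codes) / max(len(p_codes | b_codes), 1)
    return 0.5 * string_sim + 0.5 * phonetic  # assumed equal weighting


def recommend(physical, business_terms, top=3):
    """Rank business terms by score, best candidate first."""
    ranked = sorted(((score(physical, t), t) for t in business_terms),
                    reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    terms = ["Customer Name", "Order Date", "Product Code"]
    for s, term in recommend("CUST_NAME", terms):
        print(f"{term}: {s:.2f}")
```

In a sketch like this, the recommendation is only a ranked list; a steward would still confirm or reject the suggested relationship.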
That's it for today. In the next blog, I'll expand on the benefits of our abstraction layer.
- Michael Ruland, field engineering
Posted by expressor software at 10:00 AM | Comments (0)
December 14, 2009
Unix as a platform for DI is fading quickly
I am spending significant time and effort right now talking to our customers, partners, and prospects about their computing needs for 2010 and beyond. One of the questions I ask them in my surveys is how important Unix is for running their data integration applications. And what I am hearing loud and clear is that most current Unix shops have already moved or are planning to move to Linux during 2010.
As you would expect, they are planning to transition away from Unix for a variety of reasons, most of which have to do with hardware licensing and maintenance costs, in addition to the fact that Linux servers are now being offered by all major hardware vendors. And the wide choice of Linux offerings gives companies plenty of options to choose the best price/performance platform for their data processing needs.
This is good news for innovative data integration vendors like expressor who prefer to use their engineering resources for new feature development rather than in support of obsolete hardware platforms. Our product strategy for 2010 is to continue to support expressor on Sun Solaris and IBM AIX platforms - but we strongly encourage our customers and prospects to move to Linux during that timeframe.
- Michael Waclawiczek, VP marketing
Posted by expressor software at 10:15 AM | Comments (0)
December 8, 2009
Semantic rationalization blog series: part 2 - business rules and the Semantic Web
In my previous blog on semantic rationalization, I introduced you to our semantic rationalization concepts, our goals to make data integration more affordable and easier to use, and I drew an analogy between pipeline processing and collaborative development. I also highlighted the fact that the ETL developer often tends to be the bottleneck on an ETL project, which is why we believe it is so important that other user roles can contribute to the project.
Another domain of tasks that could benefit from contributions of non-ETL-developers is management of rules used during the data integration project. The holy grail of ETL tools, tracing back to Bill Inmon's Prism Solutions days, was the idea of having business analysts create the 'business rules' that would drive the ETL process. Unfortunately, this has proven beyond the capability of today's ETL technologies. Most tools require the business analyst to learn C, Java, ksh, or SQL to build these rules, an unrealistic expectation in most business environments, where Excel is the 'lingua franca.'
While there are business rule engines available now, these typically don't easily integrate with ETL tools, technically or 'religiously.' Even if we can't provide an interface for the business analyst to develop rules (I believe that this is, in fact, technically feasible), we should at least provide an interface that allows analysts to review and validate the rules created by the developer or other analysts - but rules engines typically can't help with this.
At this point (assuming you've read my previous blog on this topic) I hope you now have a reasonable understanding of the problems expressor is trying to address through semantic rationalization. On the surface, they are significantly different than the problems addressed by the use of semantics in Web 2.0, which may seem unfortunate, given that the names are so similar. In fact, on closer inspection, the problems are remarkably similar, and expressor's solution might eventually be used in combination with the semantic Web approach in the context of data integration. But that is a topic for a different, more abstract discussion.
So enough theory for now. Next time, I'll start digging into the actual mechanics of the expressor semantic data integration system.
- Michael Ruland, field engineering
Posted by expressor software at 1:00 PM | Comments (0)
what's so magic about Gartner's latest Magic Quadrant update?
Gartner recently updated their Magic Quadrant (MQ) for data integration tools and with the exception of one new vendor, who was included based on their free download open source business practices, little has changed, including the top right-hand quadrant. As you would expect, the MQ leadership quadrant continues to be controlled by Informatica and IBM DataStage, who've become the safe bets, not because these tools are innovative and/or affordable, but because they represent the status quo.
I have stated in a number of earlier blogs that there is no doubt in the eyes of industry analysts, pundits, and vendors alike that the tides are changing in favor of much more affordable data integration tools like expressor. Based on that premise, you should expect that the MQ picture will look totally different three years from now.
Back to Gartner's MQ: the MQ inclusion criteria are that a vendor must generate at least $20 million of annual software license revenue from data integration tools or maintain at least 300 maintenance-paying customers for their data integration tools. Although expressor is aggressively expanding its customer base on a quarter by quarter basis, neither of these two inclusion criteria applies to us yet. To accelerate the process we could always open-source our code and charge for support, but would that help us execute on our vision? I don't think so. Our company's mission is to truly redefine data integration by developing and marketing a next-generation data integration system based on breakthrough usability and a metadata-driven semantic foundation. And in our opinion, open source and breakthrough usability just don't go together well! Our end user is not a Java developer.
Our goal is not to develop yet another me-too product but a revolutionary system with a totally new data integration paradigm that overcomes the fundamental deficiencies of today's ETL and data integration tools (including the open source tools). In this context, let me highlight some of the key issues with today's technologies that we are addressing with our solution:
I could go on telling you about all the limitations of existing ETL and DI technologies. It's hard to believe, but even the DI tools in the MQ leaders quadrant still have issues handling complex (XML) data and are challenged by throughput performance tests. So you ask why? The short answer is that today's leading data integration tools (Informatica, etc.) have been built on 15-year-old architectures for metadata management and data processing. It's as simple as that!
In case you didn't know, expressor was recognized earlier this year by Gartner as a Cool Vendor in Data Management and Integration due to our strong, innovative metadata-driven solution. Gartner analysts are very favorable towards our unique, patent-pending approach to data integration, and we brief them frequently on our product roadmap and aggressive customer acquisition goals for this year and 2010. Watch our space.
- Michael Waclawiczek, VP marketing
Posted by expressor software at 12:45 PM | Comments (0)
December 4, 2009
Semantic rationalization blog series: part 1 - philosophy and approach
Over the next few weeks, I'll be posting a series of blogs here to answer the most frequently asked question we get here at expressor: 'What is semantic rationalization?'
It's obviously a big differentiator for us - but it's also a concept or phrase not easily parsed to determine a meaning. So we hope you find this helpful.
In this first installment, I'll discuss the philosophy and approach behind our vision for semantic rationalization, then dive into more detail in subsequent entries.
Semantic rationalization, from the expressor point of view, consists of the mechanisms required to construct a business abstraction layer in which multiple user roles can contribute to delivering and maintaining a data integration application. Our branded marketing term for this is 'smart semantics.' This is a very different concept than that employed by earlier generation ETL tools - and it is fundamental to how we intend to make data integration simpler.
To understand our approach we need to first understand the business and technical goals that expressor is targeting and then dissect the functionality involved in building the solution. Our primary business goal is to make data integration much more affordable. Most people we talk to already agree that our revolutionary pricing model - which is based on the business value delivered and the hardware it runs on - has achieved that objective.
Our fundamental technical goals are to make data integration significantly easier and to allow more individuals with differing business experience to participate in the data integration process. An analogy here is the way that technology solved the challenge of the data volume explosion by including parallel processing in ETL tools. Just as pipeline parallelism allows multiple processors to work independently on their specific tasks while contributing to complete the final product, allowing multiple participants to work and contribute their individual expertise in parallel during the development of a data integration application helps them build better applications more quickly.
If we look at the functionality involved in delivering a data integration application and where the current ETL / data integration technologies tend to bottleneck, we see that most often the ETL developer is the limiting resource in the process. So it's not surprising that vendors like us are focused on increasing the developer's ability to deliver solutions faster - very much like hardware and chip manufacturers are focused on developing faster processors. This is a good thing - improving the developer's productivity is very important since these folks are usually very expensive and good ones are in limited supply. But it's not the whole story.
Using another technology analogy, if we look at how processors were sped up, we find that inside the typical CPU there are a number of specialized units performing a myriad of support tasks for the instruction processor. There are units that decode addresses, pre-fetch instructions and data, predict execution logic, etc. - all of which are vital to improving the CPU's overall performance.
The data integration process can be improved in the same way. We can envision having someone who is responsible for the data (a data steward role) defining the business name of a particular data item, as well as how that data is used in the business. This concept is remarkably similar to the idea of master data, and just as with master data management, a common mistake is assuming that all data is critical and needs to be very tightly managed - resulting in a massive effort before a project can begin.
We believe that a much better approach is to allow this ontology to grow organically over time. A key requirement for turning this approach into reality is the ability to change previous decisions and/or correct mistakes easily. This is actually a much more complex requirement than one might typically imagine. It brings up questions like 'what happens to data integration decisions previously made when the definition about the way a datum is used in the business is changed?' Clearly a sophisticated impact analysis mechanism is required along with a well-defined scope on implementing any updates necessitated by the change.
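As a rough illustration of what impact analysis means here, the following Python sketch walks a dependency graph to list everything downstream of a changed business definition. The graph structure, the `impacted` function, and the artifact names are hypothetical and chosen only for this example; they do not describe how the expressor repository actually stores or traverses its metadata.

```python
from collections import deque


def impacted(dependents, changed):
    """Return every artifact reachable from `changed`, in breadth-first order.

    `dependents` maps an artifact to the artifacts that directly depend on it.
    """
    seen = {changed}  # guards against cycles leading back to the start
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


if __name__ == "__main__":
    # Hypothetical metadata: a business term, the mappings and rules that
    # use it, and the jobs built on those mappings and rules.
    dependents = {
        "term:Customer Name": ["mapping:crm.CUST_NM", "rule:trim_customer_name"],
        "mapping:crm.CUST_NM": ["job:load_customer_dim"],
        "rule:trim_customer_name": ["job:load_customer_dim", "job:dedupe_customers"],
    }
    for artifact in impacted(dependents, "term:Customer Name"):
        print(artifact)
```

The list such a traversal produces is only the starting point; deciding which of the affected artifacts actually need to be updated is the "well-defined scope" part of the problem.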
On that point, let me wrap up this entry. Next time, I'll discuss the creation of business rules and look at the connection between semantic rationalization and the semantic Web.
- Mike Ruland, field engineering
Posted by expressor software at 11:45 AM | Comments (0)
December 1, 2009
new customer: Sybron Dental Specialties
Today we announced that Sybron Dental Specialties has chosen the expressor semantic data integration system to improve the quality of its customer data warehouse.
Here's an excerpt:
'We were impressed by expressor's ability to quickly integrate the wide variety of customer data we need to consolidate from our distributors around the world,' said Ron Malerstein, VP of sales, SDS. 'expressor's affordable pricing model also made it possible for us to get started today - with a lower total cost of ownership than any other solution we considered.'
Posted by expressor software at 9:30 AM | Comments (0)
why does the market need yet another generation of data integration tools?
Last week I was asked why the market needs yet another generation of data integration tools when so many G2000 companies have already standardized on a 'best-of-class' data integration solution from the likes of Informatica or IBM. This is a fair question, but it reminds me of the '90s, when companies began bringing Unix servers in house after they had already standardized on IBM mainframes or similar 'computing monsters.' It's true that IBM mainframes are still around and won't go away anytime soon - but server-centric computing has become the norm rather than the exception in today's IT environment.
So why did server-centric computing succeed and replace the big monolithic mainframe? For the most part because of better price/performance, although other reasons, like newer, more open OS standards such as Unix and Linux, played an important role as well.
A similar argument can be made for next-generation data integration products like expressor, which deliver significantly better price/performance than the likes of Informatica. At a quarter or a fifth of the price of Informatica, you get even better processing performance than PowerCenter. We have proven this claim at several accounts now, where we beat Informatica both on data processing performance and on our ability to handle emerging data formats such as complex, hierarchical XML documents much more easily and quickly than they do.
Without doubt there will be a shift in our market over the coming years. The current status quo of selling high-priced data integration software will come to an end and will remain acceptable only for very special, high-end, mission-critical applications, where companies will continue to play it safe and go with mature technology.
IT buyers are beginning to recognize that they don't have to buy these expensive products any longer for many or most of their data integration projects. And when that mindset kicks in across the board, the market will rapidly move away from the mainframe mentality of buying, developing, and deploying data integration technology. Buyers will come to realize that smaller, more modern, easier to use, and affordable data integration technology is the way to go for most of their projects.
Welcome to the 'server-centric' world of ETL and data integration. It's around the corner! And welcome to expressor.
- Michael Waclawiczek, VP marketing
Posted by expressor software at 9:15 AM | Comments (0)
