BeyeBLOGS

May 29, 2008

Did I get data quality in my data migration?

What IS data quality?

Your project is complete. You have carefully spent your budget and countless man-days on moving all that data from across your organization into your new database. You had a team of Data Migration specialists cleaning that data as it went across your network to its new home. You can relax now, right?

Well, maybe. In some ways, your organizational data repositories (and that can mean databases, documents, spreadsheets, flat files, web pages, external data providers and so on) are a bit like Switzerland: a miraculous coming-together, over many years, of a very diverse population with different perspectives, beliefs and languages that forms a harmonious whole. This population lives under a common identity, yet it has retained its traditions, foods, languages and perspectives. That is what diversity is all about, and it is one of the things that makes Switzerland, and perhaps your company, such a strong, unified whole. However, with diversity (in this case, data diversity) comes complexity.

So what's my point? Good question. I find that during large (or small) migration projects, it is often assumed that Data Quality is part of what is delivered. However, in an environment such as the one above (which is all but inevitable in your organization over time), the data in all those different repositories contains redundancies and... let's just say... data that lacks the accuracy one should expect.

In general, no matter how good or well-intentioned your hard-working Data Migration specialists are, they can only do so much with the tools they have. In a 'vanilla' data migration project, they probably don't have the bandwidth (i.e. time or budget), or possibly even the technology, to really DO data quality within a reasonable time period. 'Vanilla' Data Migration itself IS, by its very nature, extremely complex, and out-of-the-box ETL (or ELT, EAI ... Exx) tools are not generally equipped to do geocoding, intelligent duplicate elimination and other hardcore data quality processes without substantial, innovative customization.

Data migration architecture is one of my specialized disciplines, and I focus on process quality as my first deliverable. However, in all of the projects I have worked on, there simply has not been a budget for true data quality transformation. I suspect that in most cases, clients believe data quality is implicitly delivered as part of the project. Unfortunately, despite our best efforts, and apart from the standard data quality processes we data migration developers apply, this is not the case in most projects. Data Quality is often a project in itself.

The (simplistic) CUSTOMER example

Certain data quality issues can be resolved using vanilla tools. Take simple redundancies caused by slight variations in a customer's name, for example:

Wade Walker
Wade WALKER
wade walker

can be resolved. However, add in some typos, phonetic spellings, phonetic differences between languages and abbreviations, and things become a bit more complex:

Waid Walker
Wadi Walker
W. Walker
Wade Wlker
Weed Wacker (I always hated that one!)
...you get the idea.

...and yes - I have seen all of these bastardizations of my name - and more - appearing on envelopes addressed to me over the years...
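To make the distinction concrete, here is a minimal Python sketch, assuming that "vanilla" handling means simple trimming, whitespace collapsing and case folding. The function names are hypothetical. The first group of names collapses to a single match key; the variants in the second group do not.

```python
# Hypothetical illustration of the simple normalization most ETL tools
# can do out of the box: trim, collapse internal whitespace, fold case.

def normalize_name(name: str) -> str:
    """Return a crude match key: case-folded, with whitespace collapsed."""
    return " ".join(name.split()).casefold()

def group_by_key(names):
    """Group raw name strings under their normalized key."""
    groups = {}
    for raw in names:
        groups.setdefault(normalize_name(raw), []).append(raw)
    return groups

simple_variants = ["Wade Walker", "Wade WALKER", "wade walker"]
harder_variants = ["Waid Walker", "Wadi Walker", "W. Walker",
                   "Wade Wlker", "Weed Wacker"]

print(len(group_by_key(simple_variants)))                    # 1
print(len(group_by_key(simple_variants + harder_variants)))  # 6
```

Case folding alone gets the first three down to one customer, but every typo, phonetic spelling or abbreviation still produces its own "customer".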

Standard data migration tools don't normally have the built-in processes or intelligence to deal with such variations without extensive customization. The address information for a given customer can work wonders for recognizing duplicates... but of course, addresses have their own variations, obsolete data, typos and so on, which complicate things further. But at least, we can say that we probably have a substantial improvement!
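As a rough sketch of why address data helps, the Python below scores two customer records on both name and address using Levenshtein edit distance. The 50/50 weighting, the 0.85 threshold, the normalization and the sample addresses are all hypothetical choices for illustration; dedicated data quality tools typically go much further (phonetic keys, reference address data and so on).

```python
# Hypothetical sketch: flag probable duplicate customers by combining a
# name similarity and an address similarity. Weights and threshold are
# illustrative assumptions, not values from any particular tool.

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]

def normalize(text: str) -> str:
    """Fold case and collapse whitespace: the 'vanilla' part of the job."""
    return " ".join(text.split()).casefold()

def similarity(a: str, b: str) -> float:
    """1.0 for identical normalized strings, falling towards 0.0 as edits grow."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def is_probable_duplicate(rec1, rec2, name_weight=0.5, threshold=0.85):
    """rec1 and rec2 are (name, address) tuples; True if the score reaches threshold."""
    score = (name_weight * similarity(rec1[0], rec2[0])
             + (1 - name_weight) * similarity(rec1[1], rec2[1]))
    return score >= threshold

a = ("Wade Walker", "12 Rue du Lac, Geneva")
b = ("Wade Wlker", "12 rue du Lac,  Geneva")     # name typo, messy address
c = ("Wade Walker", "7 Bahnhofstrasse, Zurich")  # same name, other address

print(is_probable_duplicate(a, b))  # True
print(is_probable_duplicate(a, c))  # False
```

The address evidence lets a misspelled name still match, and stops two different people with the same name from being merged.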

This is why the leading migration tools have formed partnerships with data quality vendors, whose products actually become plug-ins to the vanilla migration product. Now we are talking about the ability to "clean" / "scrub" our data. Some of the vendors offer separate complementary products that audit data to determine content and assess data quality. Unfortunately, more often than not, these are either eliminated due to cost constraints, or the client hasn't been offered the option!
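To give a feel for what such an audit (often called data profiling) reports, here is a toy Python version. The column names and sample rows are invented for illustration; commercial profiling products typically report far more than this.

```python
from collections import Counter

def profile(records, columns):
    """Per-column counts of missing values, distinct values and the top value."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0] if counts else None,
        }
    return report

# Hypothetical customer extract with the usual inconsistencies.
customers = [
    {"name": "Wade Walker", "city": "Geneva", "postcode": "1201"},
    {"name": "Wade WALKER", "city": "geneva", "postcode": ""},
    {"name": "W. Walker",   "city": "Genève", "postcode": "1201"},
    {"name": "Waid Walker", "city": None,     "postcode": "1201"},
]

for column, stats in profile(customers, ["name", "city", "postcode"]).items():
    print(column, stats)
```

Three spellings of one city showing up as three distinct values is exactly the kind of finding a profile is meant to surface before anyone budgets the cleansing work.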

I think I've said enough on the subject for now. I would be interested in your comments.

Wade Walker
Methodata
WEBSITE : www.methodata.com

Posted by Wade Walker at 9:34 AM

Introduction

Generally, it seems that Switzerland is a couple of years behind North America in terms of technology, at least from the perspective of Business Intelligence.

I have been working in the field since 1999, and moved to Switzerland in 2001. Since then, the field of BI seems to have matured substantially, with most large organizations now having some form of fairly sophisticated BI infrastructure.

In 2004, I started a small consulting company here named Methodata (www.methodata.com), which specializes in Business Intelligence, Data Migration and Analytics.

In this blog I hope to discuss issues that I encounter working with my clients, consultants and colleagues.

In addition, I would like to explore some other topics that interest me, such as the link between the now-growing field of Web Analytics, the Data Warehouse, and the Web-enabled Data Warehouse.

I hope that, with your comments, this blog will become a valuable forum, both for its readers and for me.

Posted by Wade Walker at 9:34 AM

Web Analytics, BI and a disappointing viewpoint...

In the interest of extending the services my company can offer its clients, I am currently taking a university course on Web Analytics.

In the readings for the course, I came across a statement that, in my opinion, can only be the product of tunnel vision: one of the most short-sighted and fundamentally erroneous statements I have seen in some time. I was surprised to see it quoted with such apparent enthusiasm in our readings.

[Quote credits: content contributed by Jim Sterne, Target Marketing of Santa Barbara; edited by Erika Lindroth, The Weather Channel Interactive, Inc.]

"At the 2005 Emetrics Summit in London, Bob Chatham from Forrester Research described what it means to be the key. He told the assemblage that we are the leaders of tomorrow – and he wasn't just preaching to the choir to curry favor – he made sense."

"Chatham told us that 'web analytics' would eventually be subsumed into business intelligence, thereby changing the game. Instead of giant data warehouses being sifted in hopes of finding patterns, it would be the likes of us web analysts in charge. Having been immersed in the fine art of process optimization, we would be the ones calling the shots."

"We have exercised and built up our muscles optimizing prospect acquisition and lead management. We are optimizing prospect persuasion and conversion. We are tweaking customer services and drilling down to root causes for so many processes across so many departments and divisions that we are in a unique position to know what makes the customer and the company tick. That, said Bob Chatham, is what will make us excellent candidates for the executive ladder over time."

"Instead"? Sooooo...Web Analytics is BI X.0? Is Web Analytics really going to revolutionize the art of Business Intelligence so significantly?

I think this is an excellent example of what happens when someone seen as a leader in a field becomes too engrossed in what he is evangelizing...he becomes blind to the bigger picture.

The fact is that Web Analytics, though impressive in its power to aggregate user behaviour and use it to optimize website profitability, is by nature a limited field. You are able to track user behaviour (generally anonymous at that) through a single customer-facing channel. Web Analytics only adds value to the web channel.

"Giant Data Warehouses", however, are repositories of cross-organizational data, in most cases extracted from as many as hundreds of disparate data sources (legacy systems, ERPs, CRM systems, finance, operations, HR, desktop apps, web services, external sources) and loaded into a database with a very specific architectural design, optimized to return query results over huge amounts of data very quickly.

Further, this data will certainly have different meanings across an organization. What does "Customer" mean? How do we define it? Part of the process is to work closely with the business to agree on common definitions of business entities, so that all that data, with all its depth, breadth and richness, rests on common meanings agreed to by key stakeholders. We can mine the data to identify unknown customer segments. We can do Predictive Modelling. That is some powerful Business Intelligence.
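A minimal sketch of that idea, assuming two hypothetical source systems that record customers differently: each source gets its own mapping onto one agreed definition, and the business rule is then applied once, to the common form. The field names, status codes, company names and the rule "a customer has an active account" are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """Hypothetical agreed definition of the business entity 'Customer'."""
    source: str
    source_id: str
    name: str
    is_active: bool

def from_crm(row: dict) -> Customer:
    # Hypothetical CRM export: status code "A" = active, "I" = inactive.
    return Customer("CRM", row["crm_id"], row["cust_name"].strip(),
                    row["status"] == "A")

def from_finance(row: dict) -> Customer:
    # Hypothetical finance export: different key names and a boolean flag.
    return Customer("FIN", row["account_no"], row["client"].strip(),
                    bool(row["active"]))

def active_customers(customers):
    """The agreed rule, applied once to the common form rather than per source."""
    return [c for c in customers if c.is_active]

crm_rows = [{"crm_id": "101", "cust_name": " Wade Walker ", "status": "A"},
            {"crm_id": "102", "cust_name": "Alpha SA", "status": "I"}]
fin_rows = [{"account_no": "F-9", "client": "Beta AG", "active": True}]

conformed = [from_crm(r) for r in crm_rows] + [from_finance(r) for r in fin_rows]
print([c.name for c in active_customers(conformed)])  # ['Wade Walker', 'Beta AG']
```

The point of the design is that when stakeholders change what "Customer" means, only `active_customers` (or the `Customer` definition) changes, not every source-specific query.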

But wait, let's make it even better: let's take the Web-specific data sources that power our Web Analytics apps and add them to the existing Data Warehouse, passing them through the same business rules to ensure heterogeneous data has a single meaning. Now we are talking Business Intelligence: organization-wide, multi-source data. Plug BI's powerful analytical tools into our database, add some targeted, business-driven KPIs, and we have another very powerful means of driving profitability.
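A small sketch of what that combination buys, assuming web sessions have already been matched to warehouse customer IDs by those same business rules: a web KPI can then be broken down by a segment that only the warehouse knows about. The data, the segment names and the conversion-rate KPI are hypothetical.

```python
from collections import defaultdict

# Hypothetical customer dimension already in the warehouse: id -> segment.
customer_segment = {"C1": "Retail", "C2": "Retail", "C3": "Corporate"}

# Hypothetical web sessions, already matched to a warehouse customer_id
# (None = anonymous visitor that could not be matched).
web_sessions = [
    {"customer_id": "C1", "converted": True},
    {"customer_id": "C1", "converted": False},
    {"customer_id": "C2", "converted": False},
    {"customer_id": "C3", "converted": True},
    {"customer_id": None, "converted": False},
]

def conversion_rate_by_segment(sessions, segments):
    """KPI sketch: share of identified web sessions that converted, per segment."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for s in sessions:
        segment = segments.get(s["customer_id"])
        if segment is None:  # anonymous or unmatched sessions are skipped
            continue
        totals[segment] += 1
        conversions[segment] += int(s["converted"])
    return {seg: conversions[seg] / totals[seg] for seg in totals}

print(conversion_rate_by_segment(web_sessions, customer_segment))
# Retail is about 0.33, Corporate is 1.0
```

Note that the anonymous session contributes nothing here, which is one reason web data on its own gives a narrower picture than the warehouse.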

Complicated? Expensive? Prone to failure? Big "Yes" to all of the above. For the same reasons as in Web Analytics projects. However, Web Analytics could be said to be proportionally less expensive – same basic cost range for the analytics tool, but less demand for investment in multiple software licenses from different vendors (possibly), less complex data massage (or not...) and shorter time to implement. And that in itself is a strong argument in favour of Web Analytics - reduced time to market. However, you won’t have the spectrum of information you have in a well-implemented Data Warehouse.

I believe that Web Analytics is a complement to BI. It can be integrated into a dashboard, or it can stand alone to guide developers and webmasters in optimizing content. It does have an effect on our database architecture: we must adapt the design of the database to integrate web data. But does it "change the game"? No, it makes it more interesting. And as a Business Intelligence professional, I welcome another tool that will add value to my service offering and to my clients.

BI analysts are already "tweaking customer services and drilling down to root causes for so many processes across so many departments and divisions"...and we've been doing it for a long time. On data that crosses organizational boundaries.

I agree with the quote above that analytics professionals are "excellent candidates for the executive ladder". But I think this applies to the wider BI analytics group, not specifically to Web Analytics experts.

Posted by Wade Walker at 9:33 AM