BeyeBLOGS

May 29, 2008

Did I get data quality in my data migration?

What IS data quality?

Your project is complete. You have carefully spent your budget and countless man-days on moving all that data from across your organization into your new database. You had a team of Data Migration specialists cleaning that data as it went across your network to its new home. You can relax now, right?

Well, maybe. In some ways, your organizational data repositories (and that can mean databases, documents, spreadsheets, flat files, web pages, external data providers and so on) are like Switzerland itself - a miraculous coming together, over many years, of a very diverse population with different perspectives, beliefs, and languages that creates a harmonious whole. This population lives under a common identity, but its people have retained their traditions, foods, languages and perspectives. That is what diversity is all about, and it is one of the things that makes Switzerland, and perhaps your company, such a strong, unified whole. However, with diversity (in this case, data diversity) comes complexity.

So what's my point? Good question. I find that during large (or small) migration projects, it is usually assumed that Data Quality is part of what is delivered. However, in an environment such as the one above - which is all but inevitable in your organization over time - the data in all those different repositories contains redundancies and...let's just say...data that lacks the accuracy one should expect.

In general, no matter how good or well-intentioned your hard-working Data Migration specialists are, they can only do what they can with the tools they have. In a 'vanilla' data migration project, they probably don't have the bandwidth (i.e. time or budget)...or possibly even the technology to really DO data quality within a reasonable time period. 'Vanilla' Data Migration IS, by its very nature, extremely complex...and the out-of-the-box ETL (or ELT, EAI ... Exx) tools are not generally equipped to do geocoding, intelligent duplicate elimination and other hardcore data quality processes without substantial, innovative customization.

Data migration architecture is one of my specialized disciplines, and I focus on process quality as my first deliverable. However, in all of the projects I have worked on, there has just not been a budget for true data quality transformation. It is likely that in most cases, the clients believe that data quality is implicitly delivered in the project. Unfortunately, despite our best efforts, and apart from the standard data quality processes we data migration developers apply, this is not the case in most projects. Data Quality is often a project in itself.

The (simplistic) CUSTOMER example

Certain data quality issues can be resolved using vanilla tools – some simple redundancies caused by slight variations in a customer's name, for example:

Wade Walker
Wade WALKER
wade walker

can be resolved. However, add in some typos, phonetic spellings, linguistic phonetic differences and abbreviations, and things become a bit more complex:

Waid Walker
Wadi Walker
W. Walker
Wade Wlker
Weed Wacker (I always hated that one!)
...you get the idea.

...and yes - I have seen all of these bastardizations of my name - and more - appearing on envelopes addressed to me over the years...
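
To make the difference between the two lists concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular migration tool works: the helper names `normalize` and `similarity` are made up for this example, and the standard library's `difflib.SequenceMatcher` stands in for the far more capable matching a dedicated data quality product would provide. Simple normalization settles the case-only variants, while the typo and phonetic variants only yield a similarity score that someone still has to interpret.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace (hypothetical helper).

    This alone is enough to make the case-only variants identical.
    """
    return " ".join(name.casefold().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


reference = "Wade Walker"
variants = [
    "Wade WALKER", "wade walker",                # case-only variants
    "Waid Walker", "Wadi Walker", "W. Walker",   # typos, abbreviation
    "Wade Wlker", "Weed Wacker",
]

for variant in variants:
    exact = normalize(variant) == normalize(reference)
    print(f"{variant!r}: exact match={exact}, score={similarity(variant, reference):.2f}")
```

A score is not a decision, though. Choosing the cut-off, and dealing with abbreviations such as "W. Walker", is exactly the kind of customization a vanilla tool leaves to you.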

Standard data migration tools don't normally have the built-in processes or intelligence to deal with such variations without extensive customization. The address information for a given customer can work wonders for recognizing duplicates…but of course, addresses have their own variations, obsolete data, typos, etc., which complicate things further. But at least we can say that we probably have a substantial improvement!
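
As a rough sketch of how address information can help, the Python below flags two records as likely duplicates only when their normalized addresses agree and their names are similar enough. Everything here is an assumption made for illustration: the record layout, the sample addresses, the helper names and the `NAME_THRESHOLD` value are all invented, and exact address comparison is a deliberately naive stand-in for real address standardization or geocoding.

```python
from difflib import SequenceMatcher
from itertools import combinations

NAME_THRESHOLD = 0.6  # hypothetical cut-off; would need tuning on real data


def norm(text: str) -> str:
    """Case-fold, drop periods and commas, collapse whitespace."""
    cleaned = text.casefold().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def name_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def likely_duplicates(records):
    """Return id pairs whose addresses match and whose names are similar."""
    pairs = []
    for r1, r2 in combinations(records, 2):
        same_address = norm(r1["address"]) == norm(r2["address"])
        if same_address and name_score(r1["name"], r2["name"]) >= NAME_THRESHOLD:
            pairs.append((r1["id"], r2["id"]))
    return pairs


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "name": "Wade Walker", "address": "1 Example Street, Zurich"},
    {"id": 2, "name": "Waid Walker", "address": "1 example street zurich"},
    {"id": 3, "name": "Wade Walker", "address": "99 Other Road, Geneva"},
]

print(likely_duplicates(records))  # [(1, 2)]
```

Note that record 3 has exactly the same name as record 1 but is not flagged, because its address differs. That may be a different person, or the same person with an obsolete address, which is the kind of ambiguity that keeps this from being a solved problem.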

This is why the leading migration tools have formed partnerships with data quality vendors - their products actually become plug-ins to the vanilla migration product. Now we are talking about the ability to "clean" / "scrub" our data. Some of the vendors offer separate complementary products that audit data to determine content and assess data quality. Unfortunately, more often than not, these are either eliminated due to cost constraints, or the client hasn't been offered the option!

Think I've said enough on the subject for now. I would be interested in your comments.

Wade Walker
Methodata
WEBSITE : www.methodata.com

Posted by Wade Walker at May 29, 2008 9:34 AM

Comments

Agree with the sentiments expressed in the blog and that not paying sufficient attention to DQ can take the shine off a good migration project, but only if the business users are not properly engaged and don't have expectations realistically set.

Successfully migrating data and delivering the dependent business transformation is a blend of getting many aspects right, and I do subscribe to Johny M's view that organisations will only pay for what they believe they can afford at the time and no more.

Posted by: Tony Sceales at May 30, 2008 6:19 PM
