May 29, 2008
Did I get data quality in my data migration?
What IS data quality?
Your project is complete. You have carefully spent your budget and countless man-days on moving all that data from across your organization into your new database. You had a team of Data Migration specialists cleaning that data as it went across your network to its new home. You can relax now, right?
Well, maybe. In some ways, your organizational data repositories (and that can mean databases, documents, spreadsheets, flat files, web pages, external data providers and so on) are like Switzerland itself - a miraculous coming together over many years of a very diverse population of different perspectives, beliefs, and languages that create a harmonious whole. This population lives under a common identity, but they have retained their traditions, foods, languages and perspectives. That is what diversity is all about, and is one of the things that makes Switzerland, and perhaps your company, such a strong, unified whole. However, with diversity (in this case, data diversity) comes complexity.
So what's my point? Good question. I find that during large (or small) migration projects, it is an assumption that Data Quality is part of what is delivered. However, in an environment such as the above - which is all but inevitable in your organization over time - your data in all those different repositories contains redundancies and...let's just say...data that is lacking in the accuracy that one should expect.
In general, no matter how good or well-intentioned your hard-working Data Migration specialists are, they can only do what they can with the tools they have. In a 'vanilla' data migration project, they probably don't have the bandwidth (i.e. time or budget)...or possibly even the technology to really DO data quality within a reasonable time period. 'Vanilla' Data Migration itself IS, by its very nature, extremely complex...and the out-of-the-box ETL (or ELT, EAI ... Exx) tools are not generally equipped to do geocoding, intelligent duplicate elimination and other hardcore data quality processes without substantial innovative customization.
I am a data migration architect as one of my specialized disciplines, and I focus on process quality as my first deliverable. However, in all of the projects I have worked on, there has just not been a budget for true data quality transformation. It is likely that in most cases, the clients believe that data quality is implicitly delivered in the project. Unfortunately, despite our best efforts, apart from the standard data quality processes we data migration developers apply, this is not the case in most projects. Data Quality is often a project in itself.
The (simplistic) CUSTOMER example
Certain data quality issues can be resolved using vanilla tools – some simple redundancies caused by slight variations in a customer's name, for example:
Wade Walker
Wade WALKER
wade walker
can be resolved. However, add in some typos, phonetic spellings, linguistic phonetic differences and abbreviations, and things become a bit more complex:
Waid Walker
Wadi Walker
W. Walker
Wade Wlker
Weed Wacker (I always hated that one!)
...you get the idea.
...and yes - I have seen all of these bastardizations of my name - and more - appearing on envelopes addressed to me over the years...
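To make the difference between those two groups concrete, here is a minimal sketch in Python. It is not how any particular migration tool works: the helper names normalize and similarity are made up for illustration, and the standard library's difflib merely stands in for the much more capable matching that dedicated data quality products offer.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Collapse case and whitespace differences (illustrative helper)."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0 (illustrative helper)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


reference = "Wade Walker"
easy = ["Wade Walker", "Wade WALKER", "wade walker"]
hard = ["Waid Walker", "Wadi Walker", "W. Walker", "Wade Wlker", "Weed Wacker"]

# Simple normalization is enough to collapse the first group...
exact_matches = [n for n in easy + hard if normalize(n) == normalize(reference)]

# ...but the second group only ever scores as "similar", never "equal".
scores = {n: round(similarity(reference, n), 2) for n in hard}

print("Resolved by normalization:", exact_matches)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name!r}: {score}")
```

Normalization catches the first three spellings. The rest come out with a score, and somebody still has to decide which score means "same customer", which is exactly where the customization effort goes.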
Standard data migration tools don't normally have the built-in processes or intelligence to deal with such variations without extensive customization. The address information for a given customer can work wonders for recognizing duplicates...but of course, there are variations, obsolete data, typos, etc., which further complicate things. But at least we can say that we probably have a substantial improvement!
This is why the leading migration tools have formed partnerships with data quality vendors - these actually become plug-ins to the vanilla migration product. Now we are talking about the ability to "clean" / "scrub" our data. Some of the vendors offer separate complementary products that audit data to determine content and assess data quality. Unfortunately, more often than not, these are either eliminated due to cost constraints, or the client hasn't been offered the option!
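For a rough feel of what such an audit reports, here is a small Python sketch. It is only an illustration, not any vendor's product: the profile function and the sample rows are hypothetical, and a real audit tool covers far more (formats, ranges, cross-field rules and so on).

```python
from collections import Counter


def profile(records, field):
    """Summarise one field: rows, blanks, distinct values, top values."""
    values = [(r.get(field) or "").strip() for r in records]
    filled = [v for v in values if v]
    return {
        "rows": len(values),
        "blank": len(values) - len(filled),
        "distinct": len(set(filled)),
        "distinct_ignoring_case": len({v.lower() for v in filled}),
        "most_common": Counter(filled).most_common(3),
    }


# Hypothetical sample rows, for illustration only.
customers = [
    {"name": "Wade Walker", "city": "Geneva"},
    {"name": "Wade WALKER", "city": "geneva"},
    {"name": "wade walker", "city": ""},
    {"name": "W. Walker", "city": None},
]

name_profile = profile(customers, "name")
city_profile = profile(customers, "city")
print(name_profile)
print(city_profile)
```

A gap between the distinct count and the count ignoring case hints at the easy duplicates, and the blank count shows how much data is simply missing, before anyone commits to a cleansing budget.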
Think I've said enough on the subject for now. I would be interested in your comments.
Wade Walker
Methodata
WEBSITE : www.methodata.com
Posted by Wade Walker at May 29, 2008 9:34 AM
Comments
Yes, a common problem, but where I see a lot of specialists slipping up is in a failure to sufficiently educate the customer, getting them bought into the DQ process and clarifying that it is THEIR data and their responsibility.
The key to resolving this is via the inclusion of a "Data Quality Rules" process (John Morris, author of Practical Data Migration covers this really well) where the business is integrated into the prioritisation and decision-making process.
Only when the business can observe the impacts of not managing DQ will they realise. If they're not "sold" the benefits, they won't buy, just human nature.
Most businesses pay far more for DQ than they expected (90% according to stats) through project failures or delays so the money is always there.
Plus, most of the DQ vendors are willing to do deals to lease software at a more attractive rate for a project duration so there is really no reason not to manage DQ effectively in a DM project.
Incidentally, we just ran a coaching session with John Morris on this very topic on DataMigrationPro.com, the podcast will be going up shortly on the site.
Regards
Dylan
djones-at-datamigrationpro.com
Founder - www.DataMigrationPro.com
Posted by: Dylan Jones at May 30, 2008 2:01 AM
Great blog. I’d agree though that on the one hand a data migration exercise is a great time to sort out data issues that have been languishing in legacy data sets for years, but on the other we never have enough time or resources to reach the Olympian heights of perfect quality data. What is needed is prioritisation. Last night (29th May 2008) I gave a web coaching session on data quality and prioritisation under the auspices of the social networking site www.datamigrationpro.com. Don’t worry if you missed it – the podcast will be available on their site shortly.
For anyone with a passion for data migration (like I have) Data Migration Pro is, to my knowledge, the only site of its kind dedicated to data migration. So come and join us and share your insights with us. It’s time data migration got the specialist recognition it deserves.
Johny Morris (author “Practical Data Migration”)
Posted by: Johny Morris at May 30, 2008 6:43 AM
Agree with the sentiments expressed in the blog and that not paying sufficient attention to DQ can take the shine off a good migration project, but only if the business users are not properly engaged and don't have expectations realistically set.
Successfully migrating data and delivering the dependent business transformation is a blend of getting many aspects right, and I do subscribe to Johny M's view that organisations will only pay for what they believe they can afford at the time and no more.
Posted by: Tony Sceales at May 30, 2008 6:19 PM
