

February 22, 2006

Talking Dirty

Visit my full blog at www.dqview.com

Many people (including acknowledged data quality gurus) appear to have a very restricted view of what constitutes "dirty" data and what you can do to improve it.  I was recently reading an article that expounded the case for cleaning up dirty data but never ventured beyond the tried and tested examples of historical data and alternative versions of names.  In my experience, dirty data can contain a host of hidden knowledge, so we need to think beyond the idea of merely cleaning data and understand how we can actually turn "dirty" data into a valuable asset.

For example, take a look at most customer databases and you'll find evidence that text fields, including names and addresses, are used to store additional pieces of information that the system doesn't otherwise cater for.  Call centre staff, for instance, will often use the customer name field to store notes about when to call a customer, how to contact them, or even personal comments about them.
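Name fields like these can be screened automatically for signs of embedded notes. The following is a minimal sketch, not any product's method: it assumes the names arrive as a plain list of strings, and the three patterns (time expressions, long digit runs that look like phone numbers, and instruction words such as "call" or "after") are hypothetical heuristics chosen for illustration. A real profiling tool would tune or learn such rules rather than hard-code them.

```python
import re

# Hypothetical heuristics: patterns that rarely appear in genuine personal names
# but often show up when a name field has been used to hold call-centre notes.
NOTE_PATTERNS = {
    "time": re.compile(r"\b\d{1,2}(:\d{2})?\s*(am|pm)\b", re.IGNORECASE),
    "phone": re.compile(r"\d{5,}"),
    "instruction": re.compile(r"\b(call|ring|after|before|don't|do not)\b", re.IGNORECASE),
}


def find_embedded_notes(name):
    """Return the labels of every note pattern that matches a name value."""
    return [label for label, pattern in NOTE_PATTERNS.items() if pattern.search(name)]


def flag_suspect_names(names):
    """Map each suspect name value to the note patterns it triggered."""
    flagged = {}
    for name in names:
        hits = find_embedded_notes(name)
        if hits:
            flagged[name] = hits
    return flagged


if __name__ == "__main__":
    # Made-up values for illustration; not taken from any real database.
    sample = [
        "Mr John Smith",
        "Mrs J Brown - call after 6pm",
        "A Patel (mobile 07700900123)",
        "Dr S Jones",
    ]
    for value, labels in flag_suspect_names(sample).items():
        print(f"{value!r}: {', '.join(labels)}")
```

Flagged values are candidates for review rather than errors to delete: the point is to move the note (a preferred call time, a mobile number) into a field where it can actually be used.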

Most data quality software provides little help in understanding and improving the quality of this data.  What's needed is the ability to profile and analyse the contents of free-text fields, going beyond a simple count of how often each full-field value occurs.  The new generation of data quality solutions gives users the ability to analyse the contents of text fields and extract valuable knowledge from them.
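The difference between counting full-field values and profiling what is inside a field can be shown in a few lines. This is a sketch under simple assumptions: a whitespace tokeniser, lower-casing, and a hypothetical list of name values. Commercial tools parse text far more carefully, so treat it only as an illustration of the idea.

```python
from collections import Counter


def full_field_profile(values):
    """Classic frequency count: how often each complete field value occurs."""
    return Counter(values)


def token_profile(values):
    """Count individual lower-cased words across all values, so that recurring
    terms hidden inside otherwise unique values become visible."""
    tokens = Counter()
    for value in values:
        tokens.update(value.lower().split())
    return tokens


if __name__ == "__main__":
    # Made-up name-field values for illustration only.
    names = [
        "John Smith call after 6",
        "Jane Doe call mornings",
        "Bob Jones do not call",
        "Mary Brown",
    ]
    # Every value is unique, so the full-field count shows each one once.
    print(full_field_profile(names).most_common(2))
    # The token count shows "call" recurring across records.
    print(token_profile(names).most_common(1))  # → [('call', 3)]
```

A full-field count treats each of these values as a one-off, while the token count surfaces "call" as a recurring term that deserves a closer look.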

Organisations that can understand dirty data and extract golden nuggets of information from it have the power to turn what was once viewed purely as a liability into a valuable asset.

Copyright © 2006 Steve Tuck - All Rights Reserved


Posted by Steve Tuck at 12:30 PM

Garbage Out, Garbage In


The expression "Garbage In, Garbage Out" (GIGO) is older than I am!  It dates back to the days of punched cards (which I just missed when I started my IT career in 1984) and first appeared in the OED in 1964, yet it's as true today as it was then.

What occurs to me is that, with regard to data quality, the expression works equally well in reverse: Garbage Out, Garbage In (GOGI).  I'd be a very wealthy man if I had a penny for every time a user decided that it didn't much matter what they entered into a computer system because "it's full of rubbish already".  And many's the time I've met people who feel it's pointless doing anything about information quality because the users will just screw it up again in the future.

Data quality software has a role to play in helping organisations understand, improve, protect and control the quality of the information they hold, but it's not a silver bullet.  These activities have to extend beyond the technology into the processes and ethos of the organisation, and changing attitudes is far from easy in many environments.

I have been privileged to meet some amazing people who have done wonders by evangelising and championing data quality in their organisations.  Every organisation seems to have a bunch of people with a GOGI mentality who say "what's the point?", but thankfully I nearly always find someone (often a lone voice) who is willing to stick their neck out and champion data quality.  If you're one of those people, you have my admiration and respect: you're doing the right thing.



Posted by Steve Tuck at 8:15 AM