

February 22, 2006

Talking Dirty

Visit my full blog at www.dqview.com

Many people (including acknowledged data quality gurus) appear to have a very restricted view of what constitutes "dirty" data and what you can do to improve it. I recently read an article that made the case for cleaning up dirty data, but it never ventured beyond the tried-and-tested examples of historical data and alternative versions of names. In my experience, dirty data can hold a wealth of hidden knowledge. We need to think beyond merely cleaning data and understand how we can turn "dirty" data into a valuable asset.

Look at almost any customer database and you'll find evidence that text fields, including names and addresses, are used to store additional pieces of information for which the database has no other place. Call centre staff, for instance, will often use the customer name field to store notes about when to call a customer, how to contact them, or even personal comments about them.
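As a hypothetical illustration of such entries, the Python sketch below separates a note from the name it has been appended to. The note markers in `NOTE_PATTERN`, the function `split_name_and_note` and the sample values are all invented for this example; in practice the markers would be discovered by profiling the actual data, and a regular expression is only one simple way of doing the split.

```python
import re
from typing import Optional, Tuple

# Hypothetical markers for notes that call centre staff might append to a
# name field. A real list would come from profiling the actual data.
NOTE_PATTERN = re.compile(
    r"\b(DO NOT CALL|CALL AFTER \d{1,2}\s*(?:AM|PM)|DECEASED|GONE AWAY)\b.*$",
    re.IGNORECASE,
)

# Punctuation to trim from the edges of the split-off pieces.
EDGE_CHARS = " -,;/()"


def split_name_and_note(raw: str) -> Tuple[str, Optional[str]]:
    """Split a raw name value into (name, embedded note or None)."""
    match = NOTE_PATTERN.search(raw)
    if match is None:
        return raw.strip(), None
    # Everything before the first note marker is treated as the name;
    # the marker and anything after it is treated as the note.
    name = raw[: match.start()].strip(EDGE_CHARS)
    note = match.group(0).strip(EDGE_CHARS)
    return name, note


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    for value in ["Mr J Smith - call after 6pm", "SMITH, JOHN (DO NOT CALL)", "Jane Doe"]:
        print(split_name_and_note(value))
```

Rather than discarding the note as noise, the split keeps it, so that "call after 6pm" can be moved into a proper contact-preference field instead of being lost in a clean-up.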

Most data quality software provides little help in understanding and improving the quality of this kind of data. What's needed is the ability to profile and analyse the contents of free-text fields, going beyond a simple count of how often each complete field value occurs. The new generation of data quality solutions gives users the ability to analyse the contents of text fields and extract valuable knowledge from them.
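To illustrate the difference between counting complete field values and profiling a field's contents, here is a minimal Python sketch. It assumes simple word tokenisation with a regular expression; the function names and sample values are hypothetical and do not reflect the workings of any particular data quality product.

```python
import re
from collections import Counter
from typing import Iterable

# Invented sample name values, for illustration only.
SAMPLE_NAMES = [
    "Mr J Smith - call after 6pm",
    "Ann Jones call after 7pm",
    "P Brown (DO NOT CALL)",
    "Jane Doe",
]


def full_value_profile(values: Iterable[str]) -> Counter:
    """Conventional profile: how often each complete field value occurs."""
    return Counter(v.strip().upper() for v in values)


def token_profile(values: Iterable[str]) -> Counter:
    """Content profile: how often each word occurs across all values.

    Fragments that recur inside otherwise unique values (such as notes
    typed into a name field) show up here but not in the full-value profile.
    """
    counts: Counter = Counter()
    for v in values:
        counts.update(re.findall(r"[A-Z0-9']+", v.upper()))
    return counts


if __name__ == "__main__":
    print(full_value_profile(SAMPLE_NAMES).most_common(3))
    print(token_profile(SAMPLE_NAMES).most_common(3))
```

Because every sample value is unique, the full-value profile reports each one exactly once and reveals nothing. The token profile, by contrast, shows CALL and AFTER recurring across otherwise distinct names, which is the first clue that the field is carrying information it was never designed to hold.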

Organisations that can understand dirty data and extract golden nuggets of information from it have the power to turn what was once viewed purely as a liability into a valuable asset.

Copyright © 2006 Steve Tuck - All Rights Reserved

Posted by Steve Tuck at February 22, 2006 12:30 PM
