December 8, 2009

Data Quality - A Family Affair

Grandma's lesson about taking responsibility for data quality.


When I was a young child, we spent every Thanksgiving with my paternal grandparents in Denver. There are two particularly memorable things about those visits. First, even into the late 1980s, my grandparents didn't own their own telephone. They rented their phone from the telephone company. It was the same rotary dial phone they'd had for years, hanging in their kitchen, with an extra-long handset cord attached so they could stretch across the dining room or kitchen while still talking on the phone. Second was the important lesson that I learned about doing dishes by hand.

Doing dishes by hand is ideally a three-person job: one to wash, one to rinse, and one to dry. The lesson that my grandmother taught me about washing dishes was that the drier is the person accountable for making sure the dishes are clean when they go back into the cupboard.

As data warehousing professionals, we spend a fair amount of time and energy arguing that data quality is something that has to be fixed upstream, by applications. My grandmother would insist that sending the dishes back to the washer is not our only option.

If a dish comes to the drier not quite clean, there are three options:

Ideally the dishes come to us clean and ready to dry. It's a lot less work to dry off some steaming droplets of water and put a nice clean warm dish away in the cupboard than it is to notice that little bit of bread from the stuffing that didn't quite get cleaned and have to use the tip of your fingernail through a dish cloth to get the crumb off.

What are the downsides of sending the dish back through to be rewashed from the beginning:

Perhaps the same is true in terms of data quality. If a transaction moves from system to system and doesn't come out the other end quite clean, because some of the business processes in the middle aren't quite flawless, is it always the best choice to go back to the beginning to find just where things went wrong and correct them there?

I'm not suggesting that any application is allowed to be intentionally lazy about data quality, or should not correct issues that are identified. Rather, I'm suggesting that we make sure we all continue to see data quality as our responsibility and not merely blame upstream systems when there is something that could be done at various points in the chain to ensure quality information is used for decision making.
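To make the analogy a little more concrete, here is a minimal sketch in Python of a warehouse load step playing the role of the drier. Everything in it is an assumption made up for illustration: the `Transaction` fields, the `dry` function, and the three action names are hypothetical and do not come from any particular tool. The idea is only that a downstream step can wipe off a small crumb itself and reserve "send it back" for defects that only the source system can fix.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record arriving at the warehouse load."""
    txn_id: str
    state: str                # two-letter state code, e.g. "CO"
    amount: Optional[float]   # None means the source never supplied it


def dry(txn: Transaction) -> Tuple[str, Transaction]:
    """Inspect one record on its way into the warehouse.

    Returns (action, record), where action is one of:
      "put_away"  - the record was already clean
      "wiped"     - a small defect was fixed here, during the load
      "send_back" - a defect only the source system can fix
    """
    # A missing amount cannot be guessed downstream: back to the washer.
    if txn.amount is None:
        return "send_back", txn

    # Stray whitespace or lowercase is a crumb we can remove ourselves.
    cleaned_state = txn.state.strip().upper()
    if cleaned_state != txn.state:
        return "wiped", replace(txn, state=cleaned_state)

    return "put_away", txn
```

Calling `dry(Transaction("t2", " co ", 5.0))` returns the action `"wiped"` along with a corrected copy of the record, while a record with no amount comes back unchanged with `"send_back"`. Counting how often each action occurs would also tell the upstream team where the washing is falling short, without holding up every dish in the meantime.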

(Reposted from: Sharpening Stones)

Posted by Paul Boal at December 8, 2009 1:15 AM
