December 8, 2009
Data Quality - A Family Affair
Grandma's lesson about taking responsibility for data quality.
When I was a young child, we spent every Thanksgiving with my paternal grandparents in Denver. Two things about those visits are particularly memorable. First, even into the late 1980s, my grandparents didn't own their own telephone. They rented their phone from the telephone company. It was the same rotary dial phone they'd had for years, hanging in their kitchen, with an extra-long handset cord attached so they could stretch across the dining room or kitchen while still talking on the phone. Second was the important lesson that I learned about doing dishes by hand.
Doing dishes by hand is ideally a three-person job: one to wash, one to rinse, and one to dry. The lesson that my grandmother taught me about washing dishes was that the drier is the person accountable for making sure the dishes are clean when they go back into the cupboard.
As data warehousing professionals, we spend a fair amount of time and energy arguing that data quality is something that has to be fixed upstream, by applications. My grandmother would insist that sending the dishes back to the washer is not our only option.
If a dish comes to the drier not quite clean, there are three options:
- send the dish back to the washer to be cleaned again from the beginning with soap;
- send the dish back to the rinser to have the mess rinsed off with some hot water; or
- use a little extra effort and wipe off the mess with your dish cloth.
Ideally the dishes come to us clean and ready to dry. It's a lot less work to dry off some steaming droplets of water and put a nice clean warm dish away in the cupboard than it is to notice that little bit of bread from the stuffing that didn't quite get cleaned and have to use the tip of your fingernail through a dish cloth to get the crumb off.
What are the downsides of sending the dish back to be rewashed from the beginning?
- the washer has to stop in the middle of scrubbing that big pan to rewash the plate;
- the plate takes longer, overall, to be rewashed, rerinsed, and redried;
- both the washer and rinser have to redo work.
I'm not suggesting that any application is allowed to be intentionally lazy about data quality, or should not correct issues that are identified. Rather, I'm suggesting that we all continue to see data quality as our own responsibility, and not merely blame upstream systems when something could be done at various points in the chain to ensure quality information is used for decision making.
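To make the dish analogy concrete, here is a minimal Python sketch of the drier's three options applied to a record arriving at the warehouse. Everything in it is an assumption made for illustration: the field names (`customer_name`, `state`), the rules for which problems belong to which stage, and the helper names `triage` and `wipe` are hypothetical, not part of any particular tool.

```python
from enum import Enum


class Action(Enum):
    PUT_AWAY = "load as-is"
    WIPE = "fix here, during the warehouse load"
    RERINSE = "send back to the transformation step"
    REWASH = "send back to the source application"


def triage(record: dict) -> Action:
    """Pick the cheapest stage that can fix the record (hypothetical rules)."""
    name = record.get("customer_name")
    state = record.get("state")

    # A missing name cannot be invented downstream: back to the washer.
    if not name or not name.strip():
        return Action.REWASH

    # A state that is not a two-letter code should have been standardized
    # by the transformation step: back to the rinser.
    if state is None or len(state.strip()) != 2:
        return Action.RERINSE

    # Stray whitespace or lowercase codes are crumbs the drier can wipe off.
    if name != name.strip() or state != state.strip().upper():
        return Action.WIPE

    return Action.PUT_AWAY


def wipe(record: dict) -> dict:
    """Apply the small fixes the load step can safely make itself."""
    return {
        **record,
        "customer_name": record["customer_name"].strip(),
        "state": record["state"].strip().upper(),
    }


if __name__ == "__main__":
    dish = {"customer_name": " Ada ", "state": "co"}
    if triage(dish) is Action.WIPE:
        dish = wipe(dish)
    print(triage(dish).value)
```

The point of the sketch is the ordering of the checks: only problems that genuinely cannot be fixed downstream go all the way back to the source, and the small ones are handled where they are noticed.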
(Reposted from: Sharpening Stones)
Posted by Paul Boal at December 8, 2009 1:15 AM
