January 30, 2007
Data Quality and Data Stupidity
"But this isn't about data quality, it's about data stupidity!" It was difficult to disagree. A customer had narrowly avoided dispatching several tons of architectural goods from Seattle to Portland, Maine rather than Portland, Oregon. The address cleansing system had done its best, and only the truck driver (I would prefer to say lorry, but must localize) noticed at the last moment. I had been telling the client for some time that his creaking data quality system was not adequate and that data quality is a lifecycle, a process, a metier, a way of life, a calling ... but for him it was just a function between processing orders and scheduling deliveries.
Here, on the b-eye network, some time ago, Claudia Imhoff was blogging about data quality. I liked this observation: "While there certainly is useful technology to help with data quality, so much of the assurance part is still heavily dependent on the human being (in this case, usually a business person) eyeballing the cleaned up data to verify its 'quality'."
Our truck driver did the eyeballing in this case, and it is good that he did. The costs of data stupidity can be high indeed, especially when businesses are so large that they cannot eyeball every order, or every action. Take the case of the Halifax Bank of Scotland. Stephanie MacLaughlin asked them for a copy of her bank statement. She received the bank statements of 75000 customers, delivered in 5 packages to her doorstep. HBOS said the incident was "isolated." I should hope so too. They might have added that it was stupid in the extreme.
(Readers of my post about Scottish names and addresses are perhaps wondering if the mistake is understandable. I assure you, dear reader, that in a country of 5 million, there are not nearly 75000 Stephanie MacLaughlins.)
Of course, many businesses have quite careful rules in place for validating and catching small errors. However, in the cases described here, the errors were so great (each in its own way) that I doubt anyone had thought of putting a rule in place to catch them. The HBOS error went unchecked until poor Ms MacLaughlin received her mail. And there was no business rule preventing my customer from shipping to Portland, Maine, just a truck driver who, quite reasonably, did not think he could get there and back from Seattle in time for dinner.
The problem is how to check for all the various forms of stupidity that can arise. Often we can only do so retroactively. Indeed the mistakes are, as the bank said, isolated. Yet they are critically serious when they occur. I have no doubt that HBOS will now have a business rule implemented somewhere, anywhere, to check for such an issue again.
One approach I have been trying out with another customer - a car dealership - is to use a data mining model to check orders before shipment. The mining APIs in SQL Server 2005 are fast enough to be called on an order by order basis, and can be embedded into apps. The business simply builds a clustering model from its existing records, and new orders are checked to see if they fit the known clusters of order types. If they do not, they can be flagged and reviewed. The system is excellent for capturing stupidity. In fact, running it against historical records, it correctly identified over 90% of orders that had to be subsequently cancelled, returned or corrected due to errors, including orders for 1000s of parts instead of 10s, and many over- and under-billings.
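To make the idea concrete, here is a minimal sketch in Python rather than in the SQL Server 2005 mining APIs themselves. It is an illustration under stated assumptions, not the customer's system: the order fields (`quantity`, `unit_price`), the sample history, the plain k-means routine and the `slack` factor are all hypothetical stand-ins for a real clustering model. It learns clusters from past orders and flags any new order that sits too far from every known cluster.

```python
import math

def features(order):
    # Hypothetical feature choice: log scale, so that "1000s instead of 10s"
    # shows up as a large jump rather than being swamped by vehicle prices.
    return (math.log10(order["quantity"]), math.log10(order["unit_price"]))

def kmeans(points, k, iterations=20):
    # Deterministic farthest-point seeding keeps the sketch reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def build_model(history, k=2, slack=1.5):
    # The "model" is the cluster centres plus a tolerance radius: the worst
    # distance seen in the history, widened by an assumed slack factor.
    points = [features(o) for o in history]
    centroids = kmeans(points, k)
    radius = slack * max(min(math.dist(p, c) for c in centroids) for p in points)
    return centroids, radius

def is_suspicious(order, model):
    # An order is flagged for review when it fits none of the known clusters.
    centroids, radius = model
    p = features(order)
    return min(math.dist(p, c) for c in centroids) > radius

# Invented history for a dealership: small parts orders and vehicle orders.
history = [
    {"quantity": 10, "unit_price": 12.0}, {"quantity": 8, "unit_price": 9.5},
    {"quantity": 15, "unit_price": 11.0}, {"quantity": 12, "unit_price": 14.0},
    {"quantity": 20, "unit_price": 10.0}, {"quantity": 6, "unit_price": 13.0},
    {"quantity": 1, "unit_price": 21000.0}, {"quantity": 1, "unit_price": 25000.0},
    {"quantity": 2, "unit_price": 19500.0}, {"quantity": 1, "unit_price": 30000.0},
]
model = build_model(history)

print(is_suspicious({"quantity": 12, "unit_price": 11.0}, model))    # ordinary parts order
print(is_suspicious({"quantity": 1000, "unit_price": 10.0}, model))  # 1000s instead of 10s
print(is_suspicious({"quantity": 10, "unit_price": 1200.0}, model))  # likely over-billing
```

The point of the design is that nobody writes a rule for "1000 wiper blades" in advance. The order is flagged simply because it looks like nothing the business has shipped before, and a human then decides whether it is stupidity or just a very good day for sales.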
I think this idea has legs. Perhaps my next book should be "Data Quality: The Stupidity Dimension."
Posted by Donald Farmer at January 30, 2007 12:00 AM
