January 25, 2006

Does the Pope have a Dangerous Dog?

Visit my full blog at www.dqview.com

That may seem a strange question to ask, but it's one that I remember an IT Project Manager once boasting that their system could answer. What she meant was that their customer management system could support the data because "His Holiness The Pope" was included in the drop-down list of personal titles and they had a check box to flag customers who had a vicious canine.

The project team in this case had spent a lot of time researching personal titles and come up with a list of several hundred; but while they had included "Her Majesty The Queen" as well as the Pope, they'd not thought about those people who have composite titles such as "Rev. Dr." I can state with confidence that this particular company does not count either Her Majesty or His Holiness as customers, but they do have customers with composite titles.

They also recognised that it would be useful for staff to know if a customer's pet might pose a threat should they have cause to visit. But how exactly did they expect to collect this information in the first place?

It's all very well to try to capture data in a structured way, but why list hundreds of titles when a handful cover the vast majority of the population? My advice is to list the common ones (Mr, Mrs, Miss, Ms, Dr, Rev) and then allow for Others through a free-text field. Using the list reduces the number of typographical errors made in entering standard titles and the free-text field will allow for anything else to be entered exactly as the customer wants it.

This isn't rocket science; it's just a pragmatic approach to dealing with data entry screens and validation.
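
As a minimal sketch of that advice, assuming a drop-down that offers the six common titles plus an "Other" choice backed by a free-text box: the function and constant names here are hypothetical, not taken from any real system.

```python
# The short list from the post; everything else goes through free text.
COMMON_TITLES = ("Mr", "Mrs", "Miss", "Ms", "Dr", "Rev")
OTHER = "Other"  # hypothetical label for the extra drop-down choice


def resolve_title(selected: str, free_text: str = "") -> str:
    """Return the title to store for a customer.

    `selected` is the drop-down choice; `free_text` is only used
    when the choice is "Other".
    """
    if selected in COMMON_TITLES:
        return selected
    if selected == OTHER:
        # Tidy stray whitespace but otherwise keep what the customer asked for.
        title = " ".join(free_text.split())
        if not title:
            raise ValueError("a free-text title is required when 'Other' is selected")
        return title
    raise ValueError(f"unknown drop-down choice: {selected!r}")


print(resolve_title("Dr"))
print(resolve_title("Other", "Rev.  Dr."))
```

The standard titles cannot be mistyped because they are never typed, and a composite title such as "Rev. Dr." is stored as entered.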

Copyright © 2006 Steve Tuck - All Rights Reserved

Posted by Steve Tuck at 1:45 PM

Data Quality - Cause or Symptom?

It seems to me that, if you work hard enough, you could make a pitch for just about any problem being a data quality problem. But you don't have to work very hard with this example from BBC News, "Probe into Japan share sale error": instead of selling one share for 610,000 yen, a trader at Mizuho Securities entered the figures the wrong way around and tried to sell 610,000 shares at just 1 yen each.

Of course this is a data quality problem: the figures are clearly wrong. But that is only a symptom of the original problem, not the cause. If the trader had checked the price for these shares, he would have seen that the figure he was using was totally at odds with the normal trading range. If I were a process expert rather than a data quality one, I'd be claiming this as an example of poor business process.

So is that it? Should we quit wasting time trying to resolve data quality issues and put all of our efforts into process improvement instead? My answer is an emphatic no. Poor data quality is often a symptom of poor business processes, but improving and protecting the quality of an organization's information asset requires a rigorous data-centric approach.

Data validation should form part of any business process, not be regarded as something completely separate. We're used to systems validating data on a field-by-field basis when we enter it, but this rarely goes beyond making sure that the correct fields are populated and in a valid format. This is not always enough, as Mizuho Securities discovered.

Imagine if the trading system had checked the share price, spotted the inconsistency and prevented the erroneous sale from proceeding. Now is that a process improvement or a data quality validation?
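
A check of that kind could be sketched as follows. This is an illustration only: the function name, the 30% tolerance and the idea of comparing against a single reference price are all assumptions, and the 610,000 yen reference is simply the intended price from the example standing in for a recent market price.

```python
from decimal import Decimal


def check_order_price(order_price: Decimal, reference_price: Decimal,
                      tolerance: Decimal = Decimal("0.30")) -> bool:
    """Return True when the order price is within `tolerance`
    (expressed as a fraction) of the reference price."""
    if reference_price <= 0:
        raise ValueError("reference price must be positive")
    deviation = abs(order_price - reference_price) / reference_price
    return deviation <= tolerance


# Intended order: 1 share at 610,000 yen.
# Entered order: 610,000 shares at 1 yen.
reference = Decimal("610000")
print(check_order_price(Decimal("610000"), reference))  # True
print(check_order_price(Decimal("1"), reference))       # False
```

Each field of the erroneous order is valid on its own; only a comparison with something outside the order itself exposes the mistake.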

Posted by Steve Tuck at 1:15 PM

User error

When The Data Warehousing Institute asked in a survey "where does dirty data come from?" the main cause cited was sloppy data entry. But my experience is that it's sometimes unfair to blame the users; let me give you an example.

I was asked to look at some problem addresses for a UK-based client's data migration project. The dodgy records were coming from the company's CRM system and the users entering the data were being blamed for the poor quality.  When I looked at the data, I spotted a trend - all of the information was there, just in the wrong order, so I asked to see the data entry screen.

I talked to some of the data entry staff, and watched them enter some new customer records.  Every record they entered looked fine; the addresses on the screen read perfectly.  The problem was the screen layout and the fields that they were putting the address into.

For some reason best known to the CRM system vendor, the address was represented as low-level elements, which appeared on the screen in a 2-column tabular format.  The data entry staff had no idea what a dependent thoroughfare or a double dependent locality was, so they simply entered the address as they would expect to see it on an envelope, using the fields in the left-hand column.

The problem was compounded by the fact that the fields weren't in the order in which they occur in a correctly formatted address.  During the migration, the addresses were rebuilt, but this time they followed the Royal Mail's standards. In short, the address was put back together in a different order.
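
The effect can be reproduced in a few lines. Everything specific here is assumed for illustration: the field names, the order of the fields on the screen, the simplified output order and the address itself are invented, not taken from the client's CRM system.

```python
# Hypothetical order of the fields in the left-hand column of the screen.
SCREEN_FIELDS = ["building_name", "thoroughfare", "dependent_thoroughfare",
                 "post_town", "dependent_locality"]

# Simplified, assumed order in which the migration reassembles the elements.
OUTPUT_ORDER = ["building_name", "dependent_thoroughfare", "thoroughfare",
                "dependent_locality", "post_town"]


def store(lines):
    """Save the typed lines into the screen fields, top to bottom."""
    return dict(zip(SCREEN_FIELDS, lines))


def rebuild(record):
    """Reassemble the address from its elements in the output order."""
    return [record[field] for field in OUTPUT_ORDER if field in record]


# What the user typed, in the order it would appear on an envelope.
typed = ["Rose Cottage", "Mill Lane", "Littleton", "Oxford"]

record = store(typed)
print(list(record.values()))  # reads perfectly on the screen
print(rebuild(record))        # scrambled after migration
```

Nothing has been lost and nothing was mistyped; the second and third lines have simply swapped places because the stored labels never matched the content.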

So who should we blame for these data quality issues?  Should we put it down to "user error" or should we look to the people responsible for the poorly thought-through and over-engineered CRM system?

Posted by Steve Tuck at 12:45 PM

What is your data trying to tell you?

When I start work on a project with a new client, I often hear the same anecdotes repeated again and again. The reason is that these examples are symptoms of the underlying problems; they demonstrate the pain that can result from poor data quality. The normal situation is that no detailed analysis of the data has been performed and there are no metrics showing the cost of non-quality.

You can learn so much from your data if you just make the effort to understand it. The anecdotes are high-profile examples of problems that exist, but you need to open your eyes to the full extent of your exposure to poor quality data. Data Profiling has a role to play, whether you choose to use a tool or write your own queries, but there's a lot more to understanding your data. The interpretation of the Data Profiling results is critical; what do the results tell you about your data, your procedures and systems?
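
For the "write your own queries" route, a basic column profile might look like the sketch below. The function names and the sample postcodes are invented for illustration; a profiling tool would report far more than this.

```python
from collections import Counter
import re


def pattern(value: str) -> str:
    """Reduce a value to its shape: every letter becomes A, every digit 9."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", shape)


def profile(values):
    """Count rows, missing values, distinct values and value shapes."""
    total = len(values)
    filled = [v for v in values if v is not None and v.strip()]
    return {
        "rows": total,
        "missing": total - len(filled),
        "distinct": len(set(filled)),
        "patterns": Counter(pattern(v) for v in filled).most_common(),
    }


postcodes = ["OX1 2AB", "OX1 2AB", "ox12ab", "", None, "N/A"]
for name, result in profile(postcodes).items():
    print(name, result)
```

The numbers are the easy part. The interpretation is in asking why a third of the rows are empty, who typed "N/A" into a postcode field, and which system allowed it.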

The method I use to understand data has 3 main steps. First, it requires me to identify who the key stakeholders are and understand what data entities and attributes are critical to the success of the business. This allows me to define what work needs to be done and plan what resources are needed. The second step is to Measure & Analyse the data quality to understand what impact non-quality data is having. This is an iterative process involving Business Impact Workshops, where we define key metrics and rules, and Information Quality Assessments, where we apply those rules and measure the data quality.

The final step is to present the findings. I insist on doing this formally and involving all of the stakeholders; if someone has claimed executive responsibility for data quality, this is when they prove that they meant it. I've never gone through this process without unearthing something new and of interest to the sponsor. The usual response is for them to ask what they can do to improve the data quality, but I also point out the importance of protecting good data and having control of their data quality on an ongoing basis.

Posted by Steve Tuck at 12:45 PM

Retailers need to clean up

The Global Data Synchronisation (GDS) Network is an Internet-based supply chain initiative that was founded by two international standards groups: UCC (Uniform Code Council) and EAN International. Industry heavyweights, including Tesco, Asda Wal-Mart, Procter & Gamble, Unilever, Cadbury Trebor Bassett and Kraft, are among the retailers and suppliers calling for other industry players to sign up to the GDS Network. However, if so many companies are to synchronise their data, it is crucial that it is clean, accurate and up-to-date in order for GDS to work effectively.

Currently, the information which appears when products are scanned differs according to the retailer, but GDS would mean the same standardised data is used on all products, which should cut supply costs by millions. Standardising product codes across the whole supply chain has significant advantages – the relationship between the supplier and the stores would be streamlined, resulting in faster delivery times, better stock control and improved reporting on sales and revenue.

With almost 50% of the UK grocery trade expected to adopt the network within months, this represents an extraordinary opportunity. However, retailers must ensure that the data within their own organisation is correctly aligned in order to benefit from GDS further down the line.

Retailers need to carry out a systematic audit to establish exactly where any discrepancies, omissions or duplications in their data lie. Then the information should be cleaned and consolidated before it is used if the anticipated benefits of GDS are to be realised.
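
A first pass at such an audit could be as simple as the sketch below. The record layout (a `code` and a `description` per product), the function name and the sample rows are assumptions made for illustration; they do not reflect the GDS data model or any real product codes.

```python
from collections import defaultdict


def audit(products):
    """Report omissions, duplications and discrepancies in product rows.

    `products` is a list of dicts with the hypothetical keys
    'code' and 'description'.
    """
    by_code = defaultdict(list)
    omissions = []
    for row in products:
        code = (row.get("code") or "").strip()
        description = (row.get("description") or "").strip()
        if not code or not description:
            omissions.append(row)  # a required value is missing
        if code:
            by_code[code].append(description)
    return {
        "omissions": omissions,
        # The same code appears on more than one row.
        "duplicates": sorted(c for c, d in by_code.items() if len(d) > 1),
        # The same code is described in more than one way.
        "discrepancies": sorted(c for c, d in by_code.items() if len(set(d)) > 1),
    }


products = [
    {"code": "A001", "description": "Milk chocolate 200g"},
    {"code": "A001", "description": "Milk chocolate 200g"},
    {"code": "A002", "description": "Tea bags 80"},
    {"code": "A002", "description": "Teabags x80"},
    {"code": "A003", "description": ""},
]
for finding, items in audit(products).items():
    print(finding, items)
```

Knowing where the problems lie is only the audit; the cleaning and consolidation still have to follow before the data is shared.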

Posted by Steve Tuck at 12:45 PM

The resurrection of Mr. Smith

"Nothing in life is certain except death and taxes" - Benjamin Franklin

The truth of the second of these is undeniable, but you could be forgiven for doubting the first if you worked in the branch office of some banks. How would you react if, as a bank teller, your computer records showed that the customer standing in front of you supposedly died a year ago?  “Um,… how are you feeling today Mr. Smith, you’re looking a little pale?”

What leads to this situation is often a muddle of processes, and people using workarounds to beat the system.  For instance, I’ve discovered that a common practice in some banks is to flag a favoured customer as deceased so that they can close a savings account and withdraw money without a penalty.

In other cases the confusion has come as the result of genuine bereavement.  Rather than comply with documented procedures, following the death of a customer somebody has decided that it is more expedient to over-type the original customer’s details with the name of the person who is granted probate.  The one field that can’t be changed by anyone once it has been entered is the date of death; so there it sits, alongside someone else's details.

Given that this is such a sensitive topic, I am, on the one hand, astonished at how some people are willing to act so flippantly, but I also understand why people find these workarounds so useful.  The consequence of their action may be something that can be regarded as a data quality problem, but unless the inadequacies of the underlying processes are resolved, any fix of the data will not be sustainable.

Posted by Steve Tuck at 12:45 PM

Data Quality and UK plc

Yet another UK Government IT project has been delayed due to data quality problems, as reported by Computing: "Data quality problems halt latest police pilots of firearms database".

Will they ever learn? Then again, they're not the only people to have had problems with data quality; survey after survey shows that up to 85% of IT projects suffer from delay or are cancelled due to data quality issues, but they don't all make the headlines.

Lord Marlesford summed up a common frustration, especially considering the government's aspirations for future IT projects:

‘If the Home Office really is incapable, over a period of eight years, of computerising something as straightforward as a few hundred thousand firearms records, then it does suggest that they do not have a hope of making a success of the introduction of the national identity card scheme.’

Let me know what you think - are all government projects doomed to failure?

Posted by Steve Tuck at 12:45 PM

January 1, 2006

dq:view - Steve Tuck on data quality - about me

Data management and information quality have been the focus of my career since 1992. As a Project Manager working for Nationwide Building Society I was responsible for creating a single view of the customer by integrating data from disparate product systems into a master data hub.

Since that time, I have worked for leading vendors and consultancies in the UK and Europe, all the time specialising in the "art" of understanding, improving and protecting the quality and value of organisations' data assets and providing control to the business. I've worked with clients on successful data integration projects in the Financial Services, Telecommunications, Utilities, Retail and FMCG sectors; most of those clients are well-known household names.

I'm passionate about data quality and believe that too many organisations are struggling with managing information; transforming, cleaning and matching data using tools that have changed little in all the years I've been working in this space. That's why I joined Datanomic as Chief Strategy Officer in 2005. At Datanomic we believe that information management is a business issue and we're developing and delivering a new generation of products that have a fresh approach – one where the data owner doesn't need to have years of technical experience in order to manage the data asset.

I am also a Charter Member of the International Association of Information and Data Quality (IAIDQ - www.iaidq.org) and Secretary of its UK Community of Practice.

I live in a small Oxfordshire village with my wife, two children and a dog.

Steve ;o)

Posted by Steve Tuck at 1:00 PM