April 18, 2011

Trouble at the top


Several weeks back now, I presented at IRM’s collocated European Master Data Management Summit and Data Governance Conference. This was my second IRM event, having also spoken at their European Data Warehouse and Business Intelligence Conference back in 2010. The conference was impeccably arranged and the range of speakers was both impressive and interesting. However, as always happens to me, my ability to attend meetings was curtailed by both work commitments and my own preparations. One of these years I will go to all the days of a seminar and listen to a wider variety of speakers.

Anyway, my talk – entitled Making Business Intelligence an Integral part of your Data Quality Programme – was based on themes I had introduced in Using BI to drive improvements in data quality and developed in Who should be accountable for data quality?. It centred on the four-pillar framework that I introduced in the latter article (yes I do have a fetish for four-pillar frameworks as per):

The four pillars of improved data quality

Given my lack of exposure to the event as a whole, I will restrict myself to writing about a comment that came up in the question section of my slot. As per my article on presenting in public, I try to always allow time at the end for questions as this can often be the most interesting part of the talk; for delegates and for me. My IRM slot was 45 minutes this time round, so I turned things over to the audience after speaking for half-an-hour.

There were a number of good questions and I did my best to answer them, based on past experience of both what had worked and what had been less successful. However, one comment stuck in my mind. For obvious reasons, I will not identify either the delegate, or the organisation that she worked for; but I also had a brief follow-up conversation with her afterwards.

She explained that her organisation had in place a formal data governance process and that a lot of time and effort had been put into communicating with the people who actually entered data. In common with my first pillar, this had focused on educating people as to the importance of data quality and how this fed into the organisation’s objectives; a textbook example of how to do things, on which the lady in question should be congratulated. However, she also faced an issue; one that is probably more common than any of us information professionals would care to admit. Her problem was not at the bottom, or in the middle of her organisation, but at the top.

So how many miles per gallon do you get out of that?

In particular, though data governance and a thorough and consistent approach to both the entry of data and transformation of this to information were all embedded into the organisation, this did not prevent the leaders of each division having their own people take the resulting information, load it into Excel and “improve” it by “adjusting anomalies”, “smoothing out variations”, “allowing for the impact of exceptional items”, “better reflecting the opinions of field operatives” and the whole panoply of euphemisms for changing figures so that they tell a more convenient story.

In one sense this was rather depressing: someone having got so much right, but still facing challenges. However, it also chimes with another theme that I have stressed many times under the banner of cultural transformation; it is crucially important that any information initiative either has, or works assiduously to establish, the active support of all echelons of the organisation. In some of my most successful BI/DW work, I have had the benefit of the direct support of the CEO. Equally, it is very important to ensure that the highest levels of your organisation buy in before commencing on a step-change to its information capabilities.

I am way overdue employing another sporting analogy - odd however how most of my rugby-related ones tend to be non-explicit

My experience is that enhanced information can have enormous payback. But it is risky to embark on an information programme without this being explicitly recognised by the senior management team. If you avoid laying this important foundation, then this is simply storing up trouble for the future. The best BI/DW projects are totally aligned with the strategic goals of the organisation. Given this, explaining their objectives and soliciting executive support should be all the easier. This is something that I would encourage my fellow information professionals to seek without exception.

April 10, 2011

Data visualisation

Some pictures speak for themselves:

If you don't know what this is, check out the announcement from the CDF Collaboration at: - All you have to do is click here. HINT: the peak at 140 GeV/c^2 may be important.

The triangle paradox – solved

When I posted The triangle paradox, I said that I would post a solution in a few days. As per the comments on my earlier article, some via Twitter and indeed the context of the article in which this supposed mathematical conundrum was posted, the heart of the matter is an optical illusion.

If we consider just the first part of the paradox:

More than meets the eyes

Then the key is in realising that the red and green triangles are not similar (in the geometric sense of the word). In particular, the left-hand angles are not the same; thus, when the triangles are lined up, their sloping edges do not form the hypotenuse of the larger, compound triangle that our eyes see. In the example above, the line tracing the red and green triangles dips below what would be the hypotenuse of the big triangle. In the rearranged version, it bulges above. This is where the extra white square comes from.

It is probably easier to see this diagrammatically. The following figure has been distorted to make things easier to understand:

Dimensions exaggerated

Let’s start with my point about the triangles not being similar:

∠EAB = tan⁻¹(2/5) ≈ 21.8°

∠FAC = tan⁻¹(3/8) ≈ 20.6°

So the two triangles are not similar and, as stated above, the two arrangements don’t quite line up to form the big triangle shown in the paradox. There is a “gap” between them formed by the grey parallelogram above, whose size has been exaggerated. This difference gets lost in the thickness of the lines and also our eyes just assume that the two arrangements form the same big triangle.
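For anyone who would rather check the angles than take them on trust, here is a minimal Python sketch. It assumes only the leg lengths used in the two expressions above (2 by 5 and 3 by 8) and, for comparison, a big triangle with legs 5 and 13; the variable names are purely illustrative.

```python
import math

# Leg lengths taken from the two small triangles: (rise, run).
angle_eab = math.degrees(math.atan2(2, 5))   # triangle with legs 2 and 5
angle_fac = math.degrees(math.atan2(3, 8))   # triangle with legs 3 and 8

# A genuine 5-by-13 right-angled triangle would need this angle instead.
angle_big = math.degrees(math.atan2(5, 13))

print(f"EAB ~ {angle_eab:.1f} degrees")
print(f"FAC ~ {angle_fac:.1f} degrees")
print(f"True 5-by-13 triangle ~ {angle_big:.1f} degrees")

# The two small angles differ, so the sloping edges cannot lie on one straight line.
print("Similar triangles?", math.isclose(angle_eab, angle_fac))
```

The angle of the true big triangle falls between the other two, which is why one arrangement dips below the supposed hypotenuse and the other bulges above it.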

To work out the area of the parallelogram:

AE = √(2² + 5²) = √29
EI = √(3² + 8²) = √73
AI = √(5² + 13²) = √194

The area of a triangle with sides a, b and c is given by:

Area of triangle

Sparing you the arithmetic, when you substitute the values for AE, EI and AI in the above equation, the area of ∆ AEI is precisely ½.

∆ AEI and ∆ AFI are clearly identical, so the area of parallelogram AEIF, which is twice the area of either, is

2 × ½ = 1

This is where the “missing” square comes from.
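The arithmetic that was spared above can be checked in a few lines of Python. This is only a sketch, and it rests on an assumption: that the area formula referred to is Heron's formula, which gives the area of a triangle from its three sides. Alongside the usual floating-point version, it uses an algebraically equivalent form, 16 × Area² = 4a²b² − (a² + b² − c²)², which works directly on the squared side lengths 29, 73 and 194 and so stays exact. The function and variable names are illustrative.

```python
import math
from fractions import Fraction


def heron_area(a, b, c):
    """Area of a triangle from its side lengths (Heron's formula, floating point)."""
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def area_squared(a2, b2, c2):
    """Exact squared area, computed from the *squared* side lengths."""
    return Fraction(4 * a2 * b2 - (a2 + b2 - c2) ** 2, 16)


# Squared lengths of AE, EI and AI from the working above.
AE2, EI2, AI2 = 29, 73, 194

exact_area_sq = area_squared(AE2, EI2, AI2)
float_area = heron_area(math.sqrt(AE2), math.sqrt(EI2), math.sqrt(AI2))

# The parallelogram AEIF is made of two such triangles.
parallelogram_area = 2 * math.sqrt(exact_area_sq)

print(exact_area_sq)        # 1/4, so the triangle's area is exactly 1/2
print(float_area)
print(parallelogram_area)
```

Working with the squared lengths avoids the rounding that creeps in when a long, thin triangle like this one is fed to the floating-point version, although both routes agree to many decimal places.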

As was pointed out in a comment on the original post, the above should form something of a warning to those who place wholly uncritical faith in data visualisation. Much like statistics, while this is a powerful tool in the hands of the expert, it can mislead if used without due care and attention.

April 8, 2011

Illuminating the darkness


My partner was kind enough to buy me an Amazon Kindle for Christmas and I have enjoyed using it. Yes there were the problems with them registering me to, rather than (thereby incurring foreign transaction charges). And yes they didn’t cancel a trial Economist subscription I took out on the former when I was transferred to the latter. However, these issues were sorted out and money refunded.

I suppose I had the same initial reaction as many people; that they had left a sticker covering the screen, which was intended to demonstrate what the display looked like. After failing to peel it off (thankfully not too energetically) I realised that the screen was actually that clear and that different from a “normal” computer display (I was thinking smart ‘phone or laptop). I am writing this post on one of my many laptops; the screen is OK, but the Kindle is much easier on the eye and pretty close to a high-quality printed page. Suffice it to say that I downloaded new copies of several of my favourite books to it with the prospect of re-engaging with them at my leisure.

But enough of me singing the general praises of the device; I have discovered a particular benefit. While this may well be realised by other people, it is of particular pertinence to devotees of the works of Joseph Conrad.

Joseph Conrad

As one of the undisputed giants of English prose, it is rather ironic that English itself was either Conrad’s fifth, or sixth, language (chronologically: Polish; Russian – though he later, perhaps understandably given the turbulence of the times, repudiated this as a language; French; Latin; German; and – finally, when he was in his twenties, English). I have greatly appreciated his work, since first reading Heart of Darkness. I won’t attempt to offer a literary appreciation of his genius and leave this to others with greater talents in that area. However, despite coming late to the English tongue, Conrad was a master of it and had an amazing vocabulary.

An indispensable companion to Conrad's works

I generally view myself as being reasonably erudite (less charitably I have been accused of having swallowed a thesaurus), but used to have to keep a dictionary at hand when reading Conrad; either that or try to infer meaning from context (probably getting it wrong more times than I care to admit). In some ways, my own limitations slightly diluted my enjoyment of reading. It is a bit distracting to put down one book, pick up a dictionary, look up a word and then revert to the original tome (it was even more complicated as a child reading Jules Verne’s 20,000 Leagues under the Sea with both a dictionary and gazetteer to hand!).

Incidentally my fondness for Conrad led to my one contribution to the field of science. I established my result after extensive fieldwork involving Nostromo and a daily commute. Thomas’ Theorem is as follows:

While this feat is more than achievable with the works of other authors, it is impossible to read Conrad on the Tube.

However, the Kindle is a joy in this respect as you can look up words using the built-in dictionary, quickly, easily and without disturbing the thread of the narrative too much. This has got me out of my rather lazy habit of assuming that I sort of know what a word means and thereby given me a few surprises. Based on the initial illustration above, for example, I had to modify my understanding of recrudescence!

Of course this means that I may have to re-evaluate whether Thomas’ Theorem holds in all conditions. Perhaps a sub-clause excluding the use of a Kindle is required. I will report back…

This is not the first time that Conrad has appeared in the pages of this blog; I had the temerity to also reference him in Aphorism of the Week some time ago.

April 7, 2011

What is wrong with this picture?

Following on from the optical illusions that I featured earlier in the week, here is another picture with something subtly (or perhaps not so subtly) wrong with it. Can you spot what?

So which one is your favourite?

April 4, 2011

The triangle paradox

This seems to be turning into Mathematics week at The “paradox” shown in the latter part of this article was presented to the author and some of his work colleagues at a recent seminar. It kept company with some well-known trompe l’œil such as:

Old or young woman?




Parallel lines?

However the final item presented was rather more worrying as it seemed to be less related to the human eye’s (or perhaps more accurately the human brain’s) ability to discern shape from minimal cues and more to do with mathematical fallacy. The person presenting these images (actually they were slightly different ones, I have simplified the problem) claimed that they themselves had no idea about the solution.

Consider the following two triangles:

Spot the difference...

The upper one has been decomposed into two smaller triangles – one red, one green – a blue rectangle and a series of purple squares.

These shapes have then been rearranged to form the lower triangle. But something is going wrong here. Where has the additional white square come from?

Without even making recourse to Gödel, surely this result stabs at the heart of Mathematics. What is going on?

After a bit of thought and going down at least one blind alley, I managed to work this one out (and thereby save Mathematics single-handedly). I’ll publish the solution in a later article. Until then, any suggestions are welcome.

April 3, 2011

Half full, or half empty?

Glass half, er...

Someone being described as a “glass half-full” or “glass half-empty” sort of person is something that one hears increasingly frequently. I was recently discussing this with a friend and we both agreed that the analogy was unhelpful. First it supports a drastically simplistic and binary view of people having fixed attitudes and behaviours in all circumstances. Day-to-day observation suggests on the contrary that a person may be an avid optimist one day about one thing and a manic pessimist the next day about another thing. This shallow type of characterisation rather reminds me of some of the subjects I touched on in Pigeonholing – A tragedy some time ago.

However, there is a more fundamental consideration; wilful inaccuracy. A glass that is half empty is also half full; that’s the definition of a half. Either description is 100% valid and therefore logically can tell you nothing about the person’s mindset.

Instead what might be more apposite is to adopt a different way to divide sheep from goats. This is still rather too binary for my taste, but at least it has the merit of a greater degree of rigour. I propose dividing people according to how they view a glass that is three quarters empty:

I think that all of our lives would be much the better for adopting this simple principle.

The International Organisation for stamping out sloppiness in spoken speech

Accordingly, I am going to submit this recommendation to the International Standards Organisation for their urgent consideration. I’ll make sure that I keep readers up-to-date with how my submission progresses.

