<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Donald Farmer: Foraging in the Data Forest</title>
<link>http://www.beyeblogs.com/donaldfarmer/</link>
<description>Donald Farmer, from the Microsoft SQL Server Analysis Services team, blogs from behind the scenes as his team work to build the leading analytics platform. Subjects covered include data mining, data quality, data integration ... you know, all that data stuff.</description>
<language>en</language>
<copyright>Copyright 2010</copyright>
<lastBuildDate>Fri, 29 Jan 2010 17:30:00 -0700</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.33</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 


<item>
<title>QlikView from a PowerPivot standpoint</title>
<description><![CDATA[<p>  <br />
A week or so ago, Darren Kerfoot of QlikPower, a QlikView consultancy, wrote a thought-provoking blog post, <a href="http://www.qlikpower.com/blog/bid/30751/Microsoft-s-new-PowerPivot-from-a-QlikView-standpoint">"PowerPivot from a QlikView standpoint."</a> Please do read it: I found it nicely balanced. The thought it provoked in me, and which I tweeted, half joking, was that I might blog from the other side of the fence. I was quite surprised by the number of people who said I should. So here goes ... what does QlikView look like from the PowerPivot standpoint?<br />
 <br />
I will follow the same headings as Darren uses, to make comparison easier, although I should say that ideally I would structure a full comparison somewhat differently.<br />
 <br />
I will be quite critical of some aspects of QlikView, but let's be clear from the start. In general I think QlikTech are a very smart company. They have excellent growth and great customer satisfaction. Personally, I think that mostly comes from compelling marketing, an innovative and effective sales process, and excellent customer support; the product is good enough to sustain this. QlikTech are an excellent Microsoft partner and it's good to see their success.</p>

<p>Darren does say that his initial reaction to PowerPivot (on seeing it demonstrated by the ever-admirable Rafal Lukawiecki)  was that it was quite familiar to him from his QlikView experience. Our hope, as the PowerPivot product team, is that it will be even more familiar to Excel and SharePoint users. In fact, I might say that the innovative features of PowerPivot are less important than what we haven't invented: there's a great advantage to our Excel-like querying interfaces, our Excel-like syntax for expressions, our use of Pivot Tables for analysis, and our use of the familiar SharePoint document model for publishing and managing analyses, and so on. <br />
 <br />
<strong>Underlying Technology</strong><br />
As Darren says, PowerPivot, under the bonnet, uses an in-memory store. QlikView does too. Then again, so do Tibco Spotfire, IBM TM1, Advizor, PivotLink and Altosoft, to name only a few in the BI space. And of course column-stores, such as the Vertipaq technology that PowerPivot uses, are common in the relational database world, while Sybase, Oracle and IBM also have in-memory relational offerings. As QlikView does not use a column store, PowerPivot is really more like those other systems, with QlikView being the exception. The differences are minor for the end user: QlikView appears to compress strings more effectively than PowerPivot, although that can catch you by surprise when the compressed data is uncompressed back into memory; PowerPivot appears to perform much better with calculated columns. I haven't done anything like benchmarking on this - these are just my own observations from running both on the same box.</p>

<p>Talking of engines, some have been misled to believe that QlikView's supposed "associative analysis" represents some significant engine smarts. I have even heard analysts very misleadingly say that QlikView has "association rules" - implying some kind of data mining, such as Microsoft implements in its Data Mining server and Excel Add-ins. QlikView add to the confusion by talking about an associative "architectural model." However, despite the hype, as Curt Monash points out (or rather, painfully extracted from QlikTech themselves through a long thread of comments) it is not so:  "The associative aspect is really more meaningful in describing the end user experience, in that you see visually what is associated and is not associated with any particular selection or drilldown." As Curt says, "Thank you for admitting that clearly!!! It wastes a fair amount of analysts' time when your company pretends otherwise." <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/">http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/</a> </p>

<p>So I guess we all look pretty similar. Of course, at Microsoft we think we have some particular smarts in our engine, but in general the data-handling capabilities should be similar. I do hear of QlikView customers having difficulty scaling - it will be interesting to find PowerPivot's limits, too. It's early days for that, of course, but I'm sure we'll find them.</p>

<p><strong>Sample Applications</strong><br />
Darren is right, our sample apps on www.powerpivot.com are pretty familiar. There are a limited number of public data sets out there to share: sports, and so on, are common ground. So many of these sample application sites are similar. Out of context for this post, but still well worth a visit, is the sample page of our friends at Tableau. <a href="http://www.tableausoftware.com/learning/examples">http://www.tableausoftware.com/learning/examples</a> Now folks, THAT's a set of sample apps!</p>

<p><strong>Slicers</strong><br />
I think Darren's right - QlikView users will see nothing too exciting here. However, our target audience of Excel users (and especially PivotTable users) do like Slicers very much. I think they are pretty basic in the first version - in the future, expect to see an even better experience. Nevertheless, they are a very natural step up from traditional pivot table filters, and provide a nice visual interface for those functions: and they work in Excel services too, bringing browser interaction alive for those users who consume, rather than produce, PowerPivots.</p>

<p><strong>Market Exposure</strong><br />
We have long touted the idea of releasing the PowerPivot add-in as effectively a no-cost download. In fact we were talking about this publicly before QlikTech made their Personal Edition free on the desktop. Maybe they were hoping to pre-empt us? Maybe not. Probably, they just thought it was a good idea. </p>

<p>After all, it <em>is </em>a good idea. For Microsoft, we had already released the Data Mining Add-ins for Excel as a free download, so we knew this model was attractive. For QlikView, it was a good way of heading off complaints about the high cost of ownership. High TCO is still the top complaint I hear about QlikView from their customers, even though their pricing model seems, at first glance, quite modest. However, we see many cases where consulting fees have grown dramatically for QlikView users, who often bought into a story of software that was so easy to use that applications could be built in hours. It can be true in some cases, but there are a helluva lot of QlikView consultants out there, doing very well for themselves, and rather contradicting that myth.</p>

<p>There's a similar story around partnerships. QlikTech have been quick to say "no data warehouse required" and aim to reduce the dependency on the IT department: music to the ears of many business users. Yet you only have to look to QlikView's partnerships to see the weakness in that argument. Some examples:</p>

<p>:: QlikTech Announces Support for HP Neoview: <a href="http://www.qlikview.com/Contents.aspx?id=8702">http://www.qlikview.com/Contents.aspx?id=8702</a> "Robust data warehouse provides a scalable foundation on which QlikView delivers user-driven analysis"<br />
:: QlikTech and Informatica: <a href="http://www.qlikview.com/Contents.aspx?id=9270">http://www.qlikview.com/Contents.aspx?id=9270</a> "The combination of Informatica's robust data integration products ... with QlikView ... enables enterprises to optimize their entire data management process"<br />
:: QlikTech and Kalido: <a href="http://www.qlikview.com/Contents.aspx?id=6628">http://www.qlikview.com/Contents.aspx?id=6628</a> "For QlikTech customers, Kalido provides a robust, enterprise-ready information management capability." </p>

<p>Perhaps QlikTech's marketing team just need a new thesaurus, but it sure looks like there's a problem in the delivery of "robust" solutions that requires data warehouse and data integration partners to solve.</p>

<p>This, ultimately, is the big difference between the market exposure of PowerPivot and the market exposure of QlikView. PowerPivot is one part of a seamless story stretching from robust and scalable enterprise data solutions, such as our Parallel Data Warehouse, through middle-tier applications such as SQL Server Integration Services, down to flexible personal analytics on the desktop with Excel. QlikView, great application though it is, offers only one part of that story: on the one hand it must pull in partnerships for robust enterprise applications, and on the other ... well, even QlikView needs "Export to Excel" to go the last mile in flexibility and agility. </p>

<p><strong>DAX (Data Analysis Expressions) </strong><br />
DAX is a winner. Users can start from simple Excel-like expressions and build up to really quite sophisticated dimension-navigating, time-aware functions that return in-memory table objects for further, nested, functionality. I wish it was easier at the advanced level, but so far users are really delivering great applications with DAX. There is a contrast with QlikView that Darren does not call out. QlikView requires a LOT of scripting. When QlikTech presented at the Boulder BI Brain Trust last year, this was noted in several tweets during the demo - you really need to know your way around Visual Basic scripts to get the most out of QlikView. This is certainly a problem. I know, because we used a lot of scripting in a previous product on which I worked: SQL Server Integration Services. In fact, I wrote a book just about the use of script in that application. I can tell you - scripting is not for business users. </p>

<p>There is an irony here. As a developer, in some cases, I might actually prefer a script to some of the complex nested expressions that I build in DAX, even with the auto-complete and parentheses-parsing tools we provide. But for business users, that certainly isn't the case. The flow of control needed in scripting is just not how they think: and the advanced Excel user in marketing or finance, is often quite expert at building and debugging complex functions.</p>

<p><strong>Summary</strong> <br />
I hope you have found this interesting. Let me re-iterate. QlikView is an excellent application, and QlikTech sell it in a very compelling manner. I don't expect PowerPivot will put a huge dent in that, because it really is the marketing and sales process, rather than the software, that have brought QlikTech their remarkable growth. QlikTech are especially good in small geographies, where their hard-driving, high-octane, hands-on, move-so-fast-they-can't-see-the-problems style works very effectively, especially when they reach business users directly, rather than IT. In the larger, more mature, and better-served US market they still grow strongly, but don't have the same mindshare.</p>

<p>I do expect that PowerPivot will lengthen sales cycles for the QlikView team in many cases. However, I also expect that in many organizations we will co-exist quite happily. Excel power-users will love and use PowerPivot. Users who enjoy QlikView's polished UI and navigation tools, will no doubt still enjoy that experience. These do appear to be two separate groups of users. I find very few Excel power users to be QlikView converts.</p>

<p>Darren is certainly right: interesting times lie ahead!</p>

<p>For more information on QlikView, see http://www.qlikview.com</p>

<p>For more information on PowerPivot, see http://www.powerpivot.com </p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2010/01/qlikview_from_a.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2010/01/qlikview_from_a.php</guid>
<category></category>
<pubDate>Fri, 29 Jan 2010 17:30:00 -0700</pubDate>
</item>

<item>
<title>My 10 favourite business intelligence blog posts of 2009</title>
<description><![CDATA[<p>As the year draws to a close, I thought it would be fun to collect my ten favourite blog posts from 2009 on the subject of Business Intelligence. In my daily reading, I bookmark favourite posts, so I thought it would be quite easy for me to narrow down the field. However, I found over 50 posts in my bookmarked list, so I had to read them all again - a very enjoyable task - and I had to think of some criteria.<br />
I found, on reflection, that there are some important aspects of any blog that appeal to me.<br />
Firstly, I value breadth and depth of knowledge. All the posts listed here are by experts. Some are notable for the depth of experience they reveal - Jill Dyche and Evan Levy's posts are fine examples. Others are notable for the writer's ability to synthesize a range of experience: I particularly like Scott Davis's post on strategy, and the dissections of licensing and benchmarks by Merv Adrian and Curt Monash, in this regard.<br />
Next, I appreciate blogs that engage the readers, building a sense of community and engagement. Andy Bitterer, Neil Raden and Stephen Few all get that right in their posts.<br />
Finally, I admire bloggers who share their insights and even their materials openly and willingly. Richard Hackathorn on behalf of the Boulder BI Brain Trust, and Mark Madsen, have great sharing posts.<br />
I hope you like the list. By limiting myself to 10 posts, I had to leave out many great posts from friends and colleagues. I'll post a similar, but very different, list of top 10 SQL Server posts of 2009 on my SQL Server blog at: <a href="http://blogs.technet.com/sqlserverexperts/">http://blogs.technet.com/sqlserverexperts/</a><br />
 And, in early 2010, I will post a "blogroll" of my favourite BI bloggers here to share the love a little.</p>

<p>So, here are my 10 favourite BI blog posts of the last year, in date order...<br />
 <br />
<strong>Andy Bitterer</strong><br />
Setting the Record Straight<br />
December 28th, 2008 <br />
<a href="http://blogs.gartner.com/andreas_bitterer/2008/12/28/setting-the-record-straight/">http://blogs.gartner.com/andreas_bitterer/2008/12/28/setting-the-record-straight/</a><br />
I know, I know, this was posted in 2008. However, it just squeezes into one calendar year ago. Besides, the thread continues well into 2009 and good blogs are living documents, not one-off missives to anonymous masses.<br />
It was just great to see a top Gartner analyst engaging like this with the community of customers, vendors and experts. Andy took Talend's Yves de Montcheuil to task over his comments about Gartner's approach to open source. Yves and many others responded and the result was a most engaging, and mostly enlightening, debate. <br />
And, as the saying goes, all's well that ends well, for Talend were able to blog in November that they are, indeed, in the latest Gartner Magic Quadrant for data integration: <a href="http://www.talend.com/blog/2009/11/30/gartner-recognizes-open-source-as-enterprise-data-integration/">http://www.talend.com/blog/2009/11/30/gartner-recognizes-open-source-as-enterprise-data-integration/</a></p>

<p><strong>Neil Raden</strong><br />
March 29, 2009<br />
From 'BI' to 'Business Analytics,' It's All Fluff<br />
<a href="http://intelligent-enterprise.informationweek.com/blog/archives/2009/03/_from_bi_to_bus.html">http://intelligent-enterprise.informationweek.com/blog/archives/2009/03/_from_bi_to_bus.html</a><br />
Neil's blog is often pointedly to-the-point. In this post, which engendered both comments and controversy, he takes SAS to task for coining the phrase "business analytics" to position their software. Without taking sides, I can certainly say that, for me, this is great blogging. Neil is incisive in his post, without being mean-spirited.</p>

<p><strong>Scott Davis</strong><br />
March 29, 2009<br />
Beyond the Big Bang: Strategy as Habit<br />
<a href="http://circaspecting.typepad.com/circaspecting_musings_on_/2009/03/beyond-the-big-bang-strategy-as-habit.html">http://circaspecting.typepad.com/circaspecting_musings_on_/2009/03/beyond-the-big-bang-strategy-as-habit.html</a><br />
Scott, of Eyeris and Lyzasoft, is one of the most thoughtful business leaders I know, in every sense. He thinks deeply about leadership and innovation, and he always thinks of others in a respectful and caring way. All of this comes across in his blog, which I think has some of the best leadership insights I have read all year. This was a typically aware post by Scott, but every post is excellent. (And wasn't March 29 a great day for blogging?)</p>

<p><strong>Mark Madsen</strong><br />
May 22, 2009<br />
Open Source BI in the Real World - MySQL Keynote Slides and Video<br />
<a href="http://www.b-eye-network.com/blogs/madsen/archives/2009/05/open_source_bi_1.php">http://www.b-eye-network.com/blogs/madsen/archives/2009/05/open_source_bi_1.php</a><br />
Mark and I share a passion for crafting slide decks that go beyond standard templates and bullet points. Even better, Mark shares many of his decks and presentations online too. In this blog, he writes about and shares a fascinating keynote from the MySQL conference.</p>

<p><strong>Richard Hackathorn </strong><br />
July 10, 2009<br />
Boulder BI Brain Trust<br />
<a href="http://boulderbibraintrust.org/cgi-bin/mt/mt-search.cgi?search=birst&IncludeBlogs=1">http://boulderbibraintrust.org/cgi-bin/mt/mt-search.cgi?search=birst&IncludeBlogs=1</a><br />
It's difficult to choose just one post from the Boulder BI Brain Trust, because really it is the concept, and the continuing engagement of so many vendors and experts, that makes this blog special. Boulder, for those who don't know, is not only where Mork met Mindy; it is also, by happy chance, home to many of the best brains in BI. They meet often on a Friday, hosted by Claudia Imhoff, to see the latest offerings of invited vendors. Richard Hackathorn often writes up the blogs, and captures the sharply critical, but friendly, atmosphere well. Even better - you can follow their immediate reactions on Twitter using the hashtag #BBBT. As they monitor tweets during the vendors' discussions, the result is one of the most interactive and entertaining forums for BI in the virtual world.</p>

<p><strong>Evan Levy</strong><br />
July 29, 2009<br />
Good Data Warehouse DBAs are Hard to Find<br />
<a href="http://www.evanjlevy.com/2009/07/good-data-warehouse-dbas-are-hard-to-find.html">http://www.evanjlevy.com/2009/07/good-data-warehouse-dbas-are-hard-to-find.html</a><br />
If I were to award - oh, why not? I hereby award ... Evan Levy as "blogger of the year" in my opinion. It was great to see Evan starting to blog, and he has not disappointed. The sheer practicality and hard-won insights of Evan's posts are hard to beat. As I work so much with DBAs, I found this one, about the role of the data warehouse DBA, immensely useful. DBAs and CTOs in my executive briefings will recognize some of these observations - I shamelessly reuse them, and credit the source, of course.</p>

<p><strong>Curt Monash</strong><br />
August 10, 2009<br />
Sorting out Netezza and Oracle Exadata Data Warehouse Appliance Pricing<br />
<a href="http://intelligent-enterprise.informationweek.com/blog/archives/2009/08/sorting_out_net.html">http://intelligent-enterprise.informationweek.com/blog/archives/2009/08/sorting_out_net.html</a><br />
Perhaps it raised more questions than answers, but Curt's post carefully picked through the complexities of the licensing and positioning of two important vendors. I think Curt got as near as possible to the bottom line, and did a great service by synthesizing the breadth and depth of his knowledge for us.</p>

<p><strong>Stephen Few</strong><br />
August 19th, 2009 <br />
True Stories about the Benefits of Data Visualization<br />
<a href="http://www.perceptualedge.com/blog/?p=601">http://www.perceptualedge.com/blog/?p=601</a><br />
Stephen's blog ruffles feathers with his bluntness and direct comments on technologies and individuals alike. He certainly ruffles mine: I'm on record as thinking the style mean-spirited, though I would hate to judge the man by the mannerism. Nevertheless, there's no denying Stephen's expertise and knowledge. His most recent book, Now You See It, would be top of my book recommendations from this past year. In this particular blog post, he reaches out to his readers, expressing his frustration at the lack of empirical proof that visual analytics have real, measurable business results. It's a great post in its simple pragmatism, and engages the community well, as you can see from the comments. Follow Stephen's blog, it will raise your understanding, and your blood pressure!</p>

<p><strong>Jill Dyche</strong><br />
November 10, 2009<br />
They're Baaaack! IT Spending in Retail Returns<br />
<a href="http://www.jilldyche.com/2009/11/theyre-baaaack-it-spending-in-retail-returns.html">http://www.jilldyche.com/2009/11/theyre-baaaack-it-spending-in-retail-returns.html</a><br />
Jill's blog is often laugh-out-loud funny, but don't let that fool anyone: you'll also find some of the most insightful writing about business intelligence, master data management and data governance.  This post was exceptional in its depth and breadth. I have seen books on the business shelves of Hudson's in the airport with less useful information than this one post.</p>

<p><strong>Merv Adrian</strong><br />
December 14, 2009<br />
Oracle's TPC Assertions Don't Help Its Credibility<br />
<a href="http://mervadrian.wordpress.com/2009/12/14/oracle’s-tpc-assertions-dont-help-its-credibility/">http://mervadrian.wordpress.com/2009/12/14/oracle’s-tpc-assertions-dont-help-its-credibility/</a><br />
Merv is a former Forrester analyst who brings great rigour, but also really good writing, to the world of BI blogs. This recent post is a great example of his style - detailed, reflective and forward-looking at once, and enjoyable to read. Merv also engages actively, perceptively, and with great respect with commenters - a model of how to do so.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2009/12/my_favourite_bi.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2009/12/my_favourite_bi.php</guid>
<category></category>
<pubDate>Sun, 27 Dec 2009 20:45:00 -0700</pubDate>
</item>

<item>
<title>A Christmas Letter</title>
<description><![CDATA[<p>Dear Blogger,<br />
I have been working in Business Intelligence for 8 years. Some of my little colleagues say there is no One Version of the Truth. My manager says, if you read it on B-Eye-Network it is so. Please tell me; is there One Version of the Truth?<br />
Virginia O'Hanlon</p>

<p><br />
Virginia, your little colleagues are wrong. They have been dumbed down by a remarkably dumb age. They think, because their little minds can't handle the truth, that there cannot be any, not even one little version. All minds, Virginia, are little, and RAM upgrades are not available for BI practitioners, nor even for bloggers. In this great universe of ours, Business Intelligence is only capable of grasping that which can be extracted, loaded, transformed, aggregated, mined and visualized, and that is not the whole of truth and knowledge.</p>

<p>Yes, VIRGINIA, there is One Version of the Truth. It exists as certainly as the need to get out next quarter's numbers without arousing the suspicions of the SEC. Alas! How difficult corporate life would be if there were no One Version of the Truth. There would be no knowing if Marketing's "free camo underwear with every order" campaign had actually increased sales among the duck-hunting demographic, or just attracted some very lonely surfers exploring the outer reaches of online shopping. The amicable agreement on metrics which today lightens the business of every organization that has implemented a data warehouse would be extinguished.</p>

<p>Not believe in One Version of the Truth! You might as well not believe in a Balanced Scorecard! You might get your manager to hire analysts to audit your metrics, tracing each data element and its metadata back to its source, but even if they all found different answers what would that prove? Nobody knows what the One Version of the Truth is, but that is no sign that there is no One Version of the Truth. The most real things in our businesses aren't understood by anyone. Did you ever see a Mortgage-Backed Security, or a Collateralized Debt Obligation? Of course not, but that's no proof that they do not exist.</p>

<p>You may slice and dice the raw data to see what the numbers "really" are, but there is a veil covering the unseen world of business metrics that not even the brightest man could tear apart. Codd couldn't model it, nor Tufte visualize. </p>

<p>No One Version of the Truth? It lives, it lives forever. A thousand years from now, Virginia, nay, ten thousand years from now, it will continue to keep us all in work.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2009/12/a_christmas_let.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2009/12/a_christmas_let.php</guid>
<category></category>
<pubDate>Tue, 22 Dec 2009 12:45:00 -0700</pubDate>
</item>

<item>
<title>Simpson&apos;s Paradox and a Data Quality problem</title>
<description><![CDATA[<p><a href="http://www.dataflux.com/dfblog/">http://www.dataflux.com/dfblog/</a>One of favourite writers on matters of data quality, is David Loshin. (You know who the other one is, Frank!) David blogs, in good company, over at the excellent DataFlux Community of Experts  - be sure to subscribe to the feed.</p>

<p>Back in September, David published an interesting <a href="http://www.dataflux.com/dfblog/?p=1058">blog</a> on applying Pareto's principle to data cleansing and other systemic improvements. As he summarizes it, "there is some point where the incremental value you get is not worth the investment ... the level of effort to get incremental improvements is greater than the value generated by having the improvement."</p>

<p>This reminded me of a paradox which I wielded recently, in persuading a customer to tackle some data quality issues. The proposition is Simpson's Paradox, and, perhaps because it reminds me of Simpson's Hospital in Edinburgh, I always explain it first in medical terms. Here goes . . .</p>

<p>When comparing results of a difficult operation, Hospital A has a 75% survival rate and Hospital B has a 90% survival rate. Which is the better hospital? Which would you choose? It could well be Hospital A.</p>

<p>Truth is, we don't have the data to make a decision. Hospital A may be in a poorer part of town, with patients in worse general health and presenting with more advanced symptoms. Hospital B, on the other hand, has well-insured patients, benefitting from good health and regular screening. The surgeons in Hospital A may in fact, by every measure, be better than those in Hospital B yet Hospital A could still have a lower survival rate.</p>
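
<p>To see how this can happen, here is a minimal sketch in Python. The patient counts are hypothetical, chosen only so that the overall rates match the 75% and 90% above; the split into "severe" and "mild" cases is likewise an assumption made for illustration, not real hospital data.</p>

```python
# Hypothetical (patients, survivors) counts per hospital and case type.
# They are invented so the overall rates come out at 75% and 90%.
outcomes = {
    "Hospital A": {"severe": (800, 560), "mild": (200, 190)},
    "Hospital B": {"severe": (100, 60), "mild": (900, 840)},
}


def survival_rate(patients, survivors):
    """Share of patients in one group who survived."""
    return survivors / patients


def overall_rate(groups):
    """Survival rate across all groups of one hospital combined."""
    total_patients = sum(patients for patients, _ in groups.values())
    total_survivors = sum(survivors for _, survivors in groups.values())
    return total_survivors / total_patients


for hospital, groups in outcomes.items():
    by_group = ", ".join(
        f"{name}: {survival_rate(*counts):.0%}" for name, counts in groups.items()
    )
    print(f"{hospital}: overall {overall_rate(groups):.0%} ({by_group})")
```

<p>With these made-up numbers, Hospital A does better in both groups (70% against 60% for severe cases, 95% against 93% for mild ones), yet its overall rate is lower simply because it treats far more severe cases.</p>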

<p>How does this apply to data quality? Well, try a thought experiment where you replace surgery with a data quality process, and the health of the patients with the initial quality of your data. </p>

<p>In the case of my customer, faced with a limited budget, they had to choose between two different data cleansing initiatives. They were a long-standing supplier of building components in the mid-west and had a well-established B2B customer list. However, that customer list was riddled with inaccuracies. Faced with a changing market, they were sure they had to improve the impact and accuracy of their B2B direct mails. A vendor offered them a low-cost solution to cleaning up their mailing lists, promising a remarkably high degree of accuracy. However, they had another problem: their product database was also outdated, with many discontinued products and categorizations that were no longer in line with industry practices.</p>

<p>The decision of which data to tackle was difficult, however, because improving the product catalog would be a tough job. Think of all the products available in Home Depot, and all their possible categorizations by product type (hardware, lighting), project type (bathroom, kitchen), supplier and so on. Moreover, there were few tools to help. Perhaps getting the entire catalog up to high quality would be impossible on their budget and timescale. Address cleansing on the other hand held out the promise of this high "survival rate." In fact, the promised 95% accuracy was a very tempting number.</p>

<p>Together with the customer, we considered the options, and I explained Simpson's paradox. It is not an exact parallel, but it helps to illuminate these issues. A high-quality mailing list backed by a poor quality product catalog would make good execution on leads and sales difficult. A moderate-quality mailing list, backed by an improved catalog would enable better execution, but there would still be some overspend on mailings and contacts. Nevertheless, improving the product catalog to 80% accuracy (it was bad!) could prove to be a better investment than improving the mailing list to 95% accuracy.</p>

<p>In the end, a healthy order for a discontinued product proved to be the deciding factor. The customer is now upgrading their product catalog.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2009/10/simpsons_parado.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2009/10/simpsons_parado.php</guid>
<category></category>
<pubDate>Sun, 25 Oct 2009 23:15:00 -0700</pubDate>
</item>

<item>
<title>A future for BI? Signs point to yes.</title>
<description><![CDATA[<p>It's a year now since Dr Jerry Lundegaard of the University of Eastern North Dakota at Fargo published his groundbreaking book "Behind the 8-Ball: Making Decisions in the New Economy." Like previous works setting out new approaches to business, Lundegaard shook up boardrooms with his insightful understanding of how business decisions are made. However, Lundegaard's radical idea was to avoid data-centric, cumbersome approaches such as the Data Warehouse, the Corporate Information Factory, or the Balanced Scorecard. Instead, Lundegaard realized that most decisions are made <em>in spite of</em> the data, not because of it. So he recommended throwing out the stale paradigms and replacing them with (and here's the clever part) the Magic 8-Ball. </p>

<p>At first, Lundegaard's radical vision was only slowly accepted, but his breakthrough moment came when he was the highlight of the Discovery Channel's "Mythbusters: One Version of the Truth." From  that point on, software vendors flocked to support the 8-Ball methodology, including the five traditional Business Intelligence megavendors. We interviewed each of these software giants about their 8-ball tools and methodologies.</p>

<p>SAS spokeswoman Naeve Bayes is enthusiastic about the advanced 8-ball technology they have brought to market. "It's mathematically more spherical than the regular 8-ball. Business users no longer rely on obscure guess work. They can have more confidence in: (X-Xo)^2 + (Y-Yo)^2 + (Z-Zo)^2 = R^2. This is not just a Magic 8-Ball, it's an Enterprise Center of Eightness."</p>

<p>If a customer asks the wrong question? Bayes understands the problem well, "You'll get a mathematically more spherical misdirection. It's very powerful." She goes on to describe how financial services customers who had established a Center of Eightness not only failed more completely, they failed more quickly than those using traditional technologies. "This shows that not only is the 8-ball methodology more predictive, it is more efficient too," said Bayes. "Some 8-ball customers were way ahead of the downward curve - that's the kind of advantage you need in today's fast changing environment." </p>

<p>Nevertheless, when it comes to business decisions, many people still believe you cannot get fired for buying IBM. What approach should we expect from that most venerable of vendors?</p>

<p>"It's really a services play for us," said IBM's field manager Bill Ablours. "And most importantly, we are the only vendor who can provide the complete hardware, software and services 8-Ball solution."  Challenged that even IBM's services may be overstretched to meet growing business demands, Ablours responds "Absolutely not. We have a huge global services division. In any major city of the world, you're never more than 5 minutes away from an IBM consultant talking Balls." It's a convincing claim.</p>

<p>At Redwood Shores they have adopted a different strategy again. A spokesperson was not available, but in a statement, Oracle announced: "Over the years Oracle has acquired a lot of balls. In fact, all Oracle 8-ball solutions come with a Teach Yourself Juggling DVD, to help you keep all these balls in the air." Some CTOs have been concerned at the implications if their IT department should drop one of these many juggling 8-balls. Oracle does not address the problem directly in the press release, but a spokesman acknowledged privately that "The balls are 'unbreakable.' But when they do break, this may be a problem for the customer, but not for us, so long as the ball was fully licensed." And if the customer wants support for the broken ball? "Of course, that's fine. Although, naturally, they do have to pay retrospectively for every time they didn't drop the ball."</p>

<p>Not to be left out, SAP spokeswoman Kitty Herrballs was keen to describe their solution for those seeking 8-Ball insight. "Last Summer we released our natural language 8-ball application. You can simply ask the 8-ball a natural question such as 'Will my KNA1's VBUPs my MARD change?' And we'll return the answer instantly from memory." Faced with skeptical questions that this format actually appears rather un-natural, Miss Herrballs noted that "It's quite easy, if you remember to put the verb at the end."</p>

<p>Many customers wonder if this technology will be integrated into the popular Business Objects stack. "Eventually," says Herrballs, "But BO have only just heard about it." Pressed on how this could be, given that the SAP 8-ball technology was released last Summer, and Business Objects is now a wholly-owned SAP subsidiary, Miss Herrballs pointed out that "BO are still, in essence, a French company. They were on vacation last Summer, and are only just catching up on email."</p>

<p>Finally, what of Microsoft, the last of the megavendors to come to market with an 8-ball application? Director of Marketing for the recently re-organized Business Services Solution Services Business at Redmond, Skihni Lahti, described their approach as "partner friendly." Says Lahti "We're delivering the 8-ball functionality in the Excel box for our customers' convenience." Industry watchers have been critical of this approach, pointing out that "functionality in the Excel box" means a Microsoft customer can expect 50 blank index cards tucked into the packaging on which to write questions, and a link to an Excel macro that performs a lookup to a table containing 8-ball answers. Lahti is quick to point out the flexibility of this method, and defends it vigorously. "It's really an 8-ball  platform. There is a strong ecosystem of partners supplying questions and answers for all your needs. And for our developer community, we include a Sharpie." We asked for customer evidence that the Microsoft approach is effective. "We have only a few external references just now, but we do use the technology extensively within Microsoft." An example? Look no further than the ongoing question "Should Microsoft buy Yahoo?" The answer "Reply hazy, try again," is right out of Lundegaard's Chapter 3 "The ambiguous answers: try another shake."</p>

<p>That concludes our roundup of the megavendors, and their approaches to Lundegaard's dramatically successful 8-ball methodology. Enjoy the rest of April, and do keep an eye out for Lundegaard's new book, due in booksellers later this month. Co-authored with Davenport, Gladwell and Kaplan, it is provisionally titled "Those Companies We Said Were Awesome? Not So Much." </p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2009/04/a_future_for_bi.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2009/04/a_future_for_bi.php</guid>
<category></category>
<pubDate>Wed, 01 Apr 2009 08:30:00 -0700</pubDate>
</item>

<item>
<title>You say you want a resolution ...</title>
<description><![CDATA[<p>Typically, I do not make New Year's resolutions, but having resolved to blog more (see item 5 below) it felt appropriate to start with some goals for 2009. Let's see... </p>

<p><u><strong>1.	I will radically prune slideware.</strong></u><br />
One of my great pleasures in 2008 has been the result of wiring my Zune so I can listen in the kitchen while cooking, and subscribing to numerous podcasts from BBC Radio 4 and Radio Scotland. It is almost like being back home. Just over the last week or so, I listened to lucid explanations of the complex internal politics of the Abbasid Caliphate; the impact of our economic decline on East European migrant workers in the UK, with consequent effects on the growth of cut flowers as a secondary crop for organic farmers; and the details of the UK government's  misuse of knife crime statistics: all this by the spoken word alone, without a bullet list, highlighted term or process diagram in sight. How different from the typical conference presentation, my own too often included, where every point is illustrated, bulleted or highlighted in the supposed interests of clarity.</p>

<p>All through 2008, I have been progressively simplifying and clarifying my presentations. At the Microsoft BI conference in October, my most effective presentation, as scored by the audience, included only graphical slides (<a href="http://tinyurl.com/5vna35">here</a>) and my second best had no slides at all, just talking and demos. This year, when I have material that I would like people to take away, I will make it available as a separate handout, written in prose, which they can pick up, or download later. I am gradually seeing others take a similar stand against bland slideware. Mark Madsen, of Third Nature and TDWI, has some excellent presentations at <a href="http://www.slideshare.com/mrm0">www.slideshare.com/mrm0</a> - even if he does use the occasional bullet list. (Hey, I am not a puritan in these things. To the puritan, all things are impure.)</p>

<p>I have simply grown very tired of the public presentation that is a document in disguise. We have all had to sit through these, especially from software vendors: the endless builds of detailed marketecture; the logos; the highlighted mission statements; the bullet lists; the boxes and arrows. O Lord, the boxes and arrows! If I use such presentation graphics in 2009, shoot me with the arrow and put me in one of the boxes.</p>

<p>Of course, I know where all this comes from. It is well intentioned enough, and Microsoft is far from the worst offender. Here's how it happens in a slide review meeting.</p>

<p>OK guys, slide 5. We want to talk about the new capabilities. The first bullet nails it clearly - <em>scalable </em>- right?<br />
But if we say this new system is scalable, are we implying that the old version did not scale? <br />
I see. Perhaps this version is <em>more scalable</em>? <br />
No, that may have a similar implication. Let's say it has <em>improved scalability</em>. <br />
But that suggests only incremental changes to the code. My team did serious redesign work under the hood, so let's say it has <em>enhanced scalability</em>. <br />
How about enhanced scalability <em>architecture</em>? That would convey that you guys did a lot of work on this - which also helps to explain why we cut some of the usability work if anyone asks. <br />
Good - all agreed - let's capture that in a bullet. Now, is it really <em>integrated </em>with this other system or just <em>compatible</em>? <br />
Let's change the solid arrow between the boxes to a dotted arrow so we don't make too strong a claim.<br />
Excellent. Better label the arrow with the name of the API just to be clear. <br />
OK, but there are several APIs with different capabilities.<br />
No problem. Another couple of boxes, with a callout line from the arrow, can list the APIs and services. <br />
Yes, but let's not overlook SaaS. <br />
OK, make one of the boxes a clipart cloud. <br />
Now we're getting there ...<br />
 <br />
I have been involved in so many of these meetings: actually, I am rather good in them, and I guess that for the major conference keynotes they may even be politically unavoidable. To be fair, especially with release dates, we are carefully hedging commitments that may come back to haunt us. Our audience understands that when we say <em>first half of 2020</em>, we are carefully not putting in writing any commitment that could be used against us, so long as we ship by 5.30pm on the afternoon of June 30th. However, far too often we take this caution into every aspect of the presentation, with the result that our slides groan with detail and suffer badly from the fallacy of false precision.</p>

<p>For my own presentations, I can only resolve to try to do much better, while working to improve the situation elsewhere, too. In 2008, I had a lot of fun building the Business Intelligence fairytale with Stacey, my Vice President's communications manager - and the VP was game enough to present it too. It was hugely successful. You can see Ted Kummert presenting it <a href="http://tinyurl.com/9aqcrs ">here </a>- scroll to 1 hour and 16 minutes into the video if you like. I am hoping we can do more to keep our audiences interested in 2009.</p>

<p><u><strong>2.	Stop boring analysts!</strong></u><br />
Yes, that is an ambiguous statement. I will leave it so.</p>

<p>I am increasingly aware that analysts really do have a tough job, not helped by the presentations I have just described. I have always considered that the worst mistakes we can make with analysts are to be misleading or patronizing. I now realize that to be boring is a very close third. <br />
 <br />
<u><strong>3.	I am going to stop talking about metadata. </strong></u><br />
Those who know me well, also know this is about as likely as forswearing the eating of fish or the drinking of claret, but let me explain. I will still be happy to talk about business metadata, or technical metadata, or lineage or impact analysis; but I am not going to talk about undifferentiated <em>metadata</em> in the abstract. The all-encompassing term just is not helpful, and leads to the hopelessly mistaken expectation that there might be a <em>metadata solution</em>. Asking a vendor for their metadata solution is like asking an architect for their door solution. You mean front door, elevator door, office door, cupboard door, or fire door?  Rotating, sliding, or swinging? <br />
 <br />
<u><strong>4.	I will try to stop using analogies.</strong></u><br />
Yes, I know. It is far too easy to reach for an inexact analogy when making a point rather than taking the time and care to construct a better account of the actual issue at hand. Analogies can be useful in their place, no doubt, but I will use them reluctantly for I notice that I, and many others, fall back on this rhetorical device too readily. I have become convinced that they confuse and mislead as often as they illuminate. After all, Jesus did not use parables to make his message easier to understand: quite the opposite. See Matthew 13.</p>

<p>Perhaps I will use metaphor instead. Metaphor sounds deeper. <em>Your metadata is an office door, opening onto the corridor of the enterprise.</em> By the end of the year, I will publish The Little Book of B.I. Calm.<br />
  <br />
Finally ...<br />
 <br />
<u><strong>5.	I will blog more.</strong></u><br />
Carried over from 2008.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2009/01/you_say_you_wan.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2009/01/you_say_you_wan.php</guid>
<category></category>
<pubDate>Wed, 07 Jan 2009 10:45:00 -0700</pubDate>
</item>

<item>
<title>Catching up on recent news</title>
<description><![CDATA[<p>I have not blogged for some time. Not that I had given up, I just found little time to craft posts that I thought were interesting enough to share. So this is a very overdue post. </p>

<p>Why get back on the horse now? That's simple to say. Over the last few weeks my mailbox has been full of questions about recent developments in BI and DW at Microsoft - so a new blog post seemed like a great way to provide a generic answer to the many questioners, especially where they were just asking for opinions rather than hard facts.</p>

<p>For those of you who prefer audio, I have cast my pod twice recently with the fine folks at the B-Eye-Network: one interview with <a href="http://www.b-eye-network.com/listen/8214">Jill Dyche</a> and one with <a href="http://www.b-eye-network.com/listen/8345">Colin White</a>. Both podcasts discuss recent events at Microsoft. As always, my views are not Microsoft's official position. They are my own, and often enough not even shared by both sides of my brain at once.</p>

<p>So what has been happening in Redmond that spurred all the questions? For those who need a reminder, here are at least four issues.</p>

<p>The acquisition of DatAllegro<br />
The acquisition of Zoomix<br />
The release of SQL Server 2008<br />
Bill Baker's leaving Microsoft</p>

<p>I'll be brief on each of these here, but do listen to the podcasts for more.</p>

<p><strong>DatAllegro.</strong><br />
This acquisition made a big splash, if only because so many people had been eyeing the accelerator and appliance market waiting for the first signs of consolidation with BI or RDBMS vendors. However, although MS is first in the fray here, we have not really bought either an accelerator or an appliance - we have bought a significant step forward on our roadmap to greater scalability for SQL Server. The SQL Server team have been continuously improving scalability over the years - with some very effective case studies and proof points. DatAllegro just moves us effectively and efficiently along. <br />
Two points have really got the BI bloggers chatting. The first is the sum Microsoft paid. (At this point, my legal rep is sweating and readying an email to donald.farmer reminding me that I cannot discuss this.) I cannot discuss this. But really, in a month or two this is a footnote. The most important thing about the price is how little it matters in the big picture. The second point the bloggers have enjoyed discussing is how long it will take Microsoft to integrate the DatAllegro codebase, while migrating it away from its open source roots. Again, I can't discuss in detail,  but I can tell you that we bought a shortcut to massive scale implementations and a shortcut it will be.<br />
It's a pretty exciting prospect: we'll  be playing in a fascinating space. Other teams - management tools, ETL, reporting -  also have challenges arising from this shortcut; we now need to ensure other elements of the stack are ready for the massively scaled deployments that we will support. In truth, there will be fewer problems here than you might expect. Our ETL product, SQL Server Integration Services, has already set a world record for ETL. See: <a href="http://tinyurl.com/5olqc6">http://tinyurl.com/5olqc6</a>. Our Reporting Services in 2008 handles huge data volumes very effectively.<br />
Nevertheless, it is interesting to me that few bloggers picked up on these issues at all: there was a very narrow view of the DatAllegro solution itself, rather than a broader consideration of how this would fit into the wider infrastructure of DW, reporting, performance management etc. This is pretty typical of the way in which we discuss the appliance and accelerator market as an industry: we tend to look only at the implications of massive scale for the database, without considering how the data is to be consumed practically and efficiently.<br />
Those of you who follow the various BI blogs will of course be aware that there is actually a third issue that popped up around the DatAllegro acquisition - a legal action about some IP. Now that is something I surely cannot discuss - and I know nothing about it anyway. I'll say only one thing: my first reaction was simply "Here we go again." You may be surprised at how often this sort of thing happens, at MS and any other company with deep pockets. It's a bore, but it keeps lawyers busy and well-paid. (Which I like, because one of my attorney friends throws the best barbeques and wine-tastings.)<br />
<strong><br />
Zoomix.</strong><br />
I love Zoomix. They are a great wee Israeli company who addressed the problems of data quality in a new way, and did an awesome job of it. They were deservedly a Gartner cool vendor - if they were any cooler they could have solved global warming. I love the self-learning capabilities and they have smart, smart people on board. I first proposed the Zoomix acquisition, so I'm doubly pleased to see it happen. I moved teams after making that first proposal, and my old Microsoft team in Integration Services completed the acquisition. Great to see that happen, and great work by them to see the acquisition through - and I'm looking forward to being able to work more and more with a great data quality stack from Microsoft. And for sure I'll be writing a lot more about this technology in the future.</p>

<p><strong>SQL Server 2008.</strong><br />
We shipped! In fact, we have our ship party this Friday. It's always a great feeling. Listen to my podcast for details of some of the great features: great end-user reporting, data profiling, best practices alerts for OLAP design etc. <br />
The question I have had in my mailbox repeatedly about the release is: didn't you guys launch in February? Well, yes we did. That was the marketing launch of the product along with the other big releases of 2008: Windows Server 2008 and Visual Studio. It made a lot of sense to roll all these big releases into a single (and very successful) worldwide roadshow. I am not in marketing, but I did appreciate the three-in-one launch when I was presenting on the roadshow. It made a big juicy story and gave an opportunity to tell a very integrated and compelling message to IT, developers and BI alike.</p>

<p><strong>Bill Baker</strong><br />
Today is Bill Baker's last official day at Microsoft. Those of you who know me, or who have read my blog, know how much I admire Bill.  I really owe a great deal to him, as do many others in Microsoft. He'll be sorely missed for sure, although he does leave us in a good state to carry on what he started when he first came to MS.<br />
There have been many good things said about Bill over the weeks since he announced his departure. Aside from the personal friendship and mentoring, I would like to mention just two things that I think Bill did that made a huge difference.<br />
Firstly, Bill really understands the BI industry - the customers, the vendors, the consultants and, of course, the technology. He is passionate about Business Intelligence, in a completely non-partisan manner. Any customer who speaks with him comes away fired up with enthusiasm for how they can transform their business with this technology. Bill has a real feeling for this transformative power and he can talk with anyone from accountants to zookeepers about specific, actionable and achievable steps they can take to improve their decision-making.<br />
Secondly, Bill had a real knack for building a community of users. At any major Microsoft conference, the Business Intelligence community is present and has a genuine sense of shared purpose, and we enjoy a lot of fun together. When you look at the growth of Microsoft BI over the years, one of the most significant factors is how much of that growth was driven by customers who are new to BI. These new customers learned BI on MS technologies and built a strong and common experience together. Fortunately, one thing at which Bill also excelled was "making others great." The result is that the MS BI community will be largely self-sustaining and I expect it to be sparky and vibrant for a long time to come. <br />
So, thanks Bill, for all you have done here. I'm really looking forward to what you do next at Visible Technologies  - they are now, more than ever, a name to watch.</p>

<p>Finally, if you are interested, you can follow my current adventures on <a href="http://twitter.com/donalddotfarmer">Twitter</a>. I am enjoying micro-blogging, and for sure I'll be doing more macro-blogging too!<br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2008/09/catching_up_on.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2008/09/catching_up_on.php</guid>
<category></category>
<pubDate>Tue, 02 Sep 2008 13:15:00 -0700</pubDate>
</item>

<item>
<title>I&apos;m biased. And so are you.</title>
<description><![CDATA[<p>Earlier this year, I changed teams and moved offices within Microsoft. This interrupted a little habit I had developed: pinning up my “Cognitive Bias of the Week” outside my office.</p>

<p>Cognitive biases are somewhat like optical illusions, but they affect our thinking rather than our vision. A well known example is confirmation bias; we tend to give more weight to positive observations that confirm our beliefs rather than negative observations. Fortune-tellers may appear successful when people remember one or two correct predictions more readily than the many that were off the mark.</p>

<p>Of course, you wouldn’t make such an error, would you? Think again. Like an optical illusion, many biases are extremely difficult to shake even when you are aware of the effect. In fact, some biases are most effective when we try to think most logically.</p>

<p>I believe it’s important for those of us in the BI world to understand these biases. We represent data and analytic conclusions in highly persuasive ways. We help our customers to get it right or to get it wrong - and at times our influence may be inadvertently malign. With that in mind, I’m going to translate my “Cognitive Bias of the Week” posters to occasional blog posts on particular biases. I hope you’ll find these interesting, and relevant. Let me know.</p>

<p>Here’s one to start with. It’s about risk, and it has some revealing insights into how we consider the impact of risk in our decisions. It’s often called “The Pseudocertainty Effect” and it was first examined by <a href="http://www.cs.umu.se/kurser/TDBC12/HT99/Tversky.html">Tversky and Kahneman</a>. </p>

<p>Imagine that the US is at risk from a new disease spreading from Asia. Without treatment, it will kill 600 people, but we have two treatments to choose from.  <br />
  • With Program A, 200 people will certainly live. <br />
  • With Program B there is a 1/3 probability that all 600 people will live. However, there is also a 2/3 probability that they will all die.</p>

<p>Program A is positive – you’re certainly going to save some people. Program B potentially has a better outcome, but it is way less than certain. What treatment program do you recommend?<br />
In the original study, 72% recommended Program A, and only 28% preferred Program B. </p>

<p>Let’s flip the problem round. <br />
  • With Program A, 400 people will certainly die.<br />
  • With Program B there is a 1/3 probability that no-one will die. However, there is also a 2/3 probability that all 600 people will die.</p>

<p>Now, Program A is negative: 400 people will certainly die. Program B is still uncertain: there is a risk it will all go wrong. However, if you do nothing 600 will die anyway, and if you follow Program A, 400 will certainly die. With Program B you have a chance of saving everyone. In the original study, when presented in this way to a different sample, 78% chose Program B. </p>

<p>That’s pretty remarkable. Exactly the same choices, presented in a different way, led to a complete inversion of preferences.</p>
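<p>For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It is my own illustration rather than anything from the original study: the names <em>expected_survivors</em>, <em>saved_frame</em> and <em>deaths_frame</em> are hypothetical, and the only inputs are the figures quoted above. It restates the "deaths" framing as survivors, then compares the two framings.</p>

```python
from fractions import Fraction

AT_RISK = 600  # people at risk in the scenario above

def expected_survivors(outcomes):
    """Expected survivors for a list of (probability, survivors) pairs."""
    return sum(p * survivors for p, survivors in outcomes)

# "Lives saved" framing: outcomes stated directly as survivors.
saved_frame = {
    "A": [(Fraction(1), 200)],
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
}

# "Deaths" framing: outcomes stated as deaths, converted to survivors.
deaths_frame = {
    "A": [(Fraction(1), AT_RISK - 400)],
    "B": [(Fraction(1, 3), AT_RISK - 0), (Fraction(2, 3), AT_RISK - 600)],
}

for name in ("A", "B"):
    print(name,
          expected_survivors(saved_frame[name]),
          expected_survivors(deaths_frame[name]))
```

<p>Both framings describe identical outcomes, and both programs come to an expected 200 survivors. What separates them is certainty versus a gamble, and the choice of words.</p>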

<p>From this example, you can perhaps see why I consider cognitive biases to be an important study for BI analysts and developers. We may think of ourselves, or our users, as super-rational objective analysts of complex data; but in reality we are subject to these same biases. Also, we will tend to fall back on these biases, shortcuts and heuristics when we are making decisions under stress. </p>

<p>As BI becomes ever more pervasive, emergency planners probably would use our tools and techniques to handle an epidemic. But we could also be discussing customer churn rather than a deadly disease. The specific KPIs we choose, the manner in which we present them – the ways in which they influence decisions may be subtle, but the impact can be dramatic.</p>

<p>I’ll try to keep up a regular posting of biases, with examples relevant to the BI world. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/im_biased_and_s.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/im_biased_and_s.php</guid>
<category></category>
<pubDate>Mon, 27 Aug 2007 11:32:58 -0700</pubDate>
</item>

<item>
<title>Data visualization - in a music video</title>
<description><![CDATA[<p>Not quite BI, but how often do I get the chance to post a link to a <a href="http://www.youtube.com/watch?v=KHEIvF1U4PM">data visualization music video?</a></p>

<p>If you think you recognize the music, you're probably right. It's playing in the background of the Geico caveman advert when he's on the moving walkway in the airport.</p>

<p>My colleague Olivier Matrat points out that the video production is by a French design firm H5 who also made <a href="http://www.youtube.com/watch?v=E3B__ovj2jU">this </a>excellent visualization for a nuclear services company.</p>

<p>Enjoy.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/data_visualizat.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/data_visualizat.php</guid>
<category></category>
<pubDate>Sun, 12 Aug 2007 11:22:53 -0700</pubDate>
</item>

<item>
<title>The world is flat - or at least its files are.</title>
<description><![CDATA[<p>A couple of weeks ago, <a href="http://www.strategic-pr.com/">Scott Humphries</a> held his annual Pacific Northwest BI Summit in Oregon. It is a private event, small but highly valued, and organized impeccably by Scott. The Summit is a real pleasure, with all the ingredients of a memorable symposium - fascinating company, beautiful surroundings, and a wonderful host. However, the Summit is much more than just a good time: it is an opportunity to have conversations and to exchange insights across a very broad spectrum of the BI business, not only with deeply knowledgeable friends, but also with colleagues from companies outside our usual circle of partners.</p>

<p>This year we formally covered four topics - RFID intelligence, software as a service, IT and business alignment, and data warehouse appliances. Informally, the subjects were even more diverse. Coming away from the weekend, I always find that some insights have been new and surprising; some have simply, but valuably, confirmed what I have already been hearing from partners and customers; and some give an interesting new tingle to vaguely defined feelings I have had about the BI Industry and its practices.</p>

<p>Here is just one example. We were discussing Software as a Service, and someone observed that, in their SaaS world, many clients still exchanged data with the service in the form of encrypted flat files, exchanged over secure http. These customers were unwilling, for security, to open a port in their datacenter to exchange data with the service provider. There was much head-nodding and recognition around the table. For me especially, having spent five years specifically working on data integration technologies, I was all too aware that flat files are pervasive. </p>

<p>Nevertheless, one thinks of software as a service as being on the leading edge of innovation, and it was a little surprising to discover that good old flat files are still to be found there - and not only as lingering artifacts of an earlier age, but as a positive choice for otherwise early-adopting customers. It is rather like visiting the restroom in a high-tech Japanese building, and finding a squat toilet – elegant and efficient, but somehow something one expected to be phased out.</p>

<p>I love flat files. You have to marvel at the sheer ingenuity - sometimes inspired, sometimes perverse - with which data architects have been able to overload the meanings of delimiters, work around embedded characters, pad fields, compress fields, normalize, denormalize, you name it. And it’s not only what people have been able to do with the 2**7 characters of ASCII – We had great fun working out how efficiently to parse (and help users to define) fixed width columns in multi-byte character sets. Great stuff!</p>

<p>I have a friend in Canada who, in his retirement, carefully watches the Canadian markets. For this he uses Microsoft's <a href="http://moneycentral.msn.com/">MoneyCentral</a> website. Now, as it happens, several of the exchanges who provide data to MoneyCentral use a simple form of compression for their streaming ticker data: they leave out the decimal point from each quote. For a quote to two decimal places, this can account for between 14% and 25% compression. Every hour or so, the data provider sends a reminder of where the decimal place should be. However, very occasionally, the provider would neglect to send this reminder and my friend's stocks appeared to jump 10000% in value. At his age, this kind of excitement could be too much for him. <br />
Bud's method for dealing with this scenario was simple enough - he emailed me whenever this happened. After all, I work at Microsoft, so surely I can tell those guys at MoneyCentral to sort it out. Naturally, the team spots these problems pretty quickly anyway and the figures would be adjusted within minutes. Nevertheless, Bud was convinced that I was so powerful within Microsoft that all I had to do was pick up the phone, and entire teams jumped into action to fix the problem just for him. (Today, I believe the problem is permanently solved. I certainly haven't had that panic email from Bud in a while.)</p>
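<p>To make the arithmetic concrete, here is a rough sketch in Python. It is purely illustrative: I do not know the real feed format, so the function names (<em>strip_point</em>, <em>saving</em>, <em>restore</em>) and the sample quotes are my own assumptions. It shows the saving from dropping the decimal point, and what happens when the reminder about the implied decimal places goes missing.</p>

```python
from decimal import Decimal

def strip_point(quote):
    """Drop the decimal point, as in the ticker feed described above."""
    return quote.replace(".", "")

def saving(quote):
    """Fraction of characters saved by dropping the point."""
    return 1 - len(strip_point(quote)) / len(quote)

def restore(digits, places):
    """Put the point back, given the number of implied decimal places."""
    return Decimal(digits).scaleb(-places)

# Hypothetical quotes, each to two decimal places.
for quote in ("1.23", "12.34", "1234.56"):
    print(quote, f"{saving(quote):.0%}")

wire = strip_point("12.34")   # the digits that actually travel
print(restore(wire, 2))       # reminder received
print(restore(wire, 0))       # reminder missed: no implied places
```

<p>The shortest two-decimal quote saves a quarter of its characters and a seven-character quote about a seventh, which is where the range above comes from. Miss the reminder and every price reads a hundred times too high.</p>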

<p>When I reflect on it, it is natural that flat files still have a role to play in our new world of software as a service. They are, like the squatting toilet, simple and efficient. They do, perhaps, involve some maneuvers to which we, in our technological comforts, have grown unused. (My wife and I concluded that the wonderfully supple and elegant old ladies and men performing Tai Chi in parks of an early morning in Hangzhou were actually practising for what my own grandmother would call their "necessary visits.")</p>

<p>Technologies move more slowly in the real world than they do in the high-energy environment of innovators and start-ups. I have no problem with that. If for some folks the world is still flat, it is a good thing that those of us eager to rush forward with all that is new, still have to accommodate them.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/the_world_is_fl.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/08/the_world_is_fl.php</guid>
<category></category>
<pubDate>Sat, 11 Aug 2007 12:13:59 -0700</pubDate>
</item>

<item>
<title>Chinese surnames</title>
<description><![CDATA[<p>In January, I posted about <a href="http://www.beyeblogs.com/donaldfarmer/archive/2007/01/jills_surname_m.php">the limited range of surnames </a>in my home community in Scotland - and the problems that can cause for data quality. If it's a problem on a Hebridean island, think of how difficult it must be in China, where there is also a limited range of surnames. 85 percent of the Chinese population share 100 surnames! </p>

<p>The Chinese authorities are now waking up to this problem and have introduced <a href="http://www.chinadaily.com.cn/china/2007-06/12/content_891902.htm">a new protocol </a>whereby people can register a composite surname comprising both the father's and mother's name. The hope is that this would create up to 1.3 million new surnames - although the real number is more likely to be much lower: around 10,000. Still an improvement.</p>

<p>I guess these would be rather like the double-barreled names so enjoyed by the British aristocracy. These were used when property or titles were inherited through the female line: the double name signified the new male line and the endowed female line. Think of the first British prime minister: Campbell-Bannerman, where the dominant Campbell family carried the weight of history, wealth and titles in his lineage.</p>

<p>Or perhaps these new composite Chinese names would be more akin to the composite names used by ladies in the US - Hillary Rodham Clinton being an obvious current example. Either way, it's an interesting solution to an increasingly difficult problem. </p>

<p>In Thailand they tackled the Chinese name problem <a href="http://www.apmforum.com/columns/thai4.htm">quite differently</a>. They just insisted that Chinese immigrants registered themselves with unique surnames. In order to ensure uniqueness, more and more suffixes and prefixes had to be added to existing names. The result was extremely long names, which apparently the Chinese immigrants quite enjoyed because they echoed the extremely long names of the Thai nobility. The idea of requiring your name to be a unique identifier appeals to my datahead, if not to my sense of individuality.</p>

<p>Enjoy the links. <br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/06/in_january_i_po.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/06/in_january_i_po.php</guid>
<category></category>
<pubDate>Tue, 12 Jun 2007 16:45:23 -0700</pubDate>
</item>

<item>
<title>Stratature and the Microsoft platform</title>
<description><![CDATA[<p>It has been some time since I last blogged. Just too much work, with some major conferences thrown in, and not enough time to compose some thoughts. Nevertheless, I cannot let last week’s news pass – that Microsoft has acquired <a href="http://www.stratature.com/">Stratature</a>, a remarkably agile vendor in the MDM space.</p>

<p>I have had many mails and calls from folks wanting to know what it all means. I can understand that – we spend a lot of time in our industry poring over headlines and quotes like the cold-war Kremlin watchers. Is comrade X standing next to general Y? Then maybe the tension between their departments is over. And is commissar Z missing from the parade? Similarly, I know many people will be poring over the details of this announcement looking for hints about some grand strategy. </p>

<p>It is really much simpler than that. Like many readers of the b-eye network, a telling number of our customers are asking about MDM, CDI and PIM solutions. In the past, as <a href="http://www.b-eye-network.com/blogs/dyche/archives/2007/06/microsoft_jumps.php">Jill Dyche</a> points out in her blog, we have demonstrated some appealing capabilities using existing components of the very comprehensive Microsoft stack. Yet we have not had a product directly and solely aimed at customers looking for MDM. As I often say, we do not have a product with “MDM” stamped on the label.</p>

<p>This acquisition, then, does mark a new step. We will, in the future, have a product focused specifically on the MDM market: not just rolling together various pieces of platform technology but introducing new and unique capabilities for MDM. Stratature is an awesome acquisition for that goal. </p>

<p>On the other hand, the new story is not so <em>very </em>different from our consistent approach to operational and analytic data. We are continuing to build a comprehensive BI and operational platform, now including MDM, built with the Office Business platform and the SQL Server data platform. (We have more platforms than the Jackson Five, and a good thing too.) In this continuing evolution, Stratature is an outstanding acquisition as the technology already dovetails neatly into this framework.</p>

<p>So, as we progress, expect to see some exceptionally usable and effective capabilities emerge from the Stratature acquisition within the Office Business platform – look for the fastest time to the best value in the industry. In parallel, look for the SQL Server platform to grow as the best data platform for operational, analytic and, increasingly, master data.</p>

<p>It’s going to be a stimulating time for Microsoft, our customers, and everyone else with an interest in the MDM space.</p>

<p>One last note. These acquisition announcements rarely capture the full story of how the deal was done.  I’m not going to spill any beans, but I really must congratulate my friend and colleague <a href="http://sqlblog.com/blogs/knightreign/">Kirk Haselden</a> on the tenacity, commitment and dexterity he has shown in this acquisition. Kirk and I had many discussions on this topic: at times tense, (oops, was that a bean?) but ultimately friendly, fully supportive and, as ever, totally focussed on the customer value. On a purely personal note, it’s great to see him shepherd this exciting technology into the Microsoft fold. Great work, Kirk. It’s going to be a pleasure seeing the ripples this will cause!</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/06/stratature_and.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/06/stratature_and.php</guid>
<category></category>
<pubDate>Sun, 10 Jun 2007 19:22:01 -0700</pubDate>
</item>

<item>
<title>The true art of presenting data</title>
<description><![CDATA[<p>Last week I was either brave, foolish or egotistical enough to share some of my working ideas on presenting data and data solutions. In truth, I expect all three attributes played their part.</p>

<p>This week, however, you should see how a true master of the art performs. Having seen this video of <a href="http://www.youtube.com/watch?v=wUiGGzym_uQ">Demetri Martin</a> you may never create a graph with a straight face again.</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/05/the_true_art_of.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/05/the_true_art_of.php</guid>
<category></category>
<pubDate>Sun, 06 May 2007 16:48:33 -0700</pubDate>
</item>

<item>
<title>Presentation Skills for Business Intelligence - Nine Points of Roguery.</title>
<description><![CDATA[<p>A long post today; I hope it is interesting. </p>

<p>It’s funny how an idea can be dormant for ages, then suddenly crop up everywhere again. I used to have a simple method for structuring presentations – specifically, where I had to present results of an analysis, and often a related proposal. Most of us in the BI world do this regularly. I had not shared the technique much, but in recent weeks I have found myself describing it in detail several times, sitting down with hassled analysts and helping them pull together summary presentations.</p>

<p>The method is simply an outline that you can use to structure your presentation for best impact. I used to call it <em>The Nine Points of Roguery</em> – there is an old <a href="http://lutheran-hymnal.com/celtic/rj73.mid">fiddle tune </a>of that name – but please don't think I am suggesting that you should be roguish with your clients. Still, the method <em>does </em>describe nine points, as follows:<br />
<blockquote>•	Make 3 points that your audience will already understand<br />
•	Enhance and extend these three points<br />
•	Introduce three new findings from your work</blockquote></p>

<p>Easy! I’m going to use an example to illustrate some of the ideas. Imagine that I have been tasked with examining customer data quality for a client and coming up with some suggestions for improvement. Here goes …</p>

<p><u><strong>Make three points your audience already understands </strong></u><br />
You will connect best with your audience when you share common ground. By speaking briefly to a few familiar points, you show understanding of their needs. You can even make it clear that you know that they know. <em>Of course, with your business experience, you understand this even better than I.</em> Do not overdo it – flattery will get you nowhere – but it is good if your audience feels you address them as equals. You are all smart people, tackling a non-trivial issue.<br />
What three points should you make? Naturally, the details depend on context, but do choose engaging, substantive topics. Get to the core of your audience’s problems. If you need more structure, try the following:</p>

<p><strong>Strategic impact</strong>. How does the current topic affect your audience’s long-term goals? How could a successful project help? What would failure look like? <br />
<em><strong>Example</strong>: Direct marketing is a critical component of your client’s customer acquisition strategy. Poor data quality wastes money by inappropriately marketing to the wrong customers. It also risks alienating the public and damaging the company’s reputation.</em></p>

<p><strong>A tactical concern.</strong> Do not spend too long on strategy: you will be aiming too high. What immediate concerns face your listeners? What decisions will they make today or tomorrow? Choose a tactical problem that concerns them directly.<br />
<em><strong>Example:</strong> From mergers and acquisitions, your client has multiple customer data sources. There is an immediate need for a single version of a customer across the enterprise.</em></p>

<p><strong>An obstacle. </strong>Why is the current issue not easy? Get into detail: is there a financial, technical or human barrier to success? Your listeners understand that difficulties exist. Still, you are reassuring them, in effect, that it is not stupid to be in their situation.<br />
<em><strong>Example</strong>:  Their most important source system is effectively legacy software. It has been used for many years, but is not compatible with more modern CRM or data quality applications.</em></p>

<p><br />
<u><strong>Enhance your three points</strong></u><br />
You and your audience now have a baseline of shared understanding. Next, you should show that you have explored their issues further. It can be tempting to pull a rabbit from your hat, dazzling your audience with some revelation that resolves their problems at one stroke. In fact, most often you will not have such an eye-opener. Even if you do, my advice is to wait. In all cases, you must build authority first. Your presentation is not the Sermon on the Mount. You cannot simply announce “Ye have read … but I say unto you …” unless your authority is unquestionable.<br />
So, develop your themes. When you present new findings later, the audience will appreciate your knowledge and experience. You can build this influence in several ways. Indeed, using a variety of techniques will be more appealing.</p>

<p><strong>Extend. </strong>Expand one of your original topics by considering how the matter changes with time, geography, scale or some other dimension.  Was this problem easier in the past? Why? Does the passage of time have an effect, making things better, worse, smaller, or larger? Could the impact of this concern vary with geography? Perhaps the US division suffers more than the European division. You get the idea. You are building authority by going beyond the obvious.<br />
<em><strong>Example</strong>: Cleaning your client’s customer data is not a one-off action. Accurate operational data may be critical, but so is the ability to analyze customer behavior over time. Because customer data changes constantly, the client needs good quality historical data too.</em></p>

<p><strong>Contradict</strong>. I’m contrary by nature, so I like this one. However, regardless of my own predilections, finding contradictions is an excellent way in which to expand a topic. Few issues that you cover will be simply positive or negative. Your task here is to find the silver lining in the cloud, or vice versa. The underlying message is, naturally, that not only is the subject not simple, but also that your understanding of it is not simplistic.<br />
<em><strong>Example:</strong>  Creating a single version of your customer data from your client’s various mergers and acquisitions is a great vision. However, that single version will be an even more valuable asset than before. As such it may require additional administration, greater security, high availability and disaster recovery planning.</em></p>

<p><strong>Personalize.</strong> Your clients are human. (If not, mail me: I would love to know more.) People relate most directly to the needs and experiences of other people. So, in every presentation, be sure to expand at least one topic to cover personal impacts. How does this concern affect the daily work of the manager, the DBA, the salesperson? Use named individuals if you like, but at least ensure that your presentation is not abstract. It should be rooted in the effects on real people of the problems you are covering.<br />
<em><strong>Example:</strong>  It is increasingly difficult to find staff skilled in the company’s legacy applications. There remains an administrator, Julie, and one developer, Bob. Julie spends too much time preparing dumps of text files for integration with other applications. Bob is stretched developing new reports to keep up with changing compliance requirements. </em></p>

<p><br />
<u><strong>Introduce three new findings from your work</strong></u><br />
By now you have demonstrated an understanding of your audience’s needs. Further, you have shown experience and authority. It is now time for new results and recommendations. The structure of this section will, again, depend on the specific context. However, if you struggle to get that right, I would suggest that you invert one of the patterns we used earlier. Start with an insight or recommendation at a personal level, and then show new tactical and strategic ideas.</p>

<p><strong>Personal insight.</strong> Do your recommendations or discoveries directly affect individuals, whether employees or customers? If so, be prepared to talk to that very directly. Do not cover every impact: just choose one as an example. A well-chosen example can establish an authentic connection with the audience.<br />
<em><strong>Example: </strong> Everyone in your audience has received junk mail. Many will have received duplicate mailings from one company. From your research, you can show that missing out a good target may be less costly than exasperating a good target. So, you recommend not only consolidating and cleaning customer data, but also aggressively purging duplicates. By setting the proposal in a personal context, to which the audience can relate, you can make this case effectively. </em></p>

<p><strong>Tactical recommendation.  </strong>This should be the pivotal moment. It is when you make an actionable and material recommendation. You may have many tactical points – specific steps your client can take to achieve their strategic goals. Should you not present them all? I would suggest not: you risk overwhelming your audience. Better to choose the most impactful and representative tactic and speak to it well. Your proposal should relate to one of the issues you have raised earlier. This is also a good time to address ROI and costs associated with the problem and solution. Typically, it is easier to evaluate ROI for a tactical recommendation rather than an entire strategy. It may also be more credible to your audience.<br />
<em><strong>Example:</strong> You recommend migrating the legacy system to a new line-of-business or CRM application. Naturally, there are many sub-recommendations to be found in the report. However, overall costs can be estimated here, and supported with SWOT, cost-benefit or gap analyses.</em></p>

<p><strong>Strategic insight and observation</strong>. Now you can close the loop, referring back to your very first point. You have established common ground with your audience and demonstrated that you understand their strategic, tactical, even personal, concerns. You have specific recommendations based on your analyses. Now, you should show that your suggestions are not only tactical, but that they can have strategic impact too. Relate your point directly to the corporate strategies of your client. If your audience does not primarily comprise strategic decision makers, you can still make this point: just do not dwell on it for too long and be sure to relate any suggestion to their own work.<br />
<em><strong>Example:</strong> Direct marketing is still critical to your client’s customer acquisition strategy. With improved customer data quality you can significantly move beyond that approach. You can use your customer data to grow stronger customer relationships. Perhaps now, with a single version of the customer to hand, an effective loyalty scheme is practical across all the divisions of the enterprise, which previously poor data quality prevented. </em></p>

<p><br />
And that’s the outline. Nine simple points which help you balance the client’s current understanding with your new insights. If you try it out, do let me know how it works for you.<br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/04/presentation_sk.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/04/presentation_sk.php</guid>
<category></category>
<pubDate>Mon, 30 Apr 2007 15:19:38 -0700</pubDate>
</item>

<item>
<title>Retailer found guilty of OLAP</title>
<description><![CDATA[<p>"It's the most flagrant case of aggregation I have ever seen," said the prosecutor.</p>

<p>Ok, I'm kidding. Yet today I did find a headline in the <a href="http://charlotte.com/123/story/85822.html">Charlotte Observer</a>: <strong>"Lenders accused of data mining</strong>." In this case, the financiers in question were illegitimately searching a database of student borrowers. There is no doubt that the public have valid concerns over potential misuse of data, but it is awkward (for those of us who have used the term in a rather more limited way) to see the good name of a useful technology tainted in the process.<br />
This new usage - data mining as database search – is easy to see in a <a href="http://feingold.senate.gov/~feingold/releases/03/01/2003116745.html">press release </a>from Senator Russ Feingold. Data mining, he says, “is a broad search of public and non-public databases in the absence of a particularized suspicion about a person, place or thing.”<br />
Most vendors who, until recently, described their technology as <em>data mining </em>now talk about <em>predictive analytics</em>.  It is an attractive phrase for vendors and commentators, having a technical ring to it, without being intimidating. Currently I use this idiom myself, much more than data mining. Unfortunately, the term is not entirely accurate. Many uses of data mining, predictive analysis or knowledge discovery (an even rarer term these days) are primarily descriptive, to enable business analysts to understand their data better, without querying the model for predictions.<br />
As it happens, while I may regret the inconvenience that a useful term has drifted from my own usage, I see no reason to complain. I have no time for those who talk about the “real” meaning of words. The current meaning of a word or phrase is determined by its usage and I am not going to fight that. Between friends, I may continue to have a gay old time chatting about data mining; but in public, I need to be aware that the meaning has moved on.<br />
However, I do have to wonder what phrase the press will next appropriate to capture the public’s finely nuanced paranoia. I could take a guess. Senator Feingold points out that data mining in his sense requires “a combination of intelligence data and personal information, including an individual's traffic violations, credit card purchases, travel records, medical records, communications records, and virtually any information collected on commercial, public or private governmental databases.” I think we may have to start looking around for an alternative to CDI …<br />
</p>]]></description>
<link>http://www.beyeblogs.com/donaldfarmer/archive/2007/04/retailer_found.php</link>
<guid>http://www.beyeblogs.com/donaldfarmer/archive/2007/04/retailer_found.php</guid>
<category></category>
<pubDate>Mon, 16 Apr 2007 14:24:48 -0700</pubDate>
</item>


</channel>
</rss>