
December 30, 2009

Whose data is it?

I've had some negative experiences recently on the topic of data ownership, and how various team cultures respond to the concept of data integration. Read the posts on Sharpening Stones.


Posted by Paul Boal at 12:45 AM | Comments (0)

December 15, 2009

Fun with Recursive SQL (Part 3)

See my entry on Sharpening Stones for an impressive way to use recursive SQL to split overlapping time segments and flatten them into a single timeline. (See the article for some pictures of what that means.)


Posted by Paul Boal at 7:45 AM | Comments (0)

December 12, 2009

Fun with Recursive SQL (Part 2)

See my entry on Sharpening Stones for another fun way to use recursive SQL.


Posted by Paul Boal at 8:30 AM | Comments (2)

December 11, 2009

Fun with Recursive SQL (Part 1)

See my entry on Sharpening Stones for some fun ways to use recursive SQL to do more than just traverse a product or organizational hierarchy.


Posted by Paul Boal at 11:45 AM | Comments (0)

December 10, 2009

assert(datawarehouse.data.is_correct())

If a man begins with certainties, he shall end in doubts;
But if he will be content to begin with doubts,
He shall end in certainties.
[Francis Bacon 1561-1626]

When I was learning to program in C and studying algorithms, the assert() macro was one of my favorite debugging tools. Assert can be used to check, during the execution of some procedure, that nothing has gone wrong that could send your program into left field. For instance, a balanced binary search tree should never be more than log2(n) levels deep (or something similar to that based on the exact insertion algorithm), where n is the number of items in the tree. After a new item is inserted in the tree, you can assert(tree.depth() <= log2(tree.count())). If that assertion fails, then you know the tree isn't staying balanced and the search performance guaranteed by a balanced tree no longer holds.
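As a rough illustration of that idea, here is a minimal sketch in Python rather than C. The names `Node`, `insert`, `build_balanced`, `depth`, `count`, and `check_balanced` are all made up for this example, and the bound used, floor(log2(n)) + 1 levels, is the one that holds for a tree built by repeatedly splitting a sorted list at its middle; a different balancing algorithm would need a different bound.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insert with no rebalancing (used only to show a failure)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build_balanced(sorted_keys):
    """Build a balanced BST by splitting a sorted list at its middle."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    node = Node(sorted_keys[mid])
    node.left = build_balanced(sorted_keys[:mid])
    node.right = build_balanced(sorted_keys[mid + 1:])
    return node


def depth(node):
    """Number of levels in the tree (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def count(node):
    """Number of items in the tree."""
    if node is None:
        return 0
    return 1 + count(node.left) + count(node.right)


def check_balanced(root):
    """Assert that the tree is no deeper than floor(log2(n)) + 1 levels."""
    n = count(root)
    # For n >= 1, n.bit_length() equals floor(log2(n)) + 1 without
    # any floating point rounding; for n == 0 it is 0.
    max_levels = n.bit_length()
    assert depth(root) <= max_levels, (
        f"tree has {depth(root)} levels for {n} items, expected at most {max_levels}"
    )
```

Calling `check_balanced(build_balanced(list(range(100))))` passes quietly, while a tree grown by calling `insert` with the keys 0 through 9 in order degenerates into a chain and makes the assertion fail.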

If that's too much computer science for you, hold on and see where this is going. There's relevance to this idea beyond low-level programming and computer science theory.

I've been in many conversations with data warehouse sponsors that focused on the question of "how are you sure that the data in the warehouse loads correctly every night?" One of the better ways I've found to approach this kind of data integrity assurance is to think about what kinds of assertions can be found throughout the batch ETL processes that I create.

For this example, consider a fairly traditional ETL process that runs in the following steps:

1. Copy or extract raw data from source system
2. Detect changes from last pull
3. Look up surrogate keys and other translations
4. Apply deletes (as soft deletes, by setting exp_date = current_date())
5. Apply inserts
6. Apply updates
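
To make the idea concrete, here is a small sketch of where assertions could sit in a load like this one. It is only an illustration under stated assumptions, not the checks from the original article: the in-memory dictionaries stand in for the source and warehouse tables, the function names (`detect_changes`, `apply_changes`, `run_load`) and the row layout (`value`, `exp_date`) are invented for the example, and step 3 (surrogate key lookup) is left out to keep it short.

```python
from datetime import date


def detect_changes(previous, staged):
    """Step 2: classify staged rows against the last pull (both keyed by natural key)."""
    inserts = {k: v for k, v in staged.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in staged}
    updates = {k: v for k, v in staged.items()
               if k in previous and previous[k] != v}
    unchanged = {k for k, v in staged.items()
                 if k in previous and previous[k] == v}
    # Every staged row must be classified exactly once.
    assert len(inserts) + len(updates) + len(unchanged) == len(staged)
    # Every previously active row must be accounted for.
    assert len(deletes) + len(updates) + len(unchanged) == len(previous)
    return inserts, updates, deletes


def apply_changes(target, inserts, updates, deletes, today):
    """Steps 4-6: soft-delete, insert, and update rows in the target."""
    active_before = sum(1 for r in target.values() if r["exp_date"] is None)
    for k in deletes:
        target[k]["exp_date"] = today  # soft delete
    for k, v in inserts.items():
        target[k] = {"value": v, "exp_date": None}
    for k, v in updates.items():
        target[k]["value"] = v
    active_after = sum(1 for r in target.values() if r["exp_date"] is None)
    # The number of active rows can only move by the deletes and inserts.
    assert active_after == active_before - len(deletes) + len(inserts)


def run_load(source_rows, target, today):
    """Run one batch; source_rows is a list of (natural_key, value) pairs."""
    # Step 1: copy the raw data from the source.
    staged = dict(source_rows)
    assert len(staged) == len(source_rows), "duplicate natural keys in source"
    previous = {k: r["value"] for k, r in target.items()
                if r["exp_date"] is None}
    inserts, updates, deletes = detect_changes(previous, staged)
    apply_changes(target, inserts, updates, deletes, today)
    # After the load, the active rows should mirror the source exactly.
    active = {k: r["value"] for k, r in target.items()
              if r["exp_date"] is None}
    assert active == staged
    return target
```

For example, loading `[("a", 1), ("b", 2)]` into an empty `target` and then loading `[("b", 3), ("c", 4)]` the next day leaves `a` soft-deleted with that day's `exp_date`, `b` updated, and `c` inserted, with every assertion passing along the way.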

For the rest of this post, see the original at Sharpening Stones


Posted by Paul Boal at 12:30 PM | Comments (0)

December 8, 2009

Data Quality - A Family Affair

Grandma's lesson about taking responsibility for data quality.


When I was a young child, we spent every Thanksgiving with my paternal grandparents in Denver. There are two particularly memorable things about those visits. First, even into the late 1980s, my grandparents didn't own their own telephone. They rented their phone from the telephone company. It was the same rotary dial phone they'd had for years, hanging in their kitchen, with an extra-long handset cord attached so they could stretch across the dining room or kitchen while still talking on the phone. Second was the important lesson that I learned about doing dishes by hand.

Doing dishes by hand is ideally a three-person job: one to wash, one to rinse, and one to dry. The lesson that my grandmother taught me about washing dishes was that the drier is the person accountable for making sure the dishes are clean when they go back into the cupboard.

As data warehousing professionals, we spend a fair amount of time and energy arguing that data quality is something that has to be fixed upstream, by applications. My grandmother would insist that sending the dishes back to the washer is not our only option.

If a dish comes to the drier not quite clean, there are three options:

Ideally the dishes come to us clean and ready to dry. It's a lot less work to dry off some steaming droplets of water and put a nice clean warm dish away in the cupboard than it is to notice that little bit of bread from the stuffing that didn't quite get cleaned and have to use the tip of your fingernail through a dish cloth to get the crumb off.

What are the downsides of sending the dish back through to be rewashed from the beginning:

Perhaps the same is true in terms of data quality. If a transaction moves from system to system and doesn't come out the other end quite exactly clean, because some of those business processes in the middle aren't quite exactly flawless, is it always the best choice to go back to the beginning to find just where things went wrong and correct them there?

I'm not suggesting that any application is allowed to be intentionally lazy about data quality, or should not correct issues that are identified. Rather, I'm suggesting that we make sure we all continue to see data quality as our responsibility and not merely blame upstream systems when there is something that could be done at various points in the chain to ensure quality information is used for decision making.

(Reposted from: Sharpening Stones)


Posted by Paul Boal at 1:15 AM | Comments (0)

December 2, 2009

The Agility of Touch-It / Take-It

A parable about the agility that "Touch It, Take It" adds to data warehousing, and the extra work that a misuse of "You Aren't Going to Need It" creates.

Once, there was a great chief called Yagni. Chief Yagni's village was very prosperous and had grown much during his rule. Eventually, Chief Yagni decided that it was time for the village to have a new gathering space, as the old one had been well outgrown. So, Chief Yagni recruited two strong men, Gwyn and Titi, to bring stones from the quarry to the village center so that a group of builders could stack them into the new gathering space.

Gwyn and Titi both arrived on Monday morning to receive their directions from Chief Yagni. Yagni reviewed the building plans and told Gwyn and Titi that he needed 20 large flat square stones for the base of the building. Gwyn and Titi took their push carts down to the quarry, gathered rocks, and returned to the village center. They emptied their rocks together in piles for Chief Yagni's review. Gwyn's pile had exactly 10 flat square stones. Titi's had 10 flat square stones and 3 smaller angled stones.


"Why are these here?" asked Yagni.


"I had to pick them up off of the flat stones in the quarry," replied Titi, "so I thought I would just bring them along in case there was a use for them."


"Get rid of them!" shouted Yagni angrily. "You've wasted time in gathering those worthless rocks I did not ask you to collect. We need 10 tall narrow stones for the doorway. Go back to the quarry and bring me those. Only those, Titi! When you see the other rocks, tell yourself that You Aren't Going to Need It."


Gwyn smiled at Titi's scolding, feeling proud that he'd followed the chief's directions so precisely. Titi believed that the angled stones might eventually come in handy. Gwyn and Titi began pushing their carts back to the quarry: Gwyn's light and empty, Titi's partly full of unwanted rocks.

Frustrated, and not wanting to push the angled rocks all the way back to the quarry, Titi dumped the extra rocks in a small pile just outside of Chief Yagni's sight.

Gwyn and Titi gathered the tall narrow stones the chief asked for. Titi, again, had to clear angled rocks from on top of the narrow stones, and added them to his cart. This time, Titi added the extra angled stones to his pile just outside of Chief Yagni's sight.

Gwyn and Titi returned to the chief with their carts full of only tall narrow stones, and the chief was pleased with them both. This continued for several more trips until the new meeting place was nearly complete, with Gwyn following directions exactly and Titi always bringing back more than asked. By this time, Titi had accumulated a large enough pile of angled stones to fill his entire cart.

On their last delivery to Chief Yagni, the chief looked over the plans again and stroked his chin in thought. "Gwyn, Titi," he said. "I need you to bring some angled stones for the roof. Like those that you brought back on the first trip, Titi. Go back to the quarry and bring me two carts full of those."

Gwyn and Titi hurried back toward the quarry. Gwyn went to the quarry and began collecting his cart full of angled rocks, but Titi had the large pile he had been accumulating throughout his other trips. He stopped just outside of the chief's sight, collected all of his angled rocks into his cart, and returned to the chief well before Gwyn had even loaded half of his rocks.

"Titi. How did you gather these rocks so quickly, when Gwyn hasn't returned yet?"

Titi explained to his chief that when he had to pick up a stone anyway, he decided that if he had to Touch It, then he should Take It.

The chief was pleased with Titi's foresight and promoted him to lead rock collector.

(Reposted from: Sharpening Stones)


Posted by Paul Boal at 9:45 PM | Comments (0)