May 5, 1997 (Vol. 19, Issue 18)
IS SURVIVAL GUIDE
BY BOB LEWIS
Pie charts and bar charts may bring comfort, but wisdom is another matter
"Statistics are used as a drunk uses lampposts -- for support, not illumination."
-- Anonymous, provided by A Word a Day
Evolutionary theory must account for all of the bizarre complexity of the
natural world: the tail feathers of peacocks; the mating rituals of praying
mantises; the popularity of Beavis and Butt-Head.
One interesting question: Why do prey animals gather in herds? Herds are
easy targets for predators. So why would prey animals join them?
One ingenious theory has it that although a herd as a whole is an easy
target, each individual member is less likely to be eaten because it can
hide among the herd. One critter, usually old or infirm, will be eaten
while the rest escape. But for a solitary animal, the risk goes up.
Predators hunt in packs for entirely different reasons. Humans, as
omnivores, appear to have the instincts of predators and prey: We hunt in
packs and herd when in danger.
That explains the popularity of "research reports" that show how many of
our peers are adopting some technology or other. These reports show us how
big our herd is and where it seems to be going. Armed with this knowledge,
we can stay in the middle of our herd, safely out of trouble.
And so it was that I found myself reading an "executive report" last week
with several dozen bar charts. A typical chart segmented respondents into
five categories and showed how many of the 20 or so "yes" responses fell
into each one.
Academic journals impose a discipline upon themselves called peer review, a
system that usually catches egregious statistical nonsense. But whereas an
academic publication requires peer review, a business publication only
requires a printing press.
That is what led to this executive report's distribution to a large number
of CIOs. I wonder how many of them looked at the bar charts; murmured, "No
error bars," to themselves; and tossed the information-free report into the trash.
We have read over and over about information glut. I sometimes wonder
whether what we really have is a nonsense glut, and whether any more genuinely
new information surfaces each year now than did a century ago.
Bar charts without error bars -- those pesky black lines that show how
uncertain we are about each bar's true value -- are just one symptom of the
larger epidemic. We're inundated with nonsense because we not only tolerate
it but embrace it.
Don't believe me? Then consider both the aforementioned report and a
critique by one of your analysts pointing out its deficiencies. Would you
say, "Thanks for the analysis," as you shred the offending pages, or,
"Well, any information is better than none at all"?
Thomas Jefferson once said, "Ignorance is preferable to error," and, as
usual, Tom is worth listening to. The next time you're faced with some
analysis or other, take the time to read it critically. Look for sample
sizes so small that comparisons become meaningless, as was the case with
the bar charts.
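To see why 20 or so responses split five ways cannot support comparisons, it helps to put rough error bars on each bar. The sketch below is only an illustration: the counts are made up (the report's actual numbers are not given here), the function name is my own, and treating each bar as a simple proportion of the total is a simplifying assumption. It uses the standard Wilson score interval at roughly 95 percent confidence.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return center - half_width, center + half_width


# Hypothetical split of 20 "yes" responses across five categories.
# These counts are invented for illustration only.
counts = [6, 5, 4, 3, 2]
total = sum(counts)

for i, c in enumerate(counts, start=1):
    low, high = wilson_interval(c, total)
    print(f"category {i}: {c}/{total} = {c / total:.0%}, "
          f"plausibly anywhere from {low:.0%} to {high:.0%}")
```

With these assumed counts, the interval for the tallest bar overlaps the interval for the shortest one, which is the statistician's way of saying the chart cannot tell the categories apart.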
Also look for leading questions, such as, "Would you prefer a delicious,
flame-broiled hamburger or a greasy, nasty-looking fried chunk of cow?" (If
your source has an ax to grind and doesn't tell you the exact question
asked, you can be pretty sure of the phrasing.)
Look for graphs presenting "data" without any hint as to how items were
scored. How many graphs have you seen that divide the known universe into
quadrants? Every company is given a dot, the dots are all over the
landscape, the upper-right quadrant is "good," and you have no clue why
each dot landed where it did because the two axes both represent subjective
values ("vendor stability" or "industry presence").
Readers David Cassell and Tony Olsen, both statisticians, recently showed
me two formulas, Data Density and the Data-Ink Ratio, from Edward Tufte's
wonderful book The Visual Display of Quantitative Information.
To calculate a report's Data Density, divide the number of data points by
the total graph area and express the result in dpsi, or data per square inch.
To calculate the Data-Ink Ratio, divide the amount of ink used to display
nonredundant data by the total ink used to print the graph. Use care when
scraping the ink off the page -- one sneeze and you're out of luck.
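For those who would rather not scrape, both ratios are simple divisions and can be sketched in a few lines. The function names, the chart dimensions, and the ink figures below are all hypothetical; "ink" can be any consistent unit you are able to estimate, such as a count of printed pixels.

```python
def data_density(num_data_points, width_in, height_in):
    """Data points per square inch of graph area (the column's 'dpsi')."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("graph area must be positive")
    return num_data_points / area


def data_ink_ratio(data_ink, total_ink):
    """Share of the graph's ink that displays nonredundant data."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data ink must lie between 0 and total ink")
    return data_ink / total_ink


# Hypothetical example: a five-bar chart printed 4 inches wide by 3 inches
# tall, where 12 of an estimated 60 units of ink actually carry data.
density = data_density(5, 4.0, 3.0)
ratio = data_ink_ratio(12.0, 60.0)
print(f"Data Density: {density:.2f} dpsi")
print(f"Data-Ink Ratio: {ratio:.0%}")
```

Low values on both measures suggest a chart that spends a lot of page and a lot of ink to say very little.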
Bob Lewis is a consultant with Perot Systems Corp. Write to him at
[log in to unmask], or join his forum on InfoWorld Electric
Copyright (c) InfoWorld Publishing Company 1997