Lies, Damned Lies and Statistics

Tyranny of Numbers

The Economist has an article, ‘Don’t lie to me Argentina’, explaining why they are removing a figure from their indicators page:

Since 2007 Argentina’s government has published inflation figures that almost nobody believes. These show prices as having risen by between 5% and 11% a year. Independent economists, provincial statistical offices and surveys of inflation expectations have all put the rate at more than double the official number…

What seems to have started as a desire to avoid bad headlines in a country with a history of hyperinflation has led to the debasement of INDEC, once one of Latin America’s best statistical offices…

We see no prospect of a speedy return to credible numbers. From this week, we have decided to drop INDEC’s figures entirely.

Whilst we often talk about how statistics can always provide the answers people are looking for (hence the popular quote used as the title for this post), there is another angle to consider – are the underlying numbers telling the truth? It is a critical question when decisions are based on increasingly complex calculations that are then converted into summary data visualisations to assist decision-making. Was the original source data automatically scraped from systems or keyed in by people? Just how balanced is that scorecard..?

One of the current hot trends on the Internet is the emergence of ‘Big Data’: scraping massive quantities of automatically generated information, such as the search and surfing habits of everyone who ever logged into Facebook, and then analysing it for patterns. One of the potential attractions is eliminating human error, or human influence (in terms of truthfulness), over the underlying data sources. It doesn’t solve the challenge of influence through gaming of the system, but that’s perhaps a post for another day.

If you are interested in the use and abuse of statistics, there’s an excellent short book that walks through historical examples of where statistics simply don’t work: The Tyranny of Numbers: Why counting won’t make us happy, by David Boyle. I wrote a review of it some time ago, and naturally the book is listed on Amazon.

Friday Thought: Dangerous statistics

Earlier this week, a report was published claiming proof that alcohol causes more harm than heroin or crack.

I believe it is wrong.

But I’m no professor so best listen to the expert first (source: BBC News)

It’s all well and good using multi-criteria decision analysis and other long phrases that academics favour. But the judgements appear to be based on the assumption that the societal effects are unique to alcohol and would otherwise not occur. Take the following quote:

Crack cocaine is more addictive than alcohol but because alcohol is so widely used there are hundreds of thousands of people who crave alcohol every day, and those people will go to extraordinary efforts to get it – Professor David Nutt

Aside from not providing the evidence to confirm we have hundreds of thousands of alcoholics (not sure if that’s just the UK or globally), does he honestly think people who go to extreme lengths to acquire alcohol would not switch to alternatives if alcohol became as difficult to source as heroin? And people who need alcohol for the courage to be abusive, violent and the other societal dangers that led to this judgement are unlikely to live the rest of their lives in a saintly manner without it.

Here’s the chart showing the breakdown (source: BBC News)

Chart: drugs comparison

It appears the ‘harm’ score is influenced by the number of people who use each drug – I can’t believe the score for the likes of methadone wouldn’t change if everyone who currently uses alcohol switched to it on the basis of this chart suggesting it is a much safer option. This is a classic example of how statistics and data visualisation can mislead.

A better chart would be one showing the effects per 100 people who use each drug to excess, i.e. what is the cost to the individual and society of 100 alcoholics vs 100 heroin addicts vs 100 heavy smokers etc. That would offer more practical information when it comes to policy decision-making. Using the Professor’s approach, you might as well add chocolate and chips to the list.
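To make the per-100-users idea concrete, here is a minimal sketch in Python. The function name `harm_per_100_users` and the two substances are my own inventions for illustration, and the figures are made-up placeholders, not taken from the report or the BBC chart. The only point is that a ranking based on total harm can flip once you divide by the number of users.

```python
def harm_per_100_users(total_harm: float, users: int) -> float:
    """Scale an aggregate harm score to a rate per 100 users."""
    if users <= 0:
        raise ValueError("users must be positive")
    return 100 * total_harm / users


# Invented placeholder figures, NOT real data: substance_a is widely used,
# substance_b is used by far fewer people.
example = {
    "substance_a": {"total_harm": 9000.0, "users": 1_000_000},
    "substance_b": {"total_harm": 5000.0, "users": 50_000},
}

# Rank by aggregate harm, then by harm per 100 users.
by_total = sorted(example, key=lambda name: example[name]["total_harm"], reverse=True)
by_rate = sorted(example, key=lambda name: harm_per_100_users(**example[name]), reverse=True)

print("Ranked by total harm:   ", by_total)
print("Ranked by harm per 100: ", by_rate)
```

With these placeholder numbers the widely used substance tops the total-harm ranking but drops to second once the score is expressed per 100 users, which is the distortion I suspect is at work in the chart.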

Accurate approximations

Steve Clayton has a nice post out – Zero-Sum Game – that links to an older post of his – Orders of Magnitude.

Orders of Magnitude = a scale to describe differences between numbers, with powers of 10 being the most common scale to use. I sold 1 apple last week, this week I sold 10: I increased sales by one order of magnitude. Orders of magnitude are normally used for very large numbers where you don’t need to worry about the ‘loose change’.

Zero-Sum Game = for every gain there is an equal cost. If there is room for 10 apples in the box, one has to be taken out before you can add another.

It’s always nice to be reminded of definitions. Sometimes we can get a bit too colourful (in the polite sense) with our use of language to describe situations. (‘Super super excited about the orders of magnitude we will achieve in selling product X next year’ for example, when the actual prediction is a 20% increase.)
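For anyone who wants to check their own superlatives, the base-10 definition above can be sketched in a few lines of Python. The function name `orders_of_magnitude` is my own, and I am assuming the usual base-10 reading of the term.

```python
import math


def orders_of_magnitude(before: float, after: float) -> float:
    """Number of powers of 10 separating two positive values."""
    if before <= 0 or after <= 0:
        raise ValueError("both values must be positive")
    return math.log10(after / before)


# 1 apple last week, 10 this week: one full order of magnitude.
print(orders_of_magnitude(1, 10))

# A 20% increase (100 units to 120) is a small fraction of one order.
print(round(orders_of_magnitude(100, 120), 3))
```

On this measure a 20% increase comes to roughly 0.08 of an order of magnitude, so ‘orders of magnitude’ is overstating things somewhat.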

Steve’s post reminded me of a write-up in a newspaper last year, talking about the upcoming finale of Dr Who. The comment went along the lines: “At least, by the end of the programme, viewers will understand what the word ‘decimate’ really means.” (The baddie, The Master, orders his robots to remove one-tenth of the population.)

Sharing birthdays

I have always been surprised how often I have been in a small group with at least two people sharing the same birthday (it’s happened at school, work, on holiday, on projects, basically more times than I can remember). With 365 days to pick from, it seems logical to think you’d need a pretty large group before duplications start to occur…

…but it takes just 23 people for there to be a greater than 50% chance that two people in the group will share the same birthday. For a good explanation of the maths behind the fact and why we so easily get it wrong, read The Law of Small Errors.
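If you would rather check the figure than take it on trust, here is a short Python sketch. It rests on the standard simplifying assumptions, which are mine rather than anything stated above: 365 equally likely birthdays, no leap years, and no twins in the group. The function names are my own. The trick is to work out the chance that everyone has a different birthday and subtract it from 1.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Chance that at least two people in the group share a birthday."""
    if group_size > days:
        return 1.0  # more people than days, so a match is guaranteed
    p_all_distinct = 1.0
    for k in range(group_size):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(threshold: float, days: int = 365) -> int:
    """Smallest group size whose match probability exceeds the threshold."""
    n = 1
    while shared_birthday_probability(n, days) <= threshold:
        n += 1
    return n


for n in (10, 22, 23, 30, 50):
    print(f"{n:>2} people: {shared_birthday_probability(n):.1%}")

print(smallest_group_for(0.5))  # 23 under these assumptions
```

The reason the number is so low is that the matches are counted across every pair in the group, and 23 people already make 253 pairs.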

I find this simple probability exercise serves as a good reminder to test assumptions when analysing statistics. Sometimes the answer we expect to see is not remotely close to reality, but our brains have an annoying habit of preferring to believe expected myths rather than actual truths, no matter what evidence is presented before us. As we start to integrate business intelligence and performance dashboards into everyday activities, we need to beware our tendency to draw the wrong conclusions from the information presented.

I’ve read a few good books that cover this subject recently, including:

Distracting Data

A follow-on from the ‘Dashboard Dangers’ entry. Here is a simple example of why summary information can distract you from messy realities, from a different perspective.

From ‘Imperialism and World Politics’, by Parker Thomas Moon, published 1927 (Macmillan):

…Language often obscures truth. More than is ordinarily realised, our eyes are blinded to the facts by tricks of the tongue. When one uses the simple monosyllable ‘France’ one thinks of France as a unit, an entity. When… we use a personal pronoun in referring to a country, for example ‘France sent her troops to conquer Tunis’, we impute not only unity but personality to the country… all too easily we forget the flesh-and-blood men and women who are the true actors. How different would it be if we had no such word as ‘France’, and had to say instead: ‘38 million men, women, and children of very diversified interests and beliefs, inhabiting 218,000 square miles of territory’. Then we should more accurately describe the Tunis expedition… as this: ‘A few of these 38 million persons sent 30,000 others to conquer Tunis.’ This way of putting the fact immediately suggests a question, or rather a series of questions. Who are the few? Why did they send the 30,000 to Tunis? And why did the 30,000 obey?

A very different context, but the same argument. Summary information can lead you to make snap judgements and form opinions, when you should be asking more questions…