How to Lie With Statistics

“Statistics are like mini-skirts – they give you good ideas but hide the important things” (Ebbe Skovdahl)

Statistics is a broad discipline with lots of applications in everyday life.  If you check a weather forecast, invest money in a bank or take out insurance then you are relying on the forecasts provided by statisticians.  Any food, drink or prescription drug you buy and consume will have gone through testing and quality control procedures which involve statistical analysis and sampling techniques.  Fortunately, these are usually carried out by skilled statisticians who have high professional and ethical standards.

But as Darrell Huff pointed out in his fantastic book, “How to Lie with Statistics”, statistics can also be dangerous in the wrong hands.  If someone misuses statistics (either unwittingly or unscrupulously), then people can easily be misled.  This is true in all walks of life, but this piece will look at some examples from football.

Using ambiguous figures

The inspiration for writing this piece has come from the official statement released by Kilmarnock FC on the results of their “newco” consultation.  At first glance, the headline “36% in favour of No to Newco” suggests that 64% of Killie fans consulted were in favour of the newco Rangers being admitted to the SPL.  But if that were the case, surely a headline of “64% say Yes to Newco” would have been more striking.  When something doesn’t seem quite right, it usually isn’t.  A closer read of the statement reveals that 2500 supporters and shareholders were consulted, and that 36% of these were against the “newco”.  Rather than giving a percentage of respondents, it has suited Michael Johnston’s agenda to give a percentage of people asked, disregarding the fact that many of those asked didn’t respond at all.

There are many reasons for people not responding to any survey.  In this case, some of the surveys were sent via email, and others were sent in the post.  They were sent out a week before the deadline, but some of the letters didn’t arrive until 3 days before the deadline, leaving little time to respond and then post them back.  In addition, some people would have moved house or changed email address, and wouldn’t have received their survey at all.  Some emails may have ended up in spam folders.  There would also be many people on holiday, given that the survey was sent out in late June.  This is normal for statistical surveys, and standard procedure is to draw conclusions from the received surveys.  With 36% of people voting no, at least 900 responses were received – more than enough to draw statistically significant conclusions.  The only concern about the validity of these results would be the potential existence of “selection bias” (more on this later), but the number of responses makes that unlikely.

Johnston did not (and probably won’t ever) release any further information about the survey, so it is impossible for anyone else to draw conclusions on the true outcome of the survey.  We know that 900 “No” votes were received, but the number of “Yes” votes was not revealed.  Given the lengths he had to go to in order to give the impression that the supporters backed a “Yes” vote, it seems unlikely that was actually the case.  It seems more likely that the Killie supporters’ vote would be broadly similar to that of other SPL clubs (i.e. >75% saying “No”), which would put the true number of “Yes” votes at fewer than 300 (900 “No” votes at a 75% share implies 1200 responses in total).
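The difference between a percentage of people asked and a percentage of respondents can be worked through with a few lines of arithmetic.  The sketch below uses the two figures from the Kilmarnock statement (2500 consulted, 36% against); the “No” shares among respondents are hypothetical, since the real response count was never published.

```python
# Figures taken from the Kilmarnock statement.
consulted = 2500
no_share_of_consulted = 0.36

no_votes = round(consulted * no_share_of_consulted)


def implied_totals(no_votes, no_share_of_respondents):
    """Return (responses, yes_votes) implied by a given "No" share of respondents."""
    responses = round(no_votes / no_share_of_respondents)
    return responses, responses - no_votes


# Hypothetical "No" shares among respondents: assumptions, not published data.
for share in (0.60, 0.75, 0.90):
    responses, yes_votes = implied_totals(no_votes, share)
    print(f"No share {share:.0%}: {responses} responses, {yes_votes} Yes votes")
```

The higher the “No” share among those who actually replied, the fewer responses (and the fewer “Yes” votes) are needed to produce the same 36% headline figure.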

Compare the Kilmarnock statement with that of Motherwell FC on the same issue.  The numbers of votes for and against the newco proposal are listed, along with the percentage of votes not returned.  The percentages given only cover those who actually voted.  Statistical information should always be displayed clearly and unambiguously, and that is exactly what Motherwell have done.

This is along similar lines to the Scottish devolution referendum of 1979, where 51.6% of votes cast were in favour of devolution, but the referendum didn’t succeed because of an additional rule which required 40% of the total electorate to have voted “Yes”.  The turnout was only 63.8%, which meant that only 32.9% of the electorate voted “Yes”.  To break the 40% barrier would have needed either a 77.5% turnout with the same voting pattern, or a “Yes” vote of roughly 63% on the same turnout.  This was obviously highly controversial, particularly given that there was evidence that electoral registers were out of date, and included people who had died or moved house.  It took 18 years for another referendum on devolution, this time without the additional rule, which was passed with a 74.3% “Yes” vote.
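The 40% rule can be checked with the same kind of arithmetic.  This is a rough sketch only: it takes the “Yes” share of votes cast as 51.6% (an assumption here, chosen because it reproduces the 32.9% figure quoted above), and the results shift slightly depending on how the inputs are rounded.

```python
# Figures for the 1979 referendum; the 51.6% "Yes" share is an assumption.
yes_share_of_votes = 0.516   # share of votes cast for "Yes"
turnout = 0.638              # share of the electorate that voted
threshold = 0.40             # required "Yes" share of the whole electorate

# "Yes" votes as a share of everyone on the electoral register.
yes_share_of_electorate = yes_share_of_votes * turnout

# Turnout needed to reach the threshold with the same voting pattern.
turnout_needed = threshold / yes_share_of_votes

# "Yes" share of votes cast needed to reach the threshold on the same turnout.
yes_share_needed = threshold / turnout

print(f"Yes share of electorate: {yes_share_of_electorate:.1%}")
print(f"Turnout needed:          {turnout_needed:.1%}")
print(f"Yes share needed:        {yes_share_needed:.1%}")
```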

Percentages without raw numbers

Johnston is a clever man, and it seems unlikely that any of that was anything other than a deliberate face-saving tactic.  He has previous on quoting unqualified percentages.  Over the last week he has been quoting season ticket sales as being “up 15%” compared to a similar time last year.  Any percentage value not backed up by solid numbers should be treated with suspicion.  What he fails to mention is that this time last year, Killie fans were considering a boycott of season tickets after lax security arrangements allowed thousands of Rangers fans into the home end on the last day of the 2010/11 season.

If only 50 tickets had been sold at this point last year, a 15% rise would mean just 58 sales this year – a tiny fraction of last season’s final sales.  Without the numbers, we don’t know how well the sales are doing – but if the sales were really going well then it would seem more likely that the numbers would be given.  When you see statistical information displayed, sometimes what you don’t see is more important than what you do see.
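To see how little an unqualified “up 15%” reveals, the same percentage can be applied to a range of baselines.  The baselines below are hypothetical (the real figures were not given), and `sales_needed` is simply a helper name chosen for this illustration.

```python
from math import ceil


def sales_needed(last_year_sales, pct_increase):
    """Smallest whole number of sales that is at least pct_increase% up on last year."""
    return ceil(last_year_sales * (100 + pct_increase) / 100)


# Hypothetical sales figures at the same point last year: assumptions only.
for baseline in (50, 500, 2000):
    this_year = sales_needed(baseline, 15)
    print(f"{baseline} last year -> {this_year} this year "
          f"({this_year - baseline} extra tickets)")
```

The percentage is identical in every row, but the number of extra tickets it represents varies enormously.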

Selection Bias

Michael Johnston’s statement did, in a roundabout way, manage to touch on a potentially valid point which can often be missed in surveys – the issue of “selection bias”.  People who are passionate about an issue are more likely to take the time to answer a survey than those who aren’t particularly bothered.  To see the potential effect of selection bias, consider a newspaper asking the question “Do you respond to newspaper surveys?”, and giving a phone number to call in with your answer.  The next week, the newspaper could draw the false conclusion that 100% of people responded to newspaper surveys.

That is obviously a contrived example, but polls on football forums can fall victim to the same issues on a lesser scale.  The posters on these are often the most opinionated supporters of the club, and as such may not be completely representative of the fanbase.  However, in Killie’s case, it would seem incredibly unlikely that this could be responsible for the type of swing Johnston claimed.  On the forum, 460 people voted on the newco issue, and 97% of these said “No”.  Selection bias could not reasonably account for a swing from that figure to something as low as 36%.
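One way to gauge how strong the selection bias would have to be is to ask how much more likely a “No” supporter would need to be to vote in the forum poll than a “Yes” supporter.  The sketch below assumes, purely for illustration, that 36% is the true “No” share among all fans and that the two groups differ only in how often they respond; the function names are invented for this example.

```python
def observed_no_share(true_no_share, no_response_rate, yes_response_rate):
    """Share of "No" among respondents when the two groups respond at different rates."""
    no_responders = true_no_share * no_response_rate
    yes_responders = (1 - true_no_share) * yes_response_rate
    return no_responders / (no_responders + yes_responders)


def required_rate_ratio(true_no_share, observed_share):
    """How many times more likely "No" supporters must be to respond (an odds ratio)."""
    observed_odds = observed_share / (1 - observed_share)
    true_odds = true_no_share / (1 - true_no_share)
    return observed_odds / true_odds


ratio = required_rate_ratio(0.36, 0.97)
print(f"No supporters would need to respond about {ratio:.0f} times as often")
```

Under these assumptions the required ratio comes out at roughly 57 to 1, which is far beyond what differing levels of enthusiasm could plausibly explain.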

Regression to the Mean

One of the enduring cliches in football is the “curse” of the Manager of the Month award, which dictates that anyone receiving the award will see their side suffer a downturn in form.  In reality, the “curse” can more or less be accounted for by “regression to the mean”, the phenomenon where an extreme value for a variable is likely to be followed by a value closer to the average.  A manager is likely to win the award as recognition of a better than usual run of form for his side, so it shouldn’t really come as a huge surprise when they return to their usual level the following month.
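Regression to the mean is easy to reproduce in a toy simulation.  The model below is an assumption made for illustration only: each team has a fixed underlying ability, each month’s form is that ability plus random luck, and the “award” simply goes to the team with the best form that month.  Nothing in the model makes winning the award harmful.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

N_TEAMS, N_MONTHS = 12, 2000

# Hypothetical model: a fixed ability per team, plus random monthly luck.
abilities = [random.gauss(0, 1) for _ in range(N_TEAMS)]


def month_form():
    """One month's form for every team: ability plus luck."""
    return [ability + random.gauss(0, 1) for ability in abilities]


award_month, next_month = [], []
current = month_form()
for _ in range(N_MONTHS):
    following = month_form()
    # The "award" goes to the team with the best form this month.
    winner = max(range(N_TEAMS), key=lambda i: current[i])
    award_month.append(current[winner])
    next_month.append(following[winner])
    current = following

avg_award = sum(award_month) / N_MONTHS
avg_next = sum(next_month) / N_MONTHS
print(f"Winner's average form in the award month: {avg_award:.2f}")
print(f"Winner's average form the month after:    {avg_next:.2f}")
```

The winners’ form drops the following month on average, even though the only thing that has changed is that their run of good luck has not repeated itself.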

It is also possible that this could become a self-fulfilling prophecy, where a manager and his players can be adversely affected by media discussion of the potential effect of the “curse”.  There is also a reporting bias inherent in this “curse” – people are primed to look for a pattern such as a downturn in form after winning the award and will often forget about examples which do not fit into their hypothesis.  Indeed, reporting bias is a major issue in a number of similar footballing phenomena – most notably the claim that players will always score against their former clubs.

Unequal Comparison

In recent years, statistical analysis of football has become a lot more advanced, with information such as passing percentages and average positions becoming more prevalent (though not in the SPL yet, unfortunately!).  While these statistics can be very powerful if used to supplement your instincts when watching the games, they can also lead to some questionable conclusions if they are blindly quoted out of context.

When you read about a player’s pass completion, there is no distinction made between a simple 5-yard pass and a 60-yard diagonal to a winger.  The best passers of the ball won’t necessarily have the highest pass completion percentage, because they may take on more difficult passes in an attempt to create goalscoring opportunities.  At Euro 2012, Andrea Pirlo impressed us all with a masterful performance against England, but his passing percentage of 88% was bettered by 5 of his Italian teammates who started that night (Buffon, Bonucci, Balzaretti, Marchisio and de Rossi).  Pirlo’s 87.2% for the tournament was only 43rd best, while Nigel de Jong was ranked 2nd with 94.5%.  The Dutch had the highest passing accuracy of the 16 teams at the tournament, but failed to pick up a single point.

The only way to solve this problem would be to have some form of difficulty rating for each pass completed, thus allowing each player’s passing score to be adjusted suitably.  This would be very difficult in practice, given the amount of time which would have to be spent on the analysis.  Until such complex methods are available, the best approach would be to accompany any percentage with a “chalkboard” displaying each individual pass.
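As a rough idea of what a difficulty adjustment might look like, the sketch below compares each pass with how often a pass of that type would be expected to succeed.  Everything here is hypothetical: the two pass types, the expected completion rates and the two invented players are assumptions for illustration, not real data or an established metric.

```python
# Hypothetical expected completion rates by pass type: illustrative only.
EXPECTED_RATE = {"short": 0.95, "long": 0.55}


def raw_completion(passes):
    """Plain pass completion: completed passes divided by attempts."""
    return sum(completed for _, completed in passes) / len(passes)


def completions_above_expected(passes):
    """Completed passes minus the number expected given each pass's difficulty."""
    return sum(completed - EXPECTED_RATE[kind] for kind, completed in passes)


# Two invented players; each pass is (type, 1 if completed else 0).
safe_passer = [("short", 1)] * 19 + [("short", 0)]
ambitious_passer = [("short", 1)] * 10 + [("long", 1)] * 7 + [("long", 0)] * 3

for name, passes in (("Safe", safe_passer), ("Ambitious", ambitious_passer)):
    print(f"{name}: {raw_completion(passes):.0%} completed, "
          f"{completions_above_expected(passes):+.1f} vs expected")
```

The player who only plays short passes has the better raw percentage, but the player attempting long passes comes out ahead once the difficulty of each attempt is taken into account.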

A similar issue can arise when it comes to the number of shots a side has in a game.  Often, managers or supporters claim that their side were unlucky to lose because they had more shots than their opponents.  But again this statistic fails to take into account the differences between different types of shots.  You are less likely to score with a speculative 40-yard effort than you are with a clear shot from 10 yards, but both are counted in the same way.  To illustrate the point, we again look to the Dutch performance at Euro 2012.  They averaged 20 shots per game over their 3 matches (more than any team in the tournament), but only scored 2 goals in total.

Again, these stats are not much use on their own, and even a chalkboard will not provide information about the position of the goalkeeper or the amount of pressure from defenders.  Perhaps a slightly more useful statistic to accompany it would be some measure of average shooting distance.  It may even be possible to combine this with the number of shots in order to come up with a single “Shooting Index” for each team.

Selective Statistics

In the lead-up to matches, you will often read stats like “Team X have lost just 1 of their last 13 matches against Team Y”.  Apart from being an unlucky number, 13 would seem like a very odd choice for a number of matches.  Why not choose their last 10 matches or their last 15?  The answer to that is an obvious one – these statistics are picked for maximum impact.  When you read something like that, you can guarantee that had you gone back to 14 matches ago, Team X would have lost.  And “Team X have lost 2 of their last 14 matches against Team Y” isn’t quite as impressive.
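The effect of cherry-picking the window can be shown by counting losses over every possible “last N matches”.  The results list below is invented for illustration (most recent match first), arranged so that Team X lost their 6th and 14th most recent meetings with Team Y, among others.

```python
# Invented head-to-head results for Team X, most recent match first.
results = list("WWDWWLWDWWWDWLWWLDWL")


def losses_in_last(n):
    """Number of defeats in the n most recent matches."""
    return results[:n].count("L")


for n in range(10, len(results) + 1):
    losses = losses_in_last(n)
    print(f"Lost {losses} of their last {n} ({losses / n:.1%})")

# The window a headline writer would pick: the lowest loss rate on offer.
most_flattering = min(range(10, len(results) + 1),
                      key=lambda n: losses_in_last(n) / n)
print(f"Most flattering window: last {most_flattering} matches")
```

With this data the most flattering window stops at 13 matches, one short of a defeat, which is exactly why such an odd number ends up in the headline.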

Small Sample Sizes

Small sample sizes are frowned upon in statistics, but they are what makes football so watchable.  Every match can be considered as an experiment with a sample size of 1, which means the “expected” result isn’t always achieved.   Consider the example of Team X and Team Y, where the former will win 80% of matches.   If the teams had to play each other 100 times to decide who was best, then Team Y would have virtually zero chance of coming out on top.  But in a cup competition, the sides will only meet once, and Team Y will progress 20% of the time.  In spite of what the cliche might say, the better team doesn’t always win.
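The Team X and Team Y example can be put into numbers with the binomial distribution.  The sketch below assumes that every match has a winner (no draws), that Team Y wins each match independently with probability 20%, and that “coming out on top” means winning more than half of the matches; the function name is chosen for this illustration.

```python
from math import comb


def prob_wins_majority(p_win, n_matches):
    """Probability of winning more than half of n independent matches."""
    needed = n_matches // 2 + 1
    return sum(
        comb(n_matches, k) * p_win**k * (1 - p_win) ** (n_matches - k)
        for k in range(needed, n_matches + 1)
    )


for n in (1, 3, 7, 21, 100):
    print(f"Team Y over {n} matches: {prob_wins_majority(0.2, n):.6%}")
```

The underdog’s chance shrinks quickly as the number of matches grows, which is why a one-off cup tie gives them a far better chance than a long series would.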

About SPLstats
Providing statistics and trivia about Scottish football. Main focus is the SPL, but all Scottish football will be covered. Not affiliated to the SPL.
