Spurious correlations are a trap. An entertaining trap, in some cases. But a trap nonetheless.
Take some examples from Tyler Vigen’s Spurious Correlations: Correlation Does Not Equal Causation. There are visible correlations between:
- The number of films Nicolas Cage appears in and the number of people who drown in pools
- Miss America’s age and the number of murders using steam or hot objects
- US per capita cheese consumption and the number of people who die by getting tangled in their sheets
You get the idea. Obviously, we’ve all heard that correlation does not equal causation. I’ve talked about it before and so have many, many other statisticians.
(Image from xkcd.com)
But why does it matter so much? We all know that Nic Cage movies aren’t actually causing drownings. No one is connecting the dairy lobby to bedsheet-related fatalities. We get it.
(No, often we don’t. And that’s why spurious correlations are such a problem in data analysis and storytelling.)
Why We Love Causation
In all those examples above, I never said that one thing was causing the other. I said there was a visible correlation. But we as humans love to see relationships — sometimes even relationships that aren’t there.
Why do we do it?
Because causation is sexy. We want to hear about the story behind things, what caused events or phenomena. If I told you that, for inexplicable reasons, the divorce rate in Maine and the US per capita consumption of margarine seem to follow the same pattern, you probably wouldn’t care. (Even most statisticians wouldn’t give me much more than a wan smile. “Neat.”) But if I said that the divorce rate in Maine was driving margarine consumption — or that Americans’ love of margarine was destroying marriages across Maine — you might pay attention. (Even if only to see where I was going with this.)
Causation is more exciting than mere correlation. And that drives us to embrace spurious correlations. A headline that says “Eating breakfast makes you 10x smarter” is waaay more interesting than a headline that says “Relationship discovered between intelligence and eating breakfast.” Unfortunately, the interesting headline is pretty dishonest and leads to distrust in both the media and science.
How Do We Identify Spurious Correlations?
Understanding how your variables relate to one another is a big part of telling correlation from causation. Take care to examine every factor that could be affecting what you see, including factors you didn’t measure. If possible, test whether a causal relationship really exists. And ask what happens if you turn the relationship around: could the second factor be driving the first just as plausibly as the first drives the second?
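To make "examine all factors" a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `hidden`, `series_a`, and `series_b` are simulated numbers, not real data, and `pearson` and `residuals` are small helper functions written just for this example. The two series never influence each other; both simply follow the same hidden factor. Once that factor is accounted for, the correlation should largely disappear.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """What is left of ys after subtracting its best straight-line fit on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated (hypothetical) data: a hidden factor drives both series.
# Neither series has any effect on the other.
hidden = [rng.gauss(0, 1) for _ in range(500)]
series_a = [2 * h + rng.gauss(0, 1) for h in hidden]
series_b = [3 * h + rng.gauss(0, 1) for h in hidden]

# Correlation as you would first see it in a chart.
raw_r = pearson(series_a, series_b)

# Correlation after removing the part of each series explained by the hidden factor.
controlled_r = pearson(residuals(series_a, hidden), residuals(series_b, hidden))

print(f"raw correlation:        {raw_r:.2f}")
print(f"after controlling for hidden factor: {controlled_r:.2f}")
```

In real work the hard part is knowing which hidden factor to look for. This sketch only shows what happens once you have found and measured it.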
It’s important to remember that, in a causal relationship, one event brings about the other. Correlation, on the other hand, describes any tendency of two factors to move together, whether or not one causes the other.
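Here is a rough Python sketch of how two factors can "move together" with no causal link at all. The data is simulated and the names (`series_a`, `series_b`, `pearson`, `changes`) are hypothetical, chosen only for this illustration. Both series simply drift upward over time, as so many real-world measurements do. Their levels look tightly correlated, but the step-to-step changes, which is where a real link would have to show up, tell a different story.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def changes(values):
    """Step-to-step differences: how much each value moved since the previous one."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Two simulated (hypothetical) series that each trend upward on their own.
# Their random noise is independent, so neither one influences the other.
steps = range(200)
series_a = [2 * t + rng.gauss(0, 3) for t in steps]
series_b = [5 * t + rng.gauss(0, 3) for t in steps]

# Comparing the raw levels mostly measures the shared upward trend.
level_r = pearson(series_a, series_b)

# Comparing the changes strips the trend out.
change_r = pearson(changes(series_a), changes(series_b))

print(f"correlation of levels:  {level_r:.2f}")
print(f"correlation of changes: {change_r:.2f}")
```

Looking at changes rather than levels is only one simple check, and it won’t catch every spurious correlation. But it is a quick way to see whether a shared trend is doing all the work.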
Khan Academy has a great video explaining the difference.
How to Avoid Spurious Correlations
So, you understand the difference between correlation and causation. But how do you make sure your audience does? Data storytelling is like any other kind of narrative: your words matter. You have to choose them carefully.
Jon Mueller has some great exercises you can do to practice identifying causal language. You can probably already tell whether your findings indicate causation or not, so learning to choose the right words will help you avoid implying a causal relationship your data doesn’t support.
Still need help? Don’t worry. The team at Datassist is here to help with data collection, analysis, visualization, and storytelling. We are proud to work with journalists, government agencies, and social sector organizations of all sizes. Get in touch today.