Are you committing any of the seven deadly sins of statistical analysis?
Today we continue our end-of-summer blog series highlighting the importance of understanding the research that forms the foundation of your data story. I'd like to draw your attention to some serious, but sadly common, mistakes that occur far too often in statistical analysis.
Someone recently pointed me in the direction of this piece on The Conversation, which offers a full rundown of what they consider the seven deadly sins in this field:
- Assuming small differences have meaning (rather than chalking them up to chance)
- Equating statistical and real-world significance
- Ignoring extremes (and the effect they can have on averages)
- Putting too much faith in coincidence (and not understanding data relationships)
- Labelling graphs deceptively (or not at all!)
- Getting causation backward
- Failing to evaluate potential third factors
I work with a lot of groups who fall victim to the last two on that list. The art of understanding causation with data is a tricky one, to say the least. And when a trend in our data reflects a story we want to tell, it’s tempting to jump the gun and publish it. Let’s look at these statistical analysis sins a little more closely, shall we?
Seeing Causality in the Wrong Direction
“Correlation is not causation.”
– Every statistician ever
Seeing causation where there is none is not the only way causation can trip up your statistical analysis. Often, when we identify a correlation between two things, we are quick to jump to conclusions about which causes which.
When I talked about prediction vs. causality a while back, I used the example of police force size and crime rate.
Examining state-level data to understand the effects of police force size on crime rates, we notice that regions with larger police forces tend to have higher crime rates. From this, we might conclude that cutting police force resources would effectively reduce crime in any given area, since smaller police forces = lower crime rates.
Obviously, most of us would not jump to that particular conclusion. Here's a trickier example. Say you're studying mental health and unemployment, and you discover a correlation between the two. Many of us would be quick to say that mental health problems increase unemployment, right? But what if being unemployed is causing mental health issues for some people?
Once you recognize this trap, it’s not difficult to avoid. Simply stop and examine your assumptions. If you’ve identified a causal path, ask yourself if it could be reversed. If so, more research may be needed before you can reach a conclusion.
Ignoring Potential External Factors
Data relationships can be incredibly complicated. Successful statistical analysis depends on your taking the time to evaluate all factors that might cause correlation, not just the factors you're investigating. Because it's still (technically) summer, I'm going to reuse one of my favourite examples of how external factors can make a mess of your conclusions.
Statistics show us that, throughout the year, murder rates and ice cream sales appear to have a positive correlation. So does eating ice cream make you a murderer? Does news of nearby murders send us rushing to Ben & Jerry's?
If all I examined here were homicide statistics and ice cream sales data, I could easily reach a wildly inaccurate conclusion. Obviously, deeper investigation reveals that rising temperatures affect both murder rates and frozen dessert sales.
Again, this is a hyperbolic example to illustrate my point. But data relationships can sneak up on us in much more subtle ways.
Make sure to consider any potential outside factors that could be influencing correlation.
Want to Improve Your Statistical Analysis?
We'll be talking about many of these mistakes, and others to avoid, in our upcoming course Crafting Data Stories. (For those who participated in the MOOC I ran earlier this year, this course will be a little different. Check it out early to avoid disappointment!)
Does your team need expert assistance with statistical analysis, data collection, or visualization? At Datassist, we specialize in helping nonprofits, governments, and journalists with all things data. Drop us a line to discuss what we can do for you.