Data journalists need to understand the concept of ecological fallacy — or they need to stop using data.
But wait; isn’t using data to support your arguments a good thing? Which are you more likely to believe, a news story with a sensational headline or a news story with a sensational headline backed up with statistics? Any self-respecting critical thinker would choose the latter, which is why journalists are eager to line their stories with data that supports their claims. Nothing wrong with that, right?
Nothing at all, assuming they understand how to accurately interpret that data — which is where ecological fallacy comes in.
Data Journalists’ Biggest Headache
Several years ago, a couple of British papers reported skyrocketing levels of antidepressant use in the United Kingdom. Both papers attributed the increase in antidepressant prescriptions to the country’s economic uncertainty, suggesting a growing number of people were depressed and turning to medication because of concerns about their jobs or financial status. Both cited (completely accurate) statistics showing the number of prescriptions written for antidepressants had increased by 28% in three years.
So what was the problem?
The number of prescriptions written across the nation had most definitely increased. The journalists had done their homework and reported the news. The fundamental flaw in the reporting occurred when the journalists inferred that a higher number of prescriptions meant an increased number of patients.
The statistics presented showed that more prescriptions for specific drugs were being written in England, but said nothing about whether more individuals were experiencing depression.
Dr. Ben Goldacre points out that doctors could be writing a higher number of shorter prescriptions (for example, giving patients one month’s worth of antidepressants instead of three months at a time) or replacing dated methods for addressing depression with these newer, more effective drugs. A comment on Goldacre’s blog even points out that an increased number of prescriptions could actually be attributed to fewer suicides in those treated — resulting in patients requiring regular dosages of their medication for a longer period of time.
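To see how the first of these explanations plays out, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the newspaper reports; the function name `prescriptions_per_year` is also just a hypothetical helper. It assumes the number of patients stays exactly the same and only the length of each prescription changes.

```python
# Hypothetical illustration: all numbers are invented, not taken from any report.

def prescriptions_per_year(patients: int, months_per_prescription: int) -> int:
    """Prescriptions needed to keep every patient supplied for 12 months."""
    return patients * (12 // months_per_prescription)

patients = 1000  # assumed constant: nobody new starts treatment

# Three-month prescriptions: 4 per patient per year.
before = prescriptions_per_year(patients, months_per_prescription=3)
# One-month prescriptions: 12 per patient per year.
after = prescriptions_per_year(patients, months_per_prescription=1)

increase = (after - before) / before * 100
print(f"Patients: {patients} in both years")
print(f"Prescriptions: {before} -> {after} (+{increase:.0f}%)")
```

In this toy scenario the prescription count triples while the number of people being treated does not move at all, which is why a prescription total on its own cannot tell us how many patients there are.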
So while the newspaper stories portrayed the statistic as indicative of a problem, when we understand ecological fallacy, we see that is not necessarily the case.
Data Analysis is Not a Two-Way Street
Perhaps, to see the problem more clearly, we should look at exactly what an ecological fallacy is:
An ecological fallacy is the interpretation of statistical data where inferences about individuals are made from data about a group to which those individuals belong.
Data is often collected at the individual level and then analyzed at the neighbourhood level — looking at results across the entire group. The problem arises when we try to go in the opposite direction and use data collected at the neighbourhood level for analysis at the individual level. It feels scientific because you’ve got real data — but you’re making assumptions that are unscientific at best, and at worst, downright wrong.
Let’s look at another example. Statistics show that American states with more immigrants have a higher proportion of households with incomes of $100,000 or higher; stated more plainly, wealthier states have higher immigrant populations. Our data supports this claim.
But what if we try to make deductions about the populations of those states based on that information? At first glance, it might look like our numbers indicate immigrants are more likely to have incomes over $100,000 — when in fact, the opposite is true. Trying to use state-level data to give us information about individual households leads us to incorrect conclusions.
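A small sketch can show how both statements can be true at once. The three "states" below are entirely made up (they are not real US figures), and "high income" simply stands in for the $100,000 threshold. The state-level view only sees each state's totals; the household-level view can see which households are which.

```python
# Hypothetical data: three invented "states", 1,000 households each.
# Each row: (state, native households, native high-income households,
#            immigrant households, immigrant high-income households)
states = [
    ("A", 950, 95, 50, 2),
    ("B", 850, 170, 150, 18),
    ("C", 750, 225, 250, 50),
]

# State-level view: the only thing aggregated statistics can show.
state_view = []
for name, nat, nat_hi, imm, imm_hi in states:
    immigrant_share = imm / (nat + imm)
    high_income_share = (nat_hi + imm_hi) / (nat + imm)
    state_view.append((name, immigrant_share, high_income_share))
    print(f"State {name}: {immigrant_share:.0%} immigrant, "
          f"{high_income_share:.1%} high-income")

# Household-level view: pool all households by group.
native_rate = sum(s[2] for s in states) / sum(s[1] for s in states)
immigrant_rate = sum(s[4] for s in states) / sum(s[3] for s in states)
print(f"Native households with high income:    {native_rate:.1%}")
print(f"Immigrant households with high income: {immigrant_rate:.1%}")
```

In this toy data, the states with more immigrants also have more high-income households, yet immigrant households are less likely than native households to be high-income, both overall and inside every single state. The state-level pattern is real; it just says nothing reliable about individual households.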
Confused? Let’s break it down. Remember data analysis can only go in one direction.
- A study of individual household incomes in the US can tell us information about both those households and averages across states.
- A study of average household incomes across states can only provide us insight into state averages, not individual households.
I’ve made a short video with a few more examples of ecological fallacy, and how it can be particularly tricky to spot when the results you infer agree with your worldview:
When using data analysis for any type of journalism, it’s critical to remember that using broad data to draw detailed conclusions can be very misleading. You can’t do a statistical test to disaggregate data.
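One way to see why disaggregation is impossible is to notice that very different sets of individuals can produce exactly the same group figure. The sketch below uses two invented lists of household incomes (in thousands of dollars) for a hypothetical neighbourhood; the scenario names are assumptions made up for this example.

```python
from statistics import mean

# Hypothetical household incomes (in thousands) for one invented neighbourhood.
# Two very different realities...
scenario_equal = [50, 50, 50, 50]
scenario_unequal = [10, 10, 10, 170]

# ...that collapse to exactly the same group-level figure.
avg_equal = mean(scenario_equal)
avg_unequal = mean(scenario_unequal)

print(f"Average, equal scenario:   {avg_equal}")
print(f"Average, unequal scenario: {avg_unequal}")
print(f"Same average? {avg_equal == avg_unequal}")
```

Given only the average, there is no calculation that can tell these two neighbourhoods apart, so any claim about a particular household is a guess rather than a finding.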
Avoiding Ecological Fallacy in Your Journalism
Want to learn more about how to use data to accurately convey your story? Join data analysis and visualization expert Alberto Cairo (and me!) for our free online workshop Data Exploration and Storytelling: Finding Stories in Data with Exploratory Analysis and Visualization. The program is open to anyone interested in learning more about data analysis and begins January 16.
If you’re struggling to tell a story that engages your audience using your organization’s data, Datassist is here to help. If you’d like some data support writing your stories, get in touch with our team today.