We live in an increasingly data-driven world. Obviously, this is very exciting to me. Data gives us the power to make big changes and to better understand what’s happening around us. But it’s also scary. The more I work with nonprofits, policymakers, and data journalists, the more I encounter victims of the ecological fallacy. I’m starting to think it’s one of the biggest hurdles to good data analysis. I’ve talked about it before, so you may feel like this is a rerun. But whether this is new information or old hat, it’s something we all need to understand to use data correctly.
What is Ecological Fallacy?
Ecological fallacy is a logic error in the interpretation of statistical data where inferences about the nature of individuals are deduced from inference for the group to which those individuals belong.
Thanks, Wikipedia. But in layman’s terms?
Basically, it’s all about trying to answer a different question than the one your data can answer. It’s making assumptions about individuals in a group based on your knowledge of that group as a whole. It can be tempting to assume we understand what’s happening with individuals within a group, but it’s important to resist the urge.
Canadians as a group are known for being more polite than average. An ecological fallacy would be assuming that any single Canadian you meet will be exceptionally polite. (We still have rude people too!)
Because you haven’t measured the precise politeness of each and every Canadian, you can’t know how pleasant your interaction might be with any one individual.
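If it helps to see this with numbers, here is a minimal sketch in Python. The politeness scores are completely made up for illustration (nobody has actually measured this), and the names `canadians` and `everyone_else` are just placeholders.

```python
from statistics import mean

# Made-up politeness scores on a 0-10 scale, purely for illustration.
canadians = [9, 8, 8, 7, 7, 6, 3, 2]
everyone_else = [8, 7, 6, 6, 5, 5, 4, 3]

canada_avg = mean(canadians)
other_avg = mean(everyone_else)

# The group-level claim: Canadians score higher on average.
print("Canadian average:", canada_avg)
print("Everyone else average:", other_avg)

# The individual-level reality: some Canadians still score below
# the other group's average.
rude_canadians = [score for score in canadians if score < other_avg]
print("Canadians below the other group's average:", len(rude_canadians))
```

The group average is higher, yet it tells you nothing about which of these individuals you are going to meet.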
It’s Important to Use Data Correctly
In the example I used above, there isn’t a lot at stake. No one is really harmed if you fall victim to the ecological fallacy. But when we’re making decisions that could affect people’s lives, it’s critical that we use data correctly.
Let’s look at this graph. It displays a link between cigarette smoking and life expectancy — but it’s not the link you’d expect. The data here indicates that people in countries where more cigarettes are smoked have a longer life expectancy. (To be clear, this is real data and not a mistake.)
Does that mean we should all rush out and buy a pack of cigarettes? Four cigarettes a day for an extra ten years is not insignificant. Yet medical science tells us unequivocally that smoking is bad for us. So what’s going on here?
The problem is not with the data, the analysis, or the visualization. The problem is with the title.
The question the title asks is “Is smoking cigarettes good for your health?” But the data we have only tells us the average life expectancy and average cigarette consumption of different countries. The title asks a question about individuals, and the answer to that question doesn’t exist in country-level data.
To be clear, if we look at data on individuals, we see that medical science is right. Smoking tends to decrease life expectancy.
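To see how both things can be true at once, here is a small sketch using invented numbers, not the data behind the chart. It assumes four hypothetical countries where wealthier places have both a higher baseline life expectancy and more smokers, and where every smoker loses a fixed number of years (`SMOKING_PENALTY`, also an assumption). All names and values are made up for illustration.

```python
from statistics import mean

# Hypothetical countries: (name, life expectancy of a non-smoker,
# smokers per 100 residents). Wealthier countries get a higher
# baseline AND more smokers. None of this is real data.
COUNTRIES = [
    ("A", 60.0, 10),
    ("B", 68.0, 20),
    ("C", 76.0, 30),
    ("D", 84.0, 40),
]
SMOKING_PENALTY = 8.0  # assumed years of life lost by each smoker


def residents(baseline, smokers_per_100):
    """Return an (is_smoker, life_expectancy) pair for 100 residents."""
    smokers = [(True, baseline - SMOKING_PENALTY)] * smokers_per_100
    non_smokers = [(False, baseline)] * (100 - smokers_per_100)
    return smokers + non_smokers


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


people = {name: residents(base, n) for name, base, n in COUNTRIES}

# Group level: one point per country, like the chart.
smoking_rate = [n / 100 for _, _, n in COUNTRIES]
avg_life = [mean(life for _, life in people[name]) for name, _, _ in COUNTRIES]
group_corr = pearson(smoking_rate, avg_life)

# Individual level: within each country, compare smokers to non-smokers.
gaps = {}
for name, rows in people.items():
    smoker_life = mean(life for is_smoker, life in rows if is_smoker)
    non_smoker_life = mean(life for is_smoker, life in rows if not is_smoker)
    gaps[name] = smoker_life - non_smoker_life

print("Country-level correlation:", round(group_corr, 3))
print("Smoker minus non-smoker life expectancy:", gaps)
```

In this toy world, the country-level correlation between smoking and life expectancy is strongly positive, even though every single smoker lives fewer years than their non-smoking neighbours. The country averages are not wrong. They just answer a question about countries, not about people.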
Tell Stories in the Right Direction
Trying to tell a story about an individual based on data about a group simply doesn’t work. We can take data on a number of individuals together to talk about the group as a whole, but we can’t reliably go in the other direction. Once the data has been averaged, the details about each individual are gone.
Do you need help telling your data story? Want to ensure you can use data correctly and avoid traps like ecological fallacy? Our experts are always here to help. Whether you’re a journalist, government agency, or social sector organization, Datassist is ready to support you. Get in touch with us today.