
We live in an increasingly data-driven world. Obviously, this is very exciting to me. Data gives us the power to make big changes and to better understand what’s happening around us. But it’s also scary. The more I work with nonprofits, policymakers, and data journalists, the more I encounter victims of the ecological fallacy. I’m starting to think it’s one of the biggest hurdles to good data analysis. I’ve talked about it before, so this may feel like a rerun. But whether this is new information or old hat, it’s something we all need to understand to use data correctly.

What is Ecological Fallacy?

Ecological fallacy is a logic error in the interpretation of statistical data where inferences about the nature of individuals are deduced from inference for the group to which those individuals belong.

Thanks, Wikipedia. But in layman’s terms?

Basically, it’s all about trying to answer a different question than the one your data can answer. It’s making assumptions about individuals in a group based on your knowledge of that group as a whole. It can be tempting to assume we understand what’s happening with individuals within a group, but it’s important to resist the urge.

Canadians as a group are known for being more polite than average. An ecological fallacy would be assuming that any single Canadian you meet will be exceptionally polite. (We still have rude people too!)

Because you haven’t measured the precise politeness of each and every Canadian, you can’t know how pleasant your interaction might be with any one individual.

You can’t draw conclusions about an individual based on information about a group.

It’s Important to Use Data Correctly

In the example I used above, there isn’t a lot at stake. No one is really harmed if you fall victim to the ecological fallacy. But when we’re making decisions that could affect people’s lives, it’s critical that we use data correctly.

Let’s look at this graph. It displays a link between cigarette smoking and life expectancy — but it’s not the link you’d expect. The data here indicates that people in countries where more cigarettes are smoked have a longer life expectancy. (To be clear, this is real data and not a mistake.)

This chart suggests smoking four cigarettes a day will add 10 years to your life.

Does that mean we should all rush out and buy a pack of cigarettes? Four cigarettes a day for an extra ten years is not insignificant. Yet medical science tells us unequivocally that smoking is bad for us. So what’s going on here?

The problem is not with the data, the analysis, or the visualization. The problem is with the title.

The question the title asks is “Is smoking cigarettes good for your health?” But the data we have only tells us the average life expectancy and average cigarette consumption of different countries. The chart is asking a question about an individual, but that answer doesn’t exist in the data we have.

The problem isn't with the data; it's with the question.

To be clear, if we look at data on individuals, we see that medical science is right. Smoking tends to decrease life expectancy.

We can only tell stories about individuals with data on individuals.
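If you'd like to see how a chart like this can happen, here is a minimal sketch in Python. Everything in it is made up for illustration: the `wealth` variable, the number of countries, and every coefficient are my assumptions, not the real data behind the chart. In this toy world, smoking shortens every individual's life, yet wealthier countries both smoke more and live longer, so the country averages point the wrong way.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_countries=20, people_per_country=500, seed=0):
    """Return a list of countries; each country is a list of (cigarettes, life) pairs.

    All numbers here are invented assumptions for the sake of the example.
    """
    rng = random.Random(seed)
    countries = []
    for _ in range(n_countries):
        wealth = rng.uniform(0, 1)          # hypothetical hidden factor
        baseline_life = 55 + 25 * wealth    # richer countries live longer
        avg_cigs = 1 + 5 * wealth           # richer countries also smoke more
        people = []
        for _ in range(people_per_country):
            cigs = max(0.0, rng.gauss(avg_cigs, 2.0))
            # For each individual, every daily cigarette costs one year of life.
            life = baseline_life - 1.0 * cigs + rng.gauss(0, 5)
            people.append((cigs, life))
        countries.append(people)
    return countries


def between_country_correlation(countries):
    """Correlation of country averages: the only thing the chart can show."""
    avg_cigs = [mean(c for c, _ in people) for people in countries]
    avg_life = [mean(l for _, l in people) for people in countries]
    return pearson(avg_cigs, avg_life)


def mean_within_country_correlation(countries):
    """Average of the individual-level correlations inside each country."""
    return mean(
        pearson([c for c, _ in people], [l for _, l in people])
        for people in countries
    )


if __name__ == "__main__":
    countries = simulate()
    print("country averages:", round(between_country_correlation(countries), 2))
    print("individuals:     ", round(mean_within_country_correlation(countries), 2))
```

The first number comes out positive and the second negative. Same data, opposite stories, depending on whether you ask about countries or about people.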

Tell Stories in the Right Direction

Trying to tell a story about an individual based on data about a group simply doesn’t work. While we can take data on a number of individuals together to talk about the group as a whole, it’s impossible to go in the other direction.
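A tiny sketch of why the trip only works one way. The two groups below are hypothetical, invented purely for illustration. Going from individuals to a group average is simple arithmetic, but very different sets of individuals can produce exactly the same average, so the average alone can't tell you who is in the group.

```python
from statistics import mean

# Hypothetical daily cigarette counts for two made-up groups of five people.
group_a = [4, 4, 4, 4, 4]    # everyone smokes a little
group_b = [0, 0, 0, 0, 20]   # four non-smokers and one heavy smoker


def summarize(individuals):
    """Individuals -> group: always possible, always one answer."""
    return mean(individuals)


if __name__ == "__main__":
    # Both groups report the same average, even though no one in
    # group_b actually smokes the "average" number of cigarettes.
    print(summarize(group_a), summarize(group_b))
```

If all you have is the average, you have no way of knowing whether you're looking at the first group or the second.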

Do you need help telling your data story? Want to ensure you can use data correctly and avoid traps like ecological fallacy? Our experts are always here to help. Whether you’re a journalist, government agency, or social sector organization, Datassist is ready to support you. Get in touch with us today.

