Have you been lying with your data?

Of course not. Probably. Maybe? Well, not on purpose.

In reality, there’s a decent chance you have been, no matter how upstanding your intentions. Unfortunately, one of the most common ways of thinking about prediction can lead to accidental lying. Since the road to hell is paved with good intentions, it’s important to stop and ask yourself just how honest your data really is. You need to examine your statistical reasoning.

We all like to think we can predict what will happen if we do certain things. We predict how our health will be affected if we adopt certain diet or exercise regimes. We predict how our finances will fare based on savings or investment plans. And as nonprofits, we like to predict what will happen to the people involved in our projects: will their life expectancy improve? Will they be more successful? Healthier?

Predicting with data is common, and there’s nothing wrong with doing it. After all, an educated prediction based on carefully collected data is a lot better than randomly guessing at the potential effects of your efforts. But how you make those predictions determines whether they are honest. Enter statistical reasoning.

What is Statistical Reasoning?

Statistical reasoning is the way we apply concepts and ideas to make sense of the data in front of us. Sometimes it involves connecting multiple concepts; sometimes it may combine ideas about data and chance. The important part of statistical reasoning is that we understand the processes we’re using, so we can be sure we’re interpreting our results without being tripped up by common mistakes or fallacies.

The human brain loves patterns, and data can be full of them. Statistical reasoning is about understanding how to correctly interpret the trends your mind sees when you look at a set of numbers.

Predicting Trends in Longevity

To see how your data can make you a liar if you don’t apply statistical reasoning, let’s look at a simple example.

We’ve collected some real data from a reliable source on life expectancies in different countries. We’ve cleaned our data and created a data biography, so we can be certain the data was collected and recorded in a consistent, accurate way.

[Figure: We see trends in this data, but can we make predictions based on those trends?]

So when we look at how trends in longevity changed between 1998 and 2000 in different countries, we’re safe to go ahead and predict how those trends will continue, right?

If we take this clean, accurate, reliable data and use the numbers to project into the next few years — as is frequently done in social research — we get the following trends:

[Figure: How accurate are the predictions we make based on trends in data?]

So, using this reliable data, we can assume that, for a longer life expectancy, we should all move to Bosnia and Herzegovina. Right?

This example is, of course, using numbers from more than a decade ago, so we already know that conclusion is inaccurate. Otherwise, I’d be writing this in Sarajevo. Want to see what actually happened to the life expectancy trends in those countries?

[Figure: The reality looks nothing like our predictions, which is why we need statistical reasoning.]

While the solid line shows our prediction, the dotted line indicates what actually happened to longevity trends in those four countries. As you can clearly see, while the life expectancy in Bosnia and Herzegovina did end up higher in 2005 than it was in 1998, it did not come close to the skyrocketing numbers we predicted with our data. So what happened?

Statistics ≠ Calculations

The problem with making predictions using data is that we can’t assume numbers going in one direction will keep going in that direction, or that change will keep occurring at the same rate. The trends we saw in the data above from 1998 to 2000 were very real, but a real trend is not the same as a guarantee that trend will continue.
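To make that concrete, here is a minimal Python sketch of a straight-line projection. The numbers are made up for illustration and are not the life expectancy figures from the charts in this post; the function names `fit_line` and `project` are also just illustrative. The point is only that a fitted line will happily keep climbing forever, whether or not reality cooperates.

```python
def fit_line(years, values):
    """Ordinary least-squares slope and intercept for a short series."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(years, values, target_year):
    """Extend the fitted line to target_year, assuming the trend never changes."""
    slope, intercept = fit_line(years, values)
    return intercept + slope * target_year


# Hypothetical life expectancies (in years), NOT the data from the article.
years = [1998, 1999, 2000]
values = [70.0, 71.0, 72.0]

for target in (2005, 2050, 2100):
    print(target, round(project(years, values, target), 1))
```

With these invented numbers, the line gains one year of life expectancy per calendar year, so it projects 77 for 2005 and 172 for 2100. The arithmetic is correct; the assumption behind it is what fails.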

Statistics do not equal calculations. Statistics equal calculations plus statistical reasoning. That is, you have to put some thought into what you’re seeing, why it’s happening, and how likely it is to continue.

  • Sometimes the best prediction comes from a straight projection or extension of a trend
  • Sometimes it’s better to use the latest value to predict what will happen next
  • Most often, it’s best to use a combination of the two
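
The three options above can be sketched in a few lines of Python. This is one simple way to do it, not the only one: the values are hypothetical, and the `trend_weight` parameter (how much to trust the trend versus the latest value) is an assumption introduced here for illustration, not something prescribed in this post.

```python
def trend_forecast(values, steps_ahead):
    """Straight projection: keep adding the average past change per step."""
    avg_change = (values[-1] - values[0]) / (len(values) - 1)
    return values[-1] + avg_change * steps_ahead


def latest_value_forecast(values, steps_ahead):
    """Assume nothing changes: the next value equals the latest one."""
    return values[-1]


def blended_forecast(values, steps_ahead, trend_weight=0.5):
    """Weighted mix of the two; trend_weight is a hypothetical tuning knob."""
    trend = trend_forecast(values, steps_ahead)
    latest = latest_value_forecast(values, steps_ahead)
    return trend_weight * trend + (1 - trend_weight) * latest


# Hypothetical yearly values for 1998-2000, NOT the data from the article.
values = [70.0, 71.0, 72.0]

print("trend:  ", trend_forecast(values, 5))
print("latest: ", latest_value_forecast(values, 5))
print("blended:", blended_forecast(values, 5))
```

Five steps ahead, the straight projection gives 77.0, the latest value gives 72.0, and an even blend gives 74.5. Choosing the weight is where the statistical reasoning comes in: the less reason you have to believe the trend will continue, the closer the weight should be to zero.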

It’s important to remember that, to make honest predictions using your data, you need to understand (and, as much as possible, reduce) the variation that is hiding in your numbers. Statistics on their own don’t lie: if your data is collected from a reliable source and cleaned appropriately, it’s as honest as can be. We get into trouble when statistics are misused, often because they are misunderstood.

Need Help Keeping Your Data Honest?

At Datassist, we work in partnership with nonprofits, journalists, governments, and social sector organizations to help them tell their data stories accurately and honestly. Communicating your data doesn’t have to be difficult or boring. Let our team help tell your story in a way that will engage and educate your audience. Check out testimonials from just a few of our happy partners, or get in touch with us today.

 
