Fake news! It’s (sadly) a term we’re all familiar with, regardless of where we live or what language we speak. But it isn’t exclusive to social media or American politics. Data stories are just as vulnerable. Do you know how to spot bad data? Data that, intentionally or otherwise, is misleading audiences?
Being able to spot bad data — or rather, careless or biased presentation of data — is an important skill. As a society, we are easily swayed by facts and figures. But we don’t always know how to tell if those facts and figures are telling the whole story.
This post was inspired by Compound Interest’s A Rough Guide to Spotting Bad Science. Many of the points made on their infographic directly relate to what I’m going to talk about. (Want to spread the message? Their graphic is also available in Russian and Spanish.)
So how can you spot bad data when you see it? Misleading stats can be the result of bad data collection, shoddy or incomplete analysis, or poor visualization. Here are a few things to keep in mind.
Where Did This Data Come From?
The first culprit to examine as you try to spot bad data is the collection. Who gathered the stats you're looking at? How did they come by those numbers? The story you're reading could have bias built in from the very beginning.
Questions to ask:
- Who collected this data?
- Is the data source provided?
- If so, does the data come from a reputable source?
- How complete is the data?
- Are some stats missing?
- Was the sample size big enough?
- Was the data gathered in a way that ensures it is representative of the subject?
- Why was this data collected?
- Did the person or organization gathering the information have an agenda?
- Was it collected to support this particular story, or for another reason?
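On the sample-size question, a little math goes a long way. Here's a minimal sketch (the survey numbers are hypothetical) of the standard margin-of-error formula for a proportion, which shows why a poll of 100 people and a poll of 1,000 people deserve very different levels of trust:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n.
    z=1.96 is the critical value for a 95% confidence level;
    p=0.5 is the worst case (widest margin)."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll of 100 people: the margin is almost 10 points.
print(round(margin_of_error(100), 3))   # roughly 0.098

# The same poll with 1,000 respondents: closer to 3 points.
print(round(margin_of_error(1000), 3))  # roughly 0.031
```

So when a story trumpets a "5-point lead" from a sample of 100, the margin of error may be wider than the lead itself. Note this formula also assumes the sample was random and representative, which is exactly what the questions above ask you to check.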
An author can introduce bias into a data story either accidentally or deliberately when the data source isn’t carefully scrutinized. Make sure you understand where the data came from.
How Was This Data Analyzed?
If the source(s) of your data look good, examine how the data was cleaned and analyzed. This is a big one when trying to spot bad data. Even skilled and well-intentioned analysts can fall victim to a number of fallacies.
Take a look at how the data was cleaned (if possible). How were missing values handled? Are outliers included in the analysis? Should they be? Averages and trends can change drastically with the addition or omission of even one or two values.
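To see how fragile an average can be, here's a quick sketch using made-up salary figures. A single outlier more than doubles the mean while barely moving the median:

```python
from statistics import mean, median

# Hypothetical salaries, in thousands of dollars.
salaries = [42, 45, 48, 50, 52]
print(mean(salaries), median(salaries))    # 47.4 48

# Add one executive salary and the "average" jumps dramatically,
# while the median (a more robust measure) barely shifts.
salaries_with_outlier = salaries + [400]
print(round(mean(salaries_with_outlier), 1))  # 106.2
print(median(salaries_with_outlier))          # 49.0
```

This is why a story reporting only the mean deserves a second look: ask which measure was used, and whether outliers were kept or dropped.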
Check for common fallacies. Even experts can sometimes get tripped up by common analytical fallacies.
- False causality fallacy – You’ve heard the saying “correlation doesn’t equal causation,” right? That’s this: assuming one thing causes another just because the two move together.
- Ecological fallacy – Conclusions about groups don’t automatically apply to the individuals in them. Watch for stories that draw conclusions about individuals based on data about groups.
- Prosecutor’s fallacy – Make sure that the question being asked is the same one the data is answering.
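The false causality fallacy is easy to demonstrate in a few lines. In this sketch (the numbers are invented for illustration), monthly ice cream sales and drowning incidents correlate almost perfectly, yet neither causes the other; both are driven by a hidden third variable, warm weather:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical monthly figures, both rising with summer temperatures.
ice_cream_sales = [20, 25, 30, 38, 45]
drownings = [3, 4, 5, 6, 8]

# A correlation near 1.0 -- but ice cream does not cause drowning.
print(round(pearson(ice_cream_sales, drownings), 2))
```

A strong correlation like this tells you the two series move together, nothing more. Before accepting a causal claim, ask whether a confounding variable could explain both.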
Does This Viz Tell an Accurate Story?
It’s easy to think that if the data is good and the analysis has been conducted carefully, the results must be accurate. But you can still spot bad data in visualizations that don’t accurately convey what the data says.
Data viz must reflect the culture it is speaking to. Different cultures interpret graphics differently. Make sure you understand the context of the viz you’re looking at. Who created it? For what audience?
Data relationships are often complex. They can be challenging enough for those of us with a background in statistics to understand — and downright opaque for those without. If the visualization you’re looking at is interactive, play around with it. Are the results clearly and consistently displayed, regardless of how you manipulate the graphic?
Our Experts Can Help Spot Bad Data
Are you concerned that your organization’s data stories might be misleading? Do you need help collecting, analyzing, or visualizing data? The team at Datassist is here to help. Drop us a line to discuss your needs or concerns.