What, in the world of data journalism, makes a good story?
For starters, the story must be interesting. It must be meaningful; it must be informative. Most journalists tick all three of these boxes brilliantly. So what is the roadblock? Where are journalists being tripped up?
Stories are about relationships. Interesting plots convey changing trends and what influences them. Restricting ourselves to speculating on potential relationships (correlation) rather than actual ones… well, it gets really boring, really quickly. It’s very hard to tell a compelling story without causation.
We all know the trite sayings: correlation does not imply causation, or as Nate Silver says, “You don’t light a patch of the Montana brush on fire when you buy a pint of Häagen-Dazs.” Unfortunately, while these rules may work for authors of scientific journal submissions, they don’t offer much aid to journalists trying to help the general public understand the world around them.
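To make the ice cream example concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the variable names, the coefficients, and the assumption that temperature drives both ice cream sales and brush fires while neither affects the other. It shows how a shared cause can produce a sizeable correlation that largely disappears once that cause is accounted for.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical data: temperature is the common cause (the confounder).
temperature = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
fires = [0.5 * t + rng.gauss(0, 5) for t in temperature]

# Raw correlation between two variables that do not affect each other.
raw_r = pearson(ice_cream, fires)

# Correlation after removing the part of each explained by temperature.
adj_r = pearson(residuals(ice_cream, temperature),
                residuals(fires, temperature))

print(f"raw correlation:            {raw_r:.2f}")
print(f"after adjusting for temp.:  {adj_r:.2f}")
```

In this toy setup the raw correlation is clearly positive, while the adjusted one sits near zero. Real data are rarely this tidy: the adjustment only works here because the confounder is known, measured, and linearly related to both outcomes.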
Understanding Causation in Data Relationships
Journalists already understand that correlation isn’t causation. In many cases, they know it better in an applied sense than academics. But what they want, and really need, to understand is how, if ever, they can write about causation. This is an area where we in the statistical profession could make a significant contribution by helping to answer questions such as:
- What types of data and analysis can they accurately write about as causal?
- What are the differences between mediators and confounders?
- Which types of data relationships can be included when building a causal model?
Many vital pieces of journalism have causal questions at their heart.
The question of whether or not to vaccinate, and how it affects the rest of the world if you don’t, is a causal question that is repeatedly raised in the press, as in the Washington Post’s piece “Why a Few Unvaccinated Children are an Even Bigger Threat Than You Think.”
Unfortunately, misinterpreting statistics or miscommunicating conclusions on questions like this is a serious problem, as seen in “‘We Failed’ in Presentation of HPV Vaccine Story, Star Publisher Says.” Dr. Ben Goldacre quickly called out the offending story, but not before it was consumed by members of the public who could be influenced by its conclusions. As he put it:
“Reporting the raw data from an open adverse event reporting system in that manner is simply misleading, and an abuse. Where data is made openly accessible we all have a responsibility to reciprocate, and analyse/report on it competently. You have abused that trust, with a platform so large that you will inflict harm.”
Crime is another subject to which copious column inches involving causal inference are devoted. (See The Atlantic’s “What Caused the Great Crime Decline in the U.S.?”) Political reporting sees journalists play at both predictive and causal inference (“Conventions May Put Obama in Front-Runner’s Position”).
These are all worthy and important topics — which is why it’s critical to tell an accurate story.
A Partnership of Theory and Application
I spent many hours at the NICAR 2017 conference (and other journalism conferences over the years) talking with writers and editors about causal inference from observational data. These conversations are vital; journalism professionals often understand causation in a way that statisticians don’t — and vice versa.
There is much to be learned on both sides of the divide if we in the statistics world can move past repeating the old trite sayings. We must challenge ourselves to help journalists find ways to answer causal questions and develop methods of communicating these answers to broad audiences through stories and visualization.
Are you a data journalist who wants to learn from (or teach) the data experts? At Datassist, we love working in partnership with data journalists, as well as nonprofits and social sector agencies. Drop us a line and let’s talk about how we could work together.