I learned a lot at the recent NICAR conference. But aside from the knowledge and skills I gained while spending time with some of the world’s top data journalists, I also realized just how much of a hunger there is to “bulletproof” data.
It’s exciting to see the appetite so many have to understand data and use it correctly, but over and over, I was asked the same questions:
“How can I make my data bulletproof?”
“How can I be sure everything in my analysis is correct?”
Of course, as a statistical consultant, I had been asked about delivering bulletproof data long before this conference, and I will be asked again! But whether it’s a data journalism story for publication, an evaluation report for a nonprofit, or an article on new methods of using data in the field, my answer is always the same:
I cannot bulletproof your data.
Anyone who says they can is either new to the field or not being honest. Statistical and data analysis is not the practice of removing uncertainty from your work — it is the practice of helping you understand the uncertainty in your work.
Creating a Transparent Data Story
There are three basic steps to creating your data story: finding data, analyzing data, and communicating your results. Each of these stages involves choices on the part of the analyst, and each one of these choices is a little more or less correct, depending on the goal of your work, your worldview, and that of your audience.
There is no such thing as a perfect data story — the closest you can get to bulletproof data is to be completely transparent about the choices you made and why you made them.
Finding Your Data
Let’s go back to the example I used in my previous NICAR post about data biographies. I was working on a story about violence against women, and had to go through each of the three steps: I accessed the data, analyzed it, and communicated results.
Our data came from a trusted source and was presented to look clean and well organized, as data often is. But once we started digging into the details, we found that what was actually measured, and how those measurements were collected, varied a great deal, because the United Nations got its data from a variety of sources with different methods and goals.
Data on real people will always be subjective. It is impossible to find bulletproof data when dealing with human beings. Even decisions on how to ask your questions can affect your results: think of all the different ways there are to ask a woman if she has ever been a victim of violence. Think of all the different kinds of violence against women there could be, and how different people might interpret those incidents in very different ways.
Your data will never be perfect. The best you can do is to be open and honest about where the data came from, who collected it, and why they did. Be sure to read our post on building a data biography — it’s an important part of ensuring your story is an honest one.
Analyzing Your Data
Next comes analysis. The way we analyze data is also almost never objective because it always involves some level of personal opinion and worldview. Statistics is an art as well as a science.
One of the most basic data analyses done around violence against women is the calculation of a rate of occurrence. The math behind calculating a rate is pretty straightforward — in essence, it’s one number divided by another. Your numerator will, in all likelihood, be the number of women who have reported experiencing violence, but…
- Will you count all women, ever?
- Women who reported in the past 12 months?
- Only women in a certain age group?
- Women who are married? Dating?
- How accurate will this number be?
Then comes the denominator, which should include the population that corresponds to the numerator you’ve selected. How accurately can you count this entire population?
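To make those choices concrete, here is a minimal Python sketch. The survey records, field names, and the `rate` helper are all hypothetical, invented purely for illustration; they are not drawn from the United Nations data or any real source. The point is only that each defensible pairing of numerator and denominator produces a different "rate" from the same records.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    age: int
    partnered: bool
    reported_ever: bool
    reported_past_12_months: bool


# Invented records for illustration only; not real survey data.
SURVEY = [
    Respondent(19, False, False, False),
    Respondent(24, True, True, True),
    Respondent(31, True, True, False),
    Respondent(38, True, False, False),
    Respondent(45, False, True, False),
    Respondent(52, True, False, False),
    Respondent(60, True, True, True),
    Respondent(67, False, False, False),
]


def rate(people, in_denominator, in_numerator):
    """Share of the chosen population that meets the numerator definition."""
    population = [p for p in people if in_denominator(p)]
    if not population:
        raise ValueError("denominator is empty")
    cases = [p for p in population if in_numerator(p)]
    return len(cases) / len(population)


# Each entry pairs a denominator rule with a numerator rule.
definitions = {
    "ever, all respondents": (
        lambda p: True, lambda p: p.reported_ever),
    "past 12 months, all respondents": (
        lambda p: True, lambda p: p.reported_past_12_months),
    "ever, ages 15-49": (
        lambda p: 15 <= p.age <= 49, lambda p: p.reported_ever),
    "past 12 months, partnered": (
        lambda p: p.partnered, lambda p: p.reported_past_12_months),
}

rates = {label: rate(SURVEY, d, n) for label, (d, n) in definitions.items()}
for label, value in rates.items():
    print(f"{label}: {value:.1%}")
```

None of these four numbers is wrong. Each answers a slightly different question, which is why the definition behind a rate needs to be reported alongside it.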
Already, this is looking more complicated than a simple equation, and this is only the most basic calculation of rate. If you want to build a statistical model that will estimate the different social determinants of violence — to write a story or build some evidence-based policy, for example — it becomes even more complex.
What variables will you include in your model? If you choose to control for specific variables, you are building a worldview directly into your analysis. For example, if you control for income, you are suggesting that income is a confounding factor. However, if income is a mediating factor, controlling for it strips out part of the very effect you are trying to measure, so it has no place in your model. And here’s the tricky part:
There is no math available to tell you whether a factor is confounding or mediating.
(If all this discussion about variables is confusing, check out our post on data relationships.)
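The confounder-versus-mediator problem can be sketched with a small table of counts. Everything here is hypothetical: the counts, the generic "exposed" flag, the income groups labeled A and B, and the function names are invented for illustration. The sketch computes a crude risk difference and one adjusted for income group by averaging the within-group differences, weighted by group size.

```python
# Hypothetical counts, invented for illustration only.
# Each row: (exposed, income_group, people, cases)
ROWS = [
    (0, "A", 800, 80),
    (0, "B", 200, 80),
    (1, "A", 200, 20),
    (1, "B", 800, 320),
]


def risk(rows):
    """Cases divided by people across the given rows."""
    people = sum(r[2] for r in rows)
    cases = sum(r[3] for r in rows)
    return cases / people


def crude_difference(rows):
    """Risk among the exposed minus risk among the unexposed."""
    exposed = [r for r in rows if r[0] == 1]
    unexposed = [r for r in rows if r[0] == 0]
    return risk(exposed) - risk(unexposed)


def adjusted_difference(rows):
    """Average of within-income-group differences, weighted by group size."""
    total = sum(r[2] for r in rows)
    result = 0.0
    for group in sorted({r[1] for r in rows}):
        stratum = [r for r in rows if r[1] == group]
        weight = sum(r[2] for r in stratum) / total
        result += weight * crude_difference(stratum)
    return result


print(f"crude difference:    {crude_difference(ROWS):.2f}")
print(f"adjusted difference: {adjusted_difference(ROWS):.2f}")
```

With these invented counts the crude difference is 0.18 and the adjusted difference is 0.00. If income group is a confounder, the adjusted figure is the honest one. If income group is a mediator, adjusting has erased a real effect and the crude figure is closer to the truth. The table is identical in both stories, so the arithmetic alone cannot pick between them.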
You can only make this decision using your personal context and worldview, so once again, bulletproof data analysis is impossible. There are endless online discussions (some more civil than others) in which statisticians debate the pros and cons of a plethora of methodologies and analytic strategies. To be clear, there are some methods that are decidedly incorrect and guaranteed to provide misleading results. But while it’s not too challenging to figure out what’s wrong, it’s nearly impossible to figure out what’s right: there is almost never a definitively correct answer.
Communicating Your Results
The final step is to communicate your results, and how you do this will vary widely from audience to audience — obviously, the best way to communicate data to a general audience in a widely-read publication is not the same as the best way to communicate data to busy policymakers. Or amongst peers. Or to industry experts.
As with analysis, while there are a few pretty universally acknowledged ways not to communicate data, there is almost never one correct, bulletproof way to tell your data story. Our good friend and colleague Alberto Cairo has an excellent example on his site of data visualization experts discussing (and disagreeing on) the best way to communicate data in a way that will engage and educate an audience.
The debate started when Stephen Few (another data expert) posted a piece on his blog about a Time Magazine infographic he felt was very poorly done. Cairo, Few, and a number of other data experts and enthusiasts began discussing the merits and flaws of the Time graphic, as well as suggesting replacements.
Even though more than one person in the conversation could be considered an expert in visualization, they still disagreed on which version was best, because there is no perfect answer for how data should be presented.
As Close as You Can Get to Bulletproof Data
Want to tell your data story in a way that’s accurate, educational and engaging? At Datassist, our team of statisticians, data analysts, and visualization experts can help present your data in a way that will impact your audience. Get in touch now to discuss your needs.