This week’s blog post was inspired by Catherine D’Ignazio’s piece on putting data back into context over at DataJournalism.com. We’re big fans of her work and were super-excited to see that our resources on data biographies merited a mention. Putting data in context is incredibly important, and a big part of what we try to teach our partners here. So for that reason, we felt it might be timely to add our own thoughts on the subject. (Although you should absolutely read Catherine’s article!)
Data isn’t Just Data
A common misconception is that data is just raw fact: unbiased, complete, and ready to use. In fact, the word data means “that which is given.” The implication is that data just exists in some neutral space.
And that just isn’t true.
There isn’t some magical, untouched library of data out there. All our data exists because someone collected it. And the very act of collecting it can imbue it with the worldview of those doing the collecting, whether they intend to or not. I posted recently about the importance of understanding your data’s life cycle. Implicit bias can be introduced into your data at a variety of stages, based on:
- Who is funding the collection or analysis
- The motivation of the data collection
- Project design
- The method used to collect the data
- The method used to analyze the data
- Interpretation of the results
- How those results are communicated
Putting data in context doesn’t bulletproof your data. But it can go a long way toward helping you remove (or at the very least, understand) the biases that may be hiding in your data.
Putting Data in Context is Hard
“Establishing and understanding the context of your data is likely one of the single most challenging aspects of doing data journalism. It’s like starting out with the leaves of a tree and then trying to connect them back to their branches and roots.”
But why is it so difficult?
One of the biggest challenges we face when trying to put data in context is a lack of standardization. There are no set rules for how data should be collected, stored, or presented. And until relatively recently, many organizations that were collecting data were doing so for their own internal purposes — meaning they didn’t store the information in a way that made it easy for others to access, understand, or use.
Governments and other social sector organizations are increasingly opening up their data vaults to the public, providing a veritable treasure trove of information that’s already been collected and catalogued. But it’s not without its problems.
So What Do You Do?
D’Ignazio suggests a system she calls the “Three-Step Context Detective” — and it’s a good one. (It’s very similar to the strategies I’ve taught students in courses on crafting data stories.)
Get to Know Your Data
I’m talking get to know it at the most basic level. We’re not into motivations or sources yet — this step is literally opening up your spreadsheet and figuring out exactly what data it holds. In this stage, you should be asking questions like:
- How many rows and columns of data do you have?
- Are you sure you understand what each row is counting?
- Over what time period was this data collected?
- What geographic area does this data cover?
- Is there a lot of obviously missing data?
(Check out a free tool that D’Ignazio and Rahul Bhargava have developed to simplify this process.)
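If you’d rather answer those questions with a few lines of code, here is a minimal sketch using only Python’s standard library. It is not the tool mentioned above. The sample data, the column names (district, survey_date, households), and the profile_rows helper are all made up for illustration, and it assumes dates are stored in YYYY-MM-DD format.

```python
import csv
import io
from datetime import date


def profile_rows(rows, date_column=None, area_column=None):
    """Summarize a list of dict rows: size, missing cells, time span, areas."""
    columns = list(rows[0].keys()) if rows else []
    missing = {col: 0 for col in columns}
    dates = []
    areas = set()
    for row in rows:
        for col in columns:
            value = row.get(col)
            # Count empty or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[col] += 1
        if date_column and (row.get(date_column) or "").strip():
            # Assumes ISO dates (YYYY-MM-DD); other formats need their own parsing.
            dates.append(date.fromisoformat(row[date_column].strip()))
        if area_column and (row.get(area_column) or "").strip():
            areas.add(row[area_column].strip())
    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "missing_by_column": missing,
        "date_range": (min(dates), max(dates)) if dates else None,
        "areas": sorted(areas),
    }


# Hypothetical sample data, standing in for your own spreadsheet export.
SAMPLE_CSV = """district,survey_date,households
North,2019-03-01,120
South,2019-04-15,
East,,95
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = profile_rows(rows, date_column="survey_date", area_column="district")
print(summary)
```

The counts only tell you what is in the file. Whether you understand what each row is actually counting is still a question you have to answer yourself.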
Read Your Data’s User Manual
If you’re lucky, the data you’ve acquired will come with metadata. Metadata is kind of like a data dictionary — or a decoder ring! — in that it will help clarify and explain what your data is measuring, how it was measured, and how to use it.
Unfortunately, not all datasets come with metadata. And those that do don’t always make it obvious where that metadata is stored. Be prepared to put some work into really understanding where your data came from.
Create Your Data Biography
This is a stage I’ve talked about a lot here. (It’s also the section where D’Ignazio highlights some of our work on helping people develop data biographies, including our free downloadable data biography template!)
Imagine you’re a reporter. If you were interviewing someone for a story, you’d want to know their background so you could determine how reliable they are as a source. Treat your data like an interviewee — understanding where it comes from will give you a better handle on how reliable it is.
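One way to keep those interview notes organized is to store them as a small record alongside the dataset. The sketch below is an illustration only, not our downloadable template. The DataBiography class and its field names are hypothetical, borrowed from the list of stages where bias can creep in earlier in this post.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataBiography:
    """Hypothetical record of what you know about a dataset's background."""
    funder: Optional[str] = None             # who funded the collection or analysis
    motivation: Optional[str] = None         # why the data was collected
    project_design: Optional[str] = None     # how the project was set up
    collection_method: Optional[str] = None  # how the data was gathered
    analysis_method: Optional[str] = None    # how the data was analyzed
    interpretation: Optional[str] = None     # how the results were interpreted
    communication: Optional[str] = None      # how the results were shared

    def open_questions(self) -> List[str]:
        """Return the names of the fields you still need to research."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example values, purely for illustration.
bio = DataBiography(
    funder="Hypothetical city health department",
    collection_method="Door-to-door household survey",
)
print(bio.open_questions())
```

Anything that shows up in open_questions is something you would still want to ask your “interviewee” before relying on it as a source.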
Want Help Putting Data in Context?
Catherine D’Ignazio has included links to a massive number of tools and resources in her article that can help you put data in context. (Seriously, go read it.) But if you still find yourself struggling, the team at Datassist is always here to help. Drop us a line to discuss your project now.