Last week we started off 2018 with a New Year’s resolution: to write better data biographies. We covered the who and what questions you need to examine before you start your analysis. This week, we’ll continue working on writing the best data biography possible by examining the how and why of our data.
The questions we examined last week focused on obvious properties of the data itself. It doesn’t take a lot of effort to find out who or what your data includes and excludes.
But to write the best data biography, you need to dig deeper. And that’s what we’ll do this week. By determining how and why your data was collected, you can uncover hidden bias or causes for data trends that would otherwise go unnoticed.
So put on your detective hat and let’s jump in!
The Best Data Biography Asks How
Why does it matter how you collected your data? Data is data, right? Of course, it is… but the methods used to obtain it can disguise some serious bias. And that bias can significantly affect your analysis.
Some important questions you need to ask include:
- How was the data collected? Telephone? Door to door? Online surveys? Using government or program statistics?
- What was the sample size? (Remember, bigger is not always better: a large sample drawn from an unrepresentative group is still biased.)
- Is the data representative of the population you are studying? How randomly was it collected? How comprehensive was collection?
- Has the data been cleaned? Have anomalies or outliers been accounted for or removed?
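To see why a bigger sample is not automatically a better one, here is a minimal Python sketch built entirely on made-up numbers. It imagines a population in which half the people can be reached by phone, and assumes the reachable half has a lower rate of whatever is being measured. The population size, the two rates, the sample sizes, and the function names are all assumptions chosen for illustration, not figures from any real survey.

```python
import random


def make_population(rng, size=100_000):
    """Build a hypothetical population of (reachable_by_phone, has_trait) pairs."""
    population = []
    for i in range(size):
        reachable = i % 2 == 0  # assumption: exactly half can be reached by phone
        # Assumed rates: 15% among reachable people, 35% among everyone else.
        rate = 0.15 if reachable else 0.35
        population.append((reachable, rng.random() < rate))
    return population


def share_with_trait(people):
    """Fraction of the given people who have the trait."""
    return sum(has_trait for _, has_trait in people) / len(people)


rng = random.Random(2018)  # fixed seed so the run is repeatable
population = make_population(rng)
true_rate = share_with_trait(population)

# Large but biased: 10,000 people, all of them reached by phone.
phone_only = [person for person in population if person[0]]
big_biased = rng.sample(phone_only, 10_000)

# Small but random: 500 people drawn from the whole population.
small_random = rng.sample(population, 500)

big_biased_error = abs(share_with_trait(big_biased) - true_rate)
small_random_error = abs(share_with_trait(small_random) - true_rate)

print(f"true rate:             {true_rate:.3f}")
print(f"10,000 by phone only:  off by {big_biased_error:.3f}")
print(f"500 chosen at random:  off by {small_random_error:.3f}")
```

Under these assumptions, the phone-only sample lands far from the true rate despite being twenty times larger, because every additional respondent comes from the same unrepresentative group.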
How your data was collected can affect your results, as we’ll see in the example below. But before we get to that…
The Best Data Biography Asks Why
Does it really matter why the data was collected? Surely the numbers don’t know what the goal of the people recording them is.
It’s true. Numbers don’t know what they’ll be used for. But they can change based on the ultimate goal of the research. Data collected for academic research might differ significantly from political or census data. And both of those might look nothing like data collected to test out a new collection method. It’s important to consider why your data was gathered in the first place:
- Was it to learn or understand something — or to prove an existing theory?
- Were collectors gathering information or testing collection methods?
- Is there a political advantage — or disadvantage — to the numbers skewing in one direction?
- Were collectors studying a subject similar to yours, but not quite the same (for example, with a narrower or broader focus)?
Consciously or unconsciously, our goal for data collection can influence the data itself. Maybe we have a preconceived notion of what the situation really is, and discount contradictory data as outliers. Maybe we stand to gain from demonstrating a certain result, so we nudge numbers in that direction. Perhaps our own inherent biases affect where we collect the data, who we gather it from, or how we ask the questions, because we only see the situation from our own perspective.
How and Why Matter
A while ago, we were working with data from UN Women on a project measuring the progress of women’s rights. We were examining UN data on rates of violence against women in different countries.
The statistics coming out of Malawi immediately drew attention. The figures suggested a significant increase in intimate partner violence against women in a very short period of time (from 2004 to 2005).
The share of women in Malawi who said they had experienced intimate partner violence at some point went from 22.1% in 2004 to 30.1% in 2005, a jump of 8 percentage points. By 2010 it had dropped again, to 21.7%. Was the rate of violence against women in Malawi really jumping around so radically? And if so, why?
We found the answer to our queries by asking how and why:
The data for most years came from Malawi’s National Statistics Office. They had collected it in a consistent, traditional manner. The data from 2005, however, was gathered by a research group using emerging methods. The difference in methods and goals had changed the numbers. By building the best data biography possible, we uncovered the discrepancy. Otherwise, we might have wasted valuable resources hunting for the causes of a spike that came from how the data was measured.
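A check like this can be sketched in a few lines of Python. The example below stores each Malawi estimate alongside a note on who collected it, then flags large year-to-year changes that coincide with a change of collector. The field names, the source labels, and the 5-point threshold are hypothetical, and attributing the 2004 and 2010 figures to the National Statistics Office is an assumption: we only know that most years came from that office.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    year: int
    percent: float  # share of women reporting intimate partner violence
    source: str     # who collected the figure (labels are illustrative)


# Figures from the Malawi example; the 2004 and 2010 sources are assumed.
estimates = [
    Estimate(2004, 22.1, "national statistics office"),
    Estimate(2005, 30.1, "research group"),
    Estimate(2010, 21.7, "national statistics office"),
]


def flag_method_changes(series, threshold=5.0):
    """Return (start_year, end_year, change) for large changes between
    consecutive estimates that also involve a change of source."""
    flags = []
    for prev, curr in zip(series, series[1:]):
        change = round(curr.percent - prev.percent, 1)  # percentage points
        if abs(change) >= threshold and prev.source != curr.source:
            flags.append((prev.year, curr.year, change))
    return flags


flags = flag_method_changes(estimates)
for start, end, change in flags:
    print(f"{start} to {end}: {change:+.1f} points, and the source changed")
```

A flag here does not prove the change is an artifact. It only marks the places where the how and why questions deserve an answer before any trend is reported.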
Keep Your Data Story Honest
Applying a little statistical reasoning and taking the time to create the best data biography you can will help keep your analysis accurate and your stories honest. Want some help telling your data story? The team at Datassist is at your service. Get in touch today to discuss your project.