The beginning of a new year gives us all the opportunity to wipe the slate clean of bad habits and start fresh. Around the office here, everyone is all about self-improvement, in one way or another: exercising more, volunteering at a local charity, achieving a good work-life balance. So what better way to restart blogging than with tips on things you might be trying to improve? Our focus: how to write better data biographies.
We’ve talked about building a data biography before, but it never hurts to dig a little deeper when you’re trying to improve. We’re going to make this a two-part post, so we can really get into the nitty-gritty. This week, we’ll cover the importance of who and what questions. Next week, we’ll follow up with a closer look at how and why.
Let’s get started!
Data Isn’t Always Objective
I’ve talked a lot recently about equity in data science, and why our analysis is rarely as scientific and unbiased as we think. The potential subjectivity of data is why it’s so important to write better data biographies. The more we know about where our data came from, the more likely we are to uncover any biases that might be lurking within.
The source of your data (the who factor) and which data you choose to examine (what) can significantly impact how objective your data is. Your intentions may be good, but that’s not always enough. Even sources that seem trustworthy still have their own worldview embedded in their data. And the data we leave out often says just as much about us as the data we choose to include.
So how do we write better data biographies?
Write Better Data Biographies: Who
The source of your data is an important who to consider when developing your data biography, but it’s not the only one. In order to write better data biographies, we should examine who:
- Collected this data?
- Owns this data?
- Is included in this data?
- Is excluded from this data?
Data collectors and owners are not perfect. They’re still human, just like you and me. And like us, they can — consciously or unconsciously — bias their data based on their own worldview. Even very basic math can be affected by assumptions we make without thinking.
Data subjects can also give us some insight into hidden bias in our numbers. Understanding which groups (by gender, age, education level, race, sexual orientation, income level, geographic location… you get my point) are included and which are not can help you identify bias in data. Would including a certain population that was left out drastically change your numbers? Does the inclusion of some people affect your analysis?
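To make that question concrete, here is a minimal Python sketch of the kind of check we have in mind. Everything in it is an assumption for illustration: the records are invented, and the helper names `mean_income` and `group_counts` are hypothetical, not part of any particular tool. It simply compares a summary number with and without one group.

```python
from statistics import mean

# Hypothetical survey records, invented purely for illustration.
records = [
    {"group": "urban", "income": 52000},
    {"group": "urban", "income": 58000},
    {"group": "urban", "income": 61000},
    {"group": "rural", "income": 31000},
    {"group": "rural", "income": 35000},
]


def group_counts(rows):
    """Count how many records each group contributes."""
    counts = {}
    for row in rows:
        counts[row["group"]] = counts.get(row["group"], 0) + 1
    return counts


def mean_income(rows, exclude_group=None):
    """Average income, optionally leaving one group out."""
    kept = [row["income"] for row in rows if row["group"] != exclude_group]
    return mean(kept)


everyone = mean_income(records)                          # all five records
urban_only = mean_income(records, exclude_group="rural")  # rural left out

print(group_counts(records))
print("Everyone:", everyone)
print("Rural excluded:", urban_only)
```

With these made-up numbers, leaving out the rural respondents moves the average from 47,400 to 57,000. Real data will rarely be this tidy, but running the same comparison on your own groups is a quick way to see whose absence would change the story.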
Taking time to consider the who factor in your data goes a long way toward removing hidden bias.
Write Better Data Biographies: What
The next thing you need to consider to write better data biographies is what data you’re using. This step is one that many beginners overlook. After all, you chose your data, so of course you know what you selected. But it’s easy to get caught up in examining the details within the data and forget to back up and look at it as a whole. Ask yourself:
- What data was collected?
- What data was not collected?
Whether you gather data yourself or acquire it elsewhere, this question is crucial. What you include (or exclude) in your data can make a world of difference to your analysis. Obviously, you’re not just looking at all the data in the world. That would be impossible. But how did you select your constraints? What criteria are you using to choose what goes in and what stays out? Are you:
- Only looking at a specific time period or geographic location?
- Confining your studies to specific groups?
- Discounting outliers in your data?
- Missing relevant data on certain times or populations?
None of these factors will invalidate your analysis per se. But they might impact it. When you write a better data biography that details your decisions about what data to include, you improve the transparency of your analysis.
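If you like to keep your notes next to your analysis, one way to do it is a small record that holds the answers to the who and what questions above. This is only a sketch under our own assumptions: the `DataBiography` class, its field names, and the example values are all hypothetical, and a plain text file or spreadsheet would work just as well.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataBiography:
    """Hypothetical record of the who and what questions for one dataset."""
    collected_by: str = ""
    owned_by: str = ""
    who_is_included: str = ""
    who_is_excluded: str = ""
    what_was_collected: str = ""
    what_was_not_collected: str = ""
    # Decisions such as time period, location, groups, or outlier handling.
    selection_criteria: List[str] = field(default_factory=list)

    def unanswered(self):
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example values are invented for illustration only.
bio = DataBiography(
    collected_by="Hypothetical city health department",
    what_was_collected="Clinic visits, 2015 to 2019",
    selection_criteria=["Outliers above the 99th percentile dropped"],
)

print(bio.unanswered())
```

Here `unanswered()` lists the questions we skipped (who owns the data, who is included and excluded, and what was not collected), which makes the gaps in the biography hard to ignore.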
Take Your Story From Average to Awesome
Evidence is valuable, especially if you want to tell a story on a topic that is emotional, political, or optically challenging. When you know how to write a better data biography, you ensure your story is as honest as possible. Want help improving the transparency of your data story? The team at Datassist is here to help. Get in touch with us today.