I’ve talked a lot about the importance of data biographies on this blog. Spending some time to examine our data and where it’s from is important. Are these statistics from a sketchy source? A global authority? The answer can make a huge difference. That’s why a missing data biography — especially one from a source many people will trust — is so troubling.
Through Datassist and partnerships with other organizations, I do a lot of analysis in the area of global domestic violence. I’m always excited to get my hands on new data about attitudes and prevalence, so I can understand emerging trends and see how we as a society are progressing.
The Organization for Economic Co-operation and Development (OECD) recently published its new Social Institutions and Gender Index (SIGI). It includes a breakdown of domestic violence by country, complete with rankings.
But much to my dismay, the OECD has not included a data biography.
Does a Missing Data Biography Matter?
The missing data biography is bad enough. But the OECD has included almost no source data for users at all. This is very worrying.
Why, you ask? Consider how the SIGI is described:
“Through its 180 country profiles, country classifications, unique database and its innovative simulator, the SIGI provides a strong evidence base to effectively address the discriminatory social institutions that hold back progress on gender equality and women’s empowerment and allows policy makers to scope out reform options and assess their likely effects on gender equality in social institutions.”
I don’t want my leaders — or any leaders, for that matter — to make decisions based on questionable data. The missing data biography means we don’t know for sure:
- Where the SIGI numbers came from
- When they were collected
- Who collected the data
- Why it was collected
Details like these are important. They allow us to identify potential bias and incomplete or incongruous data. And they can be found in any respectable dataset’s data biography.
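To make that checklist concrete, here is a minimal sketch of a data biography as a small record, with a helper that reports which of the four questions are still unanswered. The class name, field names, and the example value are my own illustrative assumptions, not anything the OECD publishes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataBiography:
    """Hypothetical record holding the four basics listed above."""
    source: Optional[str] = None          # where the numbers came from
    collected_when: Optional[str] = None  # when they were collected
    collected_by: Optional[str] = None    # who collected the data
    purpose: Optional[str] = None         # why it was collected


def missing_fields(bio: DataBiography) -> list:
    """Return the names of biography fields that are empty or unset."""
    return [f.name for f in fields(bio) if not getattr(bio, f.name)]


# Placeholder example: only the source is known.
bio = DataBiography(source="Example household survey (placeholder)")
print(missing_fields(bio))  # ['collected_when', 'collected_by', 'purpose']
```

A dataset published with nothing but numbers would come back with all four fields missing, which is a quick signal to treat it with caution.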
Mixing and Matching Data
The SIGI covers four domains of gender discrimination:
- Discrimination in the family
- Physical integrity restrictions (includes the prevalence of, attitudes about, and laws regarding violence against women)
- Restrictions on access to productive and financial resources
- Civil liberties restrictions
These are all important ways to measure progress in gender equity. But the data behind them is a combination of statistics from many different sources and years. I tried to determine the source of the country data on attitudes towards domestic violence. This is what I found:
That’s very broad. They’re citing four different sources across twelve years. How can I know which data they use for which specific country?
The source listing for the prevalence data is even worse. (They haven’t listed any sources at all!)
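One way a combined index could avoid this problem is to attach a source and year to every individual value, rather than to the index as a whole. The sketch below is only an illustration under that assumption: the `DataPoint` structure, the `untraceable` helper, and every row are invented placeholders, not SIGI figures or the OECD's actual data model.

```python
from typing import NamedTuple, Optional


class DataPoint(NamedTuple):
    """Hypothetical single value in a combined index, with its provenance."""
    country: str
    indicator: str
    value: float
    source: Optional[str]  # survey or publication the value came from
    year: Optional[int]    # year the value was collected


def untraceable(points):
    """Return the data points that lack a source or a collection year."""
    return [p for p in points if p.source is None or p.year is None]


# Invented placeholder rows; the values are dummies, not real statistics.
points = [
    DataPoint("Country A", "attitudes", 0.0, "Survey X (placeholder)", 2000),
    DataPoint("Country B", "prevalence", 0.0, None, None),
]

for p in untraceable(points):
    print(p.country, p.indicator)  # prints: Country B prevalence
```

With provenance stored per value, the question "which data did they use for this country?" becomes a simple lookup instead of a guessing game.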
Going Down the Rabbit Hole
The OECD has provided a methodology section for the SIGI. It’s helpful to know how they’re using the data to calculate the rankings and the index. But it still tells us nothing about the data they’re using.
I spent a fair bit of time searching for the source of the SIGI data and ended up going in circles. Different pages referred me to different places until I landed back where I started. Eventually, I found a page that told me to look at individual country pages to see details on country-specific data.
Since I’m from Canada, I decided to try that country’s page. Here’s what I found: a nice list of indicator names with no data and no sources.
The OECD aims to promote the economic and social well-being of people everywhere. It’s a noble goal. Measuring and comparing the progress of gender equity in countries around the world is an important step towards that goal. But the missing data biography really damages the value of this index.
Know Where Your Data Comes From
Data biographies are important for a lot of reasons. Even organizations with the best of intentions can draw inaccurate conclusions when they don’t understand where their data is from. For example:
- Knowing who funded data collection can alert you to any potential bias in your dataset
- Missing datasets can hold important clues — you need to know if any data has been left out, and why
- Perspective matters — you need to know who collected your data to determine if their bias can affect your analysis