
I’ve talked a lot about the importance of data biographies on this blog. Spending some time to examine our data and where it’s from is important. Are these statistics from a sketchy source? A global authority? The answer can make a huge difference. That’s why a missing data biography — especially one from a source many people will trust — is so troubling.

Through Datassist and partnerships with other organizations, I do a lot of analysis in the area of global domestic violence. I’m always excited to get my hands on new data about attitudes and prevalence, so I can understand emerging trends and see how we as a society are progressing.

The Organization for Economic Co-operation and Development (OECD) recently published its new Social Institutions and Gender Index (SIGI). It includes a breakdown of domestic violence by country, complete with rankings.

But much to my dismay, the OECD has not included a data biography.

 

Does a Missing Data Biography Matter?

The missing data biography is bad enough. But the OECD has included almost no source data for users at all. This is very worrying.

Why, you ask?

“Through its 180 country profiles, country classifications, unique database and its innovative simulator, the SIGI provides a strong evidence base to effectively address the discriminatory social institutions that hold back progress on gender equality and women’s empowerment and allows policy makers to scope out reform options and assess their likely effects on gender equality in social institutions.”

 

I don’t want my leaders — or any leaders, for that matter — to make decisions based on questionable data. The missing data biography means we don’t know for sure:

  • Where the SIGI numbers came from
  • When they were collected
  • Who collected the data
  • Why it was collected

Details like these are important. They allow us to spot potential bias and data that is incomplete or incongruous. And they can be found in any respectable dataset’s data biography.
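To make the checklist concrete, here is a minimal Python sketch of a data biography record and a check for unanswered questions. The `DataBiography` class, its field names, and the `missing_fields` helper are hypothetical names chosen for illustration; they are not part of the SIGI or of any OECD tool, and the example values are placeholders, not real sources.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataBiography:
    """Hypothetical record of the four questions listed above."""
    source: Optional[str] = None          # where the numbers came from
    collected_when: Optional[str] = None  # when they were collected
    collected_by: Optional[str] = None    # who collected the data
    purpose: Optional[str] = None         # why it was collected


def missing_fields(bio: DataBiography) -> List[str]:
    """Return the names of the questions that are still unanswered."""
    return [f.name for f in fields(bio) if not getattr(bio, f.name)]


# Placeholder example: only the source is documented.
partial = DataBiography(source="Example household survey (placeholder)")
print(missing_fields(partial))
```

A dataset whose biography leaves any of these fields empty deserves extra scrutiny before anyone bases a decision on it.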

 

Mixing and Matching Data

The SIGI covers four domains of gender discrimination:

  • Discrimination in the family
  • Physical integrity restrictions (includes prevalence, attitudes about and laws regarding violence against women)
  • Restrictions on access to productive and financial resources
  • Civil liberties restrictions

These are all important ways to measure progress in gender equity. But the data they’re using is a combination of statistics from many different sources and years. I tried to determine the source of the country data on attitudes towards domestic violence. This is what I found:

Data on attitudes towards domestic violence come from four different organizations, in years ranging from 2005 to 2017.

That’s very broad. They’re citing four different sources across twelve years. How can I know which data they use for which specific country?

The source listing for the prevalence data is even worse. (They haven’t listed any sources at all!)

The OECD has not listed the sources of their data on the prevalence of domestic violence.

 

Going Down the Rabbit Hole

The OECD has provided a methodology section for the SIGI. It’s helpful to know how they’re using the data to calculate the rankings and the index. But it still tells us nothing about the data they’re using.

I spent a fair bit of time searching for the source of the SIGI data. As I tried to find out where the data came from, I ended up going in circles: different pages referred me to different places until I was back where I started. Eventually, I found a page that told me to look at individual country pages for details on country-specific data.

Since I’m from Canada, I decided to try that page. Here’s what I found. A nice list of indicator names with no data and no sources:

Canada’s 2014 Country Profile includes almost no information on the data at all.

The OECD aims to promote the economic and social well-being of people everywhere. It’s a noble goal. Measuring and comparing the progress of gender equity in countries around the world is an important step towards that goal. But the missing data biography really damages the value of this index.

 

Know Where Your Data Comes From

Data biographies are important for a lot of reasons. Even organizations with the best of intentions can draw inaccurate conclusions when they don’t understand where their data is from.

Want to know more about why data biographies are so important? Read up on how to build the best data biography you can, or get in touch with the experts at Datassist.

 
