
By Julia Silge, Datassist Data Scientist

Why would my organization want to use census data to make policy decisions? That data isn’t relevant to us. Oh, but it is…

My children go to school in my local public school district. Our school district sometimes polls the community using a popular online survey tool to inform decisions it needs to make. These web-based survey tools are widely used by many kinds of organizations because they make it easy, quick, and cheap to set up a survey and get results.

Sometime in the past year, I got an email from our school district with a link to an online survey about whether our high schools should shift their starting time later. The school district was motivated by the recent research indicating that teens need more sleep, specifically in the morning.

I was thrilled to see the school district open to making an evidence-based policy decision, but I was less happy when I saw how the online survey questions were framed.

There were no demographic questions on the survey, so the school district did not have information with which to interpret the responses that they received. Specifically, the school district had almost no way to deal with the issue of bias.

What if more men than women answered the survey — or vice versa? Or if community members of one racial or ethnic group answered the survey more than others? And what if an imbalance in responses like this was combined with a difference of opinion? If different groups of people are represented unequally in a survey and have varying opinions on a survey question, then the result we infer from survey responses will be different from the real, true opinion in our community.
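The arithmetic behind this kind of bias can be sketched in a few lines of Python. Every number below is invented purely for illustration (the approval rates, the population shares, and the respondent shares are all assumptions), and `overall_approval` is a hypothetical helper name, not part of any survey tool.

```python
def overall_approval(group_shares, approval_rates):
    """Overall approval is the sum, over groups, of
    (the group's share of the total) x (the approval rate within that group)."""
    return sum(group_shares[g] * approval_rates[g] for g in group_shares)


# Assumed opinions within each group (made-up numbers).
approval_rates = {"women": 0.65, "men": 0.30}

# Assumed makeup of the community versus the people who answered the survey.
population_shares = {"women": 0.50, "men": 0.50}
respondent_shares = {"women": 0.80, "men": 0.20}

true_opinion = overall_approval(population_shares, approval_rates)
raw_survey = overall_approval(respondent_shares, approval_rates)

print(f"True approval in the community: {true_opinion:.1%}")
print(f"Approval among respondents:     {raw_survey:.1%}")
print(f"Bias of the raw survey:         {raw_survey - true_opinion:+.1%}")
```

With these made-up inputs, the raw survey shows a majority in favor (58%) even though less than half of the community (47.5%) actually approves. If either the opinions or the representation were equal across groups, the two numbers would match.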

External Data Can Help Weight Data for Better Decisions

Fortunately, with care in framing survey questions and some statistical weighting tools, we can begin to account for these problems and reach a better understanding of a community’s true opinion.

Using survey weighting and U.S. Census data, we can improve the quality of online survey results. These results are still unlikely to be as reliable as those from a carefully designed scientific survey, but this approach can make your online survey results less biased and more accurate. A full-fledged scientific survey may be beyond the resources available to you, but doing a better job with the resources you have is not!

Let’s look at my example about a survey question from a school district and see how this can work:

  • Imagine that a school district in San Diego County wants to survey its community members to learn about whether people would approve moving high school start times later.
  • The school district sends out an email with the survey questions, asking not only the start time question but also demographic questions about sex and race/ethnicity.
  • The school district can now compare the demographics of the survey respondents to the demographics of the community.

(The U.S. Census Bureau makes detailed, extensive demographic information freely and publicly available; the American Community Survey, in particular, is a great source for understanding a community.)
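As a rough sketch of that comparison, the snippet below divides each group's share of survey respondents by its share of the community. The respondent counts and population shares are placeholders made up for this example; they are not actual American Community Survey figures for any district, and `representation_ratios` is a hypothetical helper name.

```python
def representation_ratios(respondent_counts, population_shares):
    """Return (share of respondents) / (share of population) for each group.

    A ratio above 1 means the group is over-represented in the survey;
    a ratio below 1 means it is under-represented.
    """
    total = sum(respondent_counts.values())
    return {
        group: (respondent_counts[group] / total) / population_shares[group]
        for group in population_shares
    }


# Placeholder survey counts and population shares (assumptions, not real data).
sex_counts = {"women": 620, "men": 380}
sex_population = {"women": 0.50, "men": 0.50}

race_counts = {"white": 700, "people of color": 300}
race_population = {"white": 0.45, "people of color": 0.55}

sex_ratios = representation_ratios(sex_counts, sex_population)
race_ratios = representation_ratios(race_counts, race_population)

for label, ratios in [("Sex", sex_ratios), ("Race/ethnicity", race_ratios)]:
    for group, ratio in ratios.items():
        print(f"{label:15s} {group:16s} representation ratio: {ratio:.2f}")
```

In practice, the population shares would come from census tables for the district's area, and the counts would come from the demographic questions on the survey.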

Figure: More women responded to the survey than men. The left-hand graph shows the actual population breakdown; the right-hand graph shows the breakdown of survey respondents.

Let’s say that there was an imbalance in the response rate for this survey; more women answered the survey than men.

Figure: White community members responded more frequently than community members of other races or ethnicities. The actual population breakdown is shown on the left, and the breakdown of survey respondents on the right.

Also, white community members responded to the survey at a higher rate than community members of color.

If all groups of people had the same opinions, the kinds of over- and under-representation displayed in these visualizations would not make a difference. In real life, however, different groups of people often both hold different opinions and are represented unequally in a survey; this causes bias in survey results.

In our pretend survey here, let’s say that women are more likely to approve of moving the start time later while men are more likely to disapprove. Also, let’s say that white community members are more likely to approve of moving the start time later while community members of color are more likely to disapprove. Because of these differences in opinion, a survey that is not representative of the population as a whole will give you a result that is different from the true opinion in the population.

Figure: Unweighted results stand out against the weighted results and actual popular opinion. Weighting survey results with census data can help the school board make better policy decisions.

In this pretend survey, because the demographic makeup of the survey respondents is different from the real population and opinion on this issue varies across the groups, the result of the survey (at least the raw result, before any statistical weighting) is different from the true opinion in the population.

In the unweighted survey, it appears that the community approves moving the start time later, with over 60% of the survey respondents choosing the option of a later start time. However, this is because white women were over-represented in this hypothetical survey. If you look at the actual population opinion, the community’s real stance on this survey question was the opposite; more community members are opposed to moving the start time than support moving it.

This is where weighting comes in; we can weight each survey respondent’s answer by how much of the population their group represents relative to how much of the survey it makes up. Respondents who are over-represented in the survey (like white women, in this pretend example) will have their answers weighted down, and respondents who are under-represented in the survey (like people of color and men, in this pretend survey) will have their answers weighted up.
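One simple way to do this weighting can be sketched as follows: give every respondent a weight equal to their group's population share divided by that group's share of respondents, then take a weighted average of the answers. This is a minimal illustration, not necessarily the exact procedure behind the charts in this post. The four demographic cells, the population shares, and the response counts are all invented, and the function names are hypothetical.

```python
from collections import Counter


def group_weights(responses, population_shares):
    """Weight per group = (population share) / (share of survey respondents).

    Groups with no respondents at all get no weight; weighting cannot
    recover opinions from people who never answered.
    """
    counts = Counter(group for group, _ in responses)
    n = len(responses)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_approval(responses, weights):
    """Weighted share of respondents who approve."""
    total = sum(weights[group] for group, _ in responses)
    approving = sum(weights[group] for group, approves in responses if approves)
    return approving / total


# Invented population shares for four demographic cells (assumption).
population_shares = {
    "white women": 0.25, "white men": 0.25,
    "women of color": 0.25, "men of color": 0.25,
}

# Invented survey results: cell -> (number of respondents, number approving).
survey_cells = {
    "white women": (50, 42), "white men": (20, 10),
    "women of color": (20, 8), "men of color": (10, 2),
}

# Expand the cell counts into one (group, approves) record per respondent.
responses = []
for group, (n, n_approve) in survey_cells.items():
    responses += [(group, True)] * n_approve + [(group, False)] * (n - n_approve)

weights = group_weights(responses, population_shares)
unweighted = sum(1 for _, approves in responses if approves) / len(responses)
weighted = weighted_approval(responses, weights)

print(f"Unweighted approval: {unweighted:.1%}")
print(f"Weighted approval:   {weighted:.1%}")
```

With these invented inputs, the unweighted result is 62% approval, while the weighted result is 48.5%. White women, who make up half the respondents but a quarter of the assumed population, get a weight of 0.5, and men of color get a weight of 2.5.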

In effect, we can use what we know about the community’s demographics from U.S. Census data to move from a biased survey result back toward the population’s real opinion, helping us make better policy decisions. In this pretend survey, that means we find that the community actually does not prefer to move the start time.

This hypothetical example is just one way that U.S. Census data might be used by one particular organization. Look for more posts soon as we continue exploring survey weighting, how it works, and how you can use it to make better evidence-based policy decisions.

Make Better Decisions With Your Data

At Datassist, our goal is to help nonprofits and data journalists better understand, analyze, and use their data to create stories that will captivate and inspire their audiences. If you’d like to learn more about how we can help you, get in touch with us today.
