One really common misconception is that the size of your population is closely related to how big a sample you need. Most of the time, the size of your population of interest doesn’t affect your sample size.
It sounds like a controversial statement. But it’s true. More than one of my clients has looked like they wanted to fire me when I said that. Population size is often the first (and sometimes the only) thing clients want to tell me when I help them prepare to take a sample. And intuitively, it makes a lot of sense. So why am I saying now that population size isn’t that important?
Recently, Heather, Alberto Cairo, and Nate Silver had a conversation on Twitter about a study they were having a difficult time believing. The research included many interesting issues, but eventually, the conversation turned to whether or not the sample size was large enough to make the claims being made. Some of the replies we got reminded me that there are still a lot of outstanding questions and confusion about sample size.
How to Determine the Appropriate Sample Size
Let’s say you want to estimate a population percentage to within plus or minus 3% at a 95% confidence level. Talking to about 1,000 people in your random sample is going to do the trick. Whether your population is a school of 2,000 or a country of one million, that sample size is adequate.
If the population size doesn’t matter, what does?
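Here is a rough sketch of the arithmetic behind that claim. It assumes the standard formula for estimating a proportion from a simple random sample, plus the usual finite population correction; the function name and parameters are my own, chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    population=None treats the population as effectively infinite.
    Assumes a simple random sample.
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: only ever lowers the requirement.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


for size in (2_000, 1_000_000, 100_000_000):
    needed = sample_size_for_proportion(0.03, population=size)
    print(f"population {size:>11,}: sample of {needed}")
```

Under these assumptions the required sample never rises above roughly 1,070, no matter how large the population gets. The correction only makes a visible difference when the population is small, and then it reduces the number you need.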
There are a few different characteristics about your population you should know to determine your sample size:
Indicator of Interest
Say we’re examining trends in households linked to the formal banking sector. We’ll need a rough estimate of what proportion of our population is, in fact, linked to the formal banking sector. This is the indicator of interest. The number doesn’t have to be exact; an estimate is fine. (If you can’t find an estimate from a similar study or wisdom in the field, use an estimate of 50% in your sample size calculations.)
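To see why 50% is the safe fallback, here is a small sketch using the same textbook formula for a proportion (an assumption on my part, as are the function name and the 3% margin). The required sample is largest when the estimated proportion is 50%, so using 50% errs on the side of a bigger sample.

```python
import math

Z_95 = 1.96  # z-score commonly used for a 95% confidence level


def n_for_proportion(p, margin=0.03, z=Z_95):
    """Sample size for a proportion p, ignoring population size."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"estimated proportion {p:.0%}: n = {n_for_proportion(p)}")
```

The numbers are symmetric around 50%: an indicator held by 10% of households needs the same sample as one held by 90%.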
Level of Variability
Variability is the range of values the indicator of interest takes across the population. Say we want to study herd sizes, and most people in our population have either one or two cows in their herd. We can say there is low variability in the population. If we explore changes in income level, and annual incomes in our population range from $1,000 to $2.5 million, we have a population with high variability.
The higher the level of variability around our indicator of interest, the larger our sample size needs to be to provide an accurate picture of the population.
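For a continuous indicator such as herd size, a common way to express this is the textbook formula n = (z × standard deviation / margin)². The sketch below assumes that formula and a simple random sample; the standard deviations and the margin are made-up numbers for illustration only.

```python
import math

Z_95 = 1.96  # z-score commonly used for a 95% confidence level


def n_for_mean(std_dev, margin, z=Z_95):
    """Sample size to estimate a mean within +/- margin.

    std_dev and margin must be in the same units (here, cows).
    """
    return math.ceil((z * std_dev / margin) ** 2)


# Hypothetical low-variability population: herd sizes vary by about 0.5 cows.
low = n_for_mean(std_dev=0.5, margin=0.25)
# Hypothetical high-variability population: herd sizes vary by about 5 cows.
high = n_for_mean(std_dev=5, margin=0.25)

print(f"low variability:  n = {low}")
print(f"high variability: n = {high}")
```

Because the standard deviation is squared, ten times the variability calls for roughly a hundred times the sample at the same precision.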
Confidence Level
I wrote a post about confidence intervals not that long ago. The confidence level of survey results is a way of denoting the acceptable error rate. It tells us how likely it is that results from a sample fall within the stated margin of error of the true population value.
To achieve a higher confidence level — a greater certainty that results are typical — you’ll need to increase your sample size.
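As a quick illustration, the sketch below compares three confidence levels for the same plus or minus 3% margin. It again assumes the standard proportion formula with a 50% estimate; the function names are mine.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z-score for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_at_confidence(confidence, margin=0.03, p=0.5):
    """Sample size for a proportion at the given confidence level."""
    return math.ceil(z_score(confidence) ** 2 * p * (1 - p) / margin ** 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: n = {n_at_confidence(level)}")
```

Moving from 95% to 99% confidence raises the requirement by well over half, which is worth weighing against what you plan to do with the results.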
Most surveys use a 95% confidence level. However, this is largely traditional; there’s no scientific basis for that number. Lower confidence levels might be just fine for your survey, depending on the topic and what you plan to do with the results.
Of course, there are other factors to consider that will require a more sophisticated sample size calculation:
- If you want to make estimates about specific subpopulations
- Whether or not certain groups are more likely to be highly similar or diverse
- What types of clusters exist in your population
Ready to Calculate Your Sample Size?
If you have a reasonable idea about the size of the attributes we’ve discussed, you can use this table to get a fair estimate of the sample size you need.
For those of you who would rather calculate your sample size manually, here’s a quick list of steps to guide you through the process. (Check out our post on sample size optimization for more detailed instructions.)
- Define your research objective or primary question. What would you like to measure and what change are you expecting or hoping to see?
- Determine what you’re measuring. This is called the dependent variable. Is it binary (yes/no) or continuous (on a numbered scale)?
- Define your Confidence Interval.
- Determine your Significance Level. This is the probability of concluding there is a trend when none actually exists in your population (a false positive).
- Decide on the necessary level of Power. Statistical power is the probability that your study will detect a trend when one really exists in your population. (Unless you have a strong understanding of what adjustments to make, use the default value of 0.80.)
- Estimate the current and expected levels of key indicators.
- Estimate your response rate and attrition rate. Not everyone you ask to take your survey will respond. And if your survey includes asking questions to the same people over time, some may answer the first round and then stop responding.
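The steps above can be sketched roughly as follows for a binary indicator. This is only one common approach: it assumes you are comparing two independent simple random samples (for example, a baseline and a follow-up group) using the normal-approximation formula for two proportions. The function names, the 40% and 50% indicator levels, and the response and attrition rates are all hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p_current, p_expected, alpha=0.05, power=0.80):
    """Completed responses needed per group to detect a change
    from p_current to p_expected (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # statistical power
    variance = p_current * (1 - p_current) + p_expected * (1 - p_expected)
    effect = (p_current - p_expected) ** 2
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect)


def people_to_invite(n_needed, response_rate, attrition_rate=0.0):
    """Inflate the target to allow for non-response and drop-out."""
    return math.ceil(n_needed / (response_rate * (1 - attrition_rate)))


# Hypothetical: 40% of households are banked now, and we hope to see 50%.
needed = n_per_group(p_current=0.40, p_expected=0.50)
# Hypothetical: 60% respond, and 20% of those drop out before round two.
invited = people_to_invite(needed, response_rate=0.60, attrition_rate=0.20)

print(f"completed responses needed per group: {needed}")
print(f"people to invite per group: {invited}")
```

Notice how much the last step matters: with these made-up rates, you would need to invite roughly twice as many people as the number of completed responses you are aiming for.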
Need help calculating sample size? The team at Datassist is always here to help.