If you’ve ever had to weigh costs against results, you probably know how important it is to consider sample size. The more accurate your sample design and size calculation, the more accurate your results will be. (And sample size optimization can also help you keep costs down. Who doesn’t want that?)

Contrary to popular belief, more data is not always better. Sometimes less data, collected carefully, is a vast improvement over a large sample you don’t know much about. An overly large sample size wastes time and money. In some cases, it can also raise ethical concerns. On the other hand, a sample that’s too small might cause you to miss effects that are really there. Only with the correct sample size do you get the most bang for your buck in terms of accurate data collection and analysis.

The main objective of sample size optimization is to gather the most efficient amount of data you can while maximizing the knowledge it will provide. Here are a few easy steps to make sample size optimization easier.

# Step 1: Define Your Objective

This doesn’t need to be a taxing task. Just write down a couple of sentences on what you want to measure and what change you’re expecting — or hoping — to see.

Describing what you think will happen is a vital piece of the puzzle in sample size optimization. It can take a little time in the beginning. (Many of us are used to letting statistical tests do the thinking for us.) Here are a few examples of measures you might aim to capture with your survey:

- A *change in an amount*
- A *change in an average*
- A *difference between two groups*
- A *change in the proportion of the population*

The nature of what you need to measure in each of these questions is different. So the sample size calculation for each must be slightly different as well. That’s why spending time on this step is so important.

Often, a project will include a range of sub-research questions. But when you determine sample size, **you must choose one main research question and express it in a concrete, detailed, and specific way**.

# Step 2: Determine Your Dependent Variable

From the objective that you defined in Step One, you can figure out what your dependent variable is. (That is, the variable that is being measured in your survey.) Determine whether it’s **binary** (has a yes/no answer) or **continuous** (is measured on a scale, like 1 to 10).

- If it’s binary, you’re likely going to calculate probabilities from a logistic regression or proportions from a similar analysis. (e.g., 5 out of 10 members of the control group used the new methods, while 8 out of 10 members of the treatment group did.)
- If it’s continuous (e.g., litres of milk produced on a dairy farm), you’ll probably calculate increases, percent change, or amount changed from the baseline between treatment and control groups.
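As a minimal sketch, the two outcome types lead to different quantities. The adoption counts below come from the example above; the milk figures are hypothetical stand-ins:

```python
# A minimal sketch of the two outcome types. The adoption counts come
# from the text's example; the milk figures are hypothetical.

# Binary outcome: compare proportions between groups.
control_used, control_n = 5, 10
treatment_used, treatment_n = 8, 10
difference = treatment_used / treatment_n - control_used / control_n
print(round(difference, 2))  # 0.3

# Continuous outcome: compare means (hypothetical litres per day).
baseline_mean, endline_mean = 4.0, 5.0
percent_change = (endline_mean - baseline_mean) / baseline_mean * 100
print(round(percent_change, 1))  # 25.0
```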

# Step 3: Decide on Your Margin of Error

This is closely related to the **confidence interval** (the margin of error is half the width of that interval). It’s the figure shown on poll results as, for example, plus or minus 2%. The margin of error describes how far your survey results are likely to stray from what’s actually happening in your population. The larger the margin, the less precise the results.

- If you’re ok with a larger margin of error, you can get by with a smaller sample size
- For a smaller margin of error, you’ll need to increase the sample size

There is no “correct” margin of error. The traditional margin of error is 5%, but as long as you report your margin of error with your results, you can go higher or lower.
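To make the trade-off concrete, here is a minimal sketch using Python’s standard library and the usual normal-approximation formula for estimating a proportion (the function name is ours, not a library API):

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    Uses the standard normal-approximation formula n = z^2 * p(1-p) / e^2.
    p = 0.5 is the most conservative (largest-sample) assumption.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

print(sample_size_for_proportion(0.05))  # +/-5% at 95% confidence -> 385
print(sample_size_for_proportion(0.02))  # a tighter +/-2% -> 2401
```

Halving the margin of error roughly quadruples the sample, which is why this decision matters so much for your budget.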

# Step 4: Decide on Your Significance Level

This is also often called the **alpha**. It is the probability of a false positive: concluding that a trend or difference exists in your population when it does not. Researchers traditionally set alpha at 0.05 (equivalently, a 95% confidence level), which declares results significant when the p-value falls below 0.05.

A **p-value** is a probability on a scale of 0 to 1. It measures how extreme a statistic from your particular sample would be if no real effect existed. You can move the alpha threshold up or down as needed, as long as you say what you’ve done in your reporting.

- A smaller alpha (a stricter p-value threshold) makes it more difficult to achieve “significance” and makes the required sample size larger.
- A larger alpha (a looser p-value threshold) makes it easier to achieve “significance” and makes the required sample size smaller.

# Step 5: Decide on the Necessary Level of Power

One of the key reasons for sample size optimization is to determine the **statistical power** you need. In brief, statistical power is the probability that your study will detect an effect when one truly exists in your population.

Put another way, the power of a test reflects how likely your research is to capture what is actually happening on the ground. Unless you have a strong understanding of what adjustments to make, it is recommended to use the conventional default value of 0.80.
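As a sketch of how significance and power feed into the calculation, here is the standard normal-approximation formula for comparing the means of two groups (the exact t-based answer runs slightly higher; the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison of means.

    effect_size is Cohen's d: (mean difference) / standard deviation.
    Normal-approximation formula: n = 2 * ((z_alpha + z_power) / d)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" effect (d = 0.5) at the conventional defaults:
print(n_per_group(0.5))              # 63 per group
# Raising power from 0.80 to 0.90 demands more respondents:
print(n_per_group(0.5, power=0.90))  # 85 per group
```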

# Step 6: Estimate the Current and Expected Level of Key Indicators

- A small effect is difficult to detect and needs a large sample
- A large effect is easy to detect and needs a small sample

For example, in our work on Strengthening the Dairy Value Chain, our main objective was to increase household dairy income by 75% between the beginning and the end of the project. To calculate our sample size, we needed an estimate of milk sales income at the beginning and end of the project. We also needed to estimate the standard deviation for this number.

Sometimes you can get this information from your own knowledge or the people you’re working with. You may also get this information from research in similar areas. If you really have no reliable source of this data, you can estimate it.
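Putting the last few steps together for a dairy-style example, a minimal sketch might look like this (all figures below are hypothetical stand-ins, not the project’s actual numbers):

```python
from math import ceil
from statistics import NormalDist

# Hypothetical estimates standing in for the dairy example in the text:
baseline_income = 200.0   # assumed baseline monthly milk-sales income
expected_income = 350.0   # the targeted 75% increase
std_dev = 300.0           # assumed standard deviation of income

# Standardised effect size (Cohen's d) from the estimates above.
effect_size = (expected_income - baseline_income) / std_dev  # 0.5

z_alpha = NormalDist().inv_cdf(0.975)  # 5% significance, two-sided
z_power = NormalDist().inv_cdf(0.80)   # 80% power
n = ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
print(n)  # 63 per group under these assumed figures
```

Notice that the standard deviation matters as much as the expected change: a noisier indicator shrinks the effect size and inflates the sample.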

# Step 7: Estimate Your Response and Attrition Rates

Not everyone you ask to take your survey will be available or agree to do so. And if your survey asks questions of the same people over time, some of them will not be available at later survey rounds, whether because you lose contact with them, they drop out of your project, or they are simply unavailable.

You must estimate the percentage of your sample who will either not respond or drop out. (A typical loss rate is around 20%.) The average rate can vary a lot, depending on the population. The more mobile and changeable your population, the more likely you are to lose respondents. A realistic estimate is important for sample size optimization. Your calculations won’t be effective if the sample you manage to collect on the ground is much smaller than what you need.
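A simple sketch of the adjustment: divide the required sample by the share of respondents you expect to keep at each stage (the rates and function name below are our assumptions for illustration):

```python
from math import ceil

def adjust_for_loss(n_required, response_rate=0.80, retention_rate=0.80):
    """Inflate a calculated sample size for non-response and attrition.

    Divide by the share you expect to keep at each stage; hypothetical
    default rates of 80% response and 80% retention between rounds.
    """
    return ceil(n_required / (response_rate * retention_rate))

# If the design calls for 385 completed endline interviews:
print(adjust_for_loss(385))                       # 602 people to recruit
# A one-round survey only needs the response-rate adjustment:
print(adjust_for_loss(385, retention_rate=1.0))   # 482
```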

# Ready to Get Started?

These seven factors will dictate sample size optimization for any design. It’s important to consider all these factors together to achieve a balance and ensure your objectives are met.

To make the whole process even easier, we’ve created the Datassist Sample Size Calculator to help you get an idea of the appropriate sample size for your work. (This sample size optimization exercise is largely based on a basic random sampling methodology, with a minimum design effect.)

Planning to use a different methodology and need help? Finding your sample size too big for your budget or staff resources? Talk to the experts at Datassist. We can help.