
Do you know what confidence intervals are?

If you said no, you’re not alone.

If you said yes… how sure are you?

We hear about confidence intervals all the time in discussions of statistics and data analysis. The term has become so ubiquitous that no one wants to ask what it really means. We know the words, but what they mean in this context isn't always clear. To get down to the nitty-gritty: there is a lot of confusion about what a confidence interval actually is.

 

So What Are Confidence Intervals?

In the simplest terms, confidence intervals are used to communicate how strongly we believe in the accuracy of our result. (How confident we are.) A confidence interval is the range (or interval) of values that we believe contains the correct answer.

For example, imagine we’re trying to figure out the typical amount a local household spends on groceries. We don’t have the resources to ask every one of the thousands of households in our district, so instead we conduct a survey and use its results to estimate the typical amount. Obviously, if we asked only two households, the likelihood of estimating accurately would be low. The more households we ask (the larger our sample), the more accurate our estimate tends to be.
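To see how sample size affects accuracy, here is a small simulation sketch. All the numbers are invented for illustration: a hypothetical district of 10,000 households whose true average weekly grocery spend is $100. We survey the district repeatedly at different sample sizes and watch how much the estimates wobble.

```python
import random
import statistics

random.seed(0)
# Hypothetical district: 10,000 households with a true average
# weekly grocery spend of $100 (all figures invented for illustration)
TRUE_MEAN = 100
population = [random.gauss(TRUE_MEAN, 25) for _ in range(10_000)]

# Repeat each survey 200 times and measure how much the estimates vary
spreads = {}
for n in (2, 30, 500):
    estimates = [statistics.mean(random.sample(population, n))
                 for _ in range(200)]
    spreads[n] = statistics.stdev(estimates)
    print(f"sample size {n:3d}: estimates vary by about ±${spreads[n]:.2f}")
```

Running this, the two-household estimates swing wildly, while the 500-household estimates cluster tightly around the true value. That shrinking spread is exactly what a confidence interval is built to describe.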

 

What Confidence Intervals Aren’t

Using the example above, let’s say we estimate that the typical household in our district spends $100 per week on groceries, and we report that estimate with a 95% Confidence Interval.

What does that 95% actually mean?

One of the most common interpretations of that statement is that we are 95% sure we have the correct answer. In other words, that the percentage attached to the Confidence Interval tells us how certain we are that we’re right.

Another popular interpretation of this Confidence Interval is that we are sure the real answer is $100 plus or minus $5. (Basically, that our guess of $100 actually means between $95 and $105.)

Neither of these interpretations is correct.  

Surprised? You’re not alone. As I said, these are two of the most common interpretations of a concept that is really not very widely understood. And both of them sound pretty reasonable. But that doesn’t make them right.

What a 95% Confidence Interval actually means is this: if we repeated the whole process many times, drawing a new sample and building a new interval the same way each time, about 95% of those intervals would contain the correct answer. We are confident in the process we used to get this estimate, but not specifically confident in this particular answer. There is always a (very real) possibility that this particular interval is among the other 5%: the ones that don’t contain the right answer at all.
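The "95% of intervals built this way" idea can be checked directly by simulation. This is a sketch with hypothetical numbers: a true weekly grocery spend of $100, surveys of 50 households each, and the standard normal-approximation interval for a mean. We build 1,000 intervals and count how many capture the truth.

```python
import random
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for the mean
    (normal approximation: mean ± 1.96 standard errors)."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * sem, mean + z * sem

random.seed(1)
TRUE_MEAN = 100   # hypothetical true weekly grocery spend
TRIALS = 1_000
covered = 0
for _ in range(TRIALS):
    # Each trial is a fresh survey of 50 households
    sample = [random.gauss(TRUE_MEAN, 20) for _ in range(50)]
    low, high = mean_ci(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

print(f"{covered / TRIALS:.0%} of the intervals contained the true answer")
```

Roughly 95% of the simulated intervals contain the true value, and roughly 5% miss it entirely. Any single interval we report could be one of the misses; the 95% describes the track record of the procedure, not the reliability of one result.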

 

Why is This So Confusing?

Does this seem really confusing — and not very useful? That’s because it is confusing (although it can be useful). A Confidence Interval is a frequentist concept. (I know, I’m not making it less confusing.) Frequentism is a philosophical interpretation of probability — and most humans don’t think within a frequentist paradigm. (Because we’re hearing more and more questions about this in our work, we’ll be following up with a post on the Bayesian paradigm soon.)

Don’t feel bad if you’re still struggling to understand this. It’s not an easy concept to grasp. But understanding the basics and recognizing that you didn’t know as much as you thought you did are great first steps. Want to learn more about Confidence Intervals? Need help determining if you’re using them correctly? We’re here to help. Drop us a line today.
