It’s all Greek to me.
Most of you have probably heard the expression before and understand its meaning. But did you know that other languages have similar phrases, each naming a different language perceived to be difficult? The Dutch equivalent is “It’s all Chinese to me,” while the French say that incomprehensible text must be Hebrew. Greeks might ask if you are speaking Turkish. The Portuguese may say you’re using Japanese or Arabic to confuse them.
Statistical terms are, in a way, a language unto themselves. Statistical reports and research studies are full of these phrases. If you understand them, they convey huge amounts of meaning and, if you don’t, they might as well be in a foreign tongue.
Trickier still is that not all research studies are the same. A growing number of data journalists and communications experts rely on high-end research studies to support their stories. (This is fantastic.) But when you quote or include a researcher’s results in your story, you are implicitly adopting the author’s viewpoint and methodological choices.
What does this mean?
It means that it’s crucial to really understand the research you are citing. You wouldn’t read a report in Spanish and use your recollection of twelfth-grade conversational Spanish to include it in something you were writing, right? (Hola, mi nombre es Heather. Hace sol hoy.)
Fortunately, statistical terms are not actually a foreign language. It’s not always easy to dive into a new field. But with a bit of effort, you can become familiar with common statistical terms and use them confidently (and correctly).
An Introduction to Statistical Terms
I’ve already covered a few important statistical terms and concepts in previous posts. (I’m not going to make you listen to me repeat myself about them.) In case you’ve forgotten or missed these posts the first time around, make sure you understand:
- Ecological fallacy – Also known as “why data is not a two-way street”
- Percent change – And why it’s misleading most of the time
- Moderators, mediators, and confounders – Or the secrets to understanding data relationships
- Prediction vs. causality – The dark side of data
- Statistical reasoning – The best way to keep your data honest
Now that I’ve told you what I’m not going to talk about, let’s move on to statistical terms that I do want to explain a bit. The list of terms I’ll cover here is not comprehensive – for a complete list, I highly recommend Journalist’s Resource’s piece on Statistical Terms Used in Research Studies.
The Population Being Studied
The first step in understanding the statistical terms in a study or report you’re referencing is to know who the data refers to.
The sample is pretty straightforward – when you sample, say, wedding cake, you are served a small piece of a larger cake for tasting purposes. A statistical sample is a smaller portion of a population that is studied with the goal of better understanding the population as a whole.
Samples might be random or stratified. Random sampling is like tasting pieces of cake picked at random from everything in the shop, so every cake has the same chance of landing on your plate. A stratified sample first divides the population into groups that share a characteristic, then samples from each group – like sorting the cakes by flavor or filling and tasting a few from every group, so none gets missed. (In statistical terms, for example, samples might be stratified by gender, age, or ethnicity.)
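If it helps to see the two approaches side by side, here is a minimal sketch in Python. The bakery data, the function names, and the 10% sampling fraction are all made up for illustration; real survey designs involve more care than this.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=0):
    """Pick n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, key, fraction, seed=0):
    """Group items by key(item), then randomly sample the same fraction of each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # take at least one item per group
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical bakery: 60 chocolate, 30 vanilla, and 10 lemon slices.
cakes = (
    [("chocolate", i) for i in range(60)]
    + [("vanilla", i) for i in range(30)]
    + [("lemon", i) for i in range(10)]
)

random_pick = simple_random_sample(cakes, 10)
stratified_pick = stratified_sample(cakes, key=lambda cake: cake[0], fraction=0.1)
```

The random pick might, by chance, contain no lemon at all. The stratified pick always has 6 chocolate, 3 vanilla, and 1 lemon slice, mirroring the mix in the shop.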
Applying the results of a sample to an entire population is known as generalization. To do this, the sample must be truly representative of the population.
When you use results from your sample to make generalizations about a population, it’s important to consider sampling variation. Every population varies, so two samples drawn from it will give slightly different results, even if both are completely random. To account for this, you should report a margin of error (which is not what you think it is) with your results. (The margin of error gives a general sense of how far a result from the survey sample is likely to be from the true value for the entire population being studied.)
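To make the margin of error a little less abstract, here is a rough sketch of the textbook formula for a survey percentage. It assumes a simple random sample and uses the normal approximation; the poll numbers (52% of 1,000 respondents) and the function name are invented for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p_hat from n respondents."""
    # z is the number of standard errors that covers the chosen confidence level
    # (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error

# Hypothetical poll: 52% of 1,000 respondents chose option A.
moe = margin_of_error(0.52, 1000)
print(f"52% plus or minus {moe * 100:.1f} percentage points")
```

For this made-up poll the margin works out to roughly 3 percentage points, so a reported 52% is consistent with a true value anywhere from about 49% to 55%.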
As a rule, the larger a properly drawn sample, the more likely it is to be representative of the population. With larger samples, the margin of error tends to shrink for a given confidence level (how sure you want to be that the true value falls within that margin). Of course, how big your sample should be depends on a number of factors; download our handy Sample Size Calculator to get help determining the most effective sample size for your study.
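For a feel for how sample size and margin of error trade off, here is a back-of-the-envelope sketch. It is not the calculator mentioned above, just the standard formula for a proportion, assuming a simple random sample from a large population and the most cautious guess (p = 0.5) about how opinions split. The function name is my own.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Sample sizes needed for margins of 10, 5, and 3 percentage points at 95% confidence.
sizes = {m: required_sample_size(m) for m in (0.10, 0.05, 0.03)}
for margin, n in sizes.items():
    print(f"plus or minus {margin * 100:.0f} points: about {n} respondents")
```

Under these assumptions you need about 97 respondents for a 10-point margin, 385 for 5 points, and 1,068 for 3 points. Halving the margin takes roughly four times the sample, which is why very precise surveys get expensive fast.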
Want some more comprehensive instruction on statistical terms and how to understand and use them correctly? I’ve got the solution you’ve been looking for! Sign up now for the course I’m leading with the Knight Center on Crafting Data Stories. (This course will be a little different than the MOOC I ran earlier this year with Alberto Cairo, so make sure you check it out early to avoid disappointment.)
If learning the language of data isn’t in the cards for you right now, let the team of experts at Datassist help. Get in touch today to discuss your project.