Relationships are complicated. Scratch that; relationships are very complicated. The number of relationship options I have to choose from on everything from my tax return to my social media status sometimes makes me feel like things are getting a little out of control. Facebook has simplified the choice by giving us all the obvious answer: it’s complicated. What relationship isn’t? So isn’t it nice to escape on occasion to the world of numbers, where data relationships are straightforward and nuance is left at the door?
I have bad news. (I’m sorry.)
When dealing with statistics and data, relationships between your variables can be surprisingly complex. As with your Facebook relationship status, it’s complicated. Rarely is the relationship between the numbers on your charts as simple as drawing a straight line from A to B. Are your variables moderating, mediating, or confounding?
As part of the follow-up to our recently concluded online Data Journalism and Storytelling course, I’d like to help you examine a few of the complicated data relationships you may encounter. As with the ecological fallacy covered in last week’s post, understanding these data relationships is crucial to data journalism and something I hope all our students will take away from the course.
As a quick refresh for those who participated in our course, or some insight for those who didn’t, here is the video from Week 5 where I discussed data relationships:
Data Relationship Type 1: Moderators
A moderator is a variable that can affect the strength, or even the direction, of the relationship between two data sets you’re examining: it can moderate the relationship. Using our video’s sunscreen-cancer example, skin pigment or skin type can be a moderating variable in the relationship between sunscreen and cancer. While people with one skin type might have an increased risk of cancer when using sunscreen, people with another skin type might not see their risk increase at all.
Remember, moderators never explain a relationship — but they can change it. Boris Blumberg illustrates it well using a different analogy:
Consider the relationship between the sight of a well-grilled juicy steak and feelings of joy. For many, there appears to be an obviously positive relationship. But what about for vegans? Or Hindus? The relationship between the steak and joy will appear negative for those groups. Belonging to a group that doesn’t eat red meat is a variable that moderates the relationship between the sight of steak and the feeling of joy.
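If you like to see this sort of thing in numbers, here is a minimal Python sketch of the steak example. Everything in it is made up for illustration: the data is simulated, the slopes of +1 and -1 are assumed effect sizes, and the names simulate_group and pearson are my own helpers, not part of any library. The point is only that the same relationship can be measured separately in each group, and that lumping the groups together can hide it.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_group(slope, n, rng):
    """Synthetic (steak exposure, joy) pairs. `slope` is an assumed effect size."""
    exposure = [rng.uniform(0, 10) for _ in range(n)]
    joy = [slope * x + rng.gauss(0, 2) for x in exposure]
    return exposure, joy


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# The moderator is group membership: it changes the slope of the relationship.
meat_x, meat_y = simulate_group(slope=1.0, n=500, rng=rng)
veg_x, veg_y = simulate_group(slope=-1.0, n=500, rng=rng)

r_meat = pearson(meat_x, meat_y)
r_veg = pearson(veg_x, veg_y)
r_pooled = pearson(meat_x + veg_x, meat_y + veg_y)

print(f"meat eaters:     r = {r_meat:+.2f}")
print(f"non-meat eaters: r = {r_veg:+.2f}")
print(f"everyone pooled: r = {r_pooled:+.2f}")
```

In this simulation the two groups show strong correlations of opposite sign, while the pooled correlation sits close to zero. That is the practical warning a moderator carries: a chart of everyone together can look like "no relationship" when there are really two opposite ones.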
Data Relationship Type 2: Mediators
Unlike moderators, mediating variables do not change data relationships, but can instead shed some light on the details of the connection between data sets. In the video, we considered that perhaps harmful chemicals in certain brands of sunscreen were carcinogenic — a fact that would give us more insight into the relationship between sunscreen and cancer. Mediators help explain why the relationship we have observed is occurring.
Let’s look at another example:
Mark James Kelson cites an example where a relationship is identified between women living in poverty and children with low birth weights. A positive relationship seems apparent (poorer women have smaller babies), but how does that happen? The connection between biology and finances doesn’t seem clear. The mediating variable here is nutrition and care: women living in poverty are less likely to be well-nourished or to have access to the healthcare and vitamins that their wealthier sisters have. The mediator doesn’t change the relationship but gives us more insight into it.
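Here is a rough Python sketch of how a mediator shows up in numbers. The data is simulated and the coefficients are invented purely for illustration; they are not taken from Kelson’s example. I assume a simple chain in which poverty lowers nutrition and nutrition in turn drives birth weight, then compare the raw poverty and birth weight correlation with the partial correlation once nutrition is held fixed. The helper names pearson and partial_corr are my own.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, holding zs fixed."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(7)  # fixed seed so the sketch is repeatable
n = 2000

# Assumed causal chain (standardized, made-up units):
# poverty -> nutrition -> birth weight
poverty = [rng.gauss(0, 1) for _ in range(n)]
nutrition = [-0.8 * p + rng.gauss(0, 0.6) for p in poverty]
birth_weight = [0.9 * m + rng.gauss(0, 0.6) for m in nutrition]

r_total = pearson(poverty, birth_weight)
r_given_nutrition = partial_corr(poverty, birth_weight, nutrition)

print(f"poverty vs birth weight:            r = {r_total:+.2f}")
print(f"same, holding nutrition fixed:      r = {r_given_nutrition:+.2f}")
```

In the simulation, more poverty goes with lower birth weight, and that association all but disappears once nutrition is accounted for. That is what "the mediator explains the relationship" looks like in practice. Real data is rarely this tidy, and a mediator will usually account for only part of the link.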
Data Relationship Type 3: Confounders
A confounding variable is one that, well, confounds your data relationship. It mixes things up, so it’s difficult to distinguish how your datasets are actually related. In the video example, we considered that people with a higher propensity to have cancer might also be more likely to wear sunscreen. In this case, there is no direct relationship between sunscreen and cancer at all, but since we can’t see from our data who already has an increased cancer risk, it’s difficult to tell what the real relationship is. Pre-existing cancer risk is the confounding variable in this case.
There is a classic example of confounding variables which connects murder and ice cream:
Statistics show us that, throughout the year, murder rates and ice cream sales appear to have a positive correlation. So does eating ice cream make you a murderer? Does the news of nearby murders send us into a frenzy at Ben and Jerry’s? In fact, deeper investigation has shown us that when temperatures rise, so do both homicides and ice cream sales. Temperature here is the confounder that leads us to suspect a direct data relationship where none exists.
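To make the ice cream example concrete, here is a small Python sketch with simulated daily data. All of the numbers are invented for illustration; they are not real sales or crime figures. I assume temperature independently pushes up both ice cream sales and homicides, then remove the temperature trend from each series with a simple least-squares fit and correlate what is left over. The helpers pearson and residuals are my own names.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What remains of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(2024)  # fixed seed so the sketch is repeatable
days = 2000

# Assumed structure: temperature drives both series; neither drives the other.
temperature = [rng.uniform(0, 35) for _ in range(days)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 15) for t in temperature]
homicides = [5 + 0.2 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson(ice_cream, homicides)
r_adjusted = pearson(residuals(ice_cream, temperature),
                     residuals(homicides, temperature))

print(f"ice cream vs homicides, raw:                  r = {r_raw:+.2f}")
print(f"ice cream vs homicides, temperature removed:  r = {r_adjusted:+.2f}")
```

The raw correlation in this simulation is clearly positive, and it collapses to roughly zero once temperature is taken out. One caution: the arithmetic alone cannot tell a confounder from a mediator. That distinction comes from knowing which variable causes which, and here it is the weather that sits upstream of both.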
Show Your Data Some Love With Datassist
Not sure if your datasets are married, divorced, FWB or BFFs? Hopefully, this post has given you a little more insight into data relationships and how to determine what’s really happening with your numbers — but if you’re still stumped, the team at Datassist is always here to help. Data analysis and visualization is what we do best, so check out some of our work or get in touch today.