We rely on algorithms more and more. They make decisions for us. Draw connections for us. They determine if we’re eligible for more credit and they tell us what to watch next on Netflix. Most of us interact with algorithms regularly. But how many of us know how algorithms work?
What’s happening inside the system that uses our data to tell us what’s next?
This week, I’d like to walk you through a simplified, generalized example of how algorithms work. It isn’t an exact model, obviously. It’s just an illustration.
How Algorithms Predict Success
Many of the organizations I work with are nonprofits or government agencies hoping to use data to predict how their programs will help people. So let’s start with an example of how algorithms work to predict the likelihood a new person will succeed in our program.
We already have a bunch of data on existing participants. We have a “success score” for them, as well as some data on their characteristics. (Let’s say their age, whether or not they vote, and how they accessed our program.)
We start with an outcome variable. (Sometimes also called the dependent variable or target.) For this example, our outcome variable is the success score — how likely it is our program will help a new participant. We can write this one of two ways:
- Success Score ~
- = Success Score
We can picture it like this:
One of the characteristics we know about existing program participants is their age. Our algorithm works by looking at a person’s age and relating it to their success score. Mathematically, we can write it either of these ways:
- Success Score ~ Age
- Age = Success Score
Visually, each person is represented by a circle placed on the left-right axis based on their age and on the top-bottom axis based on their success score.
So How Do Algorithms Work?
So we’ve built this picture with our existing data. When we encounter a new person, we want to estimate what their success score is likely to be. This is where the algorithm comes in.
We want to use information about people who are close to the new person in important ways to make the best guess about this new person’s success score. But we don’t have any participants who are exactly the same age as the new person. So we use the information of the people nearest to them in age to estimate.
Let’s put the new person in our picture as a pink circle.
There are many ways to decide which information to use here and how to use it. For this example, we’re going to use one of the most common methods, often called k-nearest neighbors. We’ll take the four people closest in age to our new person and average their success scores to get a good idea of the new person’s success score.
In this case, the four people nearest to the new person have success scores of 3, 3, 4, and 5.
(3+3+4+5)/4 = 3.75 (rounded to 4)
When we find the average, we get 3.75, and we round that to 4 since our success scores are all whole numbers.
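The averaging step above can be sketched in a few lines of Python. This is only an illustration: the ages, the new person's age of 34, and the function name `predict_by_age` are made up. Only the four nearest scores (3, 3, 4, and 5) come from the example.

```python
# Hypothetical existing participants as (age, success_score) pairs.
# The ages are invented for illustration; only the four nearest
# scores (3, 3, 4, 5) come from the example in the text.
participants = [
    (20, 9), (24, 7), (31, 4), (33, 3), (36, 3),
    (38, 5), (45, 1), (48, 1), (52, 2),
]

def predict_by_age(new_age, people, k=4):
    """Average the success scores of the k people closest in age."""
    nearest = sorted(people, key=lambda p: abs(p[0] - new_age))[:k]
    scores = [score for _, score in nearest]
    return sum(scores) / len(scores)

estimate = predict_by_age(34, participants)  # assumed age of the new person
print(estimate)         # 3.75
print(round(estimate))  # 4
```

Changing `k` changes how many neighbors feed the estimate, which is one of the choices hidden inside any algorithm of this kind.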
But age isn’t the only information we have about our participants. We also know whether or not they voted in the last election. This can change our model:
- Success Score ~ Age + Voted
- Age + Voted = Success Score
Let’s update our graph to add a yellow plus sign in the circles of everyone who voted. (We know our new person voted in the last election as well.) Our model has changed, so the people we use to estimate the new person’s success score change too. We are no longer looking at age alone, but at age and whether or not they voted.
This means that the four closest people to our new person are not the same four people.
In this new algorithm, the four people nearest to the new person have success scores of 3, 3, 4, and 1.
(3+3+4+1)/4 = 2.75 (rounded to 3)
This new model predicts a Success Score of 3 instead of 4.
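Here is one way the two-variable version might look in Python. The text does not say how age and voting are combined into a single notion of "closeness", so this sketch assumes a simple rule: a mismatch on voting counts the same as a 20-year age gap. That weight, the ages, and the names `distance` and `predict_by_age_and_vote` are all hypothetical.

```python
# Hypothetical participants as (age, voted, success_score).
# Ages and voting flags are invented so that the four nearest
# scores match the example in the text (3, 3, 4, 1).
participants = [
    (20, True, 9), (24, False, 7), (31, True, 4), (33, True, 3), (36, True, 3),
    (38, False, 5), (45, True, 1), (48, True, 1), (52, False, 2),
]

# Assumed weight: a mismatch on "voted" counts like a 20-year age gap.
VOTE_MISMATCH_PENALTY = 20

def distance(person, new_age, new_voted):
    """Age gap, plus a penalty when the voting status differs."""
    age, voted, _ = person
    penalty = 0 if voted == new_voted else VOTE_MISMATCH_PENALTY
    return abs(age - new_age) + penalty

def predict_by_age_and_vote(new_age, new_voted, people, k=4):
    """Average the scores of the k people with the smallest distance."""
    nearest = sorted(people, key=lambda p: distance(p, new_age, new_voted))[:k]
    return sum(p[2] for p in nearest) / len(nearest)

estimate = predict_by_age_and_vote(34, True, participants)
print(estimate)         # 2.75
print(round(estimate))  # 3
```

A different penalty would pull in a different set of neighbors, so the weight given to each characteristic is itself a decision someone has to make.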
Let’s make one more change. We also know how our participants accessed our program — through a mobile device, a laptop, or in person. Our algorithm is getting more complex:
- Success Score ~ Age + Voted + Access Mode
- Age + Voted + Access Mode = Success Score
In our visualization, we add a maroon outline for people who used a mobile device, a blue outline on people who used a laptop, and no outline for people who accessed our program in person.
Do you see how algorithms work to estimate the success score of the new person?
It’s getting complicated. The new person used a laptop to access the program, but the four closest people to them did not. So now we’re looking for an entirely new set of four close people on which to base the new person’s estimated success score.
But there aren’t four other people in our program who both voted (yellow plus) and used a laptop (blue outline). There are only two.
If we use only these two other people, our estimate is:
(1+9)/2 = 5
But this guess is based on very little data. We might not feel confident saying this score is a fair estimate of how the new person will do. So what if we try using the two exact matches and the next two closest people who match in any way?
Now our estimate will be:
(1+9+3+3)/4 = 4
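The "exact matches first, then fill in with the next closest" idea can be sketched as follows. This is one possible reading of the approach, not a definitive implementation: it treats "match in any way" as sharing either the voting status or the access mode, and it ranks those partial matches by age. The data and the name `predict_with_fallback` are hypothetical.

```python
# Hypothetical participants as (age, voted, access_mode, success_score).
# Invented so that exactly two people both voted and used a laptop
# (scores 9 and 1), as in the example in the text.
participants = [
    (20, True, "laptop", 9), (24, False, "mobile", 7),
    (31, True, "in_person", 4), (33, True, "mobile", 3),
    (36, True, "mobile", 3), (38, False, "in_person", 5),
    (45, True, "mobile", 1), (48, True, "laptop", 1),
    (52, False, "mobile", 2),
]

def predict_with_fallback(new_age, new_voted, new_access, people, k=4):
    """Keep all exact matches on voted and access mode, then fill up to k
    with the closest-in-age people who match on at least one of the two."""
    exact = [p for p in people if p[1] == new_voted and p[2] == new_access]
    partial = [p for p in people
               if p not in exact and (p[1] == new_voted or p[2] == new_access)]
    partial.sort(key=lambda p: abs(p[0] - new_age))
    chosen = exact + partial[:max(0, k - len(exact))]
    scores = [p[3] for p in chosen]
    return sum(scores) / len(scores)

# Exact matches only (two people), then exact matches plus two partial matches.
print(predict_with_fallback(34, True, "laptop", participants, k=2))  # 5.0
print(predict_with_fallback(34, True, "laptop", participants, k=4))  # 4.0
```

Notice that the two exact matches have very different scores (1 and 9), which is part of why an estimate built on them alone feels shaky.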
In general, the more relevant data we have, the more accurately we can predict the likelihood that a new participant will be helped by our program.
Want to Know More?
Of course, this is a very simplified demonstration of how algorithms work. We used a few data points on a relatively small number of participants. Computerized algorithms can handle much larger datasets and produce results very quickly. But it’s important for us to understand how those algorithms work — especially when we’re basing potentially life-changing decisions on them.
Want to know more about algorithms and data analysis for nonprofits? The team at Datassist is here to help. Get in touch with us now.