Imagine you’re a social worker. You get hundreds of text messages each hour from teenagers struggling with mental health issues; it’s your job to provide the support they need. If the young person reaching out is having a down day, it’s probably okay for them to wait a few minutes for your attention. However, if they’re seriously contemplating suicide, one minute might be too long to keep them waiting. How do you decide?
Data mining can help.
How Does Data Mining Work?
Data mining is a technique that sees a lot of use in the corporate and technology sectors. But it also has the potential to be very valuable to those of us in the social sector.
At its most basic level, data mining is the process of extracting useful information from your data. It’s often associated with very large, very complex datasets (much less manageable than a simple Excel sheet), but it doesn’t have to be. What it’s really about is finding relationship patterns within your data that aren’t visible to the naked eye.
Data mining begins with looking through your data and identifying and testing relationships, as many as you can find, to figure out which ones occur repeatedly. Once you’ve found frequently occurring relationships, you need to test whether they are statistically significant. Hold on to the relationships that seem powerful, and repeat the process from the beginning many, many times.
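To make that loop concrete, here is a minimal Python sketch of the idea: test every pair of yes/no fields in a small dataset and keep only the pairs whose relationship looks statistically significant. Everything in it is an assumption for illustration. The field names (volunteer, donor, newsletter) and the records are invented, the chi-square test is just one common choice of significance test, and the Bonferroni correction (dividing the threshold by the number of pairs tested) is one simple way to account for testing many relationships at once.

```python
import math
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 degree of freedom) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # a field that never varies cannot show a relationship
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def mine_pairs(records, alpha=0.05):
    """Test every pair of yes/no fields and return the pairs that pass a
    Bonferroni-corrected significance threshold, strongest first."""
    fields = sorted(records[0])
    pairs = list(combinations(fields, 2))
    threshold = alpha / len(pairs)  # stricter cutoff because we run many tests
    found = []
    for x, y in pairs:
        a = sum(1 for r in records if r[x] and r[y])
        b = sum(1 for r in records if r[x] and not r[y])
        c = sum(1 for r in records if not r[x] and r[y])
        d = sum(1 for r in records if not r[x] and not r[y])
        chi2, p_value = chi_square_2x2(a, b, c, d)
        if p_value < threshold:
            found.append((x, y, chi2, p_value))
    return sorted(found, key=lambda item: item[3])


# Invented records: volunteering and donating are linked by construction,
# while newsletter sign-up is unrelated to both.
records = []
for i in range(200):
    volunteer = i % 2 == 0
    donor = volunteer and i % 10 != 0
    newsletter = i % 3 == 0
    records.append({"volunteer": volunteer, "donor": donor, "newsletter": newsletter})

found = mine_pairs(records)
for x, y, chi2, p_value in found:
    print(f"{x} <-> {y}: chi2={chi2:.1f}, p={p_value:.2g}")
```

On this toy data only the donor and volunteer pair survives the test. A real project would involve far more fields and far more pairs, which is exactly why the correction for multiple tests matters.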
Crisis Text Line
Crisis Text Line is a great example of how data mining can be used for good. The team at CTL have the largest open-source database of youth crisis behaviour in the country, and have used data mining to dramatically shorten crisis response times. Where high-risk texts previously waited 120 seconds for a response, Crisis Text Line has shortened that to 39 seconds.
Using data mining, the Crisis Text Line team identified the word “ibuprofen” as sixteen times more likely to predict the need for emergency aid than the word “suicide.” Based on those results, messages containing the word “ibuprofen” are now prioritized in the queue.
How did they do that?
CTL used a data mining program to search through all the information gathered on their incoming messages. It uncovered a strong relationship between the word “ibuprofen” in texts and the texter seriously contemplating suicide. In fact, “ibuprofen” occurred more frequently than “suicide” in messages from people seriously considering ending their lives. The relationship isn’t an obvious one. It probably never would have been found without a data mining program.
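Crisis Text Line’s actual program isn’t described here, so the following Python sketch is only an illustration of the general idea, not their method. It assumes each past message has been labeled as high-risk or not, then computes a “lift” for each word: how much more often messages containing that word turned out to be high-risk, compared with messages overall. The messages, labels, and the `word_lift` function are all invented for the example.

```python
from collections import Counter


def word_lift(messages, min_count=2):
    """messages: list of (text, high_risk) pairs.

    Returns {word: lift}, where lift is the share of high-risk messages among
    those containing the word, divided by the overall high-risk share.
    Words seen in fewer than min_count messages are skipped as too rare to judge.
    """
    high_risk_total = sum(1 for _, high_risk in messages if high_risk)
    if high_risk_total == 0:
        raise ValueError("need at least one high-risk message to compute lift")
    base_rate = high_risk_total / len(messages)

    seen = Counter()
    seen_high_risk = Counter()
    for text, high_risk in messages:
        for word in set(text.lower().split()):  # count each word once per message
            seen[word] += 1
            if high_risk:
                seen_high_risk[word] += 1

    return {
        word: (seen_high_risk[word] / count) / base_rate
        for word, count in seen.items()
        if count >= min_count
    }


# Invented, hand-labeled toy messages (True = needed emergency aid).
messages = [
    ("took ibuprofen tonight", True),
    ("ibuprofen bottle next to me", True),
    ("thinking about suicide", True),
    ("suicide came up in class", False),
    ("read about suicide online", False),
    ("bad day at school", False),
    ("school was bad again", False),
    ("fight with my mom", False),
]

lift = word_lift(messages)
for word, value in sorted(lift.items(), key=lambda item: -item[1]):
    print(f"{word}: {value:.2f}")
```

In this toy data, “ibuprofen” ends up with a higher lift than “suicide” because every message containing it was labeled high-risk. With eight messages that proves nothing, of course. The point is only to show the kind of comparison a data mining program runs across millions of messages.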
Could You Use Data Mining?
Data mining can be a valuable tool for managing funder and donor relationship building, volunteer management, and client satisfaction.
When donor records are combined with social media data, predictive models about who will give (as well as when and how) can be alarmingly accurate. GiveNext.com lets organizations manage all their giving on a single online platform. Users comb through a database, donate to any of a million-plus nonprofits, and receive a tax form in return.
If you want to attempt some data mining, you’ll need some data (obviously!) and a tool. The best way to start is with data you’re already familiar with: data from Google Analytics, your social media trackers, program data, or data on your donors all work.
Once you’ve selected your data, you’ll need to find a data mining tool:
- My favourite tool for data mining is R. It does require some programming skills, but once you’ve learned the basics, it’s incredibly powerful. DataCamp offers a great R tutorial, and RStudio offers lots of free resources too.
- Rattle is another great tool. It can help you get started with both R and data mining. (When computer scientists say “machine learning,” they often mean techniques that overlap heavily with data mining.)
- Excel also has some data mining extensions you can install. You can find them on the Microsoft site here. Steve Fox has published a helpful video on getting started with them.
When You Can’t Use Data Mining
It’s important to remember that, although data mining can be a powerful tool, it’s not always the right one. It looks for significant relationships by testing all the relationships it can find, which means it’s great for predictive analysis but not at all good for causal analysis.
Trying to use data mining to analyze causal relationships can lead you to some wildly inaccurate conclusions. As an example, Google built a data mining tool to help predict flu outbreaks based on Google searches. Oddly, one of the terms that was highly linked to the flu was “high school basketball.” Obviously, there is a hidden variable here: high school basketball season and peak flu season coincide. But data mining on its own can’t see that, and might tell us there is an important connection between the two.
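The hidden-variable problem is easy to reproduce with simulated data. The Python sketch below is purely illustrative and does not use Google’s data: it invents ten years of weekly search counts in which the season drives both flu searches and basketball searches, while the two have no direct link. The numbers (baseline levels, noise, which weeks count as winter) are all assumptions chosen for the demonstration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated weekly search volumes: winter raises BOTH series,
# but the week-to-week noise in each is independent of the other.
weeks = []
for week in range(520):
    week_of_year = week % 52
    winter = week_of_year < 13 or week_of_year >= 39
    flu = (100 if winter else 0) + rng.gauss(0, 10)
    basketball = (80 if winter else 0) + rng.gauss(0, 10)
    weeks.append((winter, flu, basketball))

# Correlation across the whole year: looks like a strong relationship.
overall = pearson([f for _, f, _ in weeks], [b for _, _, b in weeks])

# Correlation within winter only: the season is held fixed, so the link vanishes.
winter_weeks = [(f, b) for is_winter, f, b in weeks if is_winter]
within_winter = pearson([f for f, _ in winter_weeks], [b for _, b in winter_weeks])

print(f"all weeks:   r = {overall:.2f}")
print(f"winter only: r = {within_winter:.2f}")
```

Across all weeks the two series look tightly linked, but once you compare only winter weeks with other winter weeks, the correlation drops to roughly zero. Checking whether a relationship survives inside each group of a suspected hidden variable is one simple sanity test before reading anything causal into a mined pattern.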
Need Expert Data Analysis?
Do you need help leveraging the power of data mining? The experts at Datassist are at your service. We work with journalists, nonprofits, and social sector organizations to collect, analyze and report on data in an engaging, educational way. Get in touch with us today to learn more about how we can help you.