Big Data is a hot topic in the world of analysis right now. It’s been gaining popularity with corporations for a few years, and now it’s seeping into the social sector. I see evaluators, nonprofits, donors, and even governments discuss its pros and cons. Big Data is hot, and it makes big promises — and that makes you want to get in the game. But before you use Big Data, you need to be sure it’s right for you.
But you told us Big Data was great!
Don’t get me wrong. I’m all for Big Data. It can be cheaper and less invasive to collect. It can reveal patterns we didn’t previously recognize. As with all data, there are always some technical issues: how do you protect data owners’ privacy? How do you ensure you’re actually measuring what you intend to measure? But before you use Big Data in the social sector, there are also important analytics issues to address. And they all come down to one critical question:
Are you trying to answer a question of prediction or causality?
Bigger Isn’t Always Better
Big Data is extremely useful when you’re making predictions. The bigger, the better. Want to understand what someone is most likely to buy from your website? Big Data has the answer. Want to determine which people in your database are most likely to contribute to your fundraiser? Big Data can be your crystal ball. Any time you’re trying to make a prediction about the future, you can pretty safely use Big Data.
But often in the social sector, our questions aren’t about prediction. They’re about cause. For example:
- Trying to understand the effects of a specific type of education
- Attempting to determine why something is happening the way it is
- Measuring the impact of your program or service
These are all questions of causality. In these instances, Big Data on its own won’t be much help.
No problem, right?
Mostly no problem. The issue is that Big Data can give you the feeling that you’re doing great analysis. Your sample size is so big and you have so much data, you feel like the conclusions you’re drawing must be meaningful. In fact, there’s a fair chance they aren’t even accurate.
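To see why a big sample can feel more trustworthy than it is, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration: the population score averaging 50, the rule that higher scorers are more likely to contribute data, and the helper names `draw_contributor` and `summarize`. It is a toy simulation, not an analysis of any real dataset.

```python
import math
import random
import statistics

random.seed(42)

TRUE_MEAN = 50.0  # hypothetical true average score in the whole community


def draw_contributor():
    """Keep drawing people until one 'contributes' data.

    Assumption for illustration: people with higher scores are more
    likely to end up in the dataset, so the data is biased by design.
    """
    while True:
        score = random.gauss(TRUE_MEAN, 10.0)
        p_contribute = 1.0 / (1.0 + math.exp(-(score - TRUE_MEAN) / 10.0))
        if random.random() < p_contribute:
            return score


def summarize(n):
    """Return the sample mean and its standard error for n contributors."""
    sample = [draw_contributor() for _ in range(n)]
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, std_err


results = {n: summarize(n) for n in (100, 10_000, 100_000)}
for n, (mean, std_err) in results.items():
    margin = 1.96 * std_err  # approximate 95% margin of error
    print(f"n={n:>7}: estimate={mean:.2f} +/- {margin:.2f} (truth={TRUE_MEAN})")
```

As the sample grows, the margin of error shrinks, but the estimate does not move toward the true value. The result just becomes more confidently wrong.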
Why Can’t I Use Big Data?
You can use Big Data for predictive questions. If you’re looking to the future, off you go — enjoy the power and ease of Big Data analysis. If you’re trying to determine the cause of something… sorry, Big Data is not for you. Do not pass go, do not collect $200.
Big Data isn’t generally great for causal questions because:
- It’s often biased. If you want data that is a good representation of the community you’re studying, you need to collect it in a certain way. Big Data is almost never collected in a way that allows you to learn about a community. When you use Big Data, you usually only learn about the people contributing the data, and these people are often from a select group that doesn’t represent the group you’re focused on.
- Algorithms aren’t always as smart as they sound. Machine learning algorithms can’t tell mediators (variables that sit on the path from cause to effect) from confounders (variables that influence both the cause and the effect). In most predictive models, the best way to get a good answer is to throw as many variables as you can into the mix to get the highest rate of correct prediction. This method is terrible for causal analysis, because controlling for a mediator hides part of the very effect you’re trying to measure. When asking causal questions, there is no way to avoid the need for conceptual understanding and theory. Machines just can’t cut it.
- Sample size doesn’t mean what you think. The “big” part of Big Data can give the (mistaken) impression that its sheer size can eliminate bias. This is entirely false. More data shrinks random error, not bias: a huge amount of biased data will simply produce a confident-looking, incorrect result.
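The point about mediators and confounders can be sketched with a small simulation. This is a toy example under stated assumptions, not a recipe: the two scenarios, the effect sizes, and the helper names `diff_in_means`, `adjusted_diff`, `confounder_rows`, and `mediator_rows` are all hypothetical. In the first scenario a third variable influences both who joins a program and the outcome (a confounder). In the second, the program works entirely through the third variable (a mediator). In both, the true effect of the program is set to 2.

```python
import random
import statistics

random.seed(7)
N = 50_000


def diff_in_means(rows):
    """Naive estimate: average outcome of participants minus non-participants."""
    treated = [y for x, _, y in rows if x == 1]
    control = [y for x, _, y in rows if x == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def adjusted_diff(rows):
    """'Control for' the third variable: compare within each of its levels,
    then average the gaps weighted by the size of each level."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[1] == level]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total


def confounder_rows():
    """Third variable z drives both participation x and outcome y."""
    rows = []
    for _ in range(N):
        z = 1 if random.random() < 0.5 else 0
        x = 1 if random.random() < (0.8 if z else 0.2) else 0
        y = 2.0 * x + 3.0 * z + random.gauss(0, 1)  # true effect of x is 2
        rows.append((x, z, y))
    return rows


def mediator_rows():
    """Participation x changes m, and m alone changes outcome y."""
    rows = []
    for _ in range(N):
        x = 1 if random.random() < 0.5 else 0
        m = 1 if random.random() < (0.9 if x else 0.1) else 0
        y = 2.5 * m + random.gauss(0, 1)  # total effect of x is 2.5 * 0.8 = 2
        rows.append((x, m, y))
    return rows


conf = confounder_rows()
med = mediator_rows()
conf_naive, conf_adjusted = diff_in_means(conf), adjusted_diff(conf)
med_naive, med_adjusted = diff_in_means(med), adjusted_diff(med)

print(f"confounder: naive={conf_naive:.2f}, adjusted={conf_adjusted:.2f}")
print(f"mediator:   naive={med_naive:.2f}, adjusted={med_adjusted:.2f}")
```

With a confounder, adjusting for the third variable is what recovers the true effect, and the naive comparison overstates it. With a mediator, the naive comparison is the right one, and adjusting wipes the effect out. The data looks the same shape in both cases (three columns), so only an understanding of how the program actually works tells you which analysis to trust.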
We Still Love Big Data
Please don’t misunderstand: we’ve been able to use Big Data to solve previously unsolvable problems in the social sector. Datassist has used huge amounts of NASA satellite data to help understand the impact of farming inputs franchises in Kenya. We’re also currently working on a project that uses financial transaction data in the US to help understand the impacts of transit on pathways out of poverty. Sometimes, Big Data is great.
But you need to be sure it’s the right tool for the job.
If you’re part of a social sector organization that wants to use Big Data, it’s critical that you recognize what type of question you’re trying to answer. The vast majority of training and Big Data analysis methods currently available are designed for prediction — and are hopeless at answering causal questions.
If you need help determining whether you can use Big Data for your project, we can help. Get in touch with us today to discuss your needs.