If you were inspired by last week’s post on Big Data projects that are going to change the world, you might be thinking it’s time for your own organization to hop on the Big Data bandwagon. And you might be right… notice I said “might.” Before you dive into this brave new world, it’s important to recognize potential problems with Big Data.
But you said Big Data could help me change the world!
I did, and it can, in some cases. But there are situations where Big Data can do more harm than good. To paraphrase an old saying: just because all you have is a hammer doesn’t mean everything is a nail.
In this post, I’ll outline several Big Data problems that crop up in the social sector, including:
- Privacy and civil liberties concerns
- Hidden bias
- Causation fallacies
- The extractive nature of Big Data
Let’s get started!
Concerns About Privacy and Civil Liberties
The dangers in managing, using, and manipulating Big Data are real and concrete. Big Data can bring efficiency and effectiveness, but if private data falls into the wrong hands it can also lead to discrimination, violence, and persecution of vulnerable groups. Using Big Data to support refugees is one example.
The malevolent online practices of state and non-state actors can reach refugees extraterritorially or in their country of origin. The Harvard Humanitarian Initiative suggests that data leaks can increase the risk of discrimination and harassment due to “racist, ethnic, and economic tensions that position them as an unwelcome monetary and administrative burden on the host state.”
If your organization is considering a Big Data project, ensure you have appropriate privacy safeguards in place to protect the people whose data you are analyzing.
Hidden Bias
Everyone knows that biased data is bad data. But one of the most serious Big Data problems is that it gives users a false sense of security: the belief that the sheer quantity of data will mitigate biases. Because Big Data automatically skews results towards the digitally visible, it often exaggerates biases rather than eliminating them.
It sounds wrong, right?
Think of it this way: Imagine you’re in the Toronto Blue Jays’ dugout, taking a poll on who is the better baseball team — the Blue Jays or the Detroit Tigers. Asking twenty people in the dugout is not a better poll than asking two people in the dugout. In fact, if you could ask ten thousand people in the dugout, it still wouldn’t improve your results. The inherent bias in this situation cannot be overcome, no matter how much data you collect.
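The dugout poll can be simulated in a few lines. This is only a sketch under made-up assumptions: the 50% "true" share among all fans, the 95% share inside the dugout, and the `poll` helper are all hypothetical, chosen purely to illustrate the point.

```python
import random

def poll(sample_size, p_prefers_jays, rng):
    """Return the share of respondents who pick the Blue Jays."""
    votes = sum(rng.random() < p_prefers_jays for _ in range(sample_size))
    return votes / sample_size

rng = random.Random(42)  # fixed seed so the run is repeatable

TRUE_SHARE = 0.50    # assumed share among all baseball fans (hypothetical)
DUGOUT_SHARE = 0.95  # assumed share inside the Blue Jays' dugout (hypothetical)

# Growing the biased sample makes the answer more stable, not more correct.
for n in (2, 20, 10_000):
    result = poll(n, DUGOUT_SHARE, rng)
    print(f"n={n:>6}: dugout poll = {result:.2f}, true share = {TRUE_SHARE:.2f}")

large_biased = poll(10_000, DUGOUT_SHARE, rng)  # huge sample, wrong population
small_fair = poll(100, TRUE_SHARE, rng)         # small sample, right population
print(f"10,000 in the dugout: {large_biased:.2f}")
print(f"100 random fans:      {small_fair:.2f}")
```

Under these assumptions, the ten-thousand-person dugout poll settles near the dugout's own opinion, while a much smaller sample drawn from the right population lands far closer to the true share.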
Causation Fallacies
Most Big Data analysis is based on looking for correlations, a specific type of statistical relationship, between the many trees in the Big Data forest, and on using those correlations to define patterns in results. Another problem with Big Data, however, is that the more variables your analysis includes, the higher the chance of finding a spurious correlation.
It’s a bit like throwing a thousand darts at the side of a barn, drawing a ring around the darts, and then claiming you hit the bullseye. Worse still, the data alone cannot tell you which correlations are real and meaningful and which are spurious. A human with a strong grounding in the concepts and theory might be able to, but Big Data is, almost by definition, too large and complex for most humans to inspect directly. (You can see the dilemma.)
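To see how quickly spurious correlations pile up, here is a minimal sketch that correlates pure random noise with a pure random outcome. Everything in it is an assumption made for illustration: the 50 rows, the 0.3 cutoff for a correlation that "looks" interesting, and the helper names `pearson` and `count_spurious`.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_spurious(n_variables, n_rows=50, threshold=0.3, seed=0):
    """Count noise variables whose |r| with a noise outcome exceeds threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_rows)]  # unrelated by construction
        if abs(pearson(noise, outcome)) > threshold:
            hits += 1
    return hits

for k in (10, 100, 1000):
    print(f"{k:>5} noise variables -> {count_spurious(k)} look 'correlated'")
```

None of these variables has any real link to the outcome, yet the number that clears the cutoff grows as more variables are thrown at the barn.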
Another conceptual problem with Big Data crops up when choosing which variables to include in your model. In most Big Data methods, the computer makes that decision based on the correlations I just talked about. While this is generally OK for models answering predictive questions, it is most definitely not acceptable if you are trying to answer causal questions. For causal analysis, confounders (variables that influence both the cause and the outcome) must be included and mediators (variables that sit on the path from cause to outcome) must be excluded, regardless of the level of correlation, or your results will, quite simply, be incorrect.
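A small simulation shows what leaving out a confounder does. This is a sketch with invented data, not a recipe: `z` stands in for a hypothetical confounder that drives both `x` and `y`, while `x` has no effect on `y` at all. The adjustment step uses the Frisch-Waugh-Lovell residual trick, which gives the same coefficient as a regression of `y` on both `x` and `z`. It covers only the confounder half of the rule; mediators are not modeled here.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def residuals(xs, ys):
    """The part of ys that a straight-line fit on xs does not explain."""
    n = len(xs)
    b = slope(xs, ys)
    a = sum(ys) / n - b * sum(xs) / n
    return [y - (a + b * x) for x, y in zip(xs, ys)]

rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical confounder
x = [zi + rng.gauss(0, 1) for zi in z]      # "cause": driven by z
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # outcome: driven by z only, not by x

naive = slope(x, y)                                 # confounder left out
adjusted = slope(residuals(z, x), residuals(z, y))  # confounder accounted for

print(f"naive effect of x on y:    {naive:.2f}")
print(f"adjusted effect of x on y: {adjusted:.2f}")
```

By construction the true effect of `x` on `y` is zero. The naive slope should land near 1 and the adjusted slope near 0, so a model that picks variables by correlation alone would report an effect that does not exist.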
In contrast to conventional survey data, where definitions and data collection processes are well understood, Big Data is often collected via complex and frequently updated algorithms that many researchers cannot access or inspect. This lack of transparency makes it challenging to know how comparable the data is over time, and the quality and precise definition of the data are often not fully understood.
To make matters worse, the exact nature of the assumptions and choices that algorithms make (on both participants’ and researchers’ behalf) is even less frequently understood, and is often not disclosed at all, hidden behind a proprietary curtain.
What does all this mean? In short, it means that even expert analysts can’t always fully understand Big Data — and how can you make decisions based on assumptions and choices you can’t even see?
Big Data’s Extractive Nature
The final problem with Big Data I want to cover is less of an operational problem and more of an ethical dilemma. Big Data can quickly and easily become an exercise in harvesting, where high-status stakeholders collect data cheaply and remotely from lower-status “beneficiaries” while very little benefit actually reaches those individuals or communities. (In these cases, the “success” of the project is determined outside the community it’s meant to serve.)
“Big Data approaches face structural deficits related to a lack of focus on the end user or beneficiary. Unchecked, there is a danger that this will further exacerbate the levels of mistrust in fragile contexts. Any attempt to enhance M&E with Big Data approaches will therefore require a stronger focus on beneficiary validation through feedback loops that consistently secure beneficiaries’ participation.”
While data can be valuable, it’s critical that social sector organizations using Big Data don’t lose focus on their ultimate goal.
Are You Facing Big Data Problems?
If you’re part of a nonprofit, government agency, or other social sector organization that wants to use Big Data but needs a hand avoiding the pitfalls, we’re here for you. The experts at Datassist can help you determine whether Big Data is right for you and set you on the right path to leverage data and make a difference in the world.
Get in touch with us today!