You have invested a ton of time and money and energy into designing your data project. You’ve got data collection tools, analysis plans, and an important research question. Enter a large, completely disruptive event. Your data project — whether it’s Monitoring and Evaluation in the social sector, evidence-based policymaking in the government sector, UX or product design research in the corporate sector — has fallen off a cliff. What can you do? After you unroll from the fetal position, there’s actually a lot you can do to salvage and even improve your data products during a major disruption.
I’ve got some tried and tested strategies to help get your data project back on track in the time of COVID-19 and all its related impacts. After many years running a data consulting company, I’ve been in data disaster settings before. My team and I collaborated with large foundations and corporations during the earthquake in Haiti to develop new tools for urgent data collection and new methods to preserve the value of the data that had been collected prior to the earthquake. We worked with international development organizations and non-profits on the ground during the hartals (social unrest) in Bangladesh to find vital new ways to answer emerging questions with data we had collected for different purposes, and to continue collecting reliable data amidst almost total shutdowns. In Pakistan during the floods, we collaborated with the government and local associations to build new statistical models that drew on existing databases, which weren’t in great shape, to keep social response knowledge flowing.
Let’s look at how we can continue to build useful, ethical, and feasible data products given the current state of the world.
When we encounter a major disruption in our data projects, we need practical, manageable and rigorous first steps towards recovery. The goal is to retain as much value in the data you currently have and analyze and understand it in ways that make sense now. It helps to start by identifying three basic elements. Your current state in these three is the start of your map forward.
- How is your research question holding up?
- In what ways can your current data collection design and method be adapted?
- Are your analysis plans still appropriate and feasible?
Change data collection mode
If you haven’t collected much or any data, you may have the most flexibility to adapt. First, consider changing your data collection mode. In the time of COVID-19, focus groups, in-person UX interviews, and door-to-door data collection are not good options. There are lots of tools that can collect good, reliable data at a distance: text message data collection tools, mobile phone digital surveys, Google Forms. One option we’ve tried with great success is to go back to phone data collection. Often this can serve two purposes: data collection and personal contact support.
Change data collection design
The design for how you were going to collect your data is usually some kind of sampling plan. This is the way you had hoped to collect data in an intentional, organized way in order to ensure that the people represented in your data are the people you need to answer your questions. Most of these designs, like RCTs (Randomized Controlled Trials) and stratified or clustered sampling, are most likely no longer an option. Two designs we have used with success in disaster or conflict situations are snowball sampling and incentivized sampling. If you’re midway through data collection, decrease your generalized sampling efforts and focus on collecting the most important remaining part of the sample. Run a flash analysis of existing data to show you which parts of the sample you still need the most.
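A flash analysis like this can be as simple as comparing what you have collected so far against what you planned, group by group. Here is a minimal sketch in Python. The strata names, the targets, and the `sample_gaps` helper are all made up for illustration; your own sampling plan will define the real groups and numbers.

```python
from collections import Counter

def sample_gaps(collected, targets):
    """Rank strata by completion rate, least complete first.

    collected: list of stratum labels, one per completed response.
    targets:   dict of stratum label -> planned number of responses.
    """
    counts = Counter(collected)
    gaps = []
    for stratum, target in targets.items():
        got = counts.get(stratum, 0)
        shortfall = max(target - got, 0)
        completion = got / target if target else 1.0
        gaps.append((stratum, got, target, shortfall, completion))
    # Lowest completion rate first; ties broken by largest shortfall.
    gaps.sort(key=lambda g: (g[4], -g[3]))
    return gaps

# Hypothetical data: responses collected before the disruption.
collected = ["urban"] * 40 + ["rural"] * 10 + ["peri-urban"] * 18
targets = {"urban": 50, "rural": 50, "peri-urban": 20}

gaps = sample_gaps(collected, targets)
for stratum, got, target, shortfall, completion in gaps:
    print(f"{stratum}: {got}/{target} collected, {shortfall} still needed")
```

In this made-up example the rural group is furthest from its target, so that is where the remaining data collection effort would go first.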
Change data collection tool design
Once you’ve adapted your data collection mode and design, you’ll be in a position to reconsider the data collection tool you’ve planned to use. I’ll let you know right up front from years of experience: it’s probably too long. Reduce the number of questions. And change the type of questions. If you haven’t collected much data, you can change these without losing data. If you have collected data, consider doing a very quick high-level analysis of your existing data to see what the top-line trends are. This can help you see what questions or parts of your tool are yielding the most useful information. Keep those parts. Discard the rest. If you’ve finished collecting your data, go back into your data and record the time, date, and geography of each observation in as much detail as possible. You’re going to need this to adjust for the disaster impact when you get to the analysis stage.
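One rough way to run that quick high-level analysis is to check, question by question, how often people answered and whether their answers vary at all. The sketch below is only an illustration: the question names, the sample responses, and the `question_summary` helper are hypothetical, and what counts as "useful" is a judgment call that depends on your research question.

```python
def question_summary(responses):
    """Per-question response rate and number of distinct answers.

    responses: non-empty list of dicts mapping question -> answer,
               where None or "" means the question was skipped.
    """
    questions = sorted({q for r in responses for q in r})
    summary = {}
    for q in questions:
        answered = [r.get(q) for r in responses if r.get(q) not in (None, "")]
        summary[q] = {
            "response_rate": len(answered) / len(responses),
            "distinct_answers": len(set(answered)),
        }
    return summary

# Hypothetical responses collected before the disruption.
responses = [
    {"income_change": "decreased", "household_size": 4, "radio_station": None},
    {"income_change": "decreased", "household_size": 6, "radio_station": None},
    {"income_change": "same", "household_size": 3, "radio_station": "FM1"},
    {"income_change": "decreased", "household_size": 5, "radio_station": None},
]

summary = question_summary(responses)
for q, stats in summary.items():
    print(q, stats)
```

Questions that almost nobody answers, or where everybody gives the same answer, are candidates to cut. Treat the numbers as a prompt for discussion, not an automatic rule.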
Find other sources of data
In times of conflict, disaster, and pandemic, collaboration is always key. First off, we are pretty much all in this situation together. Talk to your networks, your funder’s networks, your company’s other branches and partners. See who has data that might be able to fill holes in your data and vice versa. There are a lot of privacy regulations that need to be respected, and within these bounds there’s a lot that can be done through careful and generous data sharing. This is also a time to consider purchasing data, particularly if you are no longer able to spend parts of the budget on collecting data. Open data can also be a valuable source to help build the strength of your datasets. Remember that whenever you’re getting data from an external source, it’s essential to vet it through a data biography.
Are your analysis plans still appropriate and feasible?
At the analysis stage in a time of upheaval, you’re likely to face a number of issues. The most common of these are: your sample is smaller than you had planned, the sample is not representative of the people you had hoped to understand, the data was not collected at the speed or on the timeline you had planned, and the data may be missing an endline or a “post” round of collection.
If your sample size is too small, there are specialized analysis techniques you can use to squeeze as much knowledge as you can out of it given the situation you’re in. Which technique to use will depend on your unique situation, but options include Bayesian techniques and small area estimation techniques. You can also combine your data with other datasets and use analysis techniques that adjust for this fact; these techniques are often Bayesian as well.
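To give a flavor of how a Bayesian technique can stretch a small sample, here is a minimal sketch of a beta-binomial update in Python. It is one simple instance of the idea, not a recommendation for any particular project. The numbers are invented, and the choice to count an external survey at one tenth of its size when building the prior is an assumption made purely for illustration.

```python
import random

def beta_binomial_update(successes, n, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after observing successes out of n."""
    return prior_a + successes, prior_b + (n - successes)

def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Approximate equal-tailed credible interval by sampling Beta(a, b)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]

# Hypothetical external survey: 120 of 400 households reported lost income.
# Assumption: we discount it to 10% of its size because it was collected
# for a different purpose, then add it to a flat Beta(1, 1) prior.
weight = 0.1
prior_a = 1.0 + weight * 120
prior_b = 1.0 + weight * (400 - 120)

# Our own small sample: 9 of 20 households reported lost income.
post_a, post_b = beta_binomial_update(9, 20, prior_a, prior_b)
post_mean = post_a / (post_a + post_b)
lo, hi = credible_interval(post_a, post_b)

print(f"posterior mean: {post_mean:.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

The estimate lands between the small sample’s own rate and the external survey’s rate, and how hard the prior pulls depends on the weight you give it. That weight is exactly the kind of decision to document and defend.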
Samples that are not representative are more common than most of us would like to admit, even in times of less crisis. Yeah, your RCT is probably over, but you can still pivot to methods designed for causal analysis with observational data. Depending on the situation, mixed-effects models, causal diagrams, Bayesian analysis, and some types of matching analysis are good places to start.
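As a toy illustration of matching analysis, the sketch below pairs each participant with the most similar non-participant on a single background variable and averages the outcome differences. Everything here is hypothetical: the data, the `matched_effect` helper, and the choice of one covariate. Real matching analyses usually match on many variables or on a propensity score, and check how well the matched groups balance.

```python
def matched_effect(treated, controls):
    """Nearest-neighbor matching (with replacement) on one covariate.

    treated, controls: lists of (covariate, outcome) pairs.
    Returns the average outcome difference between each treated unit
    and its closest control, an estimate of the effect on the treated.
    """
    diffs = []
    for x_t, y_t in treated:
        # Closest control by covariate distance; ties go to the first one.
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

# Hypothetical (household size, outcome score) pairs.
treated = [(2, 10.0), (5, 14.0), (8, 19.0)]
controls = [(1, 7.0), (2, 8.0), (5, 11.0), (9, 16.0)]

att = matched_effect(treated, controls)
print(f"estimated effect on participants: {att:.2f}")
```

The estimate is only as credible as the assumption that the matching variable captures the important differences between the two groups.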
Mixed-effects models (also called hierarchical linear models or multilevel models) are also very useful for data that was not collected on the time frame you had planned. Data like this often become unbalanced and these techniques can help with this.
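For readers who like to see the model written down, one common form is a linear mixed-effects model with a random intercept and a random slope for time. This is a generic textbook form, offered as a sketch, not the specific model used in any of the projects described here:

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, t_{ij} + \varepsilon_{ij},
\qquad (u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the outcome for person (or site) $j$ at measurement occasion $i$, and $t_{ij}$ is the actual time of that measurement. Because time enters observation by observation, people do not need to be measured on the same schedule or the same number of times, which is why this family of models copes with unbalanced data.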
Losing your endline, post, or follow-up data is one of the most challenging data disaster situations. This is another situation where mixed-effects models, particularly the nonlinear kind, can help you estimate what would have happened at the end. You can also look into meta-analysis. Ultimately, this may lead you back to step 1: adjusting your research question.
You’ve invested a lot of time, energy and resources in your data project. While it may not turn out exactly as you were expecting it to, there are still lots of ways to harness the power of what you have in emergent situations and answer meaningful questions.