
This post was originally written for the blog over at We All Count, a project for equity in data science. We’re working to demystify and democratize data and demonstrate how we can make data more equitable for everyone.

I talk a lot about ethics in data science — and with good reason.

Data science ethics questions are making headline news more and more often. Think of the Cambridge Analytica scandal. We were all a little shaken by that story. And that incident was bad for the entire sector, not just Facebook and Cambridge Analytica. Consider, for example, how much time and money Oxford Analytica (another firm with a similar name but no connection to CA) had to put into shoring up its reputation even though it hadn’t done anything wrong (that we know about).

Oxford Analytica has no connection to Cambridge Analytica — and they want to make that clear.


It’s Not Just About Privacy

But ethics in data science are more than just a good idea. Ethics are essential for your organization — and your bottom line.

And data ethics are about more than just privacy. Ethics in data science must be considered and included in every one of the seven steps of the data lifecycle. Data experts and publications tend to focus most on privacy because it’s often so much easier to address privacy issues than it is to deal with other aspects of incorporating ethics into your data products.

We worked with a financial institution using demographic data to determine if people qualify for credit.


Ethics Can Hit Your Bottom Line

It’s not always easy to incorporate ethics into data science. It takes some work, but the effort will pay off. A large financial institution recently hired our team. They needed help fixing the machine learning algorithms they used to make decisions about consumer credit.

The company included a parameter for an applicant’s immigration status and immigration type as part of the process of ranking applications. The “learning” this algorithm had done was on historic data. And that old, out-of-date data told the company that people of specific immigration profiles were a bad risk. It said they shouldn’t be offered good credit.

One of the variables was immigration status and immigration class (refugee, family reunification, business, etc.). This variable was associated with a negative coefficient.



This decision relied on assumptions drawn from the data of a previous generation, one with a vastly different set of social circumstances, at a time when the world was very different. By relying on this out-of-date data, they were unfairly denying certain groups credit and inadvertently rejecting a large potential customer base.

Fortunately, someone in the organization realized this was an unethical decision, and that it was probably hurting their bottom line. And they were correct (on both counts). Not only is it unethical to make decisions about people’s lives based on outdated social profiling, but it is terrible for profits. But even though they recognized the problem, this institution didn’t know how to fix their system.


Incorporating Ethics in Data Science

Because that organization realized they had a problem with their system, we were able to help them incorporate ethics in data science into their algorithms. Obviously, we can’t go into the details of the project, since they’re proprietary. But essentially, we took several steps:

  1. First, we ran the original model with the old data, dropping and adding several of the socio-demographic parts of the model to see what happened.
  2. Next, we simulated several datasets based on current social and economic situations rather than the old information they were working with.
  3. Then we ran these datasets through the original model and the versions of the model from the first step.
  4. We analyzed the results of steps 1, 2, and 3 and used our analysis to rebuild a model that includes relevant parameters with current weights.
  5. Finally, we built a system that would be much easier to keep current.
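
The project's details are proprietary, so what follows is only a toy sketch of the general shape of steps 1 through 3, not the method actually used. Everything in it is an assumption made for illustration: the data is synthetic, the model is a small logistic regression fit by gradient descent, and names such as `simulate`, `fit_logistic`, `approval_rate`, and `group_effect` are hypothetical. It fits a model on "historic" data where a group flag carries a penalty, refits with that flag dropped, then scores simulated "current" applicants (where the flag no longer predicts repayment) with both versions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, group_effect, seed):
    """Synthetic applicants as (income_score, group_flag, repaid). Illustrative only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)
        group = 1 if rng.random() < 0.3 else 0
        # group_effect stands in for era-specific conditions (an assumption).
        logit = 1.0 + 1.5 * income + group_effect * group
        repaid = 1 if rng.random() < sigmoid(logit) else 0
        rows.append((income, group, repaid))
    return rows


def fit_logistic(rows, use_group, epochs=200, lr=0.5):
    """Batch gradient descent on log-loss. Returns [bias, w_income, w_group]."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for income, group, repaid in rows:
            # Dropping the variable means feeding the model a constant zero for it.
            x = (1.0, income, float(group) if use_group else 0.0)
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2]) - repaid
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def approval_rate(w, rows, threshold=0.7):
    """Share of group-flagged applicants whose predicted repayment clears the threshold."""
    flagged = [(income, group) for income, group, _ in rows if group == 1]
    approved = sum(
        sigmoid(w[0] + w[1] * income + w[2] * group) >= threshold
        for income, group in flagged
    )
    return approved / len(flagged)


# Step 1: fit on old data, with and without the socio-demographic variable.
historic = simulate(1000, group_effect=-1.5, seed=1)
w_full = fit_logistic(historic, use_group=True)
w_reduced = fit_logistic(historic, use_group=False)

# Step 2: simulate data reflecting current conditions (no group penalty).
current = simulate(1000, group_effect=0.0, seed=2)

# Step 3: run the current data through both versions of the model.
rate_full = approval_rate(w_full, current)
rate_reduced = approval_rate(w_reduced, current)

print("group coefficient learned from historic data:", round(w_full[2], 2))
print("approval rate for flagged group, original model:", round(rate_full, 2))
print("approval rate for flagged group, variable dropped:", round(rate_reduced, 2))
```

In a comparison like this, the gap between the two approval rates gives a rough sense of how many otherwise creditworthy applicants an outdated coefficient turns away. A real analysis would use the institution's actual features and a properly validated model, and would check whether the remaining variables act as proxies for the dropped one.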

The institution can now move forward using their algorithms to process credit applications, knowing they are not acting unethically — or losing potential good customers.

Want to learn more about ethics in data science? Need help addressing an ethical issue with your organization’s data? We can help. Contact us today.

