How to overcome the potential for unintended bias in data algorithms

Anyone who has been online recently may have heard that scary, biased algorithms are running wild, unchecked: sentencing criminals and deciding who gets hired, who gets fired, who gets loans, and more.

If you read many of the latest articles and books, it is natural to have a visceral negative response. Of course the prospect of racist, sexist robots making important decisions that affect people is terrifying.

While some of the media frenzy is warranted, these issues are not always so clear-cut. Like people, algorithms should not be stereotyped.

Algorithms have the potential to help us overcome rampant human bias. They also have the potential to magnify and propagate that bias. I firmly believe this is a real issue, and it is the duty of data scientists to audit their algorithms for bias.

However, even for the most careful practitioner, there is no clear-cut definition of what makes an algorithm “fair.” In fact, there are many competing notions of fairness, and real-world data forces trade-offs among them.

Let’s talk about three types of algorithms: those that are clearly flawed, those that are clearly sound, and the less obvious cases in between.

That third category can get very interesting and controversial. Not all of these algorithms are running wild unchecked, and some have issues that are not the fault of the algorithm but are simply a reflection of what the world is like. How much are the factors that matter in making the decision tied to demographic class?

Let’s say I run a bank and I don’t want to give a home loan (which we will assume is several hundred thousand dollars) to anyone who makes under $15,000 per year. This is a very simple algorithm. Most of us can agree that income is an important factor in the loan decision, but this rule will treat different classes differently, since income is distributed differently across ethnicity, gender, and age. If the outcome of my decision is that a smaller percentage of one group gets loans compared with another, many people would argue the simple algorithm is unfair.
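To make this concrete, here is a minimal sketch in Python. The income numbers for the two groups are invented purely for illustration; the point is that one “neutral” threshold yields different approval rates whenever the underlying distributions differ.

```python
# Hypothetical annual incomes for two groups of applicants
# (illustrative numbers only, not real data).
group_a = [12_000, 14_000, 18_000, 25_000, 40_000, 60_000]
group_b = [9_000, 11_000, 13_000, 16_000, 22_000, 30_000]

THRESHOLD = 15_000  # the bank's simple rule: income must be at least $15k

def approval_rate(incomes, threshold=THRESHOLD):
    """Fraction of applicants whose income clears the threshold."""
    approved = [income for income in incomes if income >= threshold]
    return len(approved) / len(incomes)

print(approval_rate(group_a))  # 4/6 ≈ 0.67
print(approval_rate(group_b))  # 3/6 = 0.5
```

The rule never looks at any demographic attribute, yet group B's approval rate is lower simply because its income distribution sits lower.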

What makes an algorithm “fair?” Let’s say I have a lot more data besides income - things like credit score, job history, etc. I have a large dataset of past outcomes to train an algorithm for future use.

Aiming for accuracy alone will almost definitely result in different treatment of people along age, race, and gender lines. To be fair, should I aim to approve the same percentage of people from each class, even if that means taking some risks?

Alternatively, I could train my algorithm to equalize, across classes, the percentage of approved applicants among those who actually paid back their loans (the true-positive rate, which we can estimate from historical data).

There is a catch: if I do either of these things, I would have to hold the different groups to different standards. Specifically, I would have to issue a loan to someone of one class but deny someone of a different class with the exact same credentials, leading to yet another unfair scenario.
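A toy sketch of that catch, using invented credit-score histories: one shared threshold produces unequal true-positive rates between the groups, and equalizing those rates forces group-specific thresholds.

```python
# Hypothetical history: (credit_score, repaid) pairs for two groups
# (illustrative numbers only, not real data).
history = {
    "group_a": [(550, False), (600, False), (650, True), (700, True), (720, True)],
    "group_b": [(500, False), (560, True), (590, False), (620, True), (680, True)],
}

def true_positive_rate(records, threshold):
    """Among applicants who actually repaid, the fraction the threshold approves."""
    repaid_scores = [score for score, repaid in records if repaid]
    return sum(score >= threshold for score in repaid_scores) / len(repaid_scores)

# One shared threshold treats everyone "the same"...
print(true_positive_rate(history["group_a"], 640))  # 1.0
print(true_positive_rate(history["group_b"], 640))  # 1/3

# ...but equalizing the true-positive rate requires different cutoffs per group:
print(true_positive_rate(history["group_a"], 650))  # 1.0
print(true_positive_rate(history["group_b"], 560))  # 1.0
```

Under the shared cutoff of 640, every group A repayer is approved but only one of three group B repayers is; matching the rates means lowering group B's bar to 560, i.e. applying different standards.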

To see how this works with some data, I highly recommend playing with this interactive site created by Google with some artificial credit score data. When determining who gets a loan given that there are two subgroups with different credit score distributions in the data, there is no way to win.

Specifically, there is no situation where you can hold everyone to the same standard (credit score threshold), while also achieving the same approval percentage in each group and the same percentage of true positives (people who should get loans who actually get one).
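The same no-win result can be checked by brute force on toy data: scanning every candidate cutoff between the lowest and highest scores, no single threshold gives both groups the same approval rate and the same true-positive rate. The score data below is invented for illustration.

```python
# Hypothetical (credit_score, repaid) records for two groups with
# different score distributions (illustrative numbers only).
groups = {
    "a": [(550, False), (600, True), (650, True), (700, True)],
    "b": [(500, False), (540, True), (580, False), (660, True)],
}

def rates(records, threshold):
    """Return (approval rate, true-positive rate) for one shared threshold."""
    approval = sum(score >= threshold for score, _ in records) / len(records)
    repaid_scores = [score for score, repaid in records if repaid]
    tpr = sum(score >= threshold for score in repaid_scores) / len(repaid_scores)
    return approval, tpr

# Scan every non-trivial threshold: is there one where both metrics match?
fair_thresholds = [
    t for t in range(510, 700, 10)
    if rates(groups["a"], t) == rates(groups["b"], t)
]
print(fair_thresholds)  # [] -- no shared cutoff satisfies both criteria
```

Only the degenerate extremes (approve everyone or approve no one) equalize both metrics here, which mirrors the trade-off the Google demo makes interactive.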

Data can be biased because it is not diverse or representative, or it can be “biased” simply because that is what the world is like: what we call unequal base rates in the data.

Algorithms trained to associate words by their meaning and context (like word2vec) do not strongly associate “woman” with “physicist,” because the text they learn from reflects that imbalance.
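Word embeddings measure association with cosine similarity between vectors. Real word2vec vectors are 100- to 300-dimensional and learned from large corpora; the tiny 3-d vectors below are invented purely to illustrate how such an association would be measured.

```python
import math

# Toy 3-d "word vectors" (invented for illustration; real embeddings
# are learned from text and would encode whatever the text encodes).
vectors = {
    "woman":     [0.9, 0.1, 0.2],
    "man":       [0.1, 0.9, 0.2],
    "physicist": [0.2, 0.8, 0.3],
    "nurse":     [0.8, 0.2, 0.3],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# In these toy vectors, "physicist" sits closer to "man" than to "woman" --
# the kind of learned association that mirrors bias in the training text.
print(cosine(vectors["man"], vectors["physicist"]) >
      cosine(vectors["woman"], vectors["physicist"]))  # True
```

An embedding trained on biased text would exhibit this same asymmetry, which is exactly the “unequal base rates” problem surfacing in a different kind of model.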
