Fake Data Could Help Solve Machine Learning’s Bias Problem—if We Let It
- by 7wData
Data is the lifeblood of artificial intelligence, and despite estimates that the world will generate more data over the next three years than it has in the previous 30, there still isn’t enough of it to supply the booming A.I. industry.
Amazon can predict your buying habits because its algorithms are trained on the data collected from its 112 million Prime subscribers in the U.S. and the tens of millions of other people around the world who visit the site and use its other products on a regular basis. Google’s advertising business depends on predictive models fueled by the billions of internet searches it processes each day and data from the 2.5 billion devices running the Android operating system. The tech giants have carved out these massive data monopolies, and that gives them near-impenetrable advantages in the field of A.I.
So how is a small A.I. startup to train its models to compete? Data collection is a time-consuming and expensive process. What about a hospital chain that wants to harness A.I. to better diagnose diseases but can’t use its own patient data due to federal privacy laws and cybersecurity concerns? Or a credit scoring agency seeking to model risky behavior that doesn’t want to use sensitive consumer information?
The answer, increasingly, is to use synthetic data—created by A.I., for A.I. In many cases, it’s a cheaper and faster option, but it carries a risk: The techniques used to generate realistic-looking data can also exacerbate harmful biases in that data.
Synthetic data comes in many forms, from images of fake faces that are indistinguishable from real ones to statistically realistic purchasing patterns for thousands of fictional customers. Executives at multiple synthetic data companies—including established firms like GenRocket and startups such as Mostly AI, Hazy, and AI Reverie—said they have seen huge growth in demand for boutique data sets over just the past two years. Companies can also turn to open-source tools like Synthea, which researchers at institutions including the U.S. Department of Veterans Affairs use to create realistic medical histories for thousands of fake patients in order to study disease patterns and treatment paths.
Mitre Corp., the nonprofit that created Synthea, has likewise seen an explosion of interest in its tool over the past several years. With that growth, though, comes potential peril: the algorithms trained on such data are increasingly used to make life-changing decisions, and have repeatedly been shown to amplify racism, sexism, and other harmful biases in high-impact areas like facial recognition, criminality prediction, and health care decision-making. Researchers say that in many cases, training an algorithm on algorithmically generated data increases the risk that an artificial intelligence system will perpetuate harmful discrimination.
“That process of creating a synthetic data set, depending on what you’re extrapolating from and how you’re doing that, can actually exacerbate the biases,” says Deb Raji, a technology fellow at the AI Now Institute. “Synthetic data can be useful for assessment and evaluation [of algorithms], but dangerous and ultimately misleading when it comes to training [them].”
One of the most common ways to create synthetic data is with a generative adversarial network, or GAN, a method developed in 2014 that pits two neural networks against each other. The first network, the generator, turns random noise into candidate samples and tries to synthesize data realistic enough to fool the second network, the discriminator, into believing the samples came from the same source as the real training data. The more the two networks compete in this adversarial loop, the better each gets at its task, yielding a synthetic data set that can be, statistically and to the naked eye, nearly indistinguishable from the real thing.
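The adversarial loop described above can be sketched in a few dozen lines. The example below is a deliberately minimal, illustrative toy (not from the article, and far simpler than any production GAN): a one-parameter-pair "generator" tries to mimic samples drawn from a normal distribution with mean 4, while a logistic-regression "discriminator" learns to tell real samples from fakes. All names, distributions, and learning rates here are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator G(z) = a*z + b maps noise z ~ N(0, 1) to a fake sample.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c) outputs P(x is real).
w, c = 0.1, 0.0

lr, batch = 0.01, 64
for step in range(4000):
    # --- Discriminator update: push D(real) toward 1, D(fake) toward 0 ---
    real = rng.normal(4.0, 1.25, batch)          # "real" data source
    fake = a * rng.normal(size=batch) + b
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    # gradient descent on -log D(real) - log(1 - D(fake))
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # --- Generator update: push D(fake) toward 1 (fool the discriminator) ---
    z = rng.normal(size=batch)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    grad = (1 - d_fake) * w                      # d(-log D(fake))/d(fake), negated
    a += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

# Sample from the trained generator; its output distribution should have
# drifted toward the real data's mean of 4 as the two models competed.
samples = a * rng.normal(size=1000) + b
print(round(samples.mean(), 2))
```

Real GANs replace the two linear models with deep networks and train by backpropagation, but the structure is the same: alternating updates in which each model's improvement creates a harder problem for the other. The sketch also hints at why bias propagates: the generator's only objective is to match whatever the "real" data looks like, skews included.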