Artificial Intelligence Needs Data Diversity
- by 7wData
Artificial Intelligence (AI) algorithms are generally hungry for data, a trend which is accelerating. A new breed of AI approaches, called lifelong learning machines, is being designed to pull data continually and indefinitely. But this is already happening with other AI approaches, albeit with human intervention. A steady stream of data is the fuel for coveted results.
But, with the ever-increasing importance of data, the stakes of data bias are growing ever higher. AI companies have a moral obligation to their customers, and to themselves, to actively address data bias.
The examples of mistakes in this arena are numerous and egregious: Google's Photos application classified African Americans as gorillas; Amazon's internal recruiting application downgraded female candidates; Microsoft's AI chatbot adopted racist and anti-Semitic verbiage in response to conversations on Twitter; and Amazon's facial recognition software mislabeled 28 members of Congress as criminals. These instances, and others like them, have exposed the bias creeping into AI results, to the chagrin of the leaders and stakeholders of these companies.
Failure to address, and even anticipate, these issues will not only deliver sub-par products; it will encourage Luddites to reject AI entirely. Furthermore, legal repercussions have the potential to dwarf the large fines already imposed on big AI companies.
Machine learning methods do not have built-in biases, but data typically does. For instance, trained on U.S. mugshot photos alone, an AI algorithm could easily infer a spurious relationship between skin color and incarceration. A particularly egregious example was an algorithm used to assist with sentencing guidelines: lacking any precaution to be race-blind, it learned to recommend disproportionately stricter guidelines for minorities.
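One concrete way to catch the kind of disparity described above is to compare a model's rate of adverse outcomes across demographic groups. The sketch below is a minimal, hypothetical audit (the group labels, records, and threshold are invented for illustration, not drawn from any real system); real audits use richer fairness metrics and real population data.

```python
from collections import defaultdict

def positive_rate_by_group(records):
    """Compute the fraction of positive (adverse) outcomes per group.

    records: iterable of (group, outcome) pairs, where outcome is 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, outcome in records:
        counts[group][0] += outcome
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

# Hypothetical audit data: (group label, model's "strict sentence" flag)
records = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
           ("B", 0), ("B", 1), ("B", 0), ("B", 0)]

rates = positive_rate_by_group(records)
disparity = max(rates.values()) - min(rates.values())
print(rates)      # per-group adverse-outcome rates
print(disparity)  # demographic-parity gap: 0 means equal rates
```

A large gap does not prove the model is biased on its own, but it is a cheap signal that the training data or model behavior deserves review before deployment.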
How To Solve It
The most practical way to address data bias is to confront it actively during the collection or curation phase of AI data. Algorithms can propagate or even amplify the biases in their data sources; therefore, the data itself should be diversified to reduce bias.
Data collection and preparation should be done by a team with diverse experience, backgrounds, ethnicities, races, ages, and viewpoints. The perspective of someone from a developing country in Asia will differ from that of someone from a Western country.