How to avoid making AI “discriminatory by design”

February 13, 2019 | By Rande Price, Research VP – DCN

Artificial intelligence (AI) is part of today’s decision-making process. Companies use it to assess the value and risk of issuing credit cards and to forecast unemployment benefits. It’s also used in employee recruitment and the college admissions process. Because these systems learn from data, small choices made while building them carry through to the decisions they produce. Unfortunately, AI algorithms, especially those that operate as black boxes, may include discriminatory practices. A biased outcome may not be the intent of the algorithm, but it can easily be a by-product.

Frederik Zuiderveen Borgesius, Professor of Law at the Institute for Computing and Information Sciences (iCIS), Radboud University Nijmegen, addresses issues of AI bias in his new report, “Discrimination, artificial intelligence, and algorithmic decision-making.” Borgesius analyzes the AI process to better understand how unfair differentiation can be produced. He wrote this report specifically for the Anti-discrimination department of the Council of Europe. However, it is an important read for anyone involved in or thinking of using AI in decision-making.

The AI Decision-making Process

It’s important to first understand the basics of an AI decision-making program. AI uses machine learning to find correlations in data sets. When building a machine learning tool, programmers first define the outcome they want to predict, which is called the target variable. Class labels then divide the possible values of the target variable into mutually exclusive categories. Algorithms identify which attributes or activities in the data are related to each class label.

To understand how this works in a common application, think of a spam filter. The spam filter is an AI program that sorts through email messages and labels each one as “spam” or “non-spam.” The program uses archives of older emails already labeled as spam or non-spam to learn the characteristics (a certain phrase, an email address or an IP address) of each.
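The spam filter idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular mail product works: it assumes a naive Bayes word-count model, and the four archived emails, the function names train and classify, and the test messages are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical archive of older emails that were already labeled.
ARCHIVE = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("lunch on monday with the team", "non-spam"),
]

def train(examples):
    """Count how often each word appears under each class label."""
    word_counts = {"spam": Counter(), "non-spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the class label whose archived emails best match the text."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["non-spam"])
    total_emails = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Start from how common the label is, then add evidence per word.
        score = math.log(label_counts[label] / total_emails)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(ARCHIVE)
print(classify("claim your free prize", model))
print(classify("agenda for the team meeting", model))
```

Here the target variable is whether a message is spam, and “spam” and “non-spam” are the two class labels. Everything the program knows comes from the labeled archive, which is why the quality of those labels matters so much in the sections that follow.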

Professor Borgesius references the work of Solon Barocas and Andrew D. Selbst, two academic research experts, who identify five ways in which the AI decision-making process can lead unintentionally to discrimination.

How AI Leads to Discrimination

1. Defining the target variables and class labels

When defining target variables and class labels, it’s important to think through the consequences of how they are defined. For example, let’s say a company wants to define an “engaged employee.” One class label it might choose is that an engaged employee is someone who is never late for work. Unfortunately, this could negatively impact employees who do not own a car and depend on public transportation. Car ownership can also reflect higher income, while reliance on public transportation can connote lower income. Therefore, this class label of never being late creates a bias against lower-income employees. Mindfulness in the creation and use of class labels is important to prevent built-in biases.
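One way to check a proposed class label before using it is to see how often each group would receive it. The sketch below assumes made-up employee records and a hypothetical is_engaged rule that mirrors the “never late” definition; none of the numbers come from the report.

```python
from collections import defaultdict

# Hypothetical records: (commute mode, days late this quarter).
employees = [
    ("car", 0), ("car", 0), ("car", 1), ("car", 0),
    ("transit", 2), ("transit", 0), ("transit", 3), ("transit", 1),
]

def is_engaged(late_days):
    """Class label under test: 'engaged' means never late."""
    return late_days == 0

def label_rate_by_group(records):
    """Share of each group that would receive the 'engaged' label."""
    labeled = defaultdict(int)
    total = defaultdict(int)
    for group, late_days in records:
        total[group] += 1
        labeled[group] += is_engaged(late_days)
    return {group: labeled[group] / total[group] for group in total}

rates = label_rate_by_group(employees)
print(rates)  # {'car': 0.75, 'transit': 0.25}
```

In this invented sample, the label reaches three quarters of car commuters but only one quarter of transit riders. A gap like that is a signal to reconsider the definition before any model is trained on it.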

2. The training data: labelling examples

AI decision-making also produces discriminatory results if the system “learns” from discriminatory training data. All training data should be scrutinized to guard against biases. For example, a medical school decided to use AI decision-making in its application process. The training data for the program included old admission files from the 1980s. Unfortunately, the acceptance policy in the 1980s was heavily weighted against women and immigrants. While the AI program was not introducing new biases, it reproduced those inherent in the older admission decisions.
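The mechanism can be shown with a deliberately simple learner. The sketch below is not the medical school’s actual program: it assumes invented admission records, two hypothetical applicant groups, and a toy rule that learns the lowest score ever admitted for each group.

```python
# Hypothetical past decisions: (test score, applicant group, admitted?).
# Group "B" was historically held to a higher bar than group "A".
history = [
    (70, "A", True), (65, "A", True), (60, "A", False),
    (80, "B", True), (75, "B", False), (70, "B", False),
]

def learn_thresholds(records):
    """Learn, per group, the lowest score that was admitted in the past."""
    thresholds = {}
    for score, group, admitted in records:
        if admitted:
            thresholds[group] = min(score, thresholds.get(group, score))
    return thresholds

def predict(score, group, thresholds):
    """Admit if the score meets the bar learned for that group."""
    return score >= thresholds[group]

thresholds = learn_thresholds(history)
print(thresholds)                    # {'A': 65, 'B': 80}
print(predict(72, "A", thresholds))  # True
print(predict(72, "B", thresholds))  # False
```

Two applicants with the same score receive different outcomes, purely because the learner copied the pattern in the old decisions. Nothing in the code was written to discriminate; the bias arrived with the labels.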

3. Training data: data collection

The sampling process of the data collection must be free of biases. If the sampling process is biased, the predictive models trained on the data will reproduce the biases. For example, the number of police officers sent to patrol a neighborhood often depends on variables such as neighborhood size or density. If a larger number of officers patrol a neighborhood and report a high level of crime, we need to understand the factors involved. Otherwise, the data overstates the crime rate in this neighborhood, when it could be that a higher ratio of officers simply witnessed more crimes in progress.
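A small arithmetic sketch makes the patrol example concrete. It assumes two invented neighborhoods with the same number of actual incidents and a hypothetical figure of 5 percent of incidents observed per officer on patrol; both numbers are illustrative only.

```python
# Hypothetical assumption: each patrolling officer observes 5% of incidents.
DETECTION_PCT_PER_OFFICER = 5

# Both invented neighborhoods have the same number of actual incidents.
neighborhoods = {
    "north": {"actual_incidents": 100, "officers": 2},
    "south": {"actual_incidents": 100, "officers": 8},
}

def recorded_incidents(actual_incidents, officers):
    """Incidents that end up in the data set, given patrol intensity."""
    detection_pct = min(100, officers * DETECTION_PCT_PER_OFFICER)
    return actual_incidents * detection_pct // 100

recorded = {
    name: recorded_incidents(n["actual_incidents"], n["officers"])
    for name, n in neighborhoods.items()
}
print(recorded)  # {'north': 10, 'south': 40}
```

The recorded data shows four times as much crime in the south, although the underlying rate is identical. A model trained on these records would learn the patrol pattern, not the crime pattern.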

4. Feature selection

A programmer selects the categories or features of data to include in their AI system. By selecting certain features, a programmer may introduce bias against certain groups. For example, many companies in the U.S. hire employees who graduated from an Ivy League university. An Ivy League education costs significantly more than one at a state university. If a company uses Ivy League universities as one of its data features, it is establishing a bias against individuals of lower income. Data features must be fully assessed to ensure they do not introduce bias in the results.

5. Proxies

Sometimes the measures used to make a relevant and well-informed decision lend themselves to bias. Zip codes are often used as a neutral criterion to provide socio-economic information for decisions on loans, credit cards, insurance, etc. However, if a zip code also acts as a proxy for people of a specific race or gender, decisions based on it can discriminate against those people even though race or gender is never used directly.
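The proxy effect can be checked by comparing outcomes across groups for a rule that never looks at group membership. The sketch below assumes two made-up zip codes, two hypothetical groups, and an invented approve rule; it only illustrates the comparison.

```python
from collections import Counter

# Hypothetical applicants: (zip code, group). Group is never shown to the rule.
applicants = (
    [("11111", "group_a")] * 4 + [("11111", "group_b")] * 1
    + [("22222", "group_a")] * 1 + [("22222", "group_b")] * 4
)

def approve(zip_code):
    """Decision rule that looks only at the zip code."""
    return zip_code == "11111"

def approval_rate_by_group(records):
    """Share of each group approved by the zip-code rule."""
    approved = Counter()
    total = Counter()
    for zip_code, group in records:
        total[group] += 1
        approved[group] += approve(zip_code)
    return {group: approved[group] / total[group] for group in total}

rates = approval_rate_by_group(applicants)
print(rates)  # {'group_a': 0.8, 'group_b': 0.2}
```

Because most residents of each invented zip code belong to one group, a rule based on zip code alone approves 80 percent of one group and 20 percent of the other. Removing the sensitive attribute from the data does not remove the disparity when a proxy for it remains.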

Summary

Importantly, transparency of AI systems and the decision-making process is necessary. Borgesius, as well as other academic scholars, advocates for the development of transparency-enhancing technologies (TETs) to drive meaningful transparency of the algorithmic processes. AI decision-making can result in negative consequences for people, especially members of protected classes. Caution must be used in algorithmic decision-making to ensure AI does not pave the way for discrimination.
