Turning a Blind Eye to Race in STEM: AI Language Models Magnify Bias
On December 2, Dr. Timnit Gebru, one of the few Black women working in artificial intelligence ethics, was terminated from Google after two years at the company. Gebru is widely respected in the AI field, having earned a PhD from the Stanford Artificial Intelligence Laboratory and completed a postdoctoral fellowship at Microsoft Research studying algorithmic bias.
Gebru’s most recent research focused on the ethics of AI models that recognize and generate language, which can adopt bigoted language when trained on a biased dataset. She argued that funding has been misdirected toward building these large, often biased models for commercial purposes. Her termination points to a broader problem: people from marginalized groups, especially Black women, are neither heard nor welcomed in the technology industry.
Natural language processing combines linguistics and artificial intelligence. Language models are trained on large amounts of text so that computers can recognize and generate speech or text of their own. The technology has recently become prominent in everyday life through virtual assistants such as Siri, Alexa, and Google Assistant.
There has been past research on racism and other biases in AI language models. StereoSet, created by machine learning and natural language processing researchers, is a dataset for quantifying the bias of language models, and it is publicly available to encourage the development of unbiased models. In a paper evaluating popular language models on StereoSet, current models displayed strong stereotypical biases around gender, profession, race, and religion.
The biases were measured with fill-in-the-blank sentences, such as the gendered statement, “Girls tend to be more ___ than boys.” Models also associated certain places like “Africa” with descriptors such as “poor” and “dark.”
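To make the probing idea concrete, here is a minimal sketch of fill-in-the-blank testing using the open-source Hugging Face transformers library and the publicly available bert-base-uncased model. This is only an illustration of the general technique, not the StereoSet authors’ own scoring code, which compares stereotypical and anti-stereotypical completions.

```python
# A minimal sketch of fill-in-the-blank bias probing, assuming the
# Hugging Face `transformers` library is installed. Illustrative only;
# not StereoSet's official evaluation script.
from transformers import pipeline

# Load a masked language model; BERT predicts the most likely words
# for the [MASK] slot from the surrounding context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Probe the model with a template like the one quoted above.
for prediction in fill_mask("Girls tend to be more [MASK] than boys."):
    # Each prediction includes the filled-in word and its probability.
    print(f"{prediction['token_str']:>12}  p = {prediction['score']:.3f}")
```

Whatever words the model ranks highest come straight from patterns in its training data, which is exactly what benchmarks like StereoSet are designed to quantify.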
Gebru’s most recent research discussed concerns about these biases, as well as the environmental and funding issues of large-scale AI language models, the kind that big tech companies build and depend on.
The first concern is the carbon footprint associated with training and retraining a large language model. The bigger the model, the more computing power and electricity it consumes, adding carbon emissions every time the model is trained. One widely cited 2019 estimate found that training a single large model can emit as much carbon as five cars over their entire lifetimes.
This technology benefits big technology companies at the cost of increased greenhouse gas emissions, and marginalized communities bear the brunt of the resulting climate change. These communities are especially vulnerable to climate hazards because they are more likely to live in areas with contaminated water or in low-elevation coastal zones prone to flooding.
Second, these companies are more invested in manipulating language well enough for products like virtual assistants to work than in building models that truly understand it. Little funding is directed toward models trained on carefully curated datasets, which could produce deeper understanding and less biased results. The training data matters: datasets scraped from the internet for quantity rather than quality are often filled with racist or otherwise stereotyped language, and a model trained on them reproduces that language at scale.
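As a toy illustration of the quantity-over-quality problem (the corpus below is invented example data, not real training code), even the simplest statistical model inherits whatever associations dominate its data:

```python
# A toy sketch: skewed data in, skewed predictions out.
from collections import Counter

# A tiny stand-in for an uncurated corpus scraped for quantity.
corpus = [
    "africa is poor",
    "africa is poor",
    "africa is dark",
    "africa is beautiful",
]

# Count which descriptors follow "africa is" ...
descriptors = Counter(line.split()[-1] for line in corpus)
total = sum(descriptors.values())

# ... and turn the counts into the probabilities a simple
# bigram-style model would assign to each completion.
for word, count in descriptors.most_common():
    print(f"P({word!r} | 'africa is') = {count / total:.2f}")
```

Scaling this up to billions of scraped sentences does not remove the skew; it entrenches it.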
Many people, including the hundreds of Google staff and other AI researchers and industry members who signed the Google Walkout For Real Change letter supporting Gebru, believe she was unfairly terminated because of her history of exposing flaws in AI technology, including language models. Gebru’s earlier research showed that facial recognition technology is less accurate at recognizing female and BIPOC faces.
Deeply aware of the need for greater diversity in her field, Gebru co-founded Black in AI to increase the presence and support of Black people in AI, and she built a more gender-representative team during her time at Google. She has been consistently outspoken about diversity. While Gebru has received public support from many in the field, she has also said in an interview that she experienced racism and sexism during her time at Google, which is unfortunately unsurprising in her line of work.
Across 177 tech firms based in Silicon Valley, only 2% of the workforce are Black, Latinx, or Native American women, and less than 0.5% of leadership positions within these firms are held by Black women. These women face a plethora of social barriers, including less access to advanced STEM coursework and professional networks, and unwelcoming academic and workplace environments.
Beginning in high school, AP Computer Science can be an early indicator of, and a factor in, whether students continue to pursue computer science. But many schools with higher populations of students from marginalized communities lack the resources to offer the course or similar classes. Even at schools that do, Black and Latinx girls face unwelcoming environments with stereotypes and biases stacked against them. As a result of this underrepresentation and these institutionalized inequities, Black and Latinx women continue to receive less mentorship and have fewer diverse networks leading into academia and industry.
The severely limited diversity of the AI field has contributed to insufficient awareness of how this newly influential technology affects marginalized groups, particularly with respect to climate change and racist language, alongside other issues such as discriminatory facial recognition and racist policing technology.
As AI technology continues to grow and become more present in our lives, it’s important that the tech industry both focuses on eliminating racial biases from its data and models and works to improve the retention of BIPOC employees.
Last updated 12/17/20
Kaya is a junior studying Industrial & Systems Engineering at the University of Washington and has a love for writing! She enjoys learning and sharing stories about various social and STEM topics. If she's not working on an article, you could probably find her with a cup of tea and a book or crochet hook.