AI Design and Datasets
A common view regarding bias in Artificial Intelligence (AI) is that the source of the problem lies in the lack of representation within datasets. While this is an important consideration, focusing solely on datasets can lead us to neglect other essential factors involved in mitigating bias. The 'dataset-driven' argument has emerged from a wide range of studies demonstrating that when an AI model is trained to perform a task on a collection of data points, its performance is optimised for those instances; hence, individuals who are not present in the original dataset will experience sub-optimal performance when the model is applied to them.
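To make this concrete, the toy sketch below (written in Python with entirely synthetic data, not drawn from any of the cited studies) trains a simple classifier on a dataset dominated by one hypothetical demographic group and then reports accuracy separately for each group; the group that is scarce in training receives markedly poorer performance.

```python
# Toy illustration (hypothetical data): a model trained on a skewed dataset
# performs worse for the group that is scarce in the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, flip):
    """Simulate one demographic group; `flip` reverses the feature-outcome link."""
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] * (-1 if flip else 1) + 0.3 * rng.normal(size=n) > 0).astype(int)
    return X, y

# Training data: 95% group A, 5% group B (underrepresented)
Xa, ya = make_group(1900, flip=False)
Xb, yb = make_group(100, flip=True)
X_train = np.vstack([Xa, Xb])
y_train = np.concatenate([ya, yb])

model = LogisticRegression().fit(X_train, y_train)

# Evaluate on fresh, equally sized samples from each group
for name, flip in [("group A", False), ("group B", True)]:
    X_test, y_test = make_group(1000, flip)
    print(name, "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 2))
# Expected pattern: high accuracy for group A, near-chance or worse for group B.
```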
The problem is especially prevalent in the sciences, owing to the use of historic research trials and 'sample' populations largely composed of the demographic group in power at the time (Western white men). Throughout history, different social sectors have disadvantaged gender and racial/ethnic minorities by excluding them from testing samples, research trials, and training datasets (1–5). A world that does not collect data on underrepresented groups produces social instruments that serve those groups less effectively.
"Where men shape technology, they shape it to the exclusion of women, especially Black women.” Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism
Dr Joy Buolamwini, a pioneer of AI bias research, highlighted this as an MIT Computer Science student when she demonstrated that a facial recognition system could not detect her darker skin, forcing her to wear a white mask in order to be recognised. Now leading the Algorithmic Justice League (AJL), Dr Buolamwini continues to demonstrate these issues through the Gender Shades project, which evaluates AI bias in commercial products, and the recent documentary ‘Coded Bias’ (6,7).
"Because when we say human, on the whole, we mean man". Caroline Criado Pérez, Invisible Women: Data Bias in a World Designed for Men
AI and Foundational Knowledge
While bias in the recruitment of participants for datasets is important, it is only one small part of a much wider picture. In fact, focusing solely on improving the representation of minoritised participants, even to the point where they make up more than 50% of a dataset, will not ensure the eradication of bias. The root of the issue does not start at recruitment but much further upstream, at the knowledge base from which the intention was set.
In designing datasets, we choose variables and parameters, e.g., the use of particular blood tests to predict a disease. Yet often the variables themselves, and the thresholds used to distinguish normality from abnormality, are poorly suited to heterogeneous cohorts. For example, researchers recently demonstrated that AI models created to predict liver disease under-performed for females (1). While female underrepresentation in the datasets was an issue, the choice of variables and the defined thresholds for medical abnormality posed a greater challenge. The original research populations used to develop these diagnostic blood tests were drawn from the US military (99% male at the time), so the thresholds for determining disease were tailored to male physiology (1). Females express disease differently in their blood, hence when they are evaluated against male blood-test thresholds their disease may remain undetected.
In this liver disease example, parameters tailored to male blood disadvantaged females. No matter how many more females were added to these datasets, the bias would not be resolved as long as women were being assessed according to male metrics (1). When a narrow demographic dominates a discipline, the domain develops with that one group’s needs, perspectives, and understandings in mind. Medicine lacked the perspectives of women, racial/ethnic minorities, and sexual and gender minorities until the last century, to the detriment of their health outcomes (5).
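A rough numerical sketch of this threshold effect is given below. The figures are invented for illustration and are not the reference ranges or results from the cited study (1): a cut-off calibrated on a simulated male biomarker distribution loses sensitivity when applied to a simulated female distribution that sits lower, and adding more female samples does nothing to change that.

```python
# Hypothetical illustration: a diagnostic cut-off derived from one group's
# biomarker distribution loses sensitivity when applied to another group.
import numpy as np

rng = np.random.default_rng(1)

# Simulated biomarker levels in *diseased* patients (arbitrary units, not real
# clinical reference ranges): females express the disease at lower levels.
male_diseased   = rng.normal(loc=80, scale=10, size=5000)
female_diseased = rng.normal(loc=65, scale=10, size=5000)

# Threshold chosen so that ~90% of diseased *males* are flagged as abnormal.
threshold = np.percentile(male_diseased, 10)

sens_male   = np.mean(male_diseased   > threshold)   # ~0.90 by construction
sens_female = np.mean(female_diseased > threshold)   # substantially lower

print(f"threshold = {threshold:.1f}")
print(f"sensitivity, males:   {sens_male:.2f}")
print(f"sensitivity, females: {sens_female:.2f}")
# Adding more female samples does not help while the threshold itself
# is anchored to the male distribution.
```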
In ‘Algorithms of Oppression’, Safiya Noble demonstrates how biases in our societal knowledge base permeate the digital structures that we rely on. Her evaluation of Google Search auto-completions illuminated damaging assumptions about and references to Black girls, often in a sexualised or derogatory manner. These biases do not stem from the underrepresentation of minorities in datasets; they are rooted in harmful belief systems that are embedded into the knowledge base of AI. Now that language models such as Word2Vec and GPT-3 are trained on billions of data points from the internet, these belief systems become integrated into the architecture of AI models (8). More recent research has demonstrated this in AI used within mental health, where researchers identified damaging sexist and racist assumptions related to mental health embedded in such models (8).
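One way to see such learned associations directly is to probe pretrained word embeddings. The sketch below is a generic illustration rather than the analysis performed in the cited studies; it assumes the gensim library is installed and that the public GloVe vectors can be downloaded (a one-off, sizeable download).

```python
# Sketch: probing a pretrained word-embedding model for gendered associations
# absorbed from web and encyclopaedia text (illustrative, not from ref 8).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

# Which occupations sit closer to "she" than to "he" in the embedding space?
for word in ["nurse", "engineer", "receptionist", "programmer"]:
    gap = wv.similarity(word, "she") - wv.similarity(word, "he")
    print(f"{word:>14}: she-vs-he similarity gap = {gap:+.3f}")
# Positive gaps indicate the occupation is embedded closer to "she";
# the pattern typically mirrors occupational stereotypes in the training text.
```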
AI Evaluation & Belief Systems
AI bias becomes reinforced when the individuals evaluating a model's decisions hold the stereotypical beliefs that the AI has reaffirmed. For example, the diagnosis of personality disorders in psychiatry is highly susceptible to clinician bias. Researchers have demonstrated that practitioners tend to perceive men as antisocial personalities and women as hysterical personalities, even when the patients have identical symptoms (2,3). Imagine, therefore, an AI system built for psychiatric use that continues to diagnose women with high rates of personality disorders. If the individual evaluating the predictions holds these traditional societal beliefs, they will reaffirm the system's output and so reproduce past discrimination. Those who evaluate the 'accuracy' of AI predictions are instrumental in teaching AI that harmful belief systems are correct. If those harmed by discriminatory belief systems are not at the table when an AI's output is evaluated, these ideologies will often go unchallenged and uncorrected.
“Representation of the world, like the world itself, is the work of men; they describe it from their own point of view, which they confuse with the absolute truth” - Simone de Beauvoir
AI & The Broader Design
When issues of AI bias are discussed, it is often in the context of a specific algorithmic model deployed in a decision-making process. However, the AI exists as a component of a wider system that may also exhibit biases in the design of its hardware or device components. For example, gender-based differences in cybersickness in virtual environments have been attributed to the androcentric design of these systems (9). The issue becomes more pertinent to AI when we enter a feedback loop: for example, pulse oximetry monitors have been shown to underperform in populations with darker skin (10). If these readings are taken as inputs to an AI model that predicts a patient's oxygen requirements, the data arriving at the model will already be sub-optimal for specific groups.
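The sketch below simulates this feedback effect with invented numbers (the upward reading shift and the 92% alarm threshold are illustrative assumptions, not the measured figures from reference 10): when a device inflates readings for one group, more genuine hypoxaemia in that group is missed before any downstream model ever sees the data.

```python
# Hypothetical simulation: a measurement device that overestimates readings for
# one group feeds degraded inputs into any downstream decision rule or model.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

true_spo2 = rng.normal(loc=91, scale=3, size=n)   # simulated true saturation (%)
dark_skin = rng.random(n) < 0.5                   # group indicator

# Assumed device behaviour (illustrative only): readings are shifted upward
# for patients with darker skin, plus ordinary measurement noise.
measured = true_spo2 + rng.normal(0, 1, n) + np.where(dark_skin, 2.0, 0.0)

# Downstream rule standing in for an AI model: flag hypoxaemia if reading < 92%.
flagged = measured < 92
truly_hypoxaemic = true_spo2 < 92

for label, mask in [("lighter skin", ~dark_skin), ("darker skin", dark_skin)]:
    missed = np.mean(~flagged[mask] & truly_hypoxaemic[mask]) / np.mean(truly_hypoxaemic[mask])
    print(f"{label}: share of true hypoxaemia missed = {missed:.2f}")
# The group receiving inflated readings has more hypoxaemia missed before the
# model even sees the data.
```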
Caroline Criado Pérez provides extensive examples in her book ‘Invisible Women’, offering the following insights:
“The average female handspan is between seven and eight inches, which makes the standard forty-eight-inch keyboard something of a challenge. Octaves on a standard keyboard are 7.4 inches wide, and one study found that this keyboard disadvantages 87% of adult female pianists. Meanwhile, a 2015 study which compared the handspan of 473 adult pianists to their ‘level of acclaim’ found that all twelve of the pianists considered to be of international renown had spans of 8.8 inches or above.”
“From development initiatives to smartphones, from medical tech to stoves, tools (whether physical or financial) are developed without reference to women’s needs, and, as a result these tools are failing them on a grand scale. And this failure affects women’s lives on a similarly grand scale: it makes them poorer, it makes them sicker, and, when it comes to cars, it is killing them. Designers may believe they are making products for everyone, but in reality they are mainly making them for men. It’s time to start designing women in”
Solutions
Solutions to AI bias sit at every point of the development pipeline, from the creation of an idea to deployment into the environment. We must (1) address the root knowledge and mainstream dogma that has been shaped by historic power relations; (2) once we approach a truth that is applicable across the population, ensure that representation is present in datasets; (3) as the model is built, challenge the belief systems present in the data; (4) when the model is evaluated, ensure that individuals are present who have the ability to identify and eradicate harmful stereotypical beliefs; and finally (5) evaluate all model predictions across demographic groups to identify those affected by poor model performance, as sketched below.
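As a minimal illustration of point (5), the sketch below stratifies standard performance metrics by a demographic column; the data frame and column names (`y_true`, `y_pred`, `sex`) are hypothetical placeholders for whatever a real evaluation pipeline would hold.

```python
# Minimal sketch of point (5): report model performance stratified by
# demographic group, rather than as a single aggregate figure.
import pandas as pd
from sklearn.metrics import recall_score, precision_score

# Placeholder results table; in practice this would come from a held-out test set.
results = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 1],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 0],
    "sex":    ["F", "F", "F", "F", "M", "M", "M", "M"],
})

for group, df in results.groupby("sex"):
    print(
        f"sex={group}: "
        f"sensitivity={recall_score(df.y_true, df.y_pred):.2f}, "
        f"precision={precision_score(df.y_true, df.y_pred):.2f}"
    )
# Large gaps between groups signal that the model, its inputs, or its
# thresholds are working less well for some patients than for others.
```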
References
1. Straw I, Wu H. Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction. BMJ Health & Care Informatics. 2022;29(1).
2. Hamberg K. Gender bias in medicine. Womens Health (Lond). 2008 May;4(3):237–43.
3. Krieger N, Fee E. Man-made medicine and women’s health: the biopolitics of sex/gender and race/ethnicity. 1994 [Internet]. [cited 2023 Jan 30]. Available from: https://journals.sagepub.com/doi/10.2190/LWLH-NMCJ-UACL-U80Y
4. O’Neil C. Weapons of Math Destruction [Internet]. [cited 2023 Jan 19]. Available from: https://www.google.co.uk/books/edition/Weapons_of_Math_Destruction/60n0DAAAQBAJhl=en&gbpv=1&dq=cathy+oneil+weapons+math+destruction&printsec=frontcover
5. Straw I. The automation of bias in medical Artificial Intelligence (AI): decoding the past to create a better future. Artificial Intelligence in Medicine. 2020;110:101965.
6. Spotlight - Coded Bias Documentary [Internet]. [cited 2023 Jan 30]. Available from: https://www.ajl.org/spotlight-documentary-coded-bias
7. Gender Shades [Internet]. [cited 2023 Jan 30]. Available from: http://gendershades.org/
8. Straw I, Callison-Burch C. Artificial Intelligence in mental health and the biases of language based models. PLoS ONE. 2020;15(12):e0240376.
9. Stanney K, Fidopiastis C, Foster L. Virtual Reality Is Sexist: But It Does Not Have to Be. Front Robot AI. 2020 Jan 31;7:4.
10. Covid: Pulse oxygen monitors work less well on darker skin, experts say - BBC News [Internet]. [cited 2023 Jan 30]. Available from: https://www.bbc.co.uk/news/health-58032842