How AI risks automating racism, prejudice, and human bias
Chris Middleton explains how problems in human society are being replicated – often accidentally – by artificial intelligence (AI).
• At 3,800 words and exhaustively researched, the most in-depth report available on this issue. This article has been quoted in London’s Evening Standard newspaper. Updated February 2018.
AI is the new must-have differentiator for technology vendors and their customers. Yet the need to understand AI’s social impact is overwhelming, not least because most AI systems rely on human beings to train them. As a result, existing flaws and biases within our society risk being replicated – not in the code itself, necessarily, but in the training data that is supplied to some systems, and in the problems that they’re being asked to solve.
Without complete data, AI programmes can never be truly impartial: they can only reflect or reproduce the conditions in which they are created, and/or the belief systems of their creators.
This report will explain how and why, and share some real-world examples. The need to examine these issues is becoming increasingly urgent. As AI, machine learning, deep learning, and computer vision rise, buyers and sellers are rushing to include AI in everything, from enterprise CRM to national surveillance programmes and policing systems.
Are people with tattoos criminals?
One example of AI in national surveillance is the FBI’s scheme to record and analyse citizens’ tattoos, in order to predict if people with ink on their skin will commit crimes. Take a ‘Big Bang’ view of this project (rewind the clock to infer what the moment of creation must have been), and it’s clear that a subjective, non-scientific viewpoint (‘people with tattoos are criminals’) was adopted as the core principle of a national security system, and software was designed to prove it.
The code itself is probably clean, but the problem that the system is being asked to solve, and the data it is tasked with analysing, are inherently flawed. Arguably, they betray the prejudices of the system’s commissioners. Why else would it have been conceived?
In such a febrile atmosphere, the twin problems of confirmation bias in research, and human prejudice in society, may become automated pandemics: AIs that can only tell people what they want to hear, because of how the system has been trained – but with a veneer of evidenced fact.
Often this part of the design process will be invisible to the user, who will regard whatever results the system produces as being impartial.
A recent AI white paper published by UK-RAS, the UK’s research organisation for robotics and AI, makes exactly this point: “Researchers saw how machine learning technology reproduces human bias, for better or for worse. [AI systems] reflect the links that humans have made themselves.”
That’s the view of the UK’s leading AI and robotics researchers. So, is AI automating prejudice and other societal problems? Or are these issues simply hypothetical?
The racist facial recognition system
The unfortunate fact is that they are already becoming real-world problems, in a significant minority of cases.
Take the facial recognition system developed at MIT recently that was unable to identify African American people, because it was created and tested within a closed group of white researchers.
The libraries for the system were distributed worldwide before an African American student at MIT exposed the fact that it could only identify white faces – a problem that has been endemic in imaging systems for decades, as this video explores.
Similar problems have been reported in other smart imaging systems, such as those developed by HP and Logitech.
The MIT story was shared by Joichi Ito, head of the organisation’s Media Lab, at the World Economic Forum 2017. He described his own students as “oddballs” – introverted white males working in small teams with few external reference points, he said.
The programmers weren’t consciously prejudiced, Ito explained, but it simply hadn’t occurred to them that their group lacked the diversity of the real world into which their system would be released.
A survey published in February 2018 by the New York Times found that this problem now exists across multiple AI systems. MIT’s Media Lab – perhaps inspired by its own experiences – carried out tests of facial recognition algorithms developed by IBM, Microsoft, and China’s Face++ [see separate report for more on Face++].
MIT found that gender was misidentified in up to 35 per cent of darker-skinned females, versus just seven per cent of lighter-skinned women. The gender of up to 12 per cent of darker-skinned males was misidentified, versus just one per cent of lighter-skinned men. The results also suggest that the training data was weighted towards males.
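To make the arithmetic behind figures like these concrete, here is a minimal sketch of how an audit might compare misidentification rates across groups. It is not MIT's method: the function name `error_rate_by_group` and the handful of sample records are invented purely for illustration, and a real audit would use thousands of labelled images.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: share of records where the prediction differs from the true label}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if predicted_label != true_label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical audit records, made up for this example:
# (group, true gender, gender predicted by the system)
sample_records = [
    ("darker-skinned female", "F", "M"),
    ("darker-skinned female", "F", "F"),
    ("darker-skinned female", "F", "F"),
    ("darker-skinned female", "F", "M"),
    ("lighter-skinned male", "M", "M"),
    ("lighter-skinned male", "M", "M"),
    ("lighter-skinned male", "M", "M"),
    ("lighter-skinned male", "M", "F"),
]

rates = error_rate_by_group(sample_records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} misidentified")
```

The point of splitting the results by group is that a single headline accuracy figure can hide exactly the kind of gap described above.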
The New York Times adds, “One widely used facial-recognition data set was estimated to be more than 75 percent male and more than 80 percent white, according to another research study.” However, it doesn’t link to the source of these figures.
Male dominance and insularity are big problems for the tech industry: in the UK, just 17 per cent of people in science, technology, engineering, or maths (STEM) careers are women, while in the West the overwhelming majority of coders are young, white males.
The UK-RAS report shares another example of racial bias entering AI systems: “When an AI program became a juror in a beauty contest in September 2016, it eliminated most black candidates, as the data on which it had been trained to identify ‘beauty’ did not contain enough black-skinned people.” Again, the humans training the AI unconsciously weighted the data.
The lesson here is not that any given AI or line of code is inherently biased – although it might be – it’s that the data that populates AI systems may reflect local/social prejudices.
At the same time, AI is seen as impartial, so any human bias risks being accepted as evidenced fact. Many AI systems are so-called ‘black box’ solutions (see below), making it hard for users to interrogate them to see how or why a result was arrived at. In short, many AI systems are inscrutable.
But in some cases the bias is both deliberate and overt.
In China, the billion-dollar facial recognition startup and platform Face++ includes an application that allows users to ‘beautify’ video footage by applying a skin-whitening algorithm.
Face++ also offers a ‘beauty score’ app that rates physical attractiveness. Those parameters must first have been programmed by a team of human beings. But who were they, and how did they train the system?
Staring bias in the face
The possibility that AI could worsen discrimination in human society is now being taken seriously by analysts and researchers. For example, The Age of Automation, a 2017 report by the RSA and YouGov, suggests that automation could lead to an “entrenchment of demographic biases”.
The RSA adds that the use of AI in recruitment – such as algorithms that screen CVs – could amplify workplace biases and block people from employment based on their age, ethnicity, sexual orientation, religious beliefs, or gender.
Yet discrimination can take subtler forms than gender bias or racism in the workplace, adds the report. “Equipped with AI systems, organisations will have greater precision in predicting people’s behaviours and the risks they face.
“This could lead to certain groups being denied access to goods, services, and employment opportunities.” (In China, a compulsory social ratings system is already being used to do just that, as this in-depth report, and this article, both explain.)
“Insurance companies, for example, may one day be able to use advanced algorithms to determine the likelihood of prospective customers acquiring a disease, making them uninsurable.”
These fears are shared by Lord Clement-Jones, who chairs the UK’s Parliamentary Select Committee on the economic, ethical, and social implications of AI. In September 2017, he said: “How do we know in the future, when a mortgage, or a grant of an insurance policy, is refused, that there is no bias in the system?
“There must be adequate assurance, not only about the collection and use of big data, but in particular about the use of AI and algorithms. It must be transparent and explainable, precisely because of the likelihood of autonomous behaviour. There must be standards of accountability, which are readily understood.”
Inequality and consent
In the post-Weinstein world, the unequal treatment of women in society is also in the spotlight, and there is evidence that AI and automated processes are reintroducing outmoded stereotypes in order to ensure that advertising has the broadest possible impact.
For example, speaking at the Brighton Digital Festival in October 2017, Dr Tanya Kant, Lecturer in Media and Cultural Studies at the University of Sussex, lamented the blanket targeting of adverts for pregnancy testing kits at women on YouTube, reintroducing overt social pressures that women have long fought against.
Consent adds another dimension to the question of bias in AI and automation programmes. It’s illegal to collect personally identifiable information (PII) or data (PID) without the subject’s consent – national security applications excepted. But if we do choose to divulge personal details, we may not be aware that an organisation is mining them to identify hidden traits in our personalities. In the future, the Ts & Cs we sign will demand detailed reading.
Consider this: the human face is the definitive example of PII, which is why our passports contain our photos and facial recognition systems can verify our identities. But when we have our pictures taken for ID purposes – or friends tag us on Facebook – are we consenting to an AI system analysing our features to predict our beliefs, politics, health, or sexual orientation?
This isn’t a science fiction scenario: such programmes already exist. Are they legal? Might they, too, be examples of confirmation bias? Would a citizen ever find out why they had been denied services, life insurance, employment, or admission? And what if the AI is wrong?
The legal dimension
So, why are all of these risks so important to consider?
Evidence is mounting that data problems may already have begun to automate bias within our legal systems: a real challenge as law enforcement becomes increasingly augmented by machine intelligence in different parts of the world.
COMPAS is an algorithm that’s already being used in the US to assess whether defendants or convicts are likely to commit future crimes. The risk scores it generates are used in sentencing, bail, and parole decisions – just as credit scores are in the world of financial services. A recent article published on FiveThirtyEight.com set out the alleged problem with COMPAS:
“An analysis by ProPublica found that, when you examine the types of mistakes the system made, black defendants were almost twice as likely to be mislabeled as likely to reoffend – and potentially treated more harshly by the criminal justice system as a result. On the other hand, white defendants who committed a new crime in the two years after their COMPAS assessment were twice as likely as black defendants to have been mislabeled as low-risk.
“An even stickier question is whether the data being fed into these systems might reflect and reinforce societal inequality. For example, critics suggest that at least some of the data used by systems like COMPAS is fundamentally tainted by racial inequalities in the criminal justice system.”
Again, this suggests a problem of flawed data being fed into an application that is seen by its users as impartial.
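The kind of analysis described in the quote can be sketched in a few lines. The example below is an illustration only, not ProPublica's code or data: the helper names `build_cases` and `error_profile` are hypothetical, and the counts are invented to show how two groups can be scored with the same overall accuracy while the mistakes fall in opposite directions.

```python
def build_cases(true_pos, false_pos, true_neg, false_neg):
    """Expand outcome counts into a list of (labelled_high_risk, reoffended) pairs."""
    return (
        [(True, True)] * true_pos
        + [(True, False)] * false_pos
        + [(False, False)] * true_neg
        + [(False, True)] * false_neg
    )


def error_profile(cases):
    """Return (false positive rate, false negative rate) for a list of cases."""
    false_pos = sum(1 for high, reoffended in cases if high and not reoffended)
    true_neg = sum(1 for high, reoffended in cases if not high and not reoffended)
    false_neg = sum(1 for high, reoffended in cases if not high and reoffended)
    true_pos = sum(1 for high, reoffended in cases if high and reoffended)
    fpr = false_pos / (false_pos + true_neg)  # wrongly labelled high-risk
    fnr = false_neg / (false_neg + true_pos)  # wrongly labelled low-risk
    return fpr, fnr


# Invented counts for two hypothetical groups of 100 defendants each.
# Both groups are scored correctly 60 times out of 100.
group_a = build_cases(true_pos=30, false_pos=20, true_neg=30, false_neg=20)
group_b = build_cases(true_pos=20, false_pos=10, true_neg=40, false_neg=30)

for name, cases in (("Group A", group_a), ("Group B", group_b)):
    fpr, fnr = error_profile(cases)
    print(f"{name}: false positive rate {fpr:.0%}, false negative rate {fnr:.0%}")
```

In this made-up data, Group A is twice as likely as Group B to be wrongly labelled high-risk, even though the system is equally 'accurate' for both. That is why the type of mistake matters as much as the number of mistakes.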
Tainted data in a networked system
The problem of tainted data runs deep in a networked society. The 2018 New York Times investigation of racial bias in AI algorithms, quoted above, mentions some more troubling statistics.
According to the NYT, researchers at the Georgetown Law School estimated that the faces of 117 million American adults – 47 per cent of all adults, and 36 per cent of the total population – are now logged in the facial recognition systems used by law enforcement agencies. It adds, “African Americans were most likely to be singled out [for inclusion], because they were disproportionately represented in mug-shot databases.”
This problem has been found anecdotally by a journalist colleague. In 2015, he shared a story with Facebook friends of how he searched for pictures of teenagers to accompany an article on youth IT skills – a simple image search.
When he searched for “white teenagers”, he said, most of the results were image library shots of young people; but when he searched for “black teenagers”, he was shocked to see Google return a disproportionately high number of criminal/suspect mugshots.
(Author’s note: I verified his findings at that time by repeating the experiment. However, the problem is far less overt today, suggesting that Google has tweaked its image search algorithm after widespread reports of similar problems.)
The underlying point is simple. For decades, overall media coverage in the US, Europe, and the UK has disproportionately focused on criminality within certain ethnic groups. This partial coverage populates the network, which in turn reinforces public perceptions: a vicious circle of confirmation bias feeding confirmation bias.
This is why diversity programmes and positive messaging are important; it’s not about ‘political correctness’, as some allege; it’s about rebalancing a system before we replicate it in software.
A problem with deep roots
This extraordinary article on Google search data reveals how prejudices run much deeper in human society than some of us would like to believe, and the problem is revealed by what we search for in private more than by what we say in public. (Sample quote: “Overall, Americans searched for the phrase ‘kill Muslims’ with about the same frequency that they searched for ‘martini recipe’ and ‘migraine symptoms’.”)
Human bias can affect the data within AI systems at both linguistic and cultural levels, because – as we’ve seen – most AI still relies on being trained by human beings. To a computer looking at the world through camera eyes, a human is simply a collection of pixels; AI has no concept of what a person is, or what human society might be. For that, it needs human input.
A computer has to be taught to recognise that a certain arrangement of pixels is a face, and that a different arrangement is the same thing. And human beings have to teach it what ‘beauty’ and ‘criminality’ are, by feeding it the relevant data.
The many different case studies cited above demonstrate that both of these concepts are subjective and prone to human error. At the same time, legal systems throughout the world have radically different views on crime – as we will see below.
The conclusion is inescapable. For better or worse, our IT systems tend to replicate our beliefs and personal values – including any misconceptions or omissions. At the same time, coders themselves often prefer the binary world of computers to the messy, emotional world of humans. Again, MIT’s Ito made this observation of his own students.
The proof of Tay
In 2016, Microsoft’s Tay chatbot disaster proved this point: a naïve robot, programmed by binary thinkers in a closed community.
As was widely reported at the time, Tay was goaded by users into spouting offensive views within 24 hours of release, as the AI learned from the complex human world it found itself in.
Humour and internet trolls weren’t part of its training. That’s an extraordinary omission for a chatbot let loose on a social network, and it speaks volumes about the mindset of its programmers.
However, the local/cultural dimension of AI was demonstrated by another story in 2016. In China, Tay’s counterpart, Microsoft’s Xiaoice chatbot, faced none of the problems that Tay did in the West. Chinese users behaved differently, and there were few reported attempts to subvert the application.
Arguably, this is proof that AI is both modelled on, and shaped by, local human society. Its artificiality does not make it neutral.
The rise of robocop
These issues will become more and more relevant as law enforcement becomes increasingly automated. The cultural landscape and legal system surrounding a robot policeman in, say, Dubai is very different to that in Beijing or San Francisco.
This is significant, because in each of these locations robots are already being trained and trialled by local police services: Pal Robotics’ Reem machines in Dubai (in public liaison/information roles); Knightscope K5s in the Bay Area (which patrol malls, recording “suspicious activity”, according to their maker); and Anbot riot-control bots in China.
Based on the findings above, there is little basis for assuming that future AI police officers (or bespoke applications) will implement a form of blank, globalised machine intelligence without bias or favour. It is more likely that they will reflect the cultures and legal systems of the countries in which they operate, just as human police do.
And the world’s legal systems are far from uniform. In Saudi Arabia, for example, to be an atheist is to be regarded as a terrorist, and women have far fewer rights than men. In Iran, homosexuality is punishable by death, as are offences such as the abandonment of religious belief (apostasy).
It’s comforting to believe that, in the real world, no one would design AIs or facial recognition algorithms to determine citizens’ private thoughts, political beliefs, or sexual orientation, and yet here’s an example of AI being deployed to predict if people are gay or straight.
Note how quickly this system has been developed within the current AI boom. Take the Big Bang approach once again, and ask: Why was this issue uppermost in the developer’s mind?
Now factor in robot police or AI applications enforcing laws in one culture that another culture might find abhorrent. The potential is clearly there for technology to be programmed to act against globally stated human rights.
Returning to race
Let’s again consider the risk of automating racial bias in this context.
In the US, the numbers of people shot by police are documented here by the Washington Post, while this report suggests that black Americans are three times more likely to be killed by officers than whites.
Meanwhile, this article exposes the racial profiling that occurs in some sectors of US law enforcement – despite attempts to prevent it – while this news story reveals that black Americans are between three and five times more likely to be charged with marijuana offences than white Americans, despite equal use of the drug in both groups.
In the UK, statistics reveal that force is more likely to be used against black Londoners by police than against any other racial group. This is the messy human world that robots are entering – robots that will be programmed by human beings.
The political context needs to be considered too. Throughout the world, politicians are increasingly targeting minority groups or removing legal protections from them, even in societies that we don’t usually regard as oppressive (from our own cultural standpoints).
In the US alone, recent examples include the proposed bans on people travelling from certain Muslim-majority countries, and on transgender people serving in the military, along with the proposed removal of legal protections for LGBTQ people and the proposed scrapping of the Obama-era DACA scheme. Russia is among several other countries to turn against LGBTQ citizens in what appears to be a concerted national campaign.
So might any future robocop perpetuate the apparent biases in the US or Russian legal systems, for example? As we’ve seen, that will depend on what training data has been put into the system, by whom, to what end, and based on what assumptions. The COMPAS case study above suggests that core data can itself be tainted at source, by previous flaws and inequalities in the legal system.
The limits of AI
But let’s get back to the technology itself. The UK-RAS white paper acknowledges that AI has severe limitations, at present, and that many users have “unrealistic expectations” of it.
For example, the report says: “One limitation of AI is the lack of ‘common sense’; the ability to judge information beyond its acquired knowledge […] AI is also limited in terms of emotional intelligence.”
Then the researchers make a simple observation that everyone rushing to implement the technology should consider: “true and complete AI does not exist”, says the white paper, and there is “no evidence yet” that it will exist before 2050.
So it’s a sobering thought that AIs with no common sense and possible training bias, and which can’t understand human emotions, behaviour, or social contexts, are being tasked with trawling context-free data pulled from human society in order to expose criminals – as defined by politicians.
And yet that’s precisely what’s happening in US and UK national surveillance programmes.
Opening the ‘black box’
The UK-RAS white paper takes pains to set out both the opportunities and the risks of AI, which it describes as a transformative, trillion-dollar technology, the future of which extends into augmented intelligence and quantum computing.
On the one hand, the authors note: “[AI] applications can replace costly human labour and create new potential applications and work along with/for humans to achieve better service standards. […] It is certain that AI will play a major role in our future life. As the availability of information around us grows, humans will rely more and more on AI systems to live, to work, and to entertain. […] AI can achieve impressive results in recognising images or translating speech.”
But on the other, they add: “When the system has to deal with new situations when limited training data is available, the model often fails. […] Current AI systems are still missing [the human] level of abstraction and generalisability. […] Most current AI systems can be easily fooled, which is a problem that affects almost all machine learning techniques.
“Deep neural networks have millions of parameters, and so to understand why the network provides good or bad results becomes impossible. Trained models are often not interpretable. Consequently, most researchers use current AI approaches as a black box.”
That last quote is telling: researchers are saying that some AI systems are already so complex that even their designers can’t say how or why a decision has been made by the software.
Organisations should be wary of the black box’s potential to mislead and to be misled, along with its capacity to tell people what they already believe – for better, or for worse.
Business and government should take these issues on board, and the systems they release into the wild must be transparent – as far back as the first principles that were adopted before any parameters were specified.
Moreover, the data that is being put into these systems should be open to interrogation, to ensure that AI systems are not being gamed to produce weighted results.
Regulations may help. The EU’s GDPR, which comes into force in May 2018, provides citizens with a new right to see information about the logic involved in, and the “significance and envisaged consequences of”, any automated decision-making systems that affect them.
Users: question your data before you ask an AI to do it for you, and challenge your preconceptions.
• Further reading:
Could populism be a side effect of social algorithms? (Opendemocracy)
How Google search data reveals the truth of who we are (Guardian).
Face-reading AI will be able to detect your politics, claims professor (Guardian).
When AI and privacy meet (Constellation Research)
© Chris Middleton 2017