Yes, machine translation technology has become surprisingly good in recent years. But that does not mean that everything is rosy in the MT camp. While BLEU scores are improving, the problem of bias has emerged. Translations produced by machines often display biases, including but not limited to gender bias.
Gender bias can happen when you are translating an expression that is gender-neutral in the source language but needs to be gender-specific in the target language, such as English doctor to German Arzt (male doctor) or Ärztin (female doctor).
If the machine is translating a sentence containing such a word from English into German, a lot depends on whether there is anything in the source text that could help the machine guess the gender of the person.
A sentence such as she is our new doctor is safe: the pronoun she is enough to nudge any well-trained machine translator toward the female translation of doctor (so, Ärztin not Arzt). But what if there are no such clues, as in I am your new doctor? How is a machine supposed to know which gender I refers to? This is an unresolvable ambiguity.
Machines usually “resolve” unresolvable ambiguities by picking whichever translation is statistically more likely, based on what they have seen more often in their training data. So doctors and directors tend to be translated as male, cleaners and caregivers as female: the translations are biased.
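The "pick whatever is more frequent" behavior described above can be illustrated with a minimal sketch. The counts below are invented for illustration and the function name `pick_translation` is hypothetical; real MT systems do not store explicit lookup tables like this, but the effect on ambiguous words is similar.

```python
from collections import Counter

# Invented counts: how often each German form was seen as the translation
# of an English word in some imaginary training corpus.
TRAINING_COUNTS = {
    "doctor": Counter({"Arzt": 900, "Ärztin": 100}),
    "caregiver": Counter({"Pfleger": 200, "Pflegerin": 800}),
}


def pick_translation(word: str) -> str:
    """Return the most frequent translation seen for `word`.

    With no context to go on, the majority form always wins,
    which is exactly how the bias arises.
    """
    return TRAINING_COUNTS[word].most_common(1)[0][0]


if __name__ == "__main__":
    for word in TRAINING_COUNTS:
        print(word, "->", pick_translation(word))
```

However many times the ambiguous sentence is translated, the minority form never appears, even though it would be correct for a sizeable share of users.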
Gender bias affects not just nouns but also adjectives (I am happy in French is either je suis heureux if male or je suis heureuse if female) and, of course, pronouns (when translating gender-neutral pronouns from languages that have them).
There are other kinds of biases besides gender bias. One common cause of bias in MT is the English pronoun you. Translating you into other languages often implies having to decide whether it is singular or plural, formal or informal.
If there are some clues in the source text that can signal these distinctions, then good. But if there are no such clues (as in where are you?) then again we have an unresolvable ambiguity and the machine has to make an assumption based on statistical likelihood.
The result is that machine translators tend to favor translating you as singular and formal because that is what occurs most frequently in the (mostly written and formal) texts they have been trained on. That may not be what the user intended when they typed this sentence into a machine translator, but the machine is oblivious to that.
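To make the size of the problem concrete, here is a small sketch that enumerates the possible readings of where are you? using German as the target language. German is chosen only as an illustration (it is not discussed above for this example), and the table and function names are hypothetical.

```python
from itertools import product

NUMBERS = ("singular", "plural")
REGISTERS = ("informal", "formal")

# German renderings of "where are you?" for each reading of "you".
# The formal form "Sie" is the same for singular and plural.
GERMAN_WHERE_ARE_YOU = {
    ("singular", "informal"): "Wo bist du?",
    ("plural", "informal"): "Wo seid ihr?",
    ("singular", "formal"): "Wo sind Sie?",
    ("plural", "formal"): "Wo sind Sie?",
}


def readings():
    """List every (number, register, translation) combination."""
    return [
        (number, register, GERMAN_WHERE_ARE_YOU[(number, register)])
        for number, register in product(NUMBERS, REGISTERS)
    ]


def distinct_translations():
    """Return the distinct target sentences, in first-seen order."""
    return list(dict.fromkeys(GERMAN_WHERE_ARE_YOU.values()))


if __name__ == "__main__":
    for number, register, text in readings():
        print(f"{number:8} {register:8} {text}")
```

Four readings collapse into three distinct German sentences, and a machine translator that always outputs only one of them is silently discarding the others.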
In theory, unresolvable ambiguities can occur on any word or expression in any language pair, and you will meet them in unexpected places.
One example among many: translating river into French requires knowing whether it is a small river that flows into another river (rivière) or a large river that flows into the sea (fleuve), but this information is often not present in the source text. So a machine translator has to make an assumption, and there we have it: another biased decision.
The technical definition of bias is that it is the tendency of an automated system to make the same kind of assumptions again and again. Then there is the popular definition of bias, which is basically the same but with an implication of offense, harm, and injustice.
It is easy to see how gender bias, of all biases, can cause offense and perpetuate undesirable stereotypes. But, at their core, all MT biases are basically word-sense disambiguation problems: we have an expression in the source language that, when seen from the perspective of the target language, has two (or more) senses. The translator needs to disambiguate, to decide which one the author had in mind. Sometimes that is doable (when there is enough context to go on) and sometimes it is not. When it is not, we have an unresolvable ambiguity.
Unresolvable ambiguities are different from other occasions when the MT simply gets things wrong. No AI, however smart, can ever guess what you meant if there is no trace of it in the text. This means that we cannot fix MT bias just by improving existing AI. The only way is to ask the user to disambiguate manually.
When you think about it, this realization (that we cannot fix MT bias just by improving the AI and that we have to ask users to disambiguate manually) is quite a game changer. It means we can no longer treat machine translation as a linear process, as a black box where text goes in in one language and comes out in another. The process will have to become more interactive, with humans in the loop.
Do major MT players such as Google and DeepL know this? Well, it seems that it is beginning to dawn on them. Google in particular has been taking gender bias seriously and, as far back as 2018, launched a manual disambiguation feature for gender-specific nouns and gender-neutral pronouns in some language pairs such as English–Spanish. DeepL also has a manual disambiguation feature in some language pairs, not for gender but for switching the form of address between formal and informal.
Adding manual disambiguation into machine translation poses many challenges. First, the software needs to be able to detect that an unresolvable ambiguity has occurred. Second, it has to be able to produce alternative translations depending on the user’s choices.
Google has got to where it is now by manually annotating its training data and getting its language models to “know” about gender even when it is not overtly expressed.
DeepL is notoriously secretive but it is reasonable to assume their methods are similar.
An example of a different approach is Fairslator, a plug-in that works with any machine translator and only examines its output. A rule-based algorithm in Fairslator scans the source text for unresolvable ambiguities and another algorithm re-inflects the translation according to the user’s choices. (Disclosure: The author of this article is Fairslator’s founder.)
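The two steps named above, detecting an unresolvable ambiguity and producing alternative translations for the user to choose from, can be sketched in a toy rule-based form. This is not Fairslator's actual code or anyone else's: the word lists, the hard-coded translation table, and all function names are assumptions made purely for illustration.

```python
import re

# Hypothetical mini-lexicon of English nouns that need a gender in German.
GENDERED_NOUNS = {"doctor", "director", "cleaner", "caregiver"}

# Words in the source text that give the gender away (a very crude rule).
GENDER_CLUES = {"he", "him", "his", "himself", "she", "her", "hers", "herself"}

# Hypothetical pre-computed alternatives; a real system would re-inflect
# the machine translator's output instead of looking sentences up.
ALTERNATIVES = {
    "i am your new doctor": {
        "male": "Ich bin Ihr neuer Arzt.",
        "female": "Ich bin Ihre neue Ärztin.",
    },
}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def find_unresolvable(sentence: str) -> list[str]:
    """Return gendered nouns for which the sentence offers no gender clue."""
    tokens = tokenize(sentence)
    if GENDER_CLUES.intersection(tokens):
        return []  # some clue is present, so the machine can disambiguate
    return [token for token in tokens if token in GENDERED_NOUNS]


def alternative_for(sentence: str, gender: str) -> str:
    """Return the translation matching the user's manual choice."""
    return ALTERNATIVES[" ".join(tokenize(sentence))][gender]


if __name__ == "__main__":
    sentence = "I am your new doctor"
    if find_unresolvable(sentence):
        # In an interactive tool, this is where the user would be asked.
        print(alternative_for(sentence, "female"))
```

The clue rule is deliberately naive: a pronoun such as her may refer to someone other than the doctor, so a realistic detector would need syntactic analysis rather than a word list.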
The bottom line is that the problem of MT bias is solvable with technology, but at a price. We will have to wean our users off the idea that MT is a black box, and bring them into the loop.
In a world where we are increasingly running out of room for further improvement in human-likeness (as measured by BLEU scores and the like), the next frontier for MT is ambiguity.