The Bias Dilemma: The Ethics of Algorithmic Bias in Natural-Language Processing


Feminist Philosophy Quarterly 

Volume 8 | Issue 3/4 Article 1 

 
Recommended Citation 
Deery, Oisín, and Katherine Bailey. 2022. “The Bias Dilemma: The Ethics of Algorithmic Bias in Natural-Language 
Processing.” Feminist Philosophy Quarterly 8 (3/4). Article 1. 

2022 

 
 

Oisín Deery  

York University and Macquarie University 

oisin@oisindeery.com 
 

Katherine Bailey 

Shopify Inc. 

katherine@katbailey.net 
 

Published by Scholarship@Western, 2022 

The Bias Dilemma:  
The Ethics of Algorithmic Bias in Natural-Language Processing 

Oisín Deery and Katherine Bailey* 
 
 
Abstract 

Addressing biases in natural-language processing (NLP) systems presents an 
underappreciated ethical dilemma, which we think underlies recent debates about 
bias in NLP models. In brief, even if we could eliminate bias from language models or 
their outputs, we would thereby often withhold descriptively or ethically useful 
information, despite avoiding perpetuating or amplifying bias. Yet if we do not debias, 
we can perpetuate or amplify bias, even if we retain relevant descriptively or ethically 
useful information. Understanding this dilemma provides for a useful way of 
rethinking the ethics of algorithmic bias in NLP. 

 
Keywords: artificial intelligence, algorithms, bias 
 
 
1. Introduction  

Bias in statistical models or algorithms can result in moral harms. A model is 
an abstract representation of a phenomenon, such as how words relate in a natural 
language. We prefer the term “model” to “algorithm,” since strictly speaking 
“algorithm” refers only to the actual sequence of steps carried out during the training 
of a model or at the time it issues predictions. However, everyday use of “algorithm” 
generally picks out the already-trained model, which implicitly includes the data on 
which that model was trained. If the data are biased, then there are biased 
algorithms.1 Yet for simplicity’s sake, we will mostly stick with “model.” 

Reliance on biased models can exacerbate marginalization of vulnerable 
groups. For instance, the ethical issues related to statistical models in hiring have been 
widely discussed (e.g., Hu and Chen 2017). Problems can arise even when models are 
deployed to reduce the influence of bias. Thus, recidivism models aim to reduce the 
influence of bias in decision-making when sentencing offenders yet often serve only 
to reproduce or disguise bias (Kehl, Guo, and Kessler 2017).2 

* Oisín Deery, Departments of Philosophy at Macquarie University, Sydney, Australia 
& York University, Toronto, Canada (oisin@oisindeery.com) 
Katherine Bailey, Shopify Inc., Toronto, Canada (katherine@katbailey.net) 
1 Algorithms can be biased for many reasons (see sections 6.3 and 8). We focus on 
bias deriving from data. 

More generally, numerous ethical questions arise in relation to machine 
learning (ML) and artificial intelligence (AI), most of which we leave aside. Instead, our 
focus is on bias in the data on which ML models are trained, and especially language 
models. The central issue here is that the corpora on which such models are trained 
typically contain biases—for example, sexist or racist biases. That is because the 
corpora comprise examples of how people actually use language, and people use 
language in sexist and racist ways.3 Thus, the outputs of models trained on such data 
reflect biases (see, e.g., the papers surveyed by Blodgett et al. 2020). We must ask 
(1) whether it is, practically speaking, possible to debias language models and (2) 
whether we should do so, if it is possible.  

In relation to (2), a central issue is whether there might be practical or ethical 
value to not debiasing models. Some maintain—as we do—that not debiasing can be 
valuable, since models often reflect descriptively accurate information about how 
language is actually used (see Goldberg 2021). By contrast, others claim that not 
debiasing runs the risk of perpetuating and amplifying biases (Bolukbasi et al. 2016). 
Some argue that existing models do not actually provide descriptively accurate 
information at all, since the corpora on which the models are trained do not reflect 
marginalized voices (Bender, Gebru et al. 2021).  

These debates do not deploy the descriptive-normative distinction that we will 
outline, yet this distinction helps to clarify the points of contention in such debates. 
Furthermore, even if we had descriptively accurate models—for example, models 
that do not exclude marginalized voices—there might be practical and ethical value 
to not debiasing, especially when we can control for the negative effects of bias at the 
output stage rather than by debiasing the model itself.  

Once we recognize that there might be an ethical value to not debiasing, we 
encounter an ethical dilemma. Either we do not debias, in which case we retain 
descriptive accuracy and potentially ethically useful information yet run the risk of 
perpetuating or amplifying bias; or instead we debias, thereby avoiding perpetuating 
bias yet losing descriptive accuracy and ethically useful information. Our aim is to 
draw attention to this dilemma and to show how it can help in rethinking the ethics 
of algorithmic bias in NLP systems.  

2 Statistical recidivism models aim at reducing the influence of biases (explicit or 
implicit) of judges in sentencing. Yet the data these models use are typically obtained 
from questionnaires that criminals are asked to complete. These data often include 
details about a criminal’s upbringing, family, and social connections. Such data ought 
to be irrelevant to a criminal’s sentencing. Nevertheless, since the data are used to 
generate a “recidivism score” for criminals, which is used in sentencing, the data 
influence sentencing in a way that they should not. For example, higher scores often 
result in longer sentences. More perniciously, a longer sentence will often 
subsequently result in a higher score on the recidivism scale. 
3 Data can also be biased due to bad sampling. We do not focus on such bias (but see 
sections 6.3 and 8).  

In section 2, we specify the type of bias we will focus on. In section 3, we 
outline the distinction between descriptive and normative correctness as often 
competing virtues of NLP models, by analogy with how biases rooted in statistical 
regularities are often thought to offer support for beliefs even when these beliefs are 
morally problematic. In section 4, we illustrate our distinction by focusing on a 
particular variety of NLP model called word-embeddings. In section 5, we discuss 
practical problems with debiasing NLP models. These problems lend support to our 
claim that it is often preferable to control for negative effects at the output stage 
rather than try to debias the models themselves. In section 6, we consider varying 
degrees of scope for debiasing NLP systems’ outputs. In section 7, we explain the 
ethical dilemma. In section 8, we discuss how our view helps to rethink the ethics of 
algorithmic bias in NLP.  
 
2. How (and How Not) to Control for Morally Relevant Bias  

Both human cognition and artificial systems can exhibit morally relevant bias. 
In humans, the relevant bias is typically a negative evaluative tendency regarding 
other people based on their apparent membership in a socially salient category or 
group (Brownstein 2018, 43, 126, 172; Brownstein and Saul 2016a, 2016b). Such 
biases are often explicit, consisting of mental attitudes (like belief) that an agent has 
and consciously endorses. For example, someone is explicitly sexist if they openly 
admit to being sexist and approve of these attitudes. By contrast, implicit biases are 
unconscious attitudes that can be difficult to inhibit, seem insensitive to an agent’s 
explicit attitudes, and are often in conflict with an agent’s explicitly held attitudes (see 
Levy 2017 for a review of the empirical literature). For example, even someone who 
explicitly condemns sexism might still have implicit sexist attitudes that they are 
unaware of and that are difficult to inhibit.  

When people act on explicit biases, we hold them morally responsible for 
doing so, and as a result we blame them for their behavior (see, e.g., Brownstein and 
Saul 2016b). It is more difficult to know whether we can blame or hold people 
responsible for actions caused by implicit biases, since it is unclear whether people 
have sufficient control over how their biases influence behavior for them to be 
responsible for these behaviors (Levy 2017).4 Even so, it is plausible to think that we 
need, at least, to control for the negative influences of implicit bias when expressed 
in behavior.5 There can be structural ways of doing so. For example, the biased hiring 
of male lead violinists in orchestras (whether explicit or implicit) can be controlled for 
by having musicians audition behind a screen so that the hiring panel cannot see the 
violinist’s gender (see, e.g., Goldin and Rouse 2000). 

Bias in the relevant sense is something that it can be ethically problematic for 
a system to have, and it is especially bad when a system’s outputs have negative 
ethical effects by causing harms. Human cognition exhibits implicit bias, and various 
ML systems, including NLP systems, exhibit a close analog of such bias.6 In each case, 
the relevant system learns or acquires its biases from statistical regularities it 
identifies in the world.7 We humans learn biases from other members of our 
communities.8 For ML systems, biases are often learned from training data, which for 
NLP systems are language corpora that we produce. Since we produce the data 
containing the biases, the NLP systems learn their biases from us.9  

4 The view that control is not needed for responsibility is a minority position (e.g., 
Smith 2008). It is more widely held that responsibility of the sort that might justify 
blaming or praising does require control. The empirical literature suggests that we 
lack such control over how implicit biases affect our behavior (Levy 2017).  
5 We make no claim about whether agents can be responsible for actions caused by 
their implicit attitudes—although one of us defends this view in print (Deery 2021). 
Here, we claim only that even when behaviors are caused by implicit attitudes, we 
need to do something to control for their negative effects. 
6 The relevant biases for ML systems are analogous to implicit biases because such 
biases are not explicitly represented in models but instead take the form of implicit 
distributed patterns—e.g., between vectors in a model. We expand on this analogy 
between human cognition and ML systems in section 3. 
7 Some human cognitive or perceptual biases may be innate. They are not our focus 
here.  
8 In cases where a population uses words in certain ways and people acquire biases 
from exposure to that usage, the exposure might be mediated by how the media 
portrays such language use. Even so, we acquire biases from what we are exposed to 
(the world), which here would include how the media portrays language use. 
9 Might we have more control over the inputs to ML systems than we do for 
ourselves? If so, we could control for biases at the input stage. Yet the corpora on 
which ML systems are trained are vast, and to eradicate biases from them would be 
an impractical task. That is why those who seek to eliminate bias from models typically 
focus on the already-trained models (see section 5 for details). Exceptions include 
badly sampled data and algorithms that are biased for other reasons (see sections 6.3 
and 8 below).  

To preview one of our central claims: It can be difficult to debias human 
cognition, especially for implicit biases, and we think it is also difficult to debias ML 
systems, including NLP systems. For humans, it is often easiest to control for the 
negative effects of bias at the output stage. As noted already, auditioning musicians 
can perform behind a screen, thereby defusing the possibility of implicit biases’ 
influencing hiring. As a general matter, it seems that the negative effects of bias are 
often best controlled for when we acknowledge bias in a system and control for its 
negative influences at the output stage rather than by debiasing the system itself.10 
The possibility we explore in this paper is whether we might do something similar for 
NLP systems. Moreover, we claim that there can be an ethical value to not debiasing 
ML systems or their outputs, and we maintain that the question of whether we should 
debias raises an ethical dilemma. 
 
3. Descriptive vs. Normative Accuracy 

To explain our claim that there can be an ethical value to not debiasing models 
or outputs, we rely on a particular way of distinguishing descriptive accuracy from 
normative correctness.11 We explain our distinction by analogy with how implicit 
biases in human cognition can offer support for beliefs even when these beliefs are 
morally problematic. In this way, people’s epistemic reasons can support biased 
beliefs even when their ethical reasons cut against these beliefs. In relation to racist 
bias, Rima Basu outlines this epistemic/ethical conflict as follows: 
 

We may find ourselves facing the following conflict: what if the 
evidence . . . supports something we morally shouldn’t believe? For 
example, it is morally wrong to assume, solely on the basis of 
someone’s skin color, that they’re a staff member. But, what if you’re 
in a context where, because of historical patterns of discrimination, 
someone’s skin color is a very good indicator that they’re a staff 
member? When this sort of normative conflict looms, a conflict 
between moral considerations on the one hand and what you 
epistemically ought to believe given the evidence on the other, what 
should we do? It might be unfair to assume that they’re a staff 
member, but to ignore the evidence would mean risking inaccurate 
beliefs. Some . . . have suggested that we simply face a tragic 
irresolvable dilemma. (Basu 2020, 191; cf. Gendler 2011; Puddifoot 
2017) 

 
In other words, descriptively accurate information about statistical regularities can 
support beliefs that are morally problematic or normatively incorrect, in which case 
we face a choice about whether to prioritize the descriptive or instead the normative. 
Likewise, we think that there are two general ways in which ML systems or humans 
can be right or wrong—descriptively and normatively—and that a dilemma similar to 
the one Basu describes can arise even for ML systems.  

10 If the bias comes from badly sampled data, that is a different matter.  
11 We introduce the normative/descriptive distinction at some length since we have 
found it is unfamiliar to nonphilosophers working in AI and ML. Yet these researchers 
find it useful when explained.  

To explain, a visual representation of a scene gets things descriptively right to 
the extent that it accurately depicts visual information about a scene as presented 
and wrong to the extent that it does not. For example, if you photograph me robbing 
a store, your camera gets things descriptively right insofar as it accurately depicts my 
face and wrong insofar as it does not—for example, because its exposure settings are 
off. Most tasks a camera is designed to perform require for success that at least some 
threshold of descriptive accuracy (in our sense) be achieved in the camera’s outputs. 
Likewise, an audio recording of my boss making sexist or racist remarks is descriptively 
accurate to the extent that it records what he said and descriptively wrong to the 
extent that it does not, perhaps due to electromagnetic interference from another 
device. 

By contrast, we expect people—as moral agents—to get things normatively 
right in a moral sense. It is normatively correct for me to pay for my groceries instead 
of stealing them (all else being equal). When I rob a store, by contrast, I get things 
normatively incorrect by failing to do what I ought to do (pay for my groceries) and by 
doing what I should not (stealing them). Similarly, my boss behaves normatively 
correctly by not making sexist or racist remarks.  

We do not expect systems like cameras or audio devices to get things 
normatively right. Consider an imaginary camera, the iBlur, which blurs out the 
morally objectionable parts of a visual scene.12 As a result, the iBlur blurs out the 
image of me robbing the store, since robbing a store is a morally objectionable action 
(all else being equal). Not only do we not expect a camera to work like this but we 
expect it not to work this way, since such an imposition of normative correctness 
undermines descriptive accuracy, which the functions of a camera require.13 
Additionally, the iBlur would fail to provide us with ethically useful information, since 
the authorities will need an accurate depiction of my face to hold me accountable.14 
And this is likewise the case for an audio recording of my boss making sexist or racist 
remarks. 

12 We model our iBlur after the Arkangel system in the TV show Black Mirror. 
13 Cameras serve various functions, including the attainment of artistic or imaginative 
goals. Likewise, security cameras might be used to intimidate people or to shed light 
on the activities of select people in select places in a way that is morally problematic. 
Yet the tasks cameras perform still require that cameras accurately depict what is in 
front of them, and artistic or nefarious goals are served via such accuracy.  

Regarding descriptive accuracy, one might worry whether devices such as 
cameras really do get things descriptively right (even if we grant that there is less of a 
worry in this regard for audio-recording devices). For instance, Shen-yi Liao and Bryce 
Huebner (2021, 94) have argued that cameras have a “light-skin bias.” The early 
development of Kodak film printing required that skin tones be matched to an image 
of a white model, with the result that “darker skin tones . . . [were] . . . over saturated, 
or under-lit, so the only images that . . . looked ‘right’ were images of light-skinned 
people” (95). Moreover, “light-skin bias . . . is not primarily a technical issue . . . . Film 
emulsions could have been designed that were more sensitive to a wider range of skin 
tones,” (95) but they were not. Why? Because “deeply entrenched forms of racial 
ignorance and racial biases led the people who were developing emulsion 
technologies to ignore variations in skin tone, or assume that racialized differences 
were irrelevant to the design of this technology” (95; see also Roth 2009; Smith 2013; 
Benjamin 2019). 

However, when we use cameras as a paradigm case of descriptive accuracy, 
we are considering an ideal—a system that does get things descriptively right (in a 
relevant regard). If a camera using Kodak film fails to accurately depict darker skin 
tones, then in that regard it will be descriptively inaccurate, notwithstanding its 
perhaps achieving descriptive accuracy in other regards.  

Still, there are two important lessons. First, if a lack of descriptive accuracy in 
any particular respect is due to bias, as in the Kodak film case, we must be alive to the 
possibility that for language models, even when we appear to have descriptive 
accuracy about how people use language, we might not, at least in some relevant 
respect, perhaps due to bias in the training data.15 Second, there are various respects 
in which a system or model might get matters descriptively right about how people 
use language, and different systems might have differing aims in this regard—for 
example, a model might aim for descriptive accuracy in one respect yet not in another. 
For instance, word-embeddings models (which we will discuss in detail in section 4) 
aim at capturing how words cluster in a vector space based on how people actually 
use language, and these models might be assessed for descriptive accuracy in this 
respect. As we shall see, “woman” and “homemaker” do cluster together, whereas 
“woman” and “computer programmer” do not. Even so, such a model is clearly 
descriptively inaccurate in another sense, since these clusterings are descriptively 
inaccurate when it comes to these words’ semantics—there is nothing about 
“woman,” semantically, in virtue of which it clusters with “homemaker” (by contrast 
with “bachelor,” which semantically clusters with the word “man”). 

14 We will say more about why descriptively accurate information can be ethically 
useful in section 6. But note that even if knowing who perpetrated a wrongdoing is 
ethically useful information, we agree with Benjamin (2019) that often we have to ask 
prior questions about whose faces are being disproportionally recorded by cameras 
(e.g., members of overpoliced or oversurveilled communities), and in particular by 
devices such as CCTV. These considerations are consistent with our claims.  
15 This issue will be important later, when we discuss large language models. Some 
argue that models like BERT or GPT-2/3 not only reproduce bias (and are thereby 
normatively incorrect) but are descriptively inaccurate since the training data do not 
reflect marginalized voices (Bender, Gebru et al. 2021).  

One might think that whether a representation is descriptively accurate is a 
function of who we are and what we want to do with it—in which case, descriptive 
accuracy seems context-dependent. But this issue is more one of salience, not 
accuracy. Many representations are accurate even if some information they convey 
is not salient, and salience depends on the context. The family photograph I took in 
my front garden is descriptively accurate in depicting us smiling. In the context of 
sending a holiday card to friends, this information may be most salient. Yet the 
photograph might also accurately depict the hole in my eavestrough, and in a 
different context—such as getting a quote for fixing the hole—this descriptively 
accurate information will be of primary salience rather than the fact that we are 
smiling.16  

Finally, note that descriptive accuracy does not require perfect descriptive 
accuracy. Many representations of aspects of the world are descriptively accurate 
even when they are less than perfectly accurate. That is the case even for scientific 
theories, including theories of physics (see, e.g., Hitchcock 2004, 10). Moreover, a 
visual representation can be accurate without being perfectly accurate. For instance, 
canine visual systems are optimized more for low-light conditions than color, by 
contrast with our own vision (Barber et al. 2020). Even so, dogs’ vision is descriptively 
accurate in many respects and is often more descriptively accurate than ours. Finally, 
it is a truism that all models are inaccurate to some degree (see, e.g., Box 1976, 792) 
while still being usefully descriptively accurate (and thus also predictively or 
explanatorily useful). 

Leaving aside descriptive accuracy, a system or its outputs are normatively 
correct if they reflect the world as it ought to be—in some respect and on some 
conception of how the world should be, even if that is not how it is. Language models 
are normatively correct if they reflect language use as it should be. For example, a 
model might be normatively correct in not giving sexist outputs. Such a system would 
not resolve the analogy task “Man is to computer programmer as woman is to x” by 
filling in “homemaker” for x (we will discuss this example in greater detail in section 
4). By contrast, normatively incorrect models or outputs reflect language use as it 
ought not to be; such a system might fill in “homemaker” for x in our example.17 For 
our purposes, morally relevant bias in a language model is sufficient for that model to 
be normatively incorrect, and in such cases these models reflect bias, as outlined in 
section 2, whereas normatively correct models or outputs do not (nor, consequently, 
do they perpetuate bias). A system might also be normatively correct in one regard 
(e.g., in not giving sexist outputs) yet incorrect in another (e.g., in giving racist 
outputs).18 

16 We thank Shelley Park (at the 2021 FSJAI Workshop) for prompting this clarification.  

Additionally, normative correctness can have various goals. Not reflecting 
ethically problematic biases might be one normatively correct goal, whereas other 
goals might include promoting outcomes related to social justice. We acknowledge 
these various goals as being normative in a relevant sense. Even so, our primary focus 
will be on the aim of not reflecting bias.  

To see more clearly how descriptive and normative correctness relate to bias 
in NLP systems, let us consider the widespread NLP method of word-embeddings, 
which we will now describe.  
 
4. Word-Embeddings: Descriptively Right, Normatively Wrong 

Later, we will consider applications that rely on large language models. Yet, 
for now, in order to introduce the idea of bias in NLP models, we will outline the 
simpler case of word-embeddings.  

Word-embeddings are learned representations of words that aim to 
approximate the semantic relationships between words through mathematical 
relationships between vectors (Mikolov et al. 2013). A word-embeddings model 
learns from vast language corpora produced by us, which are therefore examples of 
how we use language. By definition, a good set of representations will accurately 
reflect how language is, as a matter of fact, used by the sampled people. In this sense, 
word-embeddings do a remarkably good job of getting things descriptively right, since 
they are usefully deployed for many purposes—for example, to successfully resolve 
analogy tasks. Yet there is a sense in which word-embeddings get things normatively 
wrong. 

17 Normatively incorrect outputs need not be descriptively accurate, although often 
they are (see section 6.3).  
18 Arguably, the best way to ensure that language use is as it should be and that 
language models trained on such language use are normatively correct would be to 
have the nonlinguistic part of the world that the language use aims to describe—e.g., 
the proportion of women and men who are computer programmers—be as it should 
be. After all, people associate “he” with programmers more than “she” because 
traditionally most programmers have tended to be men. Yet over time, if more 
programmers were women, that would be less true. In that case, what is normatively 
correct would also be descriptively accurate. 

In their paper, “Man Is to Computer Programmer as Woman Is to 
Homemaker,” researchers at Boston University and Microsoft Research (Bolukbasi et 
al. 2016) demonstrated that pretrained models such as word2vec seem sexist in 
certain ways. Just as the analogy relationships “Man is to woman as king is to queen” 
and “Sister is to woman as brother is to man” were captured by the models, so too 
were the sexist relationships “Man is to computer programmer as woman is to 
homemaker” and “Father is to doctor as mother is to nurse.”19  
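
As a rough illustration of how such analogy tasks are typically resolved, the sketch below answers “a is to b as c is to x” with simple vector arithmetic and cosine similarity. The three-dimensional vectors are hypothetical values chosen by hand for this example, and the names TOY_VECTORS and solve_analogy are assumptions made for illustration; nothing here is taken from word2vec or from the data analyzed by Bolukbasi et al. (2016).

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (not word2vec output).
# Axes, loosely: [gender association, royalty, occupation].
TOY_VECTORS = {
    "man":        [1.0, 0.0, 0.0],
    "woman":      [-1.0, 0.0, 0.0],
    "king":       [1.0, 1.0, 0.0],
    "queen":      [-1.0, 1.0, 0.0],
    "programmer": [0.5, 0.0, 1.0],
    "homemaker":  [-0.5, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors=TOY_VECTORS):
    """Resolve 'a is to b as c is to x' by finding the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words themselves are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(solve_analogy("man", "king", "woman"))
print(solve_analogy("man", "programmer", "woman"))
```

With these toy values the first call returns “queen” and the second returns “homemaker.” The same arithmetic that recovers the benign analogy also reproduces the sexist one, because the occupation vectors were deliberately given a gender component to mimic biased usage.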

The difficulty is that to work successfully, NLP systems relying on word-
embeddings often have to learn the biases that exist in the corpora on which they are 
trained (Caliskan, Bryson, and Narayanan 2017, 186). These biases are expressed by 
us in the corpora, and to learn the relationships that actually exist between words in 
our uses of language, the models must learn biased relationships—including sexist 
relationships. Biases in the corpora on which the models are trained will thus naturally 
be captured in the geometry of the word-embeddings vector space. Worse, according 
to Bolukbasi and colleagues’ assessment, “The blind application of machine learning 
runs the risk of amplifying [emphasis added] biases present in data” (Bolukbasi et al. 
2016).20 
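
One way to picture what it means for bias to be captured in the geometry of the vector space is to measure how far a word leans along a direction defined by a gendered word pair. The sketch below does this for a handful of hypothetical, hand-picked two-dimensional vectors. The vectors, the choice of the man-minus-woman direction, and the name gender_projection are all assumptions made for illustration, and the sketch is not a reconstruction of any published measurement procedure.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (not from a trained model).
# Axes, loosely: [gender association, occupation].
TOY_VECTORS = {
    "man":        [1.0, 0.0],
    "woman":      [-1.0, 0.0],
    "programmer": [0.5, 1.0],
    "homemaker":  [-0.5, 1.0],
    "teacher":    [0.0, 1.0],
}


def unit(v):
    """Scale a non-zero vector to length one."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gender_projection(word, vectors=TOY_VECTORS):
    """Cosine between a word vector and the man-minus-woman direction.

    Positive values lean towards "man", negative values towards "woman",
    and zero indicates no lean along this direction.
    """
    direction = unit([m - w for m, w in zip(vectors["man"], vectors["woman"])])
    return sum(d * x for d, x in zip(direction, unit(vectors[word])))


for word in ("programmer", "teacher", "homemaker"):
    print(word, round(gender_projection(word), 3))
```

In this toy space “programmer” has a positive projection, “homemaker” a negative one of equal size, and “teacher” a projection of zero. No single entry states that programmers are men; the association exists only as a relationship between vectors.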

Still worse, word-embeddings models often get things normatively wrong 
precisely because—and to the extent to which—they get things descriptively right 
about people’s language use. That is, the success of systems using word-embeddings 
can require reproducing bias, as when they solve analogy tasks like “Man is to 
computer programmer as woman is to x” by filling in “homemaker” for x. A model 
gets things normatively wrong in producing this output, despite its being descriptively 
right in reflecting actual language use. And it gets things normatively wrong because 
and to the extent to which it gets things descriptively right in this way.21  

19 For criticism of Bolukbasi et al. (2016), see Schluter (2018), Nissim, van Noord, and 
van der Goot (2019), and footnote 24 below.  
20 Bias is amplified in two ways. First, sexist bias (for example) in a language model is 
acquired from training data that include expressions of sexism from (p number of) 
individuals expressing the bias. This model may then be used for various purposes in 
the world such that its outputs reach many more people (p + n) than those people (p) 
who expressed the biases that influenced the training data. Mere repetition is widely 
known to increase people’s tendency to believe claims or to reproduce bias, including 
in written language (e.g., Lacassagne, Béna, and Corneille 2022). In this way, bias gets 
amplified in the population due to biased models. Second, this larger pool of 
individuals (p + n) is therefore more likely to reproduce examples of language 
containing bias than it would have been otherwise, and these examples may be used 
as input for training further models. Thus, the bias gets amplified to a wider range of 
models than it would have been otherwise; see Bolukbasi et al. (2016) for a similar 
example.  

The ideal solution might be to debias ourselves. In that case, we would 
produce language corpora that do not contain biases, the NLP models would be 
trained on such data, and all would be well. Yet that is not a practical suggestion, much 
as it might be nice to imagine living in such a world. Or we might debias the models 
that have acquired biases. As mentioned earlier, it can be difficult, as cognitive science 
shows, to eliminate biases from human cognition. We think that it is no easier to 
debias NLP models. This raises a practical problem, which we outline in section 5, 
and an ethical dilemma, which we explain in section 7.  
 
5. Debiasing: Practical Problems  

Word-embeddings and large language models are trained using neural 
networks whose training objective is to minimize error on a particular task, often that 
of predicting each word in a text given the surrounding words (see, e.g., Mikolov 
et al. 2013). That is the training task. When we consider downstream tasks, we are 
taking a model that has already been trained on a training task using a massive 
language corpus and fine-tuning it or using it as is on a new task. The new task could 
be text classification, which might then be used in a further application—for example, 
a content-management system that adds keywords and categories to content. 
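
As a rough illustration of the training task just described, the sketch below builds (context, target) examples from a single sentence. It shows only how the prediction problem is set up; no neural network is trained, and the function name and window size are assumptions made for illustration.

```python
def context_target_pairs(tokens, window=2):
    """Pair each word with the words around it (up to `window` on each side)."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


sentence = "the nurse greeted the doctor".split()
pairs = context_target_pairs(sentence, window=1)
for context, target in pairs:
    # A model would be trained to predict `target` from `context`.
    print(context, "->", target)
```

Because the targets come from text that people actually wrote, whatever regularities that text contains, biased ones included, become part of what the model is rewarded for predicting.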

This case is an example of transfer learning, where learning from one task on 
a massive corpus is transferred to another task on a different, smaller corpus. Often, 
the so-called pretrained model is made publicly available by those who trained it and 
can end up being used in countless downstream applications that the original 
developers of the model might never know about. That is what happened with 
word2vec, the word-embeddings model we mentioned above (Mikolov et al. 2013), 
and BERT, one of the first large language models to be made publicly available (Devlin 
et al. 2019). Debiasing efforts have tended to focus on this initial stage of the training 
pipeline.22 The resulting models are subsequently released for use in various 
applications; if we debias these pretrained models, then the applications built 

 
21 We acknowledged in section 3 that such a model is descriptively inaccurate 
regarding these words’ semantics—there is nothing about the word “woman,” 
semantically, in virtue of which it clusters with “homemaker.” 
22 Here, word-embeddings models are the initial stage of the training pipeline and can 
subsequently be used in downstream applications (Mikolov et al. 2013). Blodgett et 
al. (2020) surveyed 146 papers on bias in NLP systems and 54 of these papers 
specifically dealt with bias in word-embeddings systems.  



downstream of them will also be debiased unless bias is somehow introduced later. 
Consequently, many researchers see debiasing the models as an attractive solution. 
However, we think that researchers would do better to focus on 
debiasing the outputs of models rather than the models themselves. In turn, we claim that 
given the serious practical difficulties with debiasing models, it behooves us to 
consider whether the biased models we are left with are always bad. We will argue 
that they are not (in, e.g., section 7 below). 

In their previously mentioned paper, Bolukbasi and colleagues (Bolukbasi et 
al. 2016) introduce a method they call “hard-debiasing,” which aims to reduce sexist 
bias within an embeddings vector space without compromising the overall structure 
of that space or, therefore, its usefulness. They claim that their method can 
“significantly reduce gender bias in embeddings while preserving [their] useful 
properties such as the ability to cluster related concepts and to solve analogy tasks” 
(Bolukbasi et al. 2016, 1).23, 24 
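
The geometric idea behind this kind of debiasing can be sketched as follows. This is a simplified illustration under stated assumptions, not Bolukbasi and colleagues' implementation: the three-dimensional vectors are invented, the gender direction is taken from a single word pair, and only the step of removing a word's component along that direction is shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def neutralize(vec, direction):
    """Remove the component of `vec` along the unit vector `direction`."""
    scale = dot(vec, direction)
    return [a - scale * d for a, d in zip(vec, direction)]


# Invented toy embeddings (assumption: real embeddings have hundreds of
# dimensions and the direction is estimated from many word pairs).
toy = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.3],
}

gender_direction = normalize([a - b for a, b in zip(toy["he"], toy["she"])])
debiased = neutralize(toy["programmer"], gender_direction)
print(debiased)
```

After this step the toy vector for "programmer" has no component along the chosen direction. Whether bias has thereby been removed, rather than merely hidden from this one measurement, is a separate question.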

We doubt whether word-embeddings can be completely debiased using this 
method. What Bolukbasi and colleagues call hard-debiasing works largely by 
maintaining humans in the loop at multiple steps to identify and compensate for 
biases picked up by the model. This feature is partly attractive, since keeping humans in the 
loop is frequently beneficial both practically and ethically (especially when the stakes are 
high, as in medical applications, since the incidence of harmful errors will thereby be 
lowered; arguably, humans must always be kept in the loop in such cases). Yet as 
applied to word-embeddings, the end product is less a well-oiled machine that works 
by itself and more an old TV that must be continually retuned for clear reception.25 
The retention of humans in the loop as a means of policing a system’s acquisition of 
bias also raises worries about implicit bias in human cognition. In particular, people 

 
23 There are also technical aspects to Bolukbasi and colleagues’ hard-debiasing 
method that we do not discuss.  
24 A word about analogy tasks: Nissim, van Noord, and van der Goot (2019) outline 
various problems with using analogy tasks for bias detection. At their worst, some 
algorithms ignore input vectors, so that “A is to B as C is to x” is often prevented from 
returning A, B, or C for x (Nissim, van Noord, and van der Goot 2019, 491–92; cf. 
Schluter 2018). Nissim and colleagues think using analogy tasks to detect bias in 
embeddings results in claims about biases in models where the evidence does not 
warrant it and may even conceal existing biases.  
25 A solution might be not to use such models. We are concerned with what to do 
given that they are used. 



may unconsciously add biases even if they succeed in eliminating others (see Caliskan, 
Bryson, and Narayanan 2017).26 

Finally, it is unclear whether we can realistically remove all morally relevant 
biases from a word-embeddings model while preserving its usefulness. Any tweaks 
made to a pretrained word-embeddings model to increase its normative correctness 
in a particular regard will result in a less descriptively accurate model regarding actual 
language use, at least in cases where the models reflect bias. Many applications will 
be negatively affected by this reduced accuracy. For example, sentiment analysis 
and text classification rely on descriptively accurate representations, so pretrained 
models that have been debiased in this way seem a poor fit for those tasks. 

Such practical problems with debiasing suggest that a more effective way to 
control for the negative effects of bias in word-embeddings might instead be to see 
how bias manifests itself in a system’s outputs and control for bias there to reduce 
the negative effects.  

 
6. Applications 

Controlling for bias at the output stage means we can retain descriptive 
accuracy. Exactly how we decide to control for bias in this way will, of course, depend 
on the application in question. Let us therefore consider some applications. 
 
6.1. Translation 

Machine translation is the oldest NLP task and is often regarded as one of the 
field’s major successes. Deep-learning techniques similar to those used to train 
word-embeddings models such as word2vec and large language models such as BERT are employed in today’s 
most commonly used machine-translation systems, such as Google Translate. One 
difference is that these systems are trained end-to-end, meaning that there is no 
breaking up of the problem into separate steps whereby a language model is first 
trained on a separate training task and only later fine-tuned for the task of translation. 
Instead, the translation task is the training task (Wu et al. 2016).  

In spite of their successes, machine-translation systems still encounter trouble 
with pronouns. Douglas Hofstadter (2018) has written about the “shallowness” of 
machine translation, pointing to such systems’ inability to understand what they are 
translating. Many of Hofstadter’s examples are of mistranslating pronouns. The 
reason pronouns are so difficult is that to translate a pronoun correctly, a system must 

 
26 Bolukbasi et al. (2016) concede that their method is limited. Gonen and Goldberg 
(2019) argue that even the technical aspects of Bolukbasi and colleagues’ method 
serve primarily to mask the problem of gender bias without actually eliminating it. 
They conclude that, “While the bias is indeed substantially reduced …, the … effect is 
mostly hiding the bias, not removing it” (Gonen and Goldberg 2019, 609).  



know what it refers to. With gendered pronouns, the problem is clear. Some 
languages do not have gendered pronouns, and those that do can behave very 
differently. For example, the gender of a French possessive pronoun agrees with 
the noun possessed, not with the possessor (as it does in English). Hofstadter illustrates the problem as 
follows: 

 
I began my explorations . . . using the following short remark . . . 
 

In their house, everything comes in pairs. There’s his car and 
her car, his towels and her towels, and his library and hers. 

 
. . . Here’s what Google Translate gave me [translating into French]: 

 
Dans leur maison, tout vient en paires. Il y a sa voiture et sa 
voiture, ses serviettes et ses serviettes, sa bibliothèque et les 
siennes. 

 
The program fell into my trap, not realizing, as any human reader 
would, that I was describing a couple, stressing that for each item he 
had, she had a similar one. . . . The deep-learning engine used the word 
sa for both “his car” and “her car,” so you can’t tell anything about 
either car owner’s gender. Likewise, it used the genderless plural ses 
both for “his towels” and “her towels,” and in the last case of the two 
libraries, his and hers, it got thrown by the final s in “hers” and 
somehow decided that that s represented a plural (“les siennes”). 
Google Translate’s French sentence missed the whole point. 
(Hofstadter 2018; bold emphasis added) 

 
Data-driven approaches seem unlikely to solve this sort of problem. Yet gendered 
pronouns are an obvious place where bias reveals itself in machine translation. For 
example, Turkish does not have gendered pronouns, yet there are numerous examples 
of Turkish sentences translated into English by Google in which a genderless pronoun 
becomes gendered according to whether the noun names a stereotypically male 
or stereotypically female profession: “he” for doctor, “she” for nurse, and so forth. Thus, “O 
bir doktor” will be translated as “He is a doctor,” even though the Turkish pronoun 
“O” is ungendered and can be translated as either “he” or “she.” In 2018, Google 
introduced gender-specific translations in order to avoid gender bias in Google 
Translate. As a result, when you enter “O bir doktor,” the system now offers a 
choice—either “he” or “she”—and it lets the end user decide on the correct 
pronoun.27  
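
This kind of output-stage control can be pictured with the hypothetical sketch below. It is not a description of how Google Translate is implemented; the function name, the pronoun table, and the flag marking an ungendered source pronoun are all assumptions made for illustration.

```python
# Hypothetical table of English subject pronouns to swap.
SWAPS = {"He": "She", "She": "He"}


def offer_alternatives(translation, source_pronoun_is_ungendered):
    """Return both gendered variants when the source pronoun had no gender."""
    words = translation.split()
    if source_pronoun_is_ungendered and words and words[0] in SWAPS:
        alternative = " ".join([SWAPS[words[0]]] + words[1:])
        return sorted([translation, alternative])
    return [translation]


# A model's single output for "O bir doktor" is post-processed so that
# the end user, not the model, chooses the pronoun.
options = offer_alternatives("He is a doctor.",
                             source_pronoun_is_ungendered=True)
print(options)
```

A surface rule of this sort leaves the underlying model untouched and does nothing to resolve a pronoun from surrounding context.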

But when context is provided, Google Translate can still get it wrong. If you are simply 
translating “O bir doktor,” there is no wrong answer, since there is no context 
provided; either “he” or “she” is acceptable. However, with a real piece of text, 
Google’s system reveals its bias. For example, it will provide the following (correct) 
translation from English into Turkish:  

 
Did you meet John’s brother? He is a nurse. 
John’un erkek kardeşiyle tanıştın mı? O bir hemşire. 

 
Yet when translating this Turkish sentence back into English, the output is: 

 
Did you meet John’s brother? She is a nurse. 

 
This result is a double fail: The translation is incorrect (even if this usage of “she” in 
relation to “nurse” remains descriptively accurate about people’s actual language 
use), and the output is biased. Moreover, it reveals the changes that Google 
implemented as relatively superficial and insubstantial. 
 
6.2. Assorted NLG Tasks  

Translation is an example of a natural-language-generation (NLG) task, where 
the output is a sentence or sentences in a natural language. Other examples of NLG 
tasks include speech recognition, image captioning, and dialogue systems. One naïve 
suggestion for systems that perform NLG tasks might be that they should never give 
biased outputs, such as sexist or racist outputs.  

When we look at particular applications, we can see why this suggestion is 
naïve. If a speech-recognition system receives as audio input the sentence “A proper 
wife should be as obedient as a slave,” the only appropriate output is text of these 
words.28 There is no scope for debiasing without undermining the task that the system 
is designed to perform. Descriptive accuracy trumps normative correctness in this 
case, since, as with a camera, the functions of a speech-recognition system require 
descriptive accuracy.  

 
27 We made this suggestion in a talk we gave at Google in 2018. Much as we would 
like to take credit for the change, it was no doubt something that the people at Google 
had already been discussing long beforehand.  
28 This phrase, “A proper wife should be as obedient as a slave,” often appears online 
as attributed to Aristotle. It seems to be a broad paraphrase of scattered parts of 
Aristotle’s Oeconomica, Book III, section 1, paragraph 2. 



In translation tasks, there may not be only one acceptable output, since for 
any phrase we input, there may be multiple acceptable ways of translating it into 
another language. Even so, there appears to be limited scope for debiasing or 
imposing normative correctness. The output must be an accurate translation of the 
sentence—for example, “A proper wife should be as obedient as a slave”—into the 
target language, however sexist or morally objectionable that sentence is. Again, we 
cannot impose normative correctness without undermining accuracy.  

With image captioning, we begin to see scope for imposing normative 
correctness. Clearly, there are numerous suitable captions that a system might 
suggest for any image. For a photo of a woman kitesurfing, a sexist person might 
suggest the caption, “A disobedient wife.” Yet we would hope that no image-
captioning system would ever do so. Thus, there seems to be a normative constraint 
on the output. Obviously, we do not want an image-captioning system to label a photo 
of a kitesurfing woman with “Solar system,” or “Proton,” since these captions are 
descriptively inaccurate. Assuming the aim is descriptive accuracy regarding the 
image’s content, the caption must accurately describe what is in the image. Because 
images are typically more informationally rich than linguistic descriptions of them, 
most captions (even if they are partly descriptively accurate) will only be minimally 
accurate. As a result, “Woman sailing” might be minimally accurate, while “Woman 
kitesurfing” might be better. But “Woman kitesurfing on a Slingshot kiteboard in light 
seas and moderate winds” might be too accurate for most contexts. In any case, 
“Disobedient wife” is not an accurate description in virtue merely of the informational 
content of the image. This caption might be descriptively accurate of the situation 
being depicted, were we to learn (for example) that this kitesurfing woman in fact 
disobeyed her sexist husband’s command never to kitesurf, since the husband 
believes wives—being women—should not kitesurf. Once we understand the caption 
“Disobedient wife” in a pejorative sense—as implying something like, “This woman, 
in virtue of kitesurfing, is behaving as no woman should behave”—we can see it would 
be normatively incorrect to allow the caption. Most image-captioning systems would 
be unlikely to provide this caption in any case, since they would not have encountered 
it in their training data. Beyond these considerations, we also want a system to be 
normatively correct in not labeling the image with “A disobedient wife” even if the 
system had been exposed to this label in its training, since the label is sexist—that is, 
normatively incorrect. 

Finally, consider dialogue systems. Here, there can be appropriate and 
inappropriate ways to respond to a question, even descriptively. In answer to the 
question, “What is the weather like in Sydney?” a system can reply “Sunny” or 
“Beautiful,” but presumably not “Subatomic” or “Bread.” Yet if someone asked, “How 
obedient should a proper wife be?” the system should never answer, “A proper wife 


Deery and Bailey – The Bias Dilemma: The Ethics of Algorithmic Bias in Natural-Language Processing 

Published by Scholarship@Western, 2022  17 

should be as obedient as a slave.”29 What is an acceptable output in the case of speech 
recognition should never be output by a dialogue system. Here, we reach maximal 
scope for imposing normative correctness on a system’s outputs.  

The upshot is, first, that it can be preferable—not only practically but ethically 
(we discuss why this is an ethical issue in sections 7 and 8)—to control for biases in 
ML models and systems at the output stage rather than to try to debias a model and 
expect to deploy it for downstream tasks in ways that will not reproduce bias. Second, 
even when we focus on controlling for biases at the output stage, NLG applications 
must be treated differently, including from an ethical perspective. In some cases we 
can reduce bias; in others we cannot. 
 
6.3. Search  

Search is a different application. In considering sources of bias in ML systems, 
Harini Suresh and John Guttag (2020) discuss two that are relevant both to search and 
to our purposes in this paper: historical bias and representation bias (see also 
Fazelpour and Danks 2021). The latter results from imperfect measuring or 
sampling. By contrast, according to Suresh and Guttag (2020, 4), “historical bias arises 
even if the data is perfectly measured and sampled, if the world as it is or was leads 
a model to produce outcomes that are not wanted.” They provide an example of this 
bias: 
 

In 2018, 5% of Fortune 500 CEOs were women . . . . Should image 
search results for “CEO” reflect that number? Ultimately, a variety of 
stakeholders, including affected members of society, should evaluate 
the particular harms that this result could cause and make a judgment. 
This decision may be at odds with the available data even if that data 
is a perfect reflection of the world. Indeed, Google has recently 
changed their Image Search results for “CEO” to display a higher 
proportion of women. (Suresh and Guttag 2020, 5)  

 
In our terms, Suresh and Guttag claim that descriptive accuracy (“the world as it is or 
was”) conflicts here with normative correctness (i.e., we get “outcomes that are not 
wanted”). That is to say, imposing normative correctness by displaying a higher 
proportion of women in the image search results may be descriptively at odds with 

 
29 If we asked, “How obedient does Aristotle say a proper wife should be?” the 
appropriate answer might be “Aristotle says that a proper wife should be as obedient 
as a slave.” Yet when asking the system itself, it seems clear that imposing normative 
correctness by forbidding such a response is justified. 



the data even if the data are “a perfect reflection of the world.”30 By altering the 
results, Google would seem to sacrifice descriptive accuracy in favor of normative 
correctness, by glossing over the actual disparity between women and men who are 
CEOs. This inaccuracy highlights an ethical problem. To possess such descriptively 
accurate information is often to acquire ethically useful information—in this case, the 
information might be used as evidence for the claim that there is indeed a disparity 
between women and men CEOs, perhaps in the course of redressing the disparity. 
However, once Google alters the results as described, one effect is that we lack access 
to this descriptively accurate and ethically useful information and so injustices of this 
sort might go underrecognized.  

Moreover, it is implicit in Suresh and Guttag’s discussion of their example that 
there is a value both to descriptive accuracy and to normative correctness, although 
Suresh and Guttag are unclear about whether they think it can be more valuable to 
prioritize one form of accuracy or correctness over the other. After all, while some 
ethical purposes might best be served by Google’s decision to change its results for 
“CEO” to display more images of women (thereby making the results more 
normatively correct), other ethical purposes may be directly undermined and would 
be better served by descriptively accurate results; so, Google’s strategy may be 
ethically undesirable in other respects. That is not to say Google made the wrong 
decision. It is only to say that there might be a cost to prioritizing either descriptive 
accuracy or normative correctness in search, as with NLP systems relying on word-
embeddings.  

In fact, we have reservations about whether our ethical dilemma applies in 
Suresh and Guttag’s CEO case, or in search more widely. That is because it is unlikely 
that Google Search (or search engines generally) actually provides us with results that 
are descriptively accurate in the first place. For example, Safiya Umoja Noble (2018) 
outlines how Google’s search results can be racist in ways that are driven by internal 
factors, such as an agenda built into Google’s algorithms, or by external factors, such 
as how savvy users can deliberately influence results. 

To the extent that search results are not descriptively accurate, our ethical 
dilemma does not arise (as we explain in detail in section 7). We confront the dilemma 
only when we have descriptively accurate outputs that are normatively incorrect, and 
the incorrectness is due to the accuracy. In such cases, the dilemma consists in our 
having to decide whether to prioritize descriptive accuracy (including its ethical 

 
30 In another widely discussed case, The Guardian reported in 2016 how Twitter user 
Kabir Alli conducted a Google image search using the phrases “three black teenagers” 
and “three white teenagers” (Allen 2016). Results were very different. The search for 
“three black teenagers” returned mostly police mugshots while for “three white 
teenagers” it gave stock photographs of smiling white youths. 



usefulness) at the expense of normative correctness or vice versa. If search 
results are normatively incorrect (as in Suresh and Guttag’s case) but not also 
descriptively accurate, there is no dilemma (as we explain in sections 7 and 8 below); 
we must simply address the matter by trying to attain normative correctness.31  

Even so, in an idealized case where search results did reflect descriptive 
accuracy and were normatively incorrect, our dilemma would frequently arise. After 
all, descriptively accurate information can—as in the case of word-embeddings 
models, for instance—be ethically useful in studying biases (as expressed in 
language), how they are perpetuated, and how to address them. 
 
7. The Ethical Dilemma  

It can sometimes be ethically valuable not to debias NLP models or their 
outputs. For one thing, not debiasing enables us to uncover biases and can thereby 
prevent us from thinking that all is right with the world (in some regard) when it is 
not. However, a lot depends on the application under consideration. With speech 
recognition, there is no room for imposing normative correctness, and doing so might 
even be ethically problematic. Imagine we are listening to Donald Trump making a 
speech on TV and our speaker is not working. Instead, we are relying on a live-
captioning feed. We would be poorly served by this system were it to impose 
normative correctness by producing an eloquent discourse on feminism. We need to 
know what Trump is actually saying—unpalatable as it might be—in order to hold him 
accountable.  

At the other end of the spectrum is search. If search results do not reflect 
descriptive accuracy (see, e.g., Noble 2018), the ethical dilemma that we outline does 
not arise. If search results were somehow to reflect descriptive accuracy but were 
normatively incorrect, we might confront the dilemma. We would then be faced with 
two options: either prioritize descriptive accuracy over normative correctness, which 
has the potential cost of perpetuating or amplifying bias, or instead prioritize 
normative correctness, with the potential cost of withholding ethically useful 
information even if we thereby avoid perpetuating or amplifying bias.  

Recall that descriptive accuracy, as we use this term, means sufficient 
descriptive accuracy in a relevant regard (all models are inaccurate in some—and 
presumably many—regards, even with respect to a target phenomenon). Without 
descriptive accuracy we cannot proceed usefully and there is no dilemma. Call a 
situation Dilemma-land when we have a descriptively accurate model that, in virtue 
of its descriptive accuracy, gives normatively incorrect (i.e., biased) outputs. By 
contrast, call a situation Utopia-land when we have a descriptively accurate model 

 
31 Alternatively, we could strive for descriptive accuracy, in which case our dilemma 
might apply after all. 



that gives normatively correct outputs. There is no way of getting to Utopia-land until 
people no longer express biases (e.g., sexism or racism) in the language corpora on 
which we train models.  

What if we have descriptively inaccurate models? Perhaps, in that case, we 
only need normative correctness. However, an inaccurate model may fail to 
sufficiently perform the task it is meant to perform, since inaccuracy often 
undermines usefulness. Achieving normative correctness without descriptive 
accuracy would, in any case, put us in Fantasy-land; if a model exhibits normative 
correctness but is descriptively inaccurate, it is out of touch with reality and is thus a 
fantasy (although sometimes, as we acknowledge below, we may have reason to 
prefer such fantasies to descriptive accuracy—for example, normatively correct 
outputs may serve to work against amplification of bias in some cases, even if the 
model is descriptively inaccurate).  

Achieving normative incorrectness and descriptive inaccuracy puts us instead 
in Disaster-land. Although it may be possible to get from Fantasy-land or Disaster-land 
to Utopia-land, it is highly unlikely. (We leave confirmation to our readers.) More 
likely is that we simply cycle back and forth between Fantasy-land and Disaster-land 
by achieving normative correctness/incorrectness in our model yet without 
descriptive accuracy. Either way, until we have descriptive accuracy, we cannot get to 
either Dilemma-land or Utopia-land; and (as indicated above) Dilemma-land remains 
a more likely destination than Utopia-land anyhow. 

Only in Dilemma-land does the dilemma arise, since for it to arise a model 
must exhibit descriptive accuracy and, in virtue of this descriptive accuracy, produce 
normatively incorrect outputs (if it gives normatively incorrect outputs resulting from 
something else, that is a different matter that must be addressed on its own terms). 
In Dilemma-land, we must decide whether to prioritize descriptive accuracy or instead 
aim for normative correctness by debiasing. Debiasing may undermine the usefulness 
of the model and might undermine descriptive accuracy and thus withhold ethically 
useful information. Yet prioritizing descriptive accuracy runs the risk of perpetuating 
or amplifying the relevant biases. If a model exhibits descriptive inaccuracy, there is 
simply no dilemma to confront given that we would never want to prioritize 
descriptive inaccuracy over either normative correctness or incorrectness. 
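
The taxonomy in this section can be tabulated as a small decision function. The sketch below is only a restatement of the four cases in code; the type and parameter names are invented for illustration, and the fifth value covers the case set aside above, in which normatively incorrect outputs result from something other than descriptive accuracy.

```python
from enum import Enum


class Land(Enum):
    UTOPIA = "Utopia-land"
    DILEMMA = "Dilemma-land"
    FANTASY = "Fantasy-land"
    DISASTER = "Disaster-land"
    OTHER = "no dilemma: address the incorrectness on its own terms"


def classify(descriptively_accurate, normatively_correct,
             incorrect_because_accurate=False):
    """Map a model's status onto the situations distinguished in the text."""
    if not descriptively_accurate:
        # Without descriptive accuracy there is no dilemma to confront.
        return Land.FANTASY if normatively_correct else Land.DISASTER
    if normatively_correct:
        return Land.UTOPIA
    # Accurate but normatively incorrect: the dilemma arises only when the
    # incorrectness is due to the accuracy.
    return Land.DILEMMA if incorrect_because_accurate else Land.OTHER


print(classify(True, False, incorrect_because_accurate=True).value)
```

Only one of the five outcomes is Dilemma-land, which reflects the claim that the dilemma arises under quite specific conditions.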

Our aim is to draw attention to this dilemma. Language models and various 
applications (even potentially including search) that rely on them can be used for 
different purposes. Sometimes these purposes are best served by having the model 
reflect descriptive accuracy even when that entails reflecting bias. Those purposes 
will, moreover, often be served precisely because the biased results provide us with 
ethically useful information. However, we do run the risk of amplifying bias. In other 
cases, an application might be used for a different purpose, such that it is more 
important to avoid amplifying bias than to retain ethically useful information. Then, 
we may prefer normatively, not descriptively, correct results.  

Either way, there is a cost. We also maintain that there is sometimes an ethical 
value to the less intuitive horn of the dilemma, on which we retain ethically useful 
information by not imposing normative correctness. Even so, we recognize that in 
doing so we can amplify bias.  
 
8. Rethinking the Debate about Bias in NLP 

Our way of distinguishing normative correctness and descriptive accuracy 
adds nuance to recent debates about algorithmic bias in NLP and can help us to 
understand these debates better.  

For instance, as outlined above, Suresh and Guttag’s (2020) historical bias can 
be understood as arising when we have outputs that are descriptively accurate (regarding 
people’s use of language) about how the world is, yet these outputs are normatively 
incorrect. Thus, our distinction can help to clarify the points of contention among 
these debates’ participants and the implications of prioritizing descriptive accuracy 
over normative correctness, or vice versa. Once we acknowledge the ethical dilemma that arises when descriptive 
accuracy and normative correctness conflict in the way we have outlined, we can see clearly the 
ethical costs of prioritizing one over the other. 

To take another example, Emily Bender, Timnit Gebru, and colleagues argue 
that large language models like BERT or GPT-2/3 not only reproduce bias but do not 
reflect “the world as it is” (Suresh and Guttag 2020, 4), since the corpora on which 
they are trained do not reflect marginalized voices (Bender, Gebru et al. 2021).32 Thus, 
one cannot argue on the basis of a model’s descriptive accuracy for not debiasing it if 
it is not descriptively accurate. 

Part of the problem is that while “user-generated content sites like Reddit, 
Twitter, and Wikipedia present themselves as open and accessible to anyone, there 
are structural factors including moderation practices which make them less 
welcoming to marginalized populations” (Bender, Gebru et al. 2021, 613), resulting in 
marginalized voices’ being underrepresented in the models. Simultaneously, “white 
supremacist and misogynistic, ageist” voices tend to be “overrepresented in the 
training data” (613) due to how the data are filtered.  

Bender/Gebru and colleagues’ description of this problem can be usefully 
framed in terms of normative correctness and descriptive accuracy. Large language models, 
Bender/Gebru and colleagues appear to say, are not only normatively incorrect but 
also descriptively inaccurate in important ways, and they are normatively incorrect 
partly because they are descriptively inaccurate.  

 
32 Bender and Gebru are listed as joint lead authors on their 2021 paper. We have 
slightly modified citation format for that paper to reflect this. 



Even criticisms of Bender/Gebru and colleagues’ claims can be usefully framed 
in terms of normative correctness and descriptive accuracy. For instance, Yoav 
Goldberg (2021) claims that Bender/Gebru and colleagues “suggest that good (= not 
dangerous) language models are . . . models which reflect the world as they think the 
world should be,” yet “an alternative view by which language models should reflect 
language as it is being used in a training corpus is at least as valid, and should be 
acknowledged.” Goldberg picks up here on an ambiguity in how “good” (or its 
cognates) can be used in this context—that is, as picking out normative correctness 
(“not dangerous”) or descriptive accuracy (“language as it is being used”). In our 
terms, Goldberg claims that Bender/Gebru and colleagues fail adequately to 
acknowledge their preference for prioritizing the normative over the descriptive, and 
in a particular way. Indeed, Goldberg thinks that Bender/Gebru and colleagues 
prioritize normative correctness in a way that can result in our adopting certain views 
on ethical issues even when there is legitimate debate about whether they are the 
right views to adopt. As Goldberg (2021) puts it, the fact that there is ongoing debate 
on such issues “should be made explicit” by Bender/Gebru and colleagues. Goldberg 
also thinks there can be practical and ethical reasons for prioritizing descriptive 
accuracy over normative correctness, contra what he takes Bender/Gebru and 
colleagues to implicitly maintain. 

A central target of Bender/Gebru and colleagues’ concern seems to be not what
Suresh and Guttag call historical bias but what they call representation bias,
which occurs “when certain parts of the input space are underrepresented”
(Suresh and Guttag 2020, 5). For example, with language models, “datasets collected
through smartphone apps can under-represent lower-income or older groups, who
are less likely to own smartphones” (Suresh and Guttag 2020, 5). Bender/Gebru and
colleagues focus on this form of bias because a central claim of theirs is that the
relevant data often have not been perfectly measured and sampled. By contrast,
historical bias arises even when the data have been “perfectly measured and
sampled” and “the world as it is or was leads a model to produce outcomes that are
not wanted” (Suresh and Guttag 2020, 4).
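
As a rough illustration of representation bias, the following sketch compares each group's share of a sample with its share of the wider population. The function name `representation_gaps`, the age groups, and all of the figures are hypothetical and were invented for this example; they are not drawn from Suresh and Guttag or from any real dataset.

```python
def representation_gaps(population_shares, sample_counts):
    """Return sample share minus population share for each group.

    A negative gap means the group is underrepresented in the sample.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        gaps[group] = sample_share - pop_share
    return gaps


# Hypothetical population shares by age group (invented figures).
population_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical counts of texts collected through a smartphone app.
sample_counts = {"18-34": 500, "35-64": 450, "65+": 50}

gaps = representation_gaps(population_shares, sample_counts)
underrepresented = [group for group, gap in gaps.items() if gap < 0]
```

On these invented figures the oldest group supplies 5 percent of the sample but makes up 20 percent of the population, which is the kind of gap that representation bias describes. Historical bias, by contrast, would remain even if every gap were zero.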

Our normative/descriptive distinction again helps to clarify the points of 
disagreement and agreement between Bender/Gebru and colleagues and their critics 
such as Goldberg, while additionally clarifying how language models can involve 
representation bias in Suresh and Guttag’s sense. Bender/Gebru and colleagues’ claim 
is that large language models are descriptively inaccurate relative to the wider 
population of language users, as a result of their being built around an imperfect 
measuring and sampling of data. Such models can be normatively incorrect in ways
that result from such inaccuracy.33 Thus, Suresh and Guttag’s representation bias is
one way in which a model can be descriptively inaccurate. When such inaccuracy is 
distinguished from normative incorrectness, the points of disagreement among 
Goldberg and Bender/Gebru and colleagues become clearer.  

It might be suggested that Bender/Gebru and colleagues underestimate the 
extent to which language models like GPT-2/3 are descriptively accurate. After all,
even if they are trained on datasets that come in significant part from platforms from 
which marginalized voices have been excluded, and so the models are descriptively 
inaccurate at least to that extent, the models might still be descriptively accurate 
about how language is used on those platforms. 

Bender/Gebru and colleagues’ response might be that the aim of descriptive 
accuracy is to capture patterns that exist in the entire population (or most of it), not 
simply a subpopulation. For large language models, the aim (in that case) would be 
that models accurately reflect how humans in general, including those in marginalized 
groups, use language. If we take language from only a specific subpopulation, we 
cannot make claims about descriptive accuracy (except, perhaps, regarding that 
specific subpopulation’s use of language). The most obvious bias here would seem to 
be representation bias, given that the model is representative only of a specific
subgroup while ignoring others. Even practically, the model’s usefulness for 
downstream tasks on texts written by other language users will thus be limited. 

Yet suppose we trained a hypothetical model on all of the language ever 
produced by people across all groups.34 We might then have descriptive accuracy. 
Even so, bias might remain: not representation bias but historical bias. In our
terms, we would have descriptive accuracy yet normative incorrectness (and the 
normative incorrectness would be due to the descriptive accuracy). For example, we 
would still have the problem of gendered pronouns in translation. If people speak as 
though there are more female than male nurses, the pronoun “she” will still show up 
more frequently near “nurse” than “he” in the relevant vector space, even across the 
enormous dataset of all the language ever produced by all groups. Likewise, if people 
speak as though there are more male than female programmers, “he” will show up 
more frequently near “computer programmer” than “she.” This bias comes from 
descriptive, historical facts about language use, not from how the dataset was built.  
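
The point about pronouns can be illustrated with a deliberately simplified sketch. Word embeddings are learned from co-occurrence statistics, so the code below simply counts how often “she” and “he” appear within a few tokens of an occupation word. The function `pronoun_counts_near`, the window size, and the toy corpus are all assumptions made up for illustration; the sketch uses raw counts rather than a trained vector space, and it uses the single token “programmer” in place of the phrase “computer programmer.”

```python
from collections import Counter


def pronoun_counts_near(sentences, target, pronouns=("she", "he"), window=4):
    """Count how often each pronoun occurs within `window` tokens of `target`."""
    counts = Counter({p: 0 for p in pronouns})
    for sentence in sentences:
        # Crude tokenization: lowercase, drop periods, split on whitespace.
        tokens = sentence.lower().replace(".", "").split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            for neighbor in tokens[lo:i] + tokens[i + 1:hi]:
                if neighbor in counts:
                    counts[neighbor] += 1
    return counts


# Invented toy corpus in which usage is skewed, as in the text's example.
toy_corpus = [
    "The nurse said she would call back.",
    "She works as a nurse at the clinic.",
    "The nurse said he was on shift.",
    "The programmer said he fixed the bug.",
    "He is a programmer at a startup.",
    "The programmer said she wrote the tests.",
]

nurse = pronoun_counts_near(toy_corpus, "nurse")
programmer = pronoun_counts_near(toy_corpus, "programmer")
```

Here every sentence is counted, so nothing has been left out by sampling, yet “she” still outnumbers “he” near “nurse” and the reverse holds near “programmer.” The skew comes from how the invented speakers use the words, not from how the corpus was assembled.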

 
33 Bender, Gebru et al. (2021) thereby seem to be saying that normative incorrectness 
can result from descriptive inaccuracy. We do not deny this, although our focus is 
instead on how descriptive accuracy can conflict with normative correctness, that is,
how normative incorrectness can result from descriptive accuracy.
34 Falbo and LaCroix (2022) note that minority groups often “code-switch” by 
conforming to majority norms of expression; thus, the content they do generate might 
still not reflect marginalized voices. 



The distinction between representation bias and historical bias is also 
important when we consider whether there can be a value to not debiasing models. 
Historical bias often indicates something interesting about the world, whereas 
representation bias might, in the worst cases, only indicate lazy data-gathering.
The more descriptively accurate a model is, the more useful it will be for downstream 
tasks and the more accurately it will reflect historical (and current) biases in language 
use. The choice to debias must start with a descriptively accurate model. In that case, 
we encounter the dilemma: Either we debias the accurate model and avoid amplifying 
bias yet lose accuracy and ethically useful information, or we do not debias and we 
use the model to learn about biases, for example, but risk amplifying bias. 
 
9. Conclusion 

What should we do about bias in NLP systems? It depends, of course, on the 
source of such bias. But even for bias from properly sampled data, it also depends on 
what we want from such systems. Do we want them to be more like cameras or more 
like people? We expect cameras to get things descriptively right in certain relevant 
regards, but not normatively right. Yet we expect people to get things normatively 
right, even if they get things descriptively wrong. For example, we do not excuse 
someone for making a sexist comment even if their making that comment is a 
descriptively accurate reflection of how language is actually used in their language 
community—we want the person to get things normatively right, not descriptively 
right. 

Should we expect NLP systems to get things normatively right, even if they get 
things descriptively wrong? Such systems could work like the iBlur camera that we 
described in section 3, or like the speech-filter in the TV show The Good Place, where 
Eleanor says things like “Holy motherforking shirtballs!” despite presumably trying to 
say something else. This filter “translates out” things that we do not want to hear or 
are not permitted to hear. Alternatively, is it enough for NLP systems to be like 
cameras in getting things descriptively right, so that they reflect the actual world and 
let us decide what to do about it in particular contexts? 

We maintain that there can be a value to having NLP systems function more 
like cameras. For one thing, it can provide ethically useful information, which we may 
even have an ethical obligation to provide. Then we face a dilemma, since allowing a 
system to get things descriptively right by not debiasing it can perpetuate or amplify 
bias. We think that deciding which way to go can only be done on a case-by-case basis, 
by weighing the ethical costs and benefits.  

Our primary aim has been to draw attention to the dilemma, which arises once 
we acknowledge that there can often be a value, however counterintuitive, to not 
debiasing. We have also argued that it is frequently best to control for the negative 
effects of bias at the output stage rather than by debiasing models themselves. Yet
even then, we cannot easily avoid the dilemma, since we must still decide whether
normative correctness should overrule descriptive accuracy in cases where 
descriptive accuracy provides us with ethically useful information. 

Finally, our distinction between normative correctness and descriptive
accuracy helps to clarify recent debates about algorithmic bias in natural-language
processing. In particular, asking whether to debias brings the consequences of
adopting one view over another into clearer focus.

 
References 
Allen, Antoine. 2016. “The ‘Three Black Teenagers’ Search Shows It Is Society, Not
Google, That Is Racist.” Guardian, June 10, 2016. https://www.theguardian.com/commentisfree/2016/jun/10/three-black-teenagers-google-racist-tweet.

Barber, Anjuli L. A., Daniel S. Mills, Fernando Montealegre-Z, Victoria F. Ratcliffe, Kun
Guo, and Anna Wilkinson. 2020. “Functional Performance of the Visual System 
in Dogs and Humans: A Comparative Perspective.” Comparative Cognition & 
Behavior Reviews 15:1–44. https://doi.org/10.3819/CCBR.2020.150002. 

Basu, Rima. 2020. “The Specter of Normative Conflict? Does Fairness Require 
Inaccuracy.” In An Introduction to Implicit Bias: Knowledge, Justice, and the 
Social Mind, edited by Erin Beeghly and Alex Madva, 191–210. New York: 
Routledge. 

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret 
Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models 
Be Too Big?” In FAccT ’21: Proceedings of the 2021 ACM Conference on
Fairness, Accountability, and Transparency, 610–23. New York: Association for 
Computing Machinery. https://doi.org/10.1145/3442188.3445922. 

Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. 
Medford, MA: Polity Press. 

Blodgett, Su Lin, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. “Language 
(Technology) Is Power: A Critical Survey of ‘Bias’ in NLP.” In Proceedings of the 
58th Annual Meeting of the Association for Computational Linguistics, edited 
by Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, 5454–76. 
Stroudsburg, PA: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.485.

Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 
2016. “Man Is to Computer Programmer as Woman Is to Homemaker? 
Debiasing Word Embeddings.” arXiv preprint, arXiv:1607.06520 [cs.CL]. 
https://doi.org/10.48550/arXiv.1607.06520. 



Box, George E. P. 1976. “Science and Statistics.” Journal of the American Statistical 
Association 71 (356): 791–99. 

Brownstein, Michael. 2018. The Implicit Mind: Cognitive Architecture, the Self, and 
Ethics. New York: Oxford University Press. 

Brownstein, Michael, and Jennifer Saul, eds. 2016a. Implicit Bias and Philosophy: 
Volume I, Metaphysics and Epistemology. Oxford: Oxford University Press. 

———, eds. 2016b. Implicit Bias and Philosophy: Volume 2, Moral Responsibility, 
Structural Injustice, and Ethics. Oxford: Oxford University Press.  

Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. 2017. “Semantics Derived 
Automatically from Language Corpora Contain Human-Like Biases.” Science 
356, no. 6334 (April 14): 183–86. https://doi.org/10.1126/science.aal4230. 

Deery, Oisín. 2021. Naturally Free Action. Oxford: Oxford University Press.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT:
Pre-Training of Deep Bidirectional Transformers for Language Understanding.” In
Proceedings of the 2019 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies,
Volume 1, edited by Jill Burstein, Christy Doran, and Thamar Solorio, 4171–86.
Stroudsburg, PA: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423.

Falbo, Arianna, and Travis LaCroix. 2022. “Est-ce que Vous Compute? Code-Switching, 
Cultural Identity, and AI.” Feminist Philosophy Quarterly 8 (3/4): article 9. 

Fazelpour, Sina, and David Danks. 2021. “Algorithmic Bias: Senses, Sources, 
Solutions.” Philosophy Compass 16, no. 8 (August): e12760. https://doi.org/10.1111/phc3.12760.

Gendler, Tamar Szabó. 2011. “On the Epistemic Costs of Implicit Bias.” Philosophical 
Studies 156, no. 1 (October): 33–63. 

Goldberg, Yoav. 2021. “A Criticism of ‘On the Dangers of Stochastic Parrots: Can 
Language Models Be Too Big?’” GitHub post, January 23, 2021. https://gist.github.com/yoavg/9fc9be2f98b47c189a513573d902fb27.

Goldin, Claudia, and Cecelia Rouse. 2000. “Orchestrating Impartiality: The Impact of 
‘Blind’ Auditions on Female Musicians.” American Economic Review 90, no. 4 
(September): 715–41. https://doi.org/10.1257/aer.90.4.715. 

Gonen, Hila, and Yoav Goldberg. 2019. “Lipstick on a Pig: Debiasing Methods Cover 
Up Systematic Gender Biases in Word Embeddings but Do Not Remove Them.” 
In Proceedings of the 2019 Conference of the North American Chapter of the 
Association for Computational Linguistics: Human Language Technologies, 
Volume 1, edited by Jill Burstein, Christy Doran, and Thamar Solorio, 609–14. 
Stroudsburg, PA: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1061.



Hitchcock, Christopher. 2004. “Introduction: What Is the Philosophy of Science?” In 
Contemporary Debates in the Philosophy of Science, edited by Christopher 
Hitchcock, 1–19. Malden, MA: Blackwell.  

Hofstadter, Douglas. 2018. “The Shallowness of Google Translate.” Atlantic, January 
30, 2018. https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/.

Hu, Lily, and Yiling Chen. 2017. “Fairness at Equilibrium in the Labor Market.” arXiv 
preprint, arXiv:1707.01590 [cs.GT]. https://doi.org/10.48550/arXiv.1707.01590.

Kehl, Danielle, Priscilla Guo, and Samuel Kessler. 2017. “Algorithms in the Criminal 
Justice System: Assessing the Use of Risk Assessments in Sentencing.” 
Responsive Communities Initiative/Berkman Klein Center for Internet &
Society, Harvard Law School. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33746041.

Lacassagne, Doris, Jérémy Béna, and Olivier Corneille. 2022. “Is Earth a Perfect 
Square? Repetition Increases the Perceived Truth of Highly Implausible 
Statements.” Cognition 223 (June): 105052. https://doi.org/10.1016/j.cognition.2022.105052.

Levy, Neil. 2017. “Implicit Bias and Moral Responsibility: Probing the Data.” 
Philosophy and Phenomenological Research 94, no. 1 (January): 3–26. 
https://doi.org/10.1111/phpr.12352. 

Liao, Shen-yi, and Bryce Huebner. 2021. “Oppressive Things.” Philosophy and
Phenomenological Research 103, no. 1 (July): 92–113. https://doi.org/10.1111/phpr.12701.

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation 
of Word Representations in Vector Space.” arXiv preprint, arXiv:1301.3781
[cs.CL]. https://doi.org/10.48550/arXiv.1301.3781. 

Nissim, Malvina, Rik van Noord, and Rob van der Goot. 2019. “Fair Is Better than 
Sensational: Man Is to Doctor as Woman Is to Doctor.” arXiv preprint, 
arXiv:1905.09866v2 [cs.CL]. https://doi.org/10.48550/arXiv.1905.09866. 

Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce 
Racism. New York: New York University Press. 

Puddifoot, Katherine. 2017. “Dissolving the Epistemic/Ethical Dilemma over Implicit 
Bias.” Philosophical Explorations 20 (Supplement 1): S73–S93. https://doi.org/10.1080/13869795.2017.1287295.

Roth, Lorna. 2009. “Looking at Shirley, the Ultimate Norm: Colour Balance, Image 
Technologies, and Cognitive Equity.” Canadian Journal of Communication 34, 
no. 1 (March): 111–36. https://doi.org/10.22230/cjc.2009v34n1a2196. 

Schluter, Natalie. 2018. “The Word Analogy Testing Caveat.” In Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 2, edited by
Marilyn Walker, Heng Ji, and Amanda Stent, 242–46. Stroudsburg, PA:
Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2039.

Smith, Angela M. 2008. “Control, Responsibility, and Moral Assessment.” 
Philosophical Studies 138, no. 3 (April): 367–92. https://doi.org/10.1007/s11098-006-9048-x.

Smith, David. 2013. “‘Racism’ of Early Colour Photography Explored in Art Exhibition.” 
Guardian, January 25, 2013. http://www.theguardian.com/artanddesign/2013/jan/25/racism-colour-photography-exhibition.

Suresh, Harini, and John V. Guttag. 2020. “A Framework for Understanding 
Unintended Consequences of Machine Learning.” arXiv preprint, February 17, 
2020, revision, arXiv:1901.10002v3 [cs.LG]. https://doi.org/10.48550/arXiv.1901.10002.

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, 
Wolfgang Macherey, Maxim Krikun, et al. 2016. “Google’s Neural Machine 
Translation System: Bridging the Gap between Human and Machine 
Translation.” arXiv preprint, arXiv:1609.08144 [cs.CL]. https://doi.org/10.48550/arXiv.1609.08144.

 
Oisín Deery is an assistant professor of philosophy at York University, Toronto, 
Canada, and a lecturer and ARC DECRA Fellow at Macquarie University in Sydney, 
Australia. He has previously held positions at Monash University, Florida State 
University, the University of Arizona, and the University of Montreal. Oisín’s work 
straddles the philosophy of mind and action, moral psychology, and metaphysics. His 
work has focused primarily on developing a naturalistic understanding of human 
agency and responsibility, but recently he has also been working on issues related to 
artificial intelligence and agency. Oisín has published widely in journals such as 
Philosophical Studies, the Australasian Journal of Philosophy, and Philosophical 
Psychology. He is the author of a monograph entitled Naturally Free Action, published 
by Oxford University Press in 2021. 
 
Katherine Bailey has worked in a number of roles as a data scientist and software 
engineer. She is currently employed by Shopify Inc., in Toronto, Canada. Katherine has 
a longstanding interest in issues related to artificial intelligence and natural language 
processing. She has written on these issues in blog posts and in publications such as 
TechCrunch and has been invited to speak at various venues, including Google and the 
North American Association for Computational Linguistics.  

