INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 16, Issue: 2, Month: April, Year: 2021 Article Number: 4207, https://doi.org/10.15837/ijccc.2021.2.4207 CCC Publications

An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users' Attitudes

B. Jia, D. Dzitac, S. Shrestha, K. Turdaliev, N. Seidaliev

Bokang Jia, Domnica Dzitac*, Samridha Shrestha, Komiljon Turdaliev, Nurgazy Seidaliev
Department of Computer Science, New York University Abu Dhabi, UAE
Saadiyat Island 129188, Abu Dhabi, UAE
*Corresponding author: domnica.dzitac@nyu.edu
Email: bj798@nyu.edu, sms1198@nyu.edu, kt1673@nyu.edu, nks369@nyu.edu

Abstract

It is thought that the COVID-19 outbreak has significantly fuelled racism and discrimination, especially towards Asian individuals[10]. In order to test this hypothesis, in this paper we build upon existing work to classify racist tweets before and after COVID-19 was declared a global pandemic. To overcome the difficult linguistic and unbalanced nature of the classification task, we combine an ensemble of machine learning techniques such as Linear Support Vector Classifiers, Logistic Regression models, and Deep Neural Networks. We fill the gap in the existing literature by (1) using a combined machine learning approach to understand the effect of COVID-19 on Twitter users' attitudes and (2) improving on the performance of automatic racism detectors. Here we show that there has not been a sharp increase in racism towards Asian people on Twitter and that users who posted racist tweets before the pandemic posted an approximately equal amount during the outbreak. Previous research on racism and other virus outbreaks suggests that racism towards communities associated with the region of origin of a virus is not exclusively attributable to the outbreak, but rather is a continued symptom of deep-rooted biases towards minorities[13].
Our research supports these previous findings. We conclude that the COVID-19 outbreak is an additional outlet to discriminate against Asian people, rather than the main cause.

Keywords: COVID-19, Coronavirus, Machine Learning, Natural Language Processing, Automatic Hate-Speech Detection, Racism.

1 Introduction

In December of 2019, a new disease of the coronavirus family, COVID-19, was detected in Wuhan, China. The World Health Organization (WHO) declared the novel coronavirus a global pandemic due to an exponential increase in COVID-19 infections, reaching, as of March 1st 2021, over 119 million cases and approximately 2.6 million deaths worldwide[17]. Besides its effects on global health, the COVID-19 outbreak has significantly impacted the global economy, travel, political dynamics, and the public's actions as a whole[2]. Researchers suggest that it is only through collaboration that we can efficiently combat the COVID-19 pandemic[9]. However, it is thought that the COVID-19 outbreak has significantly fuelled racism and discrimination, especially towards Asian people[10]. Over the past months, extremely influential politicians have made public declarations that further associate the virus with China[10]. In addition, the number of racist attacks, including hate crimes, has increased severely since the pandemic was declared a global health threat[5]. Nonetheless, this is not the first time in history that an infectious disease outbreak has been associated with communities that later suffer social, economic or political consequences. Other diseases such as SARS, the Middle East Respiratory Syndrome or Ebola had a negative impact on the communities associated with the cause of the outbreak[7][13].
Previous research that investigated the effect of Ebola on discrimination against African residents of Hong Kong suggests that the social stigmatization of Africans was a continued symptom of deep-rooted biases towards minorities in Hong Kong, not solely attributable to the 2014 Ebola outbreak. Rather, the outbreak was an additional excuse to discriminate against the minority African community in Hong Kong[13]. Further, other work that studied the effect of SARS on racism in Toronto, Canada suggests that the virus outbreak in the city affected citizens' attitudes towards people who might be associated with the country of origin of SARS[12]. Furthermore, recent research shows that these types of stigma negatively affect the research and academic community as well[7]. However, previous work on the connection between infectious disease outbreaks and racism did not look quantitatively at the impact of social media on racism towards the targeted communities negatively associated with the region of origin of the virus. In this paper, we fill this literature gap by studying the connection between racism and social media. To do so, we analyse the dynamics of people's attitudes towards Asian people on Twitter before and after the 29th of January 2020, the date when COVID-19 was officially declared a threat to global health.

2 Related Work

Previous work has examined multiple training methods for classifying hate speech and similar linguistic tasks, and the best-performing suggested method is the Linear Support Vector Classifier (SVC)[1]. Thus, in our study we use a linear SVC to identify racist tweets that target Asian people between the 1st of January and the 31st of March 2020.
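As a rough illustration of this kind of classifier (not the authors' exact configuration), a linear SVC over TF-IDF n-gram features can be assembled with scikit-learn; the tweets and labels below are invented toy examples standing in for the labelled training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data; the real classifier is trained on the labelled
# datasets of Davidson et al. and Waseem and Hovy described later.
tweets = ["they should all go back home", "lovely weather in the park today",
          "those people are ruining this country", "great game last night"]
labels = ["racism", "neither", "racism", "neither"]

# TF-IDF over unigrams to trigrams feeding a linear SVC; class_weight
# compensates for the heavily unbalanced classes noted in the paper.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("svc", LinearSVC(class_weight="balanced")),
])
clf.fit(tweets, labels)
```

The exact feature set (POS tags, Porter stemming) and hyperparameters would follow Davidson et al.'s open-source system rather than this minimal sketch.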
The classifier is an updated version inspired by the open-source system provided by Davidson et al.[4], with adapted features meant to identify racism towards Asian people instead of just hate speech, as proposed in the original system. To further reduce overfitting, we use two other training methods, namely a Logistic Regression model and a Long Short-Term Memory (LSTM) recurrent neural network (RNN)[11]. Moreover, other related work concludes that pre-trained embedding weights for Twitter datasets improve the performance of classification tasks, so we included them in our classifiers[6]. Previous research also shows that a Deep Neural Network model is required to extract a deeper level of features[18]. However, all these training methods and improvements added to existing systems remain highly dependent on the linguistic nature of the classification task. According to previous literature, hate speech, particularly racism, is challenging to classify because of the high amount of offensive language and swear words on online platforms[15]. The key difference between the two rests on a linguistic distinction[8]. Previous related work also suggests that hate speech towards certain groups draws on a set of stereotypical words, used in either a positive or negative way, because earlier research treated hate-speech identification as a matter of word-sense disambiguation[8]. The training datasets take the above definition of hate speech into consideration.

3 Data

In order to investigate the dynamics of users' attitudes on Twitter towards Asian people before and after the COVID-19 outbreak, we use two different datasets of tweets for training, validating and testing our models, and two datasets of tweets for the study itself. The two training datasets were collected by (1) Davidson et al.[4] and (2) Waseem and Hovy[16].
Both datasets are manually annotated and contain only tweets written in English. The dataset provided by Davidson et al. has 24k tweets containing hate speech keywords, collected using a crowd-sourced hate speech lexicon. These tweets are labelled into three categories: “hate speech”, “offensive language” and “neither”. Only 5% of the tweets in this dataset were labelled as “hate speech” and 76% were labelled as “offensive language”. The remainder were labelled as “neither”[4]. On the other hand, the dataset provided by Waseem and Hovy contains 16k tweets with racist and sexist instances of hate speech. These tweets are labelled into three categories: “racism”, “sexism” and “neither”. The annotators labelled 20% of the tweets as “sexism” and 12% as “racism”. The remainder were labelled as “neither”[16]. For the sake of our study, we only considered the tweets labelled as “racism” and “neither”. Moreover, in order to conduct our study, we used the first available COVID-19 dataset of tweets, collected by Chen et al.[3]. This dataset contains the tweet IDs of users that have mentioned the pandemic since January 22, 2020. We downloaded the tweet IDs that were collected by the above team based on keywords listed in their open-source documentation. The tweets we downloaded were shared by users between January 22 and March 31, 2020. In accordance with Twitter's terms, we hydrated the tweet IDs to convert them to actual tweets using the Twarc tool. All the tweets from all three datasets underwent a preprocessing procedure. This includes standardizing counts of URLs and mentions, removing punctuation and excess whitespace, as well as tokenizing the tweets by lowercasing and stemming the words. Besides these standard preprocessing tasks, we cleaned the COVID-19 dataset to retain only tweets written in English and geo-located in the United States of America.
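A minimal sketch of the preprocessing step described above might look as follows (standard library only; the Porter stemming used in the actual pipeline, e.g. via NLTK, is omitted here):

```python
import re

def preprocess(tweet: str) -> str:
    """Standardize URLs and mentions, strip punctuation and excess
    whitespace, then lowercase; stemming would be applied afterwards
    in the full pipeline."""
    tweet = re.sub(r"https?://\S+", "URLHERE", tweet)   # standardize URLs
    tweet = re.sub(r"@\w+", "MENTIONHERE", tweet)       # standardize mentions
    tweet = re.sub(r"[^\w\s]", " ", tweet)              # drop punctuation
    tweet = re.sub(r"\s+", " ", tweet).strip()          # collapse whitespace
    return tweet.lower()

preprocess("Check this out https://t.co/abc @user!!")
# -> 'check this out urlhere mentionhere'
```

The placeholder tokens (`URLHERE`, `MENTIONHERE`) are illustrative names, chosen so that the counts of URLs and mentions survive tokenization.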
Furthermore, we used this data to train and run our models, and we identified all tweets that were labelled as “racist” towards Asian people. Then, we used the tweet IDs of all these racist tweets to trace these users' posts before their first mention of the pandemic. In accordance with Twitter's terms, we only had access to the last 32,000 tweets of each user; however, this was enough to form a new dataset of tweets from before the COVID-19 outbreak. We used the latter dataset to see whether these users had a negative attitude towards Asians on Twitter even before the pandemic. This dataset was preprocessed in a similar manner to the COVID-19 dataset.

4 Methodology

Figure 1: SVC Model Metrics
Figure 2: Confusion Matrix SVC Model

Analysing and quantifying the dynamics of the attitudes of Twitter users towards Asian people before and after the COVID-19 pandemic is not an easy task, due to the subtle linguistic distinction between hate speech, specifically racism, and offensive language. Since existing methods primarily focus on hate speech, we have contributed to the research literature by developing our own improved classification method, consisting of an ensemble network of multiple detectors. We trained this network on the datasets collected by Davidson et al. and by Waseem and Hovy, as mentioned above. This enabled us to build a strategy that was superior to existing methods. As a starting point, we improved upon the hate speech classifier created by Davidson et al. Rather than simply using an SVM model, our methodology combines this strategy with a Logistic Regression model. We extracted features by running the preprocessing pipeline proposed by Davidson et al., which applies Term Frequency-Inverse Document Frequency (TF-IDF), Penn POS tagging and Porter stemming to unigrams, bigrams, and trigrams. This helps highlight important features within the tweets.
These two models were trained using the hate speech dataset of tweets provided by Davidson et al. The outputs of these models were put through an ensemble voting mechanism.

Figure 3: Ensemble Model Metrics
Figure 4: Confusion Matrix Ensemble Model

To boost the accuracy further and to introduce regularization, we also ran these features through an ensemble of 20 Deep Neural Network (DNN) models (see Figures 5, 6 and 7). The tweets in this ensemble model framework are first preprocessed by generating convolutional neural network word embeddings from the raw Twitter data. This embedding generation uses a Word2Vec NLP model that was pre-trained on 400 million tweets[18]. These DNN models were trained using the dataset collected by Waseem and Hovy.

Figure 5: Neural Network model

The output of this DNN collection was also fed through the voting mechanism. In order to reduce overfitting, we use a hard voting method, which takes the majority vote of the three systems. In addition to the DNN, we also applied an LSTM model, but the results indicated that an LSTM would not be optimal for tweets, which are short and carry dense meanings. The original system proposed by Davidson et al. had a precision and recall of 46% and 60%, respectively (see Figures 1 and 2). Our novel ensemble method produced a precision of 50% with a recall of 70% (see Figures 3 and 4).

Figure 6: Neural Network ensemble model
Figure 7: Combined ensemble model

We limit our study to Twitter users in the US to prevent any country bias. The majority of tweets lack explicit country information, but there is a wealth of literature on detecting a tweet's country from its features. In this paper, we used the tweet country classification model developed by Zubiaga et al.[19].
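The hard-voting step can be sketched with scikit-learn's `VotingClassifier`; this is a simplified stand-in rather than the paper's exact system (a single small MLP replaces the 20-model DNN ensemble, and the training tweets are invented toy examples):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data standing in for the labelled training tweets.
texts = ["go back to your country", "what a great day",
         "they do not belong here", "enjoying my coffee",
         "send them all home now", "nice sunset tonight"]
y = [1, 0, 1, 0, 1, 0]  # 1 = hateful, 0 = neither

def text_clf(model):
    return Pipeline([("tfidf", TfidfVectorizer()), ("model", model)])

# Hard (majority) voting over the three detectors: each model casts one
# vote per tweet, and the majority label wins.
ensemble = VotingClassifier(
    estimators=[("svc", text_clf(LinearSVC())),
                ("logreg", text_clf(LogisticRegression())),
                ("mlp", text_clf(MLPClassifier(hidden_layer_sizes=(8,),
                                               max_iter=300)))],
    voting="hard")
ensemble.fit(texts, y)
```

Hard voting requires only each model's predicted label, not calibrated probabilities, which is what makes it usable with a margin-based classifier such as `LinearSVC`.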
The model used eight tweet-inherent features for classification, such as the tweet content, the user's self-reported location and the user's real name, and was trained on two datasets collected a year apart from each other. From the tests we conducted, the model had on average more than 85% accuracy in detecting tweets from the US, which we deemed satisfactory. Finally, the classification system was also passed through a RegEx keyword filter, for example "(R|r)acis.*" and "(C|c)hin.*", in order to adapt the system with features meant to identify racism towards particular groups of people instead of just hate speech, as proposed in the original system.

5 Results

We obtain three sets of results. The first set of results corresponds to the general rate of hateful tweets by US Twitter users. Specifically, we examine whether users become increasingly more hateful after the outbreak of COVID-19 and its global spread. We ran our classifier on the country-filtered COVID-19 related dataset. We find that the collective number of hateful tweets is generally steady during this period (see Fig. 4). However, there were two large spikes in the number of hate-speech tweets, in early February (1) and early March (2). These correspond to (1) the rapid spread of the virus within China and the first wave of significant active cases outside China, and (2) the significant increase in the number of cases globally, specifically in the US (see Fig. 11). Note: there is a known gap in Chen et al.'s COVID-19 database around February 23 due to connectivity issues; this decrease is reflected in Figures 8, 11 and 12 [3]. We note that the total number of tweets (before classification) varies over time. This could affect the conclusions regarding individuals. Therefore, we also examined whether the number of hate-speech tweets corresponds to an increase in racist and hateful discussion of COVID-19. That is, do people individually become more hateful? A graph of the ratio of hate speech to total tweets was plotted (see Fig. 8).
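A keyword filter of the kind described above can be sketched in a few lines; the two patterns are the examples quoted in the text, and the helper name is ours:

```python
import re

# RegEx keyword filter used to narrow hate-speech hits to racism aimed
# at particular groups; the patterns follow the examples in the text.
PATTERNS = [re.compile(p) for p in (r"(R|r)acis.*", r"(C|c)hin.*")]

def targets_group(tweet: str) -> bool:
    """True if any group-specific keyword pattern occurs in the tweet."""
    return any(p.search(tweet) for p in PATTERNS)

targets_group("blatant racism in this thread")  # True
targets_group("lovely weather today")           # False
```

Note that such prefix patterns over-match (e.g. "chin.*" also matches "machine"), which is presumably acceptable here because the filter runs after the hate-speech classifier rather than on raw tweets.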
We observe that the overall level of hate speech remained steady (at around 1% of total tweets). This is a surprising finding, as it indicates that the level of hate speech around COVID-19 has not increased significantly over the course of the pandemic. In order to examine whether the recent pandemic has caused people to become more racist, we needed to control for people's historic tweeting behaviour from before and after the pandemic. We selected approximately 60,000 users who posted hateful tweets from the COVID-19 dataset during the month of January, whom we label the "haters". We selected the month of January as it allowed us to have a fairly equal representation of the four months before and after the initial reaction to COVID-19. The two separated periods correspond to September 2019 - January 2020 and January 2020 - May 2020, respectively. Of these, we scraped 20,000 Twitter timelines through the Twitter API. Because of the limits imposed by Twitter, we were only able to scrape the latest 1,000 tweets of any given user. For many users this only covers the most recent few weeks; however, for some it covers the entire pre-pandemic and post-pandemic period. We aggregated the "hater" timeline tweets together, which produced the recent-heavy graph of hate tweets near May (see Fig. 9).

Figure 8: Classified hate speech tweets related to COVID-19
Figure 9: Classification of hate speech on racist users' timelines

When we normalize this by the total number of timeline tweets, we obtain the ratio of real hate tweets of these users (see Fig. 9). It is important to note that, while there are small spikes in hateful tweets in February and March (corresponding to the rising cases in China and subsequently in the US), the baseline level of hate speech does not change significantly between the pre-pandemic and post-pandemic periods.
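The normalization behind these ratio plots is simple: divide the number of classified hate tweets by the total number of tweets, per day. A toy sketch, with invented `(date, is_hateful)` records standing in for the classifier's output:

```python
from collections import Counter
from datetime import date

# Invented per-tweet records standing in for the classifier output.
records = [(date(2020, 2, 1), True), (date(2020, 2, 1), False),
           (date(2020, 2, 1), False), (date(2020, 2, 2), False),
           (date(2020, 2, 2), False), (date(2020, 2, 2), True)]

total = Counter(d for d, _ in records)          # tweets per day
hateful = Counter(d for d, h in records if h)   # hateful tweets per day

# Daily hate-speech ratio, normalizing out the varying tweet volume.
ratio = {d: hateful[d] / total[d] for d in total}
```

Normalizing this way is what separates "more hate tweets because more tweets overall" from "individual users becoming more hateful".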
These "haters" were already posting high levels of hate speech before the pandemic, and COVID-19 appears to be just another context in which they continue to express their hate speech. This figure also shows that these "haters" are the largest source of racist hate speech, with approximately 4% of their tweets being hateful compared with the average of 1% (see Fig. 8).

Figure 10: Classified and normalized hate speech timelines towards Chinese people

In the third set of results (see Fig. 10), we examine whether there was an increase in the rate of racism on Twitter specifically targeted at Chinese people. Similar to the findings on general racism, we find that while the absolute volume of hate speech towards Chinese people increased, the actual proportion of tweets targeting them remained fairly constant, and in fact slightly decreased. This demonstrates that the pandemic did not cause people to become more racist towards Chinese people.

Figure 11: Hate Speech vs COVID-19 cases in the US and China
Figure 12: Chart of overt racism towards Chinese

In Figure 11 we also overlay the cumulative number of real COVID-19 cases as obtained from Johns Hopkins University[14]. We do this for the primary countries in question, China and the US. In the figure, we observe that an increase in the number of COVID-19 cases directly precedes a large increase in the number of hate tweets. This occurred for China in late January, and again for the US in early March. This is likely correlated with panic within the community about an imminent spread of COVID-19, leading to a temporary increase in racist, isolationist attitudes which do not persist. Finally, in Figure 12 we analyzed whether overt racism, i.e. the usage of clear racial slurs such as "chink", changed over time.
What we found, which corroborates the findings in the previous graphs, is that usage of these obvious terms spiked during the peak spread of COVID-19, in late January and early March in China and the US respectively, but quickly faded in popularity.

6 Discussion

Our results complement the findings in the existing literature on the relationship between racism and various virus outbreaks, such as SARS and Ebola, which showed that racism towards a particular group (a community from where the virus originated) is not directly caused by the virus itself, but rather is a continued wave of existing bias towards that group. Our primary contribution has been the usage of an ensemble of ML models instead of relying on one specific model. This enables us to increase the accuracy in identifying racist hate speech by increasing regularization and reducing over-fitting. It also enables us to capture more intricate features of tweets that would otherwise be missed by a single model. This can be seen in our superior accuracy. Nevertheless, this comes at the cost of lower recall levels. However, for this analysis involving huge datasets (60M+ tweets), higher precision is more desirable. While we strove to make our analysis and results robust by following rigorous data processing procedures and making use of a wide array of ML models, our work is not exempt from some limitations. The first limitation stems from the fact that we trained our models on out-of-distribution data as opposed to in-domain data. The system of Davidson et al.[4] was originally designed to detect racism towards African Americans (not Chinese people, as in our context), while the Waseem and Hovy dataset[16] was designed for the classification of Islamophobic tweets along with tweets targeting African Americans. While we combined them in order to reduce their idiosyncratic limitations, training on in-domain data would have yielded more accurate results in detecting tweets specifically racist towards Chinese people.
In our future work we could overcome these limitations by manually annotating our own dataset (e.g. using Amazon Mechanical Turk) to generate training data for detecting tweets racist towards Chinese people. The second limitation of our research is that we did not filter out potential tweets by bots. Since we found that the proportion of hateful tweets was fairly constant regardless of the total number of tweets posted by the users (bots tend to post more often than humans), we believe that this limitation does not alter the general findings of our research. To conclude, analyzing Twitter users' attitudes towards racial groups in the context of a global pandemic provides significant information on how humans behave during critical times. Even though racism appears not to be directly caused by the virus itself, but rather to be a continued wave of existing bias towards Asian people, it is important to understand this phenomenon. After all, according to the relevant literature, it is only through collaboration that we can efficiently combat the COVID-19 pandemic[9].

7 Acknowledgments

We would like to express our special thanks to our professors, Talal Rahwan and Bedoor AlShebli, for the feedback, guidance and supervision provided throughout the development of our research project. We would also like to acknowledge Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber for the open-source system and dataset that enabled us to conduct our research. On the same note, we would like to thank Zeerak Waseem and Dirk Hovy for the open-source dataset that we used in training our models. Last but not least, we would like to express our special thanks to Emily Chen, Kristina Lerman and Emilio Ferrara for providing the first public dataset of tweets related to COVID-19, which was of great use in our research.

Funding

There was no funding required for this project.

Author contributions

The authors contributed equally to this work.
Conflict of interest

The authors declare no conflict of interest.

References

[1] van Aken, B., Risch, J., Löser, A. (2018). Challenges for Toxic Comment Classification: An In-Depth Error Analysis, CoRR, DOI: 10.18653/v1/W18-5105

[2] Atkeson, A. (2020). What Will Be the Economic Impact of COVID-19 in the US? Rough Estimates of Disease Scenarios, National Bureau of Economic Research, DOI: 10.3386/w26867

[3] Chen, E., Lerman, K., Ferrara, E. (2020). COVID-19: The First Public Coronavirus Twitter Dataset, JMIR Public Health Surveill, DOI: 10.2196/19273

[4] Davidson, T., Warmsley, D., Macy, M., Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language, AAAI Publications, Eleventh International AAAI Conference on Web and Social Media, arXiv:1703.04009

[5] Devakumar, D., Shannon, G., Bhopal, S. S., Abubakar, I. (2020). Racism and discrimination in COVID-19 responses, The Lancet, DOI: 10.1016/S0140-6736(20)30792-3

[6] Godin, F., Vandersmissen, B., De Neve, W., Van de Walle, R. (2015). Named Entity Recognition for Twitter Microposts using Distributed Word Representations, Proceedings of the Workshop on Noisy User-generated Text, DOI: 10.18653/v1/W15-4322

[7] Hanasoge, S., Horiuchi, N., Huang, C., Jia, H., Kim, N. Y., Murao, M., Seo, M., Tan, R., Wilkinson, J. (2020). Visibility challenges for Asian scientists, Nature Reviews Physics, DOI: 10.1038/s42254-020-0162-z

[8] Kwok, I., Wang, Y. (2013). Locate the Hate: Detecting Tweets against Blacks, AAAI Publications, Twenty-Seventh AAAI Conference on Artificial Intelligence, DOI: 10.5555/2891460.2891697

[9] Li, J., Guo, K., Viedma, E. H., Lee, H., Liu, J., Zhong, N., Gomes, L. F. A. M., Filip, F. G., Fang, S.-C., Özdemir, M. S., Liu, X., Lu, G., Shi, Y. (2020). Culture versus Policy: More Global Collaboration to Effectively Combat COVID-19, The Innovation, Volume 1, Issue 2, DOI: 10.1016/j.xinn.2020.100023

[10] Nature (2020). Stop the coronavirus stigma now, Nature 580, 165, DOI: 10.1038/d41586-020-01009-0

[11] Pitsilis, G., Ramampiaro, H., Langseth, H. (2018). Effective hate-speech detection in Twitter data using recurrent neural networks, Appl Intell 48, DOI: 10.1007/s10489-018-1242-y

[12] Ali, S. H., Keil, R. (2006). Global Cities and the Spread of Infectious Disease: The Case of Severe Acute Respiratory Syndrome (SARS) in Toronto, Canada, Urban Studies, DOI: 10.1080/00420980500452458

[13] Siu, J. Y. (2015). Influence of social experiences in shaping perceptions of the Ebola virus among African residents of Hong Kong during the 2014 outbreak: a qualitative study, International Journal for Equity in Health, DOI: 10.1186/s12939-015-0223-6

[14] Dong, E., Du, H., Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time, Lancet Inf Dis. 20(5):533-534, DOI: 10.1016/S1473-3099(20)30120-1

[15] Wang, W., Chen, L., Thirunarayan, K., Sheth, A. P. (2014). Cursing in English on Twitter, Association for Computing Machinery, DOI: 10.1145/2531602.2531734

[16] Waseem, Z., Hovy, D. (2016). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter, Proceedings of the NAACL Student Research Workshop, DOI: 10.18653/v1/N16-2013

[17] World Health Organization (2021). Coronavirus disease 2019 (COVID-19): situation report, 52, World Health Organization

[18] Zimmerman, S., Kruschwitz, U., Fox, C. (2018). Improving Hate Speech Detection with Deep Learning Ensembles, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

[19] Zubiaga, A., Voss, A., Procter, R., Liakata, M., Wang, B., Tsakalidis, A. (2016). Towards Real-Time, Country-Level Location Classification of Worldwide Tweets, IEEE Transactions on Knowledge and Data Engineering, Volume 29, Issue 9, Sept. 2017, DOI: 10.1109/TKDE.2017.2698463

Copyright ©2021 by the authors. Licensee Agora University, Oradea, Romania. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License. Journal's webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE). https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as: Jia, B.; Dzitac, D.; Shrestha, S.; Turdaliev, K.; Seidaliev, N. (2021). An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users' Attitudes, International Journal of Computers Communications & Control, 16(2), 4207, 2021. https://doi.org/10.15837/ijccc.2021.2.4207