INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 16, Issue: 2, Month: April, Year: 2021 Article Number: 4207, https://doi.org/10.15837/ijccc.2021.2.4207 CCC Publications

An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users' Attitudes

B. Jia, D. Dzitac, S. Shrestha, K. Turdaliev, N. Seidaliev

Bokang Jia, Domnica Dzitac*, Samridha Shrestha, Komiljon Turdaliev, Nurgazy Seidaliev
Department of Computer Science, New York University Abu Dhabi, UAE
Saadiyat Island 129188, Abu Dhabi, UAE
*Corresponding author: domnica.dzitac@nyu.edu
Email: bj798@nyu.edu, sms1198@nyu.edu, kt1673@nyu.edu, nks369@nyu.edu

Abstract

It is thought that the COVID-19 outbreak has significantly fuelled racism and discrimination, especially towards Asian individuals[10]. In order to test this hypothesis, in this paper we build upon existing work to classify racist tweets before and after COVID-19 was declared a global pandemic. To overcome the difficult linguistic and unbalanced nature of the classification task, we combine an ensemble of machine learning techniques such as Linear Support Vector Classifiers, Logistic Regression models, and Deep Neural Networks. We fill the gap in the existing literature by (1) using a combined machine learning approach to understand the effect of COVID-19 on Twitter users' attitudes and (2) improving on the performance of automatic racism detectors. Here we show that there has not been a sharp increase in racism towards Asian people on Twitter and that users who posted racist tweets before the pandemic posted an approximately equal amount during the outbreak. Previous research on racism and other virus outbreaks suggests that racism towards communities associated with the region of origin of a virus is not exclusively attributable to the outbreak, but rather is a continued symptom of deep-rooted biases towards minorities[13].
Our research supports these previous findings. We conclude that the COVID-19 outbreak is an additional outlet to discriminate against Asian people, rather than the main cause.

Keywords: COVID-19, Coronavirus, Machine Learning, Natural Language Processing, Automatic Hate-Speech Detection, Racism.

1 Introduction

In December of 2019, a new disease of the coronavirus family, COVID-19, was detected in Wuhan, China. The World Health Organization (WHO) declared the novel coronavirus a global pandemic due to an exponential increase in COVID-19 infections, reaching, as of March 1st 2021, over 119 million cases and approximately 2.6 million deaths worldwide[17]. Besides its effects on global health, the COVID-19 outbreak has significantly impacted the global economy, travel, political dynamics, and the public's actions as a whole[2]. Researchers suggest that it is only through collaboration that we can efficiently combat the COVID-19 pandemic[9]. However, it is thought that the COVID-19 outbreak has significantly fuelled racism and discrimination, especially towards Asian people[10]. Over the past months, extremely influential politicians have made public declarations that further associate the virus with China[10]. In addition, the number of racist attacks, including hate crimes, has increased severely since the pandemic was declared a global health threat[5]. Nonetheless, this is not the first time in history that an infectious disease outbreak has been associated with communities that later suffer social, economic or political consequences. Other diseases such as SARS, the Middle East Respiratory Syndrome or Ebola had a negative impact on the communities associated with the cause of the outbreak[7][13].
Previous research that investigated the effect of Ebola on discrimination against African residents of Hong Kong suggests that the social stigmatization of Africans was a continued symptom of deep-rooted biases towards minorities in Hong Kong, not solely attributable to the 2014 Ebola outbreak. Rather, the outbreak was an additional excuse to discriminate against the minority African community in Hong Kong[13]. Further, other work that studied the effect of SARS on racism in Toronto, Canada suggests that the virus outbreak in the city affected citizens' attitudes towards people who might be associated with the country of origin of SARS[12]. Furthermore, recent research shows that these types of stigma negatively affect the research and academic community as well[7]. However, previous work on the connection between infectious disease outbreaks and racism did not look quantitatively at the impact of social media on racism towards the targeted communities negatively associated with the region of origin of the virus. In this paper, we fill this literature gap by studying the connection between racism and social media. To do so, we analyse the dynamics of people's attitudes towards Asian people on Twitter before and after the 29th of January 2020, the date when COVID-19 was officially declared a threat to global health.

2 Related Work

Previous work has examined multiple training methods for classifying hate speech and similar linguistic tasks, and the best-performing suggested method is the Linear Support Vector Classifier (SVC)[1]. Thus, in our study we use a linear SVC to identify racist tweets that target Asian people between the 1st of January and the 31st of March 2020.
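As a rough illustration of this kind of classifier (not the authors' exact configuration), a linear SVC over TF-IDF n-gram features can be assembled with scikit-learn; the tweets and labels below are invented toy examples standing in for the labelled training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data; the real classifier is trained on the labelled
# datasets of Davidson et al. and Waseem and Hovy described later.
tweets = ["they should all go back home", "lovely weather in the park today",
          "those people are ruining this country", "great game last night"]
labels = ["racism", "neither", "racism", "neither"]

# TF-IDF over unigrams to trigrams feeding a linear SVC; class_weight
# compensates for the heavily unbalanced classes noted in the paper.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("svc", LinearSVC(class_weight="balanced")),
])
clf.fit(tweets, labels)
```

The exact feature set (POS tags, Porter stemming) and hyperparameters would follow Davidson et al.'s open-source system rather than this minimal sketch.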
The classifier is an updated version inspired by the open-source system provided by Davidson et al.[4], with adapted features meant to identify racism towards Asian people instead of just hate speech, as proposed in the original system. To further reduce overfitting, we use two other training methods, namely a Logistic Regression model and a Long Short-Term Memory (LSTM) recurrent neural network (RNN)[11]. Moreover, other related work concludes that pre-trained embedding weights for Twitter datasets improve the performance of classification tasks, so we included them in our classifiers[6]. Previous research also shows that a Deep Neural Network model is required to extract a deeper level of features[18]. However, all these training methods and improvements added to existing systems remain highly dependent on the linguistic nature of the classification task. According to previous literature, hate speech, particularly racism, is challenging to classify because of the high amount of offensive language and swear words on online platforms[15]. The key difference between the two rests on a linguistic distinction[8]. Previous related work also suggests that hate speech towards certain groups draws on a set of stereotypical words, used in either a positive or negative way, because earlier research treated hate-speech identification as a matter of word-sense disambiguation[8]. The training datasets take the above definition of hate speech into consideration.

3 Data

In order to investigate the dynamics of users' attitudes on Twitter towards Asian people before and after the COVID-19 outbreak, we use two different datasets of tweets for training, validating and testing our models, and two datasets of tweets for the study itself. The two training datasets were collected by (1) Davidson et al.[4] and (2) Waseem and Hovy[16].
Both datasets are manually annotated and contain only tweets written in English. The dataset provided by Davidson et al. has 24k tweets containing hate speech keywords, collected using a crowd-sourced hate speech lexicon. These tweets are labelled into three categories: “hate speech”, “offensive language” and “neither”. Only 5% of the tweets in this dataset were labelled as “hate speech” and 76% were labelled as “offensive language”. The remainder were labelled as “neither”[4]. On the other hand, the dataset provided by Waseem and Hovy contains 16k tweets with racist and sexist instances of hate speech. These tweets are labelled into three categories: “racism”, “sexism” and “neither”. The annotators labelled 20% of the tweets as “sexism” and 12% as “racism”. The remainder were labelled as “neither”[16]. For the sake of our study, we only considered the tweets labelled as “racism” and “neither”. Moreover, in order to conduct our study, we used the first available COVID-19 dataset of tweets, collected by Chen et al.[3]. This dataset contains the tweet IDs of users that have mentioned the pandemic since January 22, 2020. We downloaded the tweet IDs that were collected by the above team based on keywords listed in their open-source documentation. The tweets we downloaded were shared by users between January 22 and March 31, 2020. In accordance with Twitter's terms, we hydrated the tweet IDs to convert them to actual tweets using the Twarc tool. All the tweets from all three datasets underwent a preprocessing procedure. This includes standardizing counts of URLs and mentions, removing punctuation and excess whitespace, as well as tokenizing the tweets by lowercasing and stemming the words. Besides these standard preprocessing tasks, we cleaned the COVID-19 dataset to retain only tweets written in English and geo-located in the United States of America.
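A minimal sketch of the preprocessing step described above might look as follows (standard library only; the Porter stemming used in the actual pipeline, e.g. via NLTK, is omitted here):

```python
import re

def preprocess(tweet: str) -> str:
    """Standardize URLs and mentions, strip punctuation and excess
    whitespace, then lowercase; stemming would be applied afterwards
    in the full pipeline."""
    tweet = re.sub(r"https?://\S+", "URLHERE", tweet)   # standardize URLs
    tweet = re.sub(r"@\w+", "MENTIONHERE", tweet)       # standardize mentions
    tweet = re.sub(r"[^\w\s]", " ", tweet)              # drop punctuation
    tweet = re.sub(r"\s+", " ", tweet).strip()          # collapse whitespace
    return tweet.lower()

preprocess("Check this out https://t.co/abc @user!!")
# -> 'check this out urlhere mentionhere'
```

The placeholder tokens (`URLHERE`, `MENTIONHERE`) are illustrative names, chosen so that the counts of URLs and mentions survive tokenization.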
Furthermore, we used this data to train and run our models, and we identified all tweets that were labelled as “racist” towards Asian people. Then, we used the tweet IDs of all these racist tweets to trace these users' posts before their first mention of the pandemic. In accordance with Twitter's terms, we only had access to the last 32,000 tweets of each user; however, this was enough to form a new dataset of tweets from before the COVID-19 outbreak. We used the latter dataset to see whether these users had a negative attitude towards Asians on Twitter even before the pandemic. This dataset was preprocessed in a similar manner to the COVID-19 dataset.

4 Methodology

Figure 1: SVC Model Metrics
Figure 2: Confusion Matrix SVC Model

Analysing and quantifying the dynamics of the attitudes of Twitter users towards Asian people before and after the COVID-19 pandemic is not an easy task, due to the subtle linguistic distinction between hate speech, specifically racism, and offensive language. Since existing methods primarily focus on hate speech, we have contributed to the research literature by developing our own improved classification method, consisting of an ensemble network of multiple detectors. We trained this network on the datasets collected by Davidson et al. and by Waseem and Hovy, as mentioned above. This enabled us to build a strategy that was superior to existing methods. As a starting point, we improved upon the hate speech classifier created by Davidson et al. Rather than simply using an SVM model, our methodology combines this strategy with a Logistic Regression model. We extracted features by running the preprocessing pipeline proposed by Davidson et al., which applies Term Frequency-Inverse Document Frequency (TF-IDF), Penn POS tagging and Porter stemming to unigrams, bigrams, and trigrams. This helps highlight important features within the tweets.
These two models were trained using the hate speech dataset of tweets provided by Davidson et al. The outputs of these models were put through an ensemble voting mechanism.

Figure 3: Ensemble Model Metrics
Figure 4: Confusion Matrix Ensemble Model

To boost the accuracy further and to introduce regularization, we also ran these features through an ensemble of 20 Deep Neural Network (DNN) models (see Figures 5, 6 and 7). The tweets in this ensemble model framework are first preprocessed by generating convolutional neural network word embeddings from the raw Twitter data. This embedding generation uses a Word2Vec NLP model that was pre-trained on 400 million tweets[18]. These DNN models were trained using the dataset collected by Waseem and Hovy.

Figure 5: Neural Network model

The output of this DNN collection was also fed through the voting mechanism. In order to reduce overfitting, we use a hard voting method, which takes the majority vote of the three systems. In addition to the DNN, we also applied an LSTM model, but the results indicated that an LSTM would not be optimal for tweets, which are short and carry dense meanings. The original system proposed by Davidson et al. had a precision and recall of 46% and 60%, respectively (see Figures 1 and 2). Our novel ensemble method produced a precision of 50% with a recall of 70% (see Figures 3 and 4).

Figure 6: Neural Network ensemble model
Figure 7: Combined ensemble model

We limit our study to Twitter users in the US to prevent any country bias. The majority of tweets lack explicit country information, but there is a wealth of literature on detecting a tweet's country from its features. In this paper, we used the tweet country classification model developed by Zubiaga et al.[19].
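The hard-voting step can be sketched with scikit-learn's `VotingClassifier`; this is a simplified stand-in rather than the paper's exact system (a single small MLP replaces the 20-model DNN ensemble, and the training tweets are invented toy examples):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data standing in for the labelled training tweets.
texts = ["go back to your country", "what a great day",
         "they do not belong here", "enjoying my coffee",
         "send them all home now", "nice sunset tonight"]
y = [1, 0, 1, 0, 1, 0]  # 1 = hateful, 0 = neither

def text_clf(model):
    return Pipeline([("tfidf", TfidfVectorizer()), ("model", model)])

# Hard (majority) voting over the three detectors: each model casts one
# vote per tweet, and the majority label wins.
ensemble = VotingClassifier(
    estimators=[("svc", text_clf(LinearSVC())),
                ("logreg", text_clf(LogisticRegression())),
                ("mlp", text_clf(MLPClassifier(hidden_layer_sizes=(8,),
                                               max_iter=300)))],
    voting="hard")
ensemble.fit(texts, y)
```

Hard voting requires only each model's predicted label, not calibrated probabilities, which is what makes it usable with a margin-based classifier such as `LinearSVC`.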
The model used eight tweet-inherent features for classification, such as the tweet content, the user's self-reported location and the user's real name, and was trained on two datasets collected a year apart from each other. From the tests we conducted, the model had on average more than 85% accuracy in detecting tweets from the US, which we deemed satisfactory. Finally, the classification system was also passed through a RegEx keyword filter, for example "(R|r)acis.*" and "(C|c)hin.*", in order to adapt the system with features meant to identify racism towards particular groups of people instead of just hate speech, as proposed in the original system.

5 Results

We obtain three sets of results. The first set of results corresponds to the general rate of hateful tweets by US Twitter users. Specifically, we examine whether users become increasingly more hateful after the outbreak of COVID-19 and its global spread. We ran our classifier on the country-filtered COVID-19 related dataset. We find that the collective number of hateful tweets is generally steady during this period (see Fig. 4). However, there were two large spikes in the number of hate-speech tweets, in early February (1) and early March (2). These correspond to (1) the rapid spread of the virus within China and the first wave of significant active cases outside China, and (2) the significant increase in the number of cases globally, specifically in the US (see Fig. 11). Note: there is a known gap in Chen et al.'s COVID-19 database around February 23 due to connectivity issues; this decrease is reflected in Figures 8, 11 and 12 [3]. We note that the total number of tweets (before classification) varies over time. This could affect the conclusions regarding individuals. Therefore, we also examined whether the number of hate-speech tweets corresponds to an increase in racist and hateful discussion of COVID-19. That is, do people individually become more hateful? A graph of the ratio of hate speech to total tweets was plotted (see Fig. 8).
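A keyword filter of the kind described above can be sketched in a few lines; the two patterns are the examples quoted in the text, and the helper name is ours:

```python
import re

# RegEx keyword filter used to narrow hate-speech hits to racism aimed
# at particular groups; the patterns follow the examples in the text.
PATTERNS = [re.compile(p) for p in (r"(R|r)acis.*", r"(C|c)hin.*")]

def targets_group(tweet: str) -> bool:
    """True if any group-specific keyword pattern occurs in the tweet."""
    return any(p.search(tweet) for p in PATTERNS)

targets_group("blatant racism in this thread")  # True
targets_group("lovely weather today")           # False
```

Note that such prefix patterns over-match (e.g. "chin.*" also matches "machine"), which is presumably acceptable here because the filter runs after the hate-speech classifier rather than on raw tweets.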
We observe that the overall level of hate speech remained steady (at around 1% of total tweets). This is a surprising finding, as it indicates that the level of hate speech around COVID-19 has not increased significantly over the course of the pandemic. In order to examine whether the recent pandemic has caused people to become more racist, we needed to control for people's historic tweeting behaviour from before and after the pandemic. We selected approximately 60,000 users who posted hateful tweets from the COVID-19 dataset during the month of January, whom we label the "haters". We selected the month of January as it allowed us to have a fairly equal representation of the four months before and after the initial reaction to COVID-19. The two separated periods correspond to September 2019 - January 2020 and January 2020 - May 2020, respectively. Of these, we scraped 20,000 Twitter timelines through the Twitter API. Because of the limits imposed by Twitter, we were only able to scrape the latest 1,000 tweets of any given user. For many users this only covers the most recent few weeks; however, for some it covers the entire pre-pandemic and post-pandemic period. We aggregated the "hater" timeline tweets together, which produced the recent-heavy graph of hate tweets near May (see Fig. 9).

Figure 8: Classified hate speech tweets related to COVID-19
Figure 9: Classification of hate speech on racist users' timelines

When we normalize this by the total number of timeline tweets, we obtain the ratio of real hate tweets of these users (see Fig. 9). It is important to note that, while there are small spikes in hateful tweets in February and March (corresponding to the rising cases in China and subsequently in the US), the baseline level of hate speech does not change significantly between the pre-pandemic and post-pandemic periods.
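The normalization behind these ratio plots is simple: divide the number of classified hate tweets by the total number of tweets, per day. A toy sketch, with invented `(date, is_hateful)` records standing in for the classifier's output:

```python
from collections import Counter
from datetime import date

# Invented per-tweet records standing in for the classifier output.
records = [(date(2020, 2, 1), True), (date(2020, 2, 1), False),
           (date(2020, 2, 1), False), (date(2020, 2, 2), False),
           (date(2020, 2, 2), False), (date(2020, 2, 2), True)]

total = Counter(d for d, _ in records)          # tweets per day
hateful = Counter(d for d, h in records if h)   # hateful tweets per day

# Daily hate-speech ratio, normalizing out the varying tweet volume.
ratio = {d: hateful[d] / total[d] for d in total}
```

Normalizing this way is what separates "more hate tweets because more tweets overall" from "individual users becoming more hateful".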
These "haters" were already posting high levels of hate speech before the pandemic, and COVID-19 appears to be just another context in which they continue to express their hate speech. This figure also shows that these "haters" are the largest source of racist hate speech, with approximately 4% of their tweets being hateful compared with the average of 1% (see Fig. 8).

Figure 10: Classified and normalized hate speech timelines towards Chinese people

In the third set of results (see Fig. 10), we examine whether there was an increase in the rate of racism on Twitter specifically targeted at Chinese people. Similar to the findings on general racism, we find that while the absolute volume of hate speech towards Chinese people increased, the actual proportion of tweets targeting them remained fairly constant, and in fact slightly decreased. This demonstrates that the pandemic did not cause people to become more racist towards Chinese people.

Figure 11: Hate Speech vs COVID-19 cases in the US and China
Figure 12: Chart of overt racism towards Chinese

In Figure 11 we also overlay the cumulative number of real COVID-19 cases as obtained from Johns Hopkins University[14]. We do this for the primary countries in question, China and the US. In the figure, we observe that an increase in the number of COVID-19 cases directly precedes a large increase in the number of hate tweets. This occurred for China in late January, and again for the US in early March. This is likely correlated with panic within the community about an imminent spread of COVID-19, leading to a temporary increase in racist, isolationist attitudes which do not persist. Finally, in Figure 12 we analyzed whether overt racism, i.e. the usage of clear racial slurs such as "chink", changed over time.
What we found, which corroborates the findings in the previous graphs, is that usage of these obvious terms spiked during the peak spread of COVID-19, in late January and early March in China and the US respectively, but quickly faded in popularity.

6 Discussion

Our results complement the findings in the existing literature on the relationship between racism and various virus outbreaks, such as SARS and Ebola, which showed that racism towards a particular group (a community from where the virus originated) is not directly caused by the virus itself, but rather is a continued wave of existing bias towards that group. Our primary contribution has been the usage of an ensemble of ML models instead of relying on one specific model. This enables us to increase the accuracy in identifying racist hate speech by increasing regularization and reducing over-fitting. It also enables us to capture more intricate features of tweets that would otherwise be missed by a single model. This can be seen in our superior accuracy. Nevertheless, this comes at the cost of lower recall levels. However, for this analysis involving huge datasets (60M+ tweets), higher precision is more desirable. While we strove to make our analysis and results robust by following rigorous data processing procedures and making use of a wide array of ML models, our work is not exempt from some limitations. The first limitation stems from the fact that we trained our models on out-of-distribution data as opposed to in-domain data. The system of Davidson et al.[4] was originally designed to detect racism towards African Americans (not Chinese people, as in our context), while the Waseem and Hovy dataset[16] was designed for the classification of Islamophobic tweets along with tweets targeting African Americans. While we combined them in order to reduce their idiosyncratic limitations, training on in-domain data would have yielded more accurate results in detecting tweets specifically racist towards Chinese people.
In our future work we could overcome these limitations by manually annotating our own dataset (e.g. using Amazon Mechanical Turk) to generate training data for detecting tweets racist towards Chinese people. The second limitation of our research is that we did not filter out potential tweets by bots. Since we found that the proportion of hateful tweets was fairly constant regardless of the total number of tweets posted by the users (bots tend to post more often than humans), we believe that this limitation does not alter the general findings of our research. To conclude, analyzing Twitter users' attitudes towards racial groups in the context of a global pandemic provides significant information on how humans behave during critical times. Even though racism appears not to be directly caused by the virus itself, but rather to be a continued wave of existing bias towards Asian people, it is important to understand this phenomenon. After all, according to the relevant literature, it is only through collaboration that we can efficiently combat the COVID-19 pandemic[9].

7 Acknowledgments

We would like to express our special thanks to our professors, Talal Rahwan and Bedoor AlShebli, for the feedback, guidance and supervision provided throughout the development of our research project. We would also like to acknowledge Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber for the open-source system and dataset that enabled us to conduct our research. On the same note, we would like to thank Zeerak Waseem and Dirk Hovy for the open-source dataset that we used in training our models. Last but not least, we would like to express our special thanks to Emily Chen, Kristina Lerman and Emilio Ferrara for providing the first public dataset of tweets related to COVID-19, which was of great use in our research.

Funding

There was no funding required for this project.

Author contributions

The authors contributed equally to this work.
Conflict of interest

The authors declare no conflict of interest.

References

[1] van Aken, B., Risch, J., Löser, A. (2018). Challenges for Toxic Comment Classification: An In-Depth Error Analysis, CoRR, DOI: 10.18653/v1/W18-5105

[2] Atkeson, A. (2020). What Will Be the Economic Impact of COVID-19 in the US? Rough Estimates of Disease Scenarios, National Bureau of Economic Research, DOI: 10.3386/w26867

[3] Chen, E., Lerman, K., Ferrara, E. (2020). COVID-19: The First Public Coronavirus Twitter Dataset, JMIR Public Health Surveill, DOI: 10.2196/19273

[4] Davidson, T., Warmsley, D., Macy, M., Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language, AAAI Publications, Eleventh International AAAI Conference on Web and Social Media, arXiv:1703.04009

[5] Devakumar, D., Shannon, G., Bhopal, S. S., Abubakar, I. (2020). Racism and discrimination in COVID-19 responses, The Lancet, DOI: 10.1016/S0140-6736(20)30792-3

[6] Godin, F., Vandersmissen, B., De Neve, W., Van de Walle, R. (2015). Named Entity Recognition for Twitter Microposts using Distributed Word Representations, Proceedings of the Workshop on Noisy User-generated Text, DOI: 10.18653/v1/W15-4322

[7] Hanasoge, S., Horiuchi, N., Huang, C., Jia, H., Kim, N. Y., Murao, M., Seo, M., Tan, R., Wilkinson, J. (2020). Visibility challenges for Asian scientists, Nature Reviews Physics, DOI: 10.1038/s42254-020-0162-z

[8] Kwok, I., Wang, Y. (2013). Locate the Hate: Detecting Tweets against Blacks, AAAI Publications, Twenty-Seventh AAAI Conference on Artificial Intelligence, DOI: 10.5555/2891460.2891697

[9] Li, J., Guo, K., Viedma, E. H., Lee, H., Liu, J., Zhong, N., Gomes, L. F. A. M., Filip, F. G., Fang, S.-C., Özdemir, M. S., Liu, X., Lu, G., Shi, Y. (2020). Culture versus Policy: More Global Collaboration to Effectively Combat COVID-19, The Innovation, Volume 1, Issue 2, DOI: 10.1016/j.xinn.2020.100023

[10] Nature (2020). Stop the coronavirus stigma now, Nature 580, 165, DOI: 10.1038/d41586-020-01009-0

[11] Pitsilis, G., Ramampiaro, H., Langseth, H. (2018). Effective hate-speech detection in Twitter data using recurrent neural networks, Appl Intell 48, DOI: 10.1007/s10489-018-1242-y

[12] Ali, S. H., Keil, R. (2006). Global Cities and the Spread of Infectious Disease: The Case of Severe Acute Respiratory Syndrome (SARS) in Toronto, Canada, Urban Studies, DOI: 10.1080/00420980500452458

[13] Siu, J. Y. (2015). Influence of social experiences in shaping perceptions of the Ebola virus among African residents of Hong Kong during the 2014 outbreak: a qualitative study, International Journal for Equity in Health, DOI: 10.1186/s12939-015-0223-6

[14] Dong, E., Du, H., Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time, Lancet Inf Dis. 20(5):533-534, DOI: 10.1016/S1473-3099(20)30120-1

[15] Wang, W., Chen, L., Thirunarayan, K., Sheth, A. P. (2014). Cursing in English on Twitter, Association for Computing Machinery, DOI: 10.1145/2531602.2531734

[16] Waseem, Z., Hovy, D. (2016). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter, Proceedings of the NAACL Student Research Workshop, DOI: 10.18653/v1/N16-2013

[17] World Health Organization (2021). Coronavirus disease 2019 (COVID-19): situation report, 52, World Health Organization

[18] Zimmerman, S., Kruschwitz, U., Fox, C. (2018). Improving Hate Speech Detection with Deep Learning Ensembles, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

[19] Zubiaga, A., Voss, A., Procter, R., Liakata, M., Wang, B., Tsakalidis, A. (2016). Towards Real-Time, Country-Level Location Classification of Worldwide Tweets, IEEE Transactions on Knowledge and Data Engineering, Volume 29, Issue 9, Sept. 2017, DOI: 10.1109/TKDE.2017.2698463

Copyright ©2021 by the authors. Licensee Agora University, Oradea, Romania. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License. Journal's webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE). https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as: Jia, B.; Dzitac, D.; Shrestha, S.; Turdaliev, K.; Seidaliev, N. (2021). An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users' Attitudes, International Journal of Computers Communications & Control, 16(2), 4207, 2021. https://doi.org/10.15837/ijccc.2021.2.4207