Journal of Software Engineering Research and Development, 2021, 9:3, doi: 10.5753/jserd.2021.827
This work is licensed under a Creative Commons Attribution 4.0 International License.

An Empirical Study of Bugs in COVID-19 Software Projects

Akond Rahman [ Tennessee Technological University | arahman@tntech.edu ]
Effat Farhana [ North Carolina State University | efarhan@ncsu.edu ]

Abstract

The dire consequences of the COVID-19 pandemic have influenced development of COVID-19 software, i.e., software used for analysis and mitigation of COVID-19. Bugs in COVID-19 software can be consequential, as COVID-19 software projects can impact public health policy and user data privacy. The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19. We use 129 open source COVID-19 software projects hosted on GitHub to conduct our empirical study. Next, we apply qualitative analysis on 550 bug reports from the collected projects to identify bug categories. We identify 8 bug categories, which include data bugs, i.e., bugs that occur during mining and storage of COVID-19 data. The identified bug categories appear for 7 categories of software projects including (i) projects that use statistical modeling to perform predictions related to COVID-19, and (ii) medical equipment software that is used to design and implement medical equipment, such as ventilators. Based on our findings, we advocate for robust statistical model construction through better synergies between data science practitioners and public health experts. Existence of security bugs in user tracking software necessitates development of tools that will detect data privacy violations and security weaknesses.
Keywords: bugs, covid-19, empirical study, pandemic, software quality

1 Introduction

The novel Coronavirus disease (COVID-19) is a worldwide pandemic that spreads through droplets generated from coughs or sneezes and by touching contaminated surfaces (John Hopkins University, 2020). As of May 31, 2020, COVID-19 has caused 370,247 deaths across the world (John Hopkins University, 2020). Apart from causing thousands of deaths and creating long term health repercussions for vulnerable populations, COVID-19 has severely impacted the economic sector. According to a recent study (Erin Duffin, 2020), due to COVID-19 gross domestic product (GDP) will decrease from 3.0% to 2.4% worldwide. As of May 28, 2020, nearly 41 million citizens reported unemployment in the USA alone (Mitchell Hartman, 2020). More than 3.9 billion people around the world were under some form of stay at home order due to COVID-19 (Alasdair Sandford, 2020).

Health care professionals are at the frontline of combating COVID-19. Practitioners from other domains, such as software engineering, have also joined forces to analyze and mitigate the negative consequences of COVID-19. For example, statistical modeling was used to build software that identifies pneumonia caused by COVID-19 from lung scan images (Tom Simonite, 2020). The software was used in 34 Chinese hospitals (Tom Simonite, 2020). In response to the food insecurity caused by COVID-19, practitioners have created an interactive visualization software that displays free meal sites across the USA (Why Hunger, 2020). The creators of the software envision building a social movement to eradicate hunger and address economic inequalities. As another example, Apple and Google have jointly announced the creation of a software framework that will help practitioners build tools to trace COVID-19 infection status of mobile app users (Apple, 2020).
The above-mentioned examples show COVID-19 software, i.e., software used for analysis and mitigation of COVID-19, to have near-term and long-term effects on public health and society.

Despite the above-mentioned advancements, COVID-19 software projects are susceptible to bugs. Let us consider Figure 1 in this regard. Figure 1 provides a snapshot of a bug report related to statistical modeling (Begley, 2020a). We observe that when implementing a statistical model the practitioners did not consider the correlation between intensive care unit (ICU) bed availability and death rate prediction. Furthermore, the number of ICU beds is incorrectly assumed to be 40,000 instead of 1,000.

We hypothesize systematic analysis can reveal bug categories including statistical modeling bugs similar to Figure 1. In prior work researchers (Garcia et al., 2020; Rahman et al., 2020; Linares-Vásquez et al., 2017; Catolino et al., 2019; Thung et al., 2012; Wan et al., 2017) have documented the importance of bug categorization. For example, for autonomous vehicle software Garcia et al. 2020 stated that categorization of bugs can help to construct bug detection and testing tools. Linares-Vásquez et al. 2017 stated categorizing vulnerabilities can help Android practitioners “in focusing their verification and validation activities”. According to Catolino et al. 2019, “understanding the bug type represents the first and most time-consuming step to perform in the process of bug triage”.

In prior work, researchers have categorized bugs for infrastructure as code (IaC) (Rahman et al., 2020), autonomous vehicle (Garcia et al., 2020), and machine learning (Thung et al., 2012; Islam et al., 2019) software.
However, COVID-19 software is different from previously studied software in the following aspects: (i) development context: unlike previously studied software projects, COVID-19 software is developed in response to a pandemic that infected 6.1 million individuals in five months (John Hopkins University, 2020), and (ii) public health: unlike previously studied software projects, COVID-19 software has direct implications on public health and relevant policy making for inhabitants in 188 countries. In response to the pandemic, researchers have conducted studies related to modeling (Dehning et al., 2020; Yang and Wang, 2020; Tamm, 2020), biological science (Jin et al., 2020; Wang et al., 2020; De Clercq, 2006; Helms et al., 2020), social science (Van Bavel et al., 2020; Pulido et al., 2020; Evans et al., 2020; Will, 2020; Jarynowski et al., 2020), and policy making (Corey et al., 2020; Mello and Wang, 2020; Rourke et al., 2020; Kraemer et al., 2020). However, characterization of bugs in COVID-19 software remains an unexplored area.

Figure 1. An example of a bug report related to statistical modeling in a software project called ‘neherlab/covid19_scenarios’.

The scope of our paper is to get a systematic understanding of bugs in COVID-19 software projects. In our paper, we refer to COVID-19 software projects as software projects that were created to analyze and mitigate the consequences of COVID-19. These projects were created in response to a global pandemic that created a worldwide impact on public health, economy, and societal activities. Our hypothesis is that the utility of COVID-19 software projects and the urgency associated with these projects can (i) yield manifestation of bugs unique to the COVID-19 reality, and (ii) affect bug resolution time.
Furthermore, from our empirical analysis we can identify what categories of bugs appear for what types of COVID-19 software projects.

The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19. We answer the following research questions:

• RQ1: What categories of open source COVID-19 software projects exist? We identify seven categories of software projects related to COVID-19: aggregation, education, medical equipment, mining, user tracking, statistical modeling, and volunteer management.
• RQ2: What categories of bugs exist in open source COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories? We identify eight bug categories: algorithm, data, dependency, documentation, performance, security, syntax, and user interface (UI). Except for mining and medical equipment projects, for all types of COVID-19 software projects the most frequently occurring bug category is UI.
• RQ3: How similar are the identified bug categories to that with previously studied software projects? Identified bug categories for COVID-19 software projects also appear for other software types, but the manifestation of the bugs is different for COVID-19 software projects.

Contributions: We list our contributions as follows:

• A categorization of bugs that appear in COVID-19 software projects;
• A categorization of OSS projects related to COVID-19;
• An empirical study that identifies what category of bugs appear for what category of COVID-19 software projects; and
• A comparison of bug categories for COVID-19 software projects to that with previously studied software projects.

We organize the rest of the paper as follows: We discuss related work in Section 2. We provide the methodology to answer the three research questions in Section 3 and provide the results in Section 4.
We discuss our results with a summary of our findings in Section 5. We provide the limitations of our paper in Section 6. Finally, we conclude the paper in Section 7. Our constructed dataset is available as a public, citable repository (Rahman and Farhana, 2020).

Overview of the Empirical Study: An overview of our paper is available in Figure 2. First, we mine software projects related to COVID-19 from GitHub by applying filtering criteria based on number of issues, number of developers, etc. Next, we apply a qualitative analysis technique called open coding (Saldana, 2015) on the README files of the collected open source software (OSS) projects to identify what categories of OSS projects exist related to COVID-19. After characterizing the collected software projects, we again apply open coding on 550 bug reports from the collected OSS projects to identify bug categories. We also quantify the frequency and resolution time of each bug category across the identified project categories. Finally, we conduct a scoping review (Munn et al., 2018) to find the similarities in bug categories between COVID-19-related software projects and other categories of software projects.

2 Related Work

Our paper is related to prior research that has focused on categorization of bugs in OSS projects. Mockus et al. 2002 studied the contribution nature in OSS Apache and Mozilla projects. They (Mockus et al., 2002) observed contributors who submit bug reports are approximately 8.2 times higher in number than contributors who address bugs in bug reports. Ma et al. 2017 investigated Python GitHub projects that are used in the scientific domain, and observed developers to use stack traces, as well as communicate with upstream developers, to identify root causes of bugs. Zhang et al.
2019 examined bug reports for mobile and desktop software hosted on GitHub, and identified differences in how the reports are constructed. Ray et al. 2014 studied the correlations between bugs and the language in which the software is developed, and reported a modest correlation using an empirical study of 729 GitHub projects. Categorization of domain-specific OSS bugs has also been investigated: Thung et al. 2012, Garcia et al. 2020, Wan et al. 2017, Islam et al. 2019, and Rahman et al. 2020 in separate research papers used OSS projects to classify bug categories respectively, for machine learning, autonomous vehicle, blockchain, deep learning, and IaC.

Our paper is also related to publications that have investigated the impact of COVID-19 on software development. Ralph et al. 2020 surveyed 2,225 practitioners and reported fear related to COVID-19 to affect productivity of software practitioners. Butler and Jaffe 2020 conducted a diary study with 435 practitioners and reported practitioners to face challenges, such as having too many meetings and feeling overworked while working from home due to COVID-19. Oliveira et al. 2020 surveyed 413 practitioners from Brazil and reported practitioners’ perceived productivity to increase due to fewer interruptions.

From the above-mentioned discussion we observe bugs in software projects related to COVID-19 to be an under-explored area. While there exist several bug categorization studies (Thung et al., 2012; Garcia et al., 2020; Wan et al., 2017; Islam et al., 2019; Rahman et al., 2020), no studies exist for COVID-19-related projects. The bug categorization-related studies for IaC, blockchain, and deep learning motivated us to derive bug categories and quantify the identified bug categories. Wan et al. 2017’s paper on blockchain bugs motivated us to study bug resolution time for each identified bug category.
In our paper, we study COVID-19 software bugs in the following manner:

• categories of bugs;
• frequency of identified bug categories;
• resolution time of identified bug categories; and
• categories of software projects.

3 Methodology

In this section we provide the methodology to answer research questions RQ1, RQ2, and RQ3.

3.1 Methodology for RQ1: What categories of open source COVID-19 software projects exist?

We define COVID-19 software projects as software projects used for analysis and mitigation of COVID-19. We hypothesize multiple categories of COVID-19 software projects to exist in the OSS domain. We validate our hypothesis by systematically categorizing COVID-19 software projects. Our categorization will provide insights on how the software development community has responded to the COVID-19 pandemic. We answer RQ1 by completing the following steps:

3.1.1 Dataset Collection

We conduct our empirical analysis by collecting COVID-19 software projects hosted on GitHub. To collect these projects we use GitHub’s search utility (GitHub, 2020c), where we first identify projects tagged as ‘covid-19’. We use the search string ‘covid-19’, as it is a topic designated for COVID-19 by GitHub (GitHub, 2020a). Our assumption is that by using a GitHub-designated tag we can collect OSS projects hosted on GitHub that are related to COVID-19.

OSS projects hosted on GitHub are susceptible to quality issues, as GitHub users often host repositories for personal purposes that are not reflective of real-world software development (Munaiah et al., 2017). Upon collection of the projects we apply a set of filtering criteria so that we can identify projects that contain sufficient data for analysis. We describe the filtering criteria below:

• Criterion-1: The project must have at least 2 developers. Our assumption is that this criterion will filter out projects used for personal purposes.
• Criterion-2: The project has at least 5 open issues.
We use this filtering criterion to identify projects that are actively maintained. Our assumption is that by using this criterion we will be able to identify COVID-19 software projects that are not used for personal purposes as well as projects that are active. Prior research (Agrawal et al., 2018) has also used the count of issues to filter OSS projects hosted on GitHub to conduct empirical studies.
• Criterion-3: The project must have at least two commits per month. Munaiah et al. 2017 used the threshold of at least two commits per month to determine which projects have enough development activity for software organizations. We use this threshold to filter out projects with little development activity.
• Criterion-4: The README of the project is written in English. README files of projects related to COVID-19 can be written in languages other than English. We do not include non-English projects as raters who will perform categorization are not familiar with non-English languages, such as Spanish and Cantonese.
• Criterion-5: The project is related to COVID-19. We use the ‘topic’ 1 feature of GitHub to search and identify COVID-19 software projects. However, practitioners can mislabel projects using the ‘topic’ feature of GitHub, potentially including projects in our dataset that are not related to COVID-19. For example, from manual inspection we observe the ‘RehanSaeed/Schema.NET’ 2 project to be tagged as ‘covid-19’, even though it is not related to COVID-19. In fact, the project is used to convert blob objects into C# classes.

Figure 2. An overview of our empirical study. [Stages shown: Public GitHub → Filtered COVID-19 Projects → Characterization of COVID-19 Projects → Characterization of COVID-19 Software Bugs]

3.1.2 Qualitative Analysis of README files

We apply a qualitative analysis called open coding (Saldana, 2015) on the content of README files for each of the downloaded projects from Section 3.1.1.
README files describe the content of the project and give GitHub users an overview of the software project (Prana et al., 2019). We hypothesize that by systematically analyzing the content of the README files we can derive what types of software projects are developed that are related to COVID-19.

In open coding a rater identifies and synthesizes patterns within unstructured text (Saldana, 2015). We select open coding because we can obtain detailed information on the software project categories. We use a hypothetical example to demonstrate our process of open coding in Figure 3. First, we collect text from the README files for each of the collected projects from Section 3.1.1. Next, we extract text snippets that describe the purpose of the software project. For example, from the raw text ‘The COVID-19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19’ we extract the text snippet ‘a predictive model’, as the extracted text snippet describes the purpose of the software project. Next, from the text snippets ‘a predictive model’ and ‘modelling estimated deaths’ we generate an initial category called ‘Models to predict’. Two initial categories ‘Models to predict’ and ‘Models to understand’ are combined into one category ‘Statistical modeling’, as they both indicate the descriptions of the software projects to be related to statistical modeling.

1 https://github.com/topics
2 https://github.com/RehanSaeed/Schema.NET

The first and second authors conduct the open coding process separately. Both authors use Excel spreadsheets to conduct the open coding process manually. The first and second authors respectively have 10 and 6 years of experience in software engineering and have experience in conducting open coding upon software project artifacts, such as commit messages (Rahman et al., 2020) and Stack Overflow posts (Farhana et al., 2019).
Upon completion of the open coding process, the first and second authors identify agreements and disagreements. Disagreements are resolved through discussion, and the agreement rate is calculated using Cohen’s Kappa (Cohen, 1960). During the discussion phase both authors present their justification, and recheck the category derivation based on the discussion and by revisiting the content. The mapping determined upon discussion is considered final. One project can map to multiple categories.

3.1.3 Closed Coding

We apply closed coding (Crabtree and Miller, 1999) to identify which project maps to the identified categories from Section 3.1.2. Closed coding is the qualitative analysis technique where a rater maps an artifact to a pre-defined category by inspecting the artifact (Crabtree and Miller, 1999). The first and second author separately conduct closed coding on the collected README files. Both authors use Excel spreadsheets to conduct closed coding. After completing the closed coding process the first and second authors identify agreements and disagreements. Agreement rate is recorded using Cohen’s Kappa (Cohen, 1960). Disagreements are resolved using discussion. During the discussion phase both authors present their justification for disagreements. Next, the authors recheck the labeling based on the justification and content analysis. The categorization determined upon discussion is considered final.

3.1.4 Rater Verification

The derived categories are susceptible to the bias of the first and second author. We mitigate the limitation by allocating an additional rater who applies closed coding for a subset of the README files.
The additional rater, who is not an author of the paper, is a fourth year PhD candidate in the Department of Computer Science at Tennessee Technological University. The rater has a professional experience of 2 years in software engineering and has conducted qualitative analysis on software artifacts, such as bug reports. We randomly allocate a set of 100 README files mined from 100 projects to the rater. The rater applies closed coding on the content of the README files, to identify the mapping between each project and identified categories. Upon completion of closed coding we calculate Cohen’s Kappa (Cohen, 1960) between the rater and the first author, as well as with the second author, separately.

Figure 3. A hypothetical example to demonstrate our process of open coding to categorize COVID-19 software projects.
README excerpts: ‘The COVID-19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19’; ‘COVID-19 Agent-based Simulator (Covasim): a model for understanding novel coronavirus epidemiology’; ‘Code for modelling estimated deaths and cases for COVID19’
Raw text: ‘a predictive model’; ‘a model for understanding’; ‘modelling estimated deaths’
Initial categories: ‘Models to predict’; ‘Models to understand’
Category: ‘Statistical modeling’

3.2 Methodology for RQ2: What categories of bugs exist in open source COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories?

In this section, we answer “RQ2: What categories of bugs appear in COVID-19 software projects? How frequently do the identified bug categories appear?
What is the resolution time for each bug category?” A categorization of bugs for COVID-19 software projects can inform practitioners and researchers about how software related to COVID-19 is developed and in which areas they can help. Furthermore, educators can learn about the software bugs that occur in software related to a pandemic and disseminate these findings in the classroom. Frequency of the identified bug categories can help us understand what categories of software tend to contain what types of software bugs and provide quality improvement suggestions accordingly. Quantifying the resolution time for bugs in software projects can help software engineering researchers provide actionable guidelines to practitioners. For example, Wan et al. 2017 observed that for blockchain software projects security bugs can take longer to fix compared to other bug categories. Based on their findings Wan et al. 2017 recommended that blockchain project maintainers can adopt security analysis and repair tools to fix security bugs quickly. We provide the methodology to identify bug categories, quantify bug category frequency, and quantify bug resolution time below:

Methodology to Identify Bug Categories: We identify bug categories using the following steps:

• Step#1-Filtering: We collect the 4,405 issue reports from the 129 projects and manually inspect each issue report. We do not rely on automated approaches, such as keyword search or using bug labels, as automated approaches tend to generate false positives, which may bias research results (Herzig et al., 2013). While inspecting each issue report we use the following IEEE definition for bugs: “an imperfection that needs to be replaced or repaired” (IEEE, 2010), similar to prior work (Rahman et al., 2020). By completing this step we will obtain a set of closed issue reports that correspond to bugs.
We use closed reports because open bug reports are often incomplete and may not help in identifying bugs (Wan et al., 2017).
The first and second author individually inspect the issue reports manually to identify what issue reports correspond to bugs. We record agreement rate and Cohen’s Kappa (Cohen, 1960) between the first and second author. Disagreements between the first and second author are resolved through discussions. The process is subjective and susceptible to the bias of the first and second author. We mitigate the bias by using an additional rater, who inspected 100 randomly selected issue reports and classified them as bug reports and non-bug reports. The additional rater is the fourth year PhD candidate at Tennessee Technological University who is also involved in rater verification for RQ1.
• Step#2-Open coding: We apply open coding (Saldana, 2015) on the content of the collected bug reports from Step#1. Our open coding process is illustrated in Figure 4 using an example. First, we extract raw text from bug report titles and descriptions, from which we generate initial categories. Next, we merge initial categories based on the commonalities and generate categories. Similar to deriving project categories, the first and second author separately apply the process of open coding to generate bug categories. Upon completion of the process we quantify agreement rate and measure Cohen’s Kappa (Cohen, 1960). For disagreements we conduct discussion. The categories generated upon discussion are considered final.

\[
\mathrm{BugPropAll}(x) = \frac{\text{\# of bug reports labeled as category } x}{\text{total \# of bug reports}} \times 100\% \tag{1}
\]

\[
\mathrm{BugPropCateg}(x, y) = \frac{\text{\# of bug reports labeled as } x \text{, of project type } y}{\text{\# of bug reports for project type } y} \times 100\% \tag{2}
\]
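The agreement measurement used throughout the steps above can be illustrated with a minimal sketch of Cohen’s Kappa (Cohen, 1960) for two raters. The function name `cohens_kappa` and the sample labels are hypothetical and serve only as an illustration; the sketch assumes each rater assigns exactly one label per issue report, and it is not the tooling used in the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters who each give one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two raters for four issue reports.
rater_1 = ["bug", "bug", "non-bug", "non-bug"]
rater_2 = ["bug", "non-bug", "non-bug", "non-bug"]
print(cohens_kappa(rater_1, rater_2))
```

In this hypothetical example the raters agree on three of four reports (observed agreement 0.75) while chance agreement is 0.5, so Kappa discounts the raw agreement rate accordingly.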
Methodology to Quantify Bug Category Frequency: We apply the following steps to quantify the frequency of identified bug categories:

• Step#1-Closed coding: We apply closed coding (Crabtree and Miller, 1999) to map each bug report that we study to the identified categories. The first and second author separately apply closed coding for the collected bugs from Step#1. Upon completion, we calculate the agreement rate and Cohen’s Kappa (Cohen, 1960). Disagreements are resolved using discussion.
• Step#2-Metric calculation: We quantify the frequency of the identified bug categories using two metrics: ‘BugPropAll’ and ‘BugPropCateg’. We use Equations 1 and 2 to respectively calculate ‘BugPropAll’ and ‘BugPropCateg’. The ‘BugPropAll’ metric refers to the proportion of bugs across all projects, and provides a holistic overview of the frequency of identified bug categories. The ‘BugPropCateg’ metric refers to the proportion of bugs for a certain project category, and provides a granular overview of bug category frequency for each software project type identified in Section 4.1.2.
• Step#3-Rater verification: The use of first and second author as raters to conduct closed coding is susceptible to rater bias. We mitigate this limitation by allocating an additional rater. We assign 250 randomly selected bug reports to the additional rater, who applies closed coding. We provide the additional rater with a document that provides definitions of each identified category with examples.
Similar to our process of rater verification for project categorization, the additional rater is the fourth year PhD candidate in the Department of Computer Science at Tennessee Technological University. The fourth year PhD candidate is involved in the rater verification process for identifying project categories and labeling issue reports as bug reports.
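The two frequency metrics from Step#2-Metric calculation can be sketched in a few lines. This is a minimal sketch, assuming each bug report is represented as a (bug category, project type) pair with exactly one category and one project type per report; the function names and the sample reports are hypothetical and are not taken from the study’s dataset.

```python
from collections import Counter


def bug_prop_all(reports):
    """Equation 1: percentage of all bug reports labeled with each category."""
    total = len(reports)
    counts = Counter(category for category, _ in reports)
    return {category: 100.0 * n / total for category, n in counts.items()}


def bug_prop_categ(reports):
    """Equation 2: percentage of a project type's bug reports per category."""
    per_type_total = Counter(project_type for _, project_type in reports)
    pair_counts = Counter(reports)
    return {
        (category, project_type): 100.0 * n / per_type_total[project_type]
        for (category, project_type), n in pair_counts.items()
    }


# Hypothetical closed-coding output: (bug category, project type) per report.
sample_reports = [
    ("UI", "aggregation"),
    ("UI", "aggregation"),
    ("data", "aggregation"),
    ("data", "mining"),
]
print(bug_prop_all(sample_reports))
print(bug_prop_categ(sample_reports))
```

The first function normalizes by the total number of bug reports, whereas the second normalizes by the number of bug reports within one project type, which is why the same category can have different proportions under the two metrics.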
Methodology to Quantify Bug Resolution Time: We use the opening and closing timestamps for each closed bug report in our dataset to quantify the resolution time for each bug category, similar to Wan et al. 2017. We calculate bug resolution time as the number of hours elapsed between when the bug report is opened and when it is closed and not re-opened again, as per our dataset, which was downloaded on April 04, 2020. We report bug resolution time for all bug categories, as well as for bug reports that belong to certain categories of software projects.

3.3 Methodology to Answer RQ3: How similar are the identified bug categories to that with previously studied software projects?

We conduct a scoping review of publications related to software bug categorization. Using a scoping review, researchers can synthesize results using a limited search (Anderson et al., 2008). According to Munn et al. 2018, “Researchers may conduct scoping reviews instead of systematic reviews where the purpose of the review is to identify knowledge gaps, scope a body of literature, clarify concepts or to investigate research conduct.” Unlike a systematic literature review, a scoping review is less comprehensive, and can be used as a precursor to conduct a systematic literature review. A scoping review can be useful to collect emerging evidence, which eventually can be used to inform further research decisions (Anderson et al., 2008). For example, if a researcher is inexperienced in the domain of software fuzzing, and wants to get an understanding of existing topics such as practices and techniques to implement fuzzing, then a scoping review could be useful to that researcher.

We conduct a scoping review by identifying well-known venues where software engineering research is published.
We select five conferences: International Conference on Software Engineering (ICSE), Symposium on Foundations of Software Engineering (FSE), International Conference on Automated Software Engineering (ASE), International Conference on Mining Software Repositories (MSR), and International Symposium on Software Testing and Analysis (ISSTA). We select these conferences because they are considered reputed venues to publish literature related to software engineering (Emery Berger, 2021), and are sponsored by special interest groups of the Association for Computing Machinery (ACM). We select conferences as they tend to have a shorter review cycle and are more likely to include recent advances in the field of interest (Vardi, 2009).

Figure 4. A hypothetical example to demonstrate our process of open coding to identify bug categories for software projects.
Bug report excerpts: ‘fix historical nyc data transition to borough/county level reporting’; ‘Temperature data not saved in the backend’; ‘Rajasthan district names are wrong’
Raw text: ‘fix nyc data’; ‘data not saved in the backend’; ‘district names are wrong’
Initial categories: ‘Data bugs related to location’; ‘Data bugs related to storage’
Category: ‘Data Bugs’

We conduct the review by applying the following steps:

• Step-1: We download all papers from 2010 to 2020 for each of the five conferences. We select papers from 2010 to 2020 to identify and synthesize state of the art bug taxonomies and categories used for a wide range of software projects. Papers that studied bug categories prior to 2010 may not give us an understanding of the state of the art. Our hypothesis is that by identifying papers from the last 10 years we will get a better overview of what types of bugs appear for a wide range of software projects.
• Step-2: We read the title, abstract, and keywords to determine if the downloaded papers are related to software bug categorization.
• Step-3: Upon completion of Step-2, one rater reads each collected paper, and identifies topics discussed in the paper of interest using qualitative analysis. For each paper the rater determines if the paper focuses on bug categorization. If so, the rater documents the bug categories for the reported software project.

Upon completion of the above-mentioned steps, we derive reported bug categories for multiple software projects.

4 Results

In this section, we provide answers to the three research questions, RQ1, RQ2, and RQ3.

4.1 Answer to RQ1: What categories of open source COVID-19 software projects exist?

We answer RQ1 by first providing summary statistics of our dataset in Section 4.1.1. Next, we report categories of the projects in Section 4.1.2.

4.1.1 Summary of Dataset

Altogether we download 129 projects for analysis. Using the search feature we identify 3,276 public projects upon which we apply our filtering criteria. A complete breakdown of our filtering criteria is available in Table 1. Attributes of the projects are available in Table 2. ‘Languages’ in Table 2 corresponds to the count of main programming languages of the collected projects as determined by GitHub’s linguist tool (GitHub, 2020b). Example languages include JavaScript, Python and R. A temporal evolution of the 129 COVID-19 software projects based on creation date is available in Figure 5. We observe a sharp increase in project creation after Feb 29, 2020.

Table 1. Filtering of COVID-19 projects used in paper.
Criteria                            GitHub
Initial                             3,276
Criterion-1 (Devs >= 2)             1,287
Criterion-2 (Open issues >= 5)      169
Criterion-3 (Commits/month >= 2)    154
Criterion-4 (README is English)     131
Criterion-5 (Actually COVID-19)     129
Final                               129

Table 2. Attributes of studied COVID-19 projects.
Attributes    Total
Commits       38,152
Developers    2,243
Duration      12/2019-03/2020
Files         24,839
Issues        4,405
Languages     18
Releases      286
Projects      129
4.1.2 Categorization of COVID-19 Software Projects
We identify 7 categories of COVID-19 software projects. We describe each of the categories below in alphabetical order:
I: Aggregation: This category includes software projects that curate data related to COVID-19 and present collected COVID-19 data in an aggregated format using visualizations. The purpose of these projects is to help users understand the spread of the COVID-19 disease over time and location. Software projects that belong to this category can be country specific, as done in ‘juanmnl/covid19-monitor’ (juanmnl, 2020) and ‘dsfsi/covid19za’ (Marivate and Combrink, 2020) for Ecuador and South Africa, respectively. Aggregation of COVID-19 data can also be at a global level. For example, ‘boogheta/coronavirus-countries’ (boogheta, 2020) is software that aggregates COVID-19 data across the world and allows users to compare the reported cases on a country-by-country basis.
Figure 5. Temporal evolution of COVID-19 software projects based on their creation date. We observe sharp increase in project creation after Feb 29, 2020. (x-axis: Date, 2019-12-22 to 2020-04-04; y-axis: Count of projects.)
II: Education: This category includes projects that provide utilities for educating people about COVID-19. Lack of knowledge related to infections and symptoms can contribute to rapid spreading of COVID-19. The purpose of
these projects is to build software where users can ask questions and obtain answers. We observe two categories of software: first, question and answer websites similar to Stack Overflow 3, such as ‘nthopinion/covid19’ (nthopinion, 2020), where users can ask questions about COVID-19 and other users answer such questions. Second, we observe bot-specific software, such as ‘deepset-ai/COVID-QA’ (deepset ai, 2020), that provides answers for questions related to COVID-19 automatically.
III: Medical equipment: This category includes projects that curate and maintain source code for the design and implementation of medical equipment used to treat COVID-19. The purpose of these projects is to create designs of COVID-19 related medical equipment, such as ventilators, at scale so that the growing need for medical equipment in hospitals is satisfied. One example of such a repository is ‘makers-for-life/makair’ (makers-for life, 2020), which states the following in its README page: “Aims at helping hospitals cope with a possible shortage of professional ventilators during the outbreak. Worldwide. ... We target a per-unit cost well under 500 EUR, which could easily be shrunk down to 200 EUR or even 100 EUR per ventilator given proper economies of scale, as well as choices of cheaper on-the-shelf components”. The project includes the design of the proposed ventilators as CAD files, as well as relevant firmware available as C++ code files.
Another example is ‘popsolutions/openventilator’ (popsolutions, 2020), which aims to provide cheap but reliable ventilators to treat COVID-19 in economically under-developed regions of the world. The software project originated from a Facebook group called ‘Open Source COVID-19 Medical Supplies’ 4, where members discussed the scarcity of ventilators and the importance of creating cheap ventilators through efficient design. In the project we notice that developers create, build, and share designs using OpenSCAD scripts.
OpenSCAD is an open source tool to build computer-aided design (CAD) objects 5.
3 https://stackoverflow.com/
4 https://www.facebook.com/groups/opensourcecovid19medicalsupplies/
5 https://www.openscad.org/
IV: Mining: This category includes projects that provide APIs to mine COVID-19 data from data sources, such as the US Centers for Disease Control and Prevention (CDC, 2020), the World Health Organization (WHO, 2020), and data reported from local institutions. The purpose of this category of software is to provide utilities for software developers so that they can get real-time access to COVID-19 data to build aggregation software, discussed above. Because of the nature of the pandemic, access to real-time data is pivotal for accurate aggregation and analysis. Mining tools give developers this access. Mining software can be location specific. For example, ‘dsfsi/covid19africa’ (Marivate et al., 2020) is dedicated to curating and collating COVID-19 related data for African countries.
V: User tracking: This category includes software projects that collect information from users regarding their COVID-19 infection status. Tracking of user information can be voluntary, where users self-report their COVID-19 infection status. The ‘enigmampc/SafeTrace’ (enigmampc, 2020) software is an example where users self-report their infection status as well as location history. Tracking of user information can also be done using inference, as done in ‘OpenMined/covid-alert’ (OpenMined, 2020), where the software collects a user’s location information to predict if the user is in a location with high infection density. One utility of these projects is to identify high-risk locations so that users can understand which nearby locations to avoid. Self-reporting software has yielded benefits for China and South Korea (Huang et al., 2020).
VI: Statistical modeling: This category includes software that uses statistical models to predict attributes related to COVID-19. The purpose of the projects is to make predictions for the future based on existing data. Example usage of statistical models includes (i) predicting death rate as done in ‘ImperialCollegeLondon/covid19model’ (ImperialCollegeLondon, 2020), (ii) automating the process of lung segmentation with computerized tomography (CT) scans, as done in ‘JoHof/lungmask’ (JoHof, 2020), (iii) predicting the impact of the COVID-19 pandemic on hospital demands as done in ‘neherlab/covid19_scenarios’ (neherlab, 2020), and (iv) predicting the presence of COVID-19 with X-ray images using deep learning as done in ‘elcronos/COVID-19’ (elcronos, 2020).
VII: Volunteer management: This category includes software used to efficiently manage volunteering effort. The purpose of this software is to build software platforms so that users can volunteer and participate in activities to help distressed families and communities. One example is the ‘covid-volunteers’ (helpwithcovid, 2020) software, which provides a web portal where users can sign up for 650 projects that include donation of masks, personal protective equipment (PPEs), and testing of COVID-19 6. Platforms can be global, such as ‘covid-volunteers’, and also regional. For example, ‘Applifting/pomuzeme.si’ (Applifting, 2020) creates a web portal so that people inside the Czech Republic can volunteer.
4.1.3 Frequency of the Identified Categories
Based on project count, aggregation is the most frequent category. Along with project count, we provide summary statistics of projects that belong to each category in Table 3. We also observe that, on average, user tracking projects are released more frequently than other project types.
We identify four software projects that belong to multiple categories.
As an example, the ‘soroushchehresa/awesome-coronavirus’ (soroushchehresa, 2020) project belongs to the categories aggregation, mining, and statistical modeling.
4.1.4 Rater Agreement
We report agreement rate for three steps: open coding, closed coding, and rater verification.
Open coding: After completing open coding, the first and second author identified 7 and 10 categories, respectively. The agreement rate is 70.0%, and the Cohen’s Kappa is 0.7, indicating ‘substantial’ agreement (Landis and Koch, 1977). The authors disagreed on ‘Volunteering software related to local communities’, ‘Education bots’, and ‘Aggregated visualizations’, the additional categories identified by the second author. Disagreements were resolved through discussion. Both authors provided justifications for their categorization. The first author pointed out that the category ‘Education bots’ can be merged with ‘Education’ as the category ‘Education’ encompasses all categories of knowledge software, such as bots and web applications. The first author also pointed out that ‘Volunteering software related to local communities’ can be merged with ‘Volunteer management’, as the category is an extension of the category ‘Volunteer management’. Furthermore, the first author pointed out that ‘Aggregated visualizations’ can be merged with ‘Aggregation’, as ‘Aggregation’ includes software that aggregates COVID-19 data and displays aggregated data with visualizations. The second author was convinced by the first author’s justification and updated her derived list of categories.
Closed coding: During closed coding the first and second authors mapped each of the 129 projects to an existing category. The agreement rate is 93.8%. The Cohen’s Kappa is 0.92. The authors disagreed on the labeling of 8 projects; these disagreements were resolved through discussion.
During the discussion phase both authors agreed to present their justification and recheck the labeling based on the justification and content analysis. The categorization determined upon discussion is considered final.
Rater verification: We also measured the agreement rate between an additional rater and the authors for categorizing README files of projects. Cohen’s Kappa between the additional rater and the first author for a randomly selected set of 50 README files is 0.73, indicating ‘substantial’ agreement (Landis and Koch, 1977). Cohen’s Kappa between the additional rater and the second author for a randomly selected set of 50 README files is 0.73, indicating ‘substantial’ agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 78.0% and 76.0%, respectively.
6 https://helpwithcovid.com/medical
4.2 Answer to RQ2: What categories of bugs exist in open source COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories?
We answer RQ2 by first providing a breakdown of how we obtained our bug reports in Tables 4 and 5. As shown in Table 5, the categories with the most and least bug reports are aggregation and medical equipment, respectively. One project can belong to multiple categories, which is why the counts in Table 5 do not sum to 550.
Next, we describe the identified bug categories in Section 4.2.1 by applying open coding on the collected 550 bug reports. The frequency of the identified bug categories is provided in Section 4.2.2. We provide details of rater verification in Section 4.2.3. Finally, we provide the bug resolution time in Section 4.2.4.
4.2.1 Bug Categories of COVID-19 Projects
We identify 8 bug categories, which we describe below alphabetically:
I: Algorithm: This category corresponds to bugs where the implementation of an algorithm does not follow expected behavior.
An algorithm is a sequence of computational steps that transform input into output (Cormen et al., 2009). We observe algorithm bugs to include two sub-categories: (i) bugs related to statistical modeling algorithms, where statistical modeling results are incorrect due to incorrect assumptions and/or implementations, and (ii) bugs related to incorrect logic implemented in the software.
Example: We provide examples for the two sub-categories:
• Statistical modeling: In a bug report titled “Death rates should increase when ICU’s are overwhelmed” (Begley, 2020a), a practitioner describes how an incorrect assumption can result in incorrect modeling behavior. The practitioner discusses that bed space is correlated with the estimated fatality rate. When the bed space of hospitals is exhausted, hospitals will not be able to treat new COVID-19 patients, which could potentially increase the fatality rate. The bug report provides evidence that if the context of COVID-19 is not correctly incorporated in statistical models, those models will provide incorrect results. Incorrect statistical models can be consequential, as countries are adopting public health policies specific to COVID-19. For example, researchers have critiqued the statistical models derived by the Institute for Health Metrics and Evaluation at the University of Washington (IHME), and advised USA policymakers to use the modeling results with caution (Begley, 2020b).
• Incorrect logic: In a bug report titled “Fix Prefecture Sorting” (reustle, 2020), a practitioner describes a sorting bug that occurs when trying to visualize COVID-19 cases based on prefectures in Japan. A prefecture is an administrative jurisdiction in a country similar to a state or province (Hu and Qian, 2017). The bug occurred due to incorrect logic that did not perform sorting by prefectures.
Table 3. Summary statistics of projects that belong to each category. Based on project count ‘Aggregation’ is the most frequent category.
Project category    Projects    Commits    Developers    Files    Issues    Releases
Aggregation         50          14,985     663           8,641    908       72
Mining              35          9,671      894           6,714    515       21
Stat. model.        22          7,214      429           3,464    491       38
Education           9           4,550      196           1,696    406       14
User track          9           2,020      152           2,291    119       286
Volunteer.          7           2,186      143           2,041    320       0
Med. equip.         3           859        38            790      14        63
Table 4. Filtering of bug reports from COVID-19 software projects.
Initial                            4,095
Criterion-1 (Closed issues)        2,965
Criterion-2 (Valid bug reports)    550
Final                              550
Table 5. Count of bug reports for each category of COVID-19 software projects. Aggregation-related projects have the highest number of bug reports.
Project category    Count (%)
Aggregation         220 (40%)
Mining              150 (27.3%)
Stat. Model.        98 (17.8%)
Education           58 (10.5%)
Volunteer.          40 (7.3%)
User Track          31 (5.6%)
Med. Equip.         4 (0.7%)
II: Data: This category corresponds to bugs that occur during mining and storage of COVID-19 data. As discussed in Section 4.1.2, our dataset includes projects that mine and aggregate COVID-19 data. We observe four sub-categories of data bugs: (i) storage: bugs that occur while storing data in a database, (ii) mining: bugs that occur while retrieving data from data APIs, (iii) location: bugs where location information in stored data is incorrect, and (iv) time series: bugs that correspond to missing data for a certain time period.
Example: We provide examples for each of these sub-categories below:
• Storage: In a bug report titled “Temperature data not saved in the backend” (pavel ilin, 2020), a practitioner describes a bug where patient temperature data is inserted in the front-end but not stored in the database.
• Mining: Bugs occur when COVID-19-related data is being mined. A practitioner describes a mining bug in a bug report titled “CDC Children scraper is outdated” (Timoeller, 2020).
The mining tool mines data related to children affected by COVID-19.
• Location: In a bug report titled “Rajasthan District names are wrong”, a practitioner describes that inserted location data for an Indian state called ‘Rajasthan’ is wrong (SinghRajenM, 2020).
• Time series: Missing data for a project was reported in a bug report titled “Data has a gap between 2020-3-11 and 2020-3-24” (zbraniecki, 2020).
III: Dependency: This category corresponds to bugs that occur when execution of the software is dependent on a software artifact that is either missing or incorrectly specified. For COVID-19 projects, an artifact can be an API or a build artifact.
Example: In a bug report titled “Missing PostGIS” (vaclavpavlicek, 2020), a practitioner describes that installation and execution of the software is prevented because a software package called ‘PostGIS’ is missing. PostGIS is used to store spatial and geographic measurements, such as area, distance, polygon, and perimeter, in PostgreSQL databases.
IV: Documentation: This category corresponds to bugs that occur when incorrect and/or incomplete information is specified in release notes, maintenance notes, and documentation files, such as README files.
Example: In a bug report titled “Missing code of conduct”, a practitioner reports that a ‘CODE_OF_CONDUCT.md’ file is missing from a Markdown file that describes how practitioners can contribute to the project (mdeous, 2020).
V: Performance: This category corresponds to bugs that cause performance discrepancies for the software. Performance bugs manifest as slow response of the web or mobile app.
Example: In a bug report titled “Cluster animation slowing down the browser. It also takes much time”, a practitioner describes how a performance bug related to an animation feature is slowing down a Firefox browser on Windows 10 (Subratappt, 2020).
The performance bug was reported for a website called ‘covid19india.org’ 7, which aggregates COVID-19 data for India and displays it.
7 https://www.covid19india.org/
VI: Security: This category corresponds to bugs that violate confidentiality, integrity, or availability of the software.
Example: In a bug report titled “Fix password reset procedure” (landovsky, 2020), a practitioner describes a password reset bug, where the password reset procedure ends arbitrarily after 500 login attempts.
VII: Syntax: This category corresponds to bugs related to the syntax of the programming languages used to develop the software.
Example: We notice bugs related to data types in ‘neherlab/covid19_scenarios’. In the bug report titled “Fix types and linting errors” (ivan aksamentov, 2020), a practitioner describes how linting and type checking were disabled for the project, which led to bugs related to linting and type checking.
VIII: UI: This category corresponds to bugs that involve the user interface (UI) of the software. UI bugs include navigation-related bugs on web pages, bugs related to accessibility, display of incorrect images, links, and colors, and responsiveness.
Example: In a bug report titled “accessibility fixes” (abquirarte, 2020), a practitioner describes a UI bug related to accessibility. According to the bug report, a screen reader incorrectly renders check marks and crosses in front of the “Do’s and Don’t” as M’s and N’s.
Table 6. Frequency of identified bug categories. UI-related bugs are the most frequent.
Bug category     BugPropAll (%)
UI               38.2
Data             30.9
Dependency       18.9
Algorithm        7.8
Syntax           6.7
Security         2.5
Performance      1.6
Documentation    1.4
4.2.2 Frequency of Identified Bug Categories
Based on the ‘Proportion of Bugs Across All Projects (BugPropAll)’ metric we observe UI bugs to be the most frequent category, whereas documentation is the least frequent category.
We provide a complete breakdown of the metric in Table 6. Data bugs have four sub-categories: storage, mining, location, and time series. The frequency for storage, mining, location, and time series is 4.7%, 5.8%, 87.2%, and 2.3%, respectively. Algorithm bugs have two sub-categories: statistical modeling and incorrect logic. The frequency for statistical modeling and incorrect logic is 42.3% and 57.7%, respectively.
We observe bug category frequency to vary for different categories of projects. We provide the ‘Proportion of Bugs For a Certain Project Category (BugPropCat)’ values for each project category in Table 7. ‘AGG’, ‘MINE’, ‘STA’, ‘EDU’, ‘TRAK’, ‘VOL’ and ‘EQU’ correspond, respectively, to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment. According to Table 7, except for mining and medical equipment software, the dominant bug category is UI. One possible explanation is that the analyzed software projects have UIs, which may have contributed to the frequency of UI bugs.
For mining software the dominant bug category is data bugs, i.e., bugs that occur due to storing and processing of COVID-19 data. For medical equipment software the dominant bug category is dependency. We also notice algorithm bugs to be the second most frequent bug category for statistical modeling software. Similar to prior work on machine learning (Thung et al., 2012), we expected algorithm bugs to be the most dominant category for statistical modeling. Statistical modeling software also has UIs for user interaction, and the count of UI bugs may have overshadowed the count of algorithm bugs.
4.2.3 Rater Agreement and Verification
We report agreement rate for four steps: issue labeling, open coding, closed coding, and rater verification.
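Each step reports two statistics: an agreement rate and Cohen’s Kappa. As a minimal sketch of how such values can be obtained, and not the tooling used in the study, the Python below computes both for two raters and maps a Kappa value to the verbal bands of Landis and Koch (1977). The function names and the two toy label lists are hypothetical and are not study data.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items to which both raters assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = agreement_rate(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # when each labels independently with their own label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal interpretation bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    return "almost perfect"


# Hypothetical labels from two raters for ten bug reports (not study data).
rater_a = ["UI"] * 5 + ["Data"] * 5
rater_b = ["UI"] * 4 + ["Data"] * 6
print(f"agreement rate: {agreement_rate(rater_a, rater_b):.1%}")
print(f"Cohen's Kappa:  {cohens_kappa(rater_a, rater_b):.2f}")
```

In the toy lists the raters agree on 9 of 10 reports (90%) and the chance agreement is 0.5, so Kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8. This shows why Kappa can be noticeably lower than the raw agreement rate.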
Labeling issues as bugs: While labeling collected issue reports as bug reports and non-bug reports, the agreement rate is 96.5% and the Cohen’s Kappa is 0.9.
Open coding to identify bug categories: The first and second author identified 9 and 10 categories, respectively. The agreement rate is 72.7%, and the Cohen’s Kappa is 0.70, indicating ‘substantial’ agreement (Landis and Koch, 1977). The first author identified ‘database’ as a category not identified by the second author. Upon discussion both authors agreed that ‘database’ is related to data storage and belongs to the data category. The second author identified two additional categories, ‘Public health data’ and ‘Type errors’. After discussing the definitions of all categories both authors agreed that ‘Public health data’ and ‘Type errors’ can be merged with data and syntax, respectively.
Closed coding to quantify bug category frequency: During closed coding the first and second author mapped each bug report to an existing category. The agreement rate is 95.1% and the Cohen’s Kappa is 0.93. The authors disagreed on the labeling of 27 bug reports; these disagreements were resolved through discussion.
Rater verification: For the randomly selected 250 issue reports we allocate an additional rater who manually identified which of the issue reports are bug reports and non-bug reports. The Cohen’s Kappa between the additional rater and the first author is 0.80, indicating ‘substantial’ agreement (Landis and Koch, 1977). The Cohen’s Kappa between the additional rater and the second author is 0.84, indicating ‘almost perfect’ agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 89.0% and 93.0%, respectively.
We have also measured the agreement rate between an additional rater and the authors for categorizing bug reports.
Cohen’s Kappa between the additional rater and the first author for a randomly selected set of 250 bug reports is 0.65, indicating ‘substantial’ agreement (Landis and Koch, 1977). Cohen’s Kappa between the additional rater and the second author for a randomly selected set of 250 bug reports is 0.68, indicating ‘substantial’ agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 78.0% and 81.6%, respectively.
Table 7. Bug category frequency for each identified project type. All values are presented in (%).
Bug categ.       AGG      MINE     STA      EDU      TRAK     VOL      EQU
Algorithm        6.8%     6.7%     22.4%    3.4%     0.0%     2.5%     0.0%
Data             28.6%    60.6%    13.2%    15.5%    0.0%     12.5%    0.0%
Dependency       16.3%    18.0%    18.3%    24.1%    9.7%     27.5%    75.0%
Documentation    0.9%     1.3%     1.0%     0.0%     0.0%     10.0%    0.0%
Performance      2.7%     2.0%     0.0%     0.0%     3.2%     0.0%     0.0%
Security         1.8%     0.0%     3.0%     3.4%     6.4%     12.5%    0.0%
Syntax           5.9%     3.3%     14.3%    17.2%    3.2%     10.0%    0.0%
UI               50.0%    12.0%    34.7%    44.8%    77.4%    32.5%    25.0%
4.2.4 Resolution Time of Identified Bug Categories
We provide bug resolution time as measured in hours for all bug categories in Table 8. From Table 8 we observe that based on min and median bug resolution times security bugs take the longest to resolve, followed by algorithm bugs. We also observe data bugs to take as long as 548 hours to resolve. A breakdown of bug resolution time across the seven project categories is provided in Table 9. The ‘All’ row in Table 9 shows the minimum, median, and maximum bug resolution time for all bug categories measured in hours.
Table 8. Resolution time of identified bug categories. Resolution time is measured in hours. Median resolution time is highest for security bugs.
Bug category     Min      Median    Max
Security         1.240    13.9      144.6
Algorithm        0.041    13.5      172.7
Syntax           0.004    12.1      174.2
UI               0.003    11.8      254.2
Data             0.003    8.4       548.0
Performance      0.961    7.1       104.4
Dependency       0.014    2.4       379.4
Documentation    0.013    1.4       76.8
Table 9. Resolution time of bug categories grouped by project categories. We measure resolution time in hours. Median bug resolution time is highest for projects related to medical equipment software.
Project category                Min      Median    Max
Medical Equipment               5.0      29.4      46.4
Volunteer Management System     0.013    21.1      174.2
User Tracking                   0.124    16.5      294.5
Education                       0.121    11.2      294.5
Aggregation                     0.003    8.7       379.4
Statistical Modeling            0.004    7.2       168.3
Mining                          0.005    2.5       548.1
All                             0.003    7.4       548.0
In Table 9 we observe four instances where the minimum bug resolution time is less than 6 minutes (< 0.1 hours). One possible explanation is practitioners’ habit of opening a bug report after they have developed the fix for a bug (Wan et al., 2017; Thung et al., 2012). In such cases, practitioners notice the bug early, construct the fix for the bug, and then open and close the bug report in quick succession.
Median bug resolution duration for each project type and bug category is provided in Table 10. ‘AGG’, ‘MINE’, ‘STA’, ‘EDU’, ‘TRAK’, ‘VOL’ and ‘EQU’ correspond, respectively, to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment. We observe median bug resolution time to vary across bug categories as well as project categories.
4.3 Answer to RQ3: How similar are the identified bug categories to that with previously studied software projects?
We report our findings in Table 11. The ‘Bug category’ column presents the bug categories identified for COVID-19 software projects, whereas the ‘Other software projects’ column presents the software projects for which the bug category was observed according to papers identified from our scoping review. We observe that the bug categories for COVID-19 software projects also appear in other categories of software projects, such as deep learning and autonomous vehicle software.
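The minimum, median, and maximum resolution times reported in Section 4.2.4 can be illustrated with a short sketch. The Python below is not the authors’ analysis script. It assumes that resolution time is the interval between the opening and closing timestamps of a bug report, and that each report carries exactly one category. The field names (`category`, `opened_at`, `closed_at`) and the sample reports are hypothetical.

```python
from datetime import datetime
from statistics import median


def resolution_hours(opened_at, closed_at):
    """Hours between two ISO 8601 timestamps (assumed: closed minus opened)."""
    opened = datetime.fromisoformat(opened_at)
    closed = datetime.fromisoformat(closed_at)
    return (closed - opened).total_seconds() / 3600.0


def summarize_resolution_times(bug_reports):
    """Group reports by category and return min, median, and max hours."""
    hours_by_category = {}
    for report in bug_reports:
        hours = resolution_hours(report["opened_at"], report["closed_at"])
        hours_by_category.setdefault(report["category"], []).append(hours)
    return {
        category: {"min": min(values), "median": median(values), "max": max(values)}
        for category, values in hours_by_category.items()
    }


# Hypothetical closed bug reports (illustrative values, not study data).
sample_reports = [
    {"category": "UI", "opened_at": "2020-03-15T10:00:00", "closed_at": "2020-03-15T22:00:00"},
    {"category": "UI", "opened_at": "2020-03-16T08:00:00", "closed_at": "2020-03-16T08:30:00"},
    {"category": "UI", "opened_at": "2020-03-17T00:00:00", "closed_at": "2020-03-18T00:00:00"},
    {"category": "Data", "opened_at": "2020-03-20T09:00:00", "closed_at": "2020-03-20T15:00:00"},
]

for category, stats in summarize_resolution_times(sample_reports).items():
    print(category, stats)
```

The median is used rather than the mean because a few long-lived reports, such as a bug that takes hundreds of hours to close, would otherwise dominate the summary.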
5 Discussion
In this section, we first provide a summary of our findings in Section 5.1. Next, we provide a discussion on the implications of our findings in Section 5.2.
5.1 Summary
Project category: Aggregation
Definition: Aggregate COVID-19 data and present using visualizations
Count: 50 out of 129 (38.7%)
Most frequent bug category: UI bugs
Median bug resolution time: 8.7 hours

Project category: Mining
Definition: Mine COVID-19 data
Count: 35 out of 129 (27.1%)
Most frequent bug category: Data bugs
Median bug resolution time: 2.5 hours

Project category: Statistical modeling
Definition: Use of statistical models to make COVID-19 predictions
Count: 22 out of 129 (17.0%)
Most frequent bug category: UI bugs
Median bug resolution time: 7.2 hours

Project category: Education
Definition: Educate people about COVID-19
Count: 9 out of 129 (6.9%)
Most frequent bug category: UI bugs
Median bug resolution time: 11.2 hours

Project category: User tracking
Definition: Track user data related to COVID-19
Count: 9 out of 129 (6.9%)
Most frequent bug category: UI bugs
Median bug resolution time: 16.5 hours

Project category: Volunteer management
Definition: Efficiently manage volunteering effort related to COVID-19
Count: 7 out of 129 (5.4%)
Most frequent bug category: UI bugs
Median bug resolution time: 21.1 hours

Project category: Medical equipment
Definition: Source code for design and implementation of medical devices
Count: 3 out of 129 (2.3%)
Most frequent bug category: Dependency bugs
Median bug resolution time: 29.4 hours

Table 10. Median bug resolution time for each bug category and each project type measured in hours. ‘-’ indicates categories for which no bug reports exist.
Bug cat.         AGG     MINE    STA     EDU     TRAK    VOL     EQU
Algorithm        9.8     10.8    13.9    10.1    -       13.5    -
Data             12.2    4.4     15.2    17.0    -       42.0    -
Dependency       5.6     0.1     0.3     4.5     5.3     2.9     22.4
Documentation    1.3     39.0    1.5     -       -       6.9     -
Performance      7.1     36.6    -       -       1.5     -       -
Security         8.1     -       3.1     84.1    13.9    20.4    -
Syntax           12.1    4.7     11.4    8.6     16.9    79.3    -
UI               8.3     2.7     13.1    16.8    18.7    21.9    46.4
Table 11. Comparison of bug categories of COVID-19 software projects with that of other software project categories.
Bug category     Other software projects
Security         IaC (Rahman et al., 2020), OSS GitHub projects (Ray et al., 2014)
Algorithm        Autonomous vehicle (Garcia et al., 2020), OSS GitHub projects (Ray et al., 2014)
Syntax           IaC (Rahman et al., 2020), deep learning (Islam et al., 2019), OSS GitHub projects (Ray et al., 2014)
UI               Blockchain (Wan et al., 2017)
Data             Deep learning (Islam et al., 2019)
Performance      OSS GitHub projects (Ray et al., 2014)
Dependency       IaC (Rahman et al., 2020)
Documentation    Autonomous vehicle (Garcia et al., 2020), IaC (Rahman et al., 2020)
5.2 Implications
We discuss the implications of our findings below:
Security and privacy implications of user tracking software: From Table 3 we observe 9 projects to be related to user tracking. While the benefits of user tracking software have been documented for countries such as Russia and South Korea (Crowell Morning, 2020), this category of software can have negative impacts on the privacy of end-users. Data generated from user tracking software can be leveraged for marketing purposes. We make the following recommendations to preserve privacy of user data in user tracking software:
• Policymakers should construct policies specific to COVID-19 software that collects user data.
• Practitioners who develop user tracking software should leverage existing privacy policy frameworks, such as the ‘National Institute of Standards and Technology (NIST) Privacy Framework’ (2020).
• Privacy researchers can build tools that will automatically detect and report privacy policy violations.
Evidence from Table 7 shows that security bugs exist for user tracking software. We advocate that security researchers systematically investigate whether user tracking software includes security bugs. Recent news articles suggest that user tracking software, such as contact tracing apps, may become more and more prevalent as Apple and Google are already providing frameworks to build software that tracks user data (Apple, 2020). Our hypothesis is that availability of these frameworks will facilitate rapid development and deployment of mobile apps that collect user data. Security weaknesses in these apps can give malicious users the opportunity to conduct large-scale data breaches. We notice anecdotal evidence in this regard: a researcher has identified vulnerabilities in a user tracking app that could leak user location data (Greenberg, 2020). Panelists at EuroCrypt 2020, a cryptography research conference, discussed limitations of user tracking mobile apps for COVID-19 with respect to API design, indoor location tracking, and informing users about privacy risks (EuroCrypt, 2020a, 2020b).
Towards constructing correct statistical models: In Section 4.2.1 we observed that statistical modeling bugs exist. Bugs related to statistical modeling can be consequential because policymakers enforce public health policies based on the predictions generated by statistical models. One possible explanation for buggy statistical models is the quality of the datasets from which statistical models are built (Koerth et al., 2020). For example, fatality prediction models that are built using the ‘Diamond Princess Cruise Ship Dataset’ may not be applicable to a specific geographic region with low population density. Another possible explanation is a lack of public health-specific context and knowledge, which hinders model builders from identifying appropriate independent variables to construct the models.
Incorrect estimation of hospital beds from our discussion in Section 4.2.1 is one example. Other examples of independent variables related to public health include staff availability, count of known cases, and hospitalization rate (Attia, 2020). According to a health expert (Attia, 2020), statistical models that predicted 2.4 million US residents to die assumed a hospitalization rate of 15-20%, which in reality was 5%. Based on our findings and the above-mentioned explanations we make two recommendations:
• Automated testing for COVID-19 modeling: We hope to see novel research in the domain of COVID-19 that will test, in an automated manner, the correctness of statistical models used in forecasting. In recent years, we have seen research efforts that test deep learning models (Tian et al., 2018; Pei et al., 2017; Ma et al., 2018). We expect similar research pursuits for COVID-19 statistical modeling.
• Better synergies between data science and public health practitioners: Construction and verification of COVID-19 statistical models should involve practitioners from public health and data science. Public health practitioners within a specific locality can provide necessary context that data scientists can incorporate in their statistical models.

Implications for Educators: Our findings have implications for educators involved in teaching the following topics:
• Data science: Educators who teach data science can use the examples of statistical modeling bugs to highlight the value of considering the full context and related limitations that accompany statistical modeling.
• Information security and privacy: User tracking software can be discussed in information security and privacy courses to demonstrate the value of protecting user data.
Such discussion can also include privacy policy frameworks that are already in place, such as the NIST Privacy Framework (National Institute of Standard and Technology, 2020).
• Software engineering: Our categorization of bugs related to COVID-19 software development can be discussed to demonstrate that understanding and repairing bugs requires contextualization.

Benchmark for practitioners and researchers: Tables 6-10 can be used as a measuring stick by practitioners and researchers who are involved with COVID-19 software projects. Practitioners can estimate their bug resolution efforts by comparing median resolution times for bugs in their COVID-19 software projects to those in Tables 8, 9, and 10. Compared to prior work related to blockchain and machine learning (Thung et al., 2012; Wan et al., 2017), median bug resolution time is lower for COVID-19 software projects. We provide two possible explanations. One possible explanation is the sense of urgency: practitioners may have realized that bugs in COVID-19 software projects could hamper the analysis or mitigation of COVID-19 and therefore need immediate attention. Another possible explanation is the limitations of our dataset: the age of our software projects does not exceed four months, which may have biased median bug resolution time. We advocate for future research that will confirm or refute our explanations.

Recurrence-related implications: Researchers (Kissler et al., 2020; Chen et al., 2020) have provided evidence that supports the recurring nature of COVID-19. About the recurrence of COVID-19, Kissler et al. (2020) stated “a resurgence in contagion could be possible as late as 2024.” We hypothesize that COVID-19’s recurrence will lead to more COVID-19 software being built. Whether or not our findings hold for this newly constructed COVID-19 software can be validated through a replication of our paper.
We expect to observe more categories of COVID-19 software projects as well as more bug categories.

5.3 Differences between COVID-19 Software Projects and Other Software Projects

In the following subsections we discuss the differences that we have noticed between COVID-19 software projects and other software projects.

5.3.1 Differences in Bug Manifestation

A non-COVID-19 software project does not have the context of public health consequences that is associated with a COVID-19 software project. We define a COVID-19 software project to be a software project that is related to analyzing and mitigating the consequences of COVID-19. By definition, we include software projects that directly capture the consequences related to public health, which are absent from a traditional software project. We observe empirical evidence that the unique context of COVID-19 yields differences in bugs and bug resolution time when compared with other software projects.

Let us consider the case of algorithm bugs. Algorithm bugs manifest in COVID-19 projects as well as in machine learning and autonomous vehicle projects. A machine learning project that uses statistical modeling can have algorithm bugs that generate erroneous predictions. For a COVID-19 software project that predicts death rates, a bug related to the modeling algorithm can have serious consequences because public health policies are derived from these models, as occurred with the incorrect estimation of hospitalization rate (Attia, 2020). As discussed in Section 4.3, algorithm-related bugs also appear for autonomous vehicles, but such bugs manifest in components unique to autonomous vehicle projects, such as lane positioning and navigation, and traffic light processing.

We have observed that data bugs appear for both deep learning projects and COVID-19 software projects.
The difference is that for COVID-19 we have the concept of location: practitioners tend to miss important location-related data for COVID-19, e.g., they are not able to identify states in India that are observing an outbreak of COVID-19. In the case of deep learning projects, data bugs are related to the structure and type of training data.

As another example, dependency-related bugs appear for both IaC scripts and COVID-19 software projects. In the case of IaC, dependency-related bugs are related to an IaC-related artifact, such as a Puppet manifest, class, or module, upon which execution of an IaC script depends (Rahman et al., 2020). For COVID-19 software projects, dependencies are related to APIs and build artifacts, such as Maven dependencies. This difference with respect to dependent artifacts also highlights the differences between COVID-19 software projects and IaC-based software projects.

In short, our findings suggest that while commonalities exist for bug categories between COVID-19 software projects and other software projects, the manifestation and artifacts related to the bug categories are different from other categories of software projects.

5.3.2 Difference in Bug Resolution Time

Our findings indicate that median bug resolution time is lower for COVID-19 software projects than for blockchain and machine learning projects. Based on our findings, we conjecture that the sense of urgency might have motivated practitioners to fix bugs in COVID-19 software projects.

5.3.3 Differences with Existing Healthcare-related Software Projects

Our findings also demonstrate differences between COVID-19 software projects and other projects related to the healthcare domain. To illustrate these differences we use the work of Janamanchi et al. (2009). Janamanchi et al.
(2009) studied 174 open-source software projects related to the health domain and identified 11 categories of software projects, which do not include the three categories of projects that we have identified for COVID-19 software projects: volunteer management, user tracking, and education. The inception and spread of COVID-19 have motivated software practitioners to create a wide range of software projects, such as projects related to user tracking and volunteer management, so that people are aware of the consequences and hygiene practices related to COVID-19. In the context of COVID-19 software projects, projects related to user tracking focus on tracking user location data emitted from smartphones to assess the proximity of individuals who might be exposed to COVID-19. Software projects related to volunteer management deal with managing volunteers to address COVID-19-related societal issues, such as food banking. A pandemic of this nature was not experienced by health professionals prior to 2020. Existing research on software projects in the health domain was not able to characterize COVID-19 software projects and identify project categories unique to COVID-19. Janamanchi et al. (2009) did not systematically study the types of bugs that appear in healthcare software projects. Our paper complements the work of Janamanchi et al. (2009) by studying healthcare-related projects that are related to COVID-19, characterizing the bugs and the types of COVID-19 software projects in which the bugs appear.

6 Threats to Validity

We describe the limitations of our paper as follows:

Conclusion validity: We have used raters who derived the software and bug categories. Both raters are authors of the paper. Our derived categories are susceptible to the authors’ bias. We mitigate this limitation by allocating another rater, who is not an author of the paper, to verify our ratings.
Our categories might not be comprehensive because our categorization of projects and bugs is limited to the dataset that we collected. The bug resolution time could be limiting as our dataset includes projects that have a duration of four months. We use the topic ‘covid-19’ to identify and filter COVID-19 software projects from GitHub. Any software project that is not labeled as ‘covid-19’ will not be included in our dataset. Our datasets have a limited lifetime as COVID-19 was discovered in December 2019, and the lack of maturity in our datasets may influence our analysis. We mitigate this limitation by identifying projects using filtering criteria so that we can identify projects with sufficient development activity.

Internal validity: For RQ1 and RQ2 we use ourselves, the authors of the paper, as raters who conduct open and closed coding on README files and bug reports. Our research is susceptible to mono-method bias, as our categorization and labeling may be influenced by the authors’ implicit expectations and hypotheses about the study.

External validity: Our findings are not comprehensive. We have not analyzed projects hosted outside GitHub or private projects hosted on GitHub. We mitigate this limitation by analyzing 129 software projects that belong to 7 categories. Also, as we have used open coding to determine categories, our findings may not be reproduced by other raters. We mitigate this limitation by conducting rater verification, where we use a rater who is not an author of the paper.

7 Conclusion

The COVID-19 pandemic has impacted people all over the world, causing thousands of deaths. Software practitioners have joined the fight in combating the spread and mitigating the dire consequences of COVID-19. An understanding of COVID-19 software categories and software bugs can give us clues on how the software engineering community can help even further in combating COVID-19.

We conduct an empirical study with 129 COVID-19 software projects hosted on GitHub. We identify 7 categories of software projects: aggregation, mining, statistical models, education, volunteer management, user tracking, and medical equipment. By applying open coding on 550 bug reports, we identify 8 categories of bugs: algorithm, data, dependency, documentation, performance, security, syntax, and UI. We observe bug category frequency to vary with project categories, e.g., for mining projects data-related bugs are the most frequently occurring category.

Our findings have implications for educators, practitioners, and researchers. Educators can use our categorization of COVID-19 software projects and related bugs to educate students about the security and privacy implications of COVID-19 software. Privacy researchers can build tools that will check that user tracking software related to COVID-19 is not leaking user data. Practitioners in the data science domain can learn from our categorization of statistical modeling bugs to understand limitations of constructed statistical models and verify the underlying assumptions that accompany them. Based on our findings we also advocate for better synergies between data scientists and public health experts so that statistical modeling bugs can be mitigated. We hope our paper will advance further research in the domain of COVID-19 software.

Acknowledgements

We thank the PASER group at Tennessee Technological University for their useful feedback. We also thank Farzana Ahamed Bhuiyan of Tennessee Technological University for her help as an additional rater. The research was partially supported by the National Science Foundation (NSF) award # 2026869.

References

abquirarte (2020). accessibility fixes. github.com/cagov/covid19/issues/137. [Online; accessed 10-May-2020].
Agrawal, A., Rahman, A., Krishna, R., Sobran, A., and Menzies, T. (2018). We don’t need another hero?: The impact of “heroes” on software development. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP ’18, pages 245–253, New York, NY, USA. ACM.
Alasdair Sandford (2020). Coronavirus: Half of humanity now on lockdown as 90 countries call for confinement. https://www.euronews.com/2020/04/02/. [Online; accessed 17-Apr-2020].
Anderson, S., Allen, P., Peckham, S., and Goodwin, N. (2008). Asking the right questions: scoping studies in the commissioning of research on the organisation and delivery of health services. Health research policy and systems, 6(1):7.
Apple (2020). Privacy-preserving contact tracing. https://www.apple.com/covid19/contacttracing. [Online; accessed 25-May-2020].
Applifting (2020). pomuzeme.si. github.com/Applifting/pomuzeme.si. [Online; accessed 09-May-2020].
Attia, P. (2020). Comparing covid-19 to past pandemics, preparing for the future, and reasons for optimism. https://peterattiamd.com/ameshadalja/. [Online; accessed 21-May-2020].
Begley, S. (2020a). Death rates should increase when icu’s are overwhelmed. https://github.com/neherlab/covid19_scenarios/issues/7. [Online; accessed 10-May-2020].
Begley, S. (2020b). Influential covid-19 model uses flawed methods and shouldn’t guide u.s. policies, critics say. https://www.statnews.com/2020/04/17/. [Online; accessed 10-May-2020].
boogheta (2020). boogheta/coronavirus-countries. https://github.com/boogheta/coronavirus-countries. [Online; accessed 09-May-2020].
Butler, J. L. and Jaffe, S. (2020). Challenges and gratitude: A diary study of software engineers working from home during covid-19 pandemic.
Catolino, G., Palomba, F., Zaidman, A., and Ferrucci, F. (2019). Not all bugs are the same: Understanding, characterizing, and classifying bug types. Journal of Systems and Software, 152:165–181.
CDC (2020). Cases, data, and surveillance. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/index.html. [Online; accessed 09-May-2020].
Chen, D., Xu, W., Lei, Z., Huang, Z., Liu, J., Gao, Z., and Peng, L. (2020). Recurrence of positive sars-cov-2 rna in covid-19: A case report. International Journal of Infectious Diseases, 93:297–299.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.
Corey, L., Mascola, J. R., Fauci, A. S., and Collins, F. S. (2020). A strategic approach to covid-19 vaccine r&d. Science.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009). Introduction to algorithms. MIT press.
Crabtree, B. F. and Miller, W. L. (1999). Doing qualitative research. sage publications.
Crowell Morning (2020). Mobile applications for covid tracking & tracing – balancing the need for personal information and privacy rights in the time of coronavirus. https://www.crowell.com/NewsEvents/AlertsNewsletters/all/. [Online; accessed 20-May-2020].
De Clercq, E. (2006). Potential antivirals and antiviral strategies against sars coronavirus infections. Expert review of anti-infective therapy, 4(2):291–302.
deepset ai (2020). deepset-ai/covid-qa. https://github.com/deepset-ai/COVID-QA. [Online; accessed 09-May-2020].
Dehning, J., Zierenberg, J., Spitzner, F. P., Wibral, M., Neto, J. P., Wilczek, M., and Priesemann, V. (2020). Inferring change points in the spread of covid-19 reveals the effectiveness of interventions. Science.
elcronos (2020). elcronos/covid-19. https://github.com/elcronos/COVID-19. [Online; accessed 09-May-2020].
Emery Berger (2021). Csrankings: Computer science rankings. http://csrankings.org/#/index?all&us. [Online; accessed 31-February-2021].
enigmampc (2020). Safetrace. github.com/enigmampc/SafeTrace. [Online; accessed 09-May-2020].
Erin Duffin (2020). Impact of the coronavirus pandemic on the global economy - statistics & facts. https://www.statista.com/topics/6139/covid-19-impact-on-the-global-economy/. [Online; accessed 08-May-2020].
EuroCrypt (2020a). Eurocrypt 2020 program. https://eurocrypt.iacr.org/2020/program.php. [Online; accessed 16-May-2020].
EuroCrypt (2020b). s-212 panel discussion on contact tracing. https://youtu.be/Xt4P8E_Y-xc. [Online; accessed 16-May-2020].
Evans, A. B., Blackwell, J., Dolan, P., Fahlén, J., Hoekman, R., Lenneis, V., McNarry, G., Smith, M., and Wilcock, L. (2020). Sport in the face of the covid-19 pandemic: towards an agenda for research in the sociology of sport.
Farhana, E., Imtiaz, N., and Rahman, A. (2019). Synthesizing program execution time discrepancies in julia used for scientific software. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 496–500.
Garcia, J., Feng, Y., Shen, J., Almanee, S., Xia, Y., and Chen, Q. A. (2020). A comprehensive study of autonomous vehicle bugs. In Proceedings of the 42nd International Conference on Software Engineering, ICSE ’20. to appear.
GitHub (2020a). Covid-19 : Github topics. https://github.com/topics/covid-19. [Online; accessed 07-May-2020].
GitHub (2020b). Language savant. https://github.com/github/linguist. [Online; accessed 07-May-2020].
GitHub (2020c). Search : Covid-19. https://github.com/search?q=covid-19. [Online; accessed 07-May-2020].
Greenberg, A. (2020). India’s covid-19 contact tracing app could leak patient locations. https://www.wired.com/story/india-covid-19-contract-tracing-app/. [Online; accessed 23-May-2020].
Helms, J., Kremer, S., Merdji, H., Clere-Jehl, R., Schenck, M., Kummerlen, C., Collange, O., Boulay, C., Fafi-Kremer, S., Ohana, M., et al. (2020). Neurologic features in severe sars-cov-2 infection. New England Journal of Medicine.
helpwithcovid (2020). helpwithcovid/covid-volunteers. https://github.com/helpwithcovid/covid-volunteers. [Online; accessed 09-May-2020].
Herzig, K., Just, S., and Zeller, A. (2013). It’s not a bug, it’s a feature: How misclassification impacts bug prediction. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 392–401. IEEE Press.
Hu, F. Z. and Qian, J. (2017). Land-based finance, fiscal autonomy and land supply for affordable housing in urban china: A prefecture-level analysis. Land Use Policy, 69:454–460.
Huang, Y., Sun, M., and Sui, Y. (2020). How digital contact tracing slowed covid-19 in east asia. https://hbr.org/2020/04/how-digital-contact-tracing-slowed-covid-19. [Online; accessed 09-May-2020].
IEEE (2010). Ieee standard classification for software anomalies. IEEE Std 1044-2009 (Revision of IEEE Std 1044-1993), pages 1–23.
ImperialCollegeLondon (2020). Imperialcollegelondon/covid19model. https://github.com/ImperialCollegeLondon/covid19model. [Online; accessed 09-May-2020].
Islam, M. J., Nguyen, G., Pan, R., and Rajan, H. (2019). A comprehensive study on deep learning bug characteristics. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, pages 510–520, New York, NY, USA. Association for Computing Machinery.
ivan aksamentov (2020). Fix types and linting errors. https://github.com/neherlab/covid19_scenarios/issues/101. [Online; accessed 10-May-2020].
Janamanchi, B., Katsamakas, E., Raghupathi, W., and Gao, W. (2009). The state and profile of open source software projects in health and medical informatics. International Journal of Medical Informatics, 78(7):457–472.
Jarynowski, A., Wójta-Kempa, M., Płatek, D., and Czopek, K. (2020). Attempt to understand public health relevant social dimensions of covid-19 outbreak in poland. Available at SSRN 3570609.
Jin, Z., Zhao, Y., Sun, Y., Zhang, B., Wang, H., Wu, Y., Zhu, Y., Zhu, C., Hu, T., Du, X., et al. (2020). Structural basis for the inhibition of sars-cov-2 main protease by antineoplastic drug carmofur. Nature Structural & Molecular Biology, pages 1–4.
John Hopkins University (2020). Corona Virus Resource Center. https://coronavirus.jhu.edu/. [Online; accessed 31-May-2020].
JoHof (2020). Johof/lungmask. https://github.com/JoHof/lungmask. [Online; accessed 09-May-2020].
juanmnl (2020). covid19-monitor. github.com/juanmnl/covid19-monitor. [Online; accessed 09-May-2020].
Kissler, S. M., Tedijanto, C., Goldstein, E., Grad, Y. H., and Lipsitch, M. (2020). Projecting the transmission dynamics of sars-cov-2 through the postpandemic period. Science.
Koerth, M., Bronner, L., and Mithani, J. (2020). Why it’s so freaking hard to make a good covid-19 model. https://fivethirtyeight.com/features/why-its-so-freaking-hard-to-make/. [Online; accessed 22-May-2020].
Kraemer, M. U., Yang, C.-H., Gutierrez, B., Wu, C.-H., Klein, B., Pigott, D. M., du Plessis, L., Faria, N. R., Li, R., Hanage, W. P., et al. (2020). The effect of human mobility and control measures on the covid-19 epidemic in china. Science, 368(6490):493–497.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
landovsky (2020). Fix password reset procedure. https://github.com/Applifting/pomuzeme.si/issues/99. [Online; accessed 10-May-2020].
Linares-Vásquez, M., Bavota, G., and Escobar-Velasquez, C. (2017). An empirical study on android-related vulnerabilities. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR ’17, pages 2–13, Piscataway, NJ, USA. IEEE Press.
Ma, L., Zhang, F., Sun, J., Xue, M., Li, B., Juefei-Xu, F., Xie, C., Li, L., Liu, Y., Zhao, J., and Wang, Y. (2018). Deepmutation: Mutation testing of deep learning systems. In 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE), pages 100–111.
Ma, W., Chen, L., Zhang, X., Zhou, Y., and Xu, B. (2017). How do developers fix cross-project correlated bugs? a case study on the github scientific python ecosystem. In Proceedings of the 39th International Conference on Software Engineering, ICSE ’17, pages 381–392. IEEE Press.
makers-for life (2020). makers-for-life/makair. https://github.com/makers-for-life/makair. [Online; accessed 09-May-2020].
Marivate, V. and Combrink, H. M. (2020). Use of available data to inform the covid-19 outbreak in south africa: A case study. Data Science Journal, 19(1):1–7.
Marivate, V., Nsoesie, E., Bekele, E., and open COVID-19 data working group, A. (2020). Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa.
mdeous (2020). Missing code of conduct. https://github.com/reach4help/reach4help/issues/135. [Online; accessed 10-May-2020].
Mello, M. M. and Wang, C. J. (2020). Ethics and governance for digital disease surveillance. Science.
Mitchell Hartman (2020). Covid-19 jobless claims are now over 40 million. many are still waiting for unemployment benefits. https://www.marketplace.org/2020/05/28/covid-19-jobless-claims-unemployment-benefits-waiting/. [Online; accessed 31-May-2020].
Mockus, A., Fielding, R. T., and Herbsleb, J. D. (2002). Two case studies of open source software development: Apache and mozilla. ACM Trans. Softw. Eng. Methodol., 11(3):309–346.
Munaiah, N., Kroh, S., Cabrey, C., and Nagappan, M. (2017). Curating github for engineered software projects. Empirical Software Engineering, pages 1–35.
Munn, Z., Peters, M. D., Stern, C., Tufanaru, C., McArthur, A., and Aromataris, E. (2018). Systematic review or scoping review? guidance for authors when choosing between a systematic or scoping review approach. BMC medical research methodology, 18(1):143.
National Institute of Standard and Technology (2020). Nist privacy framework. https://www.nist.gov/privacy-framework. [Online; accessed 24-May-2020].
neherlab (2020). covid19_scenarios. github.com/neherlab/covid19_scenarios. [Online; accessed 09-May-2020].
nthopinion (2020). nthopinion/covid19. https://github.com/nthopinion/covid19. [Online; accessed 09-May-2020].
Oliveira, E., Leal, G., Valente, M. T., Morandini, M., Prikladnicki, R., Pompermaier, L., Chanin, R., Caldeira, C., Machado, L., and de Souza, C. (2020). Surveying the impacts of covid-19 on the perceived productivity of brazilian software developers. In Proceedings of the 34th Brazilian Symposium on Software Engineering, SBES ’20, pages 586–595, New York, NY, USA. Association for Computing Machinery.
OpenMined (2020). covid-alert. github.com/OpenMined/covid-alert. [Online; accessed 09-May-2020].
Paul, R., Baltes, S., Gianisa, A., Torkar, R., Kovalenko, V., Marcos, K., Nicole, N., Yoo, S., Xavier, D., Tan, X., et al. (2020). Pandemic programming. Empirical Software Engineering, 25(6):4927–4961.
pavel ilin (2020). Temperature data not saved in the backend. https://github.com/COVID-19-electronic-health-system/Corona-tracker/issues/351. [Online; accessed 10-May-2020].
Pei, K., Cao, Y., Yang, J., and Jana, S. (2017). Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, pages 1–18, New York, NY, USA. Association for Computing Machinery.
popsolutions (2020). popsolutions/openventilator. https://github.com/popsolutions/openventilator. [Online; accessed 09-May-2020].
Prana, G. A., Treude, C., Thung, F., Atapattu, T., and Lo, D. (2019). Categorizing the content of github readme files. Empirical Softw. Engg., 24(3):1296–1327.
Pulido, C. M., Villarejo-Carballido, B., Redondo-Sama, G., and Gómez, A. (2020). Covid-19 infodemic: More retweets for science-based information on coronavirus than for false information. International Sociology, page 0268580920914755.
Rahman, A. and Farhana, E. (2020). Dataset for Paper - COVID-19-EMSE. https://figshare.com/s/7044678e1d7e7feb1efb. [Online; accessed 22-January-2021].
Rahman, A., Farhana, E., Parnin, C., and Williams, L. (2020). Gang of eight: A defect taxonomy for infrastructure as code scripts. In Proceedings of the 42nd International Conference on Software Engineering, ICSE ’20. to appear.
Ray, B., Posnett, D., Filkov, V., and Devanbu, P. (2014). A large scale study of programming languages and code quality in github. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 155–165, New York, NY, USA. ACM.
reustle (2020). Fix prefecture sorting. https://github.com/reustle/covid19japan/issues/15. [Online; accessed 05-Mar-2021].
Rourke, M., Eccleston-Turner, M., Phelan, A., and Gostin, L. (2020). Policy opportunities to enhance sharing for pandemic research. Science, 368(6492):716–718.
Saldana, J. (2015). The coding manual for qualitative researchers. Sage.
SinghRajenM (2020). Rajasthan district names are wrong. https://github.com/covid19india/covid19india-react/issues/321. [Online; accessed 10-May-2020].
soroushchehresa (2020). soroushchehresa/awesome-coronavirus. github.com/soroushchehresa/awesome-coronavirus. [Online; accessed 16-May-2020].
Subratappt (2020). Cluster animation slowing down the browser. it also takes much time. https://github.com/covid19india/covid19india-react/issues/497. [Online; accessed 10-May-2020].
Tamm, M. V. (2020). Covid-19 in moscow: prognoses and scenarios. FARMAKOEKONOMIKA. Modern Pharmacoeconomic and Pharmacoepidemiology, 13(1):43–51.
Thung, F., Wang, S., Lo, D., and Jiang, L. (2012). An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering, pages 271–280.
Tian, Y., Pei, K., Jana, S., and Ray, B. (2018). Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 303–314, New York, NY, USA. Association for Computing Machinery.
Timoeller (2020). Cdc children scraper is outdated. https://github.com/deepset-ai/COVID-QA/issues/43. [Online; accessed 10-May-2020].
Tom Simonite (2020). Software that reads ct lung scans had been used primarily to detect cancer. now it’s retooled to look for signs of pneumonia caused by coronavirus. https://www.wired.com/story/chinese-hospitals-deploy-ai-help-diagnose/. [Online; accessed 08-May-2020].
vaclavpavlicek (2020). Missing postgis. https://github.com/Applifting/pomuzeme.si/issues/164. [Online; accessed 10-Mar-2021].
Van Bavel, J. J., Baicker, K., Boggio, P. S., Capraro, V., Cichocka, A., Cikara, M., Crockett, M. J., Crum, A. J., Douglas, K. M., Druckman, J. N., et al. (2020). Using social and behavioural science to support covid-19 pandemic response. Nature Human Behaviour, pages 1–12.
Vardi, M. Y. (2009). Conferences vs. journals in computing research. Communications of the ACM, 52(5):5–5.
Wan, Z., Lo, D., Xia, X., and Cai, L. (2017). Bug characteristics in blockchain systems: A large-scale empirical study. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 413–424.
Wang, C., Li, W., Drabek, D., Okba, N. M., van Haperen, R., Osterhaus, A. D., van Kuppeveld, F. J., Haagmans, B. L., Grosveld, F., and Bosch, B.-J. (2020). A human monoclonal antibody blocking sars-cov-2 infection. Nature Communications, 11(1):1–6.
WHO (2020). Global research on coronavirus disease (covid-19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov. [Online; accessed 09-May-2020].
Why Hunger (2020). Why hunger. https://whyhunger.org/map.php. [Online; accessed 08-May-2020].
Will, C. M. (2020). ‘and breathe...’? the sociology of health and illness in covid-19 time. Sociology of Health & Illness.
Yang, C. Y. and Wang, J. (2020). A mathematical model for the novel coronavirus epidemic in wuhan, china. Mathematical Biosciences and Engineering, 17(3):2708–2724.
zbraniecki (2020). Data has a gap between 2020-3-11 and 2020-3-24. https://github.com/covidatlas/coronadatascraper/issues/375. [Online; accessed 10-May-2020].
Zhang, T., Chen, J., Luo, X., and Li, T. (2019). Bug reports for desktop software and mobile apps in github: What’s the difference? IEEE Software, 36(1):63–71.