BRAIN. Broad Research in Artificial Intelligence and Neuroscience
ISSN: 2068-0473 | e-ISSN: 2067-3957
Covered in: Web of Science (WOS); PubMed.gov; IndexCopernicus; The Linguist List; Google Academic; Ulrichs; getCITED; Genamics JournalSeek; J-Gate; SHERPA/RoMEO; Dayang Journal System; Public Knowledge Project; BIUM; NewJour; ArticleReach Direct; Link+; CSB; CiteSeerX; Socolar; KVK; WorldCat; CrossRef; Ideas RePeC; Econpapers; Socionet.
2020, Volume 11, Issue 4, pages: 168-184 | https://doi.org/10.18662/brain/11.4/147

Predicting COVID-19 Incidence Using Data Mining Techniques: A Case Study of Pakistan

Saba NOOR¹, Waseem AKRAM²*, Touseef AHMED³, Qurat-ul-Ain⁴

¹ Department of Computer Science, Lahore Garrison University, Lahore, Pakistan
² Department of Computer Science, COMSATS University Islamabad, Pakistan, imwaseem.khan@yahoo.com
³ Department of Computer Science, Lahore Garrison University, Lahore, Pakistan
⁴ Department of Computer Science, Lahore Garrison University, Lahore, Pakistan

Abstract: The outbreak of coronavirus disease (COVID-19) began in early December 2019. The early cases were reported in Wuhan City, Hubei Province, China. By May 18, 2020, 198 countries had been affected by this life-threatening disease. The most common known symptoms of COVID-19 are tiredness, fever, and dry cough. In this paper, we discuss a predictive data mining approach for COVID-19 predictions. In predictive data mining, a model is developed and trained using supervised learning, and it then predicts the behavior of the data provided. Predictive data mining is a well-established technique used by many health organizations for the classification and prediction of diseases such as heart disease and various types of cancer. Models can be compared on several factors, such as accuracy, scalability, and interpretability. This predictive model is compared on the basis of its accuracy.
In the proposed approach, we use WEKA because it provides a large collection of machine learning algorithms. The main objective of this paper is to forecast the possible future incidence of coronavirus cases in Pakistan. This study concludes that the number of cases will increase swiftly. If the government takes proactive steps and strictly implements precautionary measures, then Pakistan may be able to overcome this pandemic.

Keywords: WEKA; Predictive data mining; COVID-19.

How to cite: Noor, S., Akram, W., Ahmed, T., & Qurat-ul-Ain (2020). Predicting COVID-19 Incidence Using Data Mining Techniques: A case study of Pakistan. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 11(4), 168-184. https://doi.org/10.18662/brain/11.4/147

Introduction

COVID-19 originated in Wuhan city, Hubei province, China. The outbreak coincided with Chunyun, the period of mass migration for the yearly Spring Festival. To limit the spread of COVID-19, Chinese authorities adopted extraordinary measures on 23 January 2020. These measures comprised nationwide quarantine, strict travel restrictions, and extensive surveillance of suspected COVID-19 cases. COVID-19 was confirmed to have reached Pakistan on February 26, 2020, when a student returning from Iran tested positive. By March 18, cases had been reported all across the country; as of June 04, 2020, there had been about 85264 confirmed cases with 28923 recoveries and 1688 deaths in the country. This study mainly centers on forecasting the outbreak trend in Pakistan based on the outbreak patterns in China and Spain, as population density is high in these countries. We also wanted to illustrate how precautionary measures restricted the outbreak.
Current Condition in Pakistan

According to the Ministry of Health, Government of Pakistan, the country had a total of 85264 confirmed cases and 1770 deaths on Thursday, June 04, 2020. Sindh province is the most affected with 32910 confirmed cases, followed by Punjab (31104), Khyber Pakhtunkhwa (11373), Baluchistan (5224), the federal city of Islamabad (3544), Gilgit Baltistan (824), and Azad Jammu & Kashmir (285). The results are shown in Table 1. The overall COVID-19 case history of Pakistan is shown in Figure 1.

Table 1: Confirmed, death, and recovered cases in the different provinces of Pakistan (covid.gov.pk)

Sr. | Province | Confirmed cases | Deaths | Recovered cases
01 | Punjab | 31104 | 607 | 7712
02 | Sindh | 32910 | 555 | 16022
03 | Khyber Pakhtunkhwa | 11373 | 500 | 3150
04 | Baluchistan | 5224 | 51 | 2021
05 | Islamabad | 3544 | 38 | 518
06 | Azad Jammu & Kashmir | 285 | 7 | 173
07 | Gilgit Baltistan | 824 | 12 | 532

Figure 1: COVID-19 Case History (covid.gov.pk)

Method

The predictive model is based on a cumulative time-series dataset of confirmed coronavirus cases, recoveries, and deaths.

Definitions

S. Zhang et al. (2020) define coronavirus disease (COVID-19) as an infectious disease caused by a recently discovered coronavirus. A positive case of COVID-19 infection is defined as a case with a positive result for viral nucleic acid testing in respiratory specimens. A suspected case is defined as a case with symptoms of COVID-19 infection that has not been confirmed by viral nucleic acid testing.

Dataset

The datasets were collected from humdata.org, which tracks global time-series data. The data extracted for the model is current to June 04, 2020.

Model Development

To forecast COVID-19 incidence in Pakistan, a linear-regression-based approach is proposed; its performance is assessed mainly through the model's RMSE and MAE.
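As a minimal sketch of the two error measures named above, the following Python functions compute MAE and RMSE from paired lists of actual and predicted values. The function names and the three sample values are illustrative assumptions, not figures or code from this study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: the square root of the mean squared difference."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical cumulative counts, for illustration only.
actual_cases = [100, 200, 300]
predicted_cases = [110, 190, 330]
sample_mae = mae(actual_cases, predicted_cases)
sample_rmse = rmse(actual_cases, predicted_cases)
```

Because RMSE squares each difference before averaging, it penalizes a few large misses more heavily than MAE does, which is one reason to report both.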
Literature Review

Twenty studies were reviewed for this work. The objective of this literature review is to discover how different models perform in given scenarios.

Yang et al. (2020) presented a study predicting the epidemic trend of coronavirus in China. They used a modified SEIR model together with an AI approach trained on the 2003 SARS dataset to predict the outbreak. The study concludes that the outbreak trend will start to decline by the end of April.

S. Zhang et al. (2020) proposed a study to estimate the reproductive number of COVID-19 and to predict daily cases on the Diamond Princess cruise ship. They used the serial interval distribution of the existing daily incidence and estimated the reproductive number of COVID-19 based on an approximately Poisson distribution. The study states that the number of new cases will gradually increase and that cumulative COVID-19 cases may reach 1514 in the next ten days.

Binti Hamzah et al. (2020) developed an online tracker for daily statistics and analysis of coronavirus cases; the paper aims to forecast the active confirmed and recovered cases of COVID-19 within and outside China. The study uses the Susceptible-Exposed-Infectious-Recovered (SEIR) model as its predictive model. It concludes that the outbreak will peak in late May 2020, with cases exceeding 76000, and will start to decline in early July 2020.

Qasim et al. (2020) use a mathematical model, sequence mean weight (TSMW), to predict COVID-19 cases across Pakistan. The model estimates that the count of patients may reach 77,905, with at least 8285 confirmed cases and 1382 deaths, in the next 45 days, until 29 April 2020.
The autoregressive moving-average (ARMA) model is a hypothesis-based testing model first proposed in 1951; it is mainly used for stationary time-series data. The same heuristic can be used in simulation modeling (L. Li et al., 2020), in which a digital prototype is developed to analyze the performance of a model before deploying it. That study established a baseline for the transmission process of COVID-19 using a new model based on Gaussian distribution theory, and it identifies key factors of virus spread such as the incubation period of the virus, the reproductive number, and daily infections.

Y. Li et al. (2020) developed a dynamic time series model to forecast the short-term trend of COVID-19 spread inside China. The model is based on different mathematical formulas. The study concludes that total coronavirus cases in China may reach 36,343 after one week (February 8, 2020) and that confirmed cases in Wuhan will peak in March 2020, after which the infection rate will start to decline throughout China. The study ignores some factors that can affect the result, such as birth rate and natural deaths. Combining artificial intelligence with another predictive model can provide more realistic figures, as in Yang et al. (2020). Table 2 below summarizes the literature review.

Table 2: A detailed comparative analysis of different techniques

Sr. # | Title | Published | Technique | Advantages | Disadvantages | Remarks
1 | Yang et al. (2020) | 2020 | Artificial intelligence and SEIR | Uses an AI-based model trained on the past SARS dataset for more effective predictions | Does not take into consideration phase-adjusted protective measures and real-time changing parameters, which can disturb prediction accuracy | Uses mathematical tools for epidemic prediction but assumes a homogeneous population
2 | S. Zhang et al. (2020) | 2020 | Heuristic models | Provide mortality and disease spread | Heuristic models describe only one phase of the outbreak and fail when the disease grows toward another stage | Useful for finding mortality and disease spread
3 | Binti Hamzah et al. (2020) | 2020 | Susceptible-Exposed-Infectious-Recovered model (SEIR) | Assesses the decline in effective contacts upon completion of the acute and extreme closure of society | Since the outbreak is not successfully contained, it does not provide an entirely accurate prediction | Also provides expected political and economic predictions
4 | Qasim et al. (2020) | 2020 | Mathematical derivation model | Model can validate new future data | The lower bound remains close to actual data for the same situation | Provides a generic prediction of expected COVID-19 cases
5 | L. Li et al. (2020) | 2020 | Simulation model | Provides transmission of the virus using a Gaussian distribution; accurate results | Can provide accurate results for developed countries, but for undeveloped countries the simulated propagation may not be as accurate | Uses simulation of propagation, which can be reliable for factors of spread
6 | Petropoulos & Makridakis (2020) | 2020 | S-curve model and univariate time series model | Past patterns such as precautionary measures will remain effective as data is accurate | Does not provide long-term prediction of cases or cumulative case predictions | Uses real-time time-series data; predictions can be used by the Govt.
7 | Ardabili et al. (2020) | 2020 | SIR model | Provides generalization ability of ML models and accuracy for different lead times | SIR model cannot provide promising results where individuals act independently on social distancing | Uses the ML model as well as the SIR model and compares their accuracy
8 | Yan et al. (2020) | 2020 | Supervised Boost classifier | Presents a simple clinical test to precisely quantify death risk | The model will not remain the same and starts varying for different data | Identifies high-death-risk patients in the early stages of the disease
9 | Wynants et al. (2020) | 2020 | PROBAST | Supports medical decision making | Updated coronavirus predictive models are available | This paper evaluates other prediction models
10 | Y. Li et al. (2020) | 2020 | SEIQDR, TS Modelama model | Dynamic model | Does not consider natural deaths and births | Useful in developed countries
11 | Qin et al. (2020) | 2020 | ARMA | More useful as SMI data is more user relative | Big data from an unauthenticated source | Can be useful for undeveloped countries as SMI is more user relative
12 | Tiwari et al. (2020) | 2020 | Time series forecasting method | Predicts daily cumulative cases effectively | Ignores socio-economic factors | The model can be implemented using WHO data rather than Govt.-provided data
13 | Fong et al. (2020) | 2020 | Machine learning and multiple regression | Forecasts with relatively the lowest prediction error | Small data can be used for a fully observable situation, not for a partially observable one | Prediction using small data requires an accurate data set
14 | Stübinger & Schneider (2020) | 2020 | Dynamic Time Warping | Forecasts the outbreak of COVID-19 using a lead-lag structure | Cannot predict the long-term spread | Produces results based on differentiation between countries
15 | Chakraborty & Ghosh (2020) | 2020 | Wavelet-based forecasting | Most suitable for nonstationary data | Does not provide an accurate result for stationary data | Lacks accuracy compared to the ARMA model
16 | Avery et al. (2020) | 2020 | Phenomenological modeling | Interprets limited data | Requires accurate Govt.-provided facts and figures | Can guide economic policymaking during this epidemic
17 | Janies et al. (2008) | 2008 | SDT (demarcation of sequence characters) | Provides interpretation of the zoonotic potential of coronavirus | Lacks outgrouping and rooting criteria | Provides biomedical details of COVID-19
18 | G. Zhang et al. (2020) | 2020 | Improved SEIR | Dynamically predicts results | Insufficient test cases | Can make small-scale predictions
19 | Wu et al. (2020) | 2020 | SEIR-metapopulation model | Provides supporting validity for the forecast | Ignores the travel factor of disease spread | Can also nowcast the current situation
20 | Murray (2020) | 2020 | Statistical model | Explicitly supports age-structure variation | Not dynamic, as more accurate data will be needed as the number of patients increases | Can help with better management of health resources in case of a sudden increase in cases

Statement of the Problem

On February 26, 2020, Pakistan registered its first COVID-19 case, and on March 25, 2020, Pakistan confirmed its first death due to COVID-19, in Lahore. Since February 26, the COVID-19 outbreak has continued to spread in Pakistan, and as the government of Pakistan has lifted most of the lockdown, it is essential to predict whether Pakistan can afford this easing: will it be able to overcome this pandemic, or will it become another America or Wuhan?

Objectives of the Study

The main objective of this study is to predict coronavirus incidence and its trends across the different regions of the country using an efficient and robust model.
● Analysing the current trend of COVID-19 in Pakistan.
● Developing a reliable predictive model.
● Predicting the coronavirus cases using Linear Regression.
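The linear-regression objective listed above can be illustrated with a minimal sketch: an ordinary least-squares line fitted to cumulative cases against a day index, then extended forward to forecast. This is a plain-Python illustration under stated assumptions, not the Weka implementation used in the study (which may differ, for example in attribute selection or regularization); the function names and the five data points are hypothetical.

```python
def fit_line(days, cases):
    """Ordinary least-squares fit of cases = intercept + slope * day."""
    n = len(days)
    if n < 2 or n != len(cases):
        raise ValueError("need two or more paired observations")
    mean_x = sum(days) / n
    mean_y = sum(cases) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("day values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, cases))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(slope, intercept, last_day, horizon):
    """Extend the fitted line over the next `horizon` days after `last_day`."""
    return [intercept + slope * day
            for day in range(last_day + 1, last_day + horizon + 1)]


# Hypothetical cumulative counts indexed by day, for illustration only.
days = [0, 1, 2, 3, 4]
cases = [10, 20, 30, 40, 50]
slope, intercept = fit_line(days, cases)
next_two = forecast(slope, intercept, last_day=4, horizon=2)
```

A straight line extrapolates a constant daily increase, so a forecast of this kind is only as good as the assumption that recent growth stays roughly linear over the horizon.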
Methodology

Based on the gathered outbreak data, this model tries to discover the transmission pattern of the coronavirus and to forecast the outbreak situation. The coronavirus dataset was collected from humdata.org and verified against figures provided by the Ministry of Health of Pakistan. To carry out this prediction, we used Weka, a data mining tool developed by the University of Waikato, New Zealand. Weka applies different algorithms to datasets and provides results.

There are four major phases in this model. In the first phase, data pre-processing and data transformation are carried out. The second phase comprises model training, in which Linear Regression is used as the forecasting algorithm. During training, the cumulative confirmed, recovered, and death cases are fed as the dependent variables and the time-series date variable as the independent variable.

Linear regression fits straight lines on a scatter graph, so the possibility of outliers is minimal, but it has been observed that total daily cases fall below or rise above the plotted line, causing outliers. In this regard, it is better to use the mode instead of the median, as it provides a more accurate real-time average. The third phase validates the accuracy of the model, for which RMSE and MAE are considered. The fourth and last phase provides results and forecasts for the next 58 days. Figure 2 gives a detailed overview of the proposed methodology with all four phases of the model.

Figure 2: A detailed overview of the Proposed Methodology

Data Pre-Processing: The data required for this model is filtered from the global time-series dataset. The attributes of this dataset are confirmed cases, recovered cases, and deaths to date.

Data Transformation: Data pre-processing provides the national dataset required by the model. However, as Weka is used as the data mining tool for this model and its default data format is .arff, the extracted data is transformed from CSV to ARFF.

Training and evaluation of data: For training and testing of this model, 80% of the data is used for training and 20% for evaluation.

Model Training: The main objective of training is for the model to learn so that it can generate predictions. This model uses Linear Regression for forecasting. 80% of the data is fed to the model, and 20% is used to evaluate its performance and efficiency.

Validation of Model: To evaluate the accuracy of the model, RMSE is mainly targeted.

Results

This is a dynamic predictive model that can adapt to changes in the data, and it can predict cases on a daily or weekly basis. The model works with three datasets and predicts cumulative confirmed cases of patients infected with COVID-19, the death toll, and recovered cases. The number of confirmed cases has increased rapidly over the last three weeks, and Figure 3, generated by this model, indicates that confirmed cases in Pakistan are expected to keep increasing rapidly. The model expects the number of confirmed coronavirus cases to grow at a rapid rate until July 31, 2020, in the absence of an effective policy encouraging the public to practice social distancing and other safety measures.

Figure 3: Confirmed cases

Figure 4: Recovered cases

Figure 5: Death cases

This model also predicts recovered cases and deaths for this period. Recoveries will also increase swiftly, reaching nearly 90,000 (Figure 4). From Figure 5, the death toll for this period is expected to remain at 12,000.
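The data-transformation and 80/20 split steps described in the Methodology can be sketched as follows. This is a minimal illustration, assuming a CSV whose columns are all numeric; the column names `day` and `confirmed`, the relation name `covid_pk`, and the sample values are hypothetical. Splitting in chronological order is also an assumption made here for time-series data, since the text does not state how the 80% and 20% portions were chosen.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text.

    Assumes every column is numeric (a simplification for this sketch).
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in header]
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


def chronological_split(rows, train_fraction=0.8):
    """Keep the earliest rows for training and the latest for evaluation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical ten-day series, for illustration only.
sample_csv = "day,confirmed\n" + "\n".join(
    f"{d},{2 * d + 2}" for d in range(10)
) + "\n"
arff_text = csv_to_arff(sample_csv, "covid_pk")
data_rows = arff_text.split("@data\n", 1)[1].splitlines()
train_rows, test_rows = chronological_split(data_rows)
```

Holding out the most recent rows means the evaluation resembles actual forecasting, where the model is always asked about dates later than anything it was trained on.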
Evaluation of results

The model is evaluated using two well-known measures of accuracy: mean absolute error (MAE) and root mean square error (RMSE). The 58-day evaluation of predicted cases is presented in Figures 6, 7, and 8.

Figure 6: Evaluation of Confirmed cases

Figure 7: Evaluation of Recovered cases

Figure 8: Evaluation of Death cases

Discussion and Limitations

In this paper, we have reviewed many data mining studies and their techniques. The Susceptible-Exposed-Infectious-Recovered (SEIR) model is a mathematical model that describes circumstances in which an individual with an infectious disease becomes a source of infection for others. As the name suggests, this model has four stages, with parameters β (beta), which controls the rate of spread, α (alpha), the incubation rate, and γ (gamma), the recovery rate. This technique is based on the SIR model. It forecasts the course of diseases such as COVID-19, which spread through close contact with infected people.

In early 2020, one of the biggest pandemics of the 21st century came to light, with many deaths and infections. The rate of transmission of this virus was very fast in both developed and underdeveloped countries, and it was essential to predict and analyze this rapid spread; to date, many researchers have proposed many techniques and models. The forecast has relatively the lowest prediction error, as it used machine learning algorithms for the prediction of coronavirus cases. Machine learning is one of the significant developments of the last ten years. It is an application of artificial intelligence in which we train the machine by providing available data, and the machine can then use artificial neural networks on some pattern to provide results. Similarly, one of the most critical applications of AI is deep learning, a more advanced version of machine learning. The concepts of deep learning were proposed in the early 2000s, but the breakthrough in deep learning came after the winter of AI in 2010.

Heuristic models use early profit-estimation techniques to find the most cost-effective solutions; completeness is not guaranteed at points where backtracking is not possible or not efficient.

This study does not consider any social and economic factors, such as educational, economic, or relational beliefs. These factors may affect the spread.

Conclusion

The coronavirus pandemic began in early December 2019, and by June 04, 2020, almost the whole world had been affected by it. The virus that causes COVID-19 is closely related to bat coronaviruses. As explained earlier, there are many known symptoms of this disease, such as tiredness, fever, and dry cough. COVID-19 spreads exponentially, causing a rapid infection rate. This infection rate is even faster in Third World and highly populated countries such as India, Bangladesh, and Pakistan, and the current situation in Pakistan is not satisfactory, as the infection rate is continuously rising while financial and medical resources are very limited. Pakistan must take proactive measures to gain control over this pandemic, and this study can help policymakers to take comprehensive action and to plan for necessary future needs in the health sector.
We carried out this study to find the future trend of the situation using the data mining technique of Linear Regression, three different cumulative datasets of recovered, deceased, and confirmed cases, and our proposed methodology. The model finds that the infection rate will gradually increase, but at the same time the recovery rate will increase rapidly compared to the death rate. In the future, we will analyze this rapid rate of recovery compared to the death rate, along with other social factors such as public awareness and the public's personal beliefs regarding the reality and severity of the coronavirus.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Ethical Approval

Approval was not required.

References

Ardabili, S. F., Mosavi, A., Ghamisi, P., Ferdinand, F., Varkonyi-Koczy, A. R., Reuter, U., Rabczuk, T., & Atkinson, P. M. (2020). COVID-19 Outbreak Prediction with Machine Learning. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3580188

Avery, C., Bossert, W., Clark, A., Ellison, G., & Ellison, S. F. (2020). Policy Implications of Models of the Spread of Coronavirus: Perspectives and Opportunities for Economists. National Bureau of Economic Research. https://doi.org/10.3386/w27007

Binti Hamzah, F. A., Lau, C. H., Nazri, H., Ligot, D. C., Lee, G., Tan, C. L., & et al. (2020). CoronaTracker: World-wide Covid-19 outbreak data analysis and prediction. Bulletin of the World Health Organization, March, Submitted.

Chakraborty, T., & Ghosh, I. (2020). Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis. Chaos, Solitons and Fractals, 135. https://doi.org/10.1016/j.chaos.2020.109850

Fong, S. J., Li, G., Dey, N., Gonzalez-Crespo, R., & Herrera-Viedma, E. (2020). Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV Novel Coronavirus Outbreak. International Journal of Interactive Multimedia and Artificial Intelligence, 6(1), 132. https://doi.org/10.9781/ijimai.2020.02.002

Janies, D., Habib, F., Alexandrov, B., Hill, A., & Pol, D. (2008). Cladistics. 24, 111–130.

Li, L., Yang, Z., Dang, Z., Meng, C., Huang, J., Meng, H., Wang, D., Chen, G., Zhang, J., Peng, H., & Shao, Y. (2020). Propagation analysis and prediction of the COVID-19. Infectious Disease Modelling, 5, 282–292. https://doi.org/10.1016/j.idm.2020.03.002

Li, Y., Wang, B., Peng, R., Zhou, C., Zhan, Y., Liu, Z., Jiang, X., & Zhao, B. (2020). Mathematical Modeling and Epidemic Prediction of COVID-19 and Its Significance to Epidemic Prevention and. Annals of Infectious Disease and Epidemiology, 5(1), 1052.

Murray, C. J. (2020). Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. 114. https://doi.org/10.1101/2020.03.27.20043752

Petropoulos, F., & Makridakis, S. (2020). Forecasting the novel coronavirus COVID-19. PLoS ONE, 15(3), 1–8. https://doi.org/10.1371/journal.pone.0231236

Qasim, M., Ahmad, W., Zhang, S., Yasir, M., & Azhar, M. (2020). Data model to predict prevalence of COVID-19 in Pakistan. https://doi.org/10.1101/2020.04.06.20055244

Qin, L., Sun, Q., Wang, Y., Wu, K. F., Chen, M., Shia, B. C., & Wu, S. Y. (2020). Prediction of number of cases of 2019 novel coronavirus (COVID-19) using social media search index. International Journal of Environmental Research and Public Health, 17(7). https://doi.org/10.3390/ijerph17072365

Stübinger, J., & Schneider, L. (2020). Epidemiology of Coronavirus COVID-19: Forecasting the Future Incidence in Different Countries. Healthcare, 8(2), 99. https://doi.org/10.3390/healthcare8020099

Tiwari, S., Kumar, S., & Guleria, K. (2020). Outbreak trends of CoronaVirus (COVID-19) in India: A Prediction. Disaster Medicine and Public Health Preparedness, May. https://doi.org/10.1017/dmp.2020.115

Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet, 395(10225), 689–697. https://doi.org/10.1016/S0140-6736(20)30260-9

Wynants, L., Van Calster, B., Collins, G. S., Riley, R. D., Heinze, G., Schuit, E., Bonten, M. M. J., Damen, J. A. A., Debray, T. P. A., De Vos, M., Dhiman, P., Haller, M. C., Harhay, M. O., Henckaerts, L., Kreuzberger, N., Lohmann, A., Luijken, K., Ma, J., Andaur Navarro, C. L., … Van Smeden, M. (2020). Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. The BMJ, 369. https://doi.org/10.1136/bmj.m1328

Yan, L., Zhang, H. T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., Sun, C., Tang, X., Jing, L., Zhang, M., Huang, X., Xiao, Y., Cao, H., Chen, Y., Ren, T., Wang, F., Xiao, Y., Huang, S., Tan, X., … Yuan, Y. (2020). An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence, 2(5), 283–288. https://doi.org/10.1038/s42256-020-0180-7

Yang, Z., Zeng, Z., Wang, K., Wong, S. S., Liang, W., Zanin, M., Liu, P., Cao, X., Gao, Z., Mai, Z., Liang, J., Liu, X., Li, S., Li, Y., Ye, F., Guan, W., Yang, Y., Li, F., Luo, S., … He, J. (2020). Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. Journal of Thoracic Disease, 12(3), 165–174. https://doi.org/10.21037/jtd.2020.02.64

Zhang, G., Pang, H., Xue, Y., Zhou, Y., & Wang, R. (2020). Forecasting and Analysis of Time Variation of Parameters of COVID-19 Infection in China Using An Improved SEIR Model. 1–6. https://doi.org/10.21203/rs.3.rs-16159/v1

Zhang, S., Diao, M. Y., Yu, W., Pei, L., Lin, Z., & Chen, D. (2020). Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis. International Journal of Infectious Diseases, 93, 201–204. https://doi.org/10.1016/j.ijid.2020.02.033