

Decision Making: Applications in Management and Engineering  
ISSN: 2560-6018 
eISSN: 2620-0104  

 DOI: https://doi.org/10.31181/dmame0317102022r 

* Corresponding author. 
E-mail addresses: vishnurrr40@gmail.com (V. K. Rai), santonabchakraborty@gmail.com
(S. Chakraborty), s_chakraborty00@yahoo.co.in (S. Chakraborty)

ASSOCIATION RULE MINING FOR PREDICTION OF COVID-19

Vishnu Kumar Rai1, Santonab Chakraborty2 and Shankar Chakraborty1* 

1 Department of Production Engineering, Jadavpur University, Kolkata, West Bengal, 
India 

2 Industrial Engineering and Management Department, Maulana Abul Kalam Azad 
University of Technology, West Bengal, India 

 
Received: 13 August 2021;  
Accepted: 26 September 2022;  
Available online: 17 October 2022. 

 
Original scientific paper  

Abstract: COVID-19 is a raging pandemic that has created havoc with its 
impact ranging from loss of millions of human lives to social and economic 
disruptions of the entire world. Therefore, error-free prediction, quick 
diagnosis, disease identification, isolation and treatment of a COVID patient 
have become extremely important. Mining knowledge from clinical datasets to
support scientific decision making in disease diagnosis now has wide-ranging
applications in the healthcare sector. Among the available data mining tools,
association rule mining has emerged as a popular technique for extracting
valuable information and building a knowledge base that supports quick,
automatic and intelligent diagnosis of distinct diseases. In this paper, based
on 5434 records of COVID cases collected from a popular data science
community and using Rapid Miner Studio software, a predictive model is
developed based on the frequent pattern growth algorithm of association
rule mining to determine the likelihood of COVID-19
in a patient. It identifies breathing problem, fever, dry cough, sore throat, 
abroad travel and attended large gathering as the main indicators of COVID-
19. Employing the same clinical dataset, a linear regression model is also 
proposed having a moderately high coefficient of determination of 0.739 in 
accurately predicting the occurrence of COVID-19. A decision support system
can also be developed using the association rules to ease and automate
early detection of other diseases.

Key words: COVID-19, Association rule mining, Frequent pattern growth, 
Prediction, Regression. 



 Rai et al./Decis. Mak. Appl. Manag. Eng. (2022)  


1. Introduction  

Coronavirus disease 2019 (COVID-19) is mainly caused by severe acute 
respiratory syndrome coronavirus 2 (SARS-CoV-2) which is contagious in nature. 
The transmission of COVID-19 occurs when a person inhales virus-containing 
respiratory droplets and airborne particles from an infected patient. The first known 
case of COVID was from Wuhan, China. It has now become a raging pandemic 
creating havoc with its impact ranging from loss of millions of lives to social and 
economic disruptions around the globe. The impact of COVID in India, being the
second most populous country, is threatening. In India, the first few cases were
reported from Kerala, leading to the first wave. A total lockdown was imposed from
25 March 2020 and the number of active cases began to drop from September 2020.
A larger and much more powerful second wave hit India in March 2021. Presently,
India has the largest number of COVID cases in Asia. As of 12 June 2021, it has the
second-highest number of confirmed cases in the world (after the United States)
with 29.3 million reported cases of COVID-19 infection. It also has the
third-highest number of COVID-19 deaths (after the United States and Brazil) with
367,081 deaths. India became the first country to report over 400,000 new cases in
a 24-hour period on 30 April 2021. As of 30 June 2021, the total number of confirmed
cases in India is 30,362,848 with a total of 398,454 deaths. The rapid increase of
cases at the peak of both
first and second waves of COVID-19 had put tremendous pressure on the medical 
infrastructure leading to shortage of hospital beds, oxygen cylinders, vaccines and 
other medicines in the country. There was also a great chance that the primary 
health workers were being infected by the same disease while treating the COVID 
patients.  

The standard diagnostic procedure for COVID is to detect the presence of 
coronavirus’s nucleic acid in human body which is usually performed by real-time 
reverse transcription polymerase chain reaction (rRT-PCR), transcription-mediated 
amplification (TMA) or by reverse transcription loop-mediated isothermal 
amplification (RT-LAMP) test from a nasopharyngeal swab. Each of these testing 
procedures for COVID-19 is time-consuming and resource-intensive. At the
peak of a wave, when there are millions of daily cases, a quicker and more
efficient approach is needed to determine whether a person has
COVID-19 or not. While detecting this disease and treating the infected patients, the 
concerned healthcare sector is generating huge volume of valuable information 
which can be effectively deployed for rapid diagnosis, identification and treatment of 
an individual. In the present day pandemic scenario, mining knowledge and 
providing scientific decision making for diagnosis of COVID-19 from the clinical 
dataset has turned out to be extremely important. With rapid development of 
computational facilities, data mining technology has gained increasing attention to 
discover interesting knowledge in the form of useful patterns, changes, associations, 
anomalies and structures from large volume of data stored in databases, data 
warehouses or other data repositories. Association rule mining is an effective data 
mining tool mainly deployed to extract association relationships or correlations/co-
occurrences among a given set of items. Due to its simplicity in framing rules based 
on ‘If-Then’ statements, association rule mining is now being extensively used to 
explain patterns from seemingly independent repositories, like transactional 
databases, relational databases or clinical databases (Kaur & Madan, 2015; Sabthami
et al., 2016). The developed rules can assist the physicians in diagnosing patients
based on the conditional probability while comparing the symptom relationships in 
the data from the past cases (Hareendran & Chandra, 2017; Cheng & Wang, 2017). In
this paper, association rule mining based on the frequent pattern (FP) growth
algorithm is employed as an effective predictive tool for diagnosing COVID-19
patients. Using a large clinical database containing 5434 records of
COVID cases, breathing problem, fever, dry cough, sore throat, abroad travel and 
attended large gathering are identified as the most important COVID-19 indicators. 
With the help of Rapid Miner Studio software, the corresponding association rules 
are framed for prediction of COVID-19 disease in a patient. The developed regression 
model would also aid in COVID-19 diagnosis correlating the symptoms and 
likelihood of this disease. Its application would save a lot of time and resources in the 
case of huge influx of patients. Using this predictive model, the patients themselves 
can envisage whether they have the disease or not and can start taking necessary 
precautions (Stilou et al., 2001).  

Association rule mining, logistic regression, discriminant analysis etc. are 
different types of machine learning approaches. In association rule mining, simple 
‘If…Then’ clauses are framed to discover the existent relationships between 
independent relational databases, transactional databases, and other forms of data 
repositories, requiring simple counting to observe frequently occurring patterns, and 
similarities and dissimilarities among different objects. On the other hand, logistic 
regression employs binary variables where the response records either success or 
failure for a given event. It can also be extended to combine more than one 
independent continuous or categorical variable. Discriminant analysis develops a set 
of prediction equations based on independent variables for classifying new items and 
interpreting the relationship among the considered variables. The application of 
discriminant analysis assumes that the data is normally distributed and each 
attribute has the same variance. Among these machine learning approaches, 
association rule mining is the simplest one, requiring no assumption about the 
underlying distribution of the initial dataset, while framing easy to understand rules. 
With the help of different parameters, like support, confidence and lift, it can clearly
identify the strongest rules, giving deeper insight into the problem.

2. Association rule mining 

Association rule mining is one of the techniques of data mining to extract 
interesting relations (dependencies) and patterns/links among variables from large 
seemingly independent datasets in order to draw useful inferences and decisions for 
practical use. Application of association rule mining helps in generating simple ‘If-
Then’ statements to analyze frequently occurring patterns in a dataset and/or 
identify the inherent relationships between independent and dependent variables in
the dataset. It can also frame useful rules from qualitative and categorical datasets 
which are often difficult to interpret (Ordonez et al., 2006). An association rule 
consists of two components, i.e. an antecedent (If) and a consequent (Then). An 
antecedent is an item found within the dataset and a consequent is an item observed 
in combination with the antecedent (Freeda & Florence, 2017). Thus, in a rule, X→Y, 
X is the antecedent (If) and Y is the consequent (Then). In a clinical database, the rule 
{Symptom1, Symptom2}→{Disease1} signifies that a patient having Symptom1 and 
Symptom2 would tend to have Disease1. For example, let A = {a1,a2,…,an} be a set of
symptoms and B = {b1,b2,…,bm} be the entries of multiple patients in a clinical
database. Each patient entry bi contains a subset of the elements in A, i.e. bi ⊆ A,
and the corresponding association rule is the implication X→Y, where X ⊆ A, Y ⊆ A
and X∩Y = ∅. In a clinical database, an antecedent is a specific symptom or
combination of
symptoms and a consequent can be a disease caused due to occurrence of the 
antecedent. The generated rules would thus help the concerned physicians in making 
faster decisions with correct diagnosis of a disease (Kulkarni & Mundhe, 2017; 
Lakshmi & Vadivu, 2017).  

The effectiveness of the developed association rules is usually validated using
three parameters, i.e. support, confidence and lift. Support measures the percentage
of records in the given dataset that follow a particular rule, i.e. how often a rule
occurs in the dataset.

Support(X→Y) = P(X∪Y)

Confidence evaluates the probability that the inclusion of item X also leads to the
inclusion of Y. It is the conditional probability of how often a rule is found to be
true.

Confidence(X→Y) = P(Y|X) = Support(X→Y)/Support(X)

Lift finally measures the performance of an association rule as the ratio of its
confidence to the support of its consequent, i.e. how much more often Y occurs
together with X than it would if the two were independent.

Lift(X→Y) = Confidence(X→Y)/Support(Y)

A higher value of confidence signifies higher strength of a particular rule. On the
other hand, a higher value of lift indicates that having X and Y together is not
accidental, but due to the presence of a meaningful relationship between them.

Lift = 1 signifies that the occurrences of the antecedent and the consequent are
independent of each other.

Lift > 1 determines the degree to which the antecedent and the consequent are
dependent on each other.

Lift < 1 signifies that one item has a negative effect on the other, i.e. one item is a
substitute for the other item present in the rule.
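
The three measures can be illustrated with a minimal Python sketch. It uses the nine patient records of the illustrative example given later in this section; the function names are hypothetical, and lift is taken here as confidence divided by the support of the consequent, which is the conventional definition under which Lift = 1 corresponds to independence.

```python
# Nine patient records of the illustrative example (P1 to P9).
PATIENTS = [
    {"S1", "S2", "S5"}, {"S2", "S4"}, {"S2", "S3"},
    {"S1", "S2", "S4"}, {"S1", "S3"}, {"S2", "S3"},
    {"S1", "S3"}, {"S1", "S2", "S3", "S5"}, {"S1", "S2", "S3"},
]


def support(items, records=PATIENTS):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(antecedent, consequent, records=PATIENTS):
    """P(Y|X): support of X and Y together divided by the support of X."""
    both = set(antecedent) | set(consequent)
    return support(both, records) / support(antecedent, records)


def lift(antecedent, consequent, records=PATIENTS):
    """Confidence divided by the support of the consequent (assumed definition)."""
    return confidence(antecedent, consequent, records) / support(consequent, records)


# Rule (S1,S2) -> S5
print(f"support    = {support({'S1', 'S2', 'S5'}):.4f}")
print(f"confidence = {confidence({'S1', 'S2'}, {'S5'}):.4f}")
print(f"lift       = {lift({'S1', 'S2'}, {'S5'}):.4f}")
```

For the rule (S1,S2)→S5, two of the nine records contain all three symptoms and four contain both S1 and S2, so the sketch reports a support of 2/9 and a confidence of 0.5.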

There are three popular algorithms, i.e. apriori algorithm, ECLAT (equivalence 
class clustering and bottom-up lattice traversal) algorithm and frequent pattern
(FP) growth algorithm deployed for framing the relevant association rules from a 
given dataset (Prithiviraj & Porkodi, 2015). In apriori algorithm (which is an array-
based algorithm), frequent itemsets are used for generation of the association rules. 
It employs a breadth-first search and hash tree to efficiently identify the frequent 
itemsets in a transactional database. But, its application is time-consuming and the 
corresponding runtime may increase exponentially (Jain & Gautam, 2013; Sambasiva 
Rao & Uma Devi, 2017). The ECLAT adopts a depth-first search technique to find out 
the frequent itemsets in a relational database. It has less execution time than apriori 
algorithm. The FP growth is a tree-based algorithm which employs depth-first search 
to compress the dataset to form an FP-tree. It is faster than the other two algorithms 
and its runtime increases linearly. However, a large FP-tree may not fit in the
memory space and is thus expensive to build (Thamer et al., 2020). As the FP growth
algorithm is employed in this paper to develop the association rules for effective
prediction of COVID-19, its procedural steps are detailed below.

Association rule mining using FP growth algorithm mainly consists of two steps, 
i.e. a) generation of frequent item sets and b) formation of association rules from the 
frequent item sets. To demonstrate generation of frequent item sets employing FP 
growth algorithm, let us consider a clinical dataset containing different symptoms for 
nine infected patients. In this database, P and S represent Patient and Symptom 
respectively.  

P1 = (S1, S2, S5); P2 = (S2, S4); P3 = (S2, S3); P4 = (S1, S2, S4); P5 = (S1, S3); P6 = 
(S2, S3); P7 = (S1, S3); P8 = (S1, S2, S3, S5) and P9 = (S1, S2, S3) 

Now, the corresponding FP-tree is developed based on the following steps: 


a) Scan the dataset to determine the support count of each symptom. 
Remove the less-frequent symptom(s) and sort the frequent symptoms 
in descending order of their occurrence. 

b) Scan the dataset of one patient at a time resulting in formation of the FP-
tree. For each transaction, 

i) If it has a set of unique symptoms, form a new path and set the 
counter for each node to 1. 

ii) If it shares a set of common symptoms, increase the common 
symptom itemset node counters and create new nodes, if needed. 

c) This process needs to be continued until each patient case is mapped into 
the tree. 

This algorithm scans the database only twice while directly compressing it into 
the corresponding FP-tree. In this algorithm, minimum support (basically acts as a 
cut-off) can be used to classify the frequent and less-frequent item sets in a database. 
The less-frequent items are ignored while developing the FP-tree. Identification of 
the most appropriate cut-off for subsequent FP-tree generation is a critical task.
A low cut-off admits many item sets, most of them of little significance. On the
other hand, a cut-off that is too high may leave no frequent item sets at all, so
that no FP-tree is generated. In this illustrative example, the support
count of each symptom is determined as given below (in descending order): S2 = 7, 
S1 = 6, S3 = 6, S4 = 2 and S5 = 2. Now, the patient datasets are rearranged according 
to the descending order of support count of different symptoms.  

P1 = (S2, S1, S5); P2 = (S2, S4); P3 = (S2, S3); P4 = (S2, S1, S4); P5 = (S1, S3); P6 = 
(S2, S3); P7 = (S1, S3); P8 = (S2, S1, S3, S5) and P9 = (S2, S1, S3). Based on the dataset 
with nine patient cases and five symptoms in the illustrative example, the following 
FP-tree of Figure 1 is developed.  This FP-tree is generated while considering null as 
the root node. The count of each symptom for each patient case is highlighted in 
parenthesis at each node.  

 
Figure 1. FP-tree for the illustrative example 
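
The two database scans described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by Rapid Miner: the Node class and the function names are hypothetical, and ties in support count are broken by symptom name so that S1 precedes S3, as in the text.

```python
from collections import Counter

PATIENTS = [
    ["S1", "S2", "S5"], ["S2", "S4"], ["S2", "S3"],
    ["S1", "S2", "S4"], ["S1", "S3"], ["S2", "S3"],
    ["S1", "S3"], ["S1", "S2", "S3", "S5"], ["S1", "S2", "S3"],
]
MIN_SUPPORT_COUNT = 2  # cut-off used in the illustrative example


class Node:
    """One FP-tree node: a symptom, its counter and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(records, min_count):
    # First scan: support count of every symptom; drop the less-frequent ones.
    counts = Counter(item for record in records for item in record)
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    def rank(item):
        # Descending support count; ties broken by name (assumption).
        return (-frequent[item], item)

    root = Node(None)  # the null root node
    # Second scan: insert each reordered record as a path in the tree.
    for record in records:
        node = root
        for item in sorted((i for i in record if i in frequent), key=rank):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child
    return root, frequent


def show(node, depth=0):
    """Print the tree with the counter of each node in parentheses."""
    for child in node.children.values():
        print("  " * depth + f"{child.item} ({child.count})")
        show(child, depth + 1)


root, counts = build_fp_tree(PATIENTS, MIN_SUPPORT_COUNT)
show(root)
```

Shared prefixes such as (S2, S1) are stored once with an incremented counter, which is how the tree compresses the nine records.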

Next, the developed FP-tree is mined. The lowest node of the tree is checked first 
along with its links. The lowest node represents the symptom with minimum support 
count. From the lowest node, the path in the FP-tree is traversed up to the null
node. Each such path is termed a conditional pattern base. The conditional FP-tree
is formed
while counting the symptoms in the path. The symptoms meeting the minimum 
support of 2 are considered here for subsequent generation of frequent itemsets, as 
exhibited in Table 1. In this table, six 2-frequent and two 3-frequent itemsets are 
generated.  

Table 1. Generation of frequent itemsets for the illustrative example 

| Symptom | Conditional pattern base | Conditional FP-tree | Frequent itemsets |
|---|---|---|---|
| S5 | {{S2,S1:1},{S2,S1,S3:1}} | [S2:2, S1:2] | {S2,S5:2},{S1,S5:2},{S2,S1,S5:2} |
| S4 | {{S2,S1:1},{S2:1}} | [S2:2] | {S2,S4:2} |
| S3 | {{S2,S1:2},{S2:2},{S1:2}} | [S2:4, S1:2], [S1:2] | {S2,S3:4},{S1,S3:4},{S2,S1,S3:2} |
| S1 | {{S2:4}} | [S2:4] | {S2,S1:4} |
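
The entries of Table 1 can be cross-checked with a short Python sketch. The conditional pattern base of a symptom is obtained here from the prefixes that precede it in the reordered records, which is equivalent to walking the FP-tree paths upward. The frequent itemsets are counted by brute force rather than by FP growth, which is only practical for a toy example. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Patient records reordered by descending support count, as in the text.
SORTED_PATIENTS = [
    ("S2", "S1", "S5"), ("S2", "S4"), ("S2", "S3"),
    ("S2", "S1", "S4"), ("S1", "S3"), ("S2", "S3"),
    ("S1", "S3"), ("S2", "S1", "S3", "S5"), ("S2", "S1", "S3"),
]
MIN_SUPPORT_COUNT = 2


def conditional_pattern_base(item, records=SORTED_PATIENTS):
    """Prefix paths that precede `item`, with the count of each path."""
    base = Counter()
    for record in records:
        if item in record:
            prefix = record[: record.index(item)]
            if prefix:  # an empty prefix means the item hangs off the root
                base[prefix] += 1
    return dict(base)


def frequent_itemsets(records=SORTED_PATIENTS, min_count=MIN_SUPPORT_COUNT):
    """Brute-force support count of every itemset of size 2 or more."""
    items = sorted({i for record in records for i in record})
    found = {}
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(set(combo) <= set(record) for record in records)
            if count >= min_count:
                found[combo] = count
    return found


for symptom in ("S5", "S4", "S3", "S1"):
    print(symptom, conditional_pattern_base(symptom))
for itemset, count in frequent_itemsets().items():
    print(itemset, count)
```

With the minimum support count of 2, the brute-force count yields six itemsets of size 2 and two of size 3, in agreement with the text.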

Based on the frequent itemsets in Table 1, the corresponding association rules 
are thereby generated using the following steps: 

a) Generate all non-empty subsets of each frequent itemset U. 
b) For every non-empty subset F of U, formulate the rule F→(U–F) if
support_count(U)/support_count(F) ≥ minimum_confidence,
where minimum_confidence is the threshold confidence level.

For example, consider the first frequent itemset U = (S2,S1,S5). Generate all the
non-empty subsets of U as F: {S1},{S2},{S5},{S1,S2},{S2,S5},{S1,S5},{S1,S2,S5}. For
every non-empty subset F of U, the corresponding association rules are framed, as
given in Table 2. In this table, support_count is the number of occurrences of all the
elements in a set (U, F or U–F) together in the dataset, Confidence Calculated =
(support_count(U)/support_count(F))×100, Support = (support_count(U)/N)×100,
Lift = Confidence Calculated/((support_count(U–F)/N)×100) and N is the total number
of patient cases in the example. Here, the threshold confidence value is arbitrarily
taken as 80%. It can be noticed from this table that among the generated rules, only
rules 3, 5 and 6 are accepted with their confidence levels greater than or equal to
the set threshold value. Thus, for this example, the following association rules are
developed: S5→(S2,S1); (S1,S5)→S2 and (S2,S5)→S1.

Table 2. Association rules for the illustrative example

| Rule No. | Association rule | support_count(U) | support_count(F) | Confidence calculated (%) | Threshold confidence (%) | N | Support (%) | Lift | Accepted/Rejected |
|---|---|---|---|---|---|---|---|---|---|
| 1 | S1→(S2,S5) | 2 | 6 | 33.33 | 80 | 9 | 22.22 | 1.50 | Rejected |
| 2 | S2→(S1,S5) | 2 | 7 | 28.57 | 80 | 9 | 22.22 | 1.29 | Rejected |
| 3 | S5→(S2,S1) | 2 | 2 | 100.00 | 80 | 9 | 22.22 | 2.25 | Accepted |
| 4 | (S1,S2)→S5 | 2 | 4 | 50.00 | 80 | 9 | 22.22 | 2.25 | Rejected |
| 5 | (S1,S5)→S2 | 2 | 2 | 100.00 | 80 | 9 | 22.22 | 1.29 | Accepted |
| 6 | (S2,S5)→S1 | 2 | 2 | 100.00 | 80 | 9 | 22.22 | 1.50 | Accepted |
| 7 | (S1,S2,S5)→(Null) | | | | | | | | Rejected |
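
The rule generation steps for the itemset U = (S2,S1,S5) can be sketched in Python as below. It is a minimal sketch with hypothetical function names; only proper subsets of U are used as antecedents, since the full set would leave an empty consequent (rule 7 in Table 2).

```python
from itertools import combinations

PATIENTS = [
    {"S1", "S2", "S5"}, {"S2", "S4"}, {"S2", "S3"},
    {"S1", "S2", "S4"}, {"S1", "S3"}, {"S2", "S3"},
    {"S1", "S3"}, {"S1", "S2", "S3", "S5"}, {"S1", "S2", "S3"},
]


def support_count(items):
    """Number of patient records containing all the given symptoms."""
    return sum(set(items) <= record for record in PATIENTS)


def rules_from(itemset, min_confidence):
    """Return (antecedent, consequent, confidence) for every accepted rule."""
    itemset = frozenset(itemset)
    accepted = []
    for size in range(1, len(itemset)):  # proper, non-empty subsets F of U
        for antecedent in combinations(sorted(itemset), size):
            consequent = itemset - set(antecedent)
            conf = support_count(itemset) / support_count(antecedent)
            if conf >= min_confidence:
                accepted.append((set(antecedent), set(consequent), conf))
    return accepted


for antecedent, consequent, conf in rules_from({"S1", "S2", "S5"}, 0.8):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence = {conf:.2f}")
```

With the 80% threshold, the sketch keeps the same three rules as Table 2, each with a confidence of 1.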

 
It has been noticed that the apriori algorithm of association rule mining has 
already been successfully deployed for prediction/diagnosis of heart diseases (Said 
et al., 2015; Domadiya & Rao, 2018; Jamsheela, 2021), dengue (Jahangir et al., 2018), 
brain tumor (Sengupta et al., 2013), chronic kidney disease (Alaiad et al., 2020), 
infectious diseases (Brossette et al., 1998), pandemic diseases (Burvin & 
Dhanalakshmi, 2018; Aiswarya et al., 2020), COVID-19 (Çelik, 2020; Shawkat et al.,
2021; Tandan et al., 2021), pediatric primary care (Downs & Wallace, 2000),
treatment of patients in an emergency department (Sarıyer & Taşar, 2020) etc. In 
this paper, based on a large dataset of COVID-19 patients and using the FP growth
algorithm of association rule mining, an attempt is made to discover COVID-19
symptom patterns and rules which would support the initial identification of severe 
COVID-19 cases for early treatment and isolation. Based on the most frequent 
symptoms, a first-order regression model is also developed to assist prediction of 
COVID-19.    

3. Data collection 

In order to predict COVID-19 based on development of the corresponding 
association rules, the related data is collected from Kaggle.com which is the world’s 
largest online data science community. The data consists of the symptoms and other 
factors responsible for COVID-19 infection. They are based on the guidelines 
provided by the World Health Organization (WHO) (www.who.int) and the Ministry
of Health and Family Welfare, India (https://main.mohfw.gov.in). The data is in ‘Yes’ 
and ‘No’ format, where ‘Yes’ represents the presence of a particular symptom and 
‘No’ denotes absence of it. Based on the given guidelines, the considered factors for 
COVID-19 infection are as follows: a) breathing problem, b) fever, c) dry cough, d) 
sore throat, e) running nose, f) asthma, g) chronic lung disease, h) headache, i) heart 
disease, j) diabetes, k) hyper tension, l) fatigue, m) gastrointestinal, n) abroad travel, 
o) contact with other COVID patient, p) attended large gathering, q) visited public 
exposed places, r) family working in public exposed places and s) wearing masks at 
all times. In this database, COVID-19 is treated as the decisional (target) variable. The 
clinical dataset is in tabular form containing 5434 records of infected COVID cases. 
The snapshot of a small portion of the considered dataset is shown in Figure 2.  

 
Figure 2. A portion of the COVID-19 dataset  

4. Rule mining for COVID-19 prediction 

In this paper, using COVID-19 symptom dataset and employing FP growth 
algorithm, the corresponding association rules are extracted for early detection of
this disease based on the application of Rapid Miner Studio Educational 9.9.000 
Software. In this software, there are options to select different operators which can 
perform varying functions ranging from data input to data analysis. Each operator 
has an input node and an output node through which the data is processed. These 
operators are combined together to perform a specific task. In order to extract the 
association rules using this software, the following steps are adopted:  

a) Data input: The data is fed into the software through the Read CSV (comma-
separated values) Operator. The output node of Read CSV operator is 
connected to the input node of FP-Growth Operator.  

b) Finding the frequent itemsets: The FP-Growth Operator is utilized to find out 
the frequent itemsets in the dataset. The output of this operator is then 
connected to the input node of Create Association Rule Operator. 

c) Extraction of the association rules: The Create Association Rule Operator 
extracts the corresponding association rules while considering the input in 
the form of frequent itemsets from FP-Growth Operator. 

d) Displaying the results: The Create Association Rule Operator has two output
nodes, i.e. ‘ite’ (the frequent item sets obtained) and ‘rul’ (the association
rules extracted). These nodes are finally connected to two result nodes to
display both the frequent itemsets and association rules.
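
The same chain of steps (data input, frequent itemsets, rule extraction, display) can be imitated outside Rapid Miner with a short Python sketch. It does not reproduce the Rapid Miner operators: the four-row CSV sample is hypothetical, the column names are assumed, and candidate premises are enumerated by brute force instead of by FP growth.

```python
import csv
import io
from itertools import combinations

# Hypothetical sample in the 'Yes'/'No' layout described in Section 3;
# the rows are illustrative and are not taken from the real dataset.
SAMPLE_CSV = """Fever,Dry cough,Abroad travel,COVID-19
Yes,Yes,Yes,Yes
Yes,Yes,No,Yes
No,Yes,Yes,Yes
No,No,No,No
"""
TARGET = "COVID-19"


def read_records(text):
    """Data input: turn each row into the set of columns marked 'Yes'."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col for col, val in row.items() if val == "Yes"} for row in reader]


def support(items, records):
    return sum(items <= record for record in records) / len(records)


def mine_rules(records, min_support, min_confidence, max_premises=2):
    """Keep the rules premises -> TARGET that meet both thresholds."""
    symptoms = sorted({i for record in records for i in record} - {TARGET})
    rules = []
    for size in range(1, max_premises + 1):
        for premises in combinations(symptoms, size):
            premises = set(premises)
            sup = support(premises | {TARGET}, records)
            if sup == 0 or sup < min_support:
                continue
            conf = sup / support(premises, records)
            if conf >= min_confidence:
                rules.append((sorted(premises), sup, conf))
    return rules


# Displaying the results: one line per extracted rule.
for premises, sup, conf in mine_rules(read_records(SAMPLE_CSV), 0.3, 1.0):
    print(f"{premises} -> {TARGET}: support = {sup:.2f}, confidence = {conf:.2f}")
```

The thresholds passed in the last call mirror the minimum support of 30% and minimum confidence of 1 used in this paper; on the real dataset the file would be read from disk instead of from the inline sample.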

Figure 3 portrays the flow diagram representing the positioning of different 
operators in a logical way to provide the intended results. The FP growth algorithm 
thus generates frequent itemsets with size ranging from 1 to 5. The frequent itemsets 
of sizes 3, 4 and 5 with their corresponding support values for COVID-19 prediction 
are depicted in Tables 3-5 respectively. In Table 5, there is a frequent itemset of size
5, i.e. {Covid-19, Dry cough, Fever, Sore throat, Breathing problem}, which signifies
that these four symptoms frequently occur together with COVID-19. A support
value of 0.374 means that 37.4% of the patient cases show dry cough, fever,
sore throat and breathing problem together with COVID-19 infection. The
corresponding association rules developed by this software are provided in Table 6.
In this table, ‘Premises’ signifies the antecedent and ‘Conclusion’ signifies the
consequent of the association rule generated. Thus, a patient matching the premises
of any of these rules has an increased likelihood of this disease. In order to
generate these rules, the minimum support and minimum confidence are set to 30%
and 1 respectively. The minimum support value of 30% means that each of these nine
rules holds in at least 30% of the patient records, while a confidence of 1 means
that every record matching the premises of a rule is a COVID-19 case. On the other
hand, lift > 1 indicates the existence of meaningful relationships between the
symptoms and COVID-19 prediction. The formation of these rules and interrelations
among them are pictorially exhibited in Figure 4. In this figure, all the variables
(symptoms) and rules are shown in different blocks. The block for each rule has the
format: Rule X (Support of the rule X/Confidence of rule X) where X is the
corresponding association rule number. It provides visual information on the unique
symptoms/factors for COVID-19 and inter-connections among the generated rules.
Now, based on all these association rules, it can be concluded that there are six most 
important factors, i.e. breathing problem, fever, dry cough, sore throat, abroad travel 
and attended large gathering responsible for infection of this disease in a person.  


 
Figure 3. Flow diagram for extraction of association rules 

Table 3. Frequent itemsets of size 3 

SIZE SUPPORT ITEM1 ITEM2 ITEM3 
3 0.611 Covid-19 Dry cough Fever 
3 0.593 Covid-19 Dry cough Breathing problem 
3 0.530 Covid-19 Dry cough Running  nose 
3 0.392 Covid-19 Dry cough Fatigue 
3 0.367 Covid-19 Dry cough Visited public exposed places 
3 0.399 Covid-19 Dry cough Headache 
3 0.349 Covid-19 Dry cough Contact with COVID patient 
3 0.415 Covid-19 Dry cough Hyper tension 
3 0.376 Covid-19 Dry cough Asthma 
3 0.353 Covid-19 Dry cough Attended large gathering 
3 0.424 Covid-19 Dry cough Abroad travel 
3 0.596 Covid-19 Fever Sore throat 
3 0.510 Covid-19 Fever Breathing problem 
3 0.384 Covid-19 Fever Running  nose 
3 0.351 Covid-19 Fever Fatigue 
3 0.377 Covid-19 Fever Visited public exposed places 
3 0.411 Covid-19 Fever Contact with COVID patient 
3 0.360 Covid-19 Fever Hyper tension 
3 0.378 Covid-19 Fever Attended large gathering 
3 0.381 Covid-19 Fever Abroad travel 
3 0.526 Covid-19 Sore throat Breathing problem 
3 0.370 Covid-19 Sore throat Running  nose 
3 0.376 Covid-19 Sore throat Visited public exposed places 
3 0.401 Covid-19 Sore throat Contact with COVID patient 
3 0.384 Covid-19 Sore throat Attended large gathering 
3 0.374 Covid-19 Sore throat Abroad travel 
3 0.376 Covid-19 Breathing problem Contact with COVID patient 
3 0.355 Covid-19 Breathing problem Attended large gathering 

Table 4. Frequent itemsets of size 4 

SIZE SUPPORT ITEM1 ITEM2 ITEM3 ITEM4 
4 0.519 Covid-19 Dry cough Fever Sore throat 
4 0.433 Covid-19 Dry cough Fever Breathing problem 
4 0.363 Covid-19 Dry cough Fever Contact with COVID patient 
4 0.354 Covid-19 Dry cough Fever Abroad travel 
4 0.447 Covid-19 Dry cough Sore throat Breathing problem 
4 0.352 Covid-19 Dry cough Sore throat Contact with COVID patient 
4 0.448 Covid-19 Fever Sore throat Breathing problem 
4 0.362 Covid-19 Fever Sore throat Contact with COVID patient 

 
Table 5. Frequent itemsets of size 5 

SIZE SUPPORT ITEM1 ITEM2 ITEM3 ITEM4 ITEM5 
5 0.374 Covid-19 Dry cough Fever Sore throat Breathing problem 

 
Table 6. Association rules generated for COVID-19 prediction  

| RULE NO. | PREMISES | CONCLUSION | SUPPORT | CONFIDENCE | LIFT |
|---|---|---|---|---|---|
| 1 | Abroad travel | Covid-19 | 0.451 | 1 | 1.24 |
| 2 | Dry cough, Attended large gathering | Covid-19 | 0.390 | 1 | 1.24 |
| 3 | Dry cough, Abroad travel | Covid-19 | 0.424 | 1 | 1.24 |
| 4 | Fever, Attended large gathering | Covid-19 | 0.378 | 1 | 1.24 |
| 5 | Fever, Abroad travel | Covid-19 | 0.381 | 1 | 1.24 |
| 6 | Sore throat, Attended large gathering | Covid-19 | 0.384 | 1 | 1.24 |
| 7 | Sore throat, Abroad travel | Covid-19 | 0.374 | 1 | 1.24 |
| 8 | Breathing problem, Attended large gathering | Covid-19 | 0.355 | 1 | 1.24 |
| 9 | Dry cough, Fever, Abroad travel | Covid-19 | 0.354 | 1 | 1.24 |

 
Figure 4. Formation of association rules   

Considering the initial dataset, and breathing problem, fever, dry cough, sore 
throat, abroad travel and attended large gathering as the major factors for COVID-19 
infection, a linear regression model is developed using the following steps:  

a) Data input: The relevant data is fed into Rapid Miner software through Read 
CSV operator with COVID-19 as the decisional (dependent) variable. 

b) Data preprocessing: The Replace Operator is employed to convert the data 
from ‘yes’ and ‘no’ to 1 and 0 respectively. The Guess Types Operator is used 
to transform all the variable data into numerical data. 


c) Data processing: The Set Role Operator marks the COVID-19 variable as a
special attribute, so that the corresponding regression model is developed
with COVID-19 as the dependent variable.

d) Model development: The Linear Regression Operator finally generates the 
regression equation from the given dataset. 

The corresponding flow diagram, as exhibited in Figure 5, develops the 
regression equation correlating infection of COVID-19 and the main medical factors 
in the following form: 

Y =  0.030 + (0.208×Breathing problem) + (0.175×Fever) + (0.243×Dry cough) + 
(0.193×Sore throat) + (0.189×Abroad travel) + (0.177×Attended large gathering)  

where Y is the target variable (presence of COVID-19). A value of Y less than 0.5 
signifies less likelihood of COVID-19 in a patient; on the other hand, a value greater 
than or equal to 0.5 identifies more likelihood of COVID-19 infection in a patient. A
moderately high coefficient of determination (R²) value of 0.739 provides an
indication of acceptable accuracy of the developed predictive model. It indicates that
almost 73.9% variation of the dependent variable (presence/absence of COVID-19) 
can be explained by the independent variables (symptoms/factors). 
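
The use of the regression equation for a single patient can be sketched in Python as follows. The intercept, coefficients and the 0.5 cut-off are copied from the equation above; the function names are hypothetical, and treating an unanswered factor as ‘No’ is an assumption of this sketch.

```python
# Intercept and coefficients copied from the regression equation above.
INTERCEPT = 0.030
COEFFICIENTS = {
    "Breathing problem": 0.208,
    "Fever": 0.175,
    "Dry cough": 0.243,
    "Sore throat": 0.193,
    "Abroad travel": 0.189,
    "Attended large gathering": 0.177,
}
THRESHOLD = 0.5  # Y >= 0.5 is read as more likelihood of COVID-19


def covid_score(answers):
    """Evaluate Y for a dict of 'Yes'/'No' answers (missing factor = 'No')."""
    return INTERCEPT + sum(
        coefficient
        for factor, coefficient in COEFFICIENTS.items()
        if answers.get(factor, "No") == "Yes"
    )


def likely_covid(answers):
    """Apply the 0.5 cut-off to the regression score."""
    return covid_score(answers) >= THRESHOLD


patient = {"Fever": "Yes", "Dry cough": "Yes", "Sore throat": "Yes"}
print(f"Y = {covid_score(patient):.3f}, likely COVID-19: {likely_covid(patient)}")
```

For a patient reporting fever, dry cough and sore throat, Y = 0.030 + 0.175 + 0.243 + 0.193 = 0.641, which is above the cut-off. With fever and dry cough alone, Y = 0.448 and the patient falls below it.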

Figure 5. Flow diagram for regression equation  

5. Conclusion 

Keeping in mind the requirements of early detection of COVID-19 for faster isolation 
and treatment of an infected patient, this paper proposes the application of FP 
growth algorithm to find out the frequent itemsets and extract the association rules
with their corresponding confidence and support values. It is noticed that six factors, 
i.e. breathing problem, fever, dry cough, sore throat, abroad travel and attended large 
gathering are mainly responsible for COVID-19 infection. A linear regression 
equation-based predictive model is also developed to correlate those factors and 
presence of COVID-19 in a patient. A moderately high coefficient of determination 
value suggests that almost 73.9% variation of the dependent variable 
(presence/absence of COVID-19) can be explained by the independent variables 
(symptoms/factors). It would help in early prediction of this disease, thus saving 
valuable time and resources. However, if a patient is asymptomatic, this model
would not
provide accurate results. Among the existing machine learning techniques, 
association rule mining has several advantages, such as the capability to deal with 
different forms of data repositories, development of easy-to-understand clauses, no 
assumption about the underlying distribution of the data, and the use of support, 
confidence and lift parameters to identify the strongest rules. As future 
scope, association rule mining could be integrated into a decision 
support system for early diagnosis of COVID-19 and other severe diseases, such as 
kidney-related problems and brain tumors. With more real-time clinical data, those 
diseases could be diagnosed much faster by evaluating the coexistence of the 
symptoms. 
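
The support, confidence and lift parameters mentioned above can be sketched in a few lines of Python. The five patient records below are hypothetical toy data, not the Kaggle dataset used in this study, and the helper names `support`, `confidence` and `lift` are assumptions introduced for illustration. The sketch evaluates a single given rule by direct counting; it does not implement the FP growth algorithm itself.

```python
def support(transactions, itemset):
    """Fraction of records that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= record for record in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical toy records; each set lists the attributes present in one patient.
records = [
    {"fever", "dry cough", "COVID-19"},
    {"fever", "COVID-19"},
    {"fever", "dry cough", "COVID-19"},
    {"dry cough"},
    {"sore throat"},
]

rule_if = {"fever", "dry cough"}
rule_then = {"COVID-19"}

print("support   :", support(records, rule_if | rule_then))
print("confidence:", confidence(records, rule_if, rule_then))
print("lift      :", lift(records, rule_if, rule_then))
```

A lift value above 1 indicates that the antecedent and the consequent occur together more often than would be expected if they were independent, which is why lift is commonly used alongside support and confidence to rank rules.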

Author Contributions: V.K.R.: Data collection, software, analysis; Santonab C.: Draft 
preparation, review, technical writing; Shankar C.: Technical writing, editing. 

Conflicts of Interest: The authors declare that they have no known competing 
financial interests or personal relationships that could have appeared to influence 
the work reported in this paper. 

Funding: This research received no external funding. 

Data Availability Statement: The related data were collected from Kaggle.com, which is 
the world’s largest online data science community. 

References  

Aiswarya, P., Bhanu Sridhar, M., & Kavitha, L. (2020). Detection and prediction of 
frequent diseases in India through association technique using apriori algorithm and 
random forest regression. International Journal of Engineering Research & 
Technology, 9(3), 386-393. 

Alaiad, A., Najadat, H., Mohsen, B., & Balhaf, K. (2020). Classification and association 
rule mining technique for predicting chronic kidney disease. Journal of Information & 
Knowledge Management, 19(1), 2040015. 

Hareendran, S., & Chandra, S.S. (2017). Association rule mining in healthcare 
analytics. In: Data Mining and Big Data. Tan, Y. et al. (eds), Springer International 
Publishing, 31-39. 

Brossette, S.E., Sprague, A.P., Hardin, J.M., Waites, K.B., Jones, W.T., & Moser, S.A. 
(1998). Association rules and data mining in hospital infection control and public 
health surveillance. Journal of the American Medical Informatics Association, 5(4), 
373-381. 

Burvin, J.S., & Dhanalakshmi, K. (2018). Pandemic disease detection and prevention 
system using mining with graph-based approach. International Journal of Pure and 
Applied Mathematics, 118(20), 4355-4360. 

Çelik, A. (2020). Using apriori data mining method in COVID-19 diagnosis. Journal of 
Engineering Technology and Applied Sciences, 5(3), 121-131. 

Cheng, C-W., & Wang, M.D. (2017). Healthcare data mining, association rule mining, 
and applications. In: Health Informatics Data Analysis. Health Information Science, 
Xu, D., Wang, M., Zhou, F., & Cai, Y. (eds), Springer, Cham, 201-210. 


Domadiya, N., & Rao, U.P. (2018). Privacy-preserving association rule mining for 
horizontally partitioned healthcare data: a case study on the heart diseases. Sadhana, 
43, 127-141. 

Downs, S., & Wallace, M. (2000). Mining association rules from a pediatric primary 
care decision support system. In: Proceedings of the Annual Symposium of 
American Medical Informatics Association, Los Angeles, USA, 200-204. 

Freeda, D.S., & Florence, M.L. (2017). An overview of disease analysis using 
association rule mining. International Journal of Scientific & Engineering Research, 
8(4), 113-117. 

Jahangir, I., Abdul, B., Hannan, A., & Javed, S. (2018). Prediction of dengue disease 
through data mining by using modified apriori algorithm. In: Proceedings of the 4th 
ACM International Conference of Computing for Engineering and Sciences, Kuala 
Lumpur, 1-4. 

Jain, D., & Gautam, S. (2013). Implementation of apriori algorithm in health care 
sector: A survey. International Journal of Computer Science and Communication 
Engineering, 2(4), 26-32. 

Jamsheela, O. (2021). Analysis of association among various attributes in medical 
data of heart patients by using data mining methods. International Journal of Applied 
Science and Engineering, 18(2), 2020215. 

Kaur, J., & Madan, N. (2015). Association rule mining: A survey. International Journal 
of Hybrid Information Technology, 8(7), 239-242. 

Kulkarni, A.R., & Mundhe, S.D. (2017). Data mining technique: An implementation of 
association rule mining in healthcare. International Advanced Research Journal in 
Science, Engineering and Technology, 4(7), 62-65. 

Lakshmi, K.S., & Vadivu, G. (2017). Extracting association rules from medical health 
records using multi-criteria decision analysis. Procedia Computer Science, 115, 
290-295. 

Ordonez, C., Ezquerra, N., & Santana, C.A. (2006). Constraining and summarizing 
association rules in medical data. Knowledge and Information Systems, 9(3), 
259-283. 

Prithiviraj, P., & Porkodi, R. (2015). A comparative analysis of association rule mining 
algorithms in data mining: A study. American Journal of Computer Science and 
Engineering Survey, 3(1), 1-10. 

Sabthami, J., Thirumoorthy, K., & Muneeswaran, K. (2016). Mining association rules 
for early diagnosis of diseases from electronic health records. Middle-East Journal of 
Scientific Research, 24, 248-253. 

Said, I.U., Adam, A.H., & Garko, A.B. (2015). Association rule mining on medical data 
to predict heart disease. International Journal of Science Technology and 
Management, 4(8), 26-35. 

Sambasiva Rao, P., & Uma Devi, T. (2017). Applicability of apriori based association 
rules on medical data. International Journal of Applied Engineering Research, 12(20), 
9451-9458. 


Sarıyer, G., & Taşar, C. Ö. (2020). Highlighting the rules between diagnosis types and 
laboratory diagnostic tests for patients of an emergency department: Use of 
association rule mining. Health Informatics Journal, 26(2), 1177-1193. 

Sengupta, D., Sood, M., Vijayvargia, P., Hota, S., & Naik, P.K. (2013). Association rule 
mining based study for identification of clinical parameters akin to occurrence of 
brain tumor. Bioinformation, 9(1), 555-559. 

Shawkat, M., Badawy, M., & Eldesouky, A.I. (2021). A novel approach of frequent 
itemsets mining for Coronavirus disease (COVID-19). European Journal of Electrical 
Engineering and Computer Science, 5(2), 5-12. 

Stilou, S., Bamidis, P.D., Maglaveras, N., & Pappas, C. (2001). Mining association rules 
from clinical databases: An intelligent diagnostic process in healthcare. In: Studies in 
Health Technology and Informatics, IOS Press, 84, 1399-1403. 

Tandan, M., Acharya, Y., Pokharel, S., & Timilsina, M. (2021). Discovering symptom 
patterns of COVID-19 patients using association rule mining. Computers in Biology 
and Medicine, 131, 104249. 

Thamer, M., El-Sappagh, S., & El-Shishtawy, T. (2020). A semantic approach for 
extracting medical association rules. International Journal of Intelligent Engineering 
& Systems, 13(3), 280-292. 

© 2022 by the authors. Submitted for possible open access publication under 
the terms and conditions of the Creative Commons Attribution (CC BY) license 
(http://creativecommons.org/licenses/by/4.0/). 

 