Knowledge Engineering and Data Science (KEDS) pISSN 2597-4602 Vol 5, No 2, December 2022, pp. 188–196 eISSN 2597-4637 https://doi.org/10.17977/um018v5i22022p188-196 ©2022 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/) Predicting Heart Disease using Logistic Regression Mochammad Anshori 1,*, M. Syauqi Haris 2 Department Informatics, Institute of Health and Science Technology Rs. dr. Soepraoen Malang, Jl. S. Supriyadi No. 22, Malang, 65147, Indonesia 1 moanshori@itsk-soepraoen.ac.id*; 2 haris@itsk-soepraoen.ac.id; * corresponding author I. Introduction Heart disease (HD), or cardiovascular disease, is a major cause of death worldwide. Based on World Health Organization (WHO) report, there are 17.9 million deaths yearly, and almost 32% of all are passed away [1][2]. According to the WHO page, the cause of heart disease is a heart attack, stroke, and rheumatic. Everyone has the potential for heart disease, especially men compared to the woman. Unhealthy lifestyles, such as smoking, cholesterol, high blood pressure, obesity, alcohol, and hereditary history, become the most critical risk of heart disease [3]. Not all sufferers of heart disease end in death. A controlled lifestyle, such as eating habits and physical activity, can prevent the risk. Symptoms indicate heart disease, such as shortness of breath [4], physical fatigue [5], and pain in the chest, arms, shoulders, or back [6]. Heart disease can attack the sufferer and is not easy to cure because it needs special treatment. As a vital organ, heart health care must be highly guarded. The most effortless action to take as a preventive measure is to reduce smoking habits, have a healthy diet, be active in physical activities and stop consuming alcohol [7]. The various causes of heart disease may increase the prediction complexity. With the development of medical data sourced from the patient's health record, there is a great opportunity as a basic material in developing patient health. Currently, the use of computers has been applied in various fields. In health, it can be used to improve the decision-support system in medicine [8]. Especially, implementing machine learning as an analytical tool can find hidden patterns in the data [9]. This development follows up a high degree of prediction in terms of proper prevention. Prior studies on predicting and classifying heart disease using machine learning techniques are offered. These studies explore various features, methods, and their corresponding accuracies. Some of the notable findings include research that used K-nearest neighbors (KNN) with an accuracy of ARTICLE INFO A B S T R A C T Article history: Received 11 September 2022 Revised 25 October 2022 Accepted 23 December 2022 Published online 30 December 2022 A common risk of death is caused by heart disease. It is critical in the field of medicine to be able to diagnose cardiac disease in order to adequately prevent and treat patients. The most accurate method of prediction has the potential to both extend the patient's life and reduce the severity of their cardiac disease. The use of machine learning is one approach that may be taken to generate predictions. In this study, patient medical record information was used in conjunction with an algorithm for logistic regression in order to make heart disease diagnoses. The outcomes of the logistic regression have been utilized to achieve a high level of accuracy in the prediction of heart disease. To get the model coefficients needed for the equation, the experiment uses an iterative form of the logistic regression test. Iteration 14 produced the best results, with an accuracy of 81.3495% and an average calculation time of 0.020 seconds. The best iteration was reached at that point. The percentage of space that lies beneath the ROC curve is 89.36%. The findings of this study have significant implications for the field of heart disease prediction and can contribute to improved patient care and outcomes. Accurate predictions obtained through logistic regression can guide healthcare professionals in identifying individuals at risk and implementing preventive measures or tailored treatment plans. The computational efficiency of the model further enhances its applicability in real-time decision support systems. This is an open-access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/). Keywords: Heart disease Cardiovascular disease Classification Machine learning Logistic regression http://u.lipi.go.id/1502081730 http://u.lipi.go.id/1502081046 http://journal2.um.ac.id/index.php/keds mailto:keds.journal@um.ac.id https://creativecommons.org/licenses/by-sa/4.0/ mailto:moanshori@itsk-soepraoen.ac.id https://creativecommons.org/licenses/by-sa/4.0/ 189 M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 74% [10], information gain combined with KNN achieving 99.65% accuracy [11], decision tree (DT) method with 99.62% accuracy [12], and GCSA-DCNN model with 95.34% accuracy [5]. Additionally, feature selection and classification methods such as Chi-squared combined with BayesNet achieved 85% accuracy [13], the FCMIM-support vector machine (SVM) method attained an accuracy of 92.37% [14], PCA combined with random forest (RF) achieved 98.7% accuracy [8]. Other methods like logistic regression (LR) achieved accuracies of 92.58% [15] and 92.76% [9], while a machine learning framework utilizing PSO and support vector machine (SVM) classifier achieved 84.36% accuracy [16]. Ensemble classification techniques, including naive Bayes (NB), Bayesian network (BN), random forest (RF), and multilayer perceptron (MLP), achieved an accuracy of 85.48% [11]. These prior studies contribute to the understanding and development of machine learning approaches for heart disease prediction and classification. However, machine learning techniques are useful for predicting heart disease. Implementing the machine learning technique may be more advantageous and effective in terms of cost [17]. Various methods are used to predict heart disease accurately and with maximum accuracy. The methods used range from simple to hybrid methods with other methods aimed at increasing the accuracy of the classifier model. Several methods have been used, including NB [18], BN [19], RF [20], MLP [21], SVM [22], KNN [23], LR [24], DT [25], and deep convolutional neural network (DCNN) [26]. The method for preprocessing uses principal component analysis (PCA), chi-squared, and information gain. Optimization methods include particle swarm optimization (PSO), and ant colony optimization (ACO). This research applied a machine learning algorithm called logistic regression to predict heart disease risk based on risk factors from the patient health records. The logistic regression used is simple logistic regression without any optimization. With this reliability, this study offers the use of logistic regression in classifying heart disease. Previous studies use the same dataset with 14 features, which has resulted in an accuracy of 92.76% [9] and a total of 13 with an accuracy of 92.58% [13]. Based on the result above, logistic regression can provide high accuracy. The difference between the research conducted with previous research is based on the dataset used. This study uses a dataset with a number of features = 9. For the comparison to get the best model, a comparison method is implemented. The model comparisons are based on function classifiers, such as SVM (support vector model) and LDA (linear discriminant analysis). The aim of this study is to know the model of log regression while implemented in this dataset. the fundamental difference between this study and previous research lies in the dataset used. In this research, we used a new dataset that covers symptoms of heart disease that have a total feature less than previous research. The motivation behind this research stems from the pressing need to improve the accuracy of heart disease prediction models, given the significant impact of heart disease on global health. Accurate and reliable prediction models can aid healthcare professionals in identifying high-risk individuals and implementing timely preventive measures. By leveraging machine learning algorithms and exploring various features and methods, we aim to contribute to the development of more effective and efficient heart disease prediction models. The findings of this research can potentially enhance medical decision-making processes, improve patient outcomes, and ultimately reduce the burden of heart disease on individuals and healthcare systems. This research contributes to the existing body of knowledge on heart disease prediction by focusing on a specific dataset with a reduced number of features. While previous studies have achieved high accuracies using more comprehensive datasets, this research explores the potential of logistic regression with a limited feature set. By evaluating the performance of logistic regression and comparing it with other classifiers, such as SVM and LDA, we aim to provide insights into the effectiveness of logistic regression in predicting heart disease using a more compact dataset. The findings of this study can shed light on the trade-offs between feature selection and predictive accuracy, offering valuable guidance for future research and the development of practical heart disease prediction models. The remaining sections of this paper are organized as follows. Section II provides a detailed explanation of the methodology used, including data collection, data preparation, and the M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 190 implementation of logistic regression, SVM, and LDA classifiers. Section III presents the experimental results and performance evaluation metrics, comparing the accuracies of different classifiers. Additionally, a discussion of the findings and their implications will be provided in this section. Finally, Section IV concludes the paper, by summarizing the key findings and their significance in the field of heart disease prediction, the limitations of the study, and potential areas for future research. II. Method In this research, a systematic methodology consisting of four stages represent in Figure 1. Figure 1 provides an overview of these stages, which include dataset loading, dataset preparation, model creation using the selected method, and result evaluation. Fig. 1. Research methodology The initial stage involves preparing the dataset for analysis. The dataset used in this research was obtained from the Mendeley dataset [1]. This dataset contains information on observable characteristics and risk factors associated with heart attacks. The data instances were collected from electronic health records of patients. In total, the dataset comprises 1319 data instances, each representing a patient's information. The data comparison with positive and negative labels can be seen in Figure 2. Fig. 2. Target class demographics Figure 2 provides a visualization of the distribution of positive and negative labels in the dataset. Based on Figure 2, 61% of the data was labeled positive, and the remaining 39% was labeled negative. From the figure, the instance data with a positive class has more quantity than those with a negative label. The dataset has features unlocked 9. The details of the features in the dataset are shown in Table 1. If observed, all data types of each feature are numeric. It indicates that the nominal data has been converted to numeric, making it easier for the model to perform calculations. On the other hand, it makes it easier for researchers to process data because there is no need to convert nominal data types. 191 M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 Table 1. Dataset details No Feature Data type Range Description 1 age numeric 14, 103 Age of patient 2 gender numeric 0, 1 1 = male, 0 = female 3 impulse numeric 20, 1111 Heart rate 4 Pressure high numeric 42, 223 Systolic blood pressure 5 Pressure low numeric 38, 154 Diastolic blood pressure 6 glucose numeric 35, 541 Blood sugar 7 kcm numeric 0.321, 300 CK-MB 8 troponin numeric 0.001, 10.3 Test troponin 9 class nominal 0, 1 Positive/negative The second stage is separating the data between training data and test data. The data shared is used to build a classifier model. The scheme used in data sharing is the k-fold cross-validation method. This method is applied because the resulting model is more general and can avoid overfitting [27]. Cross-validation works based on the value of the parameter k. The value of k here determines how many data segments are shared between test data and training data. The illustration of cross-validation can be seen in Figure 3. Fig. 3. Illustration of cross-validation with k-fold = 10 Figure 3 shows cross-validation for this research with a value of k = 10. The gray cells will be the test data for each section and run iteratively for the value of k. The parameter k used in this study is 10-fold cross-validation, meaning the data is divided into 10 subsets. Each subset is used as the test set once, while the remaining nine subsets are combined to form the training set. This iterative process ensures that the model is evaluated on different combinations of training and test data, providing a more robust assessment of its predictive capabilities. By utilizing the k-fold cross-validation technique, this research aims to build a classifier model that can generalize well to unseen data. This approach helps to assess the model's performance and determine its ability to accurately predict heart disease in new and unseen cases The third stage is creating a LR classification model. LR is a mathematical model that uses probability estimation for each class [28]. LR is one of the supervised learning methods. In this case, LR uses to overcome the binary classification. However, generally, LR is also reliable in the case of multi-label classification. The advantages of LR are that it does not require a lot of parameter optimization and is easy to implement [29]. The LR model operates similarly to linear regression, as seen in (1). However, the primary distinction lies in the function used. In LR, the sigmoid function, shown in (2), is employed within the equation. By substituting the sigmoid function into (1), (3) is derived. Equation (4) represents the formulation of logistic regression as a logit, known as the log probability function. The term inside the brackets is referred to as the odds, representing the ratio of the probability of success to the M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 192 probability of failure. The LR coefficients are estimated using the iteratively reweighted least squares (IRLS) method [30]. In each iteration, the dependent variable is adjusted to obtain the optimal LR coefficient. �̂� = 𝐸(𝑦|𝑥) = 𝛽0 + 𝛽1𝑥1 + ⋯ + 𝛽𝑛 𝑥𝑛+∈ (1) 𝜎(𝑍) = 1 1+𝑒−𝑧 (2) 𝐸(𝑦|𝑥) = 𝑠𝑖𝑔𝑚𝑎(𝛽0 + 𝛽1𝑥1 + ⋯ + 𝛽𝑛 𝑥𝑛 ) (3) 𝐸(𝑦|𝑥) = 1 1+𝑒−(𝛽0 +𝛽1 .𝑥1+⋯+ 𝛽𝑛.𝑥𝑛) (4) where �̂� represents the predicted value of the dependent variable y given the independent variables 𝑥1, 𝑥2, ..., 𝑥𝑛. The coefficients 𝛽0, 𝛽1, ..., 𝛽𝑛 are estimated parameters that determine the relationship between the independent variables and the dependent variable. The term ∈ represents the error term or residual. 𝑍 represents the linear combination of the coefficients and independent variables. Comparison is needed to obtain the best method. The model comparison that will be used is SVM and LDA. SVM generally works by splitting data class based on the hyperplane. The SVM function is shown in (5). 𝐿𝐷 = ∑ ∝𝑖 − 1 2⁄ 𝑛 𝑖=1 ∑ ∑ ∝𝑖 ∝𝑗 𝑦𝑖 𝑦𝑗 𝑥𝑖 𝑇𝑥𝑗 𝑛 𝑗=1 𝑛 𝑖=1 (5) 𝐿𝐷 represents the SVM function, ∝𝑖 and ∝𝑗 are the weights assigned to the data points, 𝑦𝑖 and 𝑦𝑗 are the class labels, and 𝑥𝑖 and 𝑥𝑗 are the feature vectors. The objective of SVM is to find the optimal weights that maximize the margin between the classes. On the other hand, LDA works by projecting all data vectors linearly. LDA optimize the distance between class and minimize the distance between inner class. The LDA formula is shown in (8). The equation is formed from covariance at (6) and pooled covariance at (7). 𝑐𝑖 = (𝑥𝑖 0)𝑇𝑥𝑖 0 𝑛𝑖 (6) 𝑐(𝑟,𝑠) = 1 𝑛 ∑ 𝑛𝑖𝑐𝑖 (𝑟, 𝑠) 𝑔 𝑖=1 (7) 𝑓𝑖 = 𝜇𝑖 𝐶 −1𝑥𝑘 𝑇 − 1 2⁄ 𝜇𝑖 𝐶 −1𝑥𝑖 𝑇 + ln (𝑝𝑖) (8) where 𝑐𝑖 represents the covariance for each class, 𝑛𝑖 is the number of instances in class 𝑖, 𝑥𝑖 0 denotes the centered data for class 𝑖, 𝑔 is the total number of classes, 𝜇𝑖 is the mean vector for class 𝑖, 𝐶 −1 is the inverse of the covariance matrix, 𝑥𝑘 𝑇 is the transpose of the centered data, and 𝑝𝑖 is the prior probability of class 𝑖. The researcher uses a performance reference as an accuracy value as a benchmark in comparing the results in the fourth stage. The formula for calculating accuracy is shown in (9) below. It also uses TPR (true positive rate) and FPR (false positive rate) to get the ROC curve value [31]. ROC here is valid for modeling errors/errors from the built classification model. FPR and TPR can be seen in (10) and (11) below for the accuracy formula. TP means that it is correct and predicted correctly, TN is correct, but the prediction is wrong, FP is wrong but predicted right, and FN is wrong and predicted wrong. 𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 𝑇𝑃+𝑇𝑁 𝑡𝑜𝑡𝑎𝑙 𝑑𝑎𝑡𝑎 (9) 𝑇𝑃𝑅 = 𝑇𝑃 𝑇𝑃+𝐹𝑁 100% (10) 𝐹𝑃𝑅 = 𝑇𝑃 𝑇𝑃+𝐹𝑁 100% (11) 193 M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 III. Results and Discussion The results of this study are by observing the results of logistic regression performance. The application of logistic regression uses the Weka application [32]. There is no data preprocessing here because the data obtained is considered clean. The IRLS iteration test carried out to obtain the logistic regression coefficient. The parameter values tested are 2 to 30 with multiples of 2. The iteration test results can be seen in Figure 4. Fig. 4. Iteration parameter testing Figure 4 shows a graph of the change in accuracy of each iteration test. When the iteration is low, the accuracy obtained is also low. The greater the iteration value, the higher the accuracy value. At iteration = 10, there is a decrease in the accuracy value compared to the accuracy at iteration 8. It shows that iteration = 10 is the optimal locale because the accuracy increases and decreases again. Furthermore, at iteration = 14, it produces an accuracy that tends to be high, namely 81.35%. During this iteration, the logistic regression model can produce the best accuracy because when the accuracy is increased again, it decreases accuracy, and there tends to be no change in the increase or decrease in accuracy. Based on these findings, it can be concluded that the logistic regression model achieved the best accuracy at iteration = 14. This information is crucial for selecting the optimal logistic regression coefficients and maximizing the predictive power of the model. The accuracy of logistic regression was obtained, then the model was compared. The comparison is shown in Table 2. The table shows the evaluation measure such as accuracy, TPR, FPR, and computational time. The time value is second and obtained from ten times rials. The table shows the accuracy of log regression = 81.35%, SVM with linear kernel = 78.17%, and LDA gives accuracy = 69.75%. These results give the highest accuracy from the log regression model. Linearly, the TPR value is also rising to the increase in the accuracy value. Unlike the FPR value, which is inversely proportional, the value will be smaller if the TPR value increases. For the computational time, LDA gives the worst time equal to 0.17 seconds. SVM reach about 0.06 second, better than LDA. The best computational is gained from log regression, which only needs 0.02 seconds to do classification. Table 2. Classification results based on the LR model and its comparison. Evaluation Log Regression SVM (linear) LDA Accuracy 81.3495 78.1653 69.7498 TPR 81.3 78.2 69.7 FPR 18.7 23.4 39.5 Time (s) 0.02 0.06 0.17 In Table 2 several evaluations of the performance of the logistic regression obtained from the confusion matrix. Based on these results, it can be said that logistic regression can be used to predict heart disease with high accuracy. The TPR (sensitivity) was correctly calculated, and the calculated FPR was incorrectly identified [33]. Computational time is also included in the calculation. The computational time obtained resulted in 10 times of testing to get the average. The average value of M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 194 computing time is 0.02 seconds. Based on the computational time generated, the prediction model with logistic regression has a relatively fast computation time. Next, consider Figure 5. Table 3. Confusion matrix Actual Predicted Positive Negative Positive 660 150 Negative 96 413 Since we know if log regression is the best model in this case, let us see the confusion matrix. Using iteration = 14, the results of the evaluation of the implementation of logistic regression are shown in the confusion matrix table in Table 3. The confusion matrix/error matrix is used to visualize the performance of the logistic regression algorithm. The confusion matrix represents the result between the actual and predicted values. The table shows the value of TP = 660, TN 413, FP = 96, FN = 150. Table 3 shows that the classifier cannot predict all the data accurately. From the confusion matrix table above, there are still misclassifications. Next, consider Figure 5. The picture represents ROC of the model performance. The ROC is generated based on the log regression model. Fig. 5.ROC curve Figure 5 shows the ROC curve, which is a combination of the x and y axes, TPR occupies the x- axis and FPR on the y-axis. By being able to visualize the performance of the classifier in making predictions [33]. ROC The value of the ROC curve in Figure 5 is 89.36. This value is good because it is close to 1, which is the best value of the ROC curve. A good curve has a value between 0.5 up to 1 it means that the curve produced by logistic regression is close to its best value. It is proven that the classifier's performance is suitable for predicting heart disease. Accurately predicting heart disease risk is crucial for developing effective decision-support systems in healthcare. The findings of this research contribute to the development of such systems by providing insights into the performance and feasibility of logistic regression as a predictive model. Integrating logistic regression-based algorithms into decision support systems can assist healthcare professionals in identifying individuals at high risk of heart disease and making informed decisions regarding prevention and treatment strategies. These findings highlight the effectiveness of logistic regression as a predictive model for heart disease. Despite misclassifications, the model exhibited high accuracy, relatively fast computational time, and a good ROC curve. These results donate to understanding logistic regression's potential in heart disease prediction and can inform the development of more accurate and efficient prediction models. 195 M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 IV. Conclusion Referring to the results and discussion, the machine learning method, namely logistic regression, can predict heart disease based on the patient's electronic medical record. A dataset used in this study has a total feature = 9 and 1319 instances of data. Based on the iteration parameter test results, the increase in the iteration value affects the accuracy value of the classifier model. It was found that the best iteration that can produce the highest accuracy at iteration = 14. The given accuracy is 81.3495%. The difference in iteration values affects the performance of logistic regression, as evidenced by the increasing iteration value providing an increase in accuracy until finding the optimal point. Log regression is proven more reliable in making predictions with relatively high accuracy and relatively fast computation time. Further research for this study by comparing some machine learning models, namely SVM and LDA. Feature selection can be made in further research from this study to get a better model. Declarations Author contribution All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper. Funding statement This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Conflict of interest The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence the work reported in this paper. Additional information Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. Publisher’s Note: Department of Electrical Engineering - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations. References [1] S. S. Maghdid and T. A. Rashid, “An Extensive Dataset for the Heart Disease Classification System,” Mendeley Data, 2022. [2] WHO, “Cardiovascular diseases,” World Health Organization, 2020. https://www.who.int/health- topics/cardiovascular-diseases#tab=tab_1 (accessed Aug. 08, 2022). [3] C. B. C. Latha and S. C. Jeeva, “Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques,” Informatics Med. Unlocked, vol. 16, p. 100203, 2019. [4] A. Alshukry et al., “Clinical characteristics of coronavirus disease 2019 (COVID-19) patients in Kuwait,” PLoS One, vol. 15, no. 11, p. e0242768, Nov. 2020. [5] S. M. Nagarajan, V. Muthukumaran, R. Murugesan, R. B. Joseph, M. Meram, and A. Prathik, “Innovative feature selection and classification model for heart disease prediction,” J. Reliab. Intell. Environ., vol. 8, no. 4, pp. 333–343, Dec. 2022. [6] S.-J. Kim, “Global Awareness of Myocardial Infarction Symptoms in General Population,” Korean Circ. J., vol. 51, no. 12, p. 997, 2021. [7] R. Ndejjo, G. Musinguzi, F. Nuwaha, H. Bastiaens, and R. K. Wanyenze, “Understanding factors influencing uptake of healthy lifestyle practices among adults following a community cardiovascular disease prevention programme in Mukono and Buikwe districts in Uganda: A qualitative study,” PLoS One, vol. 17, no. 2, p. e0263867, Feb. 2022. [8] A. K. Gárate-Escamila, A. Hajjam El Hassani, and E. Andrès, “Classification models for heart disease prediction using feature selection and PCA,” Informatics Med. Unlocked, vol. 19, p. 100330, 2020. [9] S. M. M. Hasan, M. A. Mamun, M. P. Uddin, and M. A. Hossain, “Comparative Analysis of Classification Approaches for Heart Disease Prediction,” in 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), Feb. 2018, pp. 1–4. [10] M. Anshori, F. Mar’i, and F. A. Bachtiar, “Comparison of Machine Learning Methods for Android Malicious Software Classification based on System Call,” in 2019 International Conference on Sustainable Information Engineering and Technology (SIET), Sep. 2019, pp. 343–348. [11] P. Thombare, M. Ghalme, S. Raut, N. Dhakne, and P. R. Dholi, “Prediction of Heart Disease using Machine Learning Techniques,” Int. Res. J. Mod. Eng. Technol. Sci., vol. 04, no. 06, pp. 1099–1102.2022. http://journal2.um.ac.id/index.php/keds https://data.mendeley.com/datasets/65gxgy2nmg https://data.mendeley.com/datasets/65gxgy2nmg https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1 https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1 https://doi.org/10.1016/j.imu.2019.100203 https://doi.org/10.1016/j.imu.2019.100203 https://doi.org/10.1371/journal.pone.0242768 https://doi.org/10.1371/journal.pone.0242768 https://doi.org/10.1007/s40860-021-00152-3 https://doi.org/10.1007/s40860-021-00152-3 https://doi.org/10.1007/s40860-021-00152-3 https://doi.org/10.4070/kcj.2021.0320 https://doi.org/10.4070/kcj.2021.0320 https://doi.org/10.1371/journal.pone.0263867 https://doi.org/10.1371/journal.pone.0263867 https://doi.org/10.1371/journal.pone.0263867 https://doi.org/10.1371/journal.pone.0263867 https://doi.org/10.1016/j.imu.2020.100330 https://doi.org/10.1016/j.imu.2020.100330 https://doi.org/10.1109/IC4ME2.2018.8465594 https://doi.org/10.1109/IC4ME2.2018.8465594 https://doi.org/10.1109/IC4ME2.2018.8465594 https://doi.org/10.1109/SIET48054.2019.8985998 https://doi.org/10.1109/SIET48054.2019.8985998 https://doi.org/10.1109/SIET48054.2019.8985998 https://www.irjmets.com/uploadedfiles/paper/issue_6_june_2022/25520/final/fin_irjmets1654871784.pdf https://www.irjmets.com/uploadedfiles/paper/issue_6_june_2022/25520/final/fin_irjmets1654871784.pdf M. Anshori / Knowledge Engineering and Data Science 2022, 5 (2): 188–196 196 [12] H. Gulfam Ahmad and M. Jasim Shah, “Prediction of Cardiovascular Diseases ( CVDs ) Using Machine Learning Techniques in Health,” Azerbaijan J. High Perform. Comput., vol. 4, no. 2, pp. 267–279, Dec. 2021. [13] S. D. Desai, S. Giraddi, P. Narayankar, N. R. Pudakalakatti, and S. Sulegaon, “Back-Propagation Neural Network Versus Logistic Regression in Heart Disease Classification,” in Advances in Intelligent Systems and Computing, 2019, pp. 133–144. [14] W. Książek, M. Gandor, and P. Pławiak, “Comparison of various approaches to combine logistic regression with genetic algorithms in survival prediction of hepatocellular carcinoma,” Comput. Biol. Med., vol. 134, p. 104431, Jul. 2021. [15] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart Disease Identification Method Using Machine Learning Classification in E-Healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020. [16] J. Vijayashree and H. P. Sultana, “A Machine Learning Framework for Feature Selection in Heart Disease Classification Using Improved Particle Swarm Optimization with Support Vector Machine Classifier,” Program. Comput. Softw., vol. 44, no. 6, pp. 388–397, Nov. 2018. [17] C. M. Bhatt, P. Patel, T. Ghetia, and P. L. Mazzeo, “Effective Heart Disease Prediction Using Machine Learning Techniques,” Algorithms, vol. 16, no. 2, p. 88, Feb. 2023. [18] L. Ali et al., “A Feature-Driven Decision Support System for Heart Failure Prediction Based on X2 Statistical Model and Gaussian Naive Bayes,” Comput. Math. Methods Med., vol. 2019, pp. 1–8, Nov. 2019. [19] A. Elsayad and M. Fakhr, “Diagnosis of cardiovascular diseases with bayesian classifiers,” J. Comput. Sci., vol. 11, no. 2, pp. 274–282, 2015. [20] S. Asadi, S. Roshan, and M. W. Kattan, “Random forest swarm optimization-based for heart diseases diagnosis,” J. Biomed. Inform., vol. 115, p. 103690, Mar. 2021. [21] K. Subhadra and B. Vikas, “Neural network based intelligent system for predicting heart disease,” Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 5, pp. 484–487, 2019. [22] L. Ali et al., “An Optimized Stacked Support Vector Machines Based Expert System for the Effective Prediction of Heart Failure,” IEEE Access, vol. 7, pp. 54007–54014, 2019. [23] R. TR, U. K. Lilhore, P. M, S. Simaiya, A. Kaur, and M. Hamdi, “Predictive analysis of heart diseases with machine learning approaches,” Malaysian J. Comput. Sci., pp. 132–148, Mar. 2022. [24] S. I. Ayon, M. M. Islam, and M. R. Hossain, “Coronary Artery Heart Disease Prediction: A Comparative Study of Computational Intelligence Techniques,” IETE J. Res., vol. 68, no. 4, pp. 2488–2507, Jul. 2022. [25] M. M. Ghiasi, S. Zendehboudi, and A. A. Mohsenipour, “Decision tree-based diagnosis of coronary artery disease: CART model,” Comput. Methods Programs Biomed., vol. 192, p. 105400, Aug. 2020. [26] T. K. Sajja and H. K. Kalluri, “A Deep Learning Method for Prediction of Cardiovascular Disease Using Convolutional Neural Network,” Rev. d’Intelligence Artif., vol. 34, no. 5, pp. 601–606, Nov. 2020. [27] S. Nusinovici et al., “Logistic regression was as good as machine learning for predicting major chronic diseases,” J. Clin. Epidemiol., vol. 122, pp. 56–69, Jun. 2020. [28] D. Maulud and A. M. Abdulazeez, “A Review on Linear Regression Comprehensive in Machine Learning,” J. Appl. Sci. Technol. Trends, vol. 1, no. 4, pp. 140–147, 2020. [29] Z. Huang and D. Chen, “A Breast Cancer Diagnosis Method Based on VIM Feature Selection and Hierarchical Clustering Random Forest Algorithm,” IEEE Access, vol. 10, pp. 3284–3293, 2022. [30] A. Swift, R. Heale, and A. Twycross, “What are sensitivity and specificity?,” Evid. Based Nurs., vol. 23, no. 1, pp. 2–4, Jan. 2020. [31] E. Frank, M. A. Hall, and I. H. Witten, The WEKA workbench. Morgan Kaufmann, 2016. [32] K. Kirasich, T. Smith, and B. Sadler, “Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets,” SMU Data Sci. Rev., vol. 1, no. 3, p. 9, 2018. [33] L. de S. Rodrigues, E. T. Matsubara, and B. M. Nogueira, “Learning a Fast Bipartite Ranker for Text Documents Using Lexicographical Rankers and ROC Curves,” in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov. 2017, pp. 1307–1312. https://doi.org/10.32010/26166127.2021.4.2.267.279 https://doi.org/10.32010/26166127.2021.4.2.267.279 https://doi.org/10.32010/26166127.2021.4.2.267.279 https://doi.org/10.1007/978-981-13-0680-8_13 https://doi.org/10.1007/978-981-13-0680-8_13 https://doi.org/10.1007/978-981-13-0680-8_13 https://doi.org/10.1016/j.compbiomed.2021.104431 https://doi.org/10.1016/j.compbiomed.2021.104431 https://doi.org/10.1016/j.compbiomed.2021.104431 https://doi.org/10.1109/ACCESS.2020.3001149 https://doi.org/10.1109/ACCESS.2020.3001149 https://doi.org/10.1134/S0361768818060129 https://doi.org/10.1134/S0361768818060129 https://doi.org/10.1134/S0361768818060129 https://doi.org/10.3390/a16020088 https://doi.org/10.3390/a16020088 https://doi.org/10.1155/2019/6314328 https://doi.org/10.1155/2019/6314328 https://doi.org/10.1155/2019/6314328 https://www.researchgate.net/profile/Alaa-Elsayad/publication/283028563_Diagnosis_of_Cardiovascular_Diseases_with_Bayesian_Classifiers/links/57b2f1c308aeac3177847e6e/Diagnosis-of-Cardiovascular-Diseases-with-Bayesian-Classifiers.pdf https://www.researchgate.net/profile/Alaa-Elsayad/publication/283028563_Diagnosis_of_Cardiovascular_Diseases_with_Bayesian_Classifiers/links/57b2f1c308aeac3177847e6e/Diagnosis-of-Cardiovascular-Diseases-with-Bayesian-Classifiers.pdf https://doi.org/10.1016/j.jbi.2021.103690 https://doi.org/10.1016/j.jbi.2021.103690 https://www.researchgate.net/profile/Vikas-Boddu/publication/332035370_Neural_network_based_intelligent_system_for_predicting_heart_disease/links/601f7b36299bf1cc26ac05de/Neural-network-based-intelligent-system-for-predicting-heart-disease.pdf https://www.researchgate.net/profile/Vikas-Boddu/publication/332035370_Neural_network_based_intelligent_system_for_predicting_heart_disease/links/601f7b36299bf1cc26ac05de/Neural-network-based-intelligent-system-for-predicting-heart-disease.pdf https://doi.org/10.1109/ACCESS.2019.2909969 https://doi.org/10.1109/ACCESS.2019.2909969 https://doi.org/10.22452/mjcs.sp2022no1.10 https://doi.org/10.22452/mjcs.sp2022no1.10 https://doi.org/10.1080/03772063.2020.1713916 https://doi.org/10.1080/03772063.2020.1713916 https://doi.org/10.1016/j.cmpb.2020.105400 https://doi.org/10.1016/j.cmpb.2020.105400 https://doi.org/10.18280/ria.340510 https://doi.org/10.18280/ria.340510 https://doi.org/10.1016/j.jclinepi.2020.03.002 https://doi.org/10.1016/j.jclinepi.2020.03.002 http://dx.doi.org/10.38094/jastt1457 http://dx.doi.org/10.38094/jastt1457 https://doi.org/10.1109/ACCESS.2021.3139595 https://doi.org/10.1109/ACCESS.2021.3139595 https://doi.org/10.1136/ebnurs-2019-103225 https://doi.org/10.1136/ebnurs-2019-103225 https://scholar.google.com/scholar?hl=id&as_sdt=0%2C5&q=E.+Frank%2C+M.+A.+Hall%2C+and+I.+H.+Witten%2C+%E2%80%9CThe+WEKA+workbench%2C%E2%80%9D+Data+Min.%2C+pp.+553%E2%80%93571%2C+2017%2C+doi%3A+10.1016%2Fb978-0-12-804291-5.00024-6.&btnG= https://core.ac.uk/download/pdf/216913541.pdf https://core.ac.uk/download/pdf/216913541.pdf https://doi.org/10.1109/ICDAR.2017.215 https://doi.org/10.1109/ICDAR.2017.215 https://doi.org/10.1109/ICDAR.2017.215