LONTAR KOMPUTER VOL. 13, NO. 1 APRIL 2022 p-ISSN 2088-1541 
DOI : 10.24843/LKJITI.2022.v13.i01.p06 e-ISSN 2541-5832 
Accredited Sinta 2 by RISTEKDIKTI Decree No. 158/E/KPT/2021 
 

60 
 

Optimizing Random Forest using Genetic Algorithm for 
Heart Disease Classification  

 
Parmonangan R. Togatoropᵃ¹, Megawati Sianturiᵃ², David Simamoraᵃ³, Desriyani Silaenᵃ⁴

 
ᵃFaculty of Informatics and Electrical Engineering, Institute of Technology Del
Laguboti, Indonesia
¹mona.togatorop@del.ac.id
²megawatiisianturii@gmail.com
³davidsimamora007@gmail.com
⁴desriyanisilaen17@gmail.com

 
Abstract 

 
Heart disease is a leading cause of death worldwide, and effective predictive systems are needed to
support the treatment of affected patients. This study aimed to improve the accuracy of Random
Forest (RF) in predicting and classifying heart disease. The experiments were designed to select
optimal RF parameters using a Genetic Algorithm (GA). In this optimization, the RF parameters
form the initial population of the GA and undergo the GA operators of selection, crossover, and
mutation. The chromosome that survives the evolution is the best individual, that is, the best RF
parameter set. The best parameters are stored in the HallOfFame module of the DEAP library and
used for classification with RF. The optimized RF parameters are max_depth, max_features,
n_estimators, min_samples_split, and min_samples_leaf. For comparison, RF was also run with
default parameters, random search, and grid search, which achieved accuracies of 82.5%, 81.67%,
and 83.33%, respectively. RF+GA achieved 85.83%; this result depends on the GA parameters:
number of generations, population size, crossover rate, and mutation rate. These results show
that the Genetic Algorithm can be used to optimize the parameters of Random Forest.
 
Keywords: Machine Learning, Random Forest (RF), Genetic Algorithm (GA), Default parameter, 
Random search, Grid search 
  
 
1. Introduction 

Heart disease, or coronary heart disease, is one of the leading causes of death globally.
According to the World Health Organization (WHO), an estimated 8.8 million people died from
heart disease in 2015; in the United Kingdom (UK), at least 2.3 million people suffered from heart
disease, and in 2014 this condition contributed to at least 69,000 deaths [1]. The key risk factors
for heart disease are high blood pressure, high cholesterol, and smoking. Other medical conditions
and lifestyle factors, including diabetes, obesity, poor nutrition, physical inactivity, and excessive
alcohol consumption, may also put people at a higher risk of heart disease [2].

Computer-aided detection (CAD) is designed to provide automated predictions of heart disease
[2]. As one of the modern methods of computer-aided detection, machine learning is an emerging
technology for analyzing medical data and supporting early detection. Several studies have used
machine learning to diagnose heart disease, comparing data mining tools and machine learning
techniques for classifying heart disease on the Cleveland dataset from the UCI Machine Learning
Repository [2] [3] [4].

Several studies report that Random Forest (RF) predicts heart disease more accurately than other
classifiers. Research [5] compared RF with KNN for predicting heart disease and obtained 95%
accuracy for RF versus 73% for KNN, so RF made better predictions than KNN [5]. Sravanthi [6]
compared RF, decision tree, artificial neural network, SVM, Naive Bayes, and KNN classifiers on
coronary datasets and obtained accuracies of 88%, 87%, 86%, 83%, 81%, and 77%, respectively;
RF achieved the highest accuracy. Beyond heart disease, RF has also been applied in other
domains, such as forecasting new student candidates [7]. Because previous research shows that
RF achieves better accuracy and performance than the other algorithms compared, this study
uses the Random Forest algorithm for classification.

Optimizing RF parameters can improve the accuracy of the prediction model [8] [9]. RF involves
several hyperparameters that control the structure of each tree and the size and randomness of
the forest [10]. Grid Search and Random Search can automatically search for good RF
hyperparameters. Grid Search is an optimization algorithm that exhaustively evaluates every
combination in a predefined search space [9]. Random Search [11] instead evaluates a fixed
number of parameter combinations sampled at random from the defined search space.
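To make the difference concrete, the plain-Python sketch below runs both strategies over the RF search space listed in Table 2. It uses scikit-learn's parameter names and assumes that the integer ranges include both ends. `toy_score` is a hypothetical stand-in for the cross-validated score of a trained RF, and sampling is done with replacement for simplicity. Grid Search must evaluate all 2 × 6 × 10 × 4 × 4 = 1920 combinations, whereas Random Search evaluates only `n_iter` of them.

```python
import itertools
import random

# RF search space of Table 2 (assumption: integer ranges include both ends).
SPACE = {
    "max_features": ["sqrt", "log2"],
    "max_depth": [2, 5, 10, 20, 50, None],
    "n_estimators": list(range(100, 1001, 100)),
    "min_samples_split": [2, 3, 4, 5],
    "min_samples_leaf": [1, 2, 3, 4],
}


def toy_score(params):
    """Hypothetical objective standing in for the cross-validated score of an RF."""
    depth = params["max_depth"] or 100  # treat None (unlimited depth) as very deep
    return -abs(depth - 5) - params["n_estimators"] / 1000 + params["min_samples_leaf"] / 10


def grid_search(space, score):
    """Evaluate every combination; return (best_params, best_score, n_evaluations)."""
    names = list(space)
    best, n_evals = None, 0
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        n_evals += 1
        if best is None or s > best[1]:
            best = (params, s)
    return best[0], best[1], n_evals


def random_search(space, score, n_iter, seed=0):
    """Evaluate n_iter combinations drawn at random (with replacement, for simplicity)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {n: rng.choice(v) for n, v in space.items()}
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best[0], best[1], n_iter


print(grid_search(SPACE, toy_score)[2])        # 1920 evaluations
print(random_search(SPACE, toy_score, 50)[2])  # 50 evaluations
```

Random Search can miss the best combination, but its cost is fixed by `n_iter` rather than by the size of the grid.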

Meanwhile, the Genetic Algorithm (GA) is one of the best-known algorithms for solving optimization
problems, including those in machine learning [12], and searches for the optimal value of a
function. GA is an optimization strategy inspired by evolution: it applies an evolutionary process to
a population of candidate solutions [13]. GA has been used to solve various optimization problems,
and many researchers currently use it to optimize RF hyperparameters [9] [12] [14] [15]. A
literature study on the use of GA for heart disease [16] concluded that GA-based approaches
achieved accuracies of up to 97.7%. These results show that GA can be used to optimize RF
parameters.

The novelty of this research is optimizing RF hyperparameters for heart disease classification.
Because of its optimization capability, GA is used to optimize the RF parameters, and the results
are compared with Grid Search and Random Search. The purpose of using GA is to obtain optimized
hyperparameters that yield higher accuracy for heart disease classification. The results show that
Random Forest combined with the Genetic Algorithm achieves higher accuracy than Random Forest
alone.

 
2. Method 

The proposed method for optimizing Random Forest with a Genetic Algorithm consists of several
stages. It begins with data preprocessing: data merging, data cleaning, and data reduction to
remove features with a high proportion of missing values. After preprocessing, the data are passed
to two processes. The first process performs Random Forest classification without GA
optimization. The second process optimizes Random Forest with the Genetic Algorithm; it produces
the best parameters, which are then used to classify the data with Random Forest. The
classification results from both approaches are evaluated. The research method for optimizing
Random Forest parameters with a Genetic Algorithm is shown in Figure 1.
    


Figure 1. System design for optimizing Random Forest using Genetic Algorithm

2.1. Data Preprocessing 

The dataset used in this research was taken from the UCI Machine Learning Repository [17].
It has 13 attributes plus the target: sex, fbs, exang, and target are binary; cp, restecg, slope, ca,
and thal are categorical; and age, trestbps, chol, thalach, and oldpeak are continuous.
The data preprocessing steps used in this research are data cleaning, data integration, and data
reduction. Preprocessing is needed to ensure data quality, which decreases when the data are
incomplete or inconsistent or contain unneeded special characters. In data integration, the
Cleveland and Hungarian datasets were merged because they share the same features. The Heart
Disease dataset is relatively small and simple; the merged dataset has 596 rows and 14 attributes.
The percentage of missing values per attribute is shown in Table 1.
 

Table 1. Missing values in the dataset

No  Attribute  Missing values (%)
1   ca         49.41
2   thal       44.39
3   slope      31.83
4   chol        3.85
5   fbs         1.34
6   exang       0.17
7   thalach     0.17
8   restecg     0.17
9   trestbps    0.17

 
Data cleaning handles missing values and data inconsistencies and detects outliers. We first
counted the missing values in each attribute. Missing values in continuous attributes were
replaced with the attribute mean, and missing values in categorical attributes were filled with '0'.
After the three attributes with the highest proportion of missing values (ca, thal, and slope) were
dropped, the final dataset has 596 rows and 11 attributes, which are used to build the machine
learning models. The data were split into training and testing sets in an 80:20 ratio using the
train_test_split function, with a fixed random_state so that every run produces the same split.
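The preprocessing described above can be sketched in plain Python as follows. The sketch assumes that rows are stored as dictionaries with `None` for missing values. It also assumes that the three dropped attributes are `ca`, `thal`, and `slope`, the three with by far the highest missing rates in Table 1. `split_80_20` is a simplified, hypothetical stand-in for scikit-learn's `train_test_split(..., test_size=0.2, random_state=...)`. Like that function, it rounds the test size up, so 596 rows give 120 test rows.

```python
import math
import random

# Column groups from Section 2.1; DROPPED is an assumption based on Table 1.
CONTINUOUS = ["age", "trestbps", "chol", "thalach", "oldpeak"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang"]
DROPPED = ["ca", "thal", "slope"]


def preprocess(rows):
    """Drop high-missing columns, mean-impute continuous ones, fill categorical ones with 0.

    `rows` is a list of dicts; a missing value is represented as None.
    """
    means = {}
    for col in CONTINUOUS:
        present = [r[col] for r in rows if r.get(col) is not None]
        means[col] = sum(present) / len(present)

    cleaned = []
    for r in rows:
        new = {k: v for k, v in r.items() if k not in DROPPED}
        for col in CONTINUOUS:
            if new.get(col) is None:
                new[col] = means[col]
        for col in CATEGORICAL:
            if new.get(col) is None:
                new[col] = 0
        cleaned.append(new)
    return cleaned


def split_80_20(rows, seed=42):
    """Shuffle with a fixed seed and hold out ceil(20%) of the rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(0.2 * len(rows))
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test
```

Because the seed is fixed, calling `split_80_20` twice on the same rows returns the same partition, which is the role `random_state` plays in the study.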



2.2. Random Forest 

Random Forest is one of the most popular ensemble techniques for prediction and probability
estimation. An ensemble method improves classification accuracy by combining several classifiers
[18]. Random Forest uses decision trees as its base classifiers, typically grown with the CART
(Classification and Regression Trees) algorithm, so it can be used for both classification and
regression. It consists of several trained trees whose predictions are combined to classify a
selected sample [19]. Random Forest is a general term for an aggregation scheme over decision
trees. The algorithm was proposed by Breiman and is therefore also known as Breiman's forest.
Mathematically, Breiman's forest can be expressed as:

$$m_{M,n}(x;\theta_1,\ldots,\theta_M,\mathcal{D}_n) = \frac{1}{M}\sum_{m=1}^{M} m_n(x;\theta_m,\mathcal{D}_n) \qquad (1)$$

 
Random Forest is a collection of randomized trees whose predictions are averaged. In Equation (1),
m_n(x; θ_m, 𝒟_n) is the prediction of the m-th randomized tree at the query point x, and m_{M,n}
is the random forest built from M such trees. The random variables θ_1,…,θ_M, which are
independent of the training sample 𝒟_n of size n, determine the randomness of each tree. The
output of Breiman's forest is therefore the average of the predictions given by the M trees.
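The averaging in Equation (1) can be illustrated with a small sketch. For brevity, each "tree" is a hypothetical one-split regression stump fitted on a bootstrap sample rather than a full CART tree. The bootstrap draw and the random split point play the role of θ_m. Only `forest_predict` corresponds directly to Equation (1).

```python
import random
from statistics import mean


def fit_stump(data, rng):
    """Fit a one-split regression stump on a bootstrap sample of (x, y) pairs.

    The random bootstrap draw and split choice play the role of theta_m.
    """
    sample = [rng.choice(data) for _ in data]          # bootstrap sample
    threshold = rng.choice(sample)[0]                  # random split point
    left = [y for x, y in sample if x <= threshold]    # never empty: contains the threshold point
    right = [y for x, y in sample if x > threshold]
    left_val = mean(left)
    right_val = mean(right) if right else left_val
    return lambda x: left_val if x <= threshold else right_val


def forest_predict(trees, x):
    """Equation (1): the forest output is the mean of the M tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)


rng = random.Random(0)
data = [(x, 1.0 if x > 5 else 0.0) for x in range(11)]  # toy data: label 1 when x > 5
forest = [fit_stump(data, rng) for _ in range(100)]     # M = 100 randomized trees
print(forest_predict(forest, 1), forest_predict(forest, 9))
```

For classification, the averaged value can be read as an estimated probability of the positive class.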

2.3. Genetic Algorithm  

The Genetic Algorithm (GA) is based on the principle of natural selection. Holland developed the
Genetic Algorithm as a tool for search and optimization problems. The Genetic Algorithm operates
on a population of P individuals, each represented by a chromosome C_k, k = 1, …, P. Each
chromosome is a string of symbols called genes, C_k = (C_k1, …, C_kN), where N is the length of
the string. Individuals are evaluated by a fitness function. Genetic Algorithms use three basic
operators: selection, crossover, and mutation. Selection chooses individuals with the best fitness
values from the current generation to survive into the next generation. Crossover combines two
parents to produce children. Mutation makes small changes to selected genes in the population,
which maintains diversity and improves the ability of the algorithm to find optimal solutions [20].
The Genetic Algorithm searches for the optimal solution as the chromosomes evolve, as measured
by a defined fitness function [21].

The parameters of the Genetic Algorithm are the fitness function, the population size in each
generation, the crossover probability, the mutation probability, and the number of generations.
The basic steps of the Genetic Algorithm are shown in Figure 2.
 


Figure 2. Basic steps of the Genetic Algorithm
 

Figure 2 shows the steps of the genetic algorithm. First, the population is initialized; each
chromosome is designed to represent a candidate solution, usually as a binary string. The genetic
operators (selection, crossover, and mutation) are then applied to the population. The selection
operator chooses the most suitable chromosomes by evaluating the fitness value of each
chromosome; for classification problems, accuracy is generally used as the fitness function. The
crossover operator then swaps genes between two parent chromosomes to produce a new child that
may be a better solution. The mutation operator flips randomly selected bits with a very low
probability. Applying these operators forms a new population, and the steps are repeated until
the stopping condition is met [22].
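The basic steps in Figure 2 can be sketched in plain Python on the classic OneMax toy problem. Each chromosome is a binary string, and its fitness is the number of 1-bits, a hypothetical stand-in for classification accuracy. The operators shown are common choices and not necessarily the exact ones used in this study:

- tournament selection of size two;
- one-point crossover;
- bit-flip mutation.

The stopping condition is a fixed number of generations.

```python
import random


def onemax(bits):
    """Toy fitness: number of 1-bits (stand-in for accuracy)."""
    return sum(bits)


def tournament(pop, fitness, rng, k=2):
    """Selection: return the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: swap the tails of two parents after a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(bits, rate, rng):
    """Mutation: flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=20, generations=40, cx_rate=0.9, mut_rate=0.05, seed=0):
    rng = random.Random(seed)
    # 1. Initialize a population of random binary strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):  # stopping condition: fixed number of generations
        children = []
        while len(children) < pop_size:
            # 2. Selection of two parents.
            p1 = tournament(pop, onemax, rng)
            p2 = tournament(pop, onemax, rng)
            # 3. Crossover with probability cx_rate, otherwise copy the parents.
            if rng.random() < cx_rate:
                c1, c2 = crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            # 4. Mutation of both children.
            children += [mutate(c1, mut_rate, rng), mutate(c2, mut_rate, rng)]
        pop = children[:pop_size]  # the new population replaces the old one
    return max(pop, key=onemax)


best = run_ga()
print(best, onemax(best))
```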

2.4. Classification Task: Random Forest Algorithm 

The Random Forest algorithm has several hyperparameters that can affect its performance.
Hyperparameters are settings that must be fixed before a machine learning method is trained.
Choosing the correct values can make a significant difference in the prediction results.
Hyperparameters can be tuned manually by trying all possible values, but this is time-consuming
because the number of possible combinations is very large.
This study compares Random Forest classification with default parameters, random search, grid
search, and the Genetic Algorithm. Random Search and Grid Search show the performance of other
optimization methods that do not use a Genetic Algorithm.

a. Default Parameter: The Random Forest algorithm has parameters that are used to build the
model, each with its own default value. Random Forest with default parameters already achieves
good accuracy compared with other classification algorithms such as decision trees and Naïve
Bayes. This study examines the effect of the RF parameters max_features, max_depth,
n_estimators, min_samples_split, and min_samples_leaf. The possible values considered for each
parameter are listed in Table 2.

 
Table 2. Parameters and possible values of Random Forest

Parameter          Possible values
max_features       'sqrt', 'log2'
max_depth          2, 5, 10, 20, 50, None
n_estimators       100 – 1000 (interval 100)
min_samples_split  2 – 5
min_samples_leaf   1 – 4

 

b. Random search: Random Search is a widely used strategy for hyperparameter tuning. Instead of
trying every possibility, it evaluates a fixed number of parameter combinations sampled at random
from the search space. Research [23] states that Grid Search has advantages when browsing a
large search space, while Random Search does not always produce good results. Research [5]
performed hyperparameter optimization using Random Search and obtained better results than with
the default parameters. Study [24] states that Random Search (RS) has advantages over grid
search in high-dimensional hyperparameter spaces. Random Search returns the best parameter
combination it finds, which is then used for classification.

c. Grid search: Grid Search (GS) is one of the most commonly used methods for exploring the
hyperparameter configuration space. Its main disadvantage is inefficiency in high-dimensional
configuration spaces: the number of evaluations grows exponentially with the number of
hyperparameters, so it requires a long computation time. Research [23] uses grid search for
hyperparameter optimization because it is simple to implement and parallelize and reliable in
low-dimensional spaces. The system proposed in study [25] sets hyperparameters using grid
search; in its experiments, the algorithm tuned with grid search gives more accurate results than
the traditional approach without hyperparameter tuning. Grid Search returns the best parameter
combination, which is then used for classification. The three methods (Default Parameter,
Random Search, and Grid Search) are then evaluated by the accuracy of the resulting models.
Grid Search and Random Search are implemented with Python's scikit-learn library.
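The two baselines can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. This is a minimal sketch, not the study's code, and it rests on several assumptions:

- synthetic data from `make_classification` stands in for the preprocessed heart disease data;
- the grid is a reduced subset of Table 2, to keep the run short;
- scoring uses `roc_auc`, with 3-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Synthetic stand-in for the preprocessed heart disease data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reduced subset of the Table 2 search space so the example runs quickly.
param_grid = {
    "max_features": ["sqrt", "log2"],
    "max_depth": [2, None],
    "n_estimators": [100],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 4],
}

# Grid Search: fits every one of the 2 * 2 * 1 * 2 * 2 = 16 combinations.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                    scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)

# Random Search: fits only n_iter combinations sampled from the same space.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=5, scoring="roc_auc", cv=3, random_state=0)
rand.fit(X_train, y_train)

for name, search in [("Grid search", grid), ("Random search", rand)]:
    # With refit=True (the default), best_estimator_ is retrained on all training data.
    test_accuracy = search.best_estimator_.score(X_test, y_test)
    print(name, search.best_params_, f"test accuracy = {test_accuracy:.4f}")
```

With the full Table 2 space as `param_grid`, `GridSearchCV` would fit all 1920 combinations in each fold. `RandomizedSearchCV` would still fit only `n_iter` of them.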

2.5. Proposed Method: RF-GA Optimization 

  
Figure 3. RF-GA Optimization 
 

The proposed method, Random Forest-Genetic Algorithm (RF-GA) optimization, applies the Genetic
Algorithm before the classification step to improve the results of Random Forest classification.
The RF-GA optimization process is shown in Figure 3.



Optimizing RF with GA begins by loading the heart disease data and preprocessing the dataset.
The Genetic Algorithm then initializes the population by defining the chromosomes. Each
chromosome encodes the parameters of the machine learning algorithm used in this study, Random
Forest. The RF parameters that are optimized and used to initialize the GA population are
max_depth, max_features, min_samples_leaf, min_samples_split, and n_estimators. The fitness
value of each chromosome is evaluated to determine whether it is suitable for selection. This
study classifies heart disease; the purpose of the classification is to predict whether or not a
person has heart disease.

For this reason, the fitness score used in this study is the AUC score, one of the standard
metrics for evaluating binary classifiers. If a chromosome's fitness value meets the criteria, it is
selected as the best chromosome of the genetic algorithm optimization process; otherwise,
selection is carried out by tournament selection with a tournament size of two. Crossover swaps
genes between two parent chromosomes to produce a new child that may be a better solution.
Mutation makes small changes to specific genes in the population by randomly selecting genes
with a very low probability and replacing them. This process produces the best chromosome,
which is taken as the best parameter set. The genetic algorithm is run with the parameters
crossover_probability, mutation_probability, population_size, and number_of_generations, and
the algorithms module of DEAP is used to execute the evolutionary algorithm. One of the
arguments passed to this module is a HallOfFame() object, which stores in a list the best
individuals that survive the evolution process. These are the best_parameters, that is, the
Random Forest parameters optimized by the genetic algorithm. Classification is then carried out
using these optimized parameters (best_parameters).

In the Random Forest process, the best parameters obtained from the Genetic Algorithm are used
to classify the data with Random Forest. The classification results are evaluated with a
confusion matrix to measure performance, from which accuracy, precision, and error values are
obtained. The ROC curve and its AUC score are also used to evaluate how well the model separates
the two classes.

DEAP (Distributed Evolutionary Algorithms in Python) is a Python framework for evolutionary
computation, including genetic algorithms and genetic programming. DEAP provides the essentials
for assembling advanced Evolutionary Computation (EC) systems. Its aim is to provide a practical
tool for rapid prototyping of custom evolutionary algorithms, where every step of the process is
as explicit as possible and easy to read and understand.

DEAP provides basic data structures, genetic operators, and basic examples for users to
implement evolutionary loops [26]. DEAP consists of two basic structures: the creator and
toolbox modules. The creator module creates new types, such as individuals and their fitness,
from any data structure, which makes it possible to represent genotypes and populations. The
toolbox is a container for the operators (initialization, evaluation, selection, crossover, and
mutation) used by an algorithm. The creator module is the key to implementing all kinds of
evolutionary algorithms, including Genetic Algorithms, genetic programming, evolution strategies,
and others.

2.6. Evaluation Method  

The classification model is evaluated with a confusion matrix and the ROC AUC. The confusion
matrix gives the accuracy of the classification performed by the algorithm. Performance
measurement with the confusion matrix uses four terms: True Positive (TP), True Negative (TN),
False Negative (FN), and False Positive (FP). From these counts, the evaluation metrics are
computed: classification accuracy, classification error, precision, and recall. Classification
accuracy is the proportion of correctly classified samples. Classification error is the proportion
of misclassified samples. Precision is the proportion of samples predicted as positive that are
truly positive. Recall is the proportion of actual positive samples that are correctly identified.

 
$$\text{Accuracy} = \frac{TN+TP}{TN+FP+FN+TP} \qquad (2)$$



$$\text{Error rate} = \frac{FP+FN}{TP+TN+FP+FN} \qquad (3)$$

 
$$\text{Precision} = \frac{TP}{TP+FP} \qquad (4)$$

$$\text{Recall} = \frac{TP}{TP+FN} \qquad (5)$$
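These metrics can be computed directly from the four confusion-matrix counts. The sketch below takes label 1 to mean heart disease and uses the standard definition of precision, TP / (TP + FP). The counts in the usage line are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 means heart disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, error rate, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,  # always equals 1 - accuracy
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
print(evaluation_metrics(tp=40, tn=45, fp=5, fn=10))
```

Because the error rate is the complement of accuracy, the Accuracy and Error columns of a results table always sum to 1.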

A better classification model has a larger area under its ROC curve. The ROC curve visualizes
model performance through the True Positive Rate (TPR) and False Positive Rate (FPR) and allows
classification models to be compared [27]. The AUC (Area Under the Curve) score is also used to
test model performance; an AUC closer to 1 means that the model better separates the two
classes in binary classification [28].
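The AUC can be computed without plotting the curve. It equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below illustrates this definition. In practice, scikit-learn's `roc_auc_score` and `roc_curve` compute the AUC and the (FPR, TPR) points directly.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to perfectly separated classes.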
 

3. Result and Discussion 

3.1. Result 

This study evaluates four experimental models: Random Forest with default parameters, Grid
Search, Random Search, and RF + GA. The performance measures described in the evaluation
section are calculated and compared.

For RF+GA, we also ran experiments to find the best GA parameters: number of generations,
population size, crossover rate, and mutation rate. The classification results of RF with default
parameters, Random Search, Grid Search, and RF + GA are shown in the following figures and
tables.
 

Figure 4. GA parameter experiments
 

Figure 4 shows that the GA parameters that produce the best result are 50 generations, a
population size of 25, a crossover rate of 0.95, and a mutation rate of 0.09. In Figure 4, the blue
line shows the accuracy of each experiment.
 


Table 3. Parameters used in the experiment

Experiment         max_depth  max_features  min_samples_leaf  min_samples_split  n_estimators
Default parameter  None       auto          1                 2                  100
Grid search        5          log2          1                 2                  300
Random search      2          sqrt          2                 5                  100
RF+GA              2          sqrt          4                 5                  100

 
The four classification methods use the same training and testing samples so that the results are
comparable. Table 3 shows the parameters used for each classification. The Random Forest
parameters max_depth, max_features, min_samples_leaf, min_samples_split, and n_estimators are
optimized with the Genetic Algorithm to achieve optimal results. The search ranges for these
parameters were obtained from a literature review of similar research.
 

Table 4. Experiment results

Experiment         Accuracy  Error   Precision  Recall  AUC
Default parameter  0.825     0.175   0.8534     0.8919  0.79
Grid search        0.8333    0.1667  0.8661     0.8642  0.82
Random search      0.8167    0.1833  0.8734     0.8519  0.81
RF + GA            0.8583    0.1417  0.8861     0.8974  0.84

 
Table 4 reports the accuracy and AUC score of RF with default parameters, Grid Search, Random
Search, and RF + GA. The accuracy and AUC of RF + GA are higher than those of the default
parameters, Random Search, and Grid Search. Among the three methods without GA, Grid Search
achieves the best accuracy and AUC.
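As a reading aid, the accuracies in Table 4 can be converted into numbers of correctly classified test samples. This assumes the 80:20 split was made with `test_size=0.2`, which for 596 rows gives ceil(0.2 × 596) = 120 test samples. Under that assumption, the four-decimal accuracies match whole numbers of correct predictions. RF + GA then classifies three more test samples correctly than Grid Search.

```python
import math

n_test = math.ceil(0.2 * 596)  # assumed test_size=0.2 on the 596-row dataset

table4_accuracy = {
    "Default parameter": 0.825,
    "Grid search": 0.8333,
    "Random search": 0.8167,
    "RF + GA": 0.8583,
}
correct = {name: round(acc * n_test) for name, acc in table4_accuracy.items()}
print(n_test, correct)
# 120 {'Default parameter': 99, 'Grid search': 100, 'Random search': 98, 'RF + GA': 103}
```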
 

Figure 5. ROC Curve of (a) Default Parameter (b) Random Search (c) Grid Search (d) RF+GA 
 


Figure 5 compares the ROC curves of RF with default parameters, Grid Search, Random Search,
and RF + GA. The curves show that RF + GA is the most suitable prediction model, since its AUC
is the closest to 1.

Table 4 also reports Error, Precision, and Recall. RF + GA achieves the best value on every
measure: the highest accuracy, precision, recall, and AUC, and the lowest error. It therefore
outperforms the other methods in predicting heart disease.

3.2. Discussion 

From the Random Forest optimization experiment using the Genetic Algorithm (RF + GA), the
authors conclude that GA can be used to optimize the parameters of Random Forest and produces
better accuracy than Grid Search. GA and Grid Search use the same search space, which provides
the initial population of the GA. The performance of the GA is also influenced by its own
parameters: number of generations, population size, crossover rate, and mutation rate. The
experimental results can be analyzed as follows:

a. Accuracy does not increase in proportion to the number of generations. We conclude that the
optimum solution is reached at a particular generation, so the GA can stop searching once it has
obtained that solution; this is referred to as a termination criterion.

b. A small population produces better accuracy than a large one. We conclude that this is
influenced by the dataset and the search space explored by GA: because the search space is not
very large, GA does not need a large population. For a large search space, however, we expect
GA to require a larger population to find a better solution.

c. The experimental results show that the highest crossover rate combined with the lowest
mutation rate gives better accuracy and reaches the optimum solution.

In the Random Forest experiments, we used default parameters, Random Search, and Grid Search.
The results show that parameter optimization with Grid Search increases accuracy, whereas
Random Search gives lower accuracy than the default parameters.
The analysis of the relationship between the input parameters and RF classification accuracy is
as follows:

a. In some cases, a high n_estimators value can produce good accuracy, but the default value of
100 can also produce optimal accuracy.

b. A higher max_depth value lets each tree capture more detail from the observations, which can
improve the capability of the model.

c. Using max_features = 'sqrt' (√n features considered at each split) tends to produce a better
model; for scikit-learn's RandomForestClassifier, 'auto' is equivalent to 'sqrt'. However,
max_features = 'log2' can also produce a good solution, as in Grid Search.

d. Using higher values of min_samples_split and min_samples_leaf tends to produce better
results.

 
4. Conclusion 

Random Forest is a machine learning classification algorithm that produces good results, and one
of its applications is heart disease classification. Random Forest has parameters that are used to
build a classification model. This research uses GA to optimize five RF parameters, namely
n_estimators, max_depth, max_features, min_samples_split, and min_samples_leaf, to produce
optimal heart disease classification accuracy. The Random Forest parameters are used as the
initial population of the Genetic Algorithm and undergo the GA operators of selection, crossover,
and mutation.

Based on the experiments, Random Forest classification achieved 82.5% accuracy with default
parameters, 81.67% with Random Search, and 83.33% with Grid Search. These results show that
parameter optimization with Grid Search improves accuracy, whereas Random Search reduced it.
RF + GA classification reaches 85.83%; this result is influenced by the parameters of the Genetic
Algorithm, including the number of generations, population size, crossover rate, and mutation
rate. Therefore, Genetic Algorithms can be used to optimize the parameters of Random Forest and
increase the accuracy of Random Forest results.

As an extension of this work, a larger dataset could be used to obtain a better training model,
and other optimization algorithms could be compared with the Genetic Algorithm for heart disease
classification.

 
References 
 
[1] L. Anderson et al., "Patient education in the management of coronary heart disease,"
Cochrane Database Syst. Rev., vol. 2017, no. 6, 2017, doi: 10.1002/14651858.CD008895.pub3.

[2] K. H. Miao, J. H. Miao, and G. J. Miao, "Diagnosing Coronary Heart Disease using Ensemble 
Machine Learning," International Journal of Advanced Computer Science and 
Applications (IJACSA), vol. 7, no. 10, pp. 30–39, 2016, doi: 10.14569/ijacsa.2016.071004.

[3] I. Tougui, A. Jilbab, and J. El Mhamdi, "Heart disease classification using data mining tools 
and machine learning techniques," Health and Technology, vol. 10, no. 5, pp. 1137–1144, 
2020, doi: 10.1007/s12553-020-00438-1. 

[4] N. B. Muppalaneni, M. Ma, and S. Gurumoorthy, Soft Computing and Medical Bioinformatics. 
Springer Singapore, 2019. doi: 10.1007/978-981-13-0059-2. 

[5] H. Kaur and D. Gupta, "Human Heart Disease Prediction System Using Random Forest 
Technique," International Journal of Computer Science and Engineering, vol. 6, no. 7, pp. 
634–640, 2018. 

[6] P. V. S. N. Sravanthi and P. Rajesh, "An exploration of prediction of heart disease using 
machine learning classification," International Journal of Scientific & Technology Research, vol.
9, no. 3, pp. 6817–6824, 2020. 

[7] R. R. Waliyansyah and N. D. Saputro, "Forecasting New Student Candidates Using the
Random Forest Method," Lontar Komputer: Jurnal Ilmiah Teknologi Informasi, vol. 11, no. 1,
p. 44, 2020, doi: 10.24843/lkjiti.2020.v11.i01.p05.

[8] I. Syarif, A. Prugel-Bennett, and G. Wills, "SVM Parameter Optimization using Grid Search 
and Genetic Algorithm to Improve Classification Performance," TELKOMNIKA 
(Telecommunication Computing Electronics and Control), vol. 14, no. 4, p. 1502, 2016, doi:
10.12928/telkomnika.v14i4.3956. 

[9] A. S. Wicaksono and A. A. Supianto, "Hyperparameter optimization using genetic algorithm 
on machine learning methods for online news popularity prediction," International Journal of 
Advanced Computer Science and Applications, vol. 9, no. 12, pp. 263–267, 2018, doi:
10.14569/IJACSA.2018.091238. 

[10] P. Probst, M. N. Wright, and A. L. Boulesteix, "Hyperparameters and tuning strategies for 
random forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.
9, no. 3, 2019, doi: 10.1002/widm.1301. 

[11] R. Schaer, H. Müller, and A. Depeursinge, "Optimized distributed hyperparameter search 
and simulation for lung texture classification in CT using Hadoop," Journal of Imaging, vol. 
2, no. 2, 2016, doi: 10.3390/jimaging2020019. 

[12] D. Ming, T. Zhou, M. Wang, and T. Tan, "Land cover classification using random forest with 
genetic algorithm-based parameter optimization," Journal of Applied Remote Sensing, vol. 
10, no. 3, p. 035021, 2016, doi: 10.1117/1.jrs.10.035021. 

[13] G. Rivera, L. Cisneros, P. Sánchez-Solís, N. Rangel-Valdez, and J. Rodas-Osollo, "Genetic 
algorithm for scheduling optimization considering heterogeneous containers: A real-world 
case study," Axioms, vol. 9, no. 1, 2020, doi: 10.3390/axioms9010027. 

[14] N. K. Kumar, D. Vigneswari, M. V. Krishna, and G. V. P. Reddy, "An Optimized Random 
Forest Classifier for Diabetes Mellitus," in Emerging Technologies in Data Mining and
Information Security, doi: 10.1007/978-981-13-1498-8. 

[15] S. S. Shah and M. A. Pradhan, "R-GA: An Efficient Method for Predictive Modeling of Medical
Data Using a Combined Approach of Random Forests and Genetic Algorithm," ICTACT 
Journal on Soft Computing, vol. 06, no. 02, pp. 1153–1156, 2016, doi: 
10.21917/ijsc.2016.0160. 

[16] M. D. Yudianto, T. M. Fahrudin, and A. Nugroho, "A Feature-Driven Decision Support System 
for Heart Disease Prediction Based on Fisher's Discriminant Ratio and Backpropagation 
Algorithm," Lontar Komputer: Jurnal Ilmiah Teknologi Informasi, vol. 11, no. 2, p. 65, 2020,
doi: 10.24843/lkjiti.2020.v11.i02.p01. 

[17] "Heart Disease Data Set," UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/heart+disease (accessed Apr. 01, 2021).

[18] A. Syukron and A. Subekti, "Penerapan Metode Random Over-Under Sampling dan Random
Forest Untuk Klasifikasi Penilaian Kredit [Application of Random Over-Under Sampling and
Random Forest Methods for Credit Scoring Classification]," Jurnal Informatika, vol. 5, no. 2,
pp. 175–185, 2018, doi: 10.31311/ji.v5i2.4158.

[19] E. Goel and E. Abhilasha, "Random Forest: A Review," International Journal of Advanced 
Research in Computer Science and Software Engineering, vol. 7, no. 1, pp. 251–257, 2017, 
doi: 10.23956/ijarcsse/v7i1/01113. 

[20] S. Kumar and G. Sahoo, "A random forest classifier based on genetic algorithm for 
cardiovascular diseases diagnosis," International Journal of Engineering, Transactions B:
Applications, vol. 30, no. 11, pp. 1723–1729, 2017, doi: 10.5829/ije.2017.30.11b.13.

[21] S. M. Elsayed, R. A. Sarker, and D. L. Essam, "A new genetic algorithm for solving 
optimization problems," Engineering Applications of Artificial Intelligence, vol. 27, pp. 57–69,
2014, doi: 10.1016/j.engappai.2013.09.013. 

[22] K. Kim, K. Lee, and H. Ahn, "Predicting corporate financial sustainability using Novel 
Business Analytics," Sustainability, vol. 11, no. 1, pp. 1–17, 2018, doi: 10.3390/su11010064. 

[23] J. Emakhu, S. Shrestha, and S. Arslanturk, "Prediction system for heart disease based on 
ensemble classifiers," Proceedings of the 5th International Conference on Industrial
Engineering and Operations Management, Aug. 2020, pp. 2337–2347.

[24] C. G. Siji George and B. Sumathi, "Grid search tuning of hyperparameters in random forest 
classifier for customer feedback sentiment prediction," International Journal of Advanced 
Computer Science and Applications (IJACSA), vol. 11, no. 9, pp. 173–178, 2020, doi:
10.14569/IJACSA.2020.0110920. 

[25] P. Liashchynskyi and P. Liashchynskyi, "Grid Search, Random Search, Genetic Algorithm: 
A Big Comparison for NAS," pp. 1–11, 2019.

[26] J. Kim and S. Yoo, "Software review: DEAP (Distributed Evolutionary Algorithm in Python) 
library," Genetic Programming and Evolvable Machines, vol. 20, no. 1, pp. 139–142, 2019, 
doi: 10.1007/s10710-018-9341-4. 

[27] D. Krishnani, A. Kumari, A. Dewangan, A. Singh, and N. S. Naik, "Prediction of Coronary 
Heart Disease using Supervised Machine Learning Algorithms," IEEE Region 10 Annual 
International Conference Proceedings/TENCON, vol. 2019-October, pp. 367–372, 2019, doi:
10.1109/TENCON.2019.8929434. 

[28] E. K. Hashi and Md. Shahid Uz Zaman, "Developing a Hyperparameter Tuning Based 
Machine Learning Approach of Heart Disease Prediction," Journal of Applied Science & 
Process Engineering, vol. 7, no. 2, pp. 631–647, 2020, doi: 10.33736/jaspe.2639.2020. 

[29] P. T. Nguyen, N. B. Vu, L. Van Nguyen, L. P. Le, and K. D. Vo, "The Application of Fuzzy
Analytic Hierarchy Process (F-AHP) in Engineering Project Management," in 2018 IEEE 5th
International Conference on Engineering Technologies and Applied Sciences (ICETAS 2018),
pp. 1–4, 2019, doi: 10.1109/ICETAS.2018.8629217.