JURNAL RISET INFORMATIKA 
Vol. 5, No. 1. December 2022 

P-ISSN: 2656-1743 |E-ISSN: 2656-1735 
DOI: https://doi.org/10.34288/jri.v5i1.487 

Accredited rank 3 (SINTA 3), excerpts from the decision of the Minister of RISTEK-BRIN No. 200/M/KPT/2020 

 

 
PREGNANCY RISK LEVEL CLASSIFICATION USING  
THE CRISP-DM METHOD 

 
Reka Dwi Syaputra-1*), Achmad Solichin-2 

 
Master of Computer Science, 

Universitas Budi Luhur 
Jakarta, Indonesia 

https://www.budiluhur.ac.id/ 
1*)2011600075@student.budiluhur.ac.id, 2achmad.solichin@budiluhur.ac.id 

 
(*) Corresponding Author 

 
Abstract 

Independent midwife practices are tasked with improving and maintaining the quality of standardized reproductive health services for pregnant women. During the Covid-19 pandemic, from 2020 to 2021, the independent midwife practice of Yetti Purnama recorded 320 pregnancy examinations, 130 delivery care cases, and 50 referrals. The pandemic affected maternal mortality rates because many restrictions remained on all services. Pregnant women could not make routine visits to the puskesmas (community health centre) or other healthcare facilities for fear of contracting Covid-19, which delayed pregnancy examinations covering gravida, abortion, temperature, pregnancy distance, haemoglobin, blood pressure, ideal weight, and decision. As a result, pregnancy risk increased, leading to deaths and a higher maternal mortality rate. This research takes a machine-learning approach to the problem. It aims to build a pregnancy risk level classification that supports early treatment, using the random forest method with 2-fold cross-validation. The random forest method achieved an accuracy of 98%, a precision of 94%, and a recall of 100%.
 
Keywords: pregnancy risk level, classification, decision tree, random forest. 

 
Abstrak 

Praktik bidan mandiri memiliki tugas mengingatkan dan mempertahankan kualitas pelayanan kesehatan 
reproduksi terstandar pada ibu hamil. Praktik bidan mandiri memiliki kunjungan pasien sejak pandemi covid -
19 dari tahun 2020 sampai 2021 khususnya di bidan yetti puranama yang terdiri dari pemeriksaan kehamilan 
320 orang, asuhan persalinan 130 orang dan rujukan 50 orang. Masa pandemi covid-19 memberikan dampak 
angka kematian ibu meningkat disebabkan masih banyak pembatasan ke semua layanan. pelayanan 
kesehatan ibu seperti ibu hamil yang menjadi rutinitas tidak dapat ke puskesmas atau fasilitas pelayanan 
kesehatan lainnya disebabkan takut tertular covid-19. Sehingga menundakan pemeriksaan kehamilan 
gravida, abortus, suhu, jarak kehamilan, hemoglobin, tekanan darah, berat badan ideal dan keputusan. 
Sehingga permasalahan yang terjadinya terdapat peningkatan tingkat risiko kehamilan berdampak kematian 
dan meningkatnya angka kematian ibu. Dalam memecahkan masalah ini, penelitian melakukan cara 
pendekatan dengan machine learning. Tujuan penelitian untuk membangun klasifikasi tingkat risiko 
kehamilan  yang dapat memprediksi penanganan secara dini. Pada penelitian ini menggunakan metode 
random forest dengan  cross-validation 2. Penelitian ini mendapatkan hasil nilai akurasi 98%, precision 94% 
dan recall 100% pada metode random forest. 
 
Kata kunci: tingkat risiko kehamilan, klasifikasi, pohon keputusan, random forest. 
 

INTRODUCTION 
 

From 2007 to 2020, the coverage of K4 health services for pregnant women tended to increase. However, coverage fell in 2020, from 88.55% to 84.60%. This decline is assumed to be related to program implementation in areas affected by the Covid-19 pandemic (Kementerian Kesehatan RI, 2021). K4 coverage in Bengkulu province in 2020 was 87% (Badan Pusat Statistik Provinsi Bengkulu, 2020). Overall, the coverage of K4 health services for pregnant women tends to fluctuate. In 2021 the K4 rate was 88.8%, which is an increase



 
compared to the previous year. Adaptation to the Covid-19 situation in 2021 may have contributed to the increase in K4 coverage (RI, 2021). In the previous year, 2020, many restrictions still applied to all routine services, including maternal health services: pregnant women could not go to the health centre or other healthcare facilities for fear of contracting Covid-19. This delayed pregnancy checks and created service uncertainty due to a lack of human resources, infrastructure, and personal protective equipment (PPE). K4 coverage in Bengkulu province in 2021 was 89% (Kemenkes RI, 2021; Badan Pusat Statistik Provinsi Bengkulu, 2020).

Meanwhile, in 2021 maternal mortality increased by 50 maternal deaths: 22 deaths during pregnancy (44%), 11 during delivery (22%), and 17 postpartum (34%) (Bengkulu, 2021).

Independent midwife practices are tasked with improving and maintaining the quality of reproductive health services for pregnant women. One such practice is the independent midwife practice of Yetti Purnama, M.Keb, located in Tanah Patah village, Ratu Agung sub-district, Bengkulu city. Based on the practice's reports, before the Covid-19 pandemic it received approximately 650 patient visits from pregnant women in one year. During the pandemic, from the beginning of 2020 to the end of 2021, maternity health services declined: there were 500 pregnant patients, comprising 320 pregnancy examinations, 130 delivery care cases, and 50 referrals.

The problem faced during the Covid-19 pandemic is the low uptake of pregnancy health services: many pregnant patients did not come in person to the independent midwife practice. As a result, pregnancy risk increased, contributing to deaths and a higher maternal mortality rate. To address this problem, the researchers took a machine-learning approach.

Machine learning aims to make computers learn from data, while data mining aims to extract information from large amounts of data (Putra, 2020). This research predicts the pregnancy risk level by classifying a data set to determine whether each case is at risk. The collected data is used to build a machine learning model with a supervised learning algorithm, namely the random forest method. A previous study on predicting the possibility of diabetes (Pavlov, 2018) in the early stages compared three machine learning classification algorithms, namely support vector machine, Naive Bayes, and random forest, using Receiver Operating Characteristic (ROC) curve comparison. The random forest classifier produced an accuracy of 97.88% (Apriliah et al., 2021).

This research has both similarities to and differences from the previous studies cited. The similarity is that it aims to detect a condition early using the random forest method. The differences lie in the data and the object of research. This research focuses on pregnancy examination data, namely gravida, abortion, temperature, gestational distance, haemoglobin, blood pressure, ideal weight, and decision. It aims to build a classification of pregnancy risk levels using the random forest method to support early detection, classifying each case as risk or no risk during the Covid-19 pandemic.

For this reason, the research designs a random forest model that supports early detection in pregnancy and reports the accuracy of its two-class prediction, namely risk and no risk. The dataset is divided into 80% training data and 20% testing data using cross-validation 2. Previous studies indicate that the random forest method achieves good classification results, so it is expected to yield good accuracy here. Table 1 shows the previous related studies. The conceptual framework for solving the problem, based on the previous literature, is shown in Figure 1.

 
RESEARCH METHODS 

 
This research uses the CRISP-DM (Cross-Industry Standard Process for Data Mining) method, as seen in Figure 2, applied to the classification of pregnancy risk levels using the random forest method. CRISP-DM consists of stages carried out sequentially, in a hierarchical process, namely business understanding, data understanding, data preparation, modelling, evaluation, and deployment (Mulaab, 2017).



 
Table 1. Literature Review Research

| No. | Main Objectives | Method | Results |
|-----|-----------------|--------|---------|
| 1 | Predicting the likelihood of diabetes (Apriliah et al., 2021) | Support Vector Machine, Naive Bayes and Random Forest | Among the three algorithms, random forest obtained a good accuracy value of 97%. |
| 2 | Random forest optimization for bank marketing data classification (Religia et al., 2021) | Random forest optimization with bagging and genetic algorithm | Random forest with 88.30% accuracy |
| 3 | Detection of patients with diabetes (Suryanegara et al., 2021) | Model 1 (Min-Max normalization-RF), model 2 (Z-score normalization-RF), and model 3 (Normalization-RF) | Model 1 (Min-Max normalization-RF) achieved 95.45%, model 2 (Z-score normalization-RF) 95%, and model 3 (Normalization-RF) 92%. These results show that model 1 (Min-Max normalization-RF) is better. |
| 4 | Clustering the Risk Level of Pregnant Women (Akbar et al., 2021) | K-Means Method | K-Means prediction can be modelled on clustering and is categorical. |
| 5 | Classification of preeclampsia status of pregnant women (Kustiyahningsih et al., 2020) | K-Fold Cross Validation and ID3 algorithm as data clustering | K-Fold Cross Validation and ID3 algorithm as accuracy data clustering using a confusion matrix |
| 6 | Pregnancy health diagnosis (Hikmatulloh et al., 2019) | ID3 Algorithm | This method can produce an accuracy of 80.33% |
| 7 | PSO optimization to improve the accuracy of the ID3 algorithm for pregnant women's disease prediction (Byna, 2019) | ID3 Algorithm | The ID3 classification model produces good accuracy with PSO, which improves the accuracy in that study. |
| 8 | Detection of pregnancy risk level (Wulandari & Susanto, 2018) | Fuzzy Mamdani method and simple additive weighting | Got 88% accuracy using the recognition rate model |
| 9 | Comparison of random forest and SVM classification methods (Adrian et al., 2021) | Random Forest and SVM Methods | Two data sets are compared using the SVM and Random Forest algorithms to get the most accurate sentiment analysis results. |
| 10 | Classification of human activities (Aziz, 2021) | Ensemble Stacking Method | The ensemble stacking method is used for classification and compared to the SVM method on accuracy, sensitivity, and specificity. |

 

 
Figure 1. Conceptual framework 
 

Figure 2. CRISP-DM model (Mulaab, 2017) 

 
The stages of the CRISP-DM concept are as follows:
1. Business understanding

This stage identifies the research object, the place where the initial data is collected.
2. Data understanding

This stage covers the data collection process used to build the random forest model and describes the data. The data used is nominal, that is, data in the form of categories.
3. Data preparation

This stage prepares the data as input for the random forest algorithm. The data comes from the objective examination, so data processing techniques are applied to overcome data problems and obtain valid data for use in data mining.
4. Modelling

This stage forms a classification model that predicts the class of each record, namely risk or no risk. The main classification method is random forest, and the decision tree method serves as a comparison for accuracy. The data is divided into 80% training data and 20% testing data with a cross-validation pattern of 2. The modelling stage also defines the evaluation formulas used in deployment and in the prototype.
5. Deployment

This is the final stage, in which the application system is built. The application answers the problem formulation of predicting the pregnancy risk level. The model uses random forest as the base method, compared against the decision tree method to find a good level of



 
accuracy. The researchers carried out the following problem-solving steps:
 
1. Literature Study

The literature study reviews sources related to the research topic, covering theories around pregnancy, midwives, RStudio, data mining, machine learning, decision tree, and random forest. It also reviews previous studies, which are summarized in Table 1.

2. Data Collection

The authors collected secondary data at the research site; the interviews were not conducted with the primary subjects. The secondary data is based on the objective examination, namely gravida, abortion, temperature, gestational distance, haemoglobin, blood pressure, ideal weight, and decision. Interviews were conducted with midwives so that the researchers could explore pregnancy knowledge. In addition, the researchers collected data through non-participant observation, by examining the visit book of pregnant patients at the independent midwife practice.
3. Pre-Processing data

The objective examination data, sourced from secondary data and interviews, is classified by variable name to obtain valid data. The data is also restructured so that it is more suitable for the random forest method.

4. Modelling

The classification modelling in this study uses two predictive algorithms, namely Decision Tree and Random Forest. The two algorithms are compared on training data and testing data with Cross-Validation 2 to determine which achieves a good level of accuracy.

5. Model Evaluation

The two algorithms are compared to find a good level of accuracy. Evaluating the model requires a way to measure the performance of the classification model, which is represented using the confusion matrix.

 
Table 2. Confusion Matrix

| Class                    | Actual: True        | Actual: False       |
|--------------------------|---------------------|---------------------|
| Prediction: Respondent 1 | TP (True Positive)  | FP (False Positive) |
| Prediction: Respondent n | FN (False Negative) | TN (True Negative)  |
 
The terms TP, TN, FP, and FN in this study are explained as follows:
a. True Positive (TP)

The pregnancy risk level is actually risky, and the model predicts it is risky.
b. False Positive (FP)

The pregnancy risk level is actually not risky, but the model predicts it is risky.
c. False Negative (FN)

The pregnancy risk level is actually risky, but the model predicts it is not risky.
d. True Negative (TN)

The pregnancy risk level is actually not risky, and the model predicts it is not risky.
6. Drawing Conclusions

The model evaluation results, namely the accuracy value and model performance, are used to draw conclusions, including whether they support the theory behind the research idea.
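
The four confusion matrix terms map onto the standard evaluation metrics used in this paper (accuracy, precision, recall, and F1-score). The following is a minimal Python sketch of those standard definitions, not the authors' RStudio code; the class name `ConfusionMatrix` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a two-class problem where 'risk' is the positive class."""
    tp: int  # actual risk, predicted risk
    fp: int  # actual no risk, predicted risk
    fn: int  # actual risk, predicted no risk
    tn: int  # actual no risk, predicted no risk

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, NOT taken from the study's data.
cm = ConfusionMatrix(tp=45, fp=5, fn=3, tn=47)
print(f"accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} "
      f"recall={cm.recall():.3f} f1={cm.f1():.3f}")
```

All four metrics are proportions between 0 and 1, which can equivalently be reported as percentages.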

 
RESULTS AND DISCUSSION 

 
In this trial, the models were trained and processed on the RStudio platform with the main libraries caret, reptree, dplyr, devtools, random forest, tree, rpart, and rocr. The main software used in prototype development was the object detection framework visual code and a MySQL database.

The literature study serves as the source for discussing the theory and previous research, as summarized in Table 1. Interviews were also held with midwives and patients. The interview with the midwife served to learn how pregnancy risk is assessed through the objective examination, namely age, gravida, abortion, temperature, pregnancy distance, haemoglobin, blood pressure, ideal weight, and decision. Patients were given twenty questions through questionnaires so that the research obtained data from which to select and decide between risk and no risk. The objective examination results are used to build the classification of pregnancy risk levels using the random forest method. Table 3 lists the objective examination variables.



 
Table 3. Objective Examination Variables

| No | Name | Description | Value |
|----|------|-------------|-------|
| 1 | Age | Patient age | Number |
| 2 | Gravida | Pregnancy | Number |
| 3 | Abortus | Pregnancy miscarriage | Number |
| 4 | Temperature | Body temperature of the patient | Celsius |
| 5 | Pregnancy Spacing | Gestational age of the expectant mother (in weeks) | Number (in weeks) |
| 6 | Haemoglobin | Haemoglobin level in pregnant women (gr/dl) | Gr/dl |
| 7 | Blood Pressure | Pregnant women's blood pressure (systole/diastole) | Systole/diastole |
| 8 | Ideal Body Weight | Normal weight gain in pregnant women (9-15 Kg during pregnancy) | Kg |
| 9 | Decision | The conclusion from the data obtained | Risk or No Risk |

 
The prepared data was validated against the objective examinations of 500 patients, collected as secondary data; the interviews were not conducted with the main subjects. The variables grouped in Table 3 form the initial data type structure. In RStudio, the data must first be converted to the factor data type before cross-validation 2 and processing with the random forest method. Table 4 shows the initial data type structure, with values categorized to support risk and no-risk decisions.
 

Table 4. Initial Testing Data Type Structure

| No | Name | Data Type | Value |
|----|------|-----------|-------|
| 1 | Age | Char | "above 35 years", "18 to 35 years" |
| 2 | Gravida | Char | "more than 4x", "below 4x" |
| 3 | Abortus | Char | "above 2x", "below 2x" |
| 4 | Temperature | Char | "Hyperthermy", "Normal" |
| 5 | Pregnancy Spacing | Char | "above two years or 24 months", "below two years or 24 months", "first child" |
| 6 | Haemoglobin | Char | "Normal", "mild anemia" |
| 7 | Blood Pressure | Char | "Normal 100-130", "hypothermic below 100", "normal 100-130" |
| 8 | Ideal Body Weight | Char | "less than 18kg" |
| 9 | Decision | Char | Risk or No Risk |

 
Data processing in RStudio uses 80% of the data for training and 20% for testing, with a cross-validation pattern of 2. In the initial objective testing data, 52 pregnant patients are at risk and 48 are not at risk. In the initial objective training data, 192 pregnant patients are at risk and 208 are not at risk.
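
One way to produce an 80%/20% split can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' RStudio procedure: the function name `stratified_split`, the fixed seed, and the choice to stratify by class are assumptions. The class totals (244 at risk, 256 not at risk) are implied by the counts reported above (192 + 52 and 208 + 48); because the sketch stratifies, its per-class test counts will differ from the 52/48 reported in the text.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), holding out the same
    fraction of every class. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class totals implied by the counts reported in the text.
labels = ["risk"] * 244 + ["no risk"] * 256
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

Stratifying keeps the risk/no-risk proportions of the training and testing sets close to those of the full data set, which makes accuracy figures on the two sets easier to compare.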

The data integration stage presents the analysis and interpretation of the collected data. The analysis compares the proportions of risk and no-risk records across the different data compositions, as shown in Table 5.

Table 5 shows that the composition of the initial dataset is unbalanced. The variables and composition of each data set can therefore change before modelling and k-fold cross-validation: the initial dataset must be adjusted so that risk and no-risk records have the same composition and variables.



 
Table 5. Initial Dataset before Cross-Validation 2: Proportions of Risk and No Risk

| No | Data Name | Risk | No Risk |
|----|-----------|------|---------|
| 1 | Validation Objective Initial Data | 0.48 | 0.52 |
| 2 | Objective Initial Training Data | 0.52 | 0.48 |
| 3 | Objective Initial Testing Data | 0.48 | 0.51 |

 
Table 6. Initial Dataset with Cross-Validation 2: Proportions of Risk and No Risk

| No | Data Name | Risk | No Risk |
|----|-----------|------|---------|
| 1 | Validation Objective Initial Data | 0.50 | 0.50 |
| 2 | Objective Initial Training Data | 0.50 | 0.50 |
| 3 | Objective Initial Testing Data | 0.50 | 0.50 |

 
Table 6 shows that the initial objective data has been changed into data with a balanced composition of risk and no-risk records. The number of records in each data set is shown in Table 7.

 
Table 7. Data Value Results

| No | Data Name | Count |
|----|-----------|-------|
| 1 | Validation Objective Initial Data | 488 |
| 2 | Objective Initial Training Data | 384 |
| 3 | Objective Initial Testing Data | 96 |
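
The paper does not state how the balanced composition was obtained. One common technique that is consistent with the reported totals is random undersampling of the larger class: starting from 244 at-risk and 256 not-at-risk records (the totals implied by the earlier split counts), keeping 244 of each gives 488 records. The Python sketch below illustrates that technique only; the function name `undersample`, the `decision` key, and the record layout are hypothetical.

```python
import random
from collections import Counter


def undersample(records, label_key="decision", seed=0):
    """Randomly drop records from larger classes until every class
    has as many records as the smallest one."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(rec[label_key], []).append(rec)

    smallest = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Placeholder records; only the class totals come from the text.
records = (
    [{"id": i, "decision": "risk"} for i in range(244)]
    + [{"id": 244 + i, "decision": "no risk"} for i in range(256)]
)
balanced = undersample(records)
print(len(balanced), Counter(rec["decision"] for rec in balanced))
```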

 
Modelling uses the data that has been altered by changing the number of columns and weighting for risk and no-risk factors. The base algorithm is the Random Forest model, with the Decision Tree algorithm as a comparison. The two algorithms are evaluated on accuracy, precision, recall, and F1-score, in order to build a pregnancy risk level classification system that gives accurate, early information on whether a pregnant patient is at risk or not at risk. The data is divided into 80% training data and 20% testing data with a cross-validation pattern of 2. Table 8 compares the results of the random forest and decision tree algorithms.
 

Table 8. Comparison of Random Forest and Decision Tree Accuracy

| No | Algorithm | Data | Accuracy | Precision | Recall | F1-score |
|----|-----------|------|----------|-----------|--------|----------|
| 1 | Random Forest | Training Data | 85% | 84% | 2.95 | 1.31 |
| 2 | Decision Tree | Training Data | 77% | 79% | 1.67 | 1.08 |
| 3 | Random Forest | Testing Data | 98% | 94% | 1 | 97 |
| 4 | Decision Tree | Testing Data | 74% | 76% | 0.68 | 0.72 |

Table 8 shows that the random forest and decision tree methods obtain different values, explained as follows:
1. Accuracy: comparing the training and testing results of random forest modelling with those of decision tree modelling shows that the random forest method has a better level of model accuracy and is superior in classifying correctly, with values of 85% on training data and 98% on testing data, compared to 77% and 74% for the decision tree.

2. Precision: the random forest method shows good agreement between the data requested and the prediction results given by the model, with values of 84% on training data and 94% on testing data. The decision tree obtains 79% on training data and 76% on testing data.

3. Recall: random forest modelling succeeds in finding the relevant information, with a value of 2.95 on training data and 1 on testing data.



 
By comparison, decision tree modelling obtains a training-data value of 1.67.

4. F1-score: random forest modelling provides a good algorithm performance value, namely 1.31 on training data and 97 on testing data. By comparison, decision tree modelling obtains 1.08 on training data and 0.72 on testing data.
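
The F1-score is the harmonic mean of precision and recall, so it can be recomputed from the testing-data precision and recall reported in Table 8. The Python sketch below uses the standard definition; the function name `f1_score` is an illustrative choice. Under this definition precision, recall, and F1 all lie between 0 and 1, so the sketch rejects inputs outside that range.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as proportions)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Testing-data values from Table 8, expressed as proportions.
rf_test_f1 = f1_score(precision=0.94, recall=1.0)
dt_test_f1 = f1_score(precision=0.76, recall=0.68)

print(round(rf_test_f1, 2))  # 0.97
print(round(dt_test_f1, 2))  # 0.72
```

Both recomputed values agree with the testing-data F1 entries in Table 8 when the random forest entry of 97 is read as a percentage.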

The model is implemented using testing tools in the form of visual code programs, as used for constructing web-based models, and a database that supports the random forest method. The implementation answers the purpose of this research, which is to detect pregnancy risk early and classify pregnant patients as risk or no risk. It also reports the accuracy level of the risk or no-risk prediction.

For the model implementation, the researchers use the waterfall SDLC (System Development Life Cycle) model, which provides a sequential software life cycle approach, as illustrated in Figure 3.
 

Figure 3. Modified Waterfall Model 

 
In the second stage, the analysis of software requirements is translated into a design representation for later incorporation into the program.

The design in this research uses a use case diagram for classifying pregnancy risk levels using the random forest method (Figure 4).

 
[Use case diagram elements: Dataset Upload, Random Forest Construction, Single Data Test]

 
Figure 4. Use Case Diagram 

 
As shown in Figure 4, the actor (user) has several functions: uploading data sets, building the random forest, and testing single data with a cross-validation rule of 2, with the accuracy values displayed. The initial view of the random forest application is empty until the single data system is run. It also provides a data set menu for splitting the data and for the modelling used by the random forest model. The initial appearance of the application is shown in Figure 5.
 

Figure 5. Initial View of the Menu 



 
Figure 5 displays the result of uploading a dataset into the system. The data set shown is raw, that is, it has not yet been partitioned into training and testing data. Before random forest modelling, the data is therefore split into 80% training data and 20% testing data with cross-validation 2 in the pregnancy risk level classification application. The training data is used to train the algorithm, while the testing data is used to measure the performance of the trained algorithm when it encounters new, never-before-seen data.

  
Figure 6. Dataset view 
 

Figure 7. K-Fold value input display 

 
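Table 9. K-Fold Values 2 to 10

| No. | K-Fold Sequence | Testing Data Accuracy | Training Data Accuracy |
|-----|-----------------|-----------------------|------------------------|
| 1 | K-Fold 2 | 96% | 85% |
| 2 | K-Fold 3 | 96% | 79% |
| 3 | K-Fold 4 | 95% | 78% |
| 4 | K-Fold 5 | 95% | 77% |
| 5 | K-Fold 6 | 94% | 77% |
| 6 | K-Fold 7 | 94% | 75% |
| 7 | K-Fold 8 | 93% | 75% |
| 8 | K-Fold 9 | 93% | 72% |
| 9 | K-Fold 10 | 93% | 71% |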

 

 
Figure 7 displays the K-Fold value input of the pregnancy risk level classification system using the random forest method. K-fold values from 2 to 10 yield different accuracy values on testing and training data. Among them, k-fold 2 has the highest accuracy on both testing and training data, as shown in Figure 8.
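
To illustrate what the k-fold value controls, the Python sketch below partitions record indices in the standard k-fold way: each fold in turn is held out for testing while the remaining folds are used for training. It is a generic sketch, not the application's code, and the paper does not specify how its application combines the k-fold value with the 80%/20% ratio. The function name `k_fold_indices` is hypothetical, and the record count of 480 is the sum of the training (384) and testing (96) counts in Table 7.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


for k in range(2, 11):
    train, test = k_fold_indices(480, k)[0]
    print(f"k={k}: train={len(train)}, test={len(test)}")
```

In standard k-fold cross-validation each fold holds out roughly 1/k of the data, so k = 2 holds out half of the records per fold, while k = 5 corresponds to an 80%/20% ratio.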

 
Figure 8. K-Fold 2 Result Display 

 
Figure 9. Display of Pregnancy Risk Level Prediction Application 
 


 
Figure 10. Single Data Result View 
 

In the next stage, the researchers predict the pregnancy risk level using the random forest method on the testing data produced with k-fold 2. The pregnancy risk level prediction application is shown in Figure 9.

The prediction application was tested with a single record from the testing data whose status is risk. The record has the following factors: 18 to 35 years, under 4x, above 2x, hyperthermia, under two years or 24 months, mild anemia, normal 100-130, and less than 18 kg. The result is at risk at 87% and not at risk at 13%, as shown in Figure 10.
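
A random forest commonly reports per-class percentages as the share of trees voting for each class, with the majority class taken as the prediction. The paper does not state how the application computes its 87% and 13% figures, so the Python sketch below is only an assumed interpretation; the function name `vote_shares` and the forest of 100 trees are hypothetical.

```python
from collections import Counter


def vote_shares(tree_votes):
    """Return the fraction of trees that voted for each class label."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: count / total for label, count in counts.items()}


# Hypothetical forest of 100 trees: 87 vote 'risk', 13 vote 'no risk'.
votes = ["risk"] * 87 + ["no risk"] * 13
shares = vote_shares(votes)
prediction = max(shares, key=shares.get)

print(shares, prediction)
```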

 
CONCLUSIONS AND SUGGESTIONS 
 

This research applied pregnancy risk level classification using the random forest method for early prediction of pregnancy risk. The random forest method obtained a good accuracy value. Future research should use both subjective and objective examination data. In addition, the software could be built for Android and perform hybrid data analysis.
 
 
REFERENCES 
 
Adrian, M. R., Putra, M. P., & Rakhmawati, N. A. (2021). Perbandingan Metode Klasifikasi Random Forest dan SVM pada Analisis Sentimen PSBB. Informatika UPGRIS, 7(1), 36–40. https://doi.org/10.26877/jiu.v7i1.7099

Akbar, F. M., Ali, M., & Zahro, H. Z. (2021). K-Means Clustering Untuk Pengelompokkan Tingkat Resiko Ibu Hamil Di Praktik Mandiri Bidan Upt Puskesmas Pandanwangi Malang. Jurnal Mahasiswa Teknik Informatika, 5(2), 581–588. https://doi.org/10.36040/jati.v5i2.3772

Apriliah, W., Kurniawan, I., Baydhowi, M., & Haryati, T. (2021). Prediksi Kemungkinan Diabetes pada Tahap Awal Menggunakan Algoritma Klasifikasi Random Forest. Jurnal Sistem Informasi, 10(1), 163–171. https://doi.org/10.32520/stmsi.v10i1.1129

Aziz, F. (2021). Klasifikasi Aktivitas Manusia menggunakan metode Ensemble Stacking berbasis Smartphone. Journal of System and Computer Engineering, 1(2), 106–111. https://doi.org/10.47650/jsce.v1i2.171

Badan Pusat Statistik Provinsi Bengkulu. (2021). Profil Kesehatan Ibu dan Anak Provinsi Bengkulu 2020. Bengkulu: Badan Pusat Statistik Provinsi Bengkulu. Retrieved from https://bengkulu.bps.go.id/publication/2021/07/02/e23d0b377ed0964bb9fc06c4/profil-kesehatan-ibu-dan-anak-provinsi-bengkulu-2020.html

Bengkulu, D. K. P. (2021). Profil Kesehatan Provinsi Bengkulu Tahun 2021 (provbengkulu dinkes, Ed.). Dinas Kesehatan Provinsi Bengkulu. dinkes.bengkuluprov.go.id

Byna, A. (2019). Penerapan Optimasi PSO untuk meningkatkan Akurasi Algoritma ID3 pada Prediksi Penyakit Ibu Hamil. Jurnal Teknologi Informasi Universitas Lambung Mangkurat, 4(2), 65–70. https://doi.org/10.20527/jtiulm.v4i2.40

Hikmatulloh, Rahmawati, A., Wintana, D., & Ambarsari, D. A. (2019). Penerapan Algoritma Iterative Dichotomiser Three (ID3) dalam mendiagnosa Kesehatan Kehamilan. Kumpulan Jurnal Ilmu Komputer, 6(2), 116–127. https://doi.org/10.20527/klik.v6i2.189

Putra, J. W. G. (2020). Pengenalan Konsep Pembelajaran Mesin dan Deep Learning (1.4). https://wiragotama.github.io/resources/ebook/intro-to-ml-secured.pdf

Kementerian Kesehatan RI. (2021). Profil Kesehatan Indonesia Tahun 2020. In Kementerian Kesehatan RI. https://doi.org/10.1524/itit.2006.48.1.6

Kustiyahningsih, Y., Mula’ab, & Hasanah, N. (2020). Metode Fuzzy ID3 Untuk Klasifikasi Status Preeklamsi Ibu Hamil. Teknika, 9(1), 74–80. https://doi.org/10.34148/teknika.v9i1.270

Mulaab, M. (2020). Data Mining Konsep dan Aplikasi. Malang: MNC Publishing.

Religia, Y., Nugroho, A., & Hadikristanto, W. (2021). Klasifikasi Analisis Perbandingan Algoritma Optimasi pada Random Forest untuk Klasifikasi Data Bank Marketing. Jurnal RESTI



 
(Rekayasa Sistem Dan Teknologi Informasi), 5(1), 187–192. https://doi.org/10.29207/resti.v5i1.2813

RI, K. (2021). Profil Kesehatan Indonesia 2021. https://www.kemkes.go.id/downloads/resources/download/pusdatin/profil-kesehatan-indonesia/Profil-Kesehatan-2021.pdf

Suryanegara, G. A. B., Adiwijaya, & Purbolaksono, M. D. (2021). Peningkatan Hasil Klasifikasi pada Algoritma Random Forest untuk Deteksi Pasien Penderita Diabetes Menggunakan Metode Normalisasi. Jurnal Rekayasa Sistem Dan Teknologi Informasi, 5(1), 114–122. https://doi.org/10.29207/resti.v5i1.2880

Wulandari, T., & Susanto, A. (2018). Deteksi Tingkat Risiko Kehamilan dengan Metode Fuzzy Mamdani dan Simple Additive Weighting. Jurnal Teknologi Dan Sistem Komputer, 6(3), 110–114. https://doi.org/10.14710/jtsiskom.6.3.2018.110-114

Pavlov, Y. L. (2018). Random Forests. Berlin: De Gruyter. Retrieved from https://www.degruyter.com/document/doi/10.1515/9783110941975/html?lang=en