LONTAR KOMPUTER VOL. 13, NO. 2 AUGUST 2022
p-ISSN 2088-1541
DOI : 10.24843/LKJITI.2022.v13.i02.p04
e-ISSN 2541-5832
Accredited Sinta 2 by RISTEKDIKTI Decree No. 158/E/KPT/2021

Comparison of Naive Bayes Method and Certainty Factor for Diagnosis of Preeclampsia

Linda Perdana Wanti (a, 1), Nur Wachid Adi Prasetya (a, 2), Laura Sari (a, 3), Lina Puspitasari (b, 4), Annisa Romadloni (a, 4)

(a) Department of Informatics, Politeknik Negeri Cilacap
Jln. Dr. Soetomo No.1, Karangcengis, Cilacap Selatan, Cilacap, Jawa Tengah, Indonesia
1 linda_perdana@pnc.ac.id (Corresponding author)
2 nwap.pnc@pnc.ac.id
3 laurasari@pnc.ac.id
4 annisa_romadloni@pnc.ac.id

(b) Department of Midwifery, STIKES Graha Mandiri Cilacap
Jln. Dr. Soetomo No.4-B, Karangcengis, Cilacap Selatan, Cilacap, Jawa Tengah, Indonesia
3 linapuspitasari@gmail.com

Abstract

Preeclampsia is a disease often suffered by pregnant women and is caused by several factors, such as hereditary history, blood pressure, urine protein, and diabetes. The data sample used in this study consists of records of pregnant women from 2020 held by health services in the former Cilacap Regency. This study compares the final results of the Naive Bayes method and the certainty factor method in diagnosing preeclampsia from the symptoms experienced by these pregnant women. The Naive Bayes approach reaches a decision by processing statistical data and probabilities derived from predicting the likelihood that a pregnant woman shows symptoms of preeclampsia, while the certainty factor method determines the certainty value of the diagnosis of preeclampsia in pregnant women from the calculated CF value. The comparison of the two methods shows that the certainty factor method provides more accurate diagnostic results than the Naive Bayes method.
This is because the CF method requires a value between a minimum of 0.2 and a maximum of 1 for each rule on the factors/symptoms involved, while the Naive Bayes method only uses the values 0 and 1 for each factor causing preeclampsia in pregnant women.

Keywords: Preeclampsia, Expert System, Naive Bayes, Certainty Factor, Pregnant Women

1. Introduction

Preeclampsia is a hypertensive disorder in pregnant women that significantly affects morbidity and is one of the causes of death in pregnant women and fetuses [1], [2]. According to the World Health Organization (WHO), the Maternal Mortality Ratio (MMR) is the incidence of death in pregnant women during the period around delivery, up to 42 days after the end of pregnancy, from any cause related to the pregnancy or its improper handling, excluding injury or accident [3]. The MMR and the Infant Mortality Ratio (IMR) are among the benchmarks for the health and welfare of a country's population [4]. WHO reports from various sources that the direct causes of maternal death occur during and after childbirth, and that bleeding, infection, or high blood pressure during pregnancy account for 75% of them [5]. According to WHO data, the prevalence of preeclampsia is 1.8-18% in developing countries and 1.3-6% in developed countries. These figures indicate that preeclampsia is more common among pregnant women in developing countries than in developed countries, because preventive treatment of pregnant women with preeclampsia is delivered faster in developed countries [6]. In Indonesia alone, the MMR for the last ten years was 459 maternal and fetal deaths per 100,000 births, with preeclampsia occurring in around 3% to 10% of all pregnancies. The MMR in Indonesia, as a developing country, is still relatively high. Data from the Inter-Census Population Survey (SUPAS) recorded an MMR of 305 during the last five years; that is, 305 maternal deaths related to pregnancy, delivery, and the 42 days after delivery per 100,000 live births [7].

In Cilacap Regency, data from the Cilacap Regency Health Office show that during the last two years the MMR was 15 cases and the IMR was 155 cases. The maximum targets in the Regional Medium-Term Development Plan (RPJMD) of Cilacap Regency are an MMR of 19 cases and an IMR of 139 cases [8]. Measured against this target, the MMR in Cilacap Regency is still quite high even though it is below the maximum standard set [9]. This has led the relevant institutions in Cilacap Regency to keep working to reduce the MMR and IMR so that the level of community welfare increases.

The risk of maternal death can be identified from the mother's general condition during the 40 weeks of gestation [10]. One way to do this is through health examinations of pregnant women at available health facilities [11]. Such identification reduces the risk of death of pregnant women and fetuses, which can be predicted from the symptoms experienced during pregnancy, by enabling prompt and correct handling in the most dangerous period, namely the period around delivery [12].

An expert system can be described simply as the transfer of knowledge from an expert to a computer through an information system that can be used without restrictions of time and place [13]. The expert system asks for facts, which serve as input to knowledge inference and are then processed into conclusions or decisions that converge on a result derived from these facts [14].
The conclusion is treated as the result of a consultation with experts: it gives non-experts advice and explains possible solutions to the consequences [15].

Several studies have implemented the Naive Bayes method and certainty factors to detect various diseases. Hanny mapped the spread of acute respiratory infections (ARI) using the Naive Bayes method. Classification was carried out on ARI data so that the community becomes responsive to the spread of ARI and medical personnel are helped to meet the ARI eradication targets. The result of that study is a visualization that maps the spread of ARI based on Naive Bayes classification [16]. Yovita et al. implemented the Naive Bayes method in an expert system for diagnosing dysmenorrhea. The diagnosis uses Naive Bayes classification to conclude whether the dysmenorrhea suffered by a woman falls into the category of primary or secondary dysmenorrhea. The analysis shows a classification accuracy of 90% for the ten tested data [17]. Muhammad et al. used the Naive Bayes algorithm to determine the credit given to prospective customers. The algorithm predicts and classifies prospective customers as potentially problematic or non-problematic, so that the company does not lose money on customers who may later cause bad loans [18]. Khairina et al. applied the certainty factor to an expert system for diagnosing ENT diseases. The expert in that study is an ENT specialist who provides complete and detailed information about the causes and symptoms experienced by patients who have problems with their ears, nose, and throat.
The result of this study is a web-based information system that diagnoses ENT diseases from the symptoms selected by the patient and returns information about the ENT disease suffered based on those symptoms [19].

Based on these earlier studies, the authors are interested in comparing the certainty factor method and the Naive Bayes method in diagnosing preeclampsia in pregnant women. The results of tracing preeclampsia with the two methods are used to design and develop an expert system. This is done by exploring expert knowledge, which serves as the knowledge base in the expert system development environment [20]. The consulting environment has a user interface, annotation facilities, and an inference engine connected to the development environment [21]. After the expert knowledge has been extracted, the next step in designing an expert system for diagnosing preeclampsia in pregnant women is to form rules from the facts in the knowledge base, which are later used in the tracing process [22]. The conclusions/decisions given are non-expert; if there are doubts about the results, real experts can be consulted later [23]. With these results, it is hoped that the developed expert system will help reduce the Maternal Mortality Ratio (MMR) and prevent the death of pregnant women and babies as early as possible.

The research used the certainty factor and Naive Bayes methods to find the most effective method for recommending the category of preeclampsia based on the factors/symptoms, that is, whether a case falls into the severe, moderate, or mild category of preeclampsia.
The expected benefit of this research is to provide fast and accurate information to stakeholders in diagnosing the category of preeclampsia from the factors/symptoms experienced by pregnant women.

2. Research Methods

This section describes the certainty factor method, the Naive Bayes method, the data on factors that cause preeclampsia, the rule data that the two methods use to trace preeclampsia, and the flowchart of each method being compared.

2.1. Naive Bayes Method

The Naive Bayes method is best known and most widely used for classification. In the expert system developed here, it is used to classify data on the symptoms experienced by pregnant women into the chance of preeclampsia, which delays the normal delivery process if not treated early, and to reach a conclusion about preeclampsia from the highest posterior score [24], [25]. The Naive Bayes approach is appropriate for an expert system for the early detection of preeclampsia because it defines rules that use probability to produce an appropriate decision/recommendation [26]. Figure 1 describes a flowchart for calculating the probability of preeclampsia in pregnant women, starting with entering data on the symptoms/factors causing preeclampsia and then checking the training data used in this study. The next stage is determining the posterior value, from finding the mean to finding the prior value and probability value for each class involved [27].

Figure 1. Flowchart of the Naive Bayes Method

Calculations in the Naive Bayes method to generate disease probabilities go through several stages, as explained below [28]:

a. Calculate the average of each class by using the equation below to find the initial value for each class involved [29]:

\[ X(p_i \mid a_j) = \frac{q_d + r \cdot x}{q + r} \tag{1} \]

Description:
q_d = the number of data records in the training data that have a = a_j and p = p_i
x = 1 / number of types of class/disease
r = number of symptoms/parameters
q = the number of data records in the training data that have a = a_j (each class/disease)

b. Determine the likelihood value for each existing class using the equation below [30]:

\[ X(a_j) = \frac{q}{r} \tag{2} \]

c. Determine the posterior value for each class involved using the following equation [31]:

\[ X(a_j \mid p_i) = X(p_i \mid a_j) \cdot X(a_j) \tag{3} \]

The Naive Bayes method classifies a case by comparing the final posterior values of the classes involved in the chance of preeclampsia [32]. The classification result is the class with the highest posterior value among the classes being compared [33].

2.2. Certainty Factor Method

The certainty factor method traces a conclusion starting from the observed symptoms [28]. The trace is used to measure the certainty of a set of facts or rules [34]. In this case, the set of facts is the symptoms experienced by pregnant women from the first trimester to the last trimester. The data are collected to make rules for tracing preeclampsia [35]. The certainty factor (CF) value is calculated to show confidence in the facts of an event [36]. One reason for choosing the certainty factor method to diagnose preeclampsia in pregnant women is that it can quantify both certainty and uncertainty in the decisions of the expert system being developed [37]. The measure of the certainty of a fact is denoted by MB (Measure of increased Belief), while the measure of uncertainty is denoted by MD (Measure of increased Disbelief) [19]. The stages of the CF value search process are as follows [38]:

a. Determine the value of CF

\[ CF[H, E] = MB[H, E] - MD[H, E] \tag{4} \]

Description:
CF[H, E]: the certainty of hypothesis H as affected by symptom E
MB[H, E]: the measure of increased belief in H as affected by E
MD[H, E]: the measure of increased disbelief in H as affected by E

b. Determine the value of the CF combination determined by one premise

\[ CF[X \land Y] = \min(CF[X], CF[Y]) \cdot CF[\mathit{RULE}] \tag{5} \]

c. Determine the value of the CF combination determined by more than one premise

\[ CF[X \land Y] = \max(CF[X], CF[Y]) \cdot CF[\mathit{RULE}] \tag{6} \]

d. Determine the CF value for the same conclusion

\[ CF_{\mathit{comb}}[CF_1, CF_2] = CF_1 + CF_2 \cdot (1 - CF_1) \tag{7} \]

The final result of the certainty factor method is a certainty value for a decision, namely determining the disease that affects the pregnant woman [11]. The accuracy of the calculation results of this method is maintained because each calculation processes only two values at a time [39], [40]. Figure 2 shows the stages of the certainty factor method, starting with determining the CF value for each premise of the rule used, then determining the combination CF value for one or more premises, and ending with determining the CF value for the same conclusion, namely the diagnosis of preeclampsia [41].

Figure 2. Flowchart of Certainty Factor Method

2.3. Preeclampsia

The symptoms/factors causing preeclampsia used in this study are shown in Table 1, and Table 2 describes the elements grouped under each factor in Table 1. Table 3 shows examples of the rule data used to diagnose preeclampsia, based on the symptoms/factors in Tables 1 and 2. The rules in Table 3 are formed from the knowledge base obtained after consulting with experts, namely obstetricians and midwives.
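
The calculations in Equations (1) to (7) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the study. All function names are hypothetical. The class prior is taken as q divided by the total number of training records, which is the usual reading and an assumption here, since Equation (2) writes the denominator as r. The posterior multiplies the likelihoods of several symptoms together, which is the standard Naive Bayes extension of Equation (3). For the rule CF, the sketch lets the caller choose min or max; in the conventional certainty factor formulation min is used for premises joined by AND and max for premises joined by OR. Equation (7) is applied only to non-negative CF values. Every number in the example is made up.

```python
from functools import reduce
from math import prod


def nb_likelihood(q_d, q, r, n_classes):
    """Equation (1): (q_d + r*x) / (q + r), with x = 1 / n_classes."""
    x = 1.0 / n_classes
    return (q_d + r * x) / (q + r)


def nb_prior(q, n_records):
    """Class prior as q / n_records (assumed denominator, see the lead-in)."""
    return q / n_records


def nb_posterior(likelihoods, prior):
    """Equation (3), extended to several symptoms by multiplying likelihoods."""
    return prod(likelihoods) * prior


def cf_fact(mb, md):
    """Equation (4): CF[H, E] = MB[H, E] - MD[H, E]."""
    return mb - md


def cf_rule(premise_cfs, rule_cf, combine=min):
    """Equations (5)-(6): reduce premise CFs with min or max, then scale by the rule CF."""
    return combine(premise_cfs) * rule_cf


def cf_combine(cf1, cf2):
    """Equation (7): CF1 + CF2 * (1 - CF1), for two non-negative CFs."""
    return cf1 + cf2 * (1 - cf1)


def cf_combine_all(cfs):
    """Fold Equation (7) over a list of CFs, two values at a time."""
    return reduce(cf_combine, cfs)


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the study data.
    posteriors = {
        "B": nb_posterior([nb_likelihood(3, 5, 16, 4), 0.5], nb_prior(5, 20)),
        "T": nb_posterior([0.2, 0.3], nb_prior(15, 20)),
    }
    # Naive Bayes picks the class with the highest posterior value.
    print("Naive Bayes class:", max(posteriors, key=posteriors.get))

    # Two rules that reach the same conclusion are merged with Equation (7).
    rule_1 = cf_rule([0.8, 0.6], rule_cf=0.5)
    rule_2 = cf_rule([cf_fact(0.9, 0.1)], rule_cf=0.5)
    print("Combined CF:", cf_combine_all([rule_1, rule_2]))
```

Folding Equation (7) with `reduce` mirrors the remark in Section 2.2 that the method processes two values per calculation: each step merges the running CF with one more rule result.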
The diagnosis is divided into four categories: severe preeclampsia (B), moderate preeclampsia (S), mild preeclampsia (R), and undetected preeclampsia (T).

Table 1. Preeclampsia Symptom Factor Data

Factor Code   Factor Information                  Description Codes
F01           Age                                 U1, U2, U3
F02           Parity                              P1, P2
F03           Pregnancy Interval                  JK1, JK2
F04           Multiple Pregnancy                  KG1, KG2
F05           History of Preeclampsia             RP1, RP2
F06           History of Hypertension             RH1, RH2
F07           Hereditary History                  RK1, RK2
F08           History of DM                       RD1, RD2
F09           Nutritional Status                  SG1, SG2
F10           Antenatal Care                      AC1, AC2
F11           Family Planning Acceptor History    RA1, RA2
F12           Educational Status                  SP1, SP2
F13           Knowledge                           P1, P2, P3
F14           Economic Status                     SE1, SE2
F15           Work                                PK1, PK2
F16           Distance to Health Service          J1, J2, J3

Table 2. Description of the Causes of Preeclampsia

Code   Factor                              Description
U1     Age                                 <= 18 years
U2     Age                                 18 - 38 years
U3     Age                                 >= 38 years
P2     Parity                              Second/more
JK1    Pregnancy Interval                  < 24 months
JK2    Pregnancy Interval                  >= 24 months
KG1    Multiple Pregnancy                  Double
KG2    Multiple Pregnancy                  Single
RP1    History of Preeclampsia             Yes
RP2    History of Preeclampsia             No
RH1    History of Hypertension             Yes
RH2    History of Hypertension             No
RK1    Hereditary History                  Yes
RK2    Hereditary History                  No
RD1    History of DM                       Yes
RD2    History of DM                       No
SG1    Nutritional Status                  Obese
SG2    Nutritional Status                  Not obese
AC1    Antenatal Care                      [...] 3 times
RA1    Family Planning Acceptor History    Yes
RA2    Family Planning Acceptor History    No
SP1    Educational Status                  Elementary/Junior High School
SP2    Educational Status                  High School/College
P1     Knowledge                           Poor
P2     Knowledge                           Moderate
P3     Knowledge                           Good
SE1    Economic Status                     < 500k
SE2    Economic Status                     >= 500k
PK1    Work                                Unemployed
PK2    Work                                Employed
J1     Distance to Health Service          > 1000 meters
J2     Distance to Health Service          [...]

[...] = 38 years old
2. RH1: There is a history of hypertension
3. RP1: There is a history of preeclampsia
4. RD2: No history of diabetes
5. AC1: Antenatal care