How to cite: P. K. Intan, β€œComparison of Kernel Function on Support Vector Machine in Classification of Childbirth", Mantik, vol. 5, no. 2, pp. 90-99, October 2019.

Comparison of Kernel Function on Support Vector Machine in Classification of Childbirth

Putroue Keumala Intan
UIN Sunan Ampel Surabaya, putroue@uinsby.ac.id
doi: https://doi.org/10.15642/mantik.2019.5.2.90-99

Abstract: The maternal mortality rate during childbirth can be reduced through the efforts of the medical team in determining the childbirth process that must be undertaken immediately. Machine learning for classifying childbirth can be a solution for the medical team in determining the childbirth process.
One of the classification methods that can be used is the Support Vector Machine (SVM), which is able to determine a hyperplane that forms a good decision boundary and can therefore classify data accurately. In SVM, a kernel function is used to solve non-linear classification cases by transforming the data to a higher dimension. In this study, four kernel functions — Linear, Radial Basis Function (RBF), Polynomial, and Sigmoid — are used in the classification of childbirth in order to determine the kernel function that produces the highest accuracy value. Based on the research that has been done, the accuracy value produced by SVM with the linear kernel function is higher than that of the other three kernel functions.

Keywords: childbirth, SVM, kernel functions

Jurnal Matematika MANTIK, Volume 5, Number 2, October 2019, pp. 90-99. ISSN: 2527-3159 (print), 2527-3167 (online). http://u.lipi.go.id/1458103791 http://u.lipi.go.id/1457054096

1. Introduction

Childbirth is a period of high risk for mother and baby, whether by vaginal birth or cesarean section. Some risks posed by childbirth are postpartum cardiac arrest, wound hematoma, hysterectomy, major puerperal infection, anaesthetic complications, venous thromboembolism, and haemorrhage requiring hysterectomy [1]. Cesarean section risks include the morbidity associated with any major abdominal surgical procedure, such as anaesthesia accidents, damage to blood vessels, accidental extension of the uterine incision, and damage to the urinary bladder and other organs. The cesarean section procedure is a potent risk factor for respiratory distress syndrome (RDS) in preterm infants and for other forms of respiratory distress in mature infants.
RDS is a major cause of neonatal morbidity and mortality [2]. In order to reduce the risks posed by childbirth, several solutions are needed. One solution that can be implemented is to use machine learning in the childbirth classification process, as in Amin and Ali's research on the performance evaluation of supervised machine learning classifiers for predicting healthcare operational decisions [3]. The most popular classification method is the Support Vector Machine (SVM), which is capable of producing good accuracy values. In the study by Qiong Li, Qinglin Meng, et al., which compared SVM models and different artificial neural network models for predicting hourly cooling loads in buildings, SVM achieved better accuracy and generalization [4]. In other research comparing support vector machine, neural network, and CART algorithms for land-cover classification using limited training data points, the results indicated that SVMs had superior generalization capability, particularly with respect to small training sample sizes, and the overall accuracies for the SVM algorithm were 91% (Kappa = 0.77) and 64% (Kappa = 0.34) for homogeneous and heterogeneous pixels [5]. In research comparing Support Vector Machine and Artificial Neural Network systems for drug/nondrug classification, SVM yielded 82% correct predictions and the ANN produced 80% correct predictions [6]. SVM can also be applied in the field of medical science, such as the research by Rejani and Selvi, who applied an SVM classifier for early detection of breast cancer, and research on SVM for the diagnosis of Diabetes Mellitus [7]. The SVM method is able to determine the optimal hyperplane that forms a good decision boundary, so that it can classify data appropriately. For SVM on nonlinearly separable data there are two solutions: the first is to use a soft margin, which is particularly suited to noisy data, and the second is to use a kernel function.
The kernel function is used to project the data points into a higher-dimensional space for better classification [8][9]. In this paper, we develop a framework for the classification of childbirth using an SVM classifier and study the effect of various kernel functions on classification accuracy and performance.

2. Antecedents

2.1 Kernel Function

The kernel function is used to project the data points into a higher-dimensional space in order to improve the ability to find the best hyperplane separating the data points of different classes [9].

Definition (kernel function) [10]. A kernel is a function $\kappa$ such that for all $x, z \in X$,
$$\kappa(x, z) = \langle \phi(x) \cdot \phi(z) \rangle,$$
where $\phi$ is a mapping from $X$ to a higher-dimensional inner-product space $F$, $\phi: x \mapsto \phi(x) \in F$.

Some common kernel functions are [11]:

a. Linear. The linear kernel function is the simplest kernel, a dot product of two vectors:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j \quad (1)$$

b. Radial Basis Function (RBF). The RBF, also called the Gaussian kernel function, is defined as:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right), \quad \gamma > 0 \quad (2)$$
where $\gamma$ is a positive parameter that sets the distance scale.

c. Polynomial. The polynomial kernel function of degree $d$, with parameters $\gamma$, $r$, and $d$, is defined as:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \left(\gamma \mathbf{x}_i^T \mathbf{x}_j + r\right)^d, \quad \gamma > 0 \quad (3)$$

d. Sigmoid. The sigmoid kernel function is defined as:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\gamma \mathbf{x}_i^T \mathbf{x}_j + r\right) \quad (4)$$
where $\tanh(a) = 2\sigma(a) - 1$ and $\sigma(a) = \frac{1}{1 + \exp(-a)}$.

2.2 Support Vector Machine (SVM)

SVM is a data mining method developed by Boser, Guyon, and Vapnik and presented for the first time in 1992 [12]. The basic idea of SVM is to determine a hyperplane function, in the form of a linear model, that forms a decision boundary (DB) by maximizing the margin.
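The four kernel functions (1)–(4) can be sketched directly in NumPy. This is a minimal illustration with arbitrary example parameter values, not code from the paper:

```python
import numpy as np

def linear(xi, xj):
    # Eq. (1): dot product of the two vectors
    return xi @ xj

def rbf(xi, xj, gamma=0.5):
    # Eq. (2): Gaussian of the squared Euclidean distance, gamma > 0
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def polynomial(xi, xj, gamma=1.0, r=1.0, d=3):
    # Eq. (3): (gamma * xi^T xj + r)^d, gamma > 0
    return (gamma * (xi @ xj) + r) ** d

def sigmoid(xi, xj, gamma=0.1, r=0.0):
    # Eq. (4): tanh of a scaled, shifted dot product
    return np.tanh(gamma * (xi @ xj) + r)

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(linear(x, z))   # -1.5
print(rbf(x, x))      # 1.0 (a point has zero distance to itself)
```

Each kernel is symmetric in its two arguments, as required by the inner-product definition above.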
The margin is the distance between the hyperplane and the nearest data point. The SVM method can solve not only linear classification problems but also non-linear ones, using the kernel trick.

a. SVM on Linearly Separable Data

Consider a data set of $N$ points:
$$\{\mathbf{x}_i, t_i\}, \quad i = 1, \dots, N \quad (5)$$
where $\mathbf{x}_i = [x_1, x_2, \dots, x_n]$ is a row vector of dimension $n$ and $t_i \in \{-1, 1\}$ is the target value of each row vector. The data will be classified into two classes: class $R_1$ for target value $t_i = +1$ and class $R_2$ for target value $t_i = -1$. SVM uses a linear model as a hyperplane, with the general form:
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b \quad (6)$$
where $\mathbf{x}$ is the input vector, $\mathbf{w}$ is the weight parameter, and $b$ is a bias. With this hyperplane, SVM classifies the data into the two classes $R_1$ and $R_2$, each class having a delimiting plane parallel to the hyperplane:
$$\mathbf{w}^T \mathbf{x}_i + b \geq 1, \quad \text{for } t_i = +1 \quad (7)$$
$$\mathbf{w}^T \mathbf{x}_i + b \leq -1, \quad \text{for } t_i = -1 \quad (8)$$
Both delimiting planes can be written as the single inequality:
$$t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) - 1 \geq 0 \quad (9)$$
The search for the best hyperplane in the SVM method is done by maximizing the margin, i.e. maximizing $\frac{1}{\|\mathbf{w}\|}$, which is the same as minimizing $\|\mathbf{w}\|^2$. This can be formulated as the following optimization problem [13]:
$$\arg\min_{\mathbf{w}, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad (10)$$
$$\text{s.t. } t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) - 1 \geq 0, \quad i = 1, \dots, N \quad (11)$$
This problem is more easily solved if it is changed into the Lagrange function (primal problem), so the optimization problem becomes:
$$L(\mathbf{w}, b, a) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} a_i t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) + \sum_{i=1}^{N} a_i \quad (12)$$
with the addition of the Lagrange multipliers $a_i \geq 0$.
The dual problem is obtained as follows:
$$\arg\max_{a} \ L(a) = \sum_{i=1}^{N} a_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_i t_i a_j t_j \left(\mathbf{x}_i^T \mathbf{x}_j\right) \quad (13)$$
$$\text{s.t. } \sum_{i=1}^{N} a_i t_i = 0, \quad a_i \geq 0 \quad (14)$$
The optimization above must satisfy the following Karush-Kuhn-Tucker (KKT) conditions [13]:
$$a_i \geq 0 \quad (15)$$
$$t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) - 1 \geq 0 \quad (16)$$
$$a_i \left( t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) - 1 \right) = 0 \quad (17)$$
From the KKT conditions, the solution of the optimization problem (13) is $(a^*, \mathbf{w}^*, b^*)$ satisfying $a_i^* \left( t_i(\mathbf{w}^{*T} \mathbf{x}_i + b^*) - 1 \right) = 0$. So if a training point has $a_i > 0$, then $t_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$, which means that point is a support vector, while the remaining points have $a_i = 0$. Thus the resulting decision function is influenced only by the support vectors. Once the solution $(a^*, \mathbf{w}^*, b^*)$ is found, the class of a test point $\mathbf{x}$ can be determined from the value of the decision function:
$$y(\mathbf{x}) = \sum_{i=1}^{N_s} a_i^* t_i \mathbf{x}_i^T \mathbf{x} + b^* \quad (18)$$
where $\mathbf{x}_i$ is a support vector and $N_s$ is the number of support vectors.

b. SVM on Nonlinearly Separable Data

For data that cannot be classified correctly, the SVM model must be modified by adding slack variables $\xi_i$. The hyperplane search with slack variables, also called the soft-margin hyperplane, is:
$$\arg\min_{\mathbf{w}, b, \xi} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad (19)$$
$$\text{s.t. } t_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) \geq 1 - \xi_i, \quad i = 1, \dots, N \quad (20)$$
$$\xi_i \geq 0, \quad i = 1, \dots, N \quad (21)$$
where $C$ is the parameter that determines the size of the penalty for classification errors; the value of $C$ is not obtained in the learning process but must be set before learning.
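The fact that the decision function (18) depends only on the support vectors, and the role of the penalty parameter C, can be illustrated with scikit-learn's SVC, which solves this dual problem. The toy data below is a hypothetical stand-in, not the paper's data:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (illustration only, not the childbirth data)
X = np.array([[1.0, 1.0], [2.0, 2.5], [2.5, 2.0],
              [-1.0, -1.0], [-2.0, -2.5], [-2.5, -2.0]])
t = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin problem (10)-(11)
clf = SVC(kernel="linear", C=1e6).fit(X, t)

# Only the points with a_i > 0 enter the decision function (18)
print(clf.support_vectors_)   # the support vectors x_i
print(clf.dual_coef_)         # the products a_i * t_i
print(clf.intercept_)         # the bias b*
print(clf.predict([[1.5, 1.5], [-1.5, -1.5]]))  # class 1, then class -1
```

Most of the six training points receive $a_i = 0$ and drop out of the model; only the few support vectors appear in `support_vectors_` and `dual_coef_`.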
In addition to adding slack variables to handle data that cannot be separated linearly, it is also necessary to transform the data into a higher dimension with a kernel function so that it can be separated linearly there. Using the kernel trick, the data $\mathbf{x}_i$ is mapped by the function $\phi(\mathbf{x}_i)$, and each inner product $(\mathbf{x}_i \cdot \mathbf{x}_j)$ is computed as $\kappa(\mathbf{x}_i, \mathbf{x}_j)$. Thus the linear model used as a hyperplane is:
$$y(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) + b \quad (22)$$
The soft-margin optimization problem becomes:
$$\arg\min_{\mathbf{w}, b, \xi} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad (23)$$
$$\text{s.t. } t_i\left(\mathbf{w}^T \phi(\mathbf{x}_i) + b\right) \geq 1 - \xi_i, \quad i = 1, \dots, N \quad (24)$$
$$\xi_i \geq 0, \quad i = 1, \dots, N \quad (25)$$
Applying the Lagrange multipliers $a_i \geq 0$ and $\mu_i \geq 0$ to the primal form of the optimization problem gives the Lagrange function:
$$L(\mathbf{w}, b, \xi, a, \mu) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i + \sum_{i=1}^{N} a_i \left\{ 1 - \xi_i - t_i\left(\mathbf{w}^T \phi(\mathbf{x}_i) + b\right) \right\} - \sum_{i=1}^{N} \mu_i \xi_i$$
The dual problem form is obtained as follows:
$$\max_{a} \ \sum_{i=1}^{N} a_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_i t_i a_j t_j \left( \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j) \right) \quad (26)$$
$$\text{s.t. } \sum_{i=1}^{N} a_i t_i = 0, \quad 0 \leq a_i \leq C, \quad i = 1, \dots, N$$
The dual form in equation (26) satisfies the following KKT conditions [12]:
$$a_i \geq 0 \quad (27)$$
$$\mu_i \geq 0 \quad (28)$$
$$t_i\left(\mathbf{w}^T \phi(\mathbf{x}_i) + b\right) - 1 + \xi_i \geq 0 \quad (29)$$
$$\xi_i \geq 0 \quad (30)$$
$$a_i \left\{ t_i\left(\mathbf{w}^T \phi(\mathbf{x}_i) + b\right) - 1 + \xi_i \right\} = 0 \quad (31)$$
$$\mu_i \xi_i = 0 \quad (32)$$
From the KKT conditions, the solution of the optimization problem (26) is $(a^*, \mathbf{w}^*, b^*)$ satisfying $a_i^* \left( t_i(\mathbf{w}^{*T} \phi(\mathbf{x}_i) + b^*) - 1 + \xi_i \right) = 0$ and $\mu_i \xi_i = 0$.
So if a training point has $a_i = 0$, then $\mu_i = C > 0$ and $\xi_i = 0$, which gives $t_i(\mathbf{w}^T \phi(\mathbf{x}_i) + b) \geq 1$: the point is correctly classified and is not a support vector. If a training point has $0 < a_i < C$, then $\mu_i > 0$ and $\xi_i = 0$, which gives $t_i(\mathbf{w}^T \phi(\mathbf{x}_i) + b) = 1$: the point lies exactly on the margin and is a support vector. Whereas if a training point has $a_i = C$, then $\mu_i = 0$ and $\xi_i > 0$: if $\xi_i \leq 1$ the point is still correctly classified but lies inside the margin, and if $\xi_i > 1$ the point is misclassified. Once the solution $(a^*, \mathbf{w}^*, b^*)$ is found, the class of a test point $\mathbf{x}$ can be determined from the value of the decision function:
$$y(\mathbf{x}) = \sum_{i=1}^{N_s} a_i^* t_i \kappa(\mathbf{x}_i, \mathbf{x}) + b^* \quad (33)$$
where $\mathbf{x}_i$ is a support vector and $N_s$ is the number of support vectors.

3. Methodology

3.1 Tools Used

The main tool used for this analysis and study is the Python programming language, a free, open-source platform for machine learning with many packages providing standard implementations of various machine learning algorithms. The scikit-learn package [14] was used in this study for preprocessing, model selection, and SVM classification with the four kernel functions.

3.2 Preprocessing

Normalization or scaling of each feature to the interval [-1, +1] is highly recommended before the data is processed by SVM [15]. Preprocessing is done to prevent features with large values from dominating features with small values and to avoid numerical difficulties during the calculation process.

3.3 Model Selection

In almost all data mining methods, there are parameters that cannot be determined in the learning process; their values must be set before learning. Determining these parameters is called model selection.
Model selection aims to tune the hyperparameters of the SVM classification (the penalty parameter $C$ and any kernel parameters) in order to achieve the lowest test error, i.e. the lowest probability of misclassifying unseen test examples [16]. In SVM, there are two parameters to determine, $C$ and $\gamma$. The recommended candidate grid of parameter pairs $(C, \gamma)$ is $C = 2^{-5}, 2^{-3}, \dots, 2^{15}$ and $\gamma = 2^{-15}, 2^{-13}, \dots, 2^{3}$ [11]. The random search method is used in this study. According to Bergstra and Bengio (2012), random search is more efficient than grid search because it can find parameter values that produce the same or even better accuracy without having to try every possible parameter value in the specified range [17].

3.4 Classification

After the parameter values $(C, \gamma)$ are obtained, the childbirth data is classified using SVM with the different kernel functions, via the scikit-learn package in the Python programming language.

3.5 Classification Evaluation

a. Evaluation Procedure

K-fold cross-validation is a procedure commonly used to estimate the performance of a model [15]. This procedure consists of three stages:
- Divide the data into k parts of equal size.
- Use k-1 parts as training data and one part as testing data.
- Repeat this process k times, once for each different combination of testing and training data, so that every part serves as testing data.
The accuracy of each iteration is averaged to obtain an estimate of the final accuracy of the model. This study uses k = 5, i.e. 5-fold cross-validation, so that each experiment uses four subsets for training and one subset for testing, repeated five times over all possibilities.

b.
Unit of Evaluation Measurement

Confusion matrix. The confusion matrix is used to present the results of the k-fold cross-validation as follows [18]:

Table 1. Confusion matrix
Actual \ Prediction | -1 | 1
-1 | True Negative (TN) | False Positive (FP)
1 | False Negative (FN) | True Positive (TP)

One performance measure based on the confusion matrix is the accuracy value, which can be calculated by:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (34)$$

ROC curve. To visualize and compare the results of two or more classification models, the ROC (Receiver Operating Characteristic) curve can be used. The ROC curve is a two-dimensional graph with the false positive (FP) rate on the horizontal axis and the true positive (TP) rate on the vertical axis [18]. The point (0, 1) represents a perfect classification of all positive and negative cases, with no false positives (FP = 0) and a maximal true positive rate (TP = 1). The point (0, 0) represents a classifier that predicts every case as -1, and the point (1, 1) one that predicts every case as 1. The classification can be evaluated by the AUC (Area Under the Curve) value. The accuracy levels of the AUC value in classification are divided into five groups, shown in Table 2 [18]:

Table 2. Accuracy levels of AUC values in classification
AUC Interval Value | Accuracy Level
0.90 – 1.00 | excellent classification
0.80 – 0.89 | good classification
0.70 – 0.79 | fair classification
0.60 – 0.69 | poor classification
0.50 – 0.59 | failure classification

4. Experimental Analysis

This experiment used 304 childbirth records divided into two classes: vaginal birth (104 records) and cesarean section (200 records). The data consist of 10 features: age, hypertension history, glucose disease, first pregnancy, fetal position, parturition history, number of fetuses, hip size, other disease, and ruptured amniotic fluid history.
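The workflow of Sections 3.2–3.5 — scaling to [-1, +1], random search over (C, Ξ³), 5-fold cross-validation, and a confusion matrix with Eq. (34) — can be sketched end to end with scikit-learn. The snippet uses synthetic stand-in data with the same size and class balance as the study (the real childbirth data is not reproduced here), so the exact numbers will differ from the paper's results:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, cross_val_predict
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Hypothetical stand-in for the 304-record, 10-feature childbirth data
X, y = make_classification(n_samples=304, n_features=10,
                           weights=[0.34], random_state=0)
y = np.where(y == 0, -1, 1)  # minority class plays the role of class -1

# 3.2 Preprocessing: scale every feature to [-1, +1]
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# 3.3 Model selection: random search over C in [2^-5, 2^15], gamma in [2^-15, 2^3]
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    {"C": loguniform(2**-5, 2**15), "gamma": loguniform(2**-15, 2**3)},
    n_iter=20, cv=5, random_state=0)
search.fit(X, y)

# 3.5 Evaluation: 5-fold cross-validated predictions -> confusion matrix, Eq. (34)
y_pred = cross_val_predict(SVC(kernel="rbf", **search.best_params_), X, y, cv=5)
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[-1, 1]).ravel()
acc = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, round(acc, 2))
```

Swapping `kernel="rbf"` for `"linear"`, `"poly"`, or `"sigmoid"` reproduces the four-kernel comparison pattern of the study.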
From the model selection we obtained the parameter values $C = 2048.0$ and $\gamma = 8.6316745750310983 \times 10^{-5}$, which are used for all kernel functions in the support vector classification; the polynomial kernel function in particular is used with degree 3.

Figure 1. ROC curves for the various kernel functions

Figure 1 shows the mean AUC from 5-fold validation for SVM with the four different kernel functions. SVM-Linear, SVM-RBF, and SVM-Sigmoid are categorized as good classification because their AUC values lie in the interval 0.80 – 0.89, but SVM-Polynomial is only categorized as fair classification because its AUC value is 0.70. Based on the AUC values, SVM-Linear is the best model because its AUC value is greater than the others. To confirm this conclusion, we need to examine the accuracy of each model. The accuracy value of each SVM model with a different kernel function is shown below:

Table 3. Accuracy value of each model
No | Model | TN | FP | FN | TP | Accuracy Value
1 | SVM-Linear | 75 | 29 | 24 | 176 | 0.83
2 | SVM-RBF | 67 | 37 | 18 | 182 | 0.82
3 | SVM-Polynomial | 0 | 104 | 0 | 200 | 0.66
4 | SVM-Sigmoid | 78 | 26 | 43 | 157 | 0.77

Table 3 shows the full confusion-matrix results of each classification with a different kernel function. SVM-Polynomial fails to predict class -1 correctly, since its TN is 0, but it predicts class 1 perfectly. The other models produce different class 1 and class -1 predictions, so different accuracy values are obtained. Based on the accuracy values of each model, we can identify the best model for classifying childbirth: SVM-Linear produces an accuracy value of 0.83, which is greater than SVM-RBF (0.82), SVM-Polynomial (0.66), and SVM-Sigmoid (0.77). It can be concluded that SVM-Linear is the best model for classifying childbirth.
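The accuracy values above follow directly from Eq. (34) applied to the confusion-matrix counts, as a quick sanity check shows:

```python
# Recomputing the reported accuracy values from their confusion-matrix entries
def accuracy(tn, fp, fn, tp):
    # Eq. (34): (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

print(round(accuracy(75, 29, 24, 176), 2))   # SVM-Linear     -> 0.83
print(round(accuracy(67, 37, 18, 182), 2))   # SVM-RBF        -> 0.82
print(round(accuracy(0, 104, 0, 200), 2))    # SVM-Polynomial -> 0.66
print(round(accuracy(78, 26, 43, 157), 2))   # SVM-Sigmoid    -> 0.77
```

Note that SVM-Polynomial's 0.66 is exactly the share of the majority class (200/304): a model that predicts every record as class 1 attains this accuracy, which is why the confusion matrix matters alongside the accuracy value.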
Thus, the classification model most suitable for labour classification is SVM with the linear kernel function.

5. Conclusion

Based on the experiments conducted to see the effect of kernel function selection on the accuracy of the SVM childbirth classification, it can be concluded that SVM-Linear is the best model for classifying childbirth. This can be seen from the accuracy value produced by SVM-Linear, 0.83, which is higher than the accuracy values produced by SVM-RBF (0.82), SVM-Polynomial (0.66), and SVM-Sigmoid (0.77). In addition, the AUC value produced by SVM-Linear, 0.85, is greater than that of the other methods, which indicates good classification.

References

[1] S. Liu et al., β€œMaternal mortality and severe morbidity associated with low-risk planned cesarean delivery versus planned vaginal delivery at term,” CMAJ, vol. 176, no. 4, pp. 455–460, Feb. 2007.
[2] M. Wagner, β€œChoosing caesarean section,” Lancet, vol. 356, no. 9242, pp. 1677–1680, Nov. 2000.
[3] M. Amin and A. Ali, β€œPerformance Evaluation of Supervised Machine Learning Classifiers for Predicting Healthcare Operational Decisions,” 2018.
[4] Q. Li, Q. Meng, J. Cai, H. Yoshino, and A. Mochida, β€œPredicting hourly cooling load in the building: A comparison of support vector machine and different artificial neural networks,” Energy Convers. Manag., vol. 50, no. 1, pp. 90–96, Jan. 2009.
[5] Y. Shao and R. S. Lunetta, β€œComparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points,” ISPRS J. Photogramm. Remote Sens., vol. 70, pp. 78–87, Jun. 2012.
[6] E. Byvatov, U. Fechner, J. Sadowski, and G.
Schneider, β€œComparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification,” J. Chem. Inf. Comput. Sci., vol. 43, no. 6, pp. 1882–1889, Nov. 2003.
[7] Y. I. A. Rejani and S. T. Selvi, β€œEarly Detection of Breast Cancer using SVM Classifier Technique,” Dec. 2009.
[8] D. Novitasari, β€œKlasifikasi Alzheimer dan Non Alzheimer Menggunakan Fuzzy C-Mean, Gray Level Co-Occurence Matrix dan Support Vector Machine,” Mantik, vol. 4, no. 2, pp. 83–89, Oct. 2018.
[9] S. Pahwa and D. Sinwar, β€œComparison of Various Kernels of Support Vector Machine,” Int. J. Res. Appl. Sci. Eng. Technol., vol. 3, no. VII, pp. 532–536, 2015.
[10] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. New York: Cambridge University Press, 2004.
[11] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, β€œA Practical Guide to Support Vector Classification,” 2003.
[12] B. E. Boser, I. M. Guyon, and V. N. Vapnik, β€œA training algorithm for optimal margin classifiers,” Proc. Fifth Annu. ACM Workshop Comput. Learn. Theory, pp. 144–152, 1992.
[13] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Heidelberg: Springer-Verlag Berlin, 2006.
[14] β€œ1.4. Support Vector Machines β€” scikit-learn 0.21.3 documentation.” [Online]. Available: https://scikit-learn.org/stable/modules/svm.html#svm-classification. [Accessed: 29-Aug-2019].
[15] F. Pedregosa et al., β€œScikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[16] C. Gold and P. Sollich, β€œModel selection for support vector machine classification,” Neurocomputing, vol. 55, no. 1–2, pp. 221–249, 2003.
[17] J. Bergstra and Y. Bengio, β€œRandom Search for Hyper-Parameter Optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, 2012.
[18] F.
Gorunescu, Data Mining: Concepts, Models, and Techniques. New York: Springer US, 2011.