JURNAL RISET INFORMATIKA Vol. 5, No. 3, June 2023
P-ISSN: 2656-1743 | E-ISSN: 2656-1735
DOI: https://doi.org/10.34288/jri.v5i3.511
Accredited rank 4 (SINTA 4), excerpts from the decision of the DITJEN DIKTIRISTEK No. 230/E/KPT/2023

K-MEANS BINARY SEARCH CENTROID WITH DYNAMIC CLUSTER FOR JAVA ISLAND HEALTH CLUSTERING

Muhammad Andryan Wahyu Saputra (1*), Muhammad Faisal (2), Ririen Kusumawati (3)
Magister Informatics, UIN Maulana Malik Ibrahim Malang, Indonesia
210605210010@student.uin-malang.ac.id (1), mfaisal@ti.uin-malang.ac.id (2), ri2n.kusumawati@gmail.com (3)
(*) Corresponding Author

Abstract
This study determines the health status of each district/city in Java using the K-means Binary Search Centroid and Dynamic K-means algorithms. The research data is the 2020 health profile of Java Island. The traditional k-means algorithm and the dynamic k-means algorithm with binary search centroid were compared using the Davies-Bouldin Index (DBI) and the Calinski-Harabasz Index (CHI). The tests produced 5 clusters across the region: cluster 1 with very high health quality (11 regions), cluster 2 with high health quality (24 regions), cluster 3 with moderate health quality (28 regions), cluster 4 with low health quality (45 regions), and cluster 5 with poor health quality (11 regions), with the best validation values of DBI 1.8175 and CHI 67.7868. Overall, optimizing the dynamic k-means algorithm with binary search centroid results in a better average cluster quality and a smaller number of iterations than the traditional k-means algorithm. The results can serve as a method for evaluating the level of health in the Java Island area and as a reference for related agencies when setting policy.
Keywords: Clustering; Binary Search Centroid; Dynamic K-Means; Java Island Health Profile

Abstrak
This study focuses on determining the health level of each regency/city on Java Island using the K-means Binary Search Centroid (KBSC) and Dynamic K-means (DK) algorithms. The research data is the 2020 health profile of Java Island. The validity of the traditional k-means algorithm and of the dynamic k-means algorithm with binary search centroid was tested using the Davies-Bouldin Index (DBI) and the Calinski-Harabasz Index (CHI). The tests produced 5 clusters across the region: cluster 1 with very high health quality contains 11 regencies/cities, cluster 2 with high health quality contains 24, cluster 3 with moderate health quality contains 28, cluster 4 with low health quality contains 45, and cluster 5 with very low health quality contains 11, with the best validation values of DBI 1.8175 and CHI 67.7868. Overall, optimizing the dynamic k-means algorithm with binary search centroid produces a better average cluster quality and a smaller number of iterations than the traditional k-means algorithm. The results can serve as a method for evaluating the level of health in a region, Java Island in particular, and as a reference for related agencies when setting policy.
Kata kunci: Clustering; Binary Search Centroid; Dynamic K-Means; Java Island Health Profile

INTRODUCTION
One indicator of the success of a country's development is how far the country has progressed in providing health insurance.
In Indonesia, the government, through the Ministry of Health, has set several indicators that serve as benchmarks for the progress of health development at all levels of territorial units (province, district, and sub-district). The Ministry of Health collects and processes population health data annually through health offices across the regions to produce provincial and district/city health rankings. These processed results and rankings are essential references. For example, they help local governments (Pemda) formulate more targeted intervention plans; serve as advocacy material that motivates local governments to improve their regional health rankings; give the central government a basis for prioritizing health problem areas (DBKBK) and for allocating central health funding to the regions; and support the Ministry of Vulnerable Regions (KMPDT) in regional/urban development (L. Suryani, 2021).

Regional clustering is carried out to determine the health level of each region, in this case on Java Island. The Central Bureau of Statistics notes that Java Island has 29 regencies and nine cities. So far, environmental health indicator data have been grouped using manual computing techniques, which suffer from several problems, especially in data consistency. To improve the services of the Health Office, a system is needed that groups regions based on environmental health data so that counselling, services, and assistance can be more accurate and on target.
When the regional groups are not fixed in advance and the goal is instead to derive a segmentation from the distribution of the data, the number of clusters cannot be assumed at the outset and must be searched for. The Dynamic K-means (DK) algorithm, a development of K-means, handles this case: it includes a process for finding the number of clusters without assuming it beforehand (H. Santoso et al., 2022). Widiarina and Wahono proposed a dynamic cluster algorithm on top of K-means that determines the number of clusters (k) so as to produce optimal cluster quality and better, more precise segmentation results (W. Widiarina & R. S. Wahono, 2015). However, this algorithm shares a shortcoming of K-means: the initial centroid points are still selected randomly.

Based on research by Yugal Kumar and G. Sahoo, the centroid problem can be addressed by developing K-means into K-means Binary Search Centroid (KBSC), which determines the centroid points using a Binary Search approach (Akbari Gumilar & Yusrila Kerlooza, 2018). The results of that study show that KBSC achieves better intra- and inter-cluster values than K-means. However, KBSC cannot determine the number of clusters to form.
Therefore, this study proposes combining the Dynamic K-means algorithm with the K-means Binary Search Centroid algorithm so that the two complement each other, one determining the number of clusters and the other determining the cluster centroid points, to produce the best Davies-Bouldin Index value for a case study of regional clustering based on health profiles.

RESEARCH METHODS
A research method provides a framework that lets the research run systematically, effectively, and within a reasonable period (C. Kamila et al., 2021). The framework of this research is shown in Figure 1.

Figure 1. Research Flow Design

A. Data Collection Stage
Data for this study come from publications of the Central Statistics Agency and the Java Island Health Service. The study uses 2020 data covering 29 districts and nine cities on the island of Java.
The variables used in this study are (Dinkes & BPS Pulau Jawa, 2020):
• Life Expectancy
• Health Center Ratio
• Hospital Ratio
• Clinic Ratio
• Percentage of Households with Adequate Sanitation
• Percentage of Drinking Water Facilities
• Percentage of Babies with Low Birth Weight (LBW)
• Complete Basic Immunization Percentage
• Percentage of Babies Who Get Exclusive Breastfeeding
• Percentage of Toddlers with Malnutrition
• Percentage of Tuberculosis Treatment Success
• Diarrhea Morbidity Rate (per 1,000 population)
• Covid-19 Recovery Rate
• Covid-19 Death Rate
• Percentage of the Population Who Smokes
• Percentage of Services for Diabetes Mellitus Patients
• Percentage of Public Places That Meet Health Requirements

B. Data Pre-processing Stage
This stage starts with data selection: choosing, from the operational data, the data needed to extract information, for example by summarizing the data of each city for each aspect. Data cleaning follows: removing duplicate records, checking for inconsistent data, and correcting errors in the data. Finally, the data are normalized so that all variables share the same value range, with none too large or too small, which makes statistical analysis easier (C. Satria and A. Anggrawan, 2021).

C. Clustering Stage
This study combines the DK and KBSC algorithms, using the strength of each, to find the best number of clusters and to determine the initial centroid points for every candidate number of clusters. The combination of the two algorithms is shown in Figure 2.

Figure 2. The Flow of Merging the Two Algorithms

Figure 2 describes the merging flow of the two algorithms. The centroid points are determined with a Binary Search approach (K. Ariasa, 2020), which divides the data range according to the number of clusters entered. The steps of the flowchart are as follows:
1. Enter the number of clusters, k (Supriyatna et al., 2020).
2. Calculate the maximum and minimum values of each data attribute.
3. Calculate the range between the centroid points:
$$M = \frac{\max(a_i) - \min(a_i)}{k} \quad (1)$$
In formula (1), $M$ is the distance between consecutive centroid points, $\min(a_i)$ is the minimum value of each data attribute, $\max(a_i)$ is the maximum value of each data attribute, and $k$ is the number of clusters to be formed.
4. Determine the centroid points:
$$C_k = \min(a_i) + (k - 1) M \quad (2)$$
In formula (2), $C_k$ is the centroid for cluster $k$ (here $k$ runs over the clusters, starting from 1), $\min(a_i)$ is the minimum data value, and $M$ is the range between centroid points.
5. Group all data by assigning each object to its nearest centroid point. The distance expresses how similar a data object is to a centroid: the closer two objects are, the more similar they are. The distance from every data object to each cluster centroid is calculated with the Euclidean distance (Bagus Muhammad et al., 2021):
$$D(i, j) = \sqrt{(x_{1i} - x_{1j})^2 + (x_{2i} - x_{2j})^2 + \cdots + (x_{ni} - x_{nj})^2} \quad (3)$$
In equation (3), $D(i, j)$ is the distance from data object $i$ to cluster center $j$, $x_{ni}$ is the value of the $n$-th attribute of data object $i$, and $x_{nj}$ is the value of the $n$-th attribute of centroid $j$.
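The initialization and assignment steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kbsc_initial_centroids`, `euclidean`, `assign_to_nearest`) and the toy data are hypothetical, and formulas (1) and (2) are applied to each attribute separately.

```python
import math


def kbsc_initial_centroids(data, k):
    """Initial centroids from formulas (1) and (2), attribute by attribute."""
    n_attrs = len(data[0])
    lows = [min(row[a] for row in data) for a in range(n_attrs)]
    highs = [max(row[a] for row in data) for a in range(n_attrs)]
    # Formula (1): M = (max(a_i) - min(a_i)) / k for each attribute a_i.
    steps = [(hi - lo) / k for lo, hi in zip(lows, highs)]
    # Formula (2): C_c = min(a_i) + (c - 1) * M for c = 1..k.
    return [
        [lo + (c - 1) * m for lo, m in zip(lows, steps)]
        for c in range(1, k + 1)
    ]


def euclidean(p, q):
    """Equation (3): Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_to_nearest(data, centroids):
    """Return, for each row, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: euclidean(row, centroids[c]))
        for row in data
    ]


# Hypothetical toy data with two normalized attributes (not the study's dataset).
toy = [[0.0, 0.0], [0.1, 0.2], [0.5, 0.4], [0.9, 1.0], [1.0, 0.8]]
centroids = kbsc_initial_centroids(toy, k=2)
labels = assign_to_nearest(toy, centroids)
print(centroids)  # [[0.0, 0.0], [0.5, 0.5]]
print(labels)     # [0, 0, 1, 1, 1]
```

Because the centroids depend only on the attribute minima, maxima, and k, the same data always yields the same starting points, unlike random initialization.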
In the K-means dynamic cluster algorithm, inter is the minimum distance between the centroid of one cluster and the centroids of the other clusters. It measures the separation between clusters, as shown in equation (4) (C. Sreedhar et al., 2017).

$$\mathrm{inter} = \min\{\mathrm{dist}(x_k, x_j)\}, \quad \forall k = 1, 2, \ldots, K - 1 \ \text{and} \ j = k + 1, \ldots, K \quad (4)$$

In equation (4), $x_k$ is the center of one cluster, $x_j$ is the center of a subsequent cluster, and $K$ is the number of clusters.

The dynamic cluster algorithm on K-means also uses intra, the distance between the data and the centroid point within a cluster. Intra measures how similar the members of a cluster are to one another, as shown in equation (5) (R. M. Alguliyev et al., 2019).

$$\mathrm{intra} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \mathrm{dist}(x_i, x_m)^2} \quad (5)$$

In equation (5), $n$ is the number of data points, $x_i$ is a data point, and $x_m$ is the centroid.

D. Analysis Stage
The analysis covers the whole process, from determining the data and pre-processing through to implementing the algorithm calculations. The validity of the clustering was evaluated with several methods, namely Cluster Variance (V), the Davies-Bouldin Index (DBI), and the Calinski-Harabasz Index (CHI). For the Davies-Bouldin Index, the optimum is the smallest value produced by equations (6) and (7) (Y. A. Wijaya et al., 2021).

$$DBI = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \{D_{i,j}\} \quad (6)$$

$$D_{i,j} = \frac{\bar{d}_i + \bar{d}_j}{d_{i,j}} \quad (7)$$

In equations (6) and (7), DBI is the value of the Davies-Bouldin Index, $k$ is the number of clusters, $D_{i,j}$ is the ratio computed for clusters $i$ and $j$, and $d_{i,j}$ is the distance between clusters $i$ and $j$.
$\bar{d}_i$ and $\bar{d}_j$ are the average distances of the data in cluster $i$ and in cluster $j$ to their own centroids. The smaller the Davies-Bouldin Index (DBI) value (it is non-negative, with 0 as its lower bound), the better the clusters produced by the clustering algorithm.

The Calinski-Harabasz (CH) validity index compares the between-cluster sum of squares (SSB), which measures separation, with the within-cluster sum of squares (SSW), which measures compactness, multiplied by a normalization factor: the difference between the number of data points and the number of clusters, divided by the number of clusters minus one. A larger CH value indicates a better number of clusters (Luthfi et al., 2021). Suppose a data set has $k$ clusters and $N$ data points. Let $C_l$ be the $l$-th cluster, $x_i$ the $i$-th point in the $l$-th cluster, $N_l$ the number of points in the $l$-th cluster, $\bar{x}_l$ the center point of the $l$-th cluster, and $\bar{x}$ the center point of the whole data set. The CH validity index is then calculated with formulas (8), (9), and (10) (A. F. Khairati et al., 2019).

$$SSW = \sum_{l=1}^{k} \sum_{x_i \in C_l} (x_i - \bar{x}_l)(x_i - \bar{x}_l)^T \quad (8)$$

$$SSB = \sum_{l=1}^{k} N_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})^T \quad (9)$$

$$CH = \frac{\mathrm{trace}(SSB)}{\mathrm{trace}(SSW)} \times \frac{N - k}{k - 1} \quad (10)$$

To evaluate the models, several experimental scenarios were run. After all cluster results were formed, the algorithms were compared and conclusions were drawn about which algorithm works best and which number of clusters is optimal for the test data.

RESULTS AND DISCUSSION
A. Result
To compare algorithm performance, the test scenarios were run in turn. The traditional k-means algorithm was tested with 3 to 6 clusters. The Dynamic K-means algorithm with Binary Search Centroid, meanwhile, uses the ideal number of clusters obtained from the intra- and inter-cluster calculations of the dynamic cluster algorithm. The final evaluation compares the results of these experiments for 3 to 6 clusters. Figure 3 and Figure 4 are graphs of the DBI and CHI values obtained from testing the two algorithms.

Figure 3. Comparison of the average DBI values of the two algorithms

Overall, the best DBI value, 1.81748915, was obtained by the dynamic k-means algorithm. The number of clusters affects the DBI evaluation, and the DBI of the dynamic cluster algorithm decreases more than that of the traditional algorithm. The smaller the DBI value, that is, the closer it is to its lower bound of 0, the better the clusters (Siagian et al., 2022).

Figure 4. Comparison of the average Calinski-Harabasz Index (CHI) values of the two algorithms

In the evaluation using the Calinski-Harabasz Index (CHI), the Dynamic K-means algorithm with Binary Search Centroid performs better than the traditional K-means algorithm, since a larger CHI value indicates a better number of clusters (Harani et al., 2020). Based on Figure 4, the CHI value of the traditional k-means algorithm is unstable across the numbers of clusters formed. The DBI and CHI evaluations together show that the values obtained by each algorithm depend on how the initial centroids are chosen.
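The measures used in this evaluation, equations (4) to (10), can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the study: the function names and the toy clusters are hypothetical, the DBI scatter is assumed to be the mean distance of a cluster's points to its own centroid, and SSB is assumed to follow the standard definition based on the distance from each cluster center to the overall mean.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Attribute-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def inter(centers):
    """Equation (4): smallest distance between any two cluster centers."""
    return min(
        dist(centers[k], centers[j])
        for k in range(len(centers) - 1)
        for j in range(k + 1, len(centers))
    )


def intra(points, center):
    """Equation (5): root of summed squared distances to the center over n - 1."""
    return math.sqrt(sum(dist(x, center) ** 2 for x in points) / (len(points) - 1))


def davies_bouldin(clusters):
    """Equations (6) and (7); smaller is better."""
    centers = [centroid(c) for c in clusters]
    # Assumed scatter: mean distance of each cluster's points to its centroid.
    scatter = [sum(dist(x, m) for x in c) / len(c) for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def calinski_harabasz(clusters):
    """Equations (8) to (10), using the traces directly; larger is better."""
    points = [x for c in clusters for x in c]
    n, k = len(points), len(clusters)
    grand = centroid(points)
    centers = [centroid(c) for c in clusters]
    ssw = sum(dist(x, m) ** 2 for c, m in zip(clusters, centers) for x in c)
    ssb = sum(len(c) * dist(m, grand) ** 2 for c, m in zip(clusters, centers))
    return (ssb / ssw) * (n - k) / (k - 1)


# Hypothetical toy clusters (not the study's data).
toy_clusters = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
toy_centers = [centroid(c) for c in toy_clusters]
print(inter(toy_centers))               # 4.0
print(davies_bouldin(toy_clusters))     # 0.5
print(calinski_harabasz(toy_clusters))  # 8.0
```

On the toy data the two clusters are compact and far apart, which is why the DBI is small and the CH value is comparatively large.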
The Dynamic K-means algorithm with Binary Search Centroid has the advantage of a constant number of iterations in each test, so every time the data are grouped it produces the same clusters. Random initialization of the cluster centers, by contrast, causes the traditional k-means algorithm to obtain different validation values and different numbers of clusters in each test, which makes it difficult to obtain a single, reproducible initial clustering. Its test results therefore do not necessarily represent a good clustering: the centroids obtained may not represent groups of regions whose health indicators are most similar, and the iterations that follow a poor initialization can produce inaccurate centroids. The choice of initial clusters also strongly affects the number of iterations of the clustering algorithm, so the running time is not constant. Based on the test results, the Dynamic K-means algorithm with Binary Search Centroid produces better cluster quality than traditional K-means.

To find the best number of clusters for the dynamic k-means algorithm with cluster initialization, the validity of each number of clusters was tested. Table 1 compares the evaluations for each number of clusters.

Table 1. Validation values of Dynamic K-means with Binary Search Centroid

Number of Clusters    VW        VB        DBI       CHI
3                     3.7282    2.8909    2.8852    44.9747
4                     3.6566    4.2032    2.0006    62.6688
5                     3.4085    4.0360    1.8174    67.7868
6                     3.7151    4.5564    2.1475    69.2945

Based on Table 1, the VW value for 4 clusters is smaller than for 3 clusters, while the VB value for 4 clusters is greater than for 3 clusters. This causes the DK-KBSC algorithm to add one more cluster, giving 5 clusters, in line with the cluster evaluation. Judged by the DBI and CHI values together, the Dynamic K-means algorithm with Binary Search Centroid performs best with 5 clusters: it has the lowest DBI, and its CHI is close to the highest value, which is obtained with 6 clusters.

Values plotted in Figure 3 (DBI value) and Figure 4 (CHI value):

Number of Clusters                                     3              4              5              6
DBI, Traditional K-means                               2.636498       2.329357       2.080675       2.334418
DBI, Dynamic K-means with Binary Search Centroid       2.88525484     2.00064044     1.81748915     2.14750858
CHI, Traditional K-means                               43.45363553    52.86908894    59.75413653    63.48475436
CHI, Dynamic K-means with Binary Search Centroid       44.97477344    62.66889519    67.78687766    69.2945078

B. Analysis
Clustering the 2020 Java Island health profile data using the dynamic k-means algorithm with mean-based initial cluster initialization produced 5 clusters across the region: cluster 1 with very high health quality contains 11 districts/cities, cluster 2 with high health quality contains 28 districts/cities, cluster 3 with adequate health quality contains 24 districts/cities, cluster 4 with low health quality contains 45 districts/cities, and cluster 5 with poor health quality contains 11 districts/cities.
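As a quick arithmetic check, the cluster sizes reported in this analysis can be converted into percentage shares of all districts/cities. The snippet below is only a worked calculation; the variable names are illustrative.

```python
# Cluster sizes as reported in the analysis of the clustering result.
sizes = {
    "Cluster 1": 11,
    "Cluster 2": 28,
    "Cluster 3": 24,
    "Cluster 4": 45,
    "Cluster 5": 11,
}

total = sum(sizes.values())
# Share of each cluster, rounded to the nearest whole percent.
shares = {name: round(100 * count / total) for name, count in sizes.items()}

print(total)   # 119
print(shares)  # {'Cluster 1': 9, 'Cluster 2': 24, 'Cluster 3': 20, 'Cluster 4': 38, 'Cluster 5': 9}
```

The five clusters cover 119 districts/cities in total, and the rounded shares sum to 100%.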
The percentage distribution of clusters is shown in Figure 5.

[Figure 5: pie chart "Cluster Percentage": Cluster 1 9%, Cluster 2 24%, Cluster 3 20%, Cluster 4 38%, Cluster 5 9%]

Figure 5. Percentage of Dynamic K-means Clustering Results with Binary Search Centroid

The regional distribution produced by Dynamic K-means clustering with Binary Search Centroid from the health profile data is shown in Table 2. The clustering results can be used as a reference for the distribution of regional health levels and as evaluation material by the relevant agencies.

Table 2. Clustering Results

Cluster      Regency/City
Cluster 1    Sukabumi City, Cirebon City, Tangerang City, Cilegon City, South Tangerang City, Central Jakarta City, North Jakarta City, West Jakarta City, South Jakarta City, East Jakarta City, Seribu Islands Regency
Cluster 2    Pacitan Regency, Bogor Regency, Sukabumi Regency, Cianjur Regency, Bandung Regency, Garut Regency, Tasikmalaya Regency, Ciamis Regency, Kuningan Regency, Cirebon Regency, Majalengka Regency, Sumedang Regency, Indramayu Regency, Subang Regency, Purwakarta Regency, Karawang Regency, Pangandaran Regency, Bogor City, Bandung City, Bekasi City, Depok City, Cimahi City, Tasikmalaya City, Lebak Regency, Pandeglang Regency, Serang Regency, Tangerang Regency, Serang City
Cluster 3    Kediri Regency, Lumajang Regency, Jember Regency, Banyuwangi Regency, Bondowoso Regency, Situbondo Regency, Probolinggo Regency, Pasuruan Regency, Jombang Regency, Nganjuk Regency, Madiun Regency, Tuban Regency, Bangkalan Regency, Sampang Regency, Pamekasan Regency, Kediri City, Malang City, Probolinggo City, Pasuruan City, Grobogan Regency, Rembang Regency, Pekalongan Regency, Brebes Regency, Banjar City
Cluster 4    Ponorogo Regency, Boyolali Regency, Trenggalek Regency, Tulungagung Regency, Blitar Regency, Malang Regency, Sidoarjo Regency, Mojokerto Regency, Magetan Regency, Ngawi Regency, Bojonegoro Regency, Lamongan Regency, Gresik Regency, Sumenep Regency, Surabaya City, Batu City, Cilacap Regency, Banyumas Regency, Purbalingga Regency, Banjarnegara Regency, Kebumen Regency, Purworejo Regency, Wonosobo Regency, Magelang Regency, Klaten Regency, Sukoharjo Regency, Wonogiri Regency, Karanganyar Regency, Sragen Regency, Blora Regency, Pati Regency, Kudus Regency, Jepara Regency, Demak Regency, Semarang Regency, Temanggung Regency, Kendal Regency, Batang Regency, Pemalang Regency, Tegal Regency, Surakarta City, Salatiga City, Semarang City, Pekalongan City, Tegal City
Cluster 5    Blitar City, Mojokerto City, Madiun City, Magelang City, Kuningan Regency, Bekasi Regency, Kulon Progo Regency, Bantul Regency, Gunung Kidul Regency, Sleman Regency, Yogyakarta City

CONCLUSIONS
The dynamic cluster algorithm on k-means with binary search centroid initiation produces a better average cluster quality than the traditional k-means algorithm when evaluated with the Davies-Bouldin Index (DBI) and the Calinski-Harabasz Index (CHI). The algorithm's success can be seen in the resulting 5 clusters, which have the smallest VW value, 3.408591352, compared to the traditional k-means algorithm. The advantage of the dynamic k-means algorithm with binary search centroid initiation is its constant number of iterations in each test, so every time the data are grouped it produces the same clusters. The traditional k-means algorithm, meanwhile, obtains different validation values and numbers of clusters in each test because the initial cluster centers are chosen randomly, which makes it hard to obtain a single, reproducible initial clustering and a small number of iterations. The number of clusters affects the DBI and CHI values.
In the tests, the CHI value increased as the number of clusters increased, and a larger CHI value indicates a better clustering. The more clusters and data used, the better the quality of the grouping. Adding clusters also moved the DBI evaluation in a better direction. Although the other algorithm improved the clustering results, the decrease in DBI for the traditional k-means algorithm was not significant.

REFERENCES
A. F. Khairati, A. A. Adlina, G. F. Hertono, and B. D. Handari, "Kajian Indeks Validitas pada Algoritma K-Means Enhanced dan K-Means MMCA," PRISMA, Prosiding Seminar Nasional Matematika, vol. 2, pp. 161–170, 2019.
Akbari Gumilar and Yusrila Kerlooza, "Peningkatan Hasil Cluster Menggunakan Algoritma Dynamic K-means dan K-means Binary Search Centroid," Jurnal Tata Kelola dan Kerangka Kerja Teknologi Informasi, vol. 4, no. 2, pp. 25–33, 2018.
Badan Pusat Statistik Pulau Jawa, "Statistik Kesehatan Pulau Jawa 2020." [Online]. Available: https://jatim.bps.go.id/publication/2021/08/05/a70cbc1ca224552d5e0f5000/statistik-kesehatan-provinsi-jawa-timur-2020.html. [Accessed: 05-August-2022].
C. Kamila, M. Adiyatma, G. R. Namang, and R. R. F. Syah, "Systematic Literature Review: Penggunaan Algoritma K-Means untuk Clustering di Indonesia dalam Bidang Pendidikan," Informatika dan Teknologi (Intech), vol. 2, no. 1, pp. 19–24, 2021.
C. Satria and A. Anggrawan, "Aplikasi K-Means Berbasis Web untuk Klasifikasi Kelas Unggulan," MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, vol. 21, no. 1, pp. 111–124, Nov. 2021.
C. Sreedhar, N. Kasiviswanath, and P. Chenna Reddy, "Clustering Large Datasets Using K-Means Modified Inter and Intra Clustering (KM-I2C) in Hadoop," Journal of Big Data, vol. 4, no. 27, pp. 1–19, 2017.
Dinas Kesehatan Pulau Jawa, "Profil Kesehatan Pulau Jawa 2020." [Online]. Available: https://dinkes.jatimprov.go.id/userfile/dokumen/PROFIL%20KESEHATAN%202020.pdf. [Accessed: 05-August-2022].
H. Santoso, H. Magdalena, H. Wardhana, and I. Artikel, "Aplikasi Dynamic Cluster pada K-Means Berbasis Web untuk Klasifikasi Data Industri Rumahan," MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, vol. 21, no. 3, pp. 541–554, 2022.
Harani, Nisa Hanum, Cahyo Prianto, and Fikri Aldi Nugraha, "Segmentasi Pelanggan Produk Digital Service Indihome Menggunakan Algoritma K-Means Berbasis Python," Jurnal Manajemen Informatika (JAMIKA), vol. 10, no. 2, pp. 133–146, 2020.
Islami, Bagus Muhammad, Cepy Sukmayadi, and Tesa Nur Padilah, "Clustering of Health Facilities Based on Districts in Karawang With the K-Means Algorithm," BINA INSANI ICT JOURNAL, vol. 8, no. 1, pp. 83–92, 2021.
K. Ariasa, I. G. A. Gunadi, and I. M. Candiasa, "Optimasi Algoritma Klaster Dinamis pada K-Means dalam Pengelompokkan Kinerja Akademik Mahasiswa (Studi Kasus: Universitas Pendidikan Ganesha)," Jurnal Nasional Pendidikan Teknik Informatika: JANAPATI, vol. 9, no. 2, pp. 181–193, 2020.
L. Suryani, "Evaluasi Sistem Informasi Kesehatan dengan Pendekatan Health Metrics Network (HMN) di Dinas Kesehatan Kota Pagar Alam Tahun 2021," J. Kesehat. Saelmakers PERDANA, vol. 5, pp. 97–103, 2022.
Luthfi, Emir, and Arie Wahyu Wijayanto, "Analisis perbandingan metode hirearchical, k-means, dan k-medoids clustering dalam pengelompokkan indeks pembangunan manusia Indonesia," INOVASI, vol. 17, no. 4, pp. 761–773, 2021.
R. M. Alguliyev, R. M. Aliguliyev, and F. J. Abdullayeva, "PSO+K-Means Algorithm for Anomaly Detection in Big Data," Statistics, Optimization and Information Computing, vol. 7, no. 2, pp. 348–359, 2019.
Siagian, Romadansyah, Pahala Sirait, and Arwin Halim, "The Implementation of K-Means dan K-Medoids Algorithm for Customer Segmentation on E-commerce Data Transactions," Sistemasi: Jurnal Sistem Informasi, vol. 11, no. 2, pp. 260–270, 2022.
Supriyatna, Adi, et al., "Rice productivity analysis by province using K-means cluster algorithm," in IOP Conference Series: Materials Science and Engineering, IOP Publishing, vol. 771, no. 1, p. 012025, 2020.
W. Widiarina and R. S. Wahono, "Algoritma Cluster Dinamik untuk Optimasi Cluster pada Algoritma K-Means dalam Pemetaan Nasabah Potensial," Journal of Intelligent Systems, vol. 1, no. 1, pp. 33–36, 2015.
Y. A. Wijaya, D. A. Kurniady, E. Setyanto, W. S. Tarihoran, D. Rusmana, and R. Rahim, "Davies Bouldin Index Algorithm for Optimizing Clustering Case Studies Mapping School Facilities," TEM Journal, vol. 10, no. 3, pp. 1099–1103, 2021.