Vol. 4, No.2 July 2023 | 76 Agglomerative Clustering of 2022 Earthquakes in North Sulawesi, Indonesia Berton Maruli Siahaan1, Afrioni Roma Rio2 1,2Department of Physics, Faculty of Mathematic and Natural Science, Sam Ratulangi University Kampus UNSRAT, Bahu, Manado, North Sulawesi, INDONESIA 1bertonsiahaan@unsrat.ac.id, 2afrioni@unsrat.ac.id Abstract This paper presents a cluster analysis of earthquake data in the surrounding region of North Sulawesi, Indonesia. The dataset comprises seismic data recorded throughout the year 2022, obtained from the BMKG earthquake repository. A total of 211 earthquakes were included in the analysis, with a minimum magnitude threshold of 2.5 and a maximum depth of 300 km. The agglomerative clustering technique, combined with the elbow method, was employed to determine the optimal and distinct number of clusters. As a result, four unique clusters were identified. Cluster 1 exhibited high magnitudes, with an average magnitude of 4.4, and shallow depths, averaging at 20 km. Cluster 2 also had high magnitudes, averaging at 4.4, but deeper depths, with an average of 199 km. Cluster 3 consisted of earthquakes with low magnitudes, averaging at 3.4, and shallow depths, averaging at 21 km. Lastly, Cluster 4 comprised earthquakes with low magnitudes, averaging at 3.4, but deeper depths, with an average of 136 km. Among the 211 earthquakes, 29 were assigned to Cluster 1, 39 to Cluster 2, 100 to Cluster 3, which had the highest population, and 43 to Cluster 4. This study provides valuable insights into the clustering patterns and characteristics of earthquakes in the region, contributing to a better understanding of seismic activity in North Sulawesi, Indonesia. Keywords: Earthquake clustering; earthquakes data, machine learning; agglomerative clustering Abstrak Artikel ini membahas analisis klasterisasi pada data gempa bumi di sekitar Sulawesi Utara, Indonesia. Data yang digunakan adalah data gempa bumi yang terjadi selama tahun 2022 yang diperoleh dari repositori gempa BMKG. Terdapat 211 gempa bumi yang dianalisis dengan batas magnitudo terendah 2,5 dan kedalaman maksimum 300 km. Dalam penelitian ini, digunakan teknik klasterisasi aglomeratif dan metode elbow untuk menentukan jumlah klaster yang optimal dan unik. Hasilnya, ditemukan empat klaster yang unik. Klaster 1 memiliki gempa bumi dengan magnitudo tinggi rata-rata sebesar 4,4 dan kedalaman dangkal rata-rata sebesar 20 km. Klaster 2 juga memiliki gempa bumi dengan magnitudo tinggi rata-rata 4,4, namun kedalaman lebih dalam rata-rata sebesar 199 km. Klaster 3 terdiri dari gempa bumi dengan magnitudo rendah rata-rata 3,4 dan kedalaman dangkal rata-rata sebesar 21 km. Klaster 4 terdiri dari gempa bumi dengan magnitudo rendah rata-rata 3,4, namun kedalaman lebih dalam rata-rata sebesar 136 km. Dari total 211 gempa bumi yang dianalisis, terdapat 29 gempa bumi dalam Klaster 1, 39 gempa bumi dalam Klaster 2, 100 gempa bumi dalam Klaster 3 yang memiliki populasi terbanyak, dan 43 gempa bumi dalam Klaster 4. Penelitian ini memberikan pemahaman yang lebih baik mengenai pola dan karakteristik gempa bumi di wilayah Sulawesi Utara, Indonesia. Kata kunci: Klasterisasi gempa bumi; data gempa; machine learning: agglomerative clustering P-ISSN : 2715-2448 | E-ISSN : 2715-7199 Vol.4 No.2 July 2023 Buana Information Technology and Computer Sciences (BIT and CS) Vol. 4, No.2 July 2023 | 77 I. Introduction Earthquakes are natural phenomena that frequently occur in various regions around the world, including Indonesia, which is known for its high seismic activity. This region is located around the Pacific Ocean, known as the Pacific Ring of Fire (ROF). The Pacific Ring of Fire is a long chain of active volcanoes and tectonic structures encircling the Pacific Ocean. It stretches along the western coast of South and North America, passes through the Aleutian Islands in Alaska, extends along the eastern coast of Asia through New Zealand, and reaches the northern coast of Antarctica. The Pacific Ring of Fire is one of the most active geological areas on Earth and often experiences powerful earthquakes and volcanic eruptions. It is home to over 450 active and inactive volcanoes. Most of these volcanoes are formed through the process of subduction, where dense oceanic plates collide and slide beneath lighter continental plates. Material from the ocean floor melts as it enters the Earth's mantle and then rises to the surface as magma. The deepest trench in the ocean, the Mariana Trench, is located along the western part of the Pacific Ring of Fire. The majority of the world's largest earthquakes also occur within this ring. These earthquakes are caused by sudden movements of rocks laterally or vertically along plate boundaries. Approximately 81% of the world's largest earthquakes occur within the Pacific Ring of Fire [1]. In Indonesia, the frequency of earthquakes is exceptionally high, with an average of 6,512 tectonic earthquake events per year, equivalent to 543 events per month and 18 earthquake events per day [2]. This study focuses on the clustering of earthquakes in the vicinity of North Sulawesi. The data used is sourced from the earthquake repository of the Meteorology, Climatology, and Geophysics Agency (BMKG) for a one-year period in 2022 [3]. A total of 211 earthquakes have been analyzed, meeting the minimum magnitude criteria of 2.5 and a maximum depth of 300 km. The aim of this research is to identify clustering patterns that can provide deeper insights into the seismic activity in the area. Clustering is an effective approach for analyzing and grouping earthquake data based on their characteristics. Several studies have applied this clustering method to categorize disaster data, disaster impacts, and tsunami potentials [4]–[7]. In this study, the agglomerative clustering method is used to partition the data into closely related groups based on attribute similarities. The validation of the unique and optimal number of clusters is performed using the elbow method. The cluster analysis results reveal the presence of four unique clusters. Each cluster exhibits distinct characteristics, including magnitude scale and depth. Understanding the earthquake clustering patterns in the North Sulawesi region can significantly contribute to disaster mitigation efforts. By identifying the specific clusters and their characteristics, authorities and disaster management agencies can gain valuable insights into the distribution and behavior of earthquakes in the area. This knowledge can aid in the development of more targeted and effective disaster preparedness plans, early warning systems, and evacuation strategies. Additionally, it can inform infrastructure development and building codes to ensure resilience against seismic events. Ultimately, this research serves as a valuable resource for stakeholders involved in disaster management, enabling them to make informed decisions and implement proactive measures to reduce the potential impact of earthquakes and enhance the overall resilience of the region. II. Methods This study encompassed various stages, which involved conducting a review of relevant literature, collecting earthquake data, processing the data, applying the elbow method, performing agglomerative clustering, and conducting cluster analysis (refer to Fig. 1). Vol. 4, No.2 July 2023 | 78 Figure 1. Flowchart of research methodology. A. Literature Review In a high-quality research study, it is crucial to have a relevant collection of literature and references as a strong foundation. Therefore, the initial stage of this research involved gathering literature related to the research topic. The collected literature encompassed various important aspects pertaining to the research subject. During the initial stage, the researcher gathered information related to earthquakes. This aimed to understand the characteristics, causes, and impacts of earthquakes that are relevant to the context of this research. Additionally, the researcher studied machine learning, which is the technique or algorithm used for data processing in this study. A profound understanding of machine learning is crucial as it forms the basis for the data clustering conducted in this research. Furthermore, the researcher obtained an understanding of the clustering technique using the agglomerative clustering method. Clustering is a method used to group data with similar characteristics into specific clusters. In the context of this research, the application of the agglomerative clustering method enables the identification of unique clustering patterns and structures within the earthquake data in the North Sulawesi region B. Data Gathering To obtain earthquake data in North Sulawesi for the year 2022, the data source used is the official website of the Meteorology, Climatology, and Geophysics Agency (BMKG), accessible at https://repogempa.bmkg.go.id/ [3]. The geographic region parameter is set to 0°N to 3°N latitude and 123°E to 126°E longitude. Although the data available on this website is limited to a single month, by leveraging knowledge of the API (application programming interface), we can efficiently access and extract data using for loops and the requests module in the Python programming language [8]. The process of retrieving data through the BMKG API will generate an HTML file that needs further processing. To convert it into a more structured format, the data needs to be parsed into a tabular form using the BeautifulSoup library [9]. Once the data has been successfully parsed and organized, the next step is to save it in CSV format for easier further data processing. C. Data Processing After successfully obtaining the earthquake data in North Sulawesi for the year 2022, the next step is to process and explore the data. The data processing is performed using the Python programming language [8], utilizing several packages and libraries as described below: To perform numerical calculations on arrays or matrices, the NumPy library is used [10]. Additionally, for data analysis and processing in the form of dataframes, the Pandas library is employed https://repogempa.bmkg.go.id/ Vol. 4, No.2 July 2023 | 79 [11]. For visualizing graphs and plotting data distribution on maps, the Matplotlib, Seaborn, and Plotly libraries (including the API provided by Mapbox) are utilized [12]–[15]. Before proceeding with the agglomerative clustering method for clustering, the selected parameters for analysis, namely the earthquake magnitude scale (M) and depth (km), undergo a data preprocessing step using the standard scaler method. The standard scaler is a widely used technique in data preprocessing that transforms the data by subtracting the mean and dividing by the standard deviation. This process ensures that the data is centered around zero with a standard deviation of 1, making it more suitable for clustering algorithms. By applying the standard scaler, the magnitude and depth values are normalized to a comparable scale, eliminating any potential biases caused by differences in their measurement units. This normalization allows for a fair comparison and accurate clustering based on the similarity of the scaled features. D. Elbow Method The elbow method is a visual technique utilized to determine the optimal number of clusters for clustering algorithms. It involves plotting the explained variance by each cluster against the number of clusters and observing the point of inflection, commonly referred to as the "elbow," where adding more clusters no longer significantly improves the explained variance [16]. In simpler terms, the elbow method assists in selecting the appropriate number of clusters for the given data by identifying the point at which adding more clusters does not yield substantial enhancements in the clustering outcomes. To apply the elbow method, we initially perform clustering with various numbers of clusters and plot the explained variance against the corresponding number of clusters. The elbow point on the plot represents the stage at which the explained variance begins to level off, indicating that the inclusion of additional clusters does not contribute significantly to the improvement. Once the elbow point is determined, we can choose the number of clusters that strikes a balance between the explained variance and the simplicity of the model. E. Agglomerative Clustering Once the optimal number of clusters has been determined using the elbow method, the subsequent step involves applying agglomerative clustering to the preprocessed data. Agglomerative clustering is a hierarchical clustering technique that begins with each data point assigned as a separate cluster and gradually merges the closest clusters until all data points are grouped into a single cluster [17]. In this research, we utilized the agglomerative clustering algorithm provided by the Python Scikit- learn library [18]. The algorithm requires specifying the number of clusters, which is set to the optimal number obtained from the elbow method. For this study, Ward's linkage criterion was employed, aiming to minimize the sum of squared differences within all clusters. F. Cluster Analysis Afterwards, the resulting clusters are analyzed to identify patterns or trends within the data. This analysis involves examining the characteristics of each cluster, such as the average values of relevant variables, as well as utilizing visualizations like box plots to depict the clusters. By conducting cluster analysis, we can gain insights into the data structure and identify meaningful clusters among the population of 211 earthquakes in North Sulawesi. These clusters can be further investigated to obtain a deeper understanding of the characteristics specific to each cluster. III. Results And Discussion In this section, we will delve into the findings derived from the analysis of earthquake clusters in the Sulawesi Utara region of Indonesia for the year 2022. The application of the elbow method resulted in the identification of four distinct clusters, as illustrated in Fig. 2. These clusters represent groups of earthquakes that share similar characteristics in terms of their magnitudes and depths. Vol. 4, No.2 July 2023 | 80 Following the determination of the optimal number of clusters, the agglomerative clustering technique was employed to further examine the data. This hierarchical clustering method starts by considering each earthquake event as an individual cluster and then iteratively merges the two closest clusters until all data points are grouped into a single cluster. By applying this method to the identified four clusters, we were able to discern notable dissimilarities among them, particularly in terms of the magnitudes and depths of the earthquakes they encompass. Figure 2. Elbow method with agglomerative clustering Out of the total of 211 earthquakes analyzed, we observed that the first cluster consisted of 29 earthquakes, the second cluster contained 39 earthquakes, the third cluster emerged as the largest group with 100 earthquakes, and the fourth cluster comprised 43 earthquakes. These cluster-specific earthquake populations can be visualized in Fig. 3. Figure 3. Distribution of earthquake events based on cluster. Vol. 4, No.2 July 2023 | 81 The distribution of earthquake data and the formed clusters can be found on the map displayed in Fig. 4. In the cluster analysis, Cluster 3 emerged as the most dominant cluster, primarily distributed in the offshore area. This cluster exhibits characteristics of low magnitude scale, with an average of 3.4 M, and shallow depths, with an average of 21 km. Therefore, this area tends to be relatively safe from the impacts of earthquakes. Figure 4. Clustering results: grouping of North Sulawesi earthquake events in 2022 No cluster 1 was found in the main island area. This cluster needs to be closely monitored as it exhibits a relatively high magnitude scale, with an average of 4.4 M, and shallow depths, with an average of 20 km. Shallow depths can increase the potential for damage compared to deeper depths. However, in the northern island areas of mainland Sulawesi, there are several earthquake sources that fall within cluster 1. Therefore, this area requires careful mitigation planning to reduce the risk for the local population. Vol. 4, No.2 July 2023 | 82 Figure 5. Distribution of magnitude based on cluster. Cluster 2 also displays a high magnitude scale, with an average of approximately 4.4 magnitude, and deep depths with an average of 199 km. This type of earthquake often occurs in the main island area, which may be related to the activities of active volcanoes in the North Sulawesi region. Figure 6. Distribution of depth based on cluster. Cluster 4 exhibits a low magnitude scale and deep depths, making it relatively safe. However, it is still important to remain vigilant regarding the potential earthquakes within this cluster. The distribution and descriptive data of the clustering results can be observed in Fig. 5 and Fig. 6, alo ng with Table 1. Vol. 4, No.2 July 2023 | 83 Table 1. Data description based on cluster Cluster Number of earthquakes Average magnitude (M) Average depth (km) 1 29 4.4 20 2 39 4.4 199 3 100 3.4 21 4 43 3.4 136 Having an understanding of the characteristics of the formed clusters, this information can be utilized in disaster mitigation efforts and more effective planning to protect the community and reduce the risks associated with earthquake impacts in North Sulawesi. By incorporating this knowledge, appropriate measures can be taken to enhance preparedness, response, and resilience in the face of seismic events. It is crucial to prioritize the safety and well-being of the population and ensure that strategies are in place to mitigate the potential consequences of earthquakes. You can access the results of the clustering analysis through the following link: https://sl.unsrat.ac.id/EQ-AC. IV. Conclusions In conclusion, the application of agglomerative clustering to analyze earthquake data in the Sulawesi Utara region resulted in the identification of four distinct clusters with varying characteristics in terms of magnitude and depth. These clusters offer valuable insights into the seismic activity patterns in the area. The findings of this study have significant implications for disaster mitigation and risk reduction efforts in Sulawesi Utara. By understanding the clustering patterns of earthquakes, stakeholders can develop more effective strategies to protect the local population and minimize the impact of seismic events. Future research in this field could focus on expanding the dataset by incorporating data from additional sources and over a longer time period. This would provide a more comprehensive understanding of the seismic activity and clustering patterns in Sulawesi Utara. Additionally, investigating the correlation between these earthquake clusters and geological features, such as fault lines or volcanic activity, could provide valuable insights for predicting and mitigating future seismic events. V. References [1] M. Masum and M. A. Akbar, “The Pacific ring of fire is working as a home country of geothermal resources in the world,” in IOP Conference Series: Earth and Environmental Science, IOP Publishing, 2019, p. 012020. [2] A. Sabtaji, “Statistik kejadian gempa bumi tektonik tiap provinsi di wilayah Indonesia selama 11 tahun pengamatan (2009-2019),” Bul. Meteorol. Klimatol. Dan Geofis., vol. 1, no. 7, pp. 31–46, 2020. [3] B. M. K. dan Geofisika, “EQ Repository,” 2023. https://repogempa.bmkg.go.id/ [4] P. Novianti, D. Setyorini, and U. Rafflesia, “K-Means cluster analysis in earthquake epicenter clustering,” Int. J. Adv. Intell. Inform., vol. 3, no. 2, pp. 81–89, 2017. [5] M. Murdiaty, A. Angela, and C. Sylvia, “Pengelompokkan Data Bencana Alam Berdasarkan Wilayah, Waktu, Jumlah Korban dan Kerusakan Fasilitas Dengan Algoritma K-Means,” J. Media Inform. Budidarma, vol. 4, no. 3, pp. 744–752, 2020. [6] M. T. Furqon and L. Muflikhah, “Clustering the potential risk of tsunami using Density-Based Spatial clustering of application with noise (DBSCAN),” J. Environ. Eng. Sustain. Technol., vol. 3, no. 1, pp. 1–8, 2016. https://sl.unsrat.ac.id/EQ-AC Vol. 4, No.2 July 2023 | 84 [7] A. Wahyu and R. Rushendra, “Klasterisasi Dampak Bencana Gempa Bumi Menggunakan Algoritma K-Means di Pulau Jawa,” JEPIN J. Edukasi Dan Penelit. Inform., vol. 8, no. 1, pp. 174–179, 2022. [8] M. F. Sanner and others, “Python: a programming language for software integration and development,” J Mol Graph Model, vol. 17, no. 1, pp. 57–61, 1999. [9] L. Richardson, “Beautiful soup documentation.” April, 2007. [10] C. R. Harris et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, 2020. [11] W. McKinney and others, “pandas: a foundational Python library for data analysis and statistics,” Python High Perform. Sci. Comput., vol. 14, no. 9, pp. 1–9, 2011. [12] P. Barrett, J. Hunter, J. T. Miller, J.-C. Hsu, and P. Greenfield, “matplotlib–A Portable Python Plotting Package,” in Astronomical data analysis software and systems XIV, 2005, p. 91. [13] M. L. Waskom, “Seaborn: statistical data visualization,” J. Open Source Softw., vol. 6, no. 60, p. 3021, 2021. [14] P. T. Inc, “Collaborative data science,” 2015. https://plot.ly [15] Mapbox, “Map, Geocoding and Navigations APIs at: https://docs.mapbox.com/api/maps/.” Accessed, 2023. [16] P. Bholowalia and A. Kumar, “EBK-means: A clustering technique based on elbow method and k-means in WSN,” Int. J. Comput. Appl., vol. 105, no. 9, 2014. [17] D. Müllner, “Modern hierarchical, agglomerative clustering algorithms,” ArXiv Prepr. ArXiv11092378, 2011. [18] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.