JURNAL RISET INFORMATIKA 
Vol. 5, No. 2. March 2022 

P-ISSN: 2656-1743 |E-ISSN: 2656-1735 
DOI: https://doi.org/10.34288/jri.v5i2.512 

Accredited rank 3 (SINTA 3), excerpts from the decision of the Minister of RISTEK-BRIN No. 200/M/KPT/2020 

 
237 

 
CLUSTERING THE IMPACTS OF THE RUSSIA-UKRAINE WAR ON 
PERSONNEL AND EQUIPMENT 

 
Wargijono Utomo 

 
Information Systems 

Universitas Krisnadwipayana 
Jakarta, Indonesia 

https://unkris.ac.id/ 
wargiono@unkris.ac.id 

 
Abstract 

In post-pandemic recovery efforts, uncertainty has arisen from the unresolved Russia-Ukraine war. This conflict affects global security and stability as well as the economic, energy, and food sectors. It also has a humanitarian impact, causing the deaths of civilians and military personnel, including children, in Ukraine. This study clusters the losses in personnel and war equipment caused by the Russia-Ukraine war using k-means, evaluating three methods for determining the optimal number of clusters. The two methods that can be recommended, elbow and Silhouette, both yield K=3. The profiling results show that the losses of personnel and war equipment in Ukraine fall into three clusters: cluster one is the lowest category, cluster two is the very high category, and cluster three is the moderate category. This research is useful for state agencies, international and non-governmental organizations (NGOs), and other stakeholders. 
 
Keywords: Clustering; K-Means; Elbow; Silhouette; Gap Statistics 
 

 
INTRODUCTION 
 
In post-pandemic joint recovery efforts, the world faces uncertainty because of the conflict between Russia and Ukraine, which has not been resolved to date. The ongoing conflict affects global security and stability as well as the economy, energy, and food supply, which may eventually have an indirect impact on defense and security (Darmayadi & Megits, 2023; Nerlinger & Utz, 2022; Paul, 2015). In addition, the outbreak of the Russian-Ukrainian military conflict has had humanitarian consequences, resulting in the deaths of civilians, military personnel, and even children in Ukraine (Haque et al., 2022; Osokina et al., 2022). Statista.com, citing the Office of the UN High Commissioner for Human Rights (OHCHR), reported 6,952 verified civilian deaths during the Russian invasion of Ukraine as of January 9, 2023, of whom 431 were children; a further 11,144 people were reported injured. 

Previous studies on the impact of the Russia-Ukraine war include "The EU in the South Caucasus and the Impact of the Russia-Ukraine War", which takes a qualitative approach (Paul, 2015); "The Impact of the Russia-Ukraine War on the Cryptocurrency Market", which uses the IV-GMM method; "The impact of the Russia-Ukraine conflict on energy firms: A capital market perspective", which uses the average abnormal returns (AAR) method (Nerlinger & Utz, 2022); and "The human toll and humanitarian crisis of the Russia-Ukraine war: the first 162 days", which uses descriptive statistics (Haque et al., 2022). 

Building on these studies, this paper clusters the impact of Russia's war on Ukraine on the human side, such as the number of deaths, and on armament losses, using the k-means algorithm. The number of clusters is optimized with three methods, namely the elbow, Silhouette, and Gap Statistic methods, and the data are processed in the R programming language (Sinaga & Yang, 2020). Clustering is an unsupervised learning method that groups objects based on their characteristics (Yuan & Yang, 2019). The purpose of cluster analysis is to discover structure in the data by forming groups of similar objects. 

This research can benefit state agencies, international and non-governmental organizations (NGOs), and other stakeholders. The paper is organized into four parts: the first presents the research background, the second the research methodology, the third the results and discussion, and the fourth the conclusions. 

 
RESEARCH METHODS 

 
This study of the impact of the Russian-Ukrainian war on personnel and war equipment uses a structured, planned, and systematic quantitative approach. The research process consists of several stages, as shown in Figure 1. 

 
Figure 1. Research Methodology 

 
Clustering  

Clustering is a data mining method that groups the items in a dataset into several groups (clusters) based on the similarity of their features. It is an unsupervised machine learning technique: similar data points are grouped according to a similarity or distance metric, with the aim of discovering hidden structure in the data, understanding how items relate to one another, and identifying natural groupings that can be used for further analysis. Clustering algorithms typically require no prior knowledge of the data or its structure; instead, they partition the data into distinct clusters based on the similarity or dissimilarity of the items. Many clustering algorithms exist, including k-means, hierarchical, and density-based clustering, each with its own strengths and weaknesses. Clustering is widely used in applications such as image and text processing, market segmentation, customer profiling, bioinformatics, and anomaly detection. 
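As a small illustration of the distance metrics mentioned above, the following R sketch computes Euclidean distances between three points with base R's `dist()`. The matrix `pts` is made up for illustration and is not part of the study's data.

```r
# Three hypothetical 2-D points (illustrative values only, not study data)
pts <- matrix(c(0, 0,
                3, 4,
                6, 8),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("p1", "p2", "p3"), c("x", "y")))

# Pairwise Euclidean distances; points with small distances tend to share a cluster
d <- as.matrix(dist(pts, method = "euclidean"))
print(d)

# The same p1-p2 distance computed by hand: sqrt((3 - 0)^2 + (4 - 0)^2)
manual_p1_p2 <- sqrt(sum((pts["p2", ] - pts["p1", ])^2))
print(manual_p1_p2)
```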

 
The K-Means Algorithm 

K-Means is one of the most popular clustering algorithms. It partitions the data into K non-overlapping groups based on the distances between data points. It does so by determining K central points, or centroids, that represent the groups and then assigning each data point to the group whose centroid is nearest. Given a set of objects, the primary aim of k-means clustering is to minimize the following objective function (Cohn & Holm, 2021; Govender & Sivakumar, 2020): 

 
$$ J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^{2} \qquad (1) $$

 
In Equation (1), J is the criterion function, x_i is the i-th observation, c_j is the centroid of the j-th cluster, C_j is the set of objects in the j-th cluster, and k is the number of clusters. The norm ‖·‖ measures the distance between a data object and its cluster centroid. Minimizing J therefore minimizes the squared distance between each data point and the centroid of the cluster to which it belongs. The iterative k-means procedure is commonly executed as follows: 
1. Select a value for k and choose an initial set of k centroids. 
2. Assign each object to the nearest centroid. 
3. Compute the mean of each cluster's members to obtain the new centroid of each of the k clusters. 
4. Repeat steps 2 and 3 until the criterion function no longer changes after an iteration. 
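The four steps above can be sketched in base R as Lloyd's algorithm. The function `simple_kmeans` is a hypothetical helper written for illustration only; it is not the implementation inside `stats::kmeans()`, which uses the Hartigan-Wong algorithm by default. The data are synthetic.

```r
# Hypothetical helper: a minimal Lloyd-style k-means following steps 1-4
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Step 1: use k randomly chosen observations as the initial centroids
  centers <- x[sample(n, k), , drop = FALSE]
  J_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 2: squared Euclidean distance from every point to every centroid (n x k)
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(x, 2, centers[j, ])^2),
                 numeric(n))
    cluster <- max.col(-d2, ties.method = "first")  # index of the nearest centroid
    # Criterion function J of Equation (1) for the current assignment
    J <- sum(d2[cbind(seq_len(n), cluster)])
    # Step 4: stop once J no longer changes
    if (abs(J_old - J) < 1e-12 * (1 + J)) break
    J_old <- J
    # Step 3: move each centroid to the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers, J = J, iterations = iter)
}

set.seed(1)
# Synthetic data: three groups of 20 points in two dimensions
x <- rbind(matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2))
fit <- simple_kmeans(x, k = 3)
table(fit$cluster)
fit$J
```

Because a single random start can end in a local optimum, `stats::kmeans()` with several starts (`nstart`), as used later in Code 3, is the safer choice in practice.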

 
The Elbow, Silhouette, and Gap Statistic Methods 

Elbow Method: The elbow method is a clustering evaluation method that plots the number of clusters against the within-cluster sum of squares (WCSS) (Sinaga & Yang, 2020). Because the WCSS generally decreases as more clusters are added, the method looks for the number of clusters beyond which adding another cluster no longer reduces the WCSS substantially. In the plot of WCSS versus the number of clusters, this "elbow" point indicates the optimal number of clusters. 
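A minimal sketch of the elbow computation in R, assuming synthetic data with three groups rather than the study's dataset. `wcss_curve` is a hypothetical helper that records `tot.withinss` from `stats::kmeans()` for k = 1, ..., k_max, which is the quantity an elbow plot displays.

```r
# Hypothetical helper: total within-cluster sum of squares for k = 1..k_max
wcss_curve <- function(x, k_max = 10, nstart = 25) {
  vapply(seq_len(k_max), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  }, numeric(1))
}

set.seed(123)  # kmeans() uses random starting centroids
# Synthetic data: three groups of 30 points in two dimensions (not the study's data)
x <- scale(rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
                 matrix(rnorm(60, mean = 6),  ncol = 2),
                 matrix(rnorm(60, mean = 12), ncol = 2)))
wcss <- wcss_curve(x, k_max = 8)
round(wcss, 2)
```

Plotting `wcss` against k, for example with `plot(1:8, wcss, type = "b")`, would be expected to show a sharp drop up to k = 3 and a much flatter curve afterwards; the elbow is read from that bend.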

Silhouette Score: The silhouette score is a clustering evaluation metric that compares, for each data point, the mean distance to the other points in its own cluster (a) with the mean distance to the points in the nearest other cluster (b), giving s = (b − a) / max(a, b). Silhouette scores range from −1 to 1: a value close to 1 indicates that the data point is well matched to its own cluster and far from the other clusters, while a value close to −1 indicates that the data point would fit better in another cluster. 
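The silhouette definition above can be checked with a short R sketch. `silhouette_widths` is a hypothetical helper that applies s = (b − a) / max(a, b) directly; its result is compared with `silhouette()` from the `cluster` package, a recommended package distributed with R. The data are synthetic.

```r
library(cluster)  # silhouette()

# Hypothetical helper: silhouette width of every point, computed from the definition
silhouette_widths <- function(x, cluster) {
  d <- as.matrix(dist(x))
  vapply(seq_len(nrow(d)), function(i) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) return(0)  # convention for singleton clusters
    a <- sum(d[i, own]) / (sum(own) - 1)  # mean distance to the other members
    b <- min(vapply(setdiff(unique(cluster), cluster[i]), function(cl) {
      mean(d[i, cluster == cl])  # mean distance to the members of another cluster
    }, numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

set.seed(42)
# Synthetic data: two groups in two dimensions (not the study's data)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 10)$cluster
s_manual <- silhouette_widths(x, cl)
s_pkg <- silhouette(cl, dist(x))[, "sil_width"]
mean(s_manual)  # average silhouette width
```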

Gap Statistics: The gap statistic is a clustering evaluation method that compares the within-cluster dispersion of the actual data with the dispersion expected under a reference distribution of random data that has no cluster structure. It measures how well the clustering algorithm separates the data into clusters and is used to estimate the optimal number of clusters. A higher gap statistic value indicates that the data are separated into clusters more clearly than would be expected by chance. 
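A hedged R sketch of the gap statistic, using `clusGap()` and `maxSE()` from the `cluster` package on synthetic data (not the study's data). Here Gap(k) = E*[log W_k] − log W_k, where W_k is the within-cluster dispersion and E* is estimated from B reference data sets. The choice B = 50 and the `Tibs2001SEmax` selection rule are assumptions for illustration; `fviz_nbclust()` may apply a different rule by default.

```r
library(cluster)  # clusGap(), maxSE()

set.seed(2023)
# Synthetic data: three groups in two dimensions (not the study's data)
x <- scale(rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
                 matrix(rnorm(60, mean = 6),  ncol = 2),
                 matrix(rnorm(60, mean = 12), ncol = 2)))

# Compare log(W_k) of the data with its expectation over B reference data sets
# drawn uniformly (by default in a PCA-rotated box); nstart = 25 is passed to kmeans()
gap <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50, nstart = 25)
tab <- gap$Tab  # columns: logW, E.logW, gap, SE.sim

# Smallest k with Gap(k) >= Gap(k + 1) - SE(k + 1)
k_opt <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "Tibs2001SEmax")
print(tab)
k_opt
```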

 
R-Programming Language 

R is a programming language for data analysis, visualization, and statistical modeling [16]. Developed in 1993, R has a very active community and thousands of packages that support data analysis tasks, from data cleaning to statistical modeling (Peng, 2015). Its advantages include being open source, being compatible with various operating systems, and offering thousands of packages (Mailund, 2017), as well as powerful visualization capabilities, since many packages make data visualization easy and efficient. R also has some disadvantages. Its many features and packages make it very powerful, but they also make the learning curve quite steep. In addition, because R is an interpreted language, its performance is sometimes slower than that of compiled programming languages. Overall, R is a powerful and flexible programming language that data analysts and statisticians use for data analysis and statistical modeling tasks. 
 

RESULTS AND DISCUSSION 
 
This study used a dataset from kaggle.com, accessed on January 8, 2023, consisting of two Excel files: equipment losses and personnel losses. Each file contains 319 records, with 18 equipment variables and five personnel variables, as shown in Tables 1 and 2. After the data were cleaned and unified, one personnel attribute and eight equipment attributes were retained and combined into a single dataset of 9 attributes and 319 records, as shown in Figure 2. 

 
Table 1. Personnel Dataset 

Date        Day   Personnel   POW
2/25/2022   2     2800        0
2/26/2022   3     4300        0
⋮           ⋮     ⋮           ⋮
1/7/2023    318   110740      0
1/8/2023    319   111170      0

Table 2. Dataset of equipment losses 

date        day   aircraft   helicopter   …
2/25/2022   2     10         7            …
2/26/2022   3     27         26           …
⋮           ⋮     ⋮          ⋮            ⋮
1/7/2023    318   285        272          …
1/8/2023    319   285        272          …

Before k-means clustering is performed in RStudio, the required packages, such as tidyverse, cluster, factoextra, and dplyr, must be installed (Dmitry & Yerkebulan, 2022; Zhu, Idemudia, & Feng, 2019). In the first stage, a script imports the Excel dataset into RStudio, as shown in Code 1; the imported dataset is shown in Figure 2. 

Code 1. Importing the Excel dataset
library(readxl)  # read_excel()
# import the merged dataset (file path as used in the study)
data_ukraina <- read_excel("D:/Dokumen /data ukraina.xlsx")
# open the imported data in the RStudio data viewer
View(data_ukraina)

Figure 2. Results of importing the Excel dataset 

Once the dataset has been imported, scripts are written to standardize (normalize) the data, determine the number of clusters, apply k-means clustering, assign each record to a cluster, visualize the data, and profile the groups. In the second stage, the data are standardized with scale(), as shown in Code 2. 


Code 2. Data scaling and determination of the number of clusters
library(factoextra)  # fviz_nbclust()
# standardize the data with scale(); assumes every column of data_ukraina is numeric
data_ukraina_scale <- scale(data_ukraina)
# display the standardized data
print(data_ukraina_scale)
# determine the number of clusters
fviz_nbclust(data_ukraina_scale, kmeans, method = "wss")         # elbow method
fviz_nbclust(data_ukraina_scale, kmeans, method = "silhouette")  # silhouette method
fviz_nbclust(data_ukraina_scale, kmeans, method = "gap_stat")    # gap statistic method

 
Data normalization converts variables measured on different scales to a uniform scale (Kaparang, Moningkey, & Sumual, 2021; Shelly et al., 2020). It corrects differences in scale between variables that would otherwise affect the statistical analysis and predictive models; in this dataset, for example, personnel losses exceed 100,000 by January 2023 (Table 1) while aircraft losses remain below 300 (Table 2), so without scaling the personnel variable would dominate the distance calculations. The scale() function standardizes the data by converting values into z-scores with a mean of zero and a standard deviation of one. Because the dataset has many rows, only part of the result can be displayed, as shown in Figure 3. 
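The z-score transformation performed by `scale()` can be verified on a few rows from Tables 1 and 2 (days 2, 3, 318, and 319 of the personnel and aircraft columns). This is only a toy check of the formula z = (x − mean) / sd, not a reproduction of Figure 3, which standardizes the full dataset.

```r
# Four rows taken from Tables 1 and 2, used only as a toy example
m <- cbind(personnel = c(2800, 4300, 110740, 111170),
           aircraft  = c(10, 27, 285, 285))

z <- scale(m)  # z-scores: (value - column mean) / column standard deviation

# The same standardization written out explicitly
manual <- sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")

round(z, 3)
colMeans(z)      # approximately 0 for every column
apply(z, 2, sd)  # 1 for every column, up to floating-point rounding
```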
 

Figure 3. Data normalization with scale 

  
Next, in the third stage, the number of clusters is determined with the elbow, Silhouette, and gap statistic methods in order to find the most optimal number of clusters; the code is shown in Code 2. In the elbow plot, the WCSS decreases significantly towards four clusters and then bends to form an elbow, from which it is concluded that the optimal number of clusters is k = 3, as shown in Figure 4a. Furthermore, the Silhouette method uses the average silhouette value to estimate the quality of the clusters formed: the higher the average value, the better the quality. From this analysis, k = 3 and k = 5 are considered optimal, as shown in Figure 4b, because they have the highest average Silhouette values compared with the other numbers of clusters. 
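The average-silhouette criterion described above can be sketched in R as follows, assuming synthetic data. `avg_silhouette` is a hypothetical helper that runs `kmeans()` for k = 2, ..., k_max and averages the silhouette widths from `cluster::silhouette()`, which is the kind of curve shown in Figure 4b. k = 1 is excluded because silhouette widths are undefined when there is only one cluster.

```r
library(cluster)  # silhouette()

# Hypothetical helper: average silhouette width for k = 2..k_max
avg_silhouette <- function(x, k_max = 8, nstart = 25) {
  d <- dist(x)
  widths <- vapply(2:k_max, function(k) {
    cl <- kmeans(x, centers = k, nstart = nstart)$cluster
    mean(silhouette(cl, d)[, "sil_width"])
  }, numeric(1))
  setNames(widths, 2:k_max)
}

set.seed(7)
# Synthetic data with three groups (not the study's data)
x <- scale(rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
                 matrix(rnorm(60, mean = 6),  ncol = 2),
                 matrix(rnorm(60, mean = 12), ncol = 2)))
sil <- avg_silhouette(x)
best_k <- as.integer(names(which.max(sil)))  # k with the highest average width
round(sil, 3)
best_k
```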
 

Figure 4. Determining the number of clusters: A. Elbow, B. Silhouette, and C. Gap Statistic. 

 
The gap statistic method is a cluster quality evaluation method used to determine the optimal number of clusters in cluster analysis. It compares the within-cluster dispersion of the actual data with that of randomly generated reference data; for this dataset, it indicates K = 1, as shown in Figure 4c. Because K = 1 places all observations in a single group and therefore provides no partition, only two of the three methods tested, elbow and Silhouette, can be used to determine the optimal number of clusters. 
 

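Code 3. Applying k-means, visualization, and profiling
library(dplyr)       # %>%, mutate(), group_by(), summarise(), across()
library(factoextra)  # fviz_cluster()
# apply k-means clustering with k = 3 and 25 random starts
final <- kmeans(data_ukraina_scale, centers = 3, nstart = 25)
print(final)
# cluster visualization
fviz_cluster(final, data = data_ukraina_scale)
# group profiling: mean of each original variable per cluster
data_ukraina %>%
  mutate(Cluster = final$cluster) %>%
  group_by(Cluster) %>%
  summarise(across(everything(), mean))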

 
Figure 5. Application of K-Means clustering 

 
In the fourth stage, k-means is applied [18]–[20]. As Figure 5 shows, three clusters are formed, with 67, 106, and 145 members, respectively. Figure 5 also shows the mean values of the variables personnel, aircraft, helicopters, tanks, APCs, field artillery, MRL, drones, and naval ships for clusters 1, 2, and 3; because k-means was run on the standardized data, these means are in standardized units. 
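Because `kmeans()` in Code 3 is run on standardized data, the cluster means printed by `print(final)` are in z-score units. A minimal sketch, assuming synthetic data with two hypothetical attributes, shows how `$size` gives the cluster sizes and how the standardized centroids can be converted back to original units with the `scaled:center` and `scaled:scale` attributes stored by `scale()`.

```r
set.seed(10)
# Synthetic stand-in for two loss attributes (not the study's data)
raw <- cbind(personnel = c(rnorm(20, 5000, 800), rnorm(20, 60000, 5000)),
             tanks     = c(rnorm(20, 200, 30),   rnorm(20, 2500, 200)))
z <- scale(raw)
fit <- kmeans(z, centers = 2, nstart = 25)

fit$size     # number of members in each cluster
fit$centers  # cluster means in standardized (z-score) units

# Undo the standardization: multiply by the column sd, then add the column mean
centers_raw <- sweep(sweep(fit$centers, 2, attr(z, "scaled:scale"), "*"),
                     2, attr(z, "scaled:center"), "+")
centers_raw
```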
 

Figure 6. Data Visualization 

 
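The k-means output also reports the within-cluster sum of squares, which measures how far the objects in each cluster lie from their centroid: 136.39153 for cluster 1, 97.69972 for cluster 2, and 135.63607 for cluster 3. The ratio of the between-cluster sum of squares to the total sum of squares (between_SS / total_SS) is 87.0%, meaning that the three-cluster partition accounts for 87.0% of the total variation in the standardized data. 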
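The 87.0% figure presumably corresponds to the `between_SS / total_SS` line that `print()` shows for a `kmeans` object. A short sketch on synthetic data (not the study's data) shows how that ratio follows from the sums of squares stored in the fitted object: `totss` = `tot.withinss` + `betweenss`, and for standardized data `totss` equals (n − 1) × p.

```r
set.seed(5)
# Synthetic standardized data: 40 rows, 3 columns, two groups (not the study's data)
x <- scale(rbind(matrix(rnorm(60, mean = 0), ncol = 3),
                 matrix(rnorm(60, mean = 5), ncol = 3)))
fit <- kmeans(x, centers = 2, nstart = 25)

fit$withinss                        # within-cluster sum of squares for each cluster
ratio <- fit$betweenss / fit$totss  # the "between_SS / total_SS" printed by kmeans
sprintf("%.1f %%", 100 * ratio)
```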

In the fifth stage, the k-means results are visualized as a cluster plot (Figure 6), in which the three clusters are distinguished by color: red for cluster 1, blue for cluster 2, and green for cluster 3. Each cluster has a different number of members. The group profiles of the k-means clustering results are shown in Figure 7. 
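With nine standardized variables, a two-dimensional cluster plot needs a projection. According to the factoextra documentation, `fviz_cluster()` plots the observations on the first two principal components when the data have more than two variables. A base-R sketch of that projection, assuming synthetic data and a hypothetical helper `project_clusters`, is:

```r
# Hypothetical helper: coordinates on the first two principal components plus cluster labels
project_clusters <- function(z, cluster) {
  pc <- prcomp(z)  # z is assumed to be standardized already
  data.frame(PC1 = pc$x[, 1], PC2 = pc$x[, 2], cluster = factor(cluster))
}

set.seed(3)
# Synthetic standardized data with four variables and three groups (not the study's data)
z <- scale(rbind(matrix(rnorm(80, mean = 0), ncol = 4),
                 matrix(rnorm(80, mean = 4), ncol = 4),
                 matrix(rnorm(80, mean = 8), ncol = 4)))
fit <- kmeans(z, centers = 3, nstart = 25)
proj <- project_clusters(z, fit$cluster)
head(proj)
```

`plot(proj$PC1, proj$PC2, col = proj$cluster, pch = 19)` would then draw a colored scatter plot similar in spirit to Figure 6.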
 
 
Figure 7. Group profiling 

 
Based on these results, the last stage is profiling each group formed (Figure 7) (Chantaramanee et al., 2022; Lee & Chung, 2016). Cluster 1 has the lowest losses of personnel and war equipment compared with the other groups, Cluster 2 has very high losses, and Cluster 3 has moderate losses of personnel and war equipment in Ukraine. 
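The rule used to assign the lowest, moderate, and very high categories is not spelled out above. One possible rule, offered purely as a hypothetical sketch, is to compute each cluster's mean per attribute (the base-R counterpart of the dplyr profiling in Code 3), rank the clusters on every attribute, and order them by their average rank. The helper `profile_clusters` and the data are illustrative assumptions.

```r
# Hypothetical helper: per-cluster means plus a rank-based category label (assumes k = 3)
profile_clusters <- function(data, cluster,
                             labels = c("lowest", "moderate", "very high")) {
  prof <- aggregate(data, by = list(Cluster = cluster), FUN = mean)
  stopifnot(nrow(prof) == length(labels))
  # rank the clusters on each attribute (1 = smallest mean), then average the ranks
  avg_rank <- rowMeans(apply(prof[, -1, drop = FALSE], 2, rank))
  prof$Category <- labels[rank(avg_rank, ties.method = "first")]
  prof
}

set.seed(11)
# Synthetic losses for three groups of 10 days each (not the study's data)
losses <- data.frame(
  personnel = c(rnorm(10, 10000, 500), rnorm(10, 100000, 5000), rnorm(10, 50000, 2000)),
  tanks     = c(rnorm(10, 300, 20),    rnorm(10, 3000, 100),    rnorm(10, 1500, 80))
)
fit <- kmeans(scale(losses), centers = 3, nstart = 25)
prof <- profile_clusters(losses, fit$cluster)
prof
```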

 
CONCLUSIONS AND SUGGESTIONS 

 
Conclusion 

Based on the clustering analysis of the impact of Russia's war on Ukraine, losses occurred in both personnel and war equipment. Three methods for determining the optimal number of clusters were applied with k-means, of which two can be recommended: the elbow method, which produces K=3, and the Silhouette method, which also produces K=3. The profiling results show that the losses of personnel and war equipment in Ukraine fall into three clusters: cluster one is in the lowest category, cluster two in the very high category, and cluster three in the moderate category. 

 
Suggestion 

To broaden the scope of further research and make it more objective, future studies should also use datasets originating from Russia. In addition, other techniques can be combined with this approach to find a more optimal k value, or the results can be compared with those of other clustering methods. 
 

REFERENCES 
 
Chantaramanee, A., Nakagawa, K., Yoshimi, K.,
Nakane, A., Yamaguchi, K., & Tohara, H.
(2022). Comparison of Tongue
Characteristics Classified According to
Ultrasonographic Features Using a K-Means
Clustering Algorithm. Diagnostics, 12(2).
https://doi.org/10.3390/diagnostics12020264

Cohn, R., & Holm, E. (2021). Unsupervised Machine 
Learning Via Transfer Learning and k-Means 
Clustering to Classify Materials Image Data. 
Integrating Materials and Manufacturing 
Innovation, 10(2), 231–244. 
https://doi.org/10.1007/s40192-021-00205-8

Darmayadi, A., & Megits, N. (2023). The Impact of the
Russia-Ukraine War on the European Union
Economy. Journal of Eastern European and
Central Asian Research, 10(1), 46–55.
https://doi.org/10.15549/jeecar.v10i1.1079

Dmitry, N., & Yerkebulan, B. (2022). Clustering of
Dark Patterns in the User Interfaces of
Websites and Online Trading Portals
(E-Commerce). Mathematics, 10(18).
https://doi.org/10.3390/math10183219

Govender, P., & Sivakumar, V. (2020). Application of 
k-means and hierarchical clustering 
techniques for analysis of air pollution: A 
review (1980–2019). In Atmospheric Pollution 
Research (Vol. 11). Turkish National 
Committee for Air Pollution Research and 
Control. 
https://doi.org/10.1016/j.apr.2019.09.009 

Haque, U., Naeem, A., Wang, S., Espinoza, J., 
Holovanova, I., Gutor, T., … Nguyen, U. S. D. T. 
(2022). The human toll and humanitarian 
crisis of the Russia-Ukraine war: the first 162 
days. BMJ Global Health, 7(9), 1–11. 
https://doi.org/10.1136/bmjgh-2022-009550

Kaparang, D. R., Moningkey, M. J. M., & Sumual, H. 
(2021). The Distribution Pattern Of New 
Students Admissions Using The K-Means 
Clustering Algorithm. International Journal of 
Information Technology and Business, 3(2), 
52–60. Retrieved from 
https://ejournal.uksw.edu/ijiteb/article/view/4632

Lee, C., & Chung, M. (2016). Digital Forensic for 
Location Information using Hierarchical 
Clustering and k-means Algorithm. Journal of 
Korea Multimedia Society, 19(1), 30–40. 
https://doi.org/10.9717/kmms.2016.19.1.030

Mailund, T. (2017). Beginning Data Science in R.
https://doi.org/10.1007/978-1-4842-2671-1

Nerlinger, M., & Utz, S. (2022). The impact of the 
Russia-Ukraine conflict on energy firms: A 
capital market perspective. Finance Research 
Letters, 50(May), 103243. 
https://doi.org/10.1016/j.frl.2022.103243 

Osokina, O., Silwal, S., Bohdanova, T., Hodes, M., 
Sourander, A., & Skokauskas, N. (2022). 
Impact of the Russian Invasion on Mental 
Health of Adolescents in Ukraine. Journal of 
the American Academy of Child and Adolescent 
Psychiatry, 1–9. 
https://doi.org/10.1016/j.jaac.2022.07.845 

Paul, A. (2015). The EU in the South Caucasus and 
the Impact of the Russia-Ukraine War. 
International Spectator, 50(3), 30–42. 
https://doi.org/10.1080/03932729.2015.1054223


Peng, R. D. (2015). R Programming for Data Science. 
The R Project; R Foundation, 132. 
https://doi.org/10.1073/pnas.0703993104 

Shelly, Z., Burch, R. F. V., Tian, W., Strawderman, L., 
Piroli, A., & Bichey, C. (2020). Using K-means 
clustering to create training groups for elite 
american football student-athletes based on 
game demands. International Journal of 
Kinesiology and Sports Science, 8(2), 47–63. 
https://doi.org/10.7575//aiac.ijkss.v.8n.2p.47

Sinaga, K. P., & Yang, M. S. (2020). Unsupervised
K-means clustering algorithm. IEEE Access, 8,
80716–80727.
https://doi.org/10.1109/ACCESS.2020.2988796

Yuan, C., & Yang, H. (2019). Research on K-Value 
Selection Method of K-Means Clustering 
Algorithm. J, 2(2), 226–235. 
https://doi.org/10.3390/j2020016 

Zhu, C., Idemudia, C. U., & Feng, W. (2019). 
Improved logistic regression model for 
diabetes prediction by integrating PCA and K-
means techniques. Informatics in Medicine 
Unlocked, 17(January), 100179. 
https://doi.org/10.1016/j.imu.2019.100179 

 