

Engineering, Technology & Applied Science Research Vol. 11, No. 3, 2021, 7262-7272 7262 
 

www.etasr.com Koklonis et al.: Utilization of Machine Learning in Supporting Occupational Safety and Health … 

 
Utilization of Machine Learning in Supporting Occupational Safety and Health Decisions in Hospital Workplace
 

Kyriakos Koklonis 

Biomedical Engineering Laboratory 
National Technical University of Athens 

Athens, Greece 

kkoklonis@biomed.ntua.gr  

Michail Sarafidis 

Biomedical Engineering Laboratory 
National Technical University of Athens 

Athens, Greece 

msarafidis@biomed.ntua.gr  

Maria Vastardi 

Metaxa Cancer Hospital of Piraeus 

Piraeus, Greece 
mariavastardi@yahoo.gr  

Dimitrios Koutsouris 

Biomedical Engineering Laboratory 

National Technical University of Athens 
Athens, Greece 

dkoutsou@biomed.ntua.gr  
 

Abstract-The prediction of possible future incidents or accidents
and the efficiency assessment of Occupational Safety and Health (OSH)
interventions are essential for the effective protection of healthcare
workers, as the occupational risks in their workplace are multiple and
diverse. Machine learning algorithms have been utilized for classifying
post-incident and post-accident data into the following 5 classes of
events: Needlestick/Cut, Falling, Incident, Accident, and Safety. A
total of 476 event reports from Metaxa Cancer Hospital (Greece),
covering 2014-2019, were used to train the machine learning models. The
developed models showed high predictive performance, with an area under
the curve in the range 0.950-0.990 and an average accuracy of 93% on
the 10-fold cross set, compared to the safety engineer’s study reports.
The proposed Decision Support System (DSS) model can contribute to the
prediction of incidents or accidents and to the efficiency evaluation
of OSH interventions.

Keywords-occupational health and safety; OSH; machine learning;
hospital workplace

I. INTRODUCTION  

Occupational accidents and diseases result in an additional 
economic burden on public social security agencies. These 
losses are estimated at 3.9% of the global Gross Domestic
Product (GDP) and 3.3% of the European GDP, as reported by
the European Agency for Safety and Health at Work (EU-OSHA)
(with variance according to working fields, legislative context,
and prevention incentives) [1]. According to the EU-OSHA, proper
management of Occupational Safety and Health (OSH) leads to
numerous benefits, such as reduced absenteeism, lower costs, and
improved efficiency of working methods and technologies [2].
This necessity becomes even more imperative during periods of
economic recession, since poor OSH management wastes valuable
resources. In the healthcare sector in particular, the hospital
workplace is characterized by an increased level and diversity of
occupational
risks. In addition, the coronavirus disease 2019 (COVID-19) 
pandemic and its immediate aftermath pose a significant 
burden of workload and a major strain on mental health of 
healthcare workers [3]. 

Health professionals constitute a significant part of the total 
European and Greek workforce. According to a relevant study 
of the Greek Labor Inspectorate Body in 2015, 10% of workers 
in the European Union (EU) belong to the healthcare sector [4]. 
More specifically, according to the Hellenic Statistical
Authority, the number of Greek hospitals amounted to 283
during the 2015-2018 period. Moreover, their staff, during
2017, consisted of 38,952 nursing, 23,354 medical, 6,044
auxiliary nursing, and 7,752 paramedical staff [5]. In the Greek
healthcare sector, a high percentage of work accidents (34%
higher than the corresponding EU average, according to the
EU-OSHA) and an increased incidence of musculoskeletal
disorders (second highest after the construction sector) are
recorded. According to data from the Hellenic Statistical
Authority, 3.2% of all officially recorded work accidents in 2018
were related to the health and social care sector [6].
Concurrently, public health funding has been reduced
dramatically, by up to 60% over the last three years [4].

Computational intelligence and Machine Learning (ML) 
approaches can be utilized towards an improved management 
of OSH. Their potential in occupational accident analysis has
been highlighted by several studies, which are
reviewed in [7]. For example, statistical methods have been
used for analyzing health and safety issues of women in 
industry [8], the OSH risk in fuel stations has been assessed 
through Failure Mode and Effects Analysis (FMEA) method 

Corresponding author: Kyriakos Koklonis


[9], Naive Bayesian (NB) model has been used for coding 
causation of workers’ compensation claims [10], Bayesian 
Networks (BNs) have been used for analyzing data on 
occupational accidents [11–13], and decision trees have been 
used in the industrial mining sector for predicting the type of 
accident [14]. Artificial Neural Networks (ANNs) have been 
used to correlate causes and OSH conditions [15–19] and they 
have been combined with Support Vector Machines (SVMs) 
for the prediction of occupational accidents [20]. In addition, 
Analytical Hierarchical Process (AHP) has been used in order 
to analyze, control, and provide the occupational risk in the 
industrial mining sector [21], and fuzzy logic has been used for 
occupational risk assessment [22]. The self-organizing map and 
k-means clustering (SOM k-means) methods have been used 
for identifying the dynamics of critical accidents [23]. In [24], 
body-mounted sensors were combined with ML techniques to 
evaluate the ergonomic risk levels caused by overexertion. The 
performance of ML algorithms in classifying post-incident 
outcomes of occupational injuries in agro-manufacturing 
operations has been evaluated in [25]. The minimization of the 
undesirable adverse effects in the development and 
implementation of ML-based Decision Support Systems 
(DSSs) was the objective in [26]. 

Particularly in the healthcare sector, ML has been utilized 
for targeting specific hazards. For instance, ANNs have been 
used to approach the burnout process [27] and statistical 
analysis has been employed to identify factors of 
musculoskeletal disorders in nursing staff [28] and to improve 
the management of indoor air quality in the hospital workplace 
[29]. Correlation methods have been used to assess the Quality 
of Work-Life (QWL) for the nursing profession in China [30]. 
In a more generalized approach, a tool for estimating OSH 
conditions through factor analysis has been evaluated in [31]. 
Factor analysis has also been used for the assessment of 
workplace violence towards healthcare staff in public hospitals 
in Turkey [32]. In addition, a fuzzy logic model for OSH risk 
assessment has been proposed in [33] and a fuzzy cognitive 
map approach for the impact assessment of processes on the 
OSH management system’s effectiveness has been used in 
[34]. Ambulatory monitoring devices, such as wearable and 
environmental sensors, have been combined with ML 
algorithms for a better understanding of stress evolution in 
healthcare workers [35]. 

Based on the above concise literature review, there is
promising research activity in capitalizing on ML methods for
better implementing effective preventive measures and policies
in the OSH framework. However, most of the published studies are
based on questionnaires to OSH experts [12, 13, 36], instead of
supporting immediate incident or accident investigations and
following up the efficiency of the proposed interventions. To
the best of our knowledge, a DSS which utilizes the legally 
established safety engineer investigations and their audit 
reports has not been reported in the literature. The present 
study proposes a DSS model which integrates all incident or 
accident data along with the safety engineer investigations and 
audit reports in the hospital workplace. Through the input data 
utilization, our DSS aims at supporting OSH decisions by 
predicting future possible incidents or accidents and by 
assessing the efficiency of OSH interventions. 

II. MATERIALS AND METHODS 

A. Data 

In order to train and test the ML models, 476 event reports 
from Metaxa Cancer Hospital (reference period from January 
1, 2014 to December 31, 2019) were collected and utilized. The 
reports were categorized into 5 classes depending on the 
occurred events: 136 reports of "Needlestick /Cut" injuries, 59 
reports of various "Incidents", 20 reports of "Falling" injuries, 
23 reports of "Accidents" and 238 reports of "Safety" 
conditions. One event report consists of multiple variables 
derived from various reports, as detailed below. Reports 
including undone corrective actions or incomplete data, during 
the reference period, were not included in the dataset. The 
worker population of the 476 records included nursing staff 
(52.52%), medical staff (13.45%), cleaning staff (10.50%), 
laboratory staff (9.24%), administrative staff (5.46%), technical 
staff and workers (4.62%), and auxiliary staff (4.20%). The 
mean age of the worker population of the records was 46.61 
years and the mean experience was 16.98 years. Our study has 
been approved by the Scientific and Administrative Council of 
the Metaxa Hospital. Under the Regulation (EU) 2016/679 of 
the European Parliament (General Data Protection Regulation – 
GDPR) on the protection of natural persons [37], 
pseudonymization was used during the coding and data entry 
process.  

B. Software 

The Waikato Environment for Knowledge Analysis (Weka), version
3.8.3, was used for the data analysis and predictive
modeling of the system [38]. This software was chosen
because of its wide application in data mining and ML.
Moreover, it is free and publicly available under the
General Public License, it includes several data mining and ML
methods, libraries, and related supporting tools and procedures,
it is compatible with various hardware and software platforms,
and it includes a graphical user interface, which makes it usable
even without programming knowledge.

C. Risk Assessment Method 

A very common risk assessment method is the one which is 
consistent with the Memorandum on Occupational Risk 
Assessment by Directorate-General for Employment in Labor 
Relations and Social Affairs (DG V) of the EU (1997) [39]. 
This methodology has also been recommended by the 
Technical Chamber of Greece in a relevant conference [40]. In 
addition, it is widely used by safety engineers and OSH 
auditors. This risk assessment method was used in the proposed 
model. Its approach is based on defining a risk value as a 
product of three factors: the seriousness of the consequences of 
the potential hazard (values 1, 4, 8, 16), the frequency of 
exposure to the hazard (values 1, 2, 3, 4), and the probability of 
occurrence of this hazard (values 1, 2, 3, 4), according to the 
principle of reasonable ambiguity. Therefore, the risk value (R) 
lies within the range of 1 to 256. The usage of these three 
factors is in accordance with relevant legal requirements 
according to Greek Law (Law 3850/2010, Article 43). These 
factors are presented in Table I. The risk values and their 
correlation with the necessity of taking actions are presented in 
Table II. 


TABLE I.  RISK – RATING FACTORS

| Factor | Rating | Interpretation |
|---|---|---|
| Seriousness | 1 (Negligible) | Minor injury without work absence |
| Seriousness | 4 (Middle) | Injury or disease with work absence |
| Seriousness | 8 (Critical) | Major injury or disease with the possibility of permanent health damage |
| Seriousness | 16 (Disastrous) | Death |
| Frequency | 1 (Zero) | An employee is exposed to the hazard once a year or more rarely |
| Frequency | 2 (Limited) | An employee is exposed to the hazard up to once a week |
| Frequency | 3 (Often) | An employee is exposed to the hazard daily |
| Frequency | 4 (Continuous) | An employee is exposed to the hazard during his/her whole work time |
| Probability | 1 (Zero) | Impossible to happen |
| Probability | 2 (Low) | It can happen |
| Probability | 3 (Middle) | Possible to happen |
| Probability | 4 (High) | About to happen |

TABLE II.  RISK RATING

| Risk value R | Risk description | Measures - actions |
|---|---|---|
| R<16 | Negligible: Risk is very low and may not increase during the near future without changing working conditions | Measures – corrective actions are not necessary |
| 16<R<32 | Low: Risk is under control, but the incident is possible | Need for monitoring and actions for risk reduction. Measures – corrective actions are not an urgent need |
| 32<R<64 | Middle: Risk is not effectively controlled. An incident cannot be excluded | Need for programming measures – corrective actions for risk reduction |
| 64<R<128 | High: Risk is not effectively controlled. A serious incident is possible | Need for programming measures – corrective actions for risk elimination and urgent measures – corrective actions for risk reduction |
| R>128 | Critical: Possibility of human loss. A serious incident is about to happen | Need for urgent measures – corrective actions for risk elimination |
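The risk value described above is the product of the three Table I ratings, so it can be illustrated with a few lines of Python. The sketch below is only an illustration of that arithmetic and of the Table II bands, not part of the authors' system. The function names are hypothetical, and one assumption is needed: Table II uses strict inequalities, so the boundary values 16, 32, 64, and 128 are not assigned to any band; here each boundary value is placed in the higher band.

```python
from itertools import product

# Admissible ratings, taken from Table I.
SERIOUSNESS = (1, 4, 8, 16)   # Negligible, Middle, Critical, Disastrous
FREQUENCY = (1, 2, 3, 4)      # Zero, Limited, Often, Continuous
PROBABILITY = (1, 2, 3, 4)    # Zero, Low, Middle, High


def risk_value(seriousness: int, frequency: int, probability: int) -> int:
    """Return R = seriousness * frequency * probability after validating the ratings."""
    if seriousness not in SERIOUSNESS:
        raise ValueError(f"invalid seriousness rating: {seriousness}")
    if frequency not in FREQUENCY:
        raise ValueError(f"invalid frequency rating: {frequency}")
    if probability not in PROBABILITY:
        raise ValueError(f"invalid probability rating: {probability}")
    return seriousness * frequency * probability


def risk_band(r: int) -> str:
    """Map a risk value to a Table II band.

    ASSUMPTION: boundary values (16, 32, 64, 128) go to the higher band,
    because Table II leaves them unassigned.
    """
    if r < 16:
        return "Negligible"
    if r < 32:
        return "Low"
    if r < 64:
        return "Middle"
    if r < 128:
        return "High"
    return "Critical"


# Every admissible combination of ratings, used to confirm the stated range of R.
ALL_RISK_VALUES = sorted(
    {risk_value(s, f, p) for s, f, p in product(SERIOUSNESS, FREQUENCY, PROBABILITY)}
)
```

For instance, a critical hazard (8) with daily exposure (3) and low probability (2) gives R = 48, which falls in the "Middle" band under the boundary assumption stated above.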

 
D. DSS Model Information Flow 

The OSH related events are categorized into two main
categories: incidents and accidents. The accidents, according to
Greek Law, are all the events that result in damage, injury, or
harm and require medical leave of at least two days and relevant
notification to the Labor Inspectorate Body. The incidents
(near-misses) are less serious events that have unintentionally
happened and are linked to absence from work only on the
specific day of their occurrence. The procedure that is followed
after an occupational incident or accident in the hospital
workplace is as follows. First, the supervisor of the relevant
division and the responsible desk officer are alerted. The
update of the associated data files follows, according to the 
internal process of the hospital and the relevant legal 
requirements (Law 3850/2010, Article 2, Paragraph 2). The 
issue is registered in the book of incidents, the book of 
accidents, the incident – accident report, the record of exposure 
to biological agents at work, the report to the Social Insurance 
Institute, the police and the electronic notification to the Labor 
Inspectorate Body. Then, the safety engineer in charge
investigates the incident or accident in order to identify its
causes and propose interventions, by completing risk
assessment tables manually on printed forms (Law
3850/2010, Articles 14, 15). The safety engineer’s actions aim
at ensuring that repetition of the incident or the accident could 
be avoided in the future. The manually filled tables are in 
accordance with the Methodological guide for the assessment 
and prevention of occupational risks of the Hellenic Institute 
for Occupational Health and Safety [41] regarding the coding 
of hazards, and with the method described in the previous 
section [39] regarding the risk values. The safety engineer 
cooperates with the occupational doctor, when necessary.  

The above-mentioned data for every incident and accident 
are used as an input to the proposed DSS. For the efficient 
management of this information, the basic idea was to organize 
and incorporate the data into a database, which makes the 
information more usable and manageable in comparison with 
the printed forms. The information flow diagram of the 
proposed DSS is depicted in Figure 1. There are three major 
blocks in this diagram. Firstly, the block of recording and 
notifying incidents – accidents (upper left) which describes the 
process that the responsible personnel follow to record every 
incident and accident in the corresponding books, to fill in the 
incident – accident report and notify the authorities. Secondly, 
the block of risk assessment and investigation (lower left) 
which describes the process that the safety engineer and 
occupational doctor follow to identify OSH hazards, to express 
the corresponding risk values and to associate these hazards 
with every incident and accident through investigation and 
estimation of OSH interventions. These two blocks are the 
inputs to the proposed DSS model. The output is the third 
block which describes the results of the classification process, 
according to the trained models. The decision is based on the 
risk values of the different variables. The input of the model 
includes two sets of variables. The first set consists of 6
variables (F1 – F6) from the first block, which are related to the
employees. Their values are taken from the relevant paper-based
incident or accident report. The second set consists of 77
variables (F7 – F83) from the second block. All 83 variables are
summarized in Table III. The selection of these variables was
based on [41]. The main categories of the second set are
workplace-related variables concerning the building
infrastructure, machinery and equipment, electrical installation,
hazardous substances, fire, chemical, physical, and biological
factors, and transversal and organizational risks.

Each occupational event corresponds to one instance and 
was categorized into one of the following 4 classes: 
"Needlestick/Cut", "Falling", "Incident" and "Accident" 
according to their type. The class of "Incident" was divided into 
three different subclasses because of the frequency distribution 
of incidents in the hospital workplace. Needlesticks/cuts and 
falling events present the highest frequency, which is in 
accordance with [4]. For this reason, two separate classes were
used for these kinds of incidents, "Needlestick/Cut" and "Falling",
respectively. The class named "Incident" was used to include 
all other kinds of incidents in the hospital workplace. The class 
"Accident" was used to include all kinds of accidents. The 
values for these variables express the OSH conditions and 
hazards that are related to the incident or accident, as they have 
been identified by the safety engineer. These values were
recorded according to the safety engineer’s reports and risk
assessment tables after investigation of the corresponding 
incident or accident. Finally, after the completion of the 
corrective actions, all the above instances were assigned with 
updated risk values and were categorized into the class 
"Safety". Each new instance of "Safety" originated from one 
instance of "Needlestick/Cut" or "Falling" or "Incident" or 
"Accident". In this way, our dataset consists of 238 
incident/accident instances and 238 safety instances (as 
described in section "Data"). More specifically, the values for 
the instances of "Safety" reflect the proposed measures or
interventions that reduce risk to acceptable levels (R≤32 for
accidents and R≤24 for incidents, according to the relevant
methodology). These values were recorded according to the
safety engineer's updated post-investigation reports and risk
assessment tables, up to two months after every incident or
accident. During this period, the corrective actions had been
completed and confirmed according to the safety engineer’s 
proposed measures. The output of the system is one of the 
above-mentioned 5 classes (Y1 – Y5) and expresses the type of 
the event that certain occupational conditions could lead to 
(Table III). As the classes were known a priori, our
model belongs to the family of supervised learning algorithms.
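The pairing of each incident or accident instance with a later "Safety" instance can be sketched in Python as follows. This is a hedged illustration, not the authors' data pipeline: the dictionary layout and the function names are hypothetical. Two assumptions are made and marked in the code: "Needlestick/Cut" and "Falling" are treated as incidents for the purpose of the acceptable level (R≤24), and the acceptable level is checked for every risk variable separately.

```python
from typing import Dict

# Acceptable residual risk per event class (R<=32 for accidents, R<=24 for incidents).
# ASSUMPTION: "Needlestick/Cut" and "Falling" are kinds of incidents, so they
# share the incident limit.
ACCEPTABLE_MAX: Dict[str, int] = {
    "Accident": 32,
    "Incident": 24,
    "Needlestick/Cut": 24,
    "Falling": 24,
}


def is_acceptable(event_class: str, risks: Dict[str, int]) -> bool:
    """Check that every risk value is within the acceptable level for the class.

    ASSUMPTION: the limit applies to each risk variable (F7..F83) separately.
    """
    limit = ACCEPTABLE_MAX[event_class]
    return all(value <= limit for value in risks.values())


def make_safety_instance(event: dict, updated_risks: Dict[str, int]) -> dict:
    """Build the "Safety" instance that originates from one event instance.

    The employee variables are copied; the risk variables are replaced by the
    values recorded after the corrective actions were completed.
    """
    if not is_acceptable(event["label"], updated_risks):
        raise ValueError("updated risk values exceed the acceptable level")
    return {
        "employee": dict(event["employee"]),
        "risks": dict(updated_risks),
        "label": "Safety",
    }
```

Applied to all 238 incident and accident instances, this kind of pairing yields the 238 "Safety" instances, which is why the dataset contains 476 instances in total.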

TABLE III.  VARIABLES OF THE PROPOSED MODEL

| Category | Variable | Code | Type | Values |
|---|---|---|---|---|
| Employee variables | Age | F1 | Numeric | Years |
| | Experience | F2 | Numeric | Years |
| | Training | F3 | Nominal | Yes, No |
| | Repetitive training every six months at least | F4 | Nominal | Yes, No |
| | Relative PPE during the incident/accident | F5 | Nominal | Yes, No |
| | Satisfaction for OSH climate | F6 | Numeric | 1-5 |
| Workplace variables related to infrastructural building | Floors/stairs | F7 | Numeric | 1-256 |
| | Workplace space | F8 | Numeric | 1-256 |
| | Workplace height | F9 | Numeric | 1-256 |
| | Workplace volume | F10 | Numeric | 1-256 |
| | Doors/windows | F11 | Numeric | 1-256 |
| | Space illumination | F12 | Numeric | 1-256 |
| | Storage/lofts | F13 | Numeric | 1-256 |
| | Uncovered gaps | F14 | Numeric | 1-256 |
| | Obstacles | F15 | Numeric | 1-256 |
| | Exits/escape routes | F16 | Numeric | 1-256 |
| | Walls | F17 | Numeric | 1-256 |
| | Shelves/dexions | F18 | Numeric | 1-256 |
| | Ceilings | F19 | Numeric | 1-256 |
| | Basements | F20 | Numeric | 1-256 |
| | Walkways | F21 | Numeric | 1-256 |
| | Roof | F22 | Numeric | 1-256 |
| | Cleanliness | F23 | Numeric | 1-256 |
| | Lack of access to exits or fire-extinguishing system | F24 | Numeric | 1-256 |
| | Safety/escape signing | F25 | Numeric | 1-256 |
| Workplace variables related to machinery – equipment | Maintenance protocol | F26 | Numeric | 1-256 |
| | Lack of safety during usage | F27 | Numeric | 1-256 |
| | Guards for avoiding the accidental start | F28 | Numeric | 1-256 |
| | Guards from moving parts | F29 | Numeric | 1-256 |
| | Ejectable particles | F30 | Numeric | 1-256 |
| | CE signing | F31 | Numeric | 1-256 |
| | Cutting works | F32 | Numeric | 1-256 |
| | Lifting machinery | F33 | Numeric | 1-256 |
| | Transport vehicles | F34 | Numeric | 1-256 |
| | Ladders | F35 | Numeric | 1-256 |
| | Pneumatic tools | F36 | Numeric | 1-256 |
| | Elevators | F37 | Numeric | 1-256 |
| | Other machinery | F38 | Numeric | 1-256 |
| | Non-usage of PPE | F39 | Numeric | 1-256 |
| | Pressure devices | F40 | Numeric | 1-256 |
| | Access to facilities or equipment | F41 | Numeric | 1-256 |
| Workplace variables related to electrical installation | Electrical installation | F42 | Numeric | 1-256 |
| | Inappropriate usage of electrical installation | F43 | Numeric | 1-256 |
| | Inappropriateness in explosive atmospheres | F44 | Numeric | 1-256 |
| | Lack of safety during usage of electrical installation | F45 | Numeric | 1-256 |
| | Lack of safety during maintenance of the electrical installation | F46 | Numeric | 1-256 |
| | Hazardous substances related to equipment (generators, batteries, etc.) | F47 | Numeric | 1-256 |
| Workplace variables related to hazardous substances | Toxic substances | F48 | Numeric | 1-256 |
| | Caustic substances | F49 | Numeric | 1-256 |
| | Corrosive substances | F50 | Numeric | 1-256 |
| | Irritant substances | F51 | Numeric | 1-256 |
| | Oxidizing substances | F52 | Numeric | 1-256 |
| | Explosive substances | F53 | Numeric | 1-256 |
| | Flammable material | F54 | Numeric | 1-256 |
| | Appropriate cabinets | F55 | Numeric | 1-256 |
| Workplace variables related to fire | Fire signing | F56 | Numeric | 1-256 |
| | Banning smoking | F57 | Numeric | 1-256 |
| | Banning flame | F58 | Numeric | 1-256 |
| | Storage of flammable material | F59 | Numeric | 1-256 |
| | Lack of fire protection systems | F60 | Numeric | 1-256 |
| | Training in fire emergency plan | F61 | Numeric | 1-256 |
| | Fire extinguishers | F62 | Numeric | 1-256 |
| | Lack of training in fire security | F63 | Numeric | 1-256 |
| Workplace variables related to chemical factors | Dust | F64 | Numeric | 1-256 |
| | Asbestos fibres | F65 | Numeric | 1-256 |
| | Smoke/steam | F66 | Numeric | 1-256 |
| | Particles | F67 | Numeric | 1-256 |
| | Other substances | F68 | Numeric | 1-256 |
| | Dipping/splashing | F69 | Numeric | 1-256 |
| Workplace variables related to physical factors | Noise | F70 | Numeric | 1-256 |
| | Vibrations | F71 | Numeric | 1-256 |
| | Radiation | F72 | Numeric | 1-256 |
| | Illumination | F73 | Numeric | 1-256 |
| | Microclimate | F74 | Numeric | 1-256 |
| Workplace variables related to biological factors | Bacteria | F75 | Numeric | 1-256 |
| | Fungi | F76 | Numeric | 1-256 |
| | Viruses | F77 | Numeric | 1-256 |
| | Other factors | F78 | Numeric | 1-256 |
| Workplace variables related to transversal risks | Settlement - manual handling | F79 | Numeric | 1-256 |
| | Psychological factors | F80 | Numeric | 1-256 |
| | Provision - intervention programs | F81 | Numeric | 1-256 |
| | Ergonomics | F82 | Numeric | 1-256 |
| | Hardship conditions | F83 | Numeric | 1-256 |
| Output | Falling | Y1 | Nominal | Yes, No |
| | Needlestick/cut | Y2 | Nominal | Yes, No |
| | Incident | Y3 | Nominal | Yes, No |
| | Accident | Y4 | Nominal | Yes, No |
| | Safety | Y5 | Nominal | Yes, No |


Fig. 1.  The information flow diagram of the proposed model. 

E. Prediction Models and Classification 

ML is concerned with the development of intelligent 
systems that can learn from data. These systems can be used to 
classify an object into a correct class based on features 
characterizing the object. Actually, the algorithms are 
searching for patterns in data. The ML algorithms that 
implement pattern classification are known as classifiers. There 
are several classifiers, each one of them having certain 
limitations and advantages [42]. In this study, in order to 
develop the proposed DSS model, we employed and tested 4
classifiers: the Naive Bayesian (NB) classifier [43], the
Bayesian Network (BN), the k-Nearest Neighbors (k-NN)
classifier, and the Multilayer Perceptron (MLP) network. The
choice was based on the fact that the use of conditional 
probabilities offers a practical metric as it can model 
occupational health in a class conditional way. This 
formulation helps us elucidate class conditional variables of 
interest. 

NB classification is one of the most popular data mining 
algorithms. Its efficiency comes from the assumption of 
attribute independence [40, 41]. It is a simple probabilistic 
classifier based on Bayes’ theorem and it uses the method of
maximum likelihood. It can work in complex real-world
situations and it requires a small amount of training data [42, 
43]. BN is an effective tool to demonstrate complicated 
relationships and a powerful tool for knowledge representation 
and reasoning. It is also suitable for small and incomplete 
datasets. It is particularly convenient for modeling causal 
relationships among variables (with a combination of different 
sources of information) and handling of uncertainty for 
decision analysis [13]. It is a model of probabilistic inference
defined by a triplet (X, G, P), where X = (X1, X2, ⋯, Xn) is the
set of factors, G is a directed acyclic graph, and P is a joint
probability distribution. The nodes of G are labeled with the
elements of X and the arcs of G represent the conditional
dependence relationships between nodes. Each
factor is associated with a conditional probability table that
defines the probability of each state of that factor [10, 44]. The
k-NN is a simple and intuitive classifier. It classifies a sample 
based on the k-closest training samples in the feature space, by 
computing the distances between the new sample and the 
samples of the training set. According to these distances it 
classifies the sample to the class of the closest training samples 
(neighbors) [39, 45]. An MLP is a class of feedforward ANN 
and one of the most widely used. It consists of the input layer, 
which receives external inputs, one or more hidden layers, and 
an output layer. Each layer includes one or more neurons 
linked between successive layers. The Back-Propagation (BP) 
algorithm was used for the training of the model. In this 
technique, information about the errors of the network on 
known data is propagated backwards, the connections are 
adjusted, and the error is minimized [49]. All Weka parameters 
for each of the above models are summarized in the Appendix. 
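As an illustration of the k-NN rule described above, the following minimal Python sketch classifies a sample by majority vote among its k closest training samples, using Euclidean distance. It is not the Weka implementation used in the study, and its parameters do not reproduce the settings in the Appendix. The function name `knn_predict` and the two-feature toy data are hypothetical and serve only to show the mechanism.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    sample: Sequence[float],
    k: int = 3,
) -> str:
    """Return the majority class among the k training samples closest to `sample`."""
    # Distance from the new sample to every training sample, paired with its class.
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )
    # Majority vote over the k nearest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two risk values per instance (not taken from the study).
TOY_X: List[List[float]] = [[4, 4], [8, 6], [64, 48], [96, 64]]
TOY_Y: List[str] = ["Safety", "Safety", "Accident", "Accident"]
```

With the toy data, a sample with low risk values such as `[6, 5]` has two "Safety" neighbors among its three nearest and is therefore assigned to "Safety".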

The classifiers were designed to classify the cases into one 
of the afore-mentioned 5 classes. Each classifier takes as inputs 
the data from the reports of each event and outputs its 
classification group, providing in this way a prediction 
regarding the actual occupational risk of hazards of each 
hospital worker in their certain hospital workplace, and under 
certain OSH conditions. 

F. Training, Validation, and Performance Evaluation 

The train/test split method [50] was used as a validation 
technique for training. The available dataset was divided into 2 
subsets: the training set (80% of cases - 379 cases) which was 
used to train the classifiers, and the test set (20% of cases - 97 
cases) which was used to evaluate their predictive performance 
(previously unseen data). The 2 sets were properly stratified so 
that the distribution of classes in each subset is approximately
the same as that in the initial dataset. Thus, each subset
contains representative samples of the same larger population. 
To minimize sampling bias, k-fold cross-validation (k=10) was also
applied to the whole dataset. In this technique, the dataset is
divided into k folds of approximately equal size; k-1 folds are
used for training and one fold is used for testing. The process is
repeated k times, so that each instance is used once as a
member of the test set and k-1 times as a member of the
training set [48, 49]. Data entry was done in a spreadsheet file,
since this format can be easily used in daily OSH practice by
the safety engineer and hospital staff. Files were saved in .csv
format and were converted to the .arff format so that they could
be loaded and processed in Weka. Because incidents and
accidents are stochastic events that differ widely in their
details, a fixed risk assessment methodology was used, including
a fixed set of risk variables that should be investigated in every
case. Accordingly, no feature selection method was
applied.
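The k-fold procedure described above can be sketched in plain Python. This is an illustrative sketch rather than the Weka routine used in the study. The function name `stratified_k_fold` is hypothetical, and stratifying the folds by class is an assumption carried over from the description of the train/test split. The class counts are those reported in the "Data" section.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[str], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds.

    Instances of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps approximately the class distribution of the dataset.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, index in enumerate(indices):
            folds[(offset + i) % k].append(index)
        offset += len(indices)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train_idx, test_idx


# Class counts reported in the "Data" section (476 instances in total).
CLASS_COUNTS = {
    "Needlestick/Cut": 136,
    "Incident": 59,
    "Falling": 20,
    "Accident": 23,
    "Safety": 238,
}
LABELS = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
```

Iterating over `stratified_k_fold(LABELS)` gives ten train/test partitions in which every instance appears in exactly one test fold and in the training set of the other nine.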

In order to evaluate the performance of the proposed DSS 
model compared to the safety engineer’s reports involved in 
this study, we calculated the recall (sensitivity), and the Area 
Under the Receiver Operating Characteristic (ROC) Curve of 
the models on the basis of detecting the 5 output classes. The 
ROC curve is a standard validation approach for probabilistic 
classifiers [50, 51]. A ROC curve is a two-dimensional graph in 
which the True Positive Rate (TPR) (sensitivity or recall) is 
plotted on the y-axis and the False Positive Rate (FPR or 1 - 
specificity) is plotted on the x-axis, as a function of the 
threshold u, ranging from 0 to 1 [55]. The Area Under the 
Curve (AUC) can be interpreted as a measure of overall 
accuracy, varying from 0.5 (random guess) to 1 (perfect 
performance) [56]. The recall metric answers the question of 
what proportion of actual positives is correctly classified. Its 
choice was based on the fact that recall measure penalizes false 
negatives but not false positives [57]. In an OSH approach, we 
want to capture as many positives as possible (we want to 
capture any contingent incident or accident even if we are not 
totally sure). These metrics can be expressed according to the 
number of True Positives (TP), True Negatives (TN), False 
Positives (FP), and False Negatives (FN) classifications 
through the following equations [58]:  

Recall = Sensitivity = TPR = TP/(TP+FN)    (1) 

1 – Specificity = FPR = FP/(TN+FP)    (2) 
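Equations (1) and (2) can be used directly to trace a ROC curve for one class against the rest and to estimate its AUC with the trapezoidal rule. The Python sketch below is a minimal illustration under stated assumptions, not the Weka computation reported in this study: the function names and the example scores are hypothetical, and the input must contain at least one positive and one negative instance.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], is_positive: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by sweeping the threshold u over the scores.

    An instance is predicted positive when its score is >= u.
    """
    positives = sum(is_positive)
    negatives = len(is_positive) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for u in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_positive) if s >= u and p)
        fp = sum(1 for s, p in zip(scores, is_positive) if s >= u and not p)
        # Equation (1): TPR = TP/(TP+FN); equation (2): FPR = FP/(TN+FP).
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical class-membership scores for six instances (not from the study).
EXAMPLE_SCORES = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
EXAMPLE_POSITIVE = [True, True, False, True, False, False]
```

For a five-class problem such as the one in this study, the same computation would be repeated once per class, treating that class as positive and the remaining four as negative.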

The confusion matrices obtained by testing the
classifiers on the training and the test sets were also used to
assess the predictive performance of the models. Each column in a
confusion matrix represents the instances that are classified in 
one class and each row represents the instances in their real 
class. The error rate of the classified instances was calculated 
for each class. 
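The per-class quantities read from a confusion matrix can be sketched as follows, with rows holding the real class and columns the predicted class, as described above. This is a hedged illustration: the function names and the three-class example matrix are hypothetical, and the per-class error rate is assumed here to be the share of a class's instances that were misclassified (that is, 1 minus recall), since the text does not give its formula.

```python
from typing import Dict, List, Sequence


def per_class_metrics(
    matrix: Sequence[Sequence[int]], classes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Compute recall, FPR, and error rate for each class (one-vs-rest)."""
    total = sum(sum(row) for row in matrix)
    metrics: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # real class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp  # predicted i, real class differs
        tn = total - tp - fn - fp
        metrics[name] = {
            "recall": tp / (tp + fn),        # equation (1)
            "fpr": fp / (tn + fp),           # equation (2)
            "error_rate": fn / (tp + fn),    # ASSUMPTION: 1 - recall
        }
    return metrics


def accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of correctly classified instances (diagonal over total)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical 3-class confusion matrix (rows: real class, columns: predicted).
EXAMPLE_CLASSES = ["A", "B", "C"]
EXAMPLE_MATRIX: List[List[int]] = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
```

In the example, class "A" has 8 of its 10 instances classified correctly, and 2 of the 20 instances of other classes are wrongly assigned to it.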

III. RESULTS 

A. Performance Comparison of the Algorithms 

Figure 2 presents the performance of the different models 
compared to the safety engineer’s reports involved in this 
study, in terms of mean recall, AUC, error rate, and the overall
accuracy of each model (percentage of the correctly classified
instances). The detailed table with the predictive performance 
of the models for each of the 5 classes is included in the 
Appendix. 

 
Fig. 2.  The predictive performance of the models. 

The average overall classification accuracies of the developed 
models on the training set, test set, and 10-fold cross set were 
97%, 77%, and 93%, respectively. The MLP showed the 
highest accuracy for the test set and the 10-fold cross set 
(94.85% and 96.01%, respectively). The MLP and the BN 
showed the lowest average Error Rates (ER) on the test set 
(10.71% and 14.35%, respectively) and on the 10-fold cross set 
(6.99% and 8.82%, respectively). The MLP showed the highest 
mean sensitivity (recall) for the test set and the 10-fold cross 
set (89% and 93%, respectively). In addition, this model showed 
the most balanced combination of sensitivity results on 
the 10-fold cross set (see Figure 2). The average AUC was high 
for both the test and the 10-fold cross sets (0.91 and 0.98, 
respectively). The MLP showed the highest AUC value for the 
test set and the 10-fold cross set (1.00 and 0.99, respectively). 
All of the developed models showed high 
AUC (range: 0.83-1.00 on the test set and range: 0.95-0.99 on 
the 10-fold cross set). 
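
The averages quoted above can be reproduced from the per-model accuracies (MA) listed in the Appendix table. The short sketch below does so in Python. The dictionary layout and the key names ("train", "test", "cv10") are our own choices for illustration.

```python
# Model accuracy (%) per validation set, copied from the Appendix table.
accuracy = {
    "NB":   {"train": 93.40, "test": 71.13, "cv10": 92.23},
    "k-NN": {"train": 95.78, "test": 70.10, "cv10": 92.86},
    "BN":   {"train": 97.10, "test": 72.16, "cv10": 91.18},
    "MLP":  {"train": 99.74, "test": 94.85, "cv10": 96.01},
}


def mean_over_models(table, split):
    """Unweighted mean of one validation set's accuracy across all models."""
    return sum(row[split] for row in table.values()) / len(table)


def best_model(table, split):
    """Name of the model with the highest accuracy on the given validation set."""
    return max(table, key=lambda name: table[name][split])


means = {split: mean_over_models(accuracy, split) for split in ("train", "test", "cv10")}
for split, value in means.items():
    print(split, round(value, 2), "best:", best_model(accuracy, split))
```

The means come out at roughly 96.5%, 77.1%, and 93.1%, which is consistent with the rounded figures of 97%, 77%, and 93% reported in the text.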

B. Practical Implications 

Safety engineers carry out inspections of workplaces, 
equipment, and OSH conditions, during their visits. The safety 
engineers complete risk assessment tables manually in printed 
versions to record the results of their inspections. This process 
is repeated after every incident or accident. In this case, the 
assessment focuses on causes that are related to the specific 
incident or accident. The developed DSS model aims to 
support the safety engineer with evidence for the risk values of 
hazards and recommended actions. The MLP and the BN 
models (10-fold cross set) were used in the DSS, as they were 
the models with the best predictive performance. The solution 
methodology, including the proposed model, is depicted in 
Figure 3. The DSS model can be used after every incident, 
accident, or audit. The associated data files should then be 
updated by the responsible personnel and the safety engineer. 
These event reports should then be coded as instances in risk 
assessment tables in order to update the database. Afterwards, 
the re-trained or the existing models can be used to support any 
OSH decision with evidence (efficiency of an OSH intervention, 
employee’s movement, appropriateness of workplace or 
equipment, etc.). 

 
Fig. 3.  A solution methodology including the proposed model. 

In order to evaluate the proposed DSS model, a case study 
was carried out. The new case study test set resulted from a 
regular process. A typical instance of this process is the 
following: the safety engineers of the hospital, during the audit 
of specific departments and workplaces (01/2020), identified 
hazards that could lead to incidents. In parallel, they 
proposed specific corrective actions for the limitation or the 
elimination of these hazards by inserting the desired risk values 
for the model’s variables (assuming that corrective actions have 
been taken). The main observations of the safety engineers during 
the one-month period, and the corresponding possible incidents, were: 

• In the archive room, an employee used a chair instead of an 
appropriate ladder to reach files on shelves. 
Moreover, the workplace was untidy and 
several obstacles prevented free movement (possible 
incident: "Falling"). 

• When the safety engineer came out of the archive room, he 
observed that even though the floor had just been mopped, 
the sign "Caution: slippery floor" had not been placed 
(possible incident: "Falling"). 

• During the visit to the pathology laboratory, the safety 
engineer found that there was neither a preventive nor a 
corrective maintenance file for one microtome, as required by 
its operation manual. In a conversation, an employee of 
this laboratory expressed complaints about OSH 
conditions (possible incident: "Needlestick/Cut"). 

• During the visit to the surgery room, a surgical nurse 
complained about excessive workload and lack of training 
(possible incident: "Needlestick/Cut"). 

• In the laundry room, the proper guard of the moving part of 
the bedsheet folding machine was missing 
(possible incident: "Incident"). 

• A member of the waiting staff used a malfunctioning trolley 
and lifted many food trays simultaneously to carry them to 
each patient room (possible incident: "Incident"). 

For each one of these instances, a "Safety" class was also 
recorded, relative to the corresponding measures and corrective 
actions. The dataset that came up from this scenario was used 
as a new case study test set for the proposed DSS model. All 
12 instances were correctly classified by the DSS model, except 
for one instance that was incorrectly classified by the BN model. 
Even in this case, the instance was not classified as "Safety"; 
instead, one "Falling" was classified as "Incident". This 
result was expected since this kind of incident (falling from a 
chair) was not included in the training set of the model used. 

To further evaluate the usability of the proposed system, the 
DSS model was used to support human resource management 
in an internal employee movement decision approached from 
an OSH perspective. In January 2020, 10 hospital workers were 
moved to different departments to cover the respective 
demands. The safety engineer inspected the destination 
workplaces and completed the relevant risk assessment tables 
manually in printed versions. These tables were classified as 9 
instances of "Safety" and 1 instance of "Incident". The 
proposed interventions for the case of "Incident" included 
training on the work practice (waiting staff duties) and 
corrective actions on equipment (lighter trays and functional 
trolleys). The movement took place after the completion of 
these measures. The "Safety" state was confirmed by the safety 
engineer one week after the movement. 
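
The composition of the case study test set described above can be sketched in Python as follows: each of the six observations contributes one hazard instance and one "Safety" instance (the state after the corrective actions), giving 12 instances in total. The short descriptions are paraphrases, and the list layout is an assumption made for illustration. The study's actual instances are risk assessment tables, which are not reproduced here.

```python
from collections import Counter

# (paraphrased observation, possible incident class) for the six audit findings.
observations = [
    ("archive room: chair used instead of a ladder", "Falling"),
    ("outside archive room: wet floor without warning sign", "Falling"),
    ("pathology laboratory: no maintenance file for a microtome", "Needlestick/Cut"),
    ("surgery room: excessive workload, lack of training", "Needlestick/Cut"),
    ("laundry room: missing guard on bedsheet folding machine", "Incident"),
    ("food service: faulty trolley, many trays lifted at once", "Incident"),
]

# Each observation yields the hazard instance plus a "Safety" instance
# that encodes the desired risk values after the corrective actions.
expected = []
for _description, hazard_class in observations:
    expected.append(hazard_class)
    expected.append("Safety")

class_counts = Counter(expected)

# BN output as reported in the text: identical to the expected classes,
# except that the fall-from-chair instance was labelled "Incident".
bn_predicted = list(expected)
bn_predicted[0] = "Incident"

bn_correct = sum(1 for e, p in zip(expected, bn_predicted) if e == p)
print(class_counts)
print(f"BN: {bn_correct}/{len(expected)} correct")
```

Note that the single BN error still falls in a non-"Safety" class, so it would still prompt corrective action.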

IV. DISCUSSION AND CONCLUSIONS 

The aim of this study was to create a DSS model for the 
prediction of incidents and accidents in the hospital workplace. 
This model is based on the incident and accident data from 
reports of personnel and safety engineer investigations and 
audits in the hospital workplace. The preliminary results 
suggest that the proposed system may support OSH decisions 
by predicting incidents and accidents, and also by assessing the 
effectiveness of the OSH interventions. Through the evaluation 
of the different classification models, we concluded that MLP 
on the 10-fold cross set had the most balanced predictive 
performance in terms of overall accuracy (96.01%), ER 
(6.99%), AUC (0.990), and recall (93%), rendering this model 
reliable as a tool in OSH analysis in accordance with [13–17, 
25]. Furthermore, the high predictive performance of all the 
developed models is notable (AUC range: 0.95-0.99 and 
average accuracy: 93% on the 10-fold cross set). 

A major benefit of the proposed DSS model is that it 
utilizes processes and resources that already exist and are 
applied in the daily hospital working life. The exploitation of 
these printed data records for incidents and accidents, and of 
safety engineer audit and investigation reports, is essential for 
OSH management. Their utilization is related to identifying 
hazards that can lead to an incident or accident, reducing the 
occupational risk, evaluating the efficacy of OSH measures and 
interventions, supporting management decisions (employee’s 
recruitment or movement to a different department, etc.) from 
the OSH perspective, improving OSH conditions, and reducing 
the overall cost of injuries [23, 34]. The proposed change is 
the transfer of coded paper records into a DSS, which 
is promising for OSH decisions. This process could result in 
faster and more accurate predictions regarding OSH injuries 
and interventions, in accordance with [9, 34, 62], emphasizing 
the significance of quantitative analysis of empirical injury data 
in safety science [25]. Another advantage of the proposed 
model is that it can be adjusted to various working 
sectors, depending on the training sets and the existing records 
of incidents and accidents in each of them. For its application 
in other hospitals, the system can be easily updated with new 
datasets, according to the status of each hospital (incidents and 
accidents that take place and OSH conditions). The frequency 
of updating or retraining the classification models depends on 
the type and the frequency of incidents and accidents. 

To maximize the usability and the practicality of the DSS 
model, a specific approach to the risk values of the various 
factors should be followed. For instance, the classes "Incident" 
and "Accident" may be confused during the initial workplace risk 
assessment of a random scenario, because of the lack of 
knowledge about the necessity of the required medical leave 
and its length. The dataset for accidents needs to be improved. 
However, any kind of incident should be interpreted as an alert 
that can lead to an accident at any time. Incidents are very 
important for OSH, in the sense of a "near miss", i.e., a situation 
that could lead to an accident, particularly in the case of 
repeated incidents of the same kind [59, 60]. In addition, 
certain incidents can evolve into occupational diseases (like 
blood-borne diseases because of needlestick/cuts and chronic 
musculoskeletal disorders) [61]. According to the domestic 
legislative framework, an occupational disease is equivalent 
to an occupational accident [62]. The experience of the safety 
engineer and adherence to the corresponding 
methodology are key factors for this 
assessment. 

According to the proposed DSS model, an alert for taking 
corrective actions is related to all the classes other than 
"Safety". The proposed measures and corrective 
actions target the elimination of hazards. Elimination is not 
always possible. In such cases, limitation 
of hazards to acceptable levels should be the next step. 
The relation between cost and optimal level of risk is formed 
according to the ALARP (As Low As Reasonably Practicable) 
principle [63, 64]. In other words, entering the minimum values 
of risk for the "Safety" class has no practical meaning since a 
total lack of risk cannot occur in real conditions, both in terms 
of cost and inherent risk (a risk that is related to the nature of 
work). Conversely, "armor logic" should not be applied 
during risk assessment by inserting maximum values when 
identifying and highlighting a hazard. In any case, the 
interesting point related to the values of risk variables is the 
threshold where the value of a specific risk factor leads to the 
transition from one class to another. 
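
The alert rule and the search for a class-transition threshold can be sketched as below. This is only an illustration: `toy_classifier` is a hypothetical stand-in for a trained model (it is not the MLP or BN of this study), the factor names `floor_condition` and `housekeeping` are invented, and the 1-5 risk scale is an assumption.

```python
from typing import Callable, Dict, Optional, Sequence


def needs_alert(predicted_class: str) -> bool:
    """Any class other than "Safety" calls for corrective actions."""
    return predicted_class != "Safety"


def transition_threshold(
    classify: Callable[[Dict[str, int]], str],
    instance: Dict[str, int],
    factor: str,
    values: Sequence[int],
) -> Optional[int]:
    """Sweep one risk factor and return the first value at which the
    predicted class differs from the class at values[0] (None if it never does)."""
    baseline = classify({**instance, factor: values[0]})
    for value in values[1:]:
        if classify({**instance, factor: value}) != baseline:
            return value
    return None


def toy_classifier(instance: Dict[str, int]) -> str:
    """Hypothetical stand-in for a trained model: a fixed rule on two factors."""
    if instance["floor_condition"] + instance["housekeeping"] >= 5:
        return "Falling"
    return "Safety"


instance = {"floor_condition": 1, "housekeeping": 2}
threshold = transition_threshold(toy_classifier, instance, "floor_condition", [1, 2, 3, 4, 5])
print("class changes at floor_condition =", threshold)
print("alert:", needs_alert(toy_classifier({**instance, "floor_condition": threshold})))
```

In practice, `classify` would wrap whichever trained model the DSS uses, and the sweep would be repeated for each risk factor of interest.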

A field for further research is the extensibility of the 
proposed DSS model and its development as a user-friendly 
web application platform, which could make it more usable 
and accessible. 
Furthermore, the directions of this extensibility can include the 
facilitation of data entry to the system through Natural 
Language Processing (NLP) (because there are several 
handwritten records) and the interconnection between the 
proposed DSS model and other tools/applications that will 
improve the efficiency of the proposed measures and corrective 
actions and/or the validity of risk assessment (scales for 
musculoskeletal disorders like the key item method [66], work 
stress questionnaires, optimal allocation of OSH capital 
investment, etc.). 

ACKNOWLEDGMENT 

We would like to thank the Administration, the Scientific 
Council, and the personnel of Metaxa Cancer Hospital of 
Piraeus, Greece for their approval and participation in the 
implementation of the proposed model. 

APPENDIX 

WEKA PARAMETERS FOR CLASSIFICATION ALGORITHMS 

Model   Weka parameters 

NB      batch size = 100 
        debug = False 
        kernel estimator = False 
        doNotCheckCapabilities = False 
        useSupervisedDiscretization = False 

BN      batch size = 100 
        debug = False 
        doNotCheckCapabilities = False 
        estimator = SimpleEstimator -A 0.5 
        searchAlgorithm = K2 -P 1 -S BAYES 
        useADTree = False 

k-NN    k = 9 
        batch size = 100 
        crossValidate = False 
        debug = False 
        distanceWeighting = No 
        doNotCheckCapabilities = False 
        meanSquared = False 
        nearestNeighbourSearchAlgorithm = LinearNNSearch - Euclidean distance First-Last 

MLP     autoBuild = True 
        batch size = 100 
        debug = False 
        decay = False 
        doNotCheckCapabilities = False 
        hiddenLayers = a, where a = (number of attributes + number of classes) / 2 = 44 
        learning rate = 0.3 
        momentum = 0.2 
        nominalToBinaryFilter = True 
        normalizeAttributes = True 
        normalizeNumericClass = True 
        reset = True 
        seed = 0 
        trainingTime (number of epochs) = 500 
        validationSetSize = 0 
        validation threshold = 20 


PREDICTIVE PERFORMANCE OF THE DIFFERENT MODELS 

Model  VM(a)                  MA(a) (%)  Metric  Accident  Falling  Incident  Cutting  Safety  MV(a)

NB     Train set (80%) (379)  93.40      ER (%)  0.00      0.00     23.40     3.70     5.26    6.47
                                         Recall  1.000     1.000    0.766     0.963    0.947   0.94
                                         AUC     1.000     0.995    0.968     0.982    0.986   0.99
       Test set (20%) (97)    71.13      ER (%)  60.00     0.00     16.67     3.57     47.92   25.63
                                         Recall  0.600     1.000    0.833     0.964    0.521   0.78
                                         AUC     0.753     0.992    0.862     0.964    0.991   0.91
       10-fold cross (476)    92.23      ER (%)  8.69      15.00    28.81     3.67     4.20    12.07
                                         Recall  0.913     0.850    0.712     0.963    0.958   0.88
                                         AUC     0.995     0.944    0.890     0.969    0.966   0.95

k-NN   Train set (80%) (379)  95.78      ER (%)  0.00      37.5     21.28     0.00     0.00    11.76
                                         Recall  1.000     0.625    0.787     1.000    1.000   0.88
                                         AUC     1.000     0.986    0.990     0.997    1.000   0.99
       Test set (20%) (97)    70.10      ER (%)  0.00      50.00    25.00     3.57     47.92   25.30
                                         Recall  1.000     0.500    0.750     0.964    0.521   0.75
                                         AUC     0.875     0.985    0.983     0.997    0.751   0.92
       10-fold cross (476)    92.86      ER (%)  0.00      50.00    37.29     1.47     0.00    17.75
                                         Recall  1.000     0.500    0.627     0.985    1.000   0.82
                                         AUC     1.000     0.981    0.966     0.994    1.000   0.99

BN     Train set (80%) (379)  97.10      ER (%)  0.00      18.75    10.64     2.78     0.00    6.43
                                         Recall  1.000     0.813    0.894     0.972    1.000   0.94
                                         AUC     1.000     0.993    0.987     0.996    1.000   1.00
       Test set (20%) (97)    72.16      ER (%)  0.00      0.00     16.67     7.14     47.92   14.35
                                         Recall  1.000     1.000    0.833     0.929    0.521   0.86
                                         AUC     0.875     1.000    0.810     0.900    0.570   0.83
       10-fold cross (476)    91.18      ER (%)  0.00      15.00    13.56     5.88     9.66    8.82
                                         Recall  1.000     0.850    0.864     0.941    0.903   0.91
                                         AUC     0.975     0.985    0.979     0.994    0.913   0.97

MLP    Train set (80%) (379)  99.74      ER (%)  0.00      0.00     2.12      0.00     0.00    0.42
                                         Recall  1.000     1.000    0.979     1.000    1.000   1.00
                                         AUC     1.000     1.000    0.984     0.998    1.000   1.00
       Test set (20%) (97)    94.85      ER (%)  0.00      25.00    25.00     3.57     0.00    10.71
                                         Recall  1.000     0.750    0.750     0.964    1.000   0.89
                                         AUC     1.000     1.000    0.992     0.996    1.000   1.00
       10-fold cross (476)    96.01      ER (%)  0.00      10.00    22.03     2.94     0.00    6.99
                                         Recall  1.000     0.900    0.780     0.971    1.000   0.93
                                         AUC     0.999     0.996    0.971     0.991    1.000   0.99

(a) VM: Validation Model, MA: Model Accuracy, MV: Mean Value. 

REFERENCES 

[1] D. Elsler, J. Takala, and J. Remes, "An international comparison of the cost of work-related accidents and illnesses," European Agency for Safety and Health at Work, 2017. 

[2] "Good OSH is good for business," EU-OSHA. https://osha.europa.eu/el/themes/good-osh-is-good-for-business (accessed May 26, 2021). 

[3] K. Cosic, S. Popovic, M. Sarlija, I. Kesedzic, and T. Jovanovic, "Artificial intelligence in prediction of mental health disorders induced by the COVID-19 pandemic among health care workers," Croatian Medical Journal, vol. 61, no. 3, pp. 279–288, Jun. 2020, https://doi.org/10.3325/cmj.2020.61.279. 

[4] K. Dimoulas, G. Kollias, C. Bagavos, and T. Ganetaki, Work and health problems in Greece. Athens, Greece: INE-GSEE Work Institute, 2015. 

[5] Hospital Inventory 2018. Athens, Greece: Hellenic Statistical Authority, 2020. 

[6] Survey on Accidents at Work, 2018. Athens, Greece: Hellenic Statistical Authority, 2020. 

[7] S. Sarkar and J. Maiti, "Machine learning in occupational accident analysis: A review using science mapping approach with citation network analysis," Safety Science, vol. 131, Nov. 2020, Art. no. 104900, https://doi.org/10.1016/j.ssci.2020.104900. 

[8] F. Siddiqui, M. A. Akhund, A. H. Memon, A. R. Khoso, and H. U. Imad, "Health and Safety Issues of Industry Workmen," Engineering, Technology & Applied Science Research, vol. 8, no. 4, pp. 3184–3188, Aug. 2018, https://doi.org/10.48084/etasr.2138. 

[9] S. Y. Far, R. Mirzaei, M. B. Katrini, M. Haghshenas, and Z. Sayahi, "Assessment of Health, Safety and Environmental Risks of Zahedan City Gasoline Stations," Engineering, Technology & Applied Science Research, vol. 8, no. 2, pp. 2689–2692, 2018. 

[10] S. J. Bertke, A. R. Meyers, S. J. Wurzelbacher, J. Bell, M. L. Lampl, and D. Robins, "Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims," Journal of Safety Research, vol. 43, no. 5, pp. 327–332, Dec. 2012, https://doi.org/10.1016/j.jsr.2012.10.012. 

[11] G. Nanda, K. M. Grattan, M. T. Chu, L. K. Davis, and M. R. Lehto, "Bayesian decision support for coding occupational injury data," Journal of Safety Research, vol. 57, pp. 71–82, Jun. 2016, https://doi.org/10.1016/j.jsr.2016.03.001. 

[12] J. E. M. E. Martin, J. T.-G. Taboada-Garcia, S. G. Gerassis, A. S. Saavedra, and R. Martinez-Alegria, "Bayesian network analysis of accident risk in information-deficient scenarios," Revista de la Construcción. Journal of Construction, vol. 16, no. 3, pp. 439–446, 2017, https://doi.org/10.7764/RDLC.16.3.439. 

[13] A. P. C. Chan, F. K. W. Wong, C. K. H. Hon, and T. N. Y. Choi, "A Bayesian Network Model for Reducing Accident Rates of Electrical and Mechanical (E&M) Work," International Journal of Environmental Research and Public Health, vol. 15, no. 11, Nov. 2018, Art. no. 2496, https://doi.org/10.3390/ijerph15112496. 

[14] L. Sanmiquel, M. Bascompta, J. M. Rossell, H. F. Anticoi, and E. Guash, "Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques," International Journal of Environmental Research and Public Health, vol. 15, no. 3, Mar. 2018, Art. no. 462, https://doi.org/10.3390/ijerph15030462. 

[15] A. Soltanzadeh, I. Mohammadfam, S. Mahmoudi, B. A. Savareh, and A. M. Arani, "Analysis and forecasting the severity of construction accidents using artificial neural network," Safety promotion and injury prevention (Tehran), vol. 4, no. 3, pp. 185–192, 2016. 

[16] D. A. Patel and K. N. Jha, "Neural Network Approach for Safety Climate Prediction," Journal of Management in Engineering, vol. 31, no. 6, Nov. 2015, Art. no. 05014027, https://doi.org/10.1061/(ASCE)ME.1943-5479.0000348. 

[17] A. M. Abubakar, H. Karadal, S. W. Bayighomog, and E. Merdan, "Workplace injuries, safety climate and behaviors: application of an artificial neural network," International Journal of Occupational Safety and Ergonomics, vol. 26, no. 4, pp. 651–661, Oct. 2020, https://doi.org/10.1080/10803548.2018.1454635. 

[18] F. A. Moayed and R. L. Shell, "Application of Artificial Neural Network Models in Occupational Safety and Health Utilizing Ordinal Variables," The Annals of Occupational Hygiene, vol. 55, no. 2, pp. 132–142, Mar. 2011, https://doi.org/10.1093/annhyg/meq079. 

[19] I. Mohammadfam, A. Soltanzadeh, A. Moghimbeigi, and B. A. Savareh, "Use of Artificial Neural Networks (ANNs) for the Analysis and Modeling of Factors That Affect Occupational Injuries in Large Construction Industries," Electronic Physician, vol. 7, no. 7, pp. 1515–1522, Nov. 2015, https://doi.org/10.19082/1515. 

[20] S. Sarkar, S. Vinay, R. Raj, J. Maiti, and P. Mitra, "Application of optimized machine learning techniques for prediction of occupational accidents," Computers & Operations Research, vol. 106, pp. 210–224, Jun. 2019, https://doi.org/10.1016/j.cor.2018.02.021. 

[21] J. Bao, J. Johansson, and J. Zhang, "An Occupational Disease Assessment of the Mining Industry’s Occupational Health and Safety Management System Based on FMEA and an Improved AHP Model," Sustainability, vol. 9, no. 1, Jan. 2017, Art. no. 94, https://doi.org/10.3390/su9010094. 

[22] H. R. S. A. Mard, A. Estiri, P. Hadadi, and M. S. A. Mard, "Occupational risk assessment in the construction industry in Iran," International Journal of Occupational Safety and Ergonomics, vol. 23, no. 4, pp. 570–577, Oct. 2017, https://doi.org/10.1080/10803548.2016.1264715. 

[23] L. Comberti, M. Demichela, G. Baldissone, G. Fois, and R. Luzzi, "Large Occupational Accidents Data Analysis with a Coupled Unsupervised Algorithm: The S.O.M. K-Means Method. An Application to the Wood Industry," Safety, vol. 4, no. 4, Dec. 2018, Art. no. 51, https://doi.org/10.3390/safety4040051. 

[24] N. D. Nath, T. Chaspari, and A. H. Behzadan, "Automated ergonomic risk monitoring using body-mounted sensors and machine learning," Advanced Engineering Informatics, vol. 38, pp. 514–526, Oct. 2018, https://doi.org/10.1016/j.aei.2018.08.020. 

[25] F. Davoudi Kakhki, S. A. Freeman, and G. A. Mosher, "Utilization of Machine Learning in Analyzing Post-incident State of Occupational Injuries in Agro-Manufacturing Industries," in Advances in Safety Management and Human Performance, P. M. Arezes and R. L. Boring, Eds. New York, NY, USA: Springer, 2020, pp. 3–9. 

[26] S. D. Mwmc et al., "Ethical Considerations of Using Machine Learning for Decision Support in Occupational Health: An Example Involving Periodic Workers’ Health Assessments," Journal of Occupational Rehabilitation, vol. 30, no. 3, pp. 343–353, Sep. 2020, https://doi.org/10.1007/s10926-020-09895-x. 

[27] F. Ladstatter, E. Garrosa, B. Moreno-Jimenez, V. Ponsoda, J. M. R. Aviles, and J. Dai, "Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses," Ergonomics, vol. 59, no. 2, pp. 207–221, Feb. 2016, https://doi.org/10.1080/00140139.2015.1061141. 

[28] Y.-H. Kim and M.-H. Jung, "Effect of occupational health nursing practice on musculoskeletal pains among hospital nursing staff in South Korea," International Journal of Occupational Safety and Ergonomics, vol. 22, no. 2, pp. 199–206, Apr. 2016, https://doi.org/10.1080/10803548.2015.1078046. 

[29] A. Fonseca, I. Abreu, M. J. Guerreiro, C. Abreu, R. Silva, and N. Barros, "Indoor Air Quality and Sustainability Management—Case Study in Three Portuguese Healthcare Units," Sustainability, vol. 11, no. 1, Jan. 2019, Art. no. 101, https://doi.org/10.3390/su11010101. 

[30] S. Lin, N. Chaiear, J. Khiewyoo, B. Wu, and N. P. Johns, "Preliminary Psychometric Properties of the Chinese Version of the Work-Related Quality of Life Scale-2 in the Nursing Profession," Safety and Health at Work, vol. 4, no. 1, pp. 37–45, Mar. 2013, https://doi.org/10.5491/SHAW.2013.4.1.37. 

[31] W. Turnberg and W. Daniell, "Evaluation of a healthcare safety climate measurement tool," Journal of Safety Research, vol. 39, no. 6, pp. 563–568, Jan. 2008, https://doi.org/10.1016/j.jsr.2008.09.004. 

[32] A. K. Celik, E. Oktay, and K. Cebi, "Analysing workplace violence towards health care staff in public hospitals using alternative ordered response models: the case of north-eastern Turkey," International Journal of Occupational Safety and Ergonomics, vol. 23, no. 3, pp. 328–339, Jul. 2017, https://doi.org/10.1080/10803548.2017.1316612. 

[33] M. Stefanovic, D. Tadic, M. Djapan, and I. Macuzic, "Software for Occupational Health and Safety Risk Analysis Based on a Fuzzy Model," International Journal of Occupational Safety and Ergonomics, vol. 18, no. 2, pp. 127–136, Jan. 2012, https://doi.org/10.1080/10803548.2012.11076923. 

[34] A. Sklad, "Assessing the impact of processes on the Occupational Safety and Health Management System’s effectiveness using the fuzzy cognitive maps approach," Safety Science, vol. 117, pp. 71–80, Aug. 2019, https://doi.org/10.1016/j.ssci.2019.03.021. 

[35] V. Ravuri et al., "Group-specific models of healthcare workers’ well-being using iterative participant clustering," in Second International Conference on Transdisciplinary AI, Irvine, CA, USA, Sep. 2020, pp. 115–118, https://doi.org/10.1109/TransAI49837.2020.00026. 

[36] K. Vallmuur, "Machine learning approaches to analysing textual injury surveillance data: A systematic review," Accident Analysis & Prevention, vol. 79, pp. 41–49, Jun. 2015, https://doi.org/10.1016/j.aap.2015.03.018. 

[37] "Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance)." Publications Office of the European Union, Apr. 27, 2016. 

[38] "Home - Weka Wiki," The University of Waikato. https://waikato.github.io/weka-wiki/ (accessed May 27, 2021). 

[39] "Memorandum on Occupational Risk Assessment." Directorate-General for Employment in Labor Relations and Social Affairs (DG V) of the European Union, 1997. 

[40] "Occupational Risk Assessment." Technical Chamber of Greece, 2001. 

[41] S. Drivas, K. Zorba, and T. Koukoulaki, Methodological guide for the assessment and prevention of occupational risk. Athens, Greece: Hellenic Institute of Occupational Health and Safety, 2000. 

[42] P. Bountris et al., "An Intelligent Clinical Decision Support System for Patient-Specific Predictions to Improve Cervical Intraepithelial Neoplasia Detection," BioMed Research International, vol. 2014, 2014, https://doi.org/10.1155/2014/341483. 

[43] S. Chen, G. I. Webb, L. Liu, and X. Ma, "A novel selective naïve Bayes algorithm," Knowledge-Based Systems, vol. 192, Mar. 2020, Art. no. 105361, https://doi.org/10.1016/j.knosys.2019.105361. 

[44] K. Koutroumbas and S. Theodoridis, Pattern Recognition, 4th ed. London, UK: Elsevier, 2008. 

[45] M. A. Burhanuddin, R. Ismail, N. Izzaimah, A. A.-J. Mohammed, and N. Zainol, "Analysis of Mobile Service Providers Performance Using Naive Bayes Data Mining Technique," International Journal of Electrical & Computer Engineering, vol. 8, no. 6, pp. 5153–5161, 2018. 

[46] R. Shinde, S. Arjun, P. Patil, and J. Waghmare, "An Intelligent Heart Disease Prediction System Using K-Means Clustering and Naïve Bayes Algorithm," International Journal of Computer Science and Information Technologies, vol. 6, no. 1, pp. 637–639, 2015. 

[47] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. New Jersey, USA: Prentice Hall, 2010. 

[48] D. Michie, D. J. Spiegelhalter, C. C. Taylor, and J. Campbell, Eds., Machine learning, neural and statistical classification. New York, NY, USA: Ellis Horwood, 1995. 

[49] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Hoboken, NJ, USA: Wiley, 2001. 

[50] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011. 

[51] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems. San Mateo, CA, USA: Morgan Kaufmann, 1990. 

[52] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge, MA, USA: Cambridge University Press, 2008. 

[53] A. Nola et al., "Occupational accidents in temporary work," La Medicina Del Lavoro, vol. 92, no. 4, pp. 281–285, Aug. 2001. 

[54] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, Jun. 2006, https://doi.org/10.1016/j.patrec.2005.10.010. 

[55] J. López-García, M. Saldaña, S. Herrero, and J. Gutiérrez, "Bayesian network analysis of the influence of labour market variables on accident rates of workers in Spain," in Risk, Reliability and Safety: Innovating Theory and Practice: Proceedings of ESREL 2016, Glasgow, UK, Sep. 2016, pp. 1660–1667, https://doi.org/10.1201/9781315374987-250. 

[56] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, Apr. 1982, https://doi.org/10.1148/radiology.143.1.7063747. 

[57] S. Alvarez, "An exact analytical relation among recall, precision, and classification accuracy in information retrieval," Boston College, Boston, MA, USA, Technical Report BCCS-02-01, pp. 1–22, Jan. 2002. 

[58] R. Burduk, "Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels," arXiv:2006.13319 [cs, stat], Jun. 2020, Accessed: May 26, 2021. [Online]. Available: http://arxiv.org/abs/2006.13319. 

[59] O. Ug, S. Wd, S. M, and P. A, "Improve Process Safety with Near-Miss Analysis," Chemical Engineering Progress, vol. 109, no. 5, pp. 20–27, 2013. 

[60] M. G. Gnoni, S. Andriulo, G. Maggio, and P. Nardone, "‘Lean occupational’ safety: An application for a Near-miss Management System design," Safety Science, vol. 53, pp. 96–104, Mar. 2013, https://doi.org/10.1016/j.ssci.2012.09.012. 

[61] E. Alexopoulos, Greek and International experience of accidents at work and occupational diseases of hospital employees. Guide to Occupational Risk Assessment and Prevention. Athens, Greece: EL.Y.A., 2007. 

[62] "Circular 45/24-06-2010: Occupational Accident 2010." Social Security Institution, 2010. 

[63] G. Reniers and T. Brijs, "An Overview of Cost-benefit Models/Tools for Investigating Occupational Accidents," Chemical Engineering Transactions, vol. 36, pp. 43–48, Apr. 2014, https://doi.org/10.3303/CET1436008. 

[64] Health and Safety Executive, "Risk management: Expert guidance - ALARP at a glance." https://www.hse.gov.uk/managing/theory/alarpglance.htm (accessed May 26, 2021). 

[65] S. J. Bertke, A. R. Meyers, S. J. Wurzelbacher, J. Bell, M. L. Lampl, and D. Robins, "Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims," Journal of Safety Research, vol. 43, no. 5, pp. 327–332, Dec. 2012, https://doi.org/10.1016/j.jsr.2012.10.012. 

[66] K. Koklonis, A. Anastasiou, O. Petropoulou, S. Pitoglou, D. Iliopoulou, and D. Koutsouris, "Utilizing Key Item Method to Manage Musculoskeletal Disorders in a Hospital Workplace," in 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Berlin, Germany, Jul. 2019, pp. 3420–3423, https://doi.org/10.1109/EMBC.2019.8857649.