Engineering, Technology & Applied Science Research, Vol. 11, No. 3, 2021, 7262-7272
www.etasr.com

Utilization of Machine Learning in Supporting Occupational Safety and Health Decisions in Hospital Workplace

Kyriakos Koklonis
Biomedical Engineering Laboratory, National Technical University of Athens, Athens, Greece
kkoklonis@biomed.ntua.gr

Michail Sarafidis
Biomedical Engineering Laboratory, National Technical University of Athens, Athens, Greece
msarafidis@biomed.ntua.gr

Maria Vastardi
Metaxa Cancer Hospital of Piraeus, Piraeus, Greece
mariavastardi@yahoo.gr

Dimitrios Koutsouris
Biomedical Engineering Laboratory, National Technical University of Athens, Athens, Greece
dkoutsou@biomed.ntua.gr

Abstract-The prediction of possible future incidents or accidents and the efficiency assessment of the Occupational Safety and Health (OSH) interventions are essential for the effective protection of healthcare workers, as the occupational risks in their workplace are multiple and diverse. Machine learning algorithms have been utilized for classifying post-incident and post-accident data into the following 5 classes of events: Needlestick/Cut, Falling, Incident, Accident, and Safety. 476 event reports from Metaxa Cancer Hospital (Greece), during 2014-2019, were used to train the machine learning models. The developed models showed high predictive performance, with area under the curve range 0.950-0.990 and average accuracy of 93% on the 10-fold cross set, compared to the safety engineer’s study reports. The proposed DSS model can contribute to the prediction of incidents or accidents and efficiency evaluation of OSH interventions.

Keywords-occupational health and safety; OSH; machine learning; hospital workplace

I. INTRODUCTION

Occupational accidents and diseases result in an additional economic burden on public social security agencies.
These losses are estimated at 3.9% of the global Gross Domestic Product (GDP) and 3.3% of the European GDP, as reported by the European Agency for Safety and Health at Work (EU-OSHA) (with variance according to working fields, legislative context and prevention incentives) [1]. According to the EU-OSHA, proper management of Occupational Safety and Health (OSH) leads to numerous benefits, such as reduced absenteeism, lower costs, and improved efficiency of working methods and technologies [2]. This necessity becomes even more imperative during periods of economic recession, since poor OSH wastes valuable resources. Regarding the healthcare sector in particular, the hospital workplace is characterized by an increased level and diversity of occupational risks. In addition, the coronavirus disease 2019 (COVID-19) pandemic and its immediate aftermath impose a significant workload burden and a major strain on the mental health of healthcare workers [3].

Health professionals constitute a significant part of the total European and Greek workforce. According to a relevant study of the Greek Labor Inspectorate Body in 2015, 10% of workers in the European Union (EU) belong to the healthcare sector [4]. More specifically, according to the Hellenic Statistical Authority, the number of Greek hospitals amounted to 283 during the 2015-2018 period. Moreover, their staff, during 2017, consisted of 38,952 nursing, 23,354 medical, 6,044 auxiliary nursing, and 7,752 paramedical staff [5]. In the Greek healthcare sector, a high percentage of work accidents (34% higher than the corresponding average in the EU, according to the EU-OSHA) and an increased incidence of musculoskeletal disorders (second largest after the construction sector) are recorded. According to the data from the Hellenic Statistical Authority, in 2018, 3.2% of all officially recorded work accidents were related to the health and social care sector [6].
Concurrently, public health funding has been reduced dramatically, by up to 60% over the last three years [4].

Corresponding author: Kyriakos Koklonis

Computational intelligence and Machine Learning (ML) approaches can be utilized towards an improved management of OSH. Their potential in occupational accident analysis has been highlighted by several studies, which have been reviewed in [7]. Indicatively, statistical methods have been used for analyzing health and safety issues of women in industry [8], the OSH risk in fuel stations has been assessed through the Failure Mode and Effects Analysis (FMEA) method [9], a Naive Bayesian (NB) model has been used for coding causation of workers’ compensation claims [10], Bayesian Networks (BNs) have been used for analyzing data on occupational accidents [11–13], and decision trees have been used in the industrial mining sector for predicting the type of accident [14]. Artificial Neural Networks (ANNs) have been used to correlate causes and OSH conditions [15–19] and they have been combined with Support Vector Machines (SVMs) for the prediction of occupational accidents [20]. In addition, the Analytical Hierarchical Process (AHP) has been used in order to analyze, control, and provide the occupational risk in the industrial mining sector [21], and fuzzy logic has been used for occupational risk assessment [22]. The self-organizing map and k-means clustering (SOM k-means) methods have been used for identifying the dynamics of critical accidents [23]. In [24], body-mounted sensors were combined with ML techniques to evaluate the ergonomic risk levels caused by overexertion. The performance of ML algorithms in classifying post-incident outcomes of occupational injuries in agro-manufacturing operations has been evaluated in [25].
The minimization of the undesirable adverse effects in the development and implementation of ML-based Decision Support Systems (DSSs) was the objective in [26]. Particularly in the healthcare sector, ML has been utilized for targeting specific hazards. For instance, ANNs have been used to approach the burnout process [27] and statistical analysis has been employed to identify factors of musculoskeletal disorders in nursing staff [28] and to improve the management of indoor air quality in the hospital workplace [29]. Correlation methods have been used to assess the Quality of Work-Life (QWL) for the nursing profession in China [30]. In a more generalized approach, a tool for estimating OSH conditions through factor analysis has been evaluated in [31]. Factor analysis has also been used for the assessment of workplace violence towards healthcare staff in public hospitals in Turkey [32]. In addition, a fuzzy logic model for OSH risk assessment has been proposed in [33] and a fuzzy cognitive map approach for the impact assessment of processes on the OSH management system’s effectiveness has been used in [34]. Ambulatory monitoring devices, such as wearable and environmental sensors, have been combined with ML algorithms for a better understanding of stress evolution in healthcare workers [35]. Based on the previous concise literature review, there is a promising research activity in capitalizing ML methods for better implementing effective preventive measures and policies in the OSH framework. Most of the published studies are based on questionnaires to OSH experts [12, 13, 36], instead of supporting immediate incident or accident investigations and following up the efficiency of the proposed interventions. To the best of our knowledge, a DSS which utilizes the legally established safety engineer investigations and their audit reports has not been reported in the literature. 
The present study proposes a DSS model which integrates all incident or accident data along with the safety engineer investigations and audit reports in the hospital workplace. By utilizing these input data, our DSS aims at supporting OSH decisions by predicting possible future incidents or accidents and by assessing the efficiency of OSH interventions.

II. MATERIALS AND METHODS

A. Data

In order to train and test the ML models, 476 event reports from Metaxa Cancer Hospital (reference period from January 1, 2014 to December 31, 2019) were collected and utilized. The reports were categorized into 5 classes depending on the events that occurred: 136 reports of "Needlestick/Cut" injuries, 59 reports of various "Incidents", 20 reports of "Falling" injuries, 23 reports of "Accidents" and 238 reports of "Safety" conditions. One event report consists of multiple variables derived from various reports, as detailed below. Reports with uncompleted corrective actions or incomplete data during the reference period were not included in the dataset. The worker population of the 476 records included nursing staff (52.52%), medical staff (13.45%), cleaning staff (10.50%), laboratory staff (9.24%), administrative staff (5.46%), technical staff and workers (4.62%), and auxiliary staff (4.20%). The mean age of the worker population of the records was 46.61 years and the mean experience was 16.98 years. Our study has been approved by the Scientific and Administrative Council of the Metaxa Hospital. Under the Regulation (EU) 2016/679 of the European Parliament (General Data Protection Regulation – GDPR) on the protection of natural persons [37], pseudonymization was used during the coding and data entry process.

B. Software

Waikato Environment for Knowledge Analysis (Weka) version 3.8.3 has been used for the data analysis and predictive modeling of the system [38]. This software has been chosen because of its wide applications in data mining and ML.
Moreover, it is freeware and publicly available under the General Public License, it includes several data mining and ML methods, libraries, and related supporting tools and procedures, it is compatible with other hardware or software platforms, and it includes a graphical user interface, making its usage easier even without programming knowledge.

C. Risk Assessment Method

A very common risk assessment method is the one which is consistent with the Memorandum on Occupational Risk Assessment by Directorate-General for Employment in Labor Relations and Social Affairs (DG V) of the EU (1997) [39]. This methodology has also been recommended by the Technical Chamber of Greece in a relevant conference [40]. In addition, it is widely used by safety engineers and OSH auditors. This risk assessment method was used in the proposed model. Its approach is based on defining a risk value as a product of three factors: the seriousness of the consequences of the potential hazard (values 1, 4, 8, 16), the frequency of exposure to the hazard (values 1, 2, 3, 4), and the probability of occurrence of this hazard (values 1, 2, 3, 4), according to the principle of reasonable ambiguity. Therefore, the risk value (R) lies within the range of 1 to 256. The usage of these three factors is in accordance with relevant legal requirements according to Greek Law (Law 3850/2010, Article 43). These factors are presented in Table I. The risk values and their correlation with the necessity of taking actions are presented in Table II.

TABLE I.
RISK – RATING FACTORS

| Factor | Rating | Interpretation |
|---|---|---|
| Seriousness | 1 | Negligible: Minor injury without work absence |
| Seriousness | 4 | Middle: Injury or disease with work absence |
| Seriousness | 8 | Critical: Major injury or disease with the possibility of permanent health damage |
| Seriousness | 16 | Disastrous: Death |
| Frequency | 1 | Zero: An employee is exposed to the hazard once a year or more rarely |
| Frequency | 2 | Limited: An employee is exposed to the hazard up to once a week |
| Frequency | 3 | Often: An employee is exposed to the hazard daily |
| Frequency | 4 | Continuous: An employee is exposed to the hazard during his/her whole work time |
| Probability | 1 | Zero: Impossible to happen |
| Probability | 2 | Low: It can happen |
| Probability | 3 | Middle: Possible to happen |
| Probability | 4 | High: About to happen |

TABLE II. RISK RATING

| Risk value R | Risk description | Measures - actions |
|---|---|---|
| R<16 | Negligible: Risk is very low and may not increase during the near future without changing working conditions | Measures – corrective actions are not necessary |
| 16 ... 128 | ... | ... |
| R>128 | Critical: Possibility of human loss. A serious incident is about to happen | Need for urgent measures – corrective actions for risk elimination |

D. DSS Model Information Flow

The OSH related events are categorized into two main categories: incidents and accidents. The accidents, according to Greek Law, are all the events that result in damage, injury or harm and require at least a two-day medical leave and relevant notification to the Labor Inspectorate Body. The incidents (near-misses) are less serious events that have unintentionally happened and are linked to absence from work only on the specific day of their occurrence. The procedure that is followed after an occupational incident or accident in the hospital workplace is described below: the supervisor of the relevant division and the responsible desk officer are alerted. The update of the associated data files follows, according to the internal process of the hospital and the relevant legal requirements (Law 3850/2010, Article 2, Paragraph 2).
The issue is registered in the book of incidents, the book of accidents, the incident – accident report, the record of exposure to biological agents at work, the report to the Social Insurance Institute, the police and the electronic notification to the Labor Inspectorate Body. Then, the safety engineer in charge investigates the incident or accident in order to identify its causes and propose interventions, by completing risk assessment tables manually in printed versions (Law 3850/2010, Articles 14, 15). The safety engineer’s actions aim at ensuring that repetition of the incident or the accident could be avoided in the future. The manually filled tables are in accordance with the Methodological guide for the assessment and prevention of occupational risks of the Hellenic Institute for Occupational Health and Safety [41] regarding the coding of hazards, and with the method described in the previous section [39] regarding the risk values. The safety engineer cooperates with the occupational doctor, when necessary. The above-mentioned data for every incident and accident are used as an input to the proposed DSS. For the efficient management of this information, the basic idea was to organize and incorporate the data into a database, which makes the information more usable and manageable in comparison with the printed forms. The information flow diagram of the proposed DSS is depicted in Figure 1. There are three major blocks in this diagram. Firstly, the block of recording and notifying incidents – accidents (upper left) which describes the process that the responsible personnel follow to record every incident and accident in the corresponding books, to fill in the incident – accident report and notify the authorities. 
Secondly, the block of risk assessment and investigation (lower left) which describes the process that the safety engineer and occupational doctor follow to identify OSH hazards, to express the corresponding risk values and to associate these hazards with every incident and accident through investigation and estimation of OSH interventions. These two blocks are the inputs to the proposed DSS model. The output is the third block which describes the results of the classification process, according to the trained models. The decision is based on the risk values of the different variables. The input of the model includes two sets of variables. The first set consists of 6 variables (F1 – F6) from the first block which are related to the employees. Their values come up from the incident or accident relative paper report. The second set consists of 77 variables (F7 – F83) from the second block. All 83 variables are summarized in Table III. The selection of these variables was based on [41]. The main categories of this set are workplace related variables relevant to the infrastructural building, machinery - equipment, electrical installation, hazardous substances, fire, chemical - physical - biological factors, and transversal - organizational risks. Each occupational event corresponds to one instance and was categorized into one of the following 4 classes: "Needlestick/Cut", "Falling", "Incident" and "Accident" according to their type. The class of "Incident" was divided into three different subclasses because of the frequency distribution of incidents in the hospital workplace. Needlesticks/cuts and falling events present the highest frequency, which is in accordance with [4]. For this reason, two separate classes were used for this kind of incidents, "Needlestick/Cut" and "Falling", respectively. The class named "Incident" was used to include all other kinds of incidents in the hospital workplace. The class "Accident" was used to include all kinds of accidents. 
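As an illustration of how the values of the workplace variables (F7 – F83) arise, the following minimal Python sketch computes a risk value as the product of the three rating factors given in the Risk Assessment Method section. The names `risk_value` and `attainable_risk_values` are hypothetical and chosen only for this example; the study itself recorded these values manually in risk assessment tables, so the code is an assumption-laden sketch and not part of the described workflow.

```python
from itertools import product

# Allowed ratings for each factor, as listed in the Risk Assessment Method section.
SERIOUSNESS = (1, 4, 8, 16)
FREQUENCY = (1, 2, 3, 4)
PROBABILITY = (1, 2, 3, 4)


def risk_value(seriousness, frequency, probability):
    """Return R = seriousness * frequency * probability after validating each rating."""
    if seriousness not in SERIOUSNESS:
        raise ValueError(f"seriousness must be one of {SERIOUSNESS}")
    if frequency not in FREQUENCY:
        raise ValueError(f"frequency must be one of {FREQUENCY}")
    if probability not in PROBABILITY:
        raise ValueError(f"probability must be one of {PROBABILITY}")
    return seriousness * frequency * probability


def attainable_risk_values():
    """List every distinct R that the three rating scales can produce, in ascending order."""
    return sorted({s * f * p for s, f, p in product(SERIOUSNESS, FREQUENCY, PROBABILITY)})


if __name__ == "__main__":
    # A critical hazard (8) met daily (3) with low probability (2): 8 * 3 * 2 = 48.
    print(risk_value(8, 3, 2))
    values = attainable_risk_values()
    print(values[0], values[-1])  # smallest and largest attainable risk values
```

Because R is a product of discrete ratings, only some integers between 1 and 256 can actually occur as values of a workplace variable.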
The values for these variables express the OSH conditions and hazards that are related to the incident or accident, as they have been identified by the safety engineer. These values were recorded according to the safety engineer’s reports and risk assessment tables after investigation of the corresponding incident or accident. Finally, after the completion of the corrective actions, all the above instances were assigned updated risk values and were categorized into the class "Safety". Each new instance of "Safety" originated from one instance of "Needlestick/Cut" or "Falling" or "Incident" or "Accident". In this way, our dataset consists of 238 incident/accident instances and 238 safety instances (as described in section "Data"). More specifically, the values for the instances of "Safety" express the proposed measures or interventions to reduce risk to acceptable levels (R≤32 for accidents and R≤24 for incidents, according to the relevant methodology). These values were recorded according to the post-investigation safety engineer's updated reports and risk assessment tables, up to two months after every incident or accident. During this period, the corrective actions had been completed and confirmed according to the safety engineer’s proposed measures. The output of the system is one of the above-mentioned 5 classes (Y1 – Y5) and expresses the type of the event that certain occupational conditions could lead to (Table III). As the predicted classes were known a priori, our model belongs to supervised learning algorithms.

TABLE III.
VARIABLES OF THE PROPOSED MODEL

| Variables | Code | Type | Values |
|---|---|---|---|
| Employee variables | | | |
| Age | F1 | Numeric | Years |
| Experience | F2 | Numeric | Years |
| Training | F3 | Nominal | Yes, No |
| Repetitive training every six months at least | F4 | Nominal | Yes, No |
| Relative PPE during the incident/accident | F5 | Nominal | Yes, No |
| Satisfaction for OSH climate | F6 | Numeric | 1-5 |
| Workplace variables related to infrastructural building | | | |
| Floors/stairs | F7 | Numeric | 1-256 |
| Workplace space | F8 | Numeric | 1-256 |
| Workplace height | F9 | Numeric | 1-256 |
| Workplace volume | F10 | Numeric | 1-256 |
| Doors/windows | F11 | Numeric | 1-256 |
| Space illumination | F12 | Numeric | 1-256 |
| Storage/lofts | F13 | Numeric | 1-256 |
| Uncovered gaps | F14 | Numeric | 1-256 |
| Obstacles | F15 | Numeric | 1-256 |
| Exits/escape routes | F16 | Numeric | 1-256 |
| Walls | F17 | Numeric | 1-256 |
| Shelves/dexions | F18 | Numeric | 1-256 |
| Ceilings | F19 | Numeric | 1-256 |
| Basements | F20 | Numeric | 1-256 |
| Walkways | F21 | Numeric | 1-256 |
| Roof | F22 | Numeric | 1-256 |
| Cleanliness | F23 | Numeric | 1-256 |
| Lack of access to exits or fire-extinguishing system | F24 | Numeric | 1-256 |
| Safety/escape signing | F25 | Numeric | 1-256 |
| Workplace variables related to machinery – equipment | | | |
| Maintenance protocol | F26 | Numeric | 1-256 |
| Lack of safety during usage | F27 | Numeric | 1-256 |
| Guards for avoiding the accidental start | F28 | Numeric | 1-256 |
| Guards from moving parts | F29 | Numeric | 1-256 |
| Ejectable particles | F30 | Numeric | 1-256 |
| CE signing | F31 | Numeric | 1-256 |
| Cutting works | F32 | Numeric | 1-256 |
| Lifting machinery | F33 | Numeric | 1-256 |
| Transport vehicles | F34 | Numeric | 1-256 |
| Ladders | F35 | Numeric | 1-256 |
| Pneumatic tools | F36 | Numeric | 1-256 |
| Elevators | F37 | Numeric | 1-256 |
| Other machinery | F38 | Numeric | 1-256 |
| Non-usage of PPE | F39 | Numeric | 1-256 |
| Pressure devices | F40 | Numeric | 1-256 |
| Access to facilities or equipment | F41 | Numeric | 1-256 |
| Workplace variables related to electrical installation | | | |
| Electrical installation | F42 | Numeric | 1-256 |
| Inappropriate usage of electrical installation | F43 | Numeric | 1-256 |
| Inappropriateness in explosive atmospheres | F44 | Numeric | 1-256 |
| Lack of safety during usage of electrical installation | F45 | Numeric | 1-256 |
| Lack of safety during maintenance of the electrical installation | F46 | Numeric | 1-256 |
| Hazardous substances related to equipment (generators, batteries, etc.) | F47 | Numeric | 1-256 |
| Workplace variables related to hazardous substances | | | |
| Toxic substances | F48 | Numeric | 1-256 |
| Caustic substances | F49 | Numeric | 1-256 |
| Corrosive substances | F50 | Numeric | 1-256 |
| Irritant substances | F51 | Numeric | 1-256 |
| Oxidizing substances | F52 | Numeric | 1-256 |
| Explosive substances | F53 | Numeric | 1-256 |
| Flammable material | F54 | Numeric | 1-256 |
| Appropriate cabinets | F55 | Numeric | 1-256 |
| Workplace variables related to fire | | | |
| Fire signing | F56 | Numeric | 1-256 |
| Banning smoking | F57 | Numeric | 1-256 |
| Banning flame | F58 | Numeric | 1-256 |
| Storage of flammable material | F59 | Numeric | 1-256 |
| Lack of fire protection systems | F60 | Numeric | 1-256 |
| Training in fire emergency plan | F61 | Numeric | 1-256 |
| Fire extinguishers | F62 | Numeric | 1-256 |
| Lack of training in fire security | F63 | Numeric | 1-256 |
| Workplace variables related to chemical factors | | | |
| Dust | F64 | Numeric | 1-256 |
| Fibres asbestos | F65 | Numeric | 1-256 |
| Smoke/steam | F66 | Numeric | 1-256 |
| Particles | F67 | Numeric | 1-256 |
| Other substances | F68 | Numeric | 1-256 |
| Dipping/splashing | F69 | Numeric | 1-256 |
| Workplace variables related to physical factors | | | |
| Noise | F70 | Numeric | 1-256 |
| Vibrations | F71 | Numeric | 1-256 |
| Radiation | F72 | Numeric | 1-256 |
| Illumination | F73 | Numeric | 1-256 |
| Microclimate | F74 | Numeric | 1-256 |
| Workplace variables related to biological factors | | | |
| Bacteria | F75 | Numeric | 1-256 |
| Fungi | F76 | Numeric | 1-256 |
| Viruses | F77 | Numeric | 1-256 |
| Other factors | F78 | Numeric | 1-256 |
| Workplace variables related to transversal risks | | | |
| Settlement - manual handling | F79 | Numeric | 1-256 |
| Psychological factors | F80 | Numeric | 1-256 |
| Provision - intervention programs | F81 | Numeric | 1-256 |
| Ergonomy | F82 | Numeric | 1-256 |
| Hardship conditions | F83 | Numeric | 1-256 |
| OUTPUT | | | |
| Falling | Y1 | Nominal | Yes, No |
| Needlestick/cut | Y2 | Nominal | Yes, No |
| Incident | Y3 | Nominal | Yes, No |
| Accident | Y4 | Nominal | Yes, No |
| Safety | Y5 | Nominal | Yes, No |

Fig. 1. The information flow diagram of the proposed model.

E. Prediction Models and Classification

ML is concerned with the development of intelligent systems that can learn from data. These systems can be used to classify an object into a correct class based on features characterizing the object. In effect, the algorithms search for patterns in data. The ML algorithms that implement pattern classification are known as classifiers. There are several classifiers, each one of them having certain limitations and advantages [42]. In this study, in order to develop the proposed DSS model, we employed and tested 4 classifiers: the Naive Bayesian (NB) classifier [43], the Bayesian Network (BN), the k-Nearest Neighbors (k-NN) classifier, and the Multilayer Perceptron (MLP) network. The choice was based on the fact that the use of conditional probabilities offers a practical metric as it can model occupational health in a class conditional way. This formulation helps us elucidate class conditional variables of interest.

NB classification is one of the most popular data mining algorithms. Its efficiency comes from the assumption of attribute independence [40, 41]. It is a simple probabilistic classifier based on Bayes’ theorem and it uses the method of maximum likelihood. It can work in complex real-world situations and it requires a small amount of training data [42, 43].

BN is an effective tool to demonstrate complicated relationships and a powerful tool for knowledge representation and reasoning. It is also suitable for small and incomplete datasets. It is particularly convenient for modeling causal relationships among variables (with a combination of different sources of information) and handling of uncertainty for decision analysis [13].
It is a model of probabilistic inference defined by a triplet (X, G, P), where X = (X1, X2, ⋯, Xn) is the set of factors, G is a directed acyclic graph, and P is a joint probability distribution. The nodes of G are labeled with the elements of X and the arcs of G represent the conditional dependence relationships between nodes. Each factor is associated with a conditional probability table that defines the probability of each state for that factor [10, 44].

The k-NN is a simple and intuitive classifier. It classifies a sample based on the k closest training samples in the feature space, by computing the distances between the new sample and the samples of the training set. According to these distances, it assigns the sample to the class of the closest training samples (neighbors) [39, 45].

An MLP is a class of feedforward ANN and one of the most widely used. It consists of the input layer, which receives external inputs, one or more hidden layers, and an output layer. Each layer includes one or more neurons linked between successive layers. The Back-Propagation (BP) algorithm was used for the training of the model. In this technique, information about the errors of the network on known data is propagated backwards, the connections are adjusted, and the error is minimized [49]. All Weka parameters for each of the above models are summarized in the Appendix.

The classifiers were designed to classify the cases into one of the afore-mentioned 5 classes. Each classifier takes as inputs the data from the reports of each event and outputs its classification group, providing in this way a prediction regarding the actual occupational risk of hazards of each hospital worker in their specific hospital workplace and under specific OSH conditions.

F. Training, Validation, and Performance Evaluation

The train/test split method [50] was used as a validation technique for training.
The available dataset was divided into 2 subsets: the training set (80% of cases - 379 cases) which was used to train the classifiers, and the test set (20% of cases - 97 cases) which was used to evaluate their predictive performance (previously unseen data). The 2 sets were properly stratified so that the distribution of classes in each subset is approximately the same as that in the initial dataset. Thus, each subset contains representative samples of the same larger population. To minimize sampling bias, k-fold cross-validation was also used on the whole dataset for training and test (k=10). In this technique, the dataset is divided into k folds of approximately equal size, k-1 folds are used for training, and one fold is used for test. The process is repeated k times. Through this process, each instance of the available data is used one time as a member of the test set and k-1 times as a member of the training set [48, 49].

Data entry was done in a spreadsheet file, since this format can be easily used in daily OSH practice by the safety engineer and hospital staff. Files were saved in .csv format. Importing and processing the files in Weka required their conversion to the .arff format. Considering that incidents and accidents are stochastic events that differ widely in their details, a fixed methodology has been used for risk assessment (including a fixed set of risk variables that should be investigated in every case). No feature selection method was applied.

In order to evaluate the performance of the proposed DSS model compared to the safety engineer’s reports involved in this study, we calculated the recall (sensitivity) and the Area Under the Receiver Operating Characteristic (ROC) Curve of the models on the basis of detecting the 5 output classes.
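The stratified partitioning used for the 10-fold cross-validation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure implemented in Weka: the function names `stratified_folds` and `cross_validation_splits` are hypothetical, the fold assignment is a simple round-robin deal within each class, and the labels are generated only from the class counts reported in the Data section.

```python
import random
from collections import defaultdict

# Class counts as reported in the Data section (476 reports in total).
CLASS_COUNTS = {
    "Needlestick/Cut": 136,
    "Incident": 59,
    "Falling": 20,
    "Accident": 23,
    "Safety": 238,
}


def stratified_folds(labels, k=10, seed=0):
    """Assign every instance index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the class members round-robin; carrying the offset across
        # classes keeps the overall fold sizes within one instance of each other.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = list(folds[held_out])
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
    for train, test in cross_validation_splits(labels, k=10):
        print(len(train), len(test))
```

Each instance appears in exactly one test fold and in the training portion of the remaining k-1 iterations, which matches the description of the technique in the text.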
The ROC curve is a standard validation approach for probabilistic classifiers [50, 51]. A ROC curve is a two-dimensional graph in which the True Positive Rate (TPR) (sensitivity or recall) is plotted on the y-axis and the False Positive Rate (FPR or 1 - specificity) is plotted on the x-axis, as a function of the threshold u, ranging from 0 to 1 [55]. The Area Under the Curve (AUC) can be interpreted as a measure of overall accuracy, varying from 0.5 (random guess) to 1 (perfect performance) [56]. The recall metric answers the question of what proportion of actual positives is correctly classified. It was chosen because recall penalizes false negatives but not false positives [57]. In an OSH approach, we want to capture as many positives as possible (we want to capture any contingent incident or accident even if we are not totally sure). These metrics can be expressed according to the number of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) classifications through the following equations [58]:

$$\text{Recall} = \text{Sensitivity} = \text{TPR} = \frac{TP}{TP + FN} \qquad (1)$$

$$1 - \text{Specificity} = \text{FPR} = \frac{FP}{TN + FP} \qquad (2)$$

The confusion matrices obtained through testing the classifiers on the training and the test sets were also used to assess the predictive performance of the models. Each column in a confusion matrix represents the instances that are classified in one class and each row represents the instances in their real class. The error rate of the classified instances was calculated for each class.

III. RESULTS

A. Performance Comparison of the Algorithms

Figure 2 presents the performance of the different models compared to the safety engineer’s reports involved in this study, in terms of mean recall, AUC, error rate, and the overall accuracy of each model (percentage of the correctly classified instances). The detailed table with the predictive performance of the models for each of the 5 classes is included in the Appendix.

Fig. 2.
The predictive performance of the models.

The overall classification accuracies of the developed models on the training set, test set, and 10-fold cross set were 97%, 77%, and 93% respectively. The MLP showed the highest accuracy for the test set and the 10-fold cross set (94.85% and 96.01%, respectively). The MLP and the BN showed the lowest average Error Rate (ER) on the test set (10.71% and 14.35%, respectively) and on the 10-fold cross set (6.99% and 8.82%, respectively). The MLP showed the highest sensitivity (recall) for the test set and the 10-fold cross set (89% and 93%, respectively). In addition, this model showed the most balanced combination of sensitivity results on the 10-fold cross set (see Figure 2). The average AUC was high for both the test and the 10-fold cross sets (0.91 and 0.98, respectively). The MLP showed the highest AUC value for the test set and the 10-fold cross set (1.00 and 0.99, respectively). It must be noted that all of the developed models showed high AUC (range: 0.83-1.00 on the test set and range: 0.95-0.99 on the 10-fold cross set).

B. Practical Implications

Safety engineers carry out inspections of workplaces, equipment, and OSH conditions during their visits. They complete risk assessment tables manually in printed versions to record the results of their inspections. This process is repeated after every incident or accident. In this case, the assessment focuses on causes that are related to the specific incident or accident. The developed DSS model aims to support the safety engineer with evidence for the risk values of hazards and recommended actions. The MLP and the BN models (10-fold cross set) were used in the DSS, as they were the models with the best predictive performance.
A further elaboration of the solution methodology, including the proposed model, is depicted in Figure 3. The DSS model can be used in every case of incident, accident, or audit. The associated data files should then be updated by the responsible personnel and the safety engineer. These event reports should next be coded as instances in risk assessment tables in order to update the database. Afterwards, the re-trained or the existing models can be used to support with evidence any OSH decision (efficiency of an OSH intervention, employee’s movement, appropriateness of workplace or equipment, etc.).

Fig. 3. A solution methodology including the proposed model.

In order to evaluate the proposed DSS model, a case study was carried out. The new case study test set resulted from a regular process. A typical instance of this process is the following: the safety engineers of the hospital, during the audit of specific departments and workplaces (01/2020), identified hazards of possible incidents. In parallel, they proposed specific corrective actions for the limitation or the elimination of these hazards by inserting the desired risk values for the model’s variables (assuming that corrective actions have been taken). The main observations of the safety engineers during the one-month period, and the corresponding incidents, were:

• In the archive room, an employee used a chair instead of an appropriate ladder to retrieve files from shelves. Moreover, the workplace was untidy and several obstacles prevented free movement (possible incident: "Falling").
• When the safety engineer came out of the archive room, he observed that, even though the floor had just been mopped, the sign "Caution: slippery floor" had not been placed (possible incident: "Falling").
• During the visit to the pathology laboratory, the safety engineer found that there was neither a preventive nor a corrective maintenance file for one microtome, as required by its operation manual. In a conversation with an employee of this laboratory, the latter expressed complaints about OSH conditions (possible incident: "Needlestick/Cut").
• During the visit to the surgery room, a surgical nurse complained about excessive workload and lack of training (possible incident: "Needlestick/Cut").
• In the laundry room, and specifically on the bedsheet folding machine, the proper guard of the moving part was missing (possible incident: "Incident").
• A member of the waiting staff used a non-functional trolley and lifted many food trays simultaneously to carry them to each patient room (possible incident: "Incident").

For each one of these instances, a "Safety" class was also recorded, corresponding to the respective corrective measures. The dataset that came up from this scenario was used as a new case study test set for the proposed DSS model. All 12 instances were correctly classified by the DSS model, apart from a single instance that was misclassified by the BN model. Even in this case, the instance was not classified as "Safety"; instead, one "Falling" was classified as "Incident". This result was expected, since this kind of incident (falling from a chair) was not included in the training set of the model used.

To further evaluate the usability of the proposed system, the DSS model was used to support human resource management in an internal employee movement decision approached from an OSH perspective. In January 2020, 10 hospital workers were moved to different departments to cover the respective demands. The safety engineer inspected the destination workplaces and completed the relevant printed risk assessment tables manually. These tables were classified as 9 instances of "Safety" and 1 instance of "Incident".
The proposed interventions for the case of "Incident" included training on the work practice (waiting staff duties) and corrective actions on equipment (lighter trays and functional trolleys). The movement took place after the completion of these measures. The "Safety" state was confirmed by the safety engineer one week after the movement.

IV. DISCUSSION AND CONCLUSIONS

The aim of this study was to create a DSS model for the prediction of incidents and accidents in the hospital workplace. This model is based on the incident and accident data from reports of personnel and from safety engineer investigations and audits in the hospital workplace. The preliminary results suggest that the proposed system may support OSH decisions by predicting incidents and accidents, and also by assessing the effectiveness of OSH interventions. Through the evaluation of the different classification models, we concluded that the MLP on the 10-fold cross set had the most balanced predictive performance in terms of overall accuracy (96.01%), ER (6.99%), AUC (0.990), and recall (93%), rendering this model reliable as a tool in OSH analysis, in accordance with [13–17, 25]. Furthermore, the high predictive performance of all the developed models is notable (AUC range: 0.95-0.99 and average accuracy: 93% on the 10-fold cross set). A major benefit of the proposed DSS model is that it utilizes processes and resources that already exist and are applied in the daily hospital working life. The exploitation of these printed data records for incidents and accidents, together with safety engineer audits and investigation reports, is essential for OSH management.
This utilization relates to identifying hazards that can lead to an incident or accident, reducing the occupational risk, evaluating the efficacy of OSH measures and interventions, supporting management decisions (employee’s recruitment or movement to a different department, etc.) from the OSH perspective, improving OSH conditions, and reducing the overall cost of injuries [23, 34]. The proposed change concerns the transfer of coded paper records into a DSS, which is promising for OSH decisions. This process could result in faster and more accurate predictions regarding OSH injuries and interventions, in accordance with [9, 34, 62], emphasizing the significance of quantitative analysis of empirical injury data in safety science [25]. Another advantage of the proposed model is that it can be adjusted to various working sectors, depending on the training sets and the existing records of incidents and accidents in each of them. For its application in other hospitals, the system can be easily updated with new datasets, according to the status of each hospital (incidents and accidents that take place and OSH conditions). The frequency of updating or retraining the classification models depends on the type and the frequency of incidents and accidents. To maximize the usability and the practicality of the DSS model, a consistent approach to assigning the risk values of the various factors should be followed. For instance, the classes "Incident" and "Accident" may be confused during the initial workplace risk assessment of a random scenario, because it is not known in advance whether medical leave will be required or how long it will last. The dataset for accidents needs to be improved. However, any kind of incident should be interpreted as an alert, since it can lead to an accident at any time. Incidents are very important for OSH in the sense of a "near miss", that is, a situation that could lead to an accident, particularly in the case of repeated incidents of the same kind [59, 60].
In addition, certain incidents can evolve into occupational diseases (such as blood-borne diseases caused by needlestick/cut injuries and chronic musculoskeletal disorders) [61]. According to the domestic legislative framework, an occupational disease is equivalent to an occupational accident [62]. The experience of the safety engineer and adherence to the corresponding methodology are key factors for this assessment. According to the proposed DSS model, an alert for taking corrective actions is raised for every class other than "Safety". The proposed measures and corrective actions target the elimination of hazards. Elimination is not always possible; in such cases, the limitation of hazards to acceptable levels should be the next step. The relation between cost and optimal level of risk is formed according to the ALARP (As Low As Reasonably Practicable) principle [60, 61]. In other words, entering the minimum risk values for the "Safety" class has no practical meaning, since a total lack of risk cannot be achieved in real conditions, both in terms of cost and inherent risk (a risk that is related to the nature of the work). Conversely, "armor logic" should not be applied during risk assessment by inserting maximum values when identifying and highlighting a hazard. In any case, the point of interest regarding the values of risk variables is the threshold at which the value of a specific risk factor leads to the transition from one class to another.

A field for further research is the extensibility of the proposed DSS model and its development as a user-friendly web application platform. The development of the proposed DSS as a web app could make it more usable and accessible.
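The class-transition threshold discussed above (the value of a risk factor at which the predicted class changes), together with the rule that every class other than "Safety" raises an alert, can be explored with a simple one-factor sweep. The sketch below is illustrative only: `toy_predict` is a hypothetical stand-in for a trained classifier (it is not the study's MLP or BN model), and the factor names and the 1-5 risk scale are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Instance = Dict[str, float]


def toy_predict(instance: Instance) -> str:
    """Hypothetical stand-in classifier, NOT the models trained in the study."""
    score = instance["floor_condition"] + instance["housekeeping"]
    return "Falling" if score >= 5 else "Safety"


def needs_alert(predicted_class: str) -> bool:
    """Every class other than "Safety" calls for corrective action."""
    return predicted_class != "Safety"


def find_transition(
    predict: Callable[[Instance], str],
    instance: Instance,
    factor: str,
    values: Iterable[float],
) -> Optional[Tuple[float, str, str]]:
    """Sweep one risk factor, keeping all others fixed.

    Returns (value, previous_class, new_class) at the first change of the
    predicted class, or None if the class never changes over `values`.
    """
    previous = None
    for value in values:
        candidate = dict(instance, **{factor: value})  # copy with one factor changed
        label = predict(candidate)
        if previous is not None and label != previous:
            return value, previous, label
        previous = label
    return None


# Hypothetical baseline instance on an assumed 1-5 risk scale.
baseline = {"floor_condition": 1, "housekeeping": 2}
transition = find_transition(toy_predict, baseline, "floor_condition", range(1, 6))
print(transition)
if transition is not None:
    print("alert:", needs_alert(transition[2]))
```

In practice, `toy_predict` would be replaced by a call to whichever trained model the DSS uses, and the sweep would be repeated for each risk factor of interest.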
Furthermore, the directions of this extensibility can include the facilitation of data entry to the system through Natural Language Processing (NLP) (because there are several handwritten records) and the interconnection between the proposed DSS model and other tools/applications that will improve the efficiency of the proposed measures and corrective actions and/or the validity of risk assessment (scales for musculoskeletal disorders like the key item method [66], working stress questionnaires, optimal allocation of OSH capital investment, etc.).

ACKNOWLEDGMENT

We would like to thank the Administration, the Scientific Council, and the personnel of Metaxa Cancer Hospital of Piraeus, Greece for their approval and participation in the implementation of the proposed model.

APPENDIX

WEKA PARAMETERS FOR CLASSIFICATION ALGORITHMS

| Model | Weka parameters |
|---|---|
| NB | batch size = 100; debug = False; kernel estimator = False; doNotCheckCapabilities = False; useSupervisedDiscretization = False |
| BN | batch size = 100; debug = False; doNotCheckCapabilities = False; estimator = Sample Estimator – A 0.5; searchAlgorithm = K2-P1-S BAYES; useADTree = False |
| k-NN | k = 9; batch size = 100; crossValidate = False; debug = False; distanceWeighting = No; doNotCheckCapabilities = False; meanSquared = False; nearestNeighbourSearchAlgorithm = LinearNNSearch - Euclidean distance First-Last |
| MLP | autoBuild = True; batch size = 100; debug = False; decay = False; doNotCheckCapabilities = False; hiddenLayers = a, where a = (number of attributes + number of classes) / 2 = 44; learning rate = 0.3; momentum = 0.2; nominalToBinaryFilter = True; normalizeAttributes = True; normalizeNumericClass = True; reset = True; seed = 0; trainingTime (number of epochs) = 500; validationSetSize = 0; validation threshold = 20 |

PREDICTIVE PERFORMANCE OF THE DIFFERENT MODELS

| Model | VM (a) | MA (a) (%) | Metric | Accident | Falling | Incident | Cutting | Safety | MV (a) |
|---|---|---|---|---|---|---|---|---|---|
| NB | Train set (80%) (379) | 93.40 | ER (%) | 0.00 | 0.00 | 23.40 | 3.70 | 5.26 | 6.47 |
| | | | Recall | 1.000 | 1.000 | 0.766 | 0.963 | 0.947 | 0.94 |
| | | | AUC | 1.000 | 0.995 | 0.968 | 0.982 | 0.986 | 0.99 |
| | Test set (20%) (97) | 71.13 | ER (%) | 60.00 | 0.00 | 16.67 | 3.57 | 47.92 | 25.63 |
| | | | Recall | 0.600 | 1.000 | 0.833 | 0.964 | 0.521 | 0.78 |
| | | | AUC | 0.753 | 0.992 | 0.862 | 0.964 | 0.991 | 0.91 |
| | 10-fold cross (476) | 92.23 | ER (%) | 8.69 | 15.00 | 28.81 | 3.67 | 4.20 | 12.07 |
| | | | Recall | 0.913 | 0.850 | 0.712 | 0.963 | 0.958 | 0.88 |
| | | | AUC | 0.995 | 0.944 | 0.890 | 0.969 | 0.966 | 0.95 |
| k-NN | Train set (80%) (379) | 95.78 | ER (%) | 0.00 | 37.50 | 21.28 | 0.00 | 0.00 | 11.76 |
| | | | Recall | 1.000 | 0.625 | 0.787 | 1.000 | 1.000 | 0.88 |
| | | | AUC | 1.000 | 0.986 | 0.990 | 0.997 | 1.000 | 0.99 |
| | Test set (20%) (97) | 70.10 | ER (%) | 0.00 | 50.00 | 25.00 | 3.57 | 47.92 | 25.30 |
| | | | Recall | 1.000 | 0.500 | 0.750 | 0.964 | 0.521 | 0.75 |
| | | | AUC | 0.875 | 0.985 | 0.983 | 0.997 | 0.751 | 0.92 |
| | 10-fold cross (476) | 92.86 | ER (%) | 0.00 | 50.00 | 37.29 | 1.47 | 0.00 | 17.75 |
| | | | Recall | 1.000 | 0.500 | 0.627 | 0.985 | 1.000 | 0.82 |
| | | | AUC | 1.000 | 0.981 | 0.966 | 0.994 | 1.000 | 0.99 |
| BN | Train set (80%) (379) | 97.10 | ER (%) | 0.00 | 18.75 | 10.64 | 2.78 | 0.00 | 6.43 |
| | | | Recall | 1.000 | 0.813 | 0.894 | 0.972 | 1.000 | 0.94 |
| | | | AUC | 1.000 | 0.993 | 0.987 | 0.996 | 1.000 | 1.00 |
| | Test set (20%) (97) | 72.16 | ER (%) | 0.00 | 0.00 | 16.67 | 7.14 | 47.92 | 14.35 |
| | | | Recall | 1.000 | 1.000 | 0.833 | 0.929 | 0.521 | 0.86 |
| | | | AUC | 0.875 | 1.000 | 0.810 | 0.900 | 0.570 | 0.83 |
| | 10-fold cross (476) | 91.18 | ER (%) | 0.00 | 15.00 | 13.56 | 5.88 | 9.66 | 8.82 |
| | | | Recall | 1.000 | 0.850 | 0.864 | 0.941 | 0.903 | 0.91 |
| | | | AUC | 0.975 | 0.985 | 0.979 | 0.994 | 0.913 | 0.97 |
| MLP | Train set (80%) (379) | 99.74 | ER (%) | 0.00 | 0.00 | 2.12 | 0.00 | 0.00 | 0.42 |
| | | | Recall | 1.000 | 1.000 | 0.979 | 1.000 | 1.000 | 1.00 |
| | | | AUC | 1.000 | 1.000 | 0.984 | 0.998 | 1.000 | 1.00 |
| | Test set (20%) (97) | 94.85 | ER (%) | 0.00 | 25.00 | 25.00 | 3.57 | 0.00 | 10.71 |
| | | | Recall | 1.000 | 0.750 | 0.750 | 0.964 | 1.000 | 0.89 |
| | | | AUC | 1.000 | 1.000 | 0.992 | 0.996 | 1.000 | 1.00 |
| | 10-fold cross (476) | 96.01 | ER (%) | 0.00 | 10.00 | 22.03 | 2.94 | 0.00 | 6.99 |
| | | | Recall | 1.000 | 0.900 | 0.780 | 0.971 | 1.000 | 0.93 |
| | | | AUC | 0.999 | 0.996 | 0.971 | 0.991 | 1.000 | 0.99 |

a. VM: Validation Model, MA: Model Accuracy, MV: Mean Value.

REFERENCES

[1] D. Elsler, J. Takala, and J. Remes, "An international comparison of the cost of work-related accidents and illnesses," European Agency for Safety and Health at Work, 2017.
[2] "Good OSH is good for business," EU-OSHA. https://osha.europa.eu/el/themes/good-osh-is-good-for-business (accessed May 26, 2021).
[3] K. Cosic, S. Popovic, M. Sarlija, I. Kesedzic, and T. Jovanovic, "Artificial intelligence in prediction of mental health disorders induced by the COVID-19 pandemic among health care workers," Croatian Medical Journal, vol. 61, no. 3, pp. 279–288, Jun. 2020, https://doi.org/10.3325/cmj.2020.61.279.
[4] K. Dimoulas, G. Kollias, C. Bagavos, and T. Ganetaki, Work and health problems in Greece. Athens, Greece: INE-GSEE Work Institute, 2015.
[5] Hospital Inventory 2018. Athens, Greece: Hellenic Statistical Authority, 2020.
[6] Survey on Accidents at Work, 2018. Athens, Greece: Hellenic Statistical Authority, 2020.
[7] S. Sarkar and J. Maiti, "Machine learning in occupational accident analysis: A review using science mapping approach with citation network analysis," Safety Science, vol. 131, p. 104900, Nov. 2020, https://doi.org/10.1016/j.ssci.2020.104900.
[8] F. Siddiqui, M. A. Akhund, A. H. Memon, A. R. Khoso, and H. U. Imad, "Health and Safety Issues of Industry Workmen," Engineering, Technology & Applied Science Research, vol. 8, no. 4, pp. 3184–3188, Aug. 2018, https://doi.org/10.48084/etasr.2138.
[9] S. Y. Far, R. Mirzaei, M. B. Katrini, M. Haghshenas, and Z. Sayahi, "Assessment of Health, Safety and Environmental Risks of Zahedan City Gasoline Stations," Engineering Technology & Applied Science Research, vol. 8, no. 2, pp. 2689–2692, 2018.
[10] S. J. Bertke, A. R. Meyers, S. J. Wurzelbacher, J. Bell, M. L. Lampl, and D.
Robins, "Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims," Journal of Safety Research, vol. 43, no. 5, pp. 327–332, Dec. 2012, https://doi.org/10.1016/j.jsr.2012.10.012.
[11] G. Nanda, K. M. Grattan, M. T. Chu, L. K. Davis, and M. R. Lehto, "Bayesian decision support for coding occupational injury data," Journal of Safety Research, vol. 57, pp. 71–82, Jun. 2016, https://doi.org/10.1016/j.jsr.2016.03.001.
[12] J. E. M. E. Martin, J. T.-G. Taboada-Garcia, S. G. Gerassis, A. S. Saavedra, and R. Martinez-Alegria, "Bayesian network analysis of accident risk in information-deficient scenarios," Revista de la Construcción. Journal of Construction, vol. 16, no. 3, pp. 439–446, 2017, https://doi.org/10.7764/RDLC.16.3.439.
[13] A. P. C. Chan, F. K. W. Wong, C. K. H. Hon, and T. N. Y. Choi, "A Bayesian Network Model for Reducing Accident Rates of Electrical and Mechanical (E&M) Work," International Journal of Environmental Research and Public Health, vol. 15, no. 11, Nov. 2018, Art. no. 2496, https://doi.org/10.3390/ijerph15112496.
[14] L. Sanmiquel, M. Bascompta, J. M. Rossell, H. F. Anticoi, and E. Guash, "Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques," International Journal of Environmental Research and Public Health, vol. 15, no. 3, Mar. 2018, Art. no. 462, https://doi.org/10.3390/ijerph15030462.
[15] A. Soltanzadeh, I. Mohammadfam, S. Mahmoudi, B. A. Savareh, and A. M. Arani, "Analysis and forecasting the severity of construction accidents using artificial neural network," Safety promotion and injury prevention (Tehran), vol. 4, no. 3, pp. 185–192, 2016.
[16] D. A. Patel and K. N. Jha, "Neural Network Approach for Safety Climate Prediction," Journal of Management in Engineering, vol. 31, no. 6, Nov. 2015, Art. no. 05014027, https://doi.org/10.1061/(ASCE)ME.1943-5479.0000348.
[17] A. M. Abubakar, H. Karadal, S. W. Bayighomog, and E.
Merdan, "Workplace injuries, safety climate and behaviors: application of an artificial neural network," International Journal of Occupational Safety and Ergonomics, vol. 26, no. 4, pp. 651–661, Oct. 2020, https://doi.org/10.1080/10803548.2018.1454635.
[18] F. A. Moayed and R. L. Shell, "Application of Artificial Neural Network Models in Occupational Safety and Health Utilizing Ordinal Variables," The Annals of Occupational Hygiene, vol. 55, no. 2, pp. 132–142, Mar. 2011, https://doi.org/10.1093/annhyg/meq079.
[19] I. Mohammadfam, A. Soltanzadeh, A. Moghimbeigi, and B. A. Savareh, "Use of Artificial Neural Networks (ANNs) for the Analysis and Modeling of Factors That Affect Occupational Injuries in Large Construction Industries," Electronic Physician, vol. 7, no. 7, pp. 1515–1522, Nov. 2015, https://doi.org/10.19082/1515.
[20] S. Sarkar, S. Vinay, R. Raj, J. Maiti, and P. Mitra, "Application of optimized machine learning techniques for prediction of occupational accidents," Computers & Operations Research, vol. 106, pp. 210–224, Jun. 2019, https://doi.org/10.1016/j.cor.2018.02.021.
[21] J. Bao, J. Johansson, and J. Zhang, "An Occupational Disease Assessment of the Mining Industry’s Occupational Health and Safety Management System Based on FMEA and an Improved AHP Model," Sustainability, vol. 9, no. 1, Jan. 2017, Art. no. 94, https://doi.org/10.3390/su9010094.
[22] H. R. S. A. Mard, A. Estiri, P. Hadadi, and M. S. A. Mard, "Occupational risk assessment in the construction industry in Iran," International Journal of Occupational Safety and Ergonomics, vol. 23, no. 4, pp. 570–577, Oct. 2017, https://doi.org/10.1080/10803548.2016.1264715.
[23] L. Comberti, M. Demichela, G. Baldissone, G. Fois, and R.
Luzzi, "Large Occupational Accidents Data Analysis with a Coupled Unsupervised Algorithm: The S.O.M. K-Means Method. An Application to the Wood Industry," Safety, vol. 4, no. 4, Dec. 2018, Art. no. 51, https://doi.org/10.3390/safety4040051.
[24] N. D. Nath, T. Chaspari, and A. H. Behzadan, "Automated ergonomic risk monitoring using body-mounted sensors and machine learning," Advanced Engineering Informatics, vol. 38, pp. 514–526, Oct. 2018, https://doi.org/10.1016/j.aei.2018.08.020.
[25] F. Davoudi Kakhki, S. A. Freeman, and G. A. Mosher, "Utilization of Machine Learning in Analyzing Post-incident State of Occupational Injuries in Agro-Manufacturing Industries," in Advances in Safety Management and Human Performance, P. M. Arezes and R. L. Boring, Eds. New York, NY, USA: Springer, 2020, pp. 3–9.
[26] S. D. Mwmc et al., "Ethical Considerations of Using Machine Learning for Decision Support in Occupational Health: An Example Involving Periodic Workers’ Health Assessments," Journal of Occupational Rehabilitation, vol. 30, no. 3, pp. 343–353, Sep. 2020, https://doi.org/10.1007/s10926-020-09895-x.
[27] F. Ladstatter, E. Garrosa, B. Moreno-Jimenez, V. Ponsoda, J. M. R. Aviles, and J. Dai, "Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses," Ergonomics, vol. 59, no. 2, pp. 207–221, Feb. 2016, https://doi.org/10.1080/00140139.2015.1061141.
[28] Y.-H. Kim and M.-H. Jung, "Effect of occupational health nursing practice on musculoskeletal pains among hospital nursing staff in South Korea," International Journal of Occupational Safety and Ergonomics, vol. 22, no. 2, pp. 199–206, Apr. 2016, https://doi.org/10.1080/10803548.2015.1078046.
[29] A. Fonseca, I. Abreu, M. J. Guerreiro, C. Abreu, R. Silva, and N. Barros, "Indoor Air Quality and Sustainability Management—Case Study in Three Portuguese Healthcare Units," Sustainability, vol. 11, no. 1, Jan. 2019, Art. no.
101, https://doi.org/10.3390/su11010101.
[30] S. Lin, N. Chaiear, J. Khiewyoo, B. Wu, and N. P. Johns, "Preliminary Psychometric Properties of the Chinese Version of the Work-Related Quality of Life Scale-2 in the Nursing Profession," Safety and Health at Work, vol. 4, no. 1, pp. 37–45, Mar. 2013, https://doi.org/10.5491/SHAW.2013.4.1.37.
[31] W. Turnberg and W. Daniell, "Evaluation of a healthcare safety climate measurement tool," Journal of Safety Research, vol. 39, no. 6, pp. 563–568, Jan. 2008, https://doi.org/10.1016/j.jsr.2008.09.004.
[32] A. K. Celik, E. Oktay, and K. Cebi, "Analysing workplace violence towards health care staff in public hospitals using alternative ordered response models: the case of north-eastern Turkey," International Journal of Occupational Safety and Ergonomics, vol. 23, no. 3, pp. 328–339, Jul. 2017, https://doi.org/10.1080/10803548.2017.1316612.
[33] M. Stefanovic, D. Tadic, M. Djapan, and I. Macuzic, "Software for Occupational Health and Safety Risk Analysis Based on a Fuzzy Model," International Journal of Occupational Safety and Ergonomics, vol. 18, no. 2, pp. 127–136, Jan. 2012, https://doi.org/10.1080/10803548.2012.11076923.
[34] A. Sklad, "Assessing the impact of processes on the Occupational Safety and Health Management System’s effectiveness using the fuzzy cognitive maps approach," Safety Science, vol. 117, pp. 71–80, Aug. 2019, https://doi.org/10.1016/j.ssci.2019.03.021.
[35] V. Ravuri et al., "Group-specific models of healthcare workers’ well-being using iterative participant clustering," in Second International Conference on Transdisciplinary AI, Irvine, CA, USA, Sep. 2020, pp. 115–118, https://doi.org/10.1109/TransAI49837.2020.00026.
[36] K. Vallmuur, "Machine learning approaches to analysing textual injury surveillance data: A systematic review," Accident Analysis & Prevention, vol. 79, pp. 41–49, Jun. 2015, https://doi.org/10.1016/j.aap.2015.03.018.
[37] "Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance)." Publications Office of the European Union, Apr. 27, 2016.
[38] "Home - Weka Wiki," The University of Waikato. https://waikato.github.io/weka-wiki/ (accessed May 27, 2021).
[39] "Memorandum on Occupational Risk Assessment." Directorate-General for Employment in Labor Relations and Social Affairs (DG V) of the European Union, 1997.
[40] "Occupational Risk Assessment." Technical Chamber of Greece, 2001.
[41] S. Drivas, K. Zorba, and T. Koukoulaki, Methodological guide for the assessment and prevention of occupational risk. Athens, Greece: Hellenic Institute of Occupational Health and Safety, 2000.
[42] P. Bountris et al., "An Intelligent Clinical Decision Support System for Patient-Specific Predictions to Improve Cervical Intraepithelial Neoplasia Detection," BioMed Research International, vol. 2014, 2014, https://doi.org/10.1155/2014/341483.
[43] S. Chen, G. I. Webb, L. Liu, and X. Ma, "A novel selective naïve Bayes algorithm," Knowledge-Based Systems, vol. 192, Mar. 2020, Art. no. 105361, https://doi.org/10.1016/j.knosys.2019.105361.
[44] K. Koutroumbas and S. Theodoridis, Pattern Recognition, 4th ed. London, UK: Elsevier, 2008.
[45] M. A. Burhanuddin, R. Ismail, N. Izzaimah, A. A.-J. Mohammed, and N. Zainol, "Analysis of Mobile Service Providers Performance Using Naive Bayes Data Mining Technique," International Journal of Electrical & Computer Engineering, vol. 8, no. 6, pp. 5153–5161, 2018.
[46] R. Shinde, S. Arjun, P. Patil, and J. Waghmare, "An Intelligent Heart Disease Prediction System Using K-Means Clustering and Naïve Bayes Algorithm," International Journal of Computer Science and Information Technologies, vol. 6, no. 1, pp. 637–639, 2015.
[47] S.
J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. New Jersey, USA: Prentice Hall, 2010.
[48] D. Michie, D. J. Spiegelhalter, C. C. Taylor, and J. Campbell, Eds., Machine learning, neural and statistical classification. New York, NY, USA: Ellis Horwood, 1995.
[49] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Hoboken, New Jersey, USA: Wiley, 2001.
[50] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011.
[51] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems. San Mateo, CA, USA: Morgan Kaufmann, 1990.
[52] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge, MA, USA: Cambridge University Press, 2008.
[53] A. Nola et al., "Occupational accidents in temporary work," La Medicina Del Lavoro, vol. 92, no. 4, pp. 281–285, Aug. 2001.
[54] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, Jun. 2006, https://doi.org/10.1016/j.patrec.2005.10.010.
[55] J. López-García, M. Saldaña, S. Herrero, and J. Gutiérrez, "Bayesian network analysis of the influence of labour market variables on accident rates of workers in Spain," in Risk, Reliability and Safety: Innovating Theory and Practice: Proceedings of ESREL 2016, Glasgow, UK, Sep. 2016, pp. 1660–1667, https://doi.org/10.1201/9781315374987-250.
[56] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, Apr. 1982, https://doi.org/10.1148/radiology.143.1.7063747.
[57] S. Alvarez, "An exact analytical relation among recall, precision, and classification accuracy in information retrieval," Boston College, Boston, MA, USA, Technical Report BCCS-02-01 (2002): 1-22, Jan. 2002.
[58] R. Burduk, "Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels," arXiv:2006.13319 [cs, stat], Jun. 2020, Accessed: May 26, 2021. [Online]. Available: http://arxiv.org/abs/2006.13319.
[59] O. Ug, S. Wd, S. M, and P. A, "Improve Process Safety with Near-Miss Analysis," Chemical Engineering Progress, vol. 109, no. 5, pp. 20–27, 2013.
[60] M. G. Gnoni, S. Andriulo, G. Maggio, and P. Nardone, "‘Lean occupational’ safety: An application for a Near-miss Management System design," Safety Science, vol. 53, pp. 96–104, Mar. 2013, https://doi.org/10.1016/j.ssci.2012.09.012.
[61] E. Alexopoulos, Greek and International experience of accidents at work and occupational diseases of hospital employees. Guide to Occupational Risk Assessment and Prevention. Athens, Greece: EL.Y.A., 2007.
[62] "Circular 45/24-06-2010: Occupational Accident 2010." Social Security Institution, 2010.
[63] G. Reniers and T. Brijs, "An Overview of Cost-benefit Models/Tools for Investigating Occupational Accidents," Chemical Engineering Transactions, vol. 36, pp. 43–48, Apr. 2014, https://doi.org/10.3303/CET1436008.
[64] Health and Safety Executive, "Risk management: Expert guidance - ALARP at a glance." https://www.hse.gov.uk/managing/theory/alarpglance.htm (accessed May 26, 2021).
[65] S. J. Bertke, A. R. Meyers, S. J. Wurzelbacher, J. Bell, M. L. Lampl, and D. Robins, "Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims," Journal of Safety Research, vol. 43, no. 5, pp. 327–332, Dec. 2012, https://doi.org/10.1016/j.jsr.2012.10.012.
[66] K. Koklonis, A. Anastasiou, O. Petropoulou, S. Pitoglou, D. Iliopoulou, and D.
Koutsouris, "Utilizing Key Item Method to Manage Musculoskeletal Disorders in a Hospital Workplace," in 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Berlin, Germany, Jul. 2019, pp. 3420–3423, https://doi.org/10.1109/EMBC.2019.8857649.