CET Volume 86 DOI: 10.3303/CET2186127
Paper Received: 28 August 2020; Revised: 7 February 2021; Accepted: 26 April 2021
Please cite this article as: Tamascelli N., Scarponi G.E., Paltrinieri N., Cozzani V., 2021, A Data-driven Approach to Improve Control Room Operators' Response, Chemical Engineering Transactions, 86, 757-762, DOI: 10.3303/CET2186127

CHEMICAL ENGINEERING TRANSACTIONS VOL. 86, 2021
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Sauro Pierucci, Jiří Jaromír Klemeš
Copyright © 2021, AIDIC Servizi S.r.l.
ISBN 978-88-95608-84-6; ISSN 2283-9216

A Data-driven Approach to Improve Control Room Operators' Response

Nicola Tamascelli a,b,*, Giordano Scarponi a, Nicola Paltrinieri b, Valerio Cozzani a
a Department of Civil, Chemical, Environmental and Materials Engineering, University of Bologna, Bologna, Italy
b Department of Mechanical and Industrial Engineering, NTNU, Trondheim, Norway
nicola.tamascelli2@unibo.it

Digitalization has significantly improved productivity and efficiency within the chemical industry. Distributed Control Systems and the extensive use of sensor networks enable advanced control strategies and increase optimization opportunities. On the other hand, chemical plants are increasingly complex, equipment is highly interlinked, and it is more difficult to describe the system dynamics through first principles. Finding the root causes of process upsets and predicting dangerous deviations in process conditions is often challenging. Advanced and dynamic tools are needed to guarantee safe and stable operations in such a complex and multivariate environment. In this context, Machine Learning techniques may be used to exploit and retrieve knowledge from the large amount of data that chemical plants produce and store on a daily basis. Data-driven methods may be adopted to develop predictive models and support a proactive approach to process safety.
The study aims to develop Machine Learning techniques to improve the response of control room operators during critical events. Specifically, alarm data originating from an upper-tier Seveso site have been collected, cleaned, and analyzed to identify periods of intense alarm activity. Alarm behavior following operator responses has been evaluated to assess whether the actions were adequate to prevent further alarm occurrences. In doing so, alarm events that reoccur within 30 minutes after an operator acknowledgment have been identified and labeled. Subsequently, a hybrid classification algorithm was trained to predict the probability that a critical alarm reoccurs after being acknowledged by the operator. This predictive tool might be used to support the operator's decision-making process and focus his/her attention on critical alarms that are more likely to occur again in the near future.

1. Introduction

The alarm system is one of the first layers of protection to prevent process deviations from escalating into accidents (Stauffer and Clarke, 2016). Still, there are inherent difficulties in designing, operating, and maintaining an efficient alarm system (Goel et al., 2017), which includes both technical functions (e.g., sensors, DCS, actuators) and human functions (e.g., operators). Alarms inform control room operators about dangerous deviations from normal operating conditions so that appropriate corrective actions can be taken. In turn, operators should be given enough time to detect the issue, diagnose the situation, and determine and implement corrective actions (ANSI/ISA, 2016). Still, manual intervention by operators is subject to human error; improper procedures, worker fatigue, and lack of operator training may prevent an adequate response (Exida, 2009). In fact, several accident reports have highlighted that improper alarm management and inaccurate operator actions play a significant role in the development of process accidents.
For example, poor alarm prioritization and an excessive alarm annunciation rate contributed to the Texaco Milford Haven explosion, where 26 workers were injured (Health and Safety Executive, 1997). Also, the non-detection of a loss of coolant led to the Three Mile Island accident (United States Nuclear Regulatory Commission, 2018), where alarms rang and warning lights flashed, but no operator could diagnose the situation. In an attempt to rationalize and provide a methodology for a more effective design and management of alarm systems, standards have been published, such as ISA 18.2 (ANSI/ISA, 2016) and EEMUA 191 (EEMUA, 2013). Still, much remains to be done (Goel et al., 2017). The advent of the Third Industrial Revolution has already brought changes and improvements to the industry. Chemical plants are more productive, more automated, and more flexible, and safety systems are more advanced. Nevertheless, some issues have arisen as well. A modern DCS allows the configuration of new alarms with a few clicks of a mouse (Katzel, 2007). As a result, a larger number of alarms are installed, and the workload for operators has increased to the point where they are often overwhelmed by nuisance alarms (Kondaveeti et al., 2013). In addition, chemical plant complexity has increased, and control/safety functions are more intricate. As complexity increases, failures are more likely to occur, and the root causes of process upsets are more difficult to detect (Wall, 2009). Therefore, it may be challenging for control room operators to find the appropriate set of corrective actions in a reasonable time. An intelligent tool to assess and predict the effectiveness of operators' actions would be of great support in dealing with critical situations. In this context, advancements in IT, IoT, and computer science have led to the development of intelligent computer-based algorithms to extract knowledge from data and support knowledge-based decision-making.
In fact, a massive amount of process and alarm data is produced and stored on a daily basis (Reis and Kenett, 2018). Machine Learning algorithms may be used to "mine" these data and create predictive models for, e.g., fault detection and fault diagnosis (Tian et al., 2015), predictive maintenance (Carvalho et al., 2019), Dynamic Risk Assessment (Paltrinieri et al., 2019), and modeling and simulation (Aleixandre et al., 2015). This work focuses on the application of Machine Learning techniques for predicting the probability that a critical alarm reoccurs after being acknowledged by an operator. In this way, the operator's attention would be driven to alarms that are more likely to occur again in a short time. Alarm data from an ammonia production site have been used to evaluate the proposed methodology.

2. Alarm database: structure and features

Alarm records originating from an ammonia production plant have been used to test the proposed methodology. Alarms that occurred between July and November 2017 have been extracted and stored in a CSV file (i.e., the alarm database), which contains 26,473 alarm records described by means of 39 different attributes. That is, the database may be considered a 26,473 × 39 matrix, where each row represents an alarm event, and each column represents an attribute (i.e., a feature) of an alarm event. A reduced version of the database was described and used by Tamascelli et al. (2020a, 2020b). In the present study, the whole dataset has been used. Most of the alarms in the database occurred between 09/09/2017 and 09/10/2017, when a total power outage forced an emergency plant shutdown. During this critical event, the alarm annunciation rate often exceeded 1000 alarms/day, and the workload for control room operators increased drastically. Each alarm event is described by a list of attributes. A comprehensive description of the attributes may be found in Tamascelli et al. (2020b).
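As a minimal illustration of how such periods of intense alarm activity can be spotted, the sketch below counts alarm annunciations per calendar day from a list of Timestamp strings. The timestamp format and the sample records are assumptions for illustration, not taken from the real database:

```python
from collections import Counter
from datetime import datetime

def daily_alarm_counts(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Count alarm annunciations per calendar day.

    `timestamps` is a list of Timestamp strings; the format string
    is an assumption about how dates are stored.
    """
    days = (datetime.strptime(ts, fmt).date() for ts in timestamps)
    return Counter(days)

# Hypothetical excerpt: three alarms on 09/09/2017, one the day after.
sample = ["09/09/2017 10:00:00", "09/09/2017 10:02:30",
          "09/09/2017 23:59:59", "10/09/2017 00:01:00"]
counts = daily_alarm_counts(sample)

# Flag days whose annunciation rate exceeds a flood threshold,
# e.g., the 1000 alarms/day figure mentioned above.
flood_days = [d for d, n in counts.items() if n > 1000]
```

On the full database, the same scan would isolate the shutdown period of September 2017, where the daily counts spike.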
However, only three attributes are needed to uniquely identify an alarm event:
1. Timestamp;
2. Source;
3. Identifier.
The Timestamp is the date and time of the alarm occurrence. The Source represents the instrument or logic function that generated the alarm. An example of a Source is LI315 (i.e., the level indicator in control loop 315). The Identifier indicates the alarm status. Nine different Identifiers are found in the database, as shown in Table 1.

Table 1: Alarm Identifiers

Identifier  Meaning
HHH         The measured variable has exceeded the high alarm setpoint
HTRP        The measured variable has exceeded the very-high alarm setpoint
LLL         The measured variable has fallen below the low alarm setpoint
LTRP        The measured variable has fallen below the very-low alarm setpoint
ALM         Generic alarm
IOP         Instrument failure or out-of-range measurement
ACK         The operator has acknowledged the alarm
NR          A generic alarm is terminated (it refers to an earlier ALM alarm)
Recover     Alarm terminated (it refers to an earlier HHH, HTRP, LLL, LTRP, or IOP alarm)

In addition to alarms per se, the database keeps track of two different events: the acknowledgment of an alarm by an operator and the recovery of an alarm. The former is described by the Identifier "ACK", the latter by "Recover" or "NR", depending on the Identifier of the original alarm.

3. Methodology

The method follows three main steps: Data pre-processing, Target identification, and Machine Learning simulations.

3.1 Data pre-processing

Alarm attributes (i.e., columns of the database) that have not been deemed relevant to the analyses have been discarded. For example, empty columns have been removed, as well as columns that show the same value for every event in the database. Also, redundant attributes have been removed. Next, missing values have been substituted with the value "0". This has been done because most Machine Learning algorithms do not tolerate missing values (Brink et al., 2016).
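These cleaning steps can be sketched on a toy table of alarm records; the attribute names and values below are illustrative, not taken from the real database, and the min-max scaling mentioned next is included as well:

```python
# Toy alarm records; "Priority" is a hypothetical attribute with a missing value.
rows = [
    {"Source": "LI315", "Identifier": "LLL", "Priority": "High"},
    {"Source": "PI204", "Identifier": "ACK", "Priority": None},
]

# Substitute missing values with "0", after checking that "0" never occurs
# as a legitimate value of the affected attribute.
assert all(r["Priority"] != "0" for r in rows)
for r in rows:
    if r["Priority"] is None:
        r["Priority"] = "0"

# Min-max scaling of a numerical attribute to suppress scale effects.
values = [3.0, 7.0, 5.0]
lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]  # values mapped into [0, 1]
```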
The choice of the value "0" is arbitrary; a different numerical or categorical value (i.e., a text string) would be equally effective (Han et al., 2012). In doing so, one must ensure that the chosen value lies outside the domain of the attribute affected by missing values (i.e., the attribute should never take a value equal to the one selected for the substitution). Finally, industrial databases may contain measurements expressed in different units. For example, one may find that some pressure measurements are expressed in "bar", while others are in "atm". Whenever this happens, it is critical to ensure that attribute values referring to homogeneous physical quantities are expressed in common measurement units. Also, numerical values should be normalized in order to suppress scale effects (e.g., using min-max scaling) (Brink et al., 2016).

3.2 Target identification

The database must be analyzed to find and highlight events where an operator has acknowledged an alarm, but another alarm from the same Source still occurs within 30 minutes. Events that meet this criterion are called Target events. The time window has been selected in accordance with the approach mentioned in the PETRO-HRA Guideline, which evaluates the 30-minute criterion as the time required for action from the operator (Stauffer and Clarke, 2016). A binary categorical variable is assigned to each event in the database to highlight Targets. The binary variable is called the Label of an alarm event, and it assumes the value "YES" or "NO" depending on whether the event is a Target or not. Therefore, if the database contains n events, n Labels are generated, stored in a vector, and appended to the alarm database. Table 2 clarifies the role of Labels.

Table 2: Fictitious alarm sequence from LI315

Timestamp            Source  Identifier   Attribute 4  ...  Attribute n  Label
01/01/2021 00:00:00  LI315   LLL          ---          ...  ---          NO
01/01/2021 00:03:00  LI315   LLL ACK      ---          ...  ---          NO
01/01/2021 00:19:00  LI315   LLL Recover  ---          ...  ---          NO
01/01/2021 01:30:00  LI315   LLL          ---          ...  ---          NO
01/01/2021 01:15:00  LI315   LLL ACK      ---          ...  ---          YES
01/01/2021 01:40:00  LI315   LTRP         ---          ...  ---          NO

The table shows a fictitious alarm sequence from LI315. Data are organized as described in Section 2, except for the last column, which contains the Labels. The second-last event of the series has "YES" as a Label since another alarm from the same Source (LTRP) has occurred less than 30 minutes after the acknowledgment of the previous low-level alarm (LLL). On the contrary, the first ACK event in Table 2 has "NO" as a Label because the alarm has been recovered after 16 minutes (LLL Recover).

3.3 Machine Learning simulations

A Wide&Deep classification model has been trained and evaluated on the alarm database. The purpose of the algorithm is to classify alarm events into two categories: Target (i.e., Label = "YES") and Not Target (i.e., Label = "NO"). That is, the model predicts whether an acknowledgment will be followed by another alarm from the same Source (Label = "YES") or not (Label = "NO"). The workflow to set up and perform the Machine Learning simulations is illustrated in Figure 1. Two steps must be followed to complete a classification task: training and evaluation. During training, ⅔ of the alarm database is fed to the Wide&Deep model (arrow T1 in Figure 1), which "learns" the relationship between the features of an event (i.e., the attributes) and its Label. The process involves the joint optimization of two distinct models (Cheng et al., 2016): a Linear model and a Deep Neural Network. The structure of a Wide&Deep model is illustrated in Figure 1. The mathematics and technical details behind the model are outside the scope of this work and may be found in Cheng et al. (2016).

Figure 1: Training and evaluation of the Wide&Deep model.
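As a rough illustration of the two-branch structure, the sketch below implements a single forward pass of a toy Wide&Deep scorer in NumPy. The layer sizes, the single hidden layer, and the random parameters are simplifications for illustration; the real model follows Cheng et al. (2016):

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_deep_proba(x, w_wide, W1, w2, b=0.0):
    """Toy forward pass of a Wide&Deep binary classifier.

    The wide part is a linear combination of the raw features; the deep
    part passes them through one hidden ReLU layer. The two logits are
    summed and squashed into a probability of Label = "YES".
    """
    wide = x @ w_wide                    # linear ("wide") part
    hidden = np.maximum(0.0, x @ W1)     # hidden units of the DNN
    deep = hidden @ w2                   # "deep" part output
    logit = wide + deep + b
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

# One toy event with 4 features and randomly initialised parameters.
x = rng.normal(size=4)
p = wide_deep_proba(x, rng.normal(size=4),
                    rng.normal(size=(4, 8)), rng.normal(size=8))
```

Training then amounts to jointly adjusting `w_wide`, `W1`, and `w2` so that these probabilities match the observed Labels.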
In general, the algorithm aims at optimizing the internal parameters of a function f that best represents the relationship between the features of an event (X) and its Label (Y):

f(X) ≈ Y    (1)

The function f in Eq(1) comprises a linear part, where features are linearly combined and mapped into Labels, and a Deep part, where features are linearly combined and then transformed through nonlinear activation functions into derived features (i.e., the hidden units or "neurons" of the DNN). The parameters used to set up the model are the same as those used by Tamascelli et al. (2020b). After training, a trained model is obtained (T2 in Figure 1). Next, the model is evaluated. Labels are removed from the rest of the database, which is fed to the trained model (E1 in Figure 1). The algorithm takes as input the features of each event and returns the probability of the Label being "YES" or "NO" (E2 in Figure 1). By default, a probability decision threshold equal to 0.5 is used to convert Label probabilities into Labels (i.e., if the probability of Label "YES" is greater than 0.5, the event is labeled as "YES"). Finally, predicted Labels are compared with true Labels to assess the model performance.

4. Results

The target identification procedure (Section 3.2) highlighted a total of 119 events that meet the requirements to be classified as Targets. The training database comprises 17,649 alarm events, of which 78 belong to the Target category. The evaluation database contains 8824 events, of which 41 belong to the Target category.
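The probability-to-Label conversion described in the previous section can be sketched as follows; the predicted probabilities below are invented for illustration:

```python
# Invented predicted probabilities of Label = "YES" for four events.
probs = [0.70, 0.20, 0.51, 0.049]

def to_labels(probs, threshold=0.5):
    """Label an event "YES" when P(Label = "YES") exceeds the threshold."""
    return ["YES" if p > threshold else "NO" for p in probs]

labels = to_labels(probs)  # default decision threshold of 0.5
```

Lowering the threshold turns more events into "YES" predictions, which is exactly the lever exploited in the Discussion below.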
After the evaluation phase, three metrics have been calculated in order to assess the performance of the model:

Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.995    (2)

Precision = TP / (TP + FP) = 0.5    (3)

Recall = TP / (TP + FN) = 0.049    (4)

where TP identifies a True Positive (i.e., the model predicted the Label "YES", and the true Label of the event is "YES"), TN identifies a True Negative (i.e., predicted Label = "NO", real Label = "NO"), FP identifies a False Positive (i.e., predicted Label = "YES", real Label = "NO"), and FN identifies a False Negative (i.e., predicted Label = "NO", real Label = "YES"). Therefore, the sum of TP and TN indicates the number of correct predictions, while the sum of FN and FP indicates the number of wrong predictions.

5. Discussion

The metrics presented in Eq(2), Eq(3), and Eq(4) indicate that 99.5 % of the predictions were correct. Nevertheless, this result does not imply satisfactory performance. In fact, only 41 out of 8824 events in the evaluation database have "YES" as a true Label. Therefore, the model would have achieved an Accuracy greater than 99 % by always predicting the Label "NO". This happens because the dataset used for the simulation is imbalanced, meaning that there are only a few examples of Target events within the database. In this situation, Precision and Recall offer more information than Accuracy. Furthermore, it should be considered that mislabeling a Target event as a non-Target one is more critical than labeling a "NO" event as "YES". Thus, Recall is the most meaningful metric to consider in this specific task. A high Precision is desirable but not as important as a high Recall. The Precision in Eq(3) indicates that 50 % of the "YES" predictions were correct, and the Recall in Eq(4) shows that only 4.9 % of the Target events were correctly identified (i.e., 2 out of 41). Evidently, the rarity of Target events must have affected the learning process and contributed to the uncertainty of predictions.
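The three metrics, and the pitfall of Accuracy on this imbalanced dataset, can be reproduced from the confusion counts implied by the reported figures (Recall = 2/41 gives TP = 2 and FN = 39; Precision = 0.5 gives FP = 2; TN covers the remaining evaluation events):

```python
# Confusion counts implied by the reported metrics on the evaluation set
# (8824 events, of which 41 are Targets).
TP, FP, FN = 2, 2, 39
TN = 8824 - TP - FP - FN

accuracy = (TP + TN) / (TP + TN + FP + FN)   # Eq(2)
precision = TP / (TP + FP)                   # Eq(3)
recall = TP / (TP + FN)                      # Eq(4)

# A trivial model that always predicts "NO" scores essentially the same
# Accuracy, which is why Accuracy alone is misleading on imbalanced data.
baseline_accuracy = (8824 - 41) / 8824
```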
However, these metrics have been calculated using the standard probability decision threshold of 0.5, and there is no guarantee that this value leads to the best performance. Thus, the decision threshold has been varied from 0 to 1; every time the threshold changes, the predicted Labels are recalculated, and so are Precision and Recall. Figure 2 illustrates how Precision and Recall change with the decision threshold. Lowering the threshold to 0.012 would increase the Recall from 0.049 to 0.9 and decrease the Precision from 0.5 to 0.34. As previously mentioned, Recall is the most important metric to consider in this problem. Thus, the performance would improve significantly, since the increase in Recall is about five times larger than the decrease in Precision.

Figure 2: Precision–Recall curve produced by the Machine Learning simulation. Coordinates of points on the curve represent Precision and Recall obtained using different decision thresholds (THOLDs). The red mark represents the point at threshold = 0.5. The green mark represents the point at threshold = 0.012.

It is worth noting that 0.012 is a relatively small threshold. However, most of the predicted probabilities are even lower: more than 80 % of the prediction probabilities are smaller than 0.1, and only 110 events out of 8824 have a probability larger than 0.012. This suggests that the model is relatively unconfident, which may be due to the rarity of the event considered. Still, 90 % of the Target events lie within those 110 events, which is an encouraging result considering the size of the dataset. In this situation, lowering the threshold to such a small value seems an acceptable compromise, considering how the probabilities are distributed. Future works should investigate whether training the algorithm with more alarm data would partially overcome the issues related to the rarity of Target events and possibly improve the model confidence.
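The threshold sweep behind Figure 2 can be sketched as follows; the true Labels and probabilities below are toy values, not the real evaluation set:

```python
def precision_recall(y_true, probs, threshold):
    """Recompute Precision and Recall for one candidate decision threshold."""
    pred = ["YES" if p > threshold else "NO" for p in probs]
    tp = sum(t == "YES" and q == "YES" for t, q in zip(y_true, pred))
    fp = sum(t == "NO" and q == "YES" for t, q in zip(y_true, pred))
    fn = sum(t == "YES" and q == "NO" for t, q in zip(y_true, pred))
    prec = tp / (tp + fp) if tp + fp else 1.0  # convention: no "YES" predictions
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Toy evaluation set: two rare Targets with small predicted probabilities,
# mimicking the unconfident model discussed above.
y_true = ["NO", "NO", "YES", "NO", "YES"]
probs = [0.02, 0.60, 0.03, 0.01, 0.70]

# Sweeping the threshold trades Precision for Recall, as in Figure 2.
curve = [precision_recall(y_true, probs, t) for t in (0.5, 0.025, 0.0)]
```

Here, lowering the threshold from 0.5 to 0.025 recovers the Target with probability 0.03, raising Recall at a modest cost in Precision, which mirrors the behavior observed on the real data.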
Additional tests should also be performed to assess whether different sets of features or different Machine Learning models would be better suited to the problem under assessment. Moreover, it is worth stressing that the analyses rely entirely on historical alarm data. Further tests are needed to assess the algorithm performance in a real environment. For example, the model may be integrated into the plant DCS in order to analyze live streams of alarm data; this would allow evaluating the model effectiveness and highlighting its possible limitations.

6. Conclusions

This work proposes a data-driven method to extract knowledge from historical alarm data and perform predictions on the effectiveness of control room operators' actions. A real industrial database has been used to support the analyses. A Wide&Deep classification model has been trained and evaluated on the database. The model aims at predicting whether or not the operator's acknowledgment of an alarm will be followed by another alarm from the same Source within 30 minutes. In this way, the model would indirectly predict the effectiveness of the operator's action and possibly drive his/her attention to alarms that are more likely to occur again in a short time. The issues related to the identification of rare unwanted events (such as those considered in this work) have been discussed. Results show that even if performance may seem inadequate at first, a high Recall value may be obtained by lowering the decision threshold. After this simple adjustment, the model performance improved considerably, and more than 90 % of the Target events were correctly identified. Further investigations should be performed to evaluate the viability of the approach in real-time applications. Nevertheless, the results suggest that Machine Learning may be used to extract relevant information from historical alarm data and use the acquired knowledge to support control room operators proactively.
Acknowledgments

The authors would like to acknowledge Yara International for supplying the data and for the valuable support. Also, we wish to extend our thanks to Davide Santini, whose work has contributed significantly to the analyses described in this paper.

References

Aleixandre, J., Alvarez, I., García, M., Lizama, V., 2015. Application of Multivariate Regression Methods to Predict Sensory Quality of Red Wines. Czech Journal of Food Sciences 33, 217–227. https://doi.org/10.17221/370/2014-CJFS
ANSI/ISA, 2016. ANSI/ISA-18.2-2016 Management of Alarm Systems for the Process Industries. ANSI/ISA.
Brink, H., Richards, J., Fetherolf, M., 2016. Real-World Machine Learning, 1st ed. Manning Publications, Shelter Island.
Carvalho, T.P., Soares, F.A.A.M.N., Vita, R., Francisco, R. da P., Basto, J.P., Alcalá, S.G.S., 2019. A systematic literature review of machine learning methods applied to predictive maintenance. Computers and Industrial Engineering 137, 106024. https://doi.org/10.1016/j.cie.2019.106024
Cheng, H.-T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., Anderson, G., Corrado, G., Chai, W., Ispir, M., 2016. Wide & deep learning for recommender systems, in: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. pp. 7–10.
EEMUA, 2013. EEMUA Publication 191 Alarm Systems: A Guide to Design, Management and Procurement.
Exida, 2009. Saved by the Bell: Using Alarm Management to Make Your Plant Safer. Sellersville, PA.
Goel, P., Datta, A., Mannan, M.S., 2017. Industrial alarm systems: Challenges and opportunities. Journal of Loss Prevention in the Process Industries 50, 23–36. https://doi.org/10.1016/j.jlp.2017.09.001
Han, J., Kamber, M., Pei, J., 2012. Data Mining: Concepts and Techniques. Elsevier Inc. https://doi.org/10.1016/C2009-0-61819-5
Health and Safety Executive, 1997. The Explosion and Fires at the Texaco Refinery, Milford Haven, 24 July 1994: A Report of the Investigation by the Health and Safety Executive Into the Explosion and Fires on the Pembroke Cracking Company Plant at the Texaco Refinery, Milford Haven on 24 J, Incident Report Series. HSE Books.
Katzel, J., 2007. Managing Alarms [WWW Document]. Control Engineering. URL www.controleng.com/articles/managing-alarms (accessed 1.23.20).
Kondaveeti, S.R., Izadi, I., Shah, S.L., Shook, D.S., Kadali, R., Chen, T., 2013. Quantification of alarm chatter based on run length distributions. Chemical Engineering Research and Design 91, 2550–2558. https://doi.org/10.1016/j.cherd.2013.02.028
Paltrinieri, N., Comfort, L., Reniers, G., 2019. Learning about risk: Machine learning for risk assessment. Safety Science 118, 475–486. https://doi.org/10.1016/j.ssci.2019.06.001
Reis, M.S., Kenett, R., 2018. Assessing the value of information of data-centric activities in the chemical processing industry 4.0. AIChE Journal 64, 3868–3881. https://doi.org/10.1002/aic.16203
Stauffer, T., Clarke, P., 2016. Using alarms as a layer of protection. Process Safety Progress 35, 76–83. https://doi.org/10.1002/prs.11739
Tamascelli, N., Arslan, T., Shah, S.L., Paltrinieri, N., Cozzani, V., 2020a. A Machine Learning Approach to Predict Chattering Alarms. Chemical Engineering Transactions 82. https://doi.org/10.3303/CET2082032
Tamascelli, N., Paltrinieri, N., Cozzani, V., 2020b. Predicting Chattering Alarms: A Machine Learning Approach. Computers & Chemical Engineering 107122. https://doi.org/10.1016/j.compchemeng.2020.107122
Tian, Y., Fu, M., Wu, F., 2015. Steel plates fault diagnosis on the basis of support vector machines. Neurocomputing 151, 296–303. https://doi.org/10.1016/j.neucom.2014.09.036
United States Nuclear Regulatory Commission, 2018. Backgrounder on the Three Mile Island Accident, United States Nuclear Regulatory Commission Library.
Wall, K., 2009. Complexity of chemical products, plants, processes and control systems. Chemical Engineering Research and Design 87, 1430–1437. https://doi.org/10.1016/j.cherd.2009.03.007