Microsoft Word - 025.docx CHEMICAL ENGINEERING TRANSACTIONS VOL. 48, 2016 A publication of The Italian Association of Chemical Engineering Online at www.aidic.it/cet Guest Editors: Eddy de Rademaeker, Peter Schmelzer Copyright © 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-39-6; ISSN 2283-9216 Pushing Process Limits Without Compromising Safety Hector R. Perez PAS, Inc, 16055 Space Center Blvd., Suite 600, Houston, TX, 77062 HPerez@pas.com Numerous accident investigations highlight lack of proper overview displays and alarms as root causes or contributing factors to accidents. One example of this is the Texas City incident in 2005, in which “the control board display did not provide adequate information on the imbalance of flows in and out of the tower to alert the operators to the dangerously high level” (Source: U.S. Chemical Safety and Hazard Investigation Board, Final Investigation Report, March 23, 2007). Industry has spent billions to design and install automated safety systems, in accordance with highly detailed standards such as ISA-84 (IEC 61511), Safety Instrumented Systems. Despite these efforts, the accidents continue and are often attributed to human error. In many companies, management is highly concerned at verifying, at all times, whether the processes are within a variety of acceptable boundaries. While operating inside safe boundaries sounds simple, modern control systems (e.g., DCS, SCADA) are not designed to track boundaries other than process alarms. Indeed, alarms setpoints and activation rates are enough of a challenge to control; visualization, management and control of operational boundaries are even more complex. Consolidating operational boundaries is difficult because the information resides in multiple databases, or worse – in hard copy files. Additionally, capacity “creep” and debottleneck activities or minor betterments will change the throughput of the process, pushing the process closer to or beyond the original design limits. From an operational perspective, the process needs to be compared to these documented limits in real-time for effective operator situation awareness. The actual operational information resides in the automation system, making this comparison a challenge. How can an operator recognize approaching operational, design or safe limit violations in a timely manner without adding excessive alarms? Moreover, how do violations of the limits get logged, tracked, and investigated to prevent recurrence? Managers, engineers, and operators are responsible for making sure that easily-changed automation systems remain both configured and operated within appropriate boundaries. In this paper, we discuss new technology and methods for aggregating, analyzing, depicting, and controlling process boundary information to increase awareness of the operator while enabling engineers and managers to ensure that the process is always within safe limits. 1. Console Operators and the First Problematic Set of Operational Limits: Process Alarms Console operators constantly make real-time decisions. Time is a luxury they do not have. Operators must immediately remediate abnormal situations before they escalate into bigger problems that lead to incidents and accidents. On a “lucky” day, an unresolved abnormal situation may result in a shutdown affecting production and profitability. As a worst-case scenario, it can end up as a catastrophic loss of life. The console operator, a critical resource, must have visibility to relevant, accurate, and meaningful information at all times. As a result of the previously mentioned Texas City incident and many other incidents like it, the industry began to address the problem of poor alarm management. The first step in doing so was to understand what created the problem. 1.1 History of the Alarm Management Problem Prior to the advent of Distributed Control Systems (DCSs), plants had control boards on which “lightbox” alarms coexisted with the live measurements displayed on trends and gauges on the wall. These lightboxes contained a limited number of alarms. Both the installation of the boxes and the installation and configuration DOI: 10.3303/CET1648153 Please cite this article as: Coward A., 2016, How can operating teams push processes to their optimal limits while maximizing safety and minimizing environmental impact?, Chemical Engineering Transactions, 48, 913-918 DOI:10.3303/CET1648153 913 of alarms on the boxes had costs associated with them, as the engineers had to run wires to make the connections. Due to the high cost, engineers configured only alarms that could be justified to management. For the operator to justify adding an alarm to the lightbox, an alarm had to indicate an abnormal situation with significant consequences if the operator did not take action. For this reason, control panels contained few alarms – typically around 120 alarms.1 1.2 The Birth of the DCS and Lack of Guidelines Creates the Alarm Management Problem Control boards did not provide flexibility for plants to easily modify their control strategies and gain competitive advantage. As an example of this, simply extracting data from the control system to spreadsheets for analysis was nearly impossible. While the DCS provided more flexibility, it also introduced new paradigms that created the alarm management problem. Three well-known factors contributed to the exponential growth of configured alarms: the loss of the “big picture” control wall, “free” alarms with the DCS, and a lack of guidelines for both effective configuration of alarms and creation of process graphics. 1,2 A control wall representing an overview of the entire span of control was eventually replaced by a DCS screen, with a few live values shown on it. The screens were expensive, and operators typically had only a few screens and thus a limited view of the process. To compensate, engineers created numerous alarms in order to direct the operator’s attention. Alarms were (and still are) so easy to configure in the DCS, that engineers began using alarms for inappropriate uses such as maintenance, optimization, use by non-operators, and even for “personal” alarms. As a result, operators received thousands of alarms per day (in many instances, thousands of alarms in minutes during process upsets).1 Human limitations to internalize and process information at such rapid rates led to unfortunate accidents, in which the alarm load was cited as a key contributing factor. 1.3 The Alarm Management Solution Several publications, including The Alarm Management Handbook published by PAS1, address the alarm management problem. The ISA-18.2 and IEC62682 standards have been adopted by a vast number of companies, and alarm management has generally become mandatory for the processing industry. By resolving the alarm management problem, companies improve safety and profitability. What other operational limits can be better managed to continue the same trend and drive greater competitive advantage? 2. The Console Operator’s Job A console operator’s job is typically to monitor and control the process. What does this really mean? When there are process upsets the automation system cannot handle, the operator is expected to intervene and take action to prevent escalating consequences. When there are no process upsets, the operator is expected to adjust and optimize the state of operations. 2.1 The Console Operator Job During Normal and Abnormal Situations The automation system is the first line of defense (for safety and profitability) against abnormal situations. Automated processes performed by interwoven disparate systems prevent human error and increase profitability. While well-defined alarms and alarm limits help operators to identify the root cause of abnormal situations and address them, complex automation systems and poor HMI can make it nearly impossible to provide situation awareness. Most operators view the process through dozens of poorly designed and cryptic P&ID-type screens, covered in hundreds of raw numbers. 2 If operators are inundated with such raw data on screens containing no context, they must rely on personal experience to decipher the data on top of an already heavy mental workload. In this scenario, the operator is reacting to an alarm versus proactively monitoring and controlling the process. Imagine a commercial airline pilot taking off on the plane and setting the controls to ascend and not taking any further action until the “you are flying too high” alarm sounds. The pilot then pushes the controls to descend and does not touch them until the “you are flying too low” alarm goes off. Safety and efficiency would be compromised, and the airline would go out of business. Unfortunately, this is exactly how many plants operate today. Operators wait for an alarm to take an action and then wait for the next alarm to take another action. This problem can easily be resolved by providing the appropriate tools for the expected job. Using enhanced visualization utilizing the concepts as explained in The High Performance HMI Handbook, console operators can proactively intervene. Figure 1 below shows raw data as typically depicted to console operators (left image) versus contextualized data that supports situation awareness (right image). 2 The depiction on the left shows a vessel that has 200 psig of pressure (raw data). Is this good or bad? One possible answer is, “I have no clue.” A better answer is, “it depends.” The latter is a better answer, but it is not 914 good enough. A relatively new and inexperienced console operator may incorrectly assess the 200 psig as an acceptable value, and would not recognize the error until the alarm was activated (confirming that it was the wrong assessment). By this time, a domino effect of escalating consequences may be triggered. Early intervention of abnormal situations is critical. Figure 1: Raw Data vs. Contextualized Information The vessel on the right-hand side of Figure 1 shows an analogue indicator bar with a pointer (triangle) that moves up and down as the pressure increases/decreases as well as the actual value of the pressure (number below the bar). Based on this information, is this pressure value good or bad? We can see that the pointer triangle is quite close to the High Alarm, and it looks like this process variable is about to go into abnormal conditions (alarm). This is easily interpreted by any operator, new or experienced; this graphical representation enables proactive intervention before an alarm occurs and enhances situation awareness. During abnormal situations, the right-hand side depiction helps operators detect abnormal situations before they occur. If diagnosing a situation, operators can see how the process is moving, anticipate additional problems, and address them as they are working on the initial trigger of the problem at hand. 3. Alarms with Improved HMIs and Contextual Information Yield Enhanced Operator Performance Let’s assume the console operator has a fully rationalized alarm system with exactly the alarms that they need, which point to root causes of the problem, and that the operator has a High Performance HMI for complete situation awareness of the evolving abnormal situations. To improve efficiency and improve competitiveness, an organization may consider: 1. What is the console operator’s job description during normal situations? 2. What is the console operator’s job description during abnormal situations? The answer to question #1 should be to optimize the process. If everything is running fine, High Performance HMIs enable the operator to make it run better. Figure 2 shows if the value is close to the alarm setpoint (left image) or if the value is within the depicted optimized operating region (right image). Anything between the optimized range and the alarm region triggers an operator action to optimize the process. It is an important principle in High Performance HMI displays to show the optimum ranges. The answer to the reiterated question #2 remains the same – to take over the control system and take the necessary corrective actions. The operator needs tools that provide information to make the right decision. If best practices have been followed, alarms have been rationalized and every alarm has been documented with potential causes, consequences, and corrective actions, this information should be embedded in context within the graphics. In Figure 3, the alarm has occurred, and the operator should be able to easily “right click” the alarm region or indicator element and access this information to resolve the situation without escalating consequences. 200 psig 200 psig Current  Value High Alarm Range Low Alarm Range 915 Figure 2: Situation Awareness for Process Optimization (Optimization Limits) Figure 3: Contextual Information in HMIs 4. The Relationship Between Alarms and Other Operations Limits To remain competitive, firms should continuously debottleneck processes and push production to the limits of equipment design specifications. To do so, the operator must monitor limits beyond just alarms, such as safety system activation points, environmental limits, and relief valve settings. This data exists in every plant, but unfortunately it is “sprinkled” amongst a multitude of different databases across different systems. To proactively monitor in this environment, an operator would have to memorize thousands of limits, look at the live values on the control system, and make mental comparisons to ensure no limit is violated. In simple words, that is not happening. A best practice is to aggregate all of those limits and visualize them contextually in relation to each other. In Figure 4 below, the operator can see the pressure is in High Alarm and that – if no action is taken – increasing pressure will open an automated vent valve to a flare, resulting in product loss. If the vent valve cannot 180 psig200 psig Current  Value High Alarm  Setpoint Low Alarm  Setpoint Optimized  Range 200 psig Current  Value High Alarm Low Alarm Optimized  Range 1 Clicking on the alarm calls up information Causes: 1. ABC… 2. DEF… Consequences: 1. GHI… 2. JKL… Corrective Actions: 1. MNO… 2. PQR… 916 decrease the pressure and bring the process back to normal, a second High-High alarm (acting as a pre-trip alarm) will indicate the process is headed to the next limit, a safety shutdown, which will cost considerably more production. If the safety shutdown fails to bring the process back to normal, engineers would be responsible to have designed the relief valves per code, so that the vessel will not exceed the Maximum Allowable Working Pressure (MAWP) by an unacceptable amount, and get nowhere near the actual vessel mechanical design limit. In plants where production is constantly adjacent to the design limits of the process, this type of situation awareness assists the operators and guides their actions. A visual representation of limits reduces the mental workload of an operator and greatly enhances plant safety. Figure 4: Contextualized Alarms with Other Operational Limits 4.1 Addressing the Concerns of Plant Management Management must have confidence that the plant is running within proper operational boundaries. This requires assurance that not only the current operating process values are proper, but also that the underlying control system configuration reflects the known and proper operating boundary information. DCSs are notoriously easy to change, and accidents have been traced to improper changes in alarms and interlock settings. These control system configuration questions must be continually verified: 1. Are the current settings in my control system proper considering the design documentation of the process? (Or have they migrated over time?) 2. Are these settings appropriate for normal operations within ranges associated with process design, quality, efficiency, emissions, and production rate? 3. Are alarms properly set to indicate the movement of the process into ranges requiring operator response to address the condition? 4. Are the settings for automatic safety function activation in accordance with design documentation? 5. Have there been any inappropriate changes to any of the setpoints or logic conditions of concern for identifying where the process is running relative to these boundaries? The plant’s current and recent operation relative to proper boundaries should also be easily seen and verifiable by operators, engineers, and managers. Questions to be answered include: Current  Value High Alarm Setpoint Low Alarm Setpoint Optimized  Range Automated Vent Valve Activation High‐High Alarm Setpoint Shutdown Safety System Activation Mechanical Integrity Limit Safety Relief Valve set at MAWP2 Low‐Low Alarm Setpoint 200 psig 917 6. Is the process currently running within the normal ranges associated with safe design, quality, efficiency, emissions, and production rate? 7. How often and to what degree has the process been running outside of such normal ranges? 8. Is my process running within non-optimum or abnormal ranges, but still within safe ranges that do not activate automated shutdown systems, thus causing disruptive shutdowns that necessitate expensive and potentially hazardous restarts? 9. How often and to what degree has the process been running in ranges nearing the activation of automated safety systems? 10. Is the proximity of the process to the activation of automated safety systems clearly depicted to the operator? When the answers to these questions are regularly determined and easily known, management can be much more confident that significant accidents or expensive shutdowns are unlikely. Engineers and operators can have confidence that the process is truly under control. But, in many companies there are no systems in place to answer these straightforward questions. 5. Conclusions Real-time visualization of aggregated and validated operational boundary limits improves safety and compliance. With improved situation awareness, operators can take pre-emptive corrective actions. Using High Performance HMIs to aggregate operational limits, console operators can push operational limits without compromising safety or the environment.2 Process limits include those for quality, efficiency, and safety. Situation awareness of the process relative to those limits is essential. Rules can be established to create dynamic relationships between different limits, and technology tools exist to monitor deviations from these limits. As an example, a rule can ensure that high pressure alarms are set no higher than 90 percent of the relief valve activation pressure. If an alarm is incorrectly configured too closely to the relief valve activation point, technology tools can automatically detect this condition and send an automated email to the responsible party. These tools also provide valuable insight through data analysis on violations per week, most frequent violations, chattering violations, stale violations, and more. Costs or losses associated with violations can be automatically calculated and reported. These analyses can be used to create score cards to help prevent recurrence. Console operators cannot memorize thousands of limits and make the correct decisions in real-time, every time. Their situation awareness can be greatly increased by presenting aggregated and contextual information. The result will be increased efficiency and profitability. References Hollifield B., Habibi E., 2010, The Alarm Management Handbook, Second Edition Hollifield B., Oliver D., Habibi E., Nimmo I., 2008, The High Performance HMI Handbook. 918