CHEMICAL ENGINEERING TRANSACTIONS VOL. 90, 2022
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Aleš Bernatík, Bruno Fabiano
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-88-4; ISSN 2283-9216
DOI: 10.3303/CET2290100
Paper Received: 22 January 2022; Revised: 23 March 2022; Accepted: 10 May 2022
Please cite this article as: Clarke I., 2022, The Hazards of Transient Operations, Chemical Engineering Transactions, 90, 595-600 DOI:10.3303/CET2290100

The Hazards of Transient Operations
Ian Clarke
Risk Engineering, Swiss Re Corporate Solutions, London, United Kingdom
ian_clarke@swissre.com

A review has been carried out of 140 major losses in the onshore oil, gas and petrochemical industries over the last 25 years. The purpose of this study is primarily to guide insurance risk engineers on which loss control areas to focus on during a typical risk engineering survey. This study supports the Lloyd’s Market Association’s key information guidelines for risk engineering survey reports and the guidelines for the conduct of risk engineering surveys, and also builds upon previous studies. The study may also be of interest to those working in the oil, gas and petrochemical industries who are looking for lessons learned opportunities to assist in their own risk management programmes. The scope of the analysis was man-made losses only, thus excluding all those from natural catastrophes. Similarly, it focuses only on large monetary losses. Therefore, some losses resulting in a significant number of injuries or fatalities but a small monetary loss may not be included. The analysis focused on the loss control elements that failed, rather than the underlying root causes, as this information is not always available to insurers. The Willis Energy Loss Database was used to identify all losses greater than US$50 million in the onshore oil, gas and petrochemicals sector from 1996-2021.
These losses were then analysed where the information available from insurance industry reports, public investigations or other data sources was sufficiently detailed to allow a robust conclusion about which loss control elements failed.

1. Introduction

The data from the 2016 analysis showed that 57% of the losses were primarily due to “operations-related” failure (Figure 1a), with more than 90% of these occurring during transient operations (63%) or maintenance operations (Figure 1b). This supports the conclusion that most major losses due to failures of operational controls occur during periods of transition, be it shut down, start-up, utilities failure, equipment switching, etc. Of these, start-up was by far the most common initiator. Several losses are presented which highlight deficiencies in the way process hazard analysis (PHA) is performed, as it often concentrates on steady state operations. The data tells us this is not where the problem lies.

The process industries have tools for addressing these problems, but they are frequently not well applied. The tools include, but are not limited to, operator competency, efficacy of process hazard analysis (PHA) and safety critical task analysis (SCTA). The paper will briefly cover each of these tools, and their usage, in turn, and then use several losses as examples of the shortcomings that can frequently be seen. In process safety terms, the aim is to suggest some ways to prevent holes appearing in the barriers.

2. Tools

2.1 Operator competency

KPIs intended to measure operator competency frequently report the percentage of operator training completed, the number of operator drills completed or similar measures. However, operator competency is more than just the amount of training.
It is a validation that the personnel at the front line understand the way the process works, in both normal operation and transient states, what hazards are presented in these various states, and how to identify when the decisions being made are no longer putting process safety at the forefront of the management process. The subject of competency validation is very wide but the focus here is on the relationship with transient operations.

Many plants these days run for extended periods between turnarounds. Improved process control and emergency shut down systems may mean that operators work on a plant for years without experiencing a transient or emergency state. It is vitally important, therefore, that when they do experience an emergency or transient state, they respond in a way that is consistent with maintaining a high level of process safety. With that in mind:
• Operators should be required to review, and be tested on, emergency procedures on an annual basis. This should include start-up, hot re-start, shut down, critical utility failures and other transient scenarios
• This training, for panel operators, should ideally include the use of a process simulator
• Drills should be held on these procedures to ensure each operator experiences an emergency exercise on each of the main scenarios every three years
• Operators should work within a culture that does not penalise them for shutting down a unit if the situation has become sufficiently ambiguous for them to feel uncomfortable with it

2.2 Process hazard analysis (PHA)

PHA is an essential process for managing risks and a vital part of any process safety management system (PSMS). For the oil, gas and petrochemicals industry the tool of choice is typically the Hazard and Operability study (HAZOP), but PHA also includes a wide variety of other quantitative and qualitative risk assessment tools. The focus of this paper, in relation to transient hazards, will be on PHA/HAZOP and how it is implemented.
As can be seen from the statistics, most of the "steady state" losses occur due to asset integrity, and most of the operations related losses occur during transient states. Unfortunately, HAZOP at many facilities focuses on steady state operations, which is not where the problems typically lie. The HAZOP process, and its pitfalls, are well understood and will not be discussed here. For the HAZOP to correctly identify process safety issues during transient states the following must be in place:
• Transient operations should be identified during the normal P&ID HAZOP. The P&ID HAZOP should support the generation of a complete list of “transient operations” safety critical tasks (SCT) for the unit (see section 2.3)
• Obtain the written procedure for each task (e.g. the heater start up procedure). If there is no procedure then it should be written, as this will form the basis of the transient PHA/HAZOP
• Operations must check that all operators and teams follow the same procedures and practices. Old versions of procedures should be discarded, and it should be noted that different operators might have different practices
• A HAZOP of the SCT is then done by a team, working step by step with the procedure as a basis
• As with the traditional HAZOP, team composition is important: experienced operations and maintenance personnel are critical
• On site verification of certain steps is valuable to identify practical issues such as access to valves
• With reference to operator competency (section 2.1), it should be ensured that operators are trained in the final procedure and are aware of the hazards through training and competency assessment

2.3 Safety critical task analysis (SCTA)

SCTA came about in the early 2010s, following a recognition that assessment of human tasks lagged behind the analysis of process and engineering safety issues. However, despite this, the situation has not significantly improved and many organisations do not carry out robust SCTA.
Given that transient operations account for a large proportion of operations-related losses, the main steps of SCTA are as follows:
• Consider the main hazards
• Identify the safety critical tasks: here the focus should be on transient operations
• Understand the critical tasks, including the engineering and procedural barriers involved and how they interact with each other, particularly during transient operations
• Represent the safety critical tasks: this can be as simple as a checklist, or a task analysis if more complex
• Identify the human failures, typically using a human factors based HAZOP process or similar checklist approach (as per section 2.2)
• Determine the safety measures required to control the risk of human failures, using the hierarchy of controls
• Review the effectiveness of the process, in particular embedding the control measures into the existing PSMS

3. Losses

3.1 Formosa Plastics, Illiopolis, Illinois, USA, 23 April 2004

For a complete review of the Formosa Plastics incident please refer to the US CSB report, which contains a more detailed overview of the process and facility.

The production building, housing all 24 polyvinyl chloride (PVC) reactors, was manned by six operators, but for the purposes of a single batch the process was managed by the poly operator, on the upper deck, and the blaster operator, who worked across all levels. The poly operator readied the reactor, added the raw materials and then heated the reactor, monitoring progress on the control systems adjacent to the reactors on the upper level. Once the reaction was complete, he vented the reactor and instructed the blaster operator to transfer the contents, from the valve station on the lower deck, to the downstream stripper, where the residual vinyl chloride monomer (VCM) would be removed. To transfer the batch to the stripper, the blaster operator opened the transfer and reactor bottom valves, then once transfer was complete closed the transfer valve.
The poly operator then purged the reactor of hazardous gases ready for cleaning. To clean the reactor the blaster operator opened the manway door and, from the upper level, power washed the inside of the reactor with water to remove PVC residue, and then opened the reactor bottom and drain valves to drain the water out of the reactor. Following this, the reactor bottom and drain valves were closed and the batch was signed off as complete, with the reactor handed back to the poly operator to commence preparations for the next batch.

Because the production of PVC from VCM creates heat, the reactors had a number of safety systems installed. The first line of defence was the poly operator, who would adjust the cooling to control the reaction, add inhibitor to reduce the reaction speed and thus the build-up of heat, and then manually vent the reactor if these steps failed. If all this failed, the original facility owner (Borden) had added a fourth safeguard, which was to manually connect the reactor to an adjacent (empty) reactor via the bottom valves, thus promoting inhibitor mixing, maximising cooling and providing extra volume to control the pressure. However, the reactors also had a safety interlock which prevented the opening of the reactor bottom valve if the pressure in the reactor exceeded 10 psi (69 kPa), which would be the case during an emergency. Borden provided a manual bypass, activated with purpose built air connections and hoses, to override the safety interlock. This bypass was required to be authorised, although not witnessed, by a production supervisor.

At Formosa the reactors were arranged in groups of four (see Figure 2), with the two reactors at the centre of this incident (D306 & D310) occupying the same relative position among the groups of four. For a more detailed description of the incident itself refer to the CSB report, but in summary at the time of the incident all reactors were making PVC except D306, which was being cleaned.
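
The interlock and bypass arrangement described above can be summarised as a small decision function. The following Python sketch is illustrative only: the function name `bottom_valve_can_open`, its parameters and the 70 psi "in production" pressure are assumptions made for the example, and only the 10 psi limit comes from the text.

```python
# Interlock limit from the text: bottom valve locked out above 10 psi (69 kPa).
INTERLOCK_PRESSURE_PSI = 10.0


def bottom_valve_can_open(reactor_pressure_psi, bypass_connected=False):
    """Return True if the reactor bottom valve can physically be opened.

    Hypothetical sketch of the logic described in the text: the interlock
    blocks the valve above the pressure limit, and the manual air-hose
    bypass overrides it. Supervisor authorisation was a procedural
    requirement only, so it is deliberately not an input here.
    """
    interlock_permits = reactor_pressure_psi <= INTERLOCK_PRESSURE_PSI
    return interlock_permits or bypass_connected


# 70 psi is an assumed, illustrative pressure for a reactor in production.
print(bottom_valve_can_open(0.0))                          # True: vented reactor
print(bottom_valve_can_open(70.0))                         # False: interlock holds
print(bottom_valve_can_open(70.0, bypass_connected=True))  # True: bypass overrides
```

Under these assumptions, once the bypass is connected the pressure check no longer influences the outcome, and nothing in the logic verifies that the reactor being opened is the one the operator intended.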
Figure 2: Reactor Building lower level view (from US CSB Report)

From operator accounts and evidence gathered at the scene it appeared that the blaster operator mistakenly opened reactor D310 instead of D306, thus releasing a cloud of VCM into the reactor building, which ignited minutes later. To do this it was necessary to bypass the interlock on D310, which was in production at the time. Four operators were killed and another died two weeks later. The resultant fire and smoke required roads to be closed and 150 people to be evacuated. The explosion destroyed the plant and at the time of the issue of the CSB report (March 2007) it had not been rebuilt.

The CSB identified a number of contributing factors to the loss, as follows:
• The layout of the reactors was the same in each bank of four. While the reactors and the reactor bottom and drain valve consoles were labelled, this feature makes a mistake more likely. A more effective PHA regime would have identified this risk
• The operators had no means of communication, so the blaster operator on the lower level would have to climb the stairs to check status with the poly operator
• When Formosa bought the facility they removed a group leader position, whose role was to oversee the operators and provide advice and specialist technical assistance, including (for example) oversight of the bypass procedure for an emergency reactor situation
• Formosa failed to learn from a number of similar incidents in previous years. Including many of these incidents in the PHA/HAZOP programme for discussion may have resulted in a more robust interlock system. It should be noted that Borden, in their 1992 PHA, recommended that the bypass should not be initiated without direct supervision and that the air hose and connection system be replaced by something more robust. In the 1999 revalidation the team accepted the existing level of controls.
Formosa never revalidated the PHA, even though the extra layer of supervision had been removed
• The CSB observed that neither of the PHAs addressed the cleaning process. This is exactly the type of transient condition that is frequently missed during PHA/HAZOPs
• The design of the interlock systems enabled them to be easily bypassed. A more rigorous approach, using a combination of more robust interlocks and SCTA for bypass management for this emergency scenario, would have been more effective

3.2 Williams, Geismar, Louisiana, USA, 13 June 2013

For a complete review of the Williams Geismar incident please refer to the US CSB report, which contains a more detailed overview of the process and facility.

The process produced ethylene and propylene from an ethane and propane based cracking and distillation process. The area in question was around the propylene fractionator, which separated propylene and propane. This column was provided with two standard shell and tube heat exchangers (reboilers A and B), because operational fouling typically occurred within these units due to the use of the cracker hot section quench water as the heating medium. At the time of the incident the plant was changing from reboiler A, which had been in operation for 16 months, to reboiler B, which had been cleaned and prepared for use.

The original design of the system was for both reboilers to be in service, but in order to maintain continuous operations Williams had modified the system by including inlet and outlet valves on both the shell and tube side of the exchangers, allowing one to be isolated for cleaning while the other reboiler continued operations. This introduced the potential for the propane in the shell side of the reboiler to be isolated from the pressure relief system (see Figure 3 from CSB report below).
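
The isolation hazard introduced by the added block valves can be expressed as a simple valve lineup check. The Python sketch below is illustrative only: the class and function names are hypothetical, and it makes the simplifying assumption that the shell side reaches its pressure relief protection whenever at least one shell side block valve is open.

```python
from dataclasses import dataclass


@dataclass
class ReboilerLineup:
    """Manual block valve positions on one reboiler (True = open)."""
    shell_inlet_open: bool
    shell_outlet_open: bool
    tube_inlet_open: bool
    tube_outlet_open: bool


def shell_blocked_in(lineup):
    """Assumption: with both shell valves closed there is no path to relief."""
    return not (lineup.shell_inlet_open or lineup.shell_outlet_open)


def heating_medium_flowing(lineup):
    """Quench water flows only if both tube side valves are open."""
    return lineup.tube_inlet_open and lineup.tube_outlet_open


def overpressure_hazard(lineup):
    """Heat applied to a blocked-in shell: trapped liquid cannot expand."""
    return shell_blocked_in(lineup) and heating_medium_flowing(lineup)


# Illustrative lineups (shell inlet, shell outlet, tube inlet, tube outlet).
offline = ReboilerLineup(False, False, False, False)
in_service = ReboilerLineup(True, True, True, True)
heat_on_blocked_shell = ReboilerLineup(False, False, True, True)

for name, lineup in [("offline", offline),
                     ("in service", in_service),
                     ("heat on blocked-in shell", heat_on_blocked_shell)]:
    print(name, overpressure_hazard(lineup))
```

A check of this kind only flags the combination of positions; it says nothing about whether the isolated shell actually contains liquid, which would need to be confirmed separately before heat is applied.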
Figure 3: Process diagram (from US CSB Report)

At the morning meeting on the day of the incident an operations supervisor, who was frequently tasked with troubleshooting operations problems, assessed the declining quench water flows and determined that reboiler A should be swapped for reboiler B. This required some operations and maintenance preparation, so it had to be referred to the operations manager. However, he was not available, so the supervisor returned to the field to evaluate the quench water flow problems further. At 0833 hrs the charts recovered after the incident indicate a large increase in quench water flow, which led the CSB to believe that quench water flow to reboiler B had been commissioned by opening the valves. Three minutes later reboiler B exploded, and propane and propylene fluid erupted from the reboiler and from ruptured piping around the propylene fractionator. The resulting cloud ignited, creating a massive fireball and ejecting process pipework into a pipe rack 30 feet (9 m) above.

An operator near the scene died and the supervisor died from severe burn injuries the following day. There were 167 reported injuries to Williams personnel and contractors who were in the vicinity, and the fire burned for 3.5 hours, releasing an estimated 30,000 pounds (13,620 kg) of flammable hydrocarbons. The site was shut down for 18 months, restarting in January 2015.

Metallurgical testing found the reboiler failed at a very high internal pressure (> 670 psi, i.e. 46 bar or 4.6 MPa), resulting from liquid thermal expansion as a result of the vessel being blocked in.

Standard practice was for the offline reboiler to remain blocked in with nitrogen, with all shell and tube side valves closed. These were manual valves with no interlock system. During the 16 months that reboiler B had been isolated, either propane had leaked through a valve or a valve had been inadvertently opened at some point, resulting in the vessel being full of propane.
Post incident investigation showed both quench water valves on the tube side were open and both propane valves on the shell side were closed. When the quench water was introduced to the reboiler it heated the propane, which expanded, exceeding the operating pressure of the reboiler and causing it to fail.

The CSB identified a number of contributing factors to the loss, as follows:
• There was no effective Management Of Change (MOC) on the original decision to provide block valves to the two reboilers. The PHA/HAZOP on the change used a simple checklist approach and appeared not to include transient states such as switching the reboilers. In addition, it was done after the change was actually completed, to meet a regulatory requirement
• Recommendations from the PHA/HAZOP programme at the site were not followed up. A PHA in 2006 identified the overpressure hazard but subsequent actions, such as car sealing open only the operating reboiler, did not address all the hazards, only the ones posed under normal operations
• There was no operating procedure developed for the operation of switching the reboilers. SCTA would have identified the risk posed by inadvertent "blocking in" of the propane in the reboiler shell and established more rigour around this process

4. Conclusions

The data indicates that control of operations, particularly during periods of transition such as planned or unplanned start-up, shut down and equipment switching, should be a major focus of operator concern. Techniques such as SCTA will identify the critical procedures and ensure that these are effective, using a checklist approach. Similarly, PHA activities such as HAZOP should evaluate more fully the hazards posed during these transient conditions rather than only covering the plant during normal operation. Most importantly, operator training and refresher training regimes should include transient operations, and the hazards posed by them, as part of an ongoing competency validation framework.
Operators are encouraged to adopt techniques such as SCTA, which will prevent erosion of barrier effectiveness due to lack of effective auditing and management oversight, to embrace more rigorous competency validation schemes, and to ensure PHA/HAZOP processes include, and indeed focus on, transient operations.

Acknowledgments

The author would like to acknowledge the Willis Energy Loss Database for providing a list of energy losses to be evaluated, the Lloyd's Market Association, who published the first version of the statistical analysis of the losses back in 2016, and the US Chemical Safety Board (CSB) for their work in promoting learning from losses. Also, it should be noted that this is not a criticism of the companies identified or indeed the US oil and gas industry specifically. The fact is that the work of the US CSB, and the excellent reports that they publish, allows us all to learn the lessons the easy way, rather than the hard way. These losses were selected because they illustrate the points, and the investigations are available in the public domain.

References

An Analysis of Common Causes of Major Losses in the Onshore Oil, Gas and Petrochemical Industries, Lloyd's Market Association (LMA), September 2016.
Guidance on Human Factors Safety Critical Task Analysis, Second edition, Energy Institute, London, January 2020.
US CSB, Investigation Report, Vinyl Chloride Monomer Explosion, Formosa Plastics Corp., Report No. 2004-10-I-IL, issued March 2007.
US CSB, Case Study, Williams Geismar Olefins Plant, Reboiler Rupture and Fire, Report No. 2013-03-I-LA, issued October 2016.