CHEMICAL ENGINEERING TRANSACTIONS
VOL. 48, 2016
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet
Guest Editors: Eddy de Rademaeker, Peter Schmelzer
Copyright © 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-39-6; ISSN 2283-9216

What Drives Process Safety Performance? A View from Experience at BASF

Hans V. Schwarz
BASF SE, GUS/A - M940, Hans.Schwarz@BASF.com

Process Safety performance can be defined as the absence of Process Safety incidents, particularly of severe incidents that cause fatalities or severe injuries, have a significant impact on the public and the environment, or cause significant financial losses. These severe incidents are too infrequent for statistical analysis. KPIs measuring smaller incidents have proven to be useful instruments for driving progress: the number and severity of less severe incidents are monitored and combined with causal analysis to establish areas for improvement. The influences of several elements of the Process Safety Management system are discussed with regard to their effectiveness and side effects. This includes a discussion of broad programs vs. targeted initiatives to eliminate specific weaknesses, and of the importance of experience exchange between companies regarding successful practices.

1. Historical Process Safety Incidents at BASF

In its 150-year history, BASF has experienced some severe incidents, notably the well-known ammonium nitrate explosion in Oppau in 1921, which resulted in over 500 fatalities, and two explosions from overfilled tank cars in the 1940s. From these and from more recent, less severe incidents, important lessons were drawn for plant design improvements as well as for elements of the Process Safety Management system.

2. Development of the Process Safety Management system

An effective process safety management system provides the basis for performance improvements [VCI, 2010], [OECD, 2012].
Elements of BASF's system range from policies on how to review capital projects and existing plant units, through Management of Change, documentation requirements for the plants, Process Safety training requirements, and incident management, to specific procedures for risk assessment with BASF's semiquantitative risk matrix. The management system further draws on elements originating in occupational safety, such as 'root cause analysis of incidents' or 'hazard analysis of workplace'.

Process Safety at BASF has traditionally been focused on plant design, on the assumption that plant maintenance and adherence to operational procedures are looked after by operations and technical services. One observation today is that incidents with design causes have declined and represent only 5 to 20 % of LOPC (Loss of Primary Containment) incidents, depending on region or site.

Figure 1 shows Process Safety Incidents (PSI) at BASF's largest site, Ludwigshafen, Germany, split into 3 clusters of causes. BASF uses the PSI definition of the German chemical association VCI and the European association CEFIC, which is centered on LOPC events. While the overall incident rate has generally been trending downwards, the partial rates move in different directions: The rate of incidents with design causes is the lowest, a reflection of the effective treatment of plant design in our safety reviews, which are described elsewhere. The rate of incidents with operational causes (e.g. valves left open) is the highest and is increasing. Incidents 'with no conclusive root cause' are decreasing.

Please cite this article as: Schwarz H., 2016, What drives process safety performance? A view from experience at BASF, Chemical Engineering Transactions, 48, 841-846 DOI: 10.3303/CET1648141

Figure 1: PSI split in 3 groups by main cause cluster.

For the further development of the Management system, the exchange of experience with other companies is essential.
One element picked up recently from a peer company is the use of the growing trove of incident data to identify those plants which have above-average numbers of incidents, in order to focus efforts on the weaknesses of these plants. Another example is the classification of incident causes into the 3 main groups shown in Figure 1, which allows strategies against clusters of causes. In general, extracting learnings from large sets of incident data has turned out to be very useful for fine-tuning the Process Safety Management system.

3. Performance Measurement with KPIs and some key observations

KPI-based approaches require the measurement of incidents at a low enough threshold of severity to be able to evaluate a statistically relevant number of incidents. The LOPC incidents [CEFIC, 2011] turned out to be a good basis for such analyses. At BASF these Process Safety Incidents consist mainly of releases with no other consequence (96 - 97 % of PSI). The definition of the PSI can be found in.

In addition, BASF uses a more 'leading' second-tier LOPC indicator, 'small PSI', with release thresholds at one tenth of the CEFIC threshold limits, plus all minor fires and explosions. A third companywide, truly leading KPI, 'AFPD', measures the Activation or Failure of Protective Devices (SIL interlocks and mechanical protective devices such as pressure safety valves and rupture discs). The vast majority of these events are 'activations'; only a small fraction (less than 1 %) are failures, typically during testing, while failures on demand are extremely rare (less than 0.1 %). From these data we identify weaknesses in the protective devices themselves, or in the process, which lead to frequent activation of the protective devices.

Figure 2: Cases of frequent activation of same Z-function

Our main focus was to identify protective devices which were frequently activated, and then to eliminate the underlying process weakness. For example, in one case in 2010 we had dozens of activations per year of a SIL-rated protective function reacting to high organics content in an offgas stream to an incineration unit, with a process shutdown on each occasion. It was found that a slightly undersized condenser on a distillation column caused the problem; it was easily replaced by a correctly dimensioned one.

[Figure 1 chart: PSI Root Cause clusters, BASF Ludwigshafen, 2011 to 2015 (1-6); series: Plant design causes, Mechanical Integrity causes, Operational causes, No conclusive root cause]

[Figure 2 chart: Cases of Activation, same protective function, BASF Ludwigshafen, 2010 - 8/2015; series: activated 3 to 5 times, activated 6 to 9 times, activated 10 to 20 times, activated more than 20 times]

A Pareto chart of the PSI throughout the BASF group in the first 6 months of 2015 is shown below. The smallest unit in this chart is a cluster of plants or a site reporting to one operations manager. It indicates that over half of the incidents occur in only 10 % of the plants, and that almost three quarters of the plants had no incidents. The reporting culture is sufficient to allow confidence in these data.

Figure 3: Distribution of incidents over the sites and clusters of plants

In a closer analysis we separated the PSI of these plants into 3 groups: incidents caused by design shortcomings, by mechanical integrity issues or by operational errors (as in Figure 1). Operational causes were the most frequent, and within that group valves in the wrong position, often on startup after a turnaround or a short shutdown. One conclusion was to focus on organizational systems which ensure correct valve positions. In the group of incidents with mechanical integrity related causes, the most frequent were flange leaks, where either the flange bolts had not been correctly tightened or the gasket was misaligned, indicating that better quality control of the mechanical work in a shutdown would be the solution.
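The two figures read from the Pareto chart above (the share of units needed to cover half of the incidents, and the share of units without incidents) can be computed directly from per-unit incident counts. The following Python sketch is illustrative only: the function names and the sample counts are assumptions and are not BASF data.

```python
from typing import Sequence


def units_share_for_incident_share(counts: Sequence[int], target: float = 0.5) -> float:
    """Smallest fraction of units that together account for at least
    `target` (e.g. 0.5 = half) of all incidents."""
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    # Rank units from most to fewest incidents, as in a Pareto chart.
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return rank / len(counts)
    return 1.0


def share_without_incidents(counts: Sequence[int]) -> float:
    """Fraction of units that reported no incident at all."""
    return sum(1 for count in counts if count == 0) / len(counts)


# Hypothetical incident counts for 20 reporting units (illustration only).
counts = [9, 6, 3, 1, 1] + [0] * 15

half_share = units_share_for_incident_share(counts, target=0.5)
zero_share = share_without_incidents(counts)
print(f"{half_share:.0%} of units cover half of the incidents")
print(f"{zero_share:.0%} of units had no incidents")
```

Ranking the units before accumulating is what makes this a Pareto evaluation; the same counts in arbitrary order would overstate the number of units needed.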
Table 1: Frequent causes of PSI, Ludwigshafen site, 2014

% of total    Cause
23 %          Valve in wrong position (e.g. open)
23 %          Flange leakage (loose bolts, gaskets)
26 %          On startup, with 50 % valve in wrong position and 40 % flange leakage

As a word of caution, it must be said that not all relevant incidents are captured by this 'PSI' definition. For example, a temperature or pressure rise in a runaway reaction over which operators then regain full control, and which eventually does not lead to an emission, does not fall under the PSI definition but nevertheless represents a dangerous type of event. It is important not to lose sight of these 'high potential' incidents, and not to regard the minimization of LOPC incidents as the only route to success.

4. Incident cause evaluation targeted at deriving effective improvement measures

The classification of incidents into the 3 causal groups mentioned above (design, mechanical integrity, operational) follows the logic 'who can influence it?'.

- An incident cause related to plant design will hardly be eliminated by operators coming up with a better procedure. Instead, these are causes which need to be managed in P&ID / design document reviews by plant and planning engineers and process safety experts/consultants.
- An incident related to loose bolts on a flange or a misaligned gasket is most likely prevented by the technical services ensuring strong quality controls on the mechanical work during an outage. The operations team has a role in defining the adequate level of these quality controls based on the hazards involved.
- An incident related to a drain valve or bypass valve left open can only be avoided by the plant organization itself, with systematic procedures/checklists which ensure that the valve position is checked each time before startup.

Thus, by splitting the incidents into these 3 causal groups, a bridge is built from the analysis to actions by the appropriate organizational unit, targeted at reducing that type of incident.

[Figure 3 chart: PSI, BASF group, first half 2015; x-axis: 418 sites & plants; y-axis: 0 % to 100 %; annotations: 10 % of all sites & plants, 63 %; 26 % of all sites & plants, 100 %]

We observe that divisions, regions or sites sometimes have different causal profiles, with the dominating causes depending, among other things, on the type of plant being operated. Mechanical integrity causes can be expected to be more frequent in older plants, or where vibrations and tensions in the piping are not prevented. There is also a link to the quality of mechanical work, e.g. assembling flanges. Operational errors can be expected to be more frequent in less automated plants or batch plants, with more manual activity, or in plants where fouling leads to frequent cleaning shutdowns.

Table 2 shows the distribution of PSI causes of 4 operational divisions over the 3 main root cause clusters. Mechanical integrity is the dominating cause in the division with mainly older continuous plants, while operating errors and other operational issues dominate in the divisions with batch plants. For performance improvements, the Operating Divisions set priorities according to these different focal areas.

Table 2: Cause distribution 2014 in 4 global Operating Divisions with different types of plants (% of PSI per division)

Op. Division                    Design    Operational    Mechanical
A  mostly batch                 16        67             16
B  mostly batch                 15        64             21
C  batch, conti, multi stage    29        45             26
D  mostly conti, large, old     24        36             40

5. Causes of incidents with elevated severity

When performance is driven through an LOPC KPI, the underlying assumption is that severe incidents have the same causes as those measured at a much lower threshold. To check that assumption we investigated our own more severe incidents as well as the historical catastrophes mentioned at the beginning of this paper. From these data we derive that in the past, incidents with a background in plant design were more frequent and caused a significant fraction of the widely known catastrophes, e.g. Oppau 1921: effects of a process change not fully understood; 1940s tank car overfillings: no overfill protection at the time.

In recent years we also find fewer incidents caused by superficial handling of plant changes. This is due to a strong focus on Management of Change (MOC), which now seems to bear fruit, and to a program in which all our existing plants receive a fresh safety review, starting with a 'clean sheet'. Both programs have shown an effect on the rate of incidents with design-related causes. Fig. 4 shows the regional development. Part of the improvement in MOC was already achieved before 2010. With design causes accounting for below 10 % of PSI, e.g. at our Ludwigshafen site, and the dominating causes now being operational in nature, we conclude that a focus on operational errors should be the source of further improvements.

Fig. 4: PSI rates with cause 'design' in BASF regions
[Fig. 4 chart: % of PSI with 'design' causes, 3 regions (North America, South America, Europe), 2011 to 2015 (1-6)]

Fig. 5a: Severity weighted PSI rate in 3 regions
[Fig. 5a chart: PSI severity rate per 1 Mio manhrs, 2012 to 2014; series: North America, South America, Europe]

At BASF we use the severity of PSI as an additional indicator. This may contribute an answer to the question of whether 'severe' incidents follow the same cause distribution as all PSI. The 'severity' of PSI is measured in 4 severity classes in the 4 categories 'Injury', 'Direct financial damage' (e.g. from fires), 'Impact on community or environment' and 'Quantity of a release'. Fig. 5a shows the development of the severity weighted rate of PSI in 3 regions, fig. 5b the average severity per cause cluster, and fig. 5c the distribution of PSI severity within cause clusters.

Fig. 5b: Average PSI severity in 3 cause clusters
Fig. 5c: Distribution of PSI per severity class in 3 cause clusters. Total is 100 % of PSI

Despite the use of the same management system, the severity trend is not the same in all regions. Severity is spread almost evenly over the 3 cause clusters. Severity class 3 is present in all cause clusters, with the largest fraction in the operational cluster. Average severity is slightly above average in the cause cluster 'mechanical integrity', and lowest in 'design'. None of the incidents reached severity class 4 (e.g. fatalities). We conclude that severity is spread evenly enough between cause clusters to confirm the focus on operational causes for further improvements.

6. Performance is driven by broad programs plus initiatives focused on weaknesses

What, then, eventually drives Process Safety performance, which, compared to the first chapter, is now redefined as the severity weighted number of 'PSI' measured in an LOPC KPI? On the one hand, we have the broad-based programs, which form the pillars of the Process Safety management system. All plants must adhere to them to ensure that safety gaps are minimized:

- Periodic safety reviews of existing units, safety reviews of investment projects, Management of Change and the follow-up on the arising action items are meant to ensure a plant design with minimized safety gaps. The small fraction of incidents with design-related causes indicates that this is a strength of our PSM system.
- Plant inspections and quality assurance of mechanical work ensure good mechanical integrity of the plant. In older plants, or with weak contractor management, this can be the dominating cause of incidents.
- Process Safety training, up-to-date plant documentation and sound procedures, combined with operational discipline, minimize operational incident causes, which are dominating and increasing.

The continuous fine-tuning of the broad-based programs must be supported by strong incident management, ranging from incident and root cause analysis, through action item follow-up and communication of learnings, to statistical evaluation for causes and regional differences. These programs should be supported by up-to-date IT tools to support transparency and discipline. BASF has recently introduced advanced software to improve the documentation of safety concepts, safety reviews and the related action item follow-up, and plans a major upgrade to the IT tools used in incident management.

[Fig. 5b chart: Average Severity of PSI in 3 cause clusters, BASF group, 1-8/2015; categories: PSI w. Mechanical Integrity cause, PSI w. Design cause, PSI w. operational cause]

[Fig. 5c chart: PSI severity / cause cluster, BASF group, 1-8/2015; categories: PSI w. Mechanical Integrity cause, PSI w. Design cause, PSI w. operational cause, PSI no Root cause assigned; legend: Severity 1 (1 pt.), Severity 2 (3 pt.), Severity 3 (9 pt.), Severity 4 (27 pt.)]

On the other hand, progress can be accelerated by complementary elements focused on the plants with an above-average number and severity of PSI, and on the most frequent incident causes. At BASF we have begun to use a 'Pareto' plot of incidents over the number of plants (Figure 3) to determine the participants in a complementary improvement program, and to evaluate the incident database for incident clusters with similar causes (example in Table 1). We have identified a variety of frequent causes/incident types: valves left open; flange leaks due to low quality of mechanical work; incidents involving certain materials, e.g. the toxic gas BF3; incidents related to offgas systems; a surprising frequency of implosions; a high number of overfillings in one region; etc.
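As a minimal sketch of how plants with an above-average severity-weighted number of PSI might be selected, the following Python example uses the point values given in the legend of the severity chart (1, 3, 9 and 27 points for severity classes 1 to 4) and the normalisation per 1 Mio manhrs used for the regional severity rate. The plant names, the incident lists and the selection rule (score above the mean over all plants) are assumptions for illustration, not the procedure used at BASF.

```python
# Points per severity class, as given in the chart legend (1, 3, 9, 27).
SEVERITY_POINTS = {1: 1, 2: 3, 3: 9, 4: 27}


def severity_score(severity_classes):
    """Severity-weighted number of PSI: sum of points over all incidents."""
    return sum(SEVERITY_POINTS[s] for s in severity_classes)


def severity_rate(severity_classes, man_hours):
    """Severity points per 1 million man-hours."""
    return severity_score(severity_classes) * 1_000_000 / man_hours


def plants_above_average(psi_by_plant):
    """Plants whose severity score exceeds the mean score over all plants.

    `psi_by_plant` maps a plant name to the list of severity classes of
    its PSI; plants without incidents are included with an empty list.
    The 'above the mean' rule is an assumption made for this sketch.
    """
    if not psi_by_plant:
        return []
    scores = {plant: severity_score(c) for plant, c in psi_by_plant.items()}
    mean = sum(scores.values()) / len(scores)
    return sorted(plant for plant, score in scores.items() if score > mean)


# Hypothetical data: severity classes of the PSI recorded per plant.
psi_by_plant = {"A": [1, 1, 2], "B": [3], "C": [], "D": [1]}

selected = plants_above_average(psi_by_plant)
rate_a = severity_rate(psi_by_plant["A"], man_hours=500_000)
print(selected, rate_a)
```

With these weights a single class 3 incident outweighs several class 1 incidents, so a ranking by severity score can differ from a ranking by incident count alone.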
By addressing these incident clusters and focusing on plants with an above-average number of PSI, we speed up our performance improvement beyond the rate achievable from the broad-based programs alone.

Besides the global KPI AFPD, from which we identify process design weaknesses, we encourage the local use of leading KPIs, such as overdue action items or Process Safety training status. We also use other 'leading systems', such as companywide EHS audits with a strong Process Safety chapter, which ensure that corporate requirements are followed. Furthermore, technical inspection programs are tracked with indicators in the respective technical departments. Following the recommendations of the European Process Safety Center [EPSC, 2012], we plan to use more leading indicators to measure the incident management process, the safety review work process and the implementation of action items, supported by the respective IT systems.

7. Bottlenecks, Prioritization and Synergies with 'Operational Excellence'

The broad-based programs in the chapter above are designed to drive performance alongside fulfilling regulatory requirements. In a company exposed to international competition, the most frequent limitation is a resource bottleneck. Plant maintenance, efficiency improvement projects and capacity expansion projects are performed essentially by the same people who are involved in safety reviews and in the implementation of safety-related action items. From this inevitable conflict arises the need for prioritization. We split safety action items into priorities 1 and 2. Priority 1 items take precedence over economically driven activities. They comprise all items which our risk matrix classifies as 'unacceptable' risks, usually implying SIL-rated interlocks or mechanical protective devices (pressure safety valves, rupture discs). Priority 2 items are rated 'temporarily tolerable' and can be temporarily mitigated by organisational measures until the permanent solution is implemented.
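The two-level prioritization can be sketched as a simple mapping from risk class to priority. In the Python example below, the risk class strings follow the wording above, while the `ActionItem` fields, the helper names and the sample items are hypothetical. Overdue items are listed because overdue action items are mentioned above as a local leading KPI.

```python
from dataclasses import dataclass
from datetime import date

# Risk matrix class -> priority, following the text; the string keys
# are illustrative labels, not identifiers from a BASF system.
PRIORITY_BY_RISK_CLASS = {"unacceptable": 1, "temporarily tolerable": 2}


@dataclass
class ActionItem:
    title: str
    risk_class: str
    due: date
    interim_measure: str = ""  # organisational mitigation for priority 2 items


def priority(item: ActionItem) -> int:
    """Priority 1 for 'unacceptable' risks, 2 for 'temporarily tolerable'."""
    return PRIORITY_BY_RISK_CLASS[item.risk_class]


def work_order(items):
    """Priority 1 items first; within a priority, earliest due date first."""
    return sorted(items, key=lambda item: (priority(item), item.due))


def overdue(items, today: date):
    """Items whose due date has passed (a possible leading indicator)."""
    return [item for item in items if today > item.due]


# Hypothetical action items for illustration.
items = [
    ActionItem("Document interim operating instruction", "temporarily tolerable",
               date(2016, 3, 1), interim_measure="extra operator round"),
    ActionItem("Install overfill interlock", "unacceptable", date(2016, 6, 1)),
    ActionItem("Replace rupture disc", "unacceptable", date(2016, 2, 1)),
]

ordered_titles = [item.title for item in work_order(items)]
overdue_titles = [item.title for item in overdue(items, date(2016, 4, 1))]
print(ordered_titles)
print(overdue_titles)
```

Sorting on the tuple (priority, due date) keeps every priority 1 item ahead of all priority 2 items regardless of due date, which mirrors the rule that priority 1 items take precedence.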
Process Safety improvements are often in synergy with economic performance improvements, as the Center for Chemical Process Safety has pointed out [CCPS, 2006].

- The 'on-stream time' of plants benefits when plant shutdowns through LOPC events are minimized.
- Preventive maintenance reduces safety incidents, but also unplanned downtime from failing equipment.
- Competent personnel in operations and the disciplined use of operating procedures help to avoid PSI, but also support more reliable and efficient plant operation closer to optimal utility usage, yield and product quality.
- Solving the process problems which cause frequent activations of safety interlocks not only makes the plant safer, but also reduces downtime and the work of starting the plant back up.

These synergies would be worth more investigation, e.g. by correlating Process Safety performance and operational excellence parameters.

8. Conclusion

Process safety performance is driven by the disciplined execution of key elements of the Process Safety Management system, complemented by programs targeted at focal points. The Process Safety management system at BASF traditionally comprises elements primarily focused on the technical design of the plants. A recent, increasing focus on statistical incident evaluations has shown that incidents caused by shortcomings in Mechanical Integrity (MI) or plant operation are now the leading causes. Besides the design reviews, the focus on MI and operational causes is expected to drive Process Safety performance at BASF in the coming years. Complementary programs focus on the operating plants with the most incidents and on the most frequent causes, identified by analyzing large sets of incident data. This is supported by the introduction of better IT tools, which facilitate making sense of large data sets and the use of leading indicators.
References

CCPS, New York, USA, 2006, 'The business case for process safety', ISBN 0-8169-1026-x, http://www.aiche.org/sites/default/files/docs/pages/ccpsbuscase2nded-120604133414-phpapp02.pdf
CEFIC, Brussels, Belgium, 2011, 'Guidance of Process Safety Performance Indicators', http://www.cefic.org/Documents/IndustrySupport/RC%20tools%20for%20SMEs/Document%20Tool%20Box/Guidance%20on%20Process%20Safety%20Performance%20Indicators.pdf
EPSC, Rugby, United Kingdom, 2012, 'Making the case for leading indicators in Process Safety', http://www.epsc.org/data/files/indicators/PSI_Leaflet_Making_The_Case.pdf
OECD, Paris, France, 2012, 'Corporate Governance for Process safety, Guidance for Leaders in high hazard industries', http://www.oecd.org/chemicalsafety/chemical-accidents/corporategovernanceforprocesssafety.htm
VCI, Frankfurt, Germany, 2010, 'VCI Empfehlungen zur Sicherheitskultur in Unternehmen der chemischen Industrie', http://bit.ly/1NZfjzI (last accessed on 29.4.2016)