CHEMICAL ENGINEERING TRANSACTIONS
VOL. 90, 2022
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Aleš Bernatík, Bruno Fabiano
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-88-4; ISSN 2283-9216

A Risk Aspect of Periodic Testing on Pressure Relief Valves

Shenae Lee (a,d,*), Anne Barros (b), Mary Ann Lundteigen (c), Nicola Paltrinieri (a)

a Dept. of Mechanical and Industrial Engineering, NTNU, Trondheim, Norway
b Laboratoire Génie Industriel, CentraleSupélec, Paris, France
c Dept. of Engineering Cybernetics, NTNU, Trondheim, Norway
d Dept. of Software Engineering, Safety and Security, SINTEF Digital, Trondheim, Norway
* shenae.lee@sintef.no

A pressure relief valve (PSV) is a key safety barrier to prevent the catastrophic rupture of pressure equipment in a process plant. The safety function of a PSV is to open and relieve the pressure when the equipment pressure exceeds the predefined set point. To achieve the desired availability of the PSV function, function testing is performed periodically to confirm that the PSV functions correctly. If a fault of the PSV function is detected by a function test, the PSV is repaired to a functioning state. For this reason, the interval between function tests has a direct influence on the probability of failure on demand (PFD) of the PSV function. On the other hand, unwanted leakage can occur due to human errors made during the preparation prior to a test and the reinstatement after the test. Such leakage is undesirable because it can be ignited and cause a major accident, but this aspect is often not considered in the availability assessment of PSVs. Therefore, this paper suggests a multi-phase Markov approach that can estimate the PFD of a PSV as well as the frequency of the leaks induced by the periodic tests.
The suggested approach may support the decision about the test interval for a PSV, considering both the reliability effect and the risk effect of extending the function test interval.

1. Introduction
Pressure relief valves (PSVs) are used in process plants to protect against overpressure in pressure equipment (Bukowski et al., 2020). A PSV plays an important role in the prevention of a catastrophic rupture due to overpressure (Hellemans, 2009). The safety function of a PSV is to open at the set pressure and relieve the excess pressure in the equipment (MacKay and Pillow, 2011). The inability to perform this required safety function is identified as the failure mode fail to open on demand (FTO). The average probability of failure on demand (PFDavg) due to FTO is used as a safety performance measure for a PSV (Bukowski et al., 2018). FTO is primarily caused by the corrosion of the valve parts (e.g. spring, valve seat) and may be hidden until the valve is demanded or proof-tested (Mitchell et al., 2013). For this reason, periodic proof-testing (function testing) of a PSV is an important means to detect possible FTO. When a test reveals FTO of a PSV, repair actions are initiated to restore the valve to a functioning state. To test a PSV for FTO, the valve is generally mounted on a test block that applies the test pressure until the valve pops. For this reason, PSVs are periodically removed from the plant section and transported to the maintenance shop (API RP 576, 2017). Before a PSV is removed for testing, the section associated with the PSV should be isolated from the plant stream. The purpose of an isolation is to separate the process hazards from a plant section such that intrusive activities like inspections can be performed (HSG 253, 2005). Isolation work consists of a sequence of steps, which typically includes draining/venting of the equipment, arranging isolation valves and installing blinds (Vinnem and Røed, 2014).
After completion of the test and repair, the section is reinstated. This requires reversing the isolation, or de-isolation, which includes removing the installed blinds and repressurizing the equipment. Isolation and reinstatement activities are carried out by maintenance personnel, and the human errors made during these activities are the main contributors to unwanted leakage of hazardous substances (HSG 253, 2005). For example, errors during the implementation of the isolation plan are a cause of gas leaks on offshore oil and gas installations (Sklet et al., 2006). Such leakage is undesirable from a safety point of view because it can be ignited and increase the major accident risk. According to Vinnem and Røed (2015) and Vinnem et al. (2016), preventive maintenance of PSVs contributes substantially to the leak frequencies in the Norwegian offshore industry. One reason may be the high number of PSVs and the correspondingly high number of isolation and reinstatement activities associated with periodic maintenance. They also point out that the industry has focused mainly on the benefit of frequent valve testing rather than on its adverse impact. In light of this, extending the test interval will reduce the risk from the unwanted leaks that can occur during maintenance activities. Less frequent testing, however, will also increase the PFD of the PSV function, which implies an additional risk related to overpressure during operation. Therefore, both positive and negative risk effects of function testing should be considered in the decision-making about the optimal test interval of PSVs.

DOI: 10.3303/CET2290025
Paper Received: 15 January 2022; Revised: 13 March 2022; Accepted: 14 May 2022
Please cite this article as: Lee S., Barros A., Lundteigen M.A., Paltrinieri N., 2022, A risk aspect of periodic testing on pressure relief valves, Chemical Engineering Transactions, 90, 145-150 DOI:10.3303/CET2290025
Several authors propose reliability models that consider the effects of different testing strategies on the PFD of safety instrumented functions (Brissaud et al., 2012; Srivastav et al., 2020; Wu et al., 2018). Some authors propose analytical approaches for optimizing the test interval, considering the accident risks arising from the loss of safety functions (Vatn and Aven, 2010; Vaurio, 1995). The aforementioned approaches, however, do not explicitly consider the accidents that occur during maintenance. On the other hand, some authors suggest approaches to quantify the risk contribution from maintenance, but without availability modeling of the safety functions (Noroozi et al., 2013; Okoh et al., 2016). Therefore, this paper presents an approach that can analyze both the PFD of the PSV safety function and the frequency of the test-induced leaks. The approach takes into account the contribution of the failure mode FTO to the PFD of a PSV as well as the risk contribution from the failure mode external leakage (EL) introduced during periodic maintenance. A multi-phase Markov approach is used to model the effect of maintenance activities on the state of a PSV. Furthermore, analytical formulas are suggested to calculate the time-dependent PFD with respect to FTO and the frequency of EL at a given test interval.

2. Markov approach
The Markov approach is suitable for analyzing the change in the system states over time. If the evolution of a system is modelled by a Markov process, the changes of the system states form a stochastic process with the memoryless property. In other words, the transition from one state to another state in the future depends only on the present state and the time for making the transition, and the process has time-homogeneous transition probabilities. A Markov process is defined by its state space and its transition rate matrix (Rausand and Høyland, 2004).
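As an illustration of how a state space and a transition rate matrix define the state probabilities, the following is a minimal sketch for a two-state component (functioning, failed). It is not taken from the paper: the rates `LAM` and `MU`, the explicit Euler integration and all function names are assumptions chosen for illustration only. The sketch integrates dP/dt = P·A numerically and compares the result with the known closed-form solution of the two-state model.

```python
import math

def step(p, a, dt):
    """One explicit Euler step of dP/dt = P * A for a row vector p."""
    n = len(p)
    return [p[j] + dt * sum(p[i] * a[i][j] for i in range(n)) for j in range(n)]

def state_probabilities(a, p0, t_end, dt=0.01):
    """Integrate the state equations from 0 to t_end with a fixed step dt."""
    p = list(p0)
    for _ in range(int(round(t_end / dt))):
        p = step(p, a, dt)
    return p

# Hypothetical rates (per hour), chosen only for illustration.
LAM = 0.01   # failure rate
MU = 0.125   # repair rate (mean time to repair of 8 hours)

# State 1 = functioning, state 2 = failed; each row of A sums to zero.
A = [[-LAM, LAM],
     [MU, -MU]]

def p_failed_exact(t):
    """Closed-form probability of the failed state for the two-state model."""
    return LAM / (LAM + MU) * (1.0 - math.exp(-(LAM + MU) * t))

if __name__ == "__main__":
    p = state_probabilities(A, [1.0, 0.0], t_end=24.0)
    print("numerical:", p[1], "closed form:", p_failed_exact(24.0))
```

Because the rows of the rate matrix sum to zero, the state probabilities keep summing to one at every step. A matrix exponential would normally replace the Euler loop; the loop is used here only to keep the sketch free of external libraries.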
A system may possess the Markov property only within certain time frames and not at specific points in time. For example, the state of a system may change due to testing and maintenance actions that are executed at predetermined timepoints, which means that the state of the system changes immediately after a test or a repair action. In this case, the effect of a test or a repair can be modelled by a transition probability matrix, while the system behaviour between two consecutive tests can be modelled by a standard Markov model with a transition rate matrix. Such an approach is called a multi-phase Markov approach (ISO/TR 12489, 2013; Rausand, 2014). In a multi-phase Markov approach, we define a finite number of state spaces, and each state space corresponds to a phased time period (e.g. normal operation phase, and maintenance phase) and has its own transition rate matrix (Barros, 2017). Table 1 summarizes the main differences between the standard Markov approach and the multi-phase Markov approach.

Table 1: A comparison of the Markov approach and the multi-phase Markov approach

Markov approach:
- The system behaviour (i.e. the state distribution of the system) is defined by a single constant transition rate matrix.
- Repair action is assumed to be initiated immediately after a failure.
- The time to repair is exponentially distributed.

Multi-phase Markov approach:
- The system behaviour can be defined by different transition rate matrices.
- The effect of a maintenance action can be modelled by a transition probability matrix that links two phases.
- The time spent on testing and maintenance activities can be deterministic.

3. Case study
In this case study, the behaviour of a single PSV is modelled using a multi-phase Markov approach. Two failure modes of a PSV, fail to open on demand (FTO) and external leakage (EL), are considered. The failure mode FTO is caused by deterioration such as corrosion and mechanical damage during the operations phase.
The failure mode EL can occur due to human errors during the isolation and reinstatement activities.

3.1. A multi-phase Markov model for a PSV
The possible states of a PSV are defined in Table 2, and the state transition diagram is shown in Figure 1.

Table 2: States of the proposed model

State  Description
1      PSV is functioning.
2      PSV has the failure mode EL.
3      PSV has the failure mode FTO, and FTO is not detected.
4      PSV has the failure modes FTO and EL, and FTO is not detected.
5      FTO of PSV is detected.
6      FTO of PSV is detected and PSV has the failure mode EL.

Figure 1: State transition diagram during the operations phase

The main assumptions for this multi-phase Markov model include:
• The failure rate with respect to FTO, $\lambda_{FTO}$, is assumed to be constant.
• FTO and EL are independent failure modes, and they can occur randomly during the operations phase.
• If EL occurs during the operations phase, it is detected immediately. The repair of EL then begins at once, and its duration is exponentially distributed. The mean time to repair is 8 hours.
• Four maintenance phases are defined to explicitly model the effect of four different types of activities: isolation, testing, repair, and reinstatement. Maintenance phase j+1 starts immediately after the end of maintenance phase j. Each maintenance phase has a constant duration of one shift (i.e. 8 hours).
• In maintenance phase 1, the isolation activity is performed as preparation for testing in maintenance phase 2. EL may occur upon the completion of the isolation activity as a result of human errors. The human error probability (HEP) during isolation is denoted as α.
• In maintenance phase 2, function testing is performed. A test is not perfect, and the probability that a test detects FTO is $1 - \gamma$, where $\gamma$ denotes the HEP of the test crew during the test.
• In maintenance phase 3, repair actions are initiated if FTO is detected in maintenance phase 2. If FTO is not found, preventive maintenance (e.g.
cleaning, lubrication) is executed. Repair is not perfect. The repair crew is assumed to have the same level of competence as the test crew, so the HEP for the repair action is $\gamma$, the same as the HEP for testing. FTO is therefore repaired with probability $1 - \gamma$.
• In maintenance phase 4, the reinstatement activity is performed. EL can occur as a result of human errors during reinstatement. The crew performing the reinstatement is assumed to have the same level of competence as the crew performing the isolation, so the HEP for the reinstatement activity is α, the same as the HEP for isolation.

Based on the assumptions above, the time evolution of a PSV during the operations phase is described by a Markov process with the transition rate matrix $\mathbf{A}$, where $\lambda_{EL}$ and $\mu$ denote the occurrence rate and the repair rate of EL. The transition probability matrices $\mathbf{M}_1$, $\mathbf{M}_2$, $\mathbf{M}_3$ and $\mathbf{M}_4$ are also defined, where $\mathbf{M}_j$ describes the effect of the actions executed in maintenance phase j on the state of the PSV. $\mathbf{M}_1$ and $\mathbf{M}_4$ are identical because human errors during isolation and reinstatement have the same effect on the possible state changes.

$$
\mathbf{A} =
\begin{pmatrix}
-(\lambda_{EL} + \lambda_{FTO}) & \lambda_{EL} & \lambda_{FTO} & 0 & 0 & 0 \\
\mu & -(\mu + \lambda_{FTO}) & 0 & \lambda_{FTO} & 0 & 0 \\
0 & 0 & -\lambda_{EL} & \lambda_{EL} & 0 & 0 \\
0 & 0 & \mu & -\mu & 0 & 0 \\
0 & 0 & 0 & 0 & -\lambda_{EL} & \lambda_{EL} \\
0 & 0 & 0 & 0 & \mu & -\mu
\end{pmatrix}
$$

3.2. Analytical formula
In this section, analytical formulas for the multi-phase Markov model are proposed. Let $X(t)$ denote the state of the PSV at time $t$. The probability of being in state $i$ at time $t$ is denoted as

$$P_i(t) = \Pr(X(t) = i)$$

The state distribution of the PSV at time $t$ is written as a row vector,

$$\mathbf{P}(t) = [P_1(t) \;\; P_2(t) \;\; P_3(t) \;\; P_4(t) \;\; P_5(t) \;\; P_6(t)]$$

The PSV is in state 1 at time $t = 0$. The interval $t \in [0, T_1]$ is operation phase 1.
In this interval, the state of the PSV is given in Eq. (1).

$$\mathbf{P}(t) = \mathbf{P}(0)\, e^{\mathbf{A} t}, \quad t \in [0, T_1] \qquad (1)$$

At time $t = T_1$, operation phase 1 ends, and the state probability is given by

$$\mathbf{P}(T_1) = \mathbf{P}(0)\, e^{\mathbf{A} T_1}$$

Immediately after the time $t = T_1$, maintenance phase 1 begins, in which the isolation activity is executed with the duration $m_1$, such that maintenance phase 1 finishes at $T_1 + m_1$. It is assumed that the state of the PSV does not change before the completion of the activity. Thus, the state probability within the time interval $t \in (T_1, T_1 + m_1^{-}]$ is given as

$$\mathbf{P}(t) = \mathbf{P}(T_1), \quad t \in (T_1, T_1 + m_1^{-}]$$

where $T_1 + m_1^{-}$ denotes the time immediately before $T_1 + m_1$. Then, the state of the PSV at $T_1 + m_1$ is given in Eq. (2).

$$\mathbf{P}(T_1 + m_1) = \mathbf{P}(T_1 + m_1^{-}) \cdot \mathbf{M}_1 \qquad (2)$$

Immediately after the time $T_1 + m_1$, maintenance phase 2 begins. The state of the PSV at $T_1 + m_1 + m_2$, immediately after maintenance phase 2, is given by

$$\mathbf{P}(T_1 + m_1 + m_2) = \mathbf{P}(T_1 + m_1 + m_2^{-}) \cdot \mathbf{M}_2 = \mathbf{P}(T_1 + m_1^{-}) \cdot \mathbf{M}_1 \cdot \mathbf{M}_2$$

In the same way, the state of the PSV at $T_1 + m_1 + m_2 + m_3 + m_4$, or $T_1 + m_{tot}$, is given by

$$\mathbf{P}(T_1 + m_1 + m_2 + m_3 + m_4) = \mathbf{P}(T_1 + m_{tot}) = \mathbf{P}(0)\, e^{\mathbf{A} T_1} \cdot \mathbf{M}_1 \cdot \mathbf{M}_2 \cdot \mathbf{M}_3 \cdot \mathbf{M}_4$$

At $T_1 + m_{tot}$, operation phase 2 begins. After the periodic test interval $\tau$ has elapsed, i.e. at $t = T_1 + m_{tot} + \tau = T_2$, operation phase 2 ends. In the interval $t \in (T_1 + m_{tot}, T_2]$, the time evolution of the unit is described by the Markov process with the transition rate matrix $\mathbf{A}$. The same applies to operation phase n.
Then, the PSV state in operation phase n is given as

$$\mathbf{P}(t) = \mathbf{P}(0) \left[ e^{\mathbf{A}\tau} \prod_{j=1}^{4} \mathbf{M}_j \right]^{n-1} e^{\mathbf{A}\,(t - (T_{n-1} + m_{tot}))}, \quad t \in [T_{n-1} + m_{tot}, T_n]$$

The state during maintenance phase 1 after operations phase n is given by

$$\mathbf{P}(t) = \mathbf{P}(0) \left[ e^{\mathbf{A}\tau} \prod_{j=1}^{4} \mathbf{M}_j \right]^{n-1} e^{\mathbf{A}\tau}, \quad t \in [T_n, T_n + m_1^{-}]$$

The state after maintenance phase k is given, for k = 1, 2, 3, 4, by

$$\mathbf{P}(T_n + m_1 + \dots + m_k) = \mathbf{P}(0) \left[ e^{\mathbf{A}\tau} \prod_{j=1}^{4} \mathbf{M}_j \right]^{n-1} e^{\mathbf{A}\tau} \prod_{j=1}^{k} \mathbf{M}_j$$

The state during maintenance phase k after operations phase n is given, for k = 2, 3, 4, by

$$\mathbf{P}(t) = \mathbf{P}(0) \left[ e^{\mathbf{A}\tau} \prod_{j=1}^{4} \mathbf{M}_j \right]^{n-1} e^{\mathbf{A}\tau} \prod_{j=1}^{k-1} \mathbf{M}_j, \quad t \in [T_n + m_1 + \dots + m_{k-1},\; T_n + m_1 + \dots + m_{k-1} + m_k^{-}]$$

The transition probability matrices $\mathbf{M}_1$, $\mathbf{M}_2$ and $\mathbf{M}_3$ used in these formulas are

$$
\mathbf{M}_1 =
\begin{pmatrix}
1-\alpha & \alpha & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1-\alpha & \alpha & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1-\alpha & \alpha \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
\mathbf{M}_2 =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \gamma & 0 & 1-\gamma & 0 \\
0 & 0 & 0 & \gamma & 0 & 1-\gamma \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
\mathbf{M}_3 =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1-\gamma & 0 & 0 & 0 & \gamma & 0 \\
0 & 1-\gamma & 0 & 0 & 0 & \gamma
\end{pmatrix}
$$

4. Result
The state probabilities are calculated by implementing the formulas in MATLAB®. Figure 2 shows the time-dependent state probabilities $P_2(t)$, $P_3(t)$ and $P_4(t)$ when the periodic test interval is $\tau$ = 1 year, the HEP for testing is $\gamma$ = 0.1, and the HEP for the isolation or reinstatement activity is α = 0.1. The value of $\lambda_{FTO}$ for the PSV is assumed to be $2.1 \cdot 10^{-6}$ per hour (OREDA, 2015). The calculation results show that $P_2(t)$ increases abruptly after maintenance phase 1 and maintenance phase 4, because EL can be introduced during isolation and reinstatement, in which case the PSV enters state 2. In the same way, $P_4(t)$ and $P_6(t)$ increase immediately after maintenance phase 1 and maintenance phase 4. To obtain estimates of the leak frequencies, the sum of the visit frequencies to state 2, state 4, and state 6 is calculated.
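The formulas of Section 3.2 can be sketched in a few lines of code. The following is a minimal illustration, not the authors' MATLAB implementation: the matrix layouts follow the state numbering of Table 2 as read from the model description, the matrix exponential is a simple truncated Taylor series with scaling and squaring, and the EL occurrence rate `lam_el = 1e-7` per hour used in the example call is a placeholder assumption, because the paper does not state this value. All function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(a, t, squarings=20, terms=10):
    """Approximate exp(a * t): truncated Taylor series plus scaling and squaring."""
    n = len(a)
    scale = t / 2.0 ** squarings
    x = [[a[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def vec_mat(p, m):
    """Multiply a row vector by a matrix."""
    return [sum(p[i] * m[i][j] for i in range(len(p))) for j in range(len(m[0]))]

def rate_matrix(lam_fto, lam_el, mu):
    """Transition rate matrix A for the six states of Table 2 (rates per hour)."""
    return [
        [-(lam_el + lam_fto), lam_el, lam_fto, 0.0, 0.0, 0.0],
        [mu, -(mu + lam_fto), 0.0, lam_fto, 0.0, 0.0],
        [0.0, 0.0, -lam_el, lam_el, 0.0, 0.0],
        [0.0, 0.0, mu, -mu, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, -lam_el, lam_el],
        [0.0, 0.0, 0.0, 0.0, mu, -mu],
    ]

def maintenance_matrices(alpha, gamma):
    """Transition probability matrices M1..M4 (M4 is taken equal to M1)."""
    a, g = alpha, gamma
    m1 = [[1 - a, a, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1 - a, a, 0, 0],
          [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1 - a, a], [0, 0, 0, 0, 0, 1]]
    m2 = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, g, 0, 1 - g, 0],
          [0, 0, 0, g, 0, 1 - g], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]
    m3 = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0],
          [0, 0, 0, 1, 0, 0], [1 - g, 0, 0, 0, g, 0], [0, 1 - g, 0, 0, 0, g]]
    return [m1, m2, m3, m1]

def state_at_mid_phase(n, tau, lam_fto, lam_el, mu, alpha, gamma):
    """State vector at the midpoint of operations phase n (tau in hours)."""
    a = rate_matrix(lam_fto, lam_el, mu)
    full, half = mat_exp(a, tau), mat_exp(a, tau / 2.0)
    p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # PSV starts in state 1
    for _ in range(n - 1):
        p = vec_mat(p, full)                # one complete operations phase
        for m in maintenance_matrices(alpha, gamma):
            p = vec_mat(p, m)               # isolation, test, repair, reinstatement
    return vec_mat(p, half)

if __name__ == "__main__":
    # lam_el is a placeholder assumption; the other values are from the case study.
    p = state_at_mid_phase(2, 8760.0, 2.1e-6, 1e-7, 1.0 / 8.0, 0.1, 0.1)
    print("P3 at the midpoint of operations phase 2:", p[2])
```

Since the state is assumed not to change before the end of each maintenance phase, the phase durations do not enter the state vector; only the order in which the matrices are applied matters. Under the stated assumptions, the printed value should lie close to the PFDavg that the case study reports for a one-year test interval.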
Figure 3 shows the estimated leak frequencies for the selected value of α = 0.1, which is the upper HEP value for complex tasks (Kirwan, 1994; Williams, 1986). α = 0.05 is chosen to demonstrate that a lower HEP value results in a lower estimated leak frequency. The PSV has the failure mode FTO in state 3 and state 4. $P_3(t)$ increases within the interval between two consecutive periodic tests (operation phase), and decreases after maintenance phase 3 (after a repair action is completed). $P_4(t)$ is very small, as shown in Figure 2. For this reason, we consider only $P_3(t)$ in the calculation of the average PFD with respect to FTO, denoted as PFDavg. $P_3(t)$ increases almost linearly with time in the operations phases, and thus $P_3(\tau/2)$ is taken as the PFDavg in operations phase 1. It is noted that $P_3(t)$ is not zero at the start of operations phases 2, 3, 4, ... because of the imperfect testing and repair addressed in Section 3. For this reason, the PFDavg value in operation phase 2, $P_3(T_1 + \tau/2)$, is slightly higher than $P_3(\tau/2)$. The PFDavg goes up when the length of the test interval increases. The PFDavg is 0.011 with $\tau$ = 1 year and 0.1018 with $\tau$ = 10 years, which means that the PFDavg remains within the same order of magnitude for $\tau$ from 1 up to 9 years. On the other hand, the estimated leak frequency decreases when the test interval is extended, as illustrated in Figure 3. According to the calculation results, extending the test interval from 1 year up to 4 years leads to a meaningful reduction in the leak frequency. The calculation results in this case study are as follows: PFDavg = 0.022 with $\tau$ = 2 years, PFDavg = 0.032 with $\tau$ = 3 years, and PFDavg = 0.043 with $\tau$ = 4 years.
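The reported PFDavg values can be cross-checked with a short closed-form calculation. The sketch below is a back-of-envelope check rather than the paper's method. It assumes that the reported values refer to the midpoint of operations phase 2, that one year equals 8760 hours, and that the small split between state 3 and state 4 can be neglected. The helper names `q` and `pfd_mid_phase2` are hypothetical. After the first test cycle, a fraction $\gamma$ of the accumulated FTO remains undetected, and a fraction $(1 - \gamma)^2$ is detected and successfully repaired.

```python
import math

LAMBDA_FTO = 2.1e-6       # per hour, from the case study
GAMMA = 0.1               # HEP for testing and repair, from the case study
HOURS_PER_YEAR = 8760.0   # assumed conversion

def q(t_hours):
    """Probability that FTO has occurred within t_hours for a functioning PSV."""
    return 1.0 - math.exp(-LAMBDA_FTO * t_hours)

def pfd_mid_phase2(tau_years, gamma=GAMMA):
    """Approximate undetected-FTO probability at the midpoint of operations phase 2."""
    tau = tau_years * HOURS_PER_YEAR
    undetected = gamma * q(tau)                              # missed by the first test
    functioning = 1.0 - q(tau) + (1.0 - gamma) ** 2 * q(tau) # never failed, or repaired
    return undetected + functioning * q(tau / 2.0)

if __name__ == "__main__":
    for years in (1, 2, 3, 4, 10):
        print(years, "year(s):", round(pfd_mid_phase2(years), 4))
```

Under these assumptions the check gives values close to the PFDavg figures listed above for test intervals of 1 to 4 years, which suggests that the carry-over of undetected FTO from imperfect testing accounts for the difference from the simple approximation $\lambda_{FTO}\tau/2$.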
Considering that the standard PFD for a PSV in the Norwegian petroleum industry is 0.04 (PSA, 2016), it may be argued that the optimal function test interval is 2 to 3 years, which reduces the test-induced leak frequency while meeting the desired PFDavg value for a PSV.

Figure 2: Time-dependent state probabilities (state 2, state 3 and state 4)

Figure 3: The estimated leak frequency with different length of the testing interval (x-axis: length of test interval in years, 1 to 10; y-axis: estimated leak frequency per year; series: HEP = 0.1 and HEP = 0.05)

5. Conclusion
This paper presents an approach for availability modeling of a PSV in which the adverse impact of periodic testing is examined. A multi-phase Markov approach is used to explicitly model the leakage resulting from isolation and reinstatement activities, as well as imperfect testing and repair. Moreover, analytical formulas are developed to calculate the state probabilities of the multi-phase Markov model. The numerical results of the formulas are used to estimate the PFDavg of the PSV function and the frequency of test-induced leaks. The case study shows that the frequency of unwanted leaks can be reduced significantly by extending the test interval, while meeting the desired PFDavg of the PSV function. The suggested approach represents a useful extension of availability modeling, which may be used to support decision-making about the optimal function testing interval for a periodically tested safety barrier.

References
API RP 576, 2017. Inspection of Pressure-relieving Devices. American Petroleum Institute, Washington, DC.
Barros, A., 2017. Markov and multiphase Markov, a part of a compendium for the course on maintenance optimisation. Lecture note, Norwegian University of Science and Technology, Trondheim, Norway.
Brissaud, F., Barros, A., Bérenguer, C., 2012.
Probability of failure on demand of safety systems: impact of partial test distribution. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 226, 426–436. https://doi.org/10.1177/1748006X12448142
Bukowski, J. V., Goble, W.M., Gross, R.E., Harris, S.P., 2018. Analysis of spring operated pressure relief valve proof test data: Findings and implications. Process Saf. Prog. https://doi.org/10.1002/prs.12006
Bukowski, J. V., Gross, R.E., Harris, S.P., Goble, W.M., 2020. Failure assessment of spring-operated pressure relief valve proof test data for extending time-in-service. Am. Soc. Mech. Eng. Press. Vessel. Pip. Div. PVP 6, 1–8. https://doi.org/10.1115/PVP2020-21792
Hellemans, M., 2009. The Safety Relief Valve Handbook: Design and Use of Process Safety Valves to ASME and International Codes and Standards. https://doi.org/10.1016/C2009-0-20219-4
HSG 253, 2005. The safe isolation of plant and equipment. Health and Safety Executive, London.
ISO/TR 12489, 2013. Petroleum, petrochemical and natural gas industries - Reliability modelling and calculation of safety systems. International Organization for Standardization, Geneva.
Mitchell, E.M., Gross, R.E., Harris, S.P., 2013. Evaluating Risk and Safety Integrity Levels for Pressure Relief Valves Through Probabilistic Modeling. J. Press. Vessel Technol. 135, 021601. https://doi.org/10.1115/1.4007959
Noroozi, A., Khakzad, N., Khan, F., Mackinnon, S., Abbassi, R., 2013. The role of human error in risk analysis: Application to pre- and post-maintenance procedures of process facilities. Reliab. Eng. Syst. Saf. https://doi.org/10.1016/j.ress.2013.06.038
Okoh, P., Haugen, S., Vinnem, J.E., 2016. Optimization of recertification intervals for PSV based on major accident risk. J. Loss Prev. Process Ind. https://doi.org/10.1016/j.jlp.2016.09.003
OREDA, 2015. Offshore reliability data handbook, 6th ed.
OREDA Participants, available from: Det Norske Veritas, NO 1322 Høvik, Norway.
PSA, 2016. Trends in risk level in the petroleum activity (RNNP) [WWW Document].
Rausand, M., 2014. Reliability of Safety-Critical Systems. Wiley, Hoboken, NJ.
Rausand, M., Høyland, A., 2004. System reliability theory: models, statistical methods, and applications, 2nd ed. Wiley, Hoboken, NJ.
Sklet, S., Vinnem, J.E., Aven, T., 2006. Barrier and operational risk analysis of hydrocarbon releases (BORA-Release). Part II: Results from a case study. J. Hazard. Mater. 137, 692–708.
Srivastav, H., Barros, A., Lundteigen, M.A., 2020. Modelling framework for performance analysis of SIS subject to degradation due to proof tests. Reliab. Eng. Syst. Saf. https://doi.org/10.1016/j.ress.2019.106702
Vinnem, J.-E., Røed, W., 2014. Norwegian Oil and Gas Industry Project to Reduce Hydrocarbon Leaks. SPE Econ. Manag. 6, 088–099. https://doi.org/10.2118/164981-PA
Vinnem, J.E., Haugen, S., Okoh, P., 2016. Maintenance of petroleum process plant systems as a source of major accidents? J. Loss Prev. Process Ind. https://doi.org/10.1016/j.jlp.2016.01.021
Vinnem, J.E., Røed, W., 2015. Root causes of hydrocarbon leaks on offshore petroleum installations. J. Loss Prev. Process Ind. https://doi.org/10.1016/j.jlp.2015.05.014
Wu, S., Zhang, L., Barros, A., Zheng, W., Liu, Y., 2018. Performance analysis for subsea blind shear ram preventers subject to testing strategies. Reliab. Eng. Syst. Saf. https://doi.org/10.1016/j.ress.2017.08.022