DOI: 10.3303/CET2291020
Paper Received: 28 November 2021; Revised: 23 February 2021; Accepted: 15 March 2022
Please cite this article as: Yi J., Mahgerefteh H., Martynov S., 2022, Estimating the Failure Probability of CO2 Pipeline as Part of Carbon Capture and Storage Chain, Chemical Engineering Transactions, 91, 115-120 DOI:10.3303/CET2291020

CHEMICAL ENGINEERING TRANSACTIONS
VOL. 91, 2022
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Valerio Cozzani, Bruno Fabiano, Genserik Reniers
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-89-1; ISSN 2283-9216

Estimating the Failure Probability of CO2 Pipeline as Part of Carbon Capture and Storage Chain

Jiahuan Yi, Haroun Mahgerefteh*, Sergey Martynov
Department of Chemical Engineering, University College London, London WC1E 7JE, UK
h.mahgerefteh@ucl.ac.uk

This paper presents the development of an analytical method for predicting the Cumulative Distribution Function (CDF) of CO2 pipeline puncture failure hole sizes by fitting the Weibull distribution to the failure hole size data in the Pipeline and Hazardous Materials Safety Administration (PHMSA) historical database using the Maximum Likelihood Estimator (MLE). The method first determines the minimum acceptable sample size for a reliable MLE by assessing the quality of the MLE as a function of the data sample size using the Mean Squared Error (MSE). Where the MLE is of low quality, the bootstrapping method is employed to enhance confidence in the distribution fitting by calculating the 95% Confidence Interval (CI) of the MLE. The minimum acceptable sample size is then compared with the number of CO2 hole size records in the database to decide whether bootstrapping is needed.
The results show that the available sample is far smaller than that required for a reliable MLE, and hence the bootstrapping method is applied to obtain a range of CDFs that may be considered valid for representing the probability distribution of CO2 pipeline failure hole sizes. The resulting CDF range shows that at least 70% of the failure holes are smaller than 0.25 of the pipe internal diameter for CO2 pipelines.

1. Introduction
In the effort to slow climate change, Carbon Capture and Storage (CCS) is considered one of the key technologies needed to decarbonise the energy sector and industrial processes (IPCC, 2018). Within the CCS chain, pressurised pipelines are widely recognised as the most practical and economical means of transporting the large quantities of CO2 captured from major emission sources, such as fossil fuel power plants, for subsequent geological storage. It is estimated (Element Energy, 2010) that the global demand for CO2 pipelines will reach approximately 500,000 km in length by 2050. Given that CO2 is toxic at high concentrations (Kruse and Tekiela, 1996), risk assessment of CO2 pipelines in the unlikely event of failure is of paramount importance to ensure the successful large-scale deployment of CCS. This requires calculating the consequences of loss of containment and the corresponding probability of occurrence in order to estimate the individual and societal risks (Goodfellow et al., 2012). To this end, many studies have focused on collecting pipeline failure statistics categorised by parameters such as failure type and initiation mechanism (Muhlbauer, 2004). Although the failure hole size is the critical parameter governing the mass release rate, and hence the consequences of pipeline failure, the probability of failure as a function of hole size has received little attention.
Central to the above is the accurate prediction of the probability distribution of the failure hole size. In previous studies, this distribution has often been obtained by dividing the number of failures within a given size range by the total number of failures recorded in historical databases. Medina et al. (2012), for example, employed this method to determine the release probability as a function of hole size based on the failure statistics from the CONCAWE database (CONCAWE, 2011) for a risk-based optimisation of on-shore pipeline shutdown systems. In that study, the through-wall holes following pipeline failure were assumed to have only three sizes, namely 10 mm, 40 mm and full-bore rupture, with probabilities of occurrence of 59, 29 and 12%, respectively. Rusin and Stolecka (2015), on the other hand, adopted the same method to calculate the frequencies of possible consequences following pipeline failure in a study of optimal safety valve spacing for CO2 and hydrogen pipelines. Failures were assumed to be either punctures or ruptures, with the puncture-to-rupture ratio simply taken as 9:1 based on literature data. Given its simplicity, this method provides only a rough estimate of the probability distribution of the failure hole size, and hence of the risks associated with pipeline failure. The validity of using such results in subsequent quantitative risk assessment remains unclear. In addition, the statistical quality of the sample data used to derive the probability distributions is not examined. This problem is especially acute for CO2 pipelines given the relatively small number of recorded failures. In particular, it is unclear what sample size suffices for a reliable risk assessment. In light of the above, this study describes the development of an analytical method for constructing a credible probability distribution for CO2 pipeline failure hole size.
This involves (a) using the Maximum Likelihood Estimator (MLE) to fit statistical distribution functions to historical hole size data, and (b) performing simulation tests to assess the quality (statistical significance) of the MLEs as a function of the data sample size. When the MLE quality is low, a bootstrapping method, which repeatedly resamples the original data with replacement, is employed to calculate the Confidence Interval (CI) of the MLE, from which a range of possible probability distributions representing the probability of occurrence of failure for different hole sizes can be obtained.

2. Methodology
2.1 Data review
Many organisations (e.g. CONCAWE) publish pipeline failure statistics, but few provide detailed information on the size of through-wall failure holes. This study adopts the Pipeline and Hazardous Materials Safety Administration (PHMSA, 2020) database, where such information is available. The database contains 57 CO2 pipeline loss of containment incidents in total, of which only 18 are valid for this study. The remaining 39 are leaks from pipeline flanges and infrastructure equipment (e.g. relief valves, compressors) rather than through-wall holes, and are therefore outside the scope of this study. In the PHMSA database, the size of a through-wall hole is recorded as its circumferential and longitudinal lengths. To obtain a representative hole size, these lengths are converted into the Equivalent Hole Diameter (EHD) as follows, assuming the hole is oval (Koch, 2008):

$$\mathrm{EHD} = \frac{1.55\,A^{0.625}}{P^{0.25}} \tag{1}$$

where $A$ and $P$ are, respectively, the cross-sectional area and perimeter of the oval hole calculated from its circumferential and longitudinal lengths. To normalise the hole size, the Relative Hole Diameter (RHD) is defined as:

$$\mathrm{RHD} = \frac{\mathrm{EHD}}{D_{in}} \tag{2}$$

where $D_{in}$ is the internal diameter of the pipe.
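As a rough illustration of Eqs. (1) and (2), the sketch below converts a hole's circumferential and longitudinal lengths into EHD and RHD. The function names are hypothetical, and the way $A$ and $P$ are obtained is an assumption not stated in the paper: the oval hole is treated as an ellipse whose full axes are the two recorded lengths, with the perimeter taken from Ramanujan's approximation. If the paper's oval is a different shape (e.g. a flat oval), $A$ and $P$ would be computed differently.

```python
import math

def ellipse_area_perimeter(circ_len, long_len):
    """Area and perimeter of an ellipse whose full axes are the two hole lengths.

    Assumption (not from the paper): the oval hole is treated as an ellipse, and the
    perimeter uses Ramanujan's second approximation.
    """
    a, b = circ_len / 2.0, long_len / 2.0  # semi-axes
    area = math.pi * a * b
    h = ((a - b) / (a + b)) ** 2
    perimeter = math.pi * (a + b) * (1.0 + 3.0 * h / (10.0 + math.sqrt(4.0 - 3.0 * h)))
    return area, perimeter

def equivalent_hole_diameter(area, perimeter):
    """Eq. (1): EHD = 1.55 * A**0.625 / P**0.25, in the length unit of the inputs."""
    return 1.55 * area ** 0.625 / perimeter ** 0.25

def relative_hole_diameter(ehd, d_in):
    """Eq. (2): RHD = EHD / D_in, with EHD and D_in in the same unit."""
    return ehd / d_in

# Usage: hypothetical 30 mm x 60 mm hole in a pipe with 300 mm internal diameter.
area, perimeter = ellipse_area_perimeter(30.0, 60.0)
ehd = equivalent_hole_diameter(area, perimeter)
rhd = relative_hole_diameter(ehd, 300.0)
print(f"EHD = {ehd:.1f} mm, RHD = {rhd:.3f}")  # roughly 41.6 mm and 0.139
```

As a sanity check on the constant 1.55, a circular hole of diameter $D$ gives an EHD within about 0.1% of $D$.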
2.2 Statistical distribution model
The probability distribution is often expressed in the form of a Cumulative Distribution Function (CDF). In this study, the Weibull distribution (Weibull, 1951) is employed as the candidate statistical function to represent the probability distribution of the hole size, given its extensive application in reliability engineering for the assessment of pipeline failure rates. The CDF of the Weibull distribution takes the form:

$$F(x;\alpha,\beta) = 1 - e^{-(x/\alpha)^{\beta}} \tag{3}$$

where α is the scale parameter, which stretches or compresses the distribution along the x-axis, and β is the shape parameter, which determines the general shape of the curve.

2.3 Distribution fitting
The Weibull distribution is fitted to the failure hole size data obtained from the PHMSA database, with the fitting parameters obtained using the MLE method. For a given set of observed data, the method estimates the parameters by finding the values that maximise the likelihood of generating the observed data. The MLE is among the most dependable statistical estimators for distribution fitting (Ginos, 2009). Its appealing properties include consistency, efficiency and asymptotic normality (Long and Freese, 2006). However, these properties have only been proven to hold when the number of data used in the estimation is sufficiently large (Ji, 2020). To address the small-sample issue, the bootstrapping method, which generates many resampled datasets from the original sample by random sampling with replacement, is employed to calculate the Confidence Interval (CI) of the MLE. The bootstrapping method has been employed by many authors to enhance confidence in maximum likelihood estimation when the sample size is small (see for example Tsagkanos, 2008; Wei and Li, 2019).
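Eq. (3) and the MLE fit can be sketched in plain Python as below. This is an assumed implementation, not the authors' code: the location parameter is fixed at zero, the shape β is found by bisection on the standard likelihood equation for the two-parameter Weibull, and the scale α then follows in closed form. The function names and the bisection bracket [0.01, 50] are hypothetical choices.

```python
import math
import random

def weibull_cdf(x, alpha, beta):
    """Eq. (3): F(x; alpha, beta) = 1 - exp(-(x / alpha) ** beta), zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / alpha) ** beta))

def weibull_mle(data, lo=0.01, hi=50.0, tol=1e-10):
    """Two-parameter Weibull MLE (location fixed at zero); returns (alpha, beta).

    Setting the log-likelihood derivatives to zero gives
        alpha**beta = mean(x**beta)
        sum(x**b * ln x) / sum(x**b) - 1/b - mean(ln x) = 0   (solved for b = beta)
    The second equation is solved by bisection on [lo, hi].
    """
    n = len(data)
    x_max = max(data)
    scaled = [x / x_max for x in data]    # rescaling avoids overflow in x**b
    logs = [math.log(s) for s in scaled]  # and leaves the shape equation unchanged
    mean_log = sum(logs) / n

    def score(b):
        w = [s ** b for s in scaled]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    # score(b) increases with b, so its sign change brackets the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    alpha = x_max * (sum(s ** beta for s in scaled) / n) ** (1.0 / beta)
    return alpha, beta

# Usage 1: Eq. (3) at RHD = 0.25 with the arithmetic-mean RHD parameters of Table 2.
print(f"F(0.25) = {weibull_cdf(0.25, 0.137, 1.573):.3f}")  # 0.924

# Usage 2: recover known parameters from a large synthetic (not PHMSA) sample.
rng = random.Random(1)
sample = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
alpha_hat, beta_hat = weibull_mle(sample)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")  # close to 2.0 and 1.5
```

Bisection is used here only because it is simple and robust; a Newton iteration or a library fitter (e.g. SciPy's `weibull_min.fit` with the location fixed at zero) would serve equally well.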
The bootstrapping process for calculating the CI of the MLE comprises the following steps:
1) resampling the original data sample with replacement to create resampled datasets of the same size as the original sample;
2) computing the MLE of each resampled dataset;
3) calculating the CI of the MLE from the collection of MLE values obtained in step 2).
This process gives a range of values within which the unknown parameter is expected to lie.

2.4 Minimum acceptable sample size
To determine the sample size appropriate for employing the MLE for distribution fitting, the quality of the MLE is assessed using the Mean Squared Error (MSE). The MSE is considered an excellent general-purpose error metric for numerical predictions (Neill and Hashemi, 2018) and is widely adopted in studies of the MLE. Mathematically, the MSE of the MLE of an unknown parameter is defined as the sum of its variance and squared bias:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^{2}(\hat{\theta},\theta) \tag{4}$$

where θ is the unknown parameter, the hat denotes the MLE of θ, and $\mathrm{Bias}(\hat{\theta},\theta) = E[\hat{\theta}] - \theta$. In general, an estimator with an MSE close to zero is considered of high quality. In the present study, simulation tests are performed to investigate the quality of the MLE by computing the MSEs for different sample sizes. The tests involve:
1) for a given sample size, calculating the corresponding MLE using data randomly sampled from a Weibull distribution with known parameters;
2) repeating step 1) a large number of times and computing the corresponding MSE.

3. Results and discussion
3.1 Minimum acceptable sample size
To investigate the quality of the MLE as a function of sample size for the scale and shape parameters of the Weibull distribution, the two simulation tests described in Section 2.4 are performed. The selected parameter values for the simulation tests are summarised in Table 1. For each parameter, three different values are tested. The tested sample sizes are 10, 20, 30, …, 100, 200, …, 500.
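The simulation test of Section 2.4 can be sketched as follows. This is an assumed implementation, not the authors' code: it redefines a compact version of a bisection-based Weibull MLE so that the block runs on its own, draws samples with Python's `random.weibullvariate`, and evaluates Eq. (4) from the repeated estimates as the population variance plus the squared bias. The function names and the number of repetitions are hypothetical, illustrative choices.

```python
import math
import random

def weibull_mle(data, lo=0.01, hi=50.0, tol=1e-8):
    """Compact two-parameter Weibull MLE (location fixed at zero); returns (alpha, beta)."""
    x_max = max(data)
    s = [x / x_max for x in data]
    logs = [math.log(v) for v in s]
    mean_log = sum(logs) / len(s)

    def score(b):  # likelihood equation for the shape parameter
        w = [v ** b for v in s]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    while hi - lo > tol:  # bisection: score(b) increases with b
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if score(mid) > 0 else (mid, hi)
    beta = 0.5 * (lo + hi)
    return x_max * (sum(v ** beta for v in s) / len(s)) ** (1.0 / beta), beta

def mse(estimates, true_value):
    """Eq. (4): variance of the estimates plus squared bias."""
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return variance + (mean - true_value) ** 2

def simulate_mse(alpha, beta, n, n_reps=500, seed=0):
    """Steps 1)-2): fit n_reps synthetic samples of size n; return (MSE_alpha, MSE_beta)."""
    rng = random.Random(seed)
    alpha_hats, beta_hats = [], []
    for _ in range(n_reps):
        sample = [rng.weibullvariate(alpha, beta) for _ in range(n)]
        a_hat, b_hat = weibull_mle(sample)
        alpha_hats.append(a_hat)
        beta_hats.append(b_hat)
    return mse(alpha_hats, alpha), mse(beta_hats, beta)

# Usage: one Table 1 style setting (alpha = 1, beta = 2); the MSE should shrink as n grows.
results = {n: simulate_mse(1.0, 2.0, n, n_reps=200) for n in (10, 50, 100)}
for n, (mse_a, mse_b) in results.items():
    print(f"n = {n:3d}: MSE(alpha) = {mse_a:.4f}, MSE(beta) = {mse_b:.4f}")
```

Using the population variance (dividing by the number of repetitions) makes the variance-plus-squared-bias sum identical to the mean of the squared errors about the true value.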
Table 1: The selected values of the Weibull scale and shape parameters used in the two simulation tests for investigating the quality of the MLE as a function of sample size.
Test no. | Tested parameter | Tested values | Fixed parameter
1 | Scale parameter, α | α = 1, 1.5, 2 | β = 2
2 | Shape parameter, β | β = 1.5, 2, 2.5 | α = 1

Figure 1 shows the variation of the MSE of the MLE for the Weibull scale (a) and shape (b) parameters as a function of sample size. For all three tested values of both parameters, the MSE decreases and tends to zero with increasing sample size, showing that the quality of the MLE improves as the sample size increases. Three regions, corresponding to different stages of the MSE decrease, can generally be identified for both tests. First, when the sample size is smaller than 100, the MSE decreases sharply. This shows that in this range the quality of the MLE is highly sensitive to the sample size, and the MLE should therefore be used with caution. Second, for sample sizes between 100 and 200, the decrease in MSE slows down markedly, indicating that samples with more than 100 observations already yield a substantially improved MLE. Third, as the sample size exceeds 200, the decrease in MSE slows further, meaning that additional observations provide only limited improvement to the quality of the MLE. The above suggests that the minimum acceptable sample size for obtaining a reliable MLE is 100, while 200 observations can further improve its quality.

Figure 1: Simulation results of the MSE of the MLE as a function of sample size for test 1 for the Weibull scale parameter, α (a), and test 2 for the Weibull shape parameter, β (b).

3.2 Distribution fitting results
As mentioned in Section 2.1, the number of CO2 pipeline incidents available for this study is 18, which is far less than the minimum acceptable sample size required for obtaining a reliable MLE (see Section 3.1).
As a result, the bootstrapping method is employed to obtain the CI of the MLE following the steps described in Section 2.3. A significance level of 0.05 is chosen for the CI, corresponding to 95% confidence, and 1,000 resampled datasets are generated. Table 2 summarises the bootstrapping results, including the arithmetic means and 95% CIs of the MLEs of the 1,000 resampled datasets, for both EHD and RHD.

Table 2: The arithmetic means and 95% CIs of the MLEs of the Weibull scale and shape parameters for EHD and RHD.
Diameter type | Weibull parameter | 95% CI | Arithmetic mean
EHD | Scale parameter, α | [17.812, 37.011] | 26.278
EHD | Shape parameter, β | [1.171, 2.221] | 1.557
RHD | Scale parameter, α | [0.093, 0.191] | 0.137
RHD | Shape parameter, β | [1.174, 2.712] | 1.573

Figures 2 and 3 respectively show the probability distributions of the EHD and RHD determined by the bootstrapping results in Table 2. Here, the bootstrapping process gives the range of credible failure hole size CDFs for CO2 pipelines. The CDFs characterised by the lower and upper bounds of the 95% CIs (dashed lines) and by the arithmetic means (solid lines) are highlighted in both figures. As can be observed from Figure 2, for all CDFs in the range, the cumulative probability initially increases sharply to over 70% at relatively small EHD (ca. 20 and 40 mm for the upper and lower bounds, respectively) and gradually converges to 1 as the EHD increases. A similar pattern can be seen for the RHD in Figure 3, where the cumulative probability increases to over 70% at RHD = ca. 0.1 for the upper bound and 0.25 for the lower bound. This means that at least 70% of the holes are smaller than 0.25 of the pipe internal diameter, suggesting that smaller holes are much more likely to occur on CO2 pipelines. This may be attributed to the fact that over 80% of the CO2 pipeline incidents in the PHMSA database resulted from corrosion, which is less likely to initiate catastrophic failure.
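The three bootstrapping steps of Section 2.3, with 1,000 resamples and a 95% level, might be sketched as follows. The CI construction is an assumption: the paper does not state which bootstrap interval was used, so the simple percentile interval is shown. The 18 input values are synthetic, drawn from a Weibull distribution with the arithmetic-mean RHD parameters of Table 2 purely for illustration; they are not the PHMSA records, so the output will not reproduce Table 2. All function names are hypothetical, and the compact MLE is redefined so the block runs on its own.

```python
import math
import random

def weibull_mle(data, lo=0.01, hi=50.0, tol=1e-8):
    """Compact two-parameter Weibull MLE (location fixed at zero); returns (alpha, beta)."""
    x_max = max(data)
    s = [x / x_max for x in data]
    logs = [math.log(v) for v in s]
    mean_log = sum(logs) / len(s)

    def score(b):  # likelihood equation for the shape parameter
        w = [v ** b for v in s]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    while hi - lo > tol:  # bisection: score(b) increases with b
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if score(mid) > 0 else (mid, hi)
    beta = 0.5 * (lo + hi)
    return x_max * (sum(v ** beta for v in s) / len(s)) ** (1.0 / beta), beta

def percentile(sorted_values, q):
    """Linear interpolation between order statistics, for 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    i = math.floor(pos)
    j = min(i + 1, len(sorted_values) - 1)
    return sorted_values[i] + (sorted_values[j] - sorted_values[i]) * (pos - i)

def bootstrap_weibull(data, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap; returns {'alpha': (ci_low, ci_high, mean), 'beta': (...)}."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # step 1: resample with replacement
        fits.append(weibull_mle(resample))         # step 2: MLE of each resampled dataset
    tail = (1.0 - level) / 2.0
    summary = {}
    for k, name in enumerate(("alpha", "beta")):   # step 3: CI from the collection of MLEs
        values = sorted(fit[k] for fit in fits)
        summary[name] = (percentile(values, tail), percentile(values, 1.0 - tail),
                         sum(values) / len(values))
    return summary

# Usage with 18 synthetic RHD values (NOT the PHMSA data).
gen = random.Random(42)
rhd_sample = [gen.weibullvariate(0.137, 1.573) for _ in range(18)]
result = bootstrap_weibull(rhd_sample)
for name, (low, high, mean) in result.items():
    print(f"{name}: 95% CI [{low:.3f}, {high:.3f}], mean {mean:.3f}")
```

Other interval constructions (e.g. bias-corrected intervals) could be substituted in step 3 without changing steps 1 and 2.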
The range defined by the upper and lower bounds essentially represents the uncertainty in the prediction of the hole size probability distribution. The CDFs within this uncertainty range are considered valid hole size probability distributions for CO2 pipelines and hence can be used with confidence in quantitative risk assessment, where the probability of occurrence of specific failure events is required. Decision makers can choose from the range of CDFs based on their subjective preferences. For example, the CDF characterised by the arithmetic mean can be used as a general-purpose guideline, while the lower bound CDF can be taken as the worst-case scenario in determining the risks associated with CO2 pipeline failure, as it gives the highest probability of occurrence of larger holes among the possible CDFs. It should be noted that although the upper bound CDF is a valid distribution from a statistical point of view, in practice it is more reasonable to use CDFs closer to the lower bound, as they cover a wider range of hole sizes and therefore provide larger safety margins for quantitative risk assessment.

Figure 2: The EHD CDF range for CO2 pipelines obtained by the bootstrapping process (see Table 2). The solid line is the CDF characterised by the arithmetic mean and the dashed lines are the CDFs characterised by the 95% CI.

Figure 3: The RHD CDF range for CO2 pipelines obtained by the bootstrapping process (see Table 2). The solid line is the CDF characterised by the arithmetic mean and the dashed lines are the CDFs characterised by the 95% CI.

4. Conclusions
The development and testing of an analytical method for constructing credible hole size probability distributions for the failure of CO2 pipelines has been presented. The method involves fitting the Weibull distribution, using the MLE method, to the failure hole size data from 18 CO2 pipeline loss of containment incidents recorded in the PHMSA database.
The quality of the MLE was then assessed by calculating the MSE as a function of sample size. The results indicated that at least 100 hole size data points are considered sufficient for obtaining a reliable MLE for CO2 pipelines, while 200 are ideal. The 18 recorded CO2 pipeline incidents are far too few, and therefore the bootstrapping method, involving computing the 95% CI of the MLE, was employed to obtain a credible range of the MLE. The resulting range of CDFs suggests that smaller holes are much more likely to occur in CO2 pipelines, with at least 70% of the holes being smaller than 0.25 of the pipe internal diameter. This range can be used by decision makers for subsequent quantitative risk assessment, depending on their subjective preferences. The probability distributions derived using the proposed method can be readily updated as more hole size data for CO2 pipeline failures become available. Future work is needed to study the quality of the MLE with respect to sample features other than sample size; for example, whether the sample covers a sufficiently wide range of failure modes or the entire range of hole sizes is also important.

Nomenclature
$A$ – hole cross-section area, m²
$D_{in}$ – pipe internal diameter, m
EHD – equivalent hole diameter, mm
$P$ – hole perimeter, m
RHD – relative hole diameter, dimensionless
α – Weibull scale parameter
β – Weibull shape parameter

Acknowledgments
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 884418. The work reflects only the authors’ views and the European Union is not liable for any use that may be made of the information contained therein.

References
CONCAWE, 2011. Oil pipelines management group’s special task force on oil pipeline spillages (OP/STF-1). Brussels.
Element Energy, 2010. CO2 pipeline infrastructure: An analysis of global challenges and opportunities.
Ginos, B.F., 2009. Parameter estimation for the lognormal distribution. Brigham Young University.
Goodfellow, G., Turner, S., Haswell, J., Espiner, R., 2012. An Update to the UKOPA Pipeline Damage Distributions, in: International Pipeline Conference. American Society of Mechanical Engineers, pp. 541–547.
Intergovernmental Panel on Climate Change (IPCC), 2018. Global warming of 1.5°C.
Ji, Q., 2020. Foundation and basics, in: Probabilistic Graphical Models for Computer Vision. Elsevier, pp. 11–29.
Koch, P., 2008. Equivalent diameters of rectangular and oval ducts. Build. Serv. Eng. Res. Technol. 29, 341–347.
Kruse, H., Tekiela, M., 1996. Calculating the consequences of a CO2-pipeline rupture. Energy Convers. Manag. 37, 1013–1018.
Long, J.S., Freese, J., 2006. Regression Models for Categorical Dependent Variables using Stata. Stata Press.
Medina, H., Arnaldos, J., Casal, J., Bonvicini, S., Cozzani, V., 2012. Risk-based optimization of the design of on-shore pipeline shutdown systems. J. Loss Prev. Process Ind. 25, 489–493.
Muhlbauer, W.K., 2004. Pipeline risk management manual: Ideas, techniques, and resources, Third edition. Elsevier.
Pipeline and Hazardous Materials Safety Administration (PHMSA), 2020. Source data [WWW Document].
Rusin, A., Stolecka, K., 2015. Reducing the risk level for pipelines transporting carbon dioxide and hydrogen by means of optimal safety valves spacing. J. Loss Prev. Process Ind. 33, 77–87.
Tsagkanos, A., 2008. The bootstrap maximum likelihood estimator: The case of logit. Appl. Financ. Econ. Lett. 4, 209–212.
Wei, S., Li, N., 2019. Bootstrap estimation for Weibull distribution parameters based on small sample and censored condition. Stat. Decis. 34–37. (in Chinese)
Weibull, W., 1951. A statistical distribution function of wide applicability. J. Appl. Mech. 18, 293–297.