Engineering, Technology & Applied Science Research Vol. 13, No. 1, 2023, 10175-10180
www.etasr.com
Krimil et al.: Best Fit versus Default Distribution and the Impact on the Reliability over the Design …

Best Fit versus Default Distribution and the Impact on the Reliability over the Design Lifetime of Hydraulic Structures

Farida Krimil
Civil and Hydraulic Engineering Department, Sciences and Technology Faculty, University of Mohamed Khider Biskra, Algeria
f.krimil@univ-batna2.dz (corresponding author)

Nora Bouchahm
Technical and Scientific Center on Arid Regions, CRSTRA, Algeria
bouchahm.nora@crstra.dz

Fatima Zohra Tebbi
Natural Hazards and Territory Planning Laboratory (LRNAT), Earth Sciences and Universe Institute, University of Mustapha Benboulaid Batna 2, Algeria
f.tebbi@univ-batna2.dz

Received: 16 December 2022 | Revised: 29 December 2022 and 6 January 2023 | Accepted: 7 January 2023

ABSTRACT
In the present study, Flood Frequency Analysis (FFA) is performed on the daily inflows of a reservoir dam taken as a case study. The Peaks-Over-Threshold (POT) approach is adopted, and the default Generalized Pareto Distribution (GPD) is compared with the distribution that best fits the data. The risk analysis shows that, for the chosen threshold values, the reliability of the structure decreases by up to 25.60 percentage points if the best-fit distribution is adopted instead of the default fit.

Keywords-flood frequency analysis; peaks-over-threshold; generalized Pareto distribution; best fit; risk analysis

I. INTRODUCTION
Flood Frequency Analysis (FFA) [1, 2] plays a major role in the design of hydraulic structures because it affects both the safety and the cost of the structure. The principal objective of FFA is to establish a relationship between flood magnitude and return period by estimating the probability of exceedance [3].
Hydrologists generally apply two approaches to perform FFA: Annual Maximum (AM) and Peaks-Over-Threshold (POT), the latter also called the partial duration series approach. The AM series approach is the most widely used in FFA because its sampling process is simple [4, 5]: it retains only the maximum discharge of each year. However, defining the sample in this way discards a large portion of the data and loses useful information. For example, the second highest flow in a year, which may exceed many values in the AM series, is not selected [4, 5]. The POT approach retains all peak values above a certain level, usually called the threshold, and is therefore better suited to the analysis of extreme values. Its main advantage is that it allows the selection of an enriched series of events to be considered as floods [7] and controls the number of flood occurrences used in the analysis, unlike the AM approach, which includes only one event per year.

Authors in [7] prepared a guide for the use of the POT approach. Authors in [6] indicate that the POT method can provide adequate and comparable estimates of N-year discharges for more stations with short temporal coverage. Authors in [8] reviewed recent advances in the POT approach from a statistical perspective. Authors in [4] compared both methods on data from the Litija 1 gauging station on the Sava River in Slovenia and reported that the POT method gives better results than the AM method. Authors in [5] reviewed and summarized the current status of the POT model and identified the difficulties in applying it in FFA. Ensuring the independence of the data series and choosing an appropriate threshold value are two of the main difficulties associated with the POT approach [4].
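The difference between the two sampling schemes can be illustrated with a minimal Python sketch. The flood peaks, the threshold value, and the function names below are hypothetical and chosen only for illustration. The peaks are assumed to be already independent, so no declustering step is shown.

```python
def annual_maxima(records):
    """Return {year: largest flow of that year} from (year, flow) pairs."""
    maxima = {}
    for year, flow in records:
        maxima[year] = max(flow, maxima.get(year, float("-inf")))
    return maxima


def peaks_over_threshold(records, threshold):
    """Return every (year, flow) pair whose flow exceeds the threshold."""
    return [(year, flow) for year, flow in records if flow > threshold]


# Hypothetical independent flood peaks in m3/s (not data from the study).
records = [
    (2004, 30.0), (2004, 18.0),
    (2005, 9.0), (2005, 7.5),
    (2006, 45.0), (2006, 25.0), (2006, 14.0),
]

am_series = annual_maxima(records)                # one value per year
pot_series = peaks_over_threshold(records, 12.0)  # all peaks above 12 m3/s

print("AM series :", am_series)
print("POT series:", pot_series)
```

In this toy sample the second largest peak of 2004 (18.0) is dropped by the AM scheme although it exceeds the annual maximum of 2005 (9.0), while the POT scheme keeps it and discards the low 2005 values.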
The threshold should be high enough to maintain the assumption of flood independence, but not so high that the variance increases because too few events remain for the flood analysis [9]. The selection of an appropriate threshold has therefore been the subject of several studies, and many methods, either graphical or analytical, have been reported. Graphical methods are widely used [10]. Several probability distributions are available for modeling extreme flood events, and the selection of the best-fit distribution and the associated parameter estimation is an important step in FFA [11]. The Generalized Pareto Distribution (GPD) is broadly used to model extreme floods over a threshold, and the distribution of POT can be approximated by the GPD [12]. The annual number of events above the threshold can be approximated by either the Poisson or the negative binomial distribution [13, 14].

In the current study, FFA is applied to the daily inflows of the Koudiat Medouar reservoir using the POT approach. Several additional distributions are tested to find the one that best fits the data. A risk analysis is then conducted.

II. STUDY AREA
Located 7 km north-east of Timgad and 35 km from Batna, the Koudiat Medouar dam was built in 1994 on the Oued Rebôa. It has a height of 48 m. The reservoir, with a capacity of 70 million m³, is part of the great strategic hydraulic complex Beni Haroun Transfer. It supplies drinking and irrigation water to several towns. Figure 1 shows the location of the Koudiat Medouar reservoir.

Fig. 1. Location of the Koudiat Medouar reservoir.

III. METHODOLOGY
FFA is a statistical method that studies past events and the characteristics of a given process (hydrological or other) in order to define the probability of future occurrence through a probability distribution fitted to the AM or POT series. The POT approach builds a data series by extracting the values above a well-determined threshold. The selection of an appropriate threshold value is an important phase, for which the Mean Residual Life Plot (MRLP) and the stability graphs of the parameters are recommended in [15]. The threshold is chosen where the scale and shape graphs show a certain stability, and at the tail of the MRLP where the mean excess function begins to be almost linear. The POT flood series are fitted with the GPD [16]. Under the GPD model hypothesis, the number of exceedances above the threshold must follow one of two discrete distributions, Poisson or negative binomial.

The main step in the application of the POT method is the choice of the threshold. Selecting a very high threshold decreases the size of the series and thus increases the sampling uncertainty. With a Number of Maximums per Year close to one (NMpY ≈ 1), the series becomes as short as an AM series [17]. On the other hand, a threshold that is too low (NMpY ≥ 6) gives a series artificially enriched with information, but one is then confronted with the problem of obtaining dependent series. Therefore, a trade-off must be found between acquiring information and obtaining independent series (Figure 2). However, there is no universal criterion for identifying such a threshold value [18]. In this paper, we focus on the following criteria for threshold selection:
• MRLPs.
• The stability of the scale and modified shape graphs.
• Fixing the NMpY.
Once the preselection has been performed according to the above criteria, the basic hypotheses of FFA, i.e. homogeneity, independence, and stationarity, must be verified.
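Two of the quantities used in the threshold preselection, the mean excess that forms one ordinate of an MRLP and the NMpY, can be computed as in the following Python sketch. The peak values, the record length, and the function names are hypothetical. The bounds 1 and 6 simply mirror the NMpY limits mentioned in the text and are not a general rule.

```python
def mean_excess(peaks, threshold):
    """Mean of (peak - threshold) over the peaks above the threshold.

    Plotting this value against the threshold gives the mean residual life plot.
    """
    excesses = [q - threshold for q in peaks if q > threshold]
    return sum(excesses) / len(excesses) if excesses else float("nan")


def peaks_per_year(peaks, threshold, n_years):
    """Average number of exceedances per year (NMpY)."""
    return sum(1 for q in peaks if q > threshold) / n_years


def rate_is_acceptable(nmpy, low=1.0, high=6.0):
    """Assumed screening rule: keep thresholds whose NMpY lies between the bounds."""
    return low < nmpy < high


# Hypothetical independent peaks in m3/s observed over 4 years (illustration only).
peaks = [6.0, 8.0, 11.0, 13.0, 14.0, 16.0, 20.0, 28.0, 40.0, 60.0]
N_YEARS = 4

diagnostics = {
    u: (mean_excess(peaks, u), peaks_per_year(peaks, u, N_YEARS))
    for u in (5.0, 12.0, 15.0)
}

for u, (excess, nmpy) in diagnostics.items():
    print(f"u={u:5.1f}  mean excess={excess:6.2f}  NMpY={nmpy:4.2f}  "
          f"acceptable={rate_is_acceptable(nmpy)}")
```

A real analysis would evaluate many more thresholds and inspect the resulting curve visually, together with the parameter stability graphs, before running the hypothesis tests.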
To verify these hypotheses, three non-parametric statistical tests are used: the Wald-Wolfowitz test [19] to verify independence, the Wilcoxon-Mann-Whitney test [20, 21] to verify whether the data come from the same distribution (homogeneity), and the Mann-Kendall test [22, 23] to verify the stationarity of the data.

Fig. 2. Methodology chart.

A. Model Fitting
Several types of distribution are used to estimate flow extremes. The reasons for the operational use of a particular distribution type in many countries are often subjective or historical [24]. Table I lists the probability distributions most commonly used to fit inflows. The choice of the appropriate estimator is one of the most important issues in FFA [25, 28]. The most commonly used estimators are the Method of Moments (MOM), the Method of L-moments (ML), and Maximum Likelihood Estimation (MLE). The ML is more widely used in statistical hydrology for estimating various hydro-meteorological variables [29, 30]. The objective of FFA is to find reliable estimates of the quantiles that support the design of structures. After the selection of the thresholds and the identification of the best fit, the quantiles are evaluated for a return period of 100 years.

TABLE I. USUAL PROBABILITY DISTRIBUTION FUNCTIONS

Distribution | PDF
--- | ---
Normal | $f(x;\mu,\alpha)=\dfrac{1}{\alpha\sqrt{2\pi}}\exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\alpha}\right)^{2}\right]$
Log normal | $f(x;x_0,\mu,\alpha)=\dfrac{1}{(x-x_0)\,\alpha\sqrt{2\pi}}\exp\left[-\dfrac{1}{2}\left(\dfrac{\ln(x-x_0)-\mu}{\alpha}\right)^{2}\right]$
Gumbel | $f(x)=\dfrac{1}{\alpha}\exp\left[-\dfrac{x-x_0}{\alpha}-\exp\left(-\dfrac{x-x_0}{\alpha}\right)\right]$
Pearson 3 | $f(x;\mu,\alpha,\kappa)=\dfrac{1}{\alpha\,\Gamma(\kappa)}\left(\dfrac{x-\mu}{\alpha}\right)^{\kappa-1}\exp\left(-\dfrac{x-\mu}{\alpha}\right)$
Generalized extreme value | $f(x;\mu,\alpha,\kappa)=\dfrac{1}{\alpha}\left[1-\kappa\dfrac{x-\mu}{\alpha}\right]^{\frac{1}{\kappa}-1}\exp\left\{-\left[1-\kappa\dfrac{x-\mu}{\alpha}\right]^{\frac{1}{\kappa}}\right\}$
Generalized Pareto distribution | $f(x;\mu,\alpha,\kappa)=\begin{cases}\dfrac{1}{\alpha}\left[1-\kappa\dfrac{x-\mu}{\alpha}\right]^{\frac{1}{\kappa}-1}, & \kappa\neq 0\\[2mm] \dfrac{1}{\alpha}\exp\left(-\dfrac{x-\mu}{\alpha}\right), & \kappa=0\end{cases}$
Kappa | $f(x;\mu,\alpha,\kappa,h)=\dfrac{1}{\alpha}\left[1-\kappa\dfrac{x-\mu}{\alpha}\right]^{\frac{1}{\kappa}-1}\left[F(x)\right]^{1-h}$, where $F(x;\mu,\alpha,\kappa,h)=\left\{1-h\left[1-\kappa\dfrac{x-\mu}{\alpha}\right]^{\frac{1}{\kappa}}\right\}^{\frac{1}{h}}$

α: scale parameter, μ: location parameter, κ: shape parameter, γ: second scale parameter, h: second shape parameter, Γ( ): gamma function, F( ): cumulative distribution function

B. Risk Analysis
Water-control design requires the consideration of risk [31]. Risk and reliability analyses are of great importance when dealing with natural phenomena such as floods. The risk analysis of future events requires a probabilistic approach [32]. This natural hydrological risk can be calculated using the following equations:

$$p(X \ge x_T \text{ at least once in } N \text{ years}) = 1-\left(1-\frac{1}{T}\right)^{N} \qquad (1)$$

$$R = 1-\left[1-p(X \ge x_T)\right]^{N} \qquad (2)$$

where:

$$p(X \ge x_T) = \frac{1}{T} \qquad (3)$$

$$R = 1-\left(1-\frac{1}{T}\right)^{N} \qquad (4)$$

where N is the expected life of the structure (in our case N = 50 years), T is the return period (usually the life expectancy of a hydraulic structure is 100 years), and $R$ is the probability that an event $X \ge x_T$ will occur at least once in N years. With $R$ expressed as a percentage, the reliability Re is defined by:

$$Re = 100 - R \qquad (5)$$

IV. RESULTS AND DISCUSSION
A. Data Description
The database used in this study was collected from the National Agency for Dams and Transfer (ANBT).
Daily inflows of the Koudiat Medouar reservoir for the period from 2004 to 2019 were calculated according to the following water balance equation:

$$\text{Inflow} = (V_f - V_i) + (DEV + DWS + DRV + DSV + DFV + DLV) \qquad (6)$$

where Inflow is the daily inflow, $V_i$ and $V_f$ represent the initial and final volumes in the reservoir, DEV is the daily evaporated volume, DWS the daily volume allocated to water supply, DRV the daily volume allocated to irrigation, DSV the daily spilled volume, DFV the daily flushed volume, and DLV the daily leakage volume.

Daily inflows are not normally distributed. The box plot in Figure 3 shows two values, 117.608 m³/s and 110.915 m³/s, that appear as outliers but are real observed events. Figure 4 shows the interannual variations of the daily inflows at the Koudiat Medouar reservoir. Even if the stability of the scale and shape parameters is confirmed (Figure 5), it is important to verify the hypothesis tests (homogeneity, independence, and stationarity). For the thresholds between 5 m³/s and 11 m³/s, and also for 14 m³/s, the tests were not satisfied.

Fig. 3. Daily inflows description of Koudiat Medouar reservoir.

Fig. 4. Time series of the daily inflows at Koudiat Medouar reservoir (2004-2019).

Fig. 5. Mean residual life plot for the daily inflows of the Koudiat Medouar reservoir.

This led us to keep only the thresholds of 12, 13, and 15 m³/s (Table II). Using R [33] in RStudio, we tested how well the different distributions fit the series obtained with the previously chosen thresholds. The results show that the GPD is not the best fit for the 12 m³/s and 13 m³/s thresholds (Figure 6).
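The idea of comparing candidate fits by an error measure can be sketched in Python as follows (the study itself used R). This is only an illustration under stated assumptions: parameters are estimated by the method of moments, the RMSE is computed between fitted and empirical non-exceedance probabilities using the Weibull plotting position i/(n+1), and the excess values and function names are hypothetical. None of these choices is confirmed by the paper. The shape parameter below is positive for heavy tails, which may differ in sign from the κ convention of Table I.

```python
import math


def fit_gpd_mom(excesses):
    """Method-of-moments estimates (scale, shape) of a GPD fitted to excesses."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return scale, shape


def gpd_cdf(y, scale, shape):
    """GPD non-exceedance probability of an excess y >= 0."""
    if abs(shape) < 1e-12:          # exponential limit
        return 1.0 - math.exp(-y / scale)
    z = 1.0 + shape * y / scale
    if z <= 0.0:                    # beyond the upper end point (negative shape)
        return 1.0
    return 1.0 - z ** (-1.0 / shape)


def rmse_against_empirical(excesses, cdf):
    """RMSE between a fitted CDF and Weibull plotting positions i/(n+1)."""
    ordered = sorted(excesses)
    n = len(ordered)
    squared = [(cdf(y) - (i + 1) / (n + 1)) ** 2 for i, y in enumerate(ordered)]
    return math.sqrt(sum(squared) / n)


# Hypothetical excesses over a threshold in m3/s (not data from the study).
excesses = [0.5, 1.2, 2.0, 3.1, 4.5, 6.0, 9.0, 14.0, 22.0, 40.0]

scale, shape = fit_gpd_mom(excesses)
mean_of_excesses = sum(excesses) / len(excesses)

rmse_gpd = rmse_against_empirical(excesses, lambda y: gpd_cdf(y, scale, shape))
rmse_exp = rmse_against_empirical(excesses, lambda y: gpd_cdf(y, mean_of_excesses, 0.0))

print(f"GPD: scale={scale:.3f}, shape={shape:.3f}, RMSE={rmse_gpd:.4f}")
print(f"Exponential: scale={mean_of_excesses:.3f}, RMSE={rmse_exp:.4f}")
```

The candidate with the smaller RMSE would be preferred under this criterion. Comparing Pearson 3 or Kappa fits in the same way would need their own estimators and CDFs, which are omitted here.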
The threshold of 15 m³/s just ensures an exceedance number of 2 per year. It is therefore taken as the upper limit of choice, beyond which the statistical situation of the AM series is reached. For thresholds below 12 m³/s, the opposite situation arises: the samples are artificially enriched in size, with a greater chance of failing the hypothesis tests, in particular the homogeneity and the randomness required for any reliable frequency analysis without statistical violation (Table III). The frequency distribution results are shown in Figure 6.

TABLE II. STATISTICAL PARAMETERS CORRESPONDING TO POT TIME SERIES

Statistic | u = 12 m³/s | u = 13 m³/s | u = 15 m³/s
--- | --- | --- | ---
Sample size | 50 | 40 | 31
Min | 12.08 | 13.12 | 15.22
1st quartile | 13.34 | 15.31 | 19.85
Median | 17.06 | 22.41 | 27.67
Mean | 26.81 | 30.40 | 35.20
3rd quartile | 30.27 | 37.26 | 43.42
Max | 117.60 | 117.60 | 117.60
Wald-Wolfowitz p-value | 0.086 | 0.337 | 0.137
Wilcoxon-Mann-Whitney p-value | 0.851 | 0.814 | 0.759
Mann-Kendall p-value | 0.536 | 0.954 | 0.838

TABLE III. ESTIMATED QUANTILES FOR RETURN PERIOD T=100 YEARS

Threshold | Type of fit | Distribution | Quantile (m³/s) | RMSE (m³/s)
--- | --- | --- | --- | ---
u = 12 m³/s | Best fit | Pearson 3 | 159.285 | 0.024
u = 12 m³/s | Default fit | GPD | 211.924 | 0.049
u = 13 m³/s | Best fit | Kappa | 169.318 | 0.026
u = 13 m³/s | Default fit | GPD | 190.416 | 0.037
u = 15 m³/s | Best fit = default fit | GPD | 174.899 | 0.028

The plots indicate that the Pearson 3 (RMSE = 0.024) and Kappa (RMSE = 0.026) distributions are the best fit for u = 12 and 13 m³/s, respectively. Using (4) and (5), the risk and the reliability are calculated for a return period of 100 years, for both the best-fit and the GPD results. For the 12 m³/s threshold, the 100-year quantile of the Pearson 3 distribution is equivalent to a 48-year return period under the GPD.
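As a worked check of equations (4) and (5), the following Python sketch evaluates the risk and the reliability for N = 50 years and for the return periods of 100, 48, and 70 years reported in Table IV. The function names are chosen here for illustration only.

```python
def risk_percent(return_period, n_years):
    """Equation (4): probability (in %) of at least one exceedance in n_years."""
    return 100.0 * (1.0 - (1.0 - 1.0 / return_period) ** n_years)


def reliability_percent(return_period, n_years):
    """Equation (5): reliability (in %) as the complement of the risk."""
    return 100.0 - risk_percent(return_period, n_years)


N_YEARS = 50  # expected life of the structure used in the study

results = {
    T: (risk_percent(T, N_YEARS), reliability_percent(T, N_YEARS))
    for T in (100, 48, 70)
}

# Difference in reliability between T = 100 years and the equivalent GPD return periods.
drop_u12 = results[100][1] - results[48][1]
drop_u13 = results[100][1] - results[70][1]

for T, (risk, reliability) in results.items():
    print(f"T={T:3d} years: risk={risk:.2f}%  reliability={reliability:.2f}%")
print(f"Reliability difference: {drop_u12:.2f} (u=12), {drop_u13:.2f} (u=13) percentage points")
```

Rounded to two decimals, these values agree with the risk and reliability figures listed in Table IV.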
Similarly, for the 13 m³/s threshold, the 100-year quantile of the Kappa distribution is equivalent to a 70-year return period under the GPD fit (Table IV). By selecting the best fit, the reliability of the structure decreases by 25.60 and 11.80 percentage points for the thresholds of 12 m³/s and 13 m³/s, respectively, compared to the default fit. The Pearson 3 quantile of 159.285 m³/s for T = 100 years corresponds to the T = 48 years quantile of the GPD (change in reliability: -25.60 percentage points), and the Kappa quantile of 169.318 m³/s for T = 100 years corresponds to the T = 70 years quantile of the GPD (change in reliability: -11.80 percentage points). When the reliability decreases, the risk increases, which means that the hydraulic structures are under-dimensioned.

Fig. 6. Probability density function of best and default fit for the selected thresholds.

TABLE IV. RISK RELIABILITY

u (m³/s) | Distribution | Return period T (years) | Risk (for N = 50 years) | Reliability
--- | --- | --- | --- | ---
12 | Pearson 3 | 100 | 39.50% | 60.50%
12 | GPD | 48 | 65.10% | 34.90%
13 | Kappa | 100 | 39.50% | 60.50%
13 | GPD | 70 | 51.30% | 48.70%

V. CONCLUSION
This article presents a flood analysis based on the daily inflows of the Koudiat Medouar reservoir for the period from 2004 to 2019. FFA was carried out with the POT series. One of the main difficulties of the POT approach is the selection of the threshold value. In FFA, choosing a very high threshold, with a Number of Maximums per Year close to one (NMpY ≈ 1), decreases the size of the series and therefore increases the sampling uncertainty, which gives a very short series. On the other hand, a threshold that is too low gives a series artificially enriched with information (NMpY ≥ 6). For this reason, we have retained only the series with 2 to 4 maxima per year.
After the verification of the basic hypothesis tests, the thresholds retained were 12, 13, and 15 m³/s. In this study, the GPD is not systematically the best fit for the daily inflows. The study showed that the Pearson 3 and Kappa distributions are better suited than the GPD for thresholds equal to 12 and 13 m³/s, respectively, for the estimation of flood discharge. The risk analysis carried out shows that the choice of the GPD distribution decreases the reliability of the structure compared to the case of choosing the best fit distribution. The results presented in this paper are helpful for the planning and optimization of the dimensions of hydraulic structures.

REFERENCES
[1] N. Harkat, S. Chaouche, and M. Bencherif, "Flood Hazard Spatialization Applied to The City of Batna: A Methodological Approach," Engineering, Technology & Applied Science Research, vol. 10, no. 3, pp. 5748–5758, Jun. 2020, https://doi.org/10.48084/etasr.3429.
[2] K. Loumi and A. Redjem, "Integration of GIS and Hierarchical Multi-Criteria Analysis for Mapping Flood Vulnerability: The Case Study of M’sila, Algeria," Engineering, Technology & Applied Science Research, vol. 11, no. 4, pp. 7381–7385, Aug. 2021, https://doi.org/10.48084/etasr.4266.
[3] S. Mkhandi, A. Opere, and P. Willems, "Comparison between annual maximum and peaks over threshold models for flood frequency prediction," in International Conference of UNESCO Flanders FIT FRIEND/Nile Project - "Towards a better cooperation," Sharm El-Sheikh, Egypt, Nov. 2005.
[4] N. Bezak, M. Brilly, and M. Sraj, "Comparison between the peaks-over-threshold method and the annual maximum method for flood frequency analysis," Hydrological Sciences Journal, vol. 59, no. 5, pp. 959–977, May 2014, https://doi.org/10.1080/02626667.2013.831174.
[5] X. Pan, A. Rahman, K. Haddad, and T. B. M. J. Ouarda, "Peaks-over-threshold model in flood frequency analysis: a scoping review," Stochastic Environmental Research and Risk Assessment, vol. 36, no. 9, pp. 2419–2435, Sep. 2022, https://doi.org/10.1007/s00477-022-02174-6.
[6] V. Bacova-Mitkova and M. Onderka, "Analysis of extreme hydrological events on the Danube using the Peak Over Threshold method," Journal of Hydrology and Hydromechanics, vol. 58, no. 2, pp. 88–101, Jun. 2010, https://doi.org/10.2478/v10098-010-0009-x.
[7] M. Lang, T. B. M. J. Ouarda, and B. Bobee, "Towards operational guidelines for over-threshold modeling," Journal of Hydrology, vol. 225, no. 3, pp. 103–117, Dec. 1999, https://doi.org/10.1016/S0022-1694(99)00167-5.
[8] C. Scarrott and A. MacDonald, "A Review of Extreme Value Threshold Estimation and Uncertainty Quantification," REVSTAT-Statistical Journal, vol. 10, no. 1, pp. 33–60, Apr. 2012, https://doi.org/10.57805/revstat.v10i1.110.
[9] A. Gharib, E. G. R. Davies, G. G. Goss, and M. Faramarzi, "Assessment of the Combined Effects of Threshold Selection and Parameter Estimation of Generalized Pareto Distribution with Applications to Flood Frequency Analysis," Water, vol. 9, no. 9, Sep. 2017, Art. no. 692, https://doi.org/10.3390/w9090692.
[10] B. Bader, J. Yan, and X. Zhang, "Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate," The Annals of Applied Statistics, vol. 12, no. 1, pp. 310–329, Mar. 2018, https://doi.org/10.1214/17-AOAS1092.
[11] A. H. Syafrina, A. Norzaida, and O. N. Shazwani, "Stochastic Modeling of Rainfall Series in Kelantan Using an Advanced Weather Generator," Engineering, Technology & Applied Science Research, vol. 8, no. 1, pp. 2537–2541, Feb. 2018, https://doi.org/10.48084/etasr.1709.
[12] J. Pickands, "Statistical Inference Using Extreme Order Statistics," The Annals of Statistics, vol. 3, no. 1, pp. 119–131, 1975.
[13] C. Cunnane, "A note on the Poisson assumption in partial duration series models," Water Resources Research, vol. 15, no. 2, pp. 489–494, 1979, https://doi.org/10.1029/WR015i002p00489.
[14] B. Onoz and M. Bayazit, "Effect of the occurrence process of the peaks over threshold on the flood estimates," Journal of Hydrology, vol. 244, no. 1, pp. 86–96, Apr. 2001, https://doi.org/10.1016/S0022-1694(01)00330-4.
[15] S. Coles, An introduction to statistical modeling of extreme values. New York, NY, USA: Springer, 2001.
[16] S. D. Grimshaw, "Computing Maximum Likelihood Estimates for the Generalized Pareto Distribution," Technometrics, vol. 35, no. 2, pp. 185–191, May 1993, https://doi.org/10.1080/00401706.1993.10485040.
[17] P. Dubreuil, Initiation a l’analyse hydrologique: dix exercices suivis des corriges. Paris, France: Masson & Cie, 1974.
[18] S. Lachance-Cloutier, "Series de durees partielles : application en contexte non homogene et non stationnaire," M.S. thesis, Universite du Quebec, Quebec, Canada, 2011.
[19] A. Wald and J. Wolfowitz, "An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation," The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 378–388, 1943.
[20] F. Wilcoxon, "Individual Comparisons by Ranking Methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, Dec. 1945, https://doi.org/10.2307/3001968.
[21] H. B. Mann and D. R. Whitney, "On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other," The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947.
[22] H. B. Mann, "Nonparametric Tests Against Trend," Econometrica, vol. 13, no. 3, pp. 245–259, 1945, https://doi.org/10.2307/1907187.
[23] M. G. Kendall, Rank Correlation Methods. Oxford, UK: Oxford University Press, 1975.
[24] C. Cunnane, Statistical distributions for flood frequency analysis. Geneva, Switzerland: World Meteorological Organization, 1989.
[25] R. W. Vogel and D. E. McMartin, "Probability Plot Goodness-of-Fit and Skewness Estimation Procedures for the Pearson Type 3 Distribution," Water Resources Research, vol. 27, no. 12, pp. 3149–3158, 1991, https://doi.org/10.1029/91WR02116.
[26] B. Merz and A. H. Thieken, "Flood risk curves and uncertainty bounds," Natural Hazards, vol. 51, no. 3, pp. 437–458, Dec. 2009, https://doi.org/10.1007/s11069-009-9452-6.
[27] F. Laio, G. Di Baldassarre, and A. Montanari, "Model selection techniques for the frequency analysis of hydrological extremes," Water Resources Research, vol. 45, 2009, Art. no. 07416, https://doi.org/10.1029/2007WR006666.
[28] K. Haddad and A. Rahman, "Selection of the best fit flood frequency distribution and parameter estimation procedure: a case study for Tasmania in Australia," Stochastic Environmental Research and Risk Assessment, vol. 25, no. 3, pp. 415–428, Mar. 2011, https://doi.org/10.1007/s00477-010-0412-1.
[29] T. S. Gubareva and B. I. Gartsman, "Estimating distribution parameters of extreme hydrometeorological characteristics by L-moments method," Water Resources, vol. 37, no. 4, pp. 437–445, Jul. 2010, https://doi.org/10.1134/S0097807810040020.
[30] J. R. M. Hosking, "L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics," Journal of the Royal Statistical Society: Series B (Methodological), vol. 52, no. 1, pp. 105–124, 1990, https://doi.org/10.1111/j.2517-6161.1990.tb01775.x.
[31] V. T. Chow, Applied Hydrology. New York, NY, USA: McGraw-Hill, 2010.
[32] B. Bobee and Ashkar, Gamma Family and Derived Distributions Applied in Hydrology. Littleton, CO, USA: Water Resources Pubns, 1991.
[33] R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2018.