Vol50,3,2007 329 ANNALS OF GEOPHYSICS, VOL. 50, N. 3, June 2007 Key words b-value – seismology – least squares technique – syntheticearthquake catalogs 1. Introduction The Gutenberg-Richter Law (from now on GR Law) (Gutenberg and Richter, 1954) is cer- tainly one of the most remarkable and ubiqui- tous features of seismicity worldwide. In the most common form it reads . (1.1)( )Log N M a bM= −6 @ In the so-called binned form of the GR Law, N represents the number of events with magnitude M, while in the cumulative form N is the num- ber of events with magnitude larger or equal to M. The constants a and b are the coefficients of the linear relationship. As discussed in a previous paper (Marzoc- chi and Sandri, 2003, from now on MS03), the scientific relevance of the GR Law is linked to its apparent ubiquity, and to the theoretical im- plications and meaning of its possible univer- sality. In particular, the value of b, representing the opposite of the slope of the linear relation- ship in eq. (1.1), is considered very important. In fact, in spite of a «first order» validity of the GR Law with a constant b-value ≈ 1 observed in a variety of tectonic settings, significant spatial and temporal variations in the b-value are found (e.g., Riedel et al., 2003; Del Pezzo et al., 2004; Legrand et al., 2004; Mandal et al., 2004; Ratchkovski et al., 2004; Wyss et al., 2004; Mur- A technical note on the bias in the estimation of the b-value and its uncertainty through the Least Squares technique Laura Sandri and Warner Marzocchi Istituto Nazionale di Geofisica e Vulcanologia, Sezione Bologna, Italy Abstract We investigate conceptually, analytically, and numerically the biases in the estimation of the b-value of the Gutenberg-Richter Law and of its uncertainty made through the least squares technique. The biases are intro- duced by the cumulation operation for the cumulative form of the Gutenberg-Richter Law, by the logarithmic transformation, and by the measurement errors on the magnitude. We find that the least squares technique, ap- plied to the cumulative and binned form of the Gutenberg-Richter Law, produces strong bias in the b-value and its uncertainty, whose amplitudes depend on the size of the sample. Furthermore, the logarithmic transformation produces two different endemic bends in the Log(N) versus M curve. This means that this plot might produce fake significant departures from the Gutenberg-Richter Law. The effect of the measurement errors is negligible compared to those of cumulation operation and logarithmic transformation. The results obtained show that the least squares technique should never be used to determine the slope of the Gutenberg-Richter Law and its un- certainty. Mailing address: Dr. Laura Sandri, Istituto Nazionale di Geofisica e Vulcanologia, Sezione Bologna, Via Donato Creti 12, 40128 Bologna, Italy; e-mail: sandri@bo.ingv.it 330 Laura Sandri and Warner Marzocchi ru et al., 2005; Schorlemmer et al., 2005). These possible variations are very important from a theoretical point of view, and for seismic hazard studies. We refer to our previous paper (MS03) for a deeper discussion on these issues. As a matter of fact, a crucial aspect in infer- ring variations or constancy of the b-value is represented by the method used to determine the b-value and its uncertainty, given a seismic catalog, as shown in MS03. In that paper, we discussed the maximum likelihood method to estimate the b-value and its uncertainty, show- ing that incorrect formulae based on this method produce a bias in the b-value and an un- derestimation of its uncertainty. The latter im- plies that we might observe a fake variation in the b-value only because we have underestimat- ed the uncertainty. Intentionally, in MS03 we did not consider the least squares (from now on, LS) method, because it had already been recog- nized (Page, 1968; Bender, 1983) that this tech- nique applied to the problem of the estimation of the b-value and of its uncertainty does not have any statistical foundation. However, the LS technique is still widely employed to esti- mate these two quantities, both in the binned and in the cumulative form, also in recent liter- ature (e.g., Working Group on California Earth- quake Probabilities, 2003; Gruppo di Lavoro, 2004; Lopez Pineda and Rebollar, 2005). The main purpose of this paper is to demon- strate, by means of conceptual issues, analitical formulations and numerical simulations, that the use of the LS technique in the estimation of the b-value and of its uncertainty leads to strongly biased estimates of these two quanti- ties. For the correct estimation of these two quantities, we refer to the appropriate formulae given in MS03 and in references therein. 2. Estimation of b and σbtt through the LS technique The LS estimation consists of applying lin- ear regression analysis to the GR Law de- scribed by eq. (1.1) in the binned or cumulative form. The parameter b should be the same in the two formulations, while the intercept a is different in the two cases. 2.1. Cumulative form of the GR Law Despite its widespread adoption in past and recent papers (Pacheco and Sykes, 1992; Pa- checo et al., 1992; Karnik and Klima, 1993; Okal and Kirby, 1995; Scholz, 1997; Triep and Sykes, 1997; Main, 2000; Working Group on California Earthquake Probabilities, 2003; Grup- po di Lavoro, 2004; Lopez Pineda and Rebollar, 2005), the use of LS technique to estimate the b- value on the cumulative form of the GR Law does not have any statistical motivation. The strongest assumption of regression analysis, i.e., the independence of the observations, is clearly violated. In practice, the cumulative form is an integration and, therefore, it represents a filter for the high frequency noise. As a result, the uncer- tainty of the slope of the GR Law is certainly strongly underestimated. In order to evaluate the bias in the regression analysis applied to the cu- mulative form numerically, as a function of the number of earthquakes contained in the catalog, we simulated 1000 seismic catalogs, for different catalog sizes. The magnitudes Mi are obtained by binning, with a fixed bin width of 0.1, a continu- ous random variable distributed with a probabil- ity density function (pdf) given by equation (2.1) which is valid if the catalog contains earth- quakes with a completeness magnitude equal to Mmin and with a magnitude range of at least three units (see MS03). In this way, Mi is the magnitude attached to all the synthetic seismic events with real continuous magnitude in the range Mi − 0.05 ≤ M< Mi + 0.05. Then, for each synthetic catalog, we esti- mated the b-value (from now on, using the no- tation bt for the estimated value, and b for the theoretical value) by means of the LS technique applied to the cumulative GR Law described by eq. (1.1), i.e. when N is the number of events with magnitude larger or equal to Mi. Figure 1 reports the averages of bt calculated in 1000 synthetic catalogs as a function of the number of data, for the cases b = 1 and b = 2. The 95% confidence interval is attached to each av- erage. There is a clear negative bias, that de- creases with the number of data and ranges ( ) ( )lnf M b 10 10 ( )b M Mmin= − − 331 A technical note on the bias in the estimation of the b-value and its uncertainty through the Least Squares technique from 5% to 2%. An explanation of this bias will be given in the following when we describe the effects of the logarithmic transformation in the regression analysis of the binned magnitudes. As mentioned above, another crucial aspect concerns the estimation of the uncertainty at- tached to bt , here indicated by σt bt and estimated as the average error on bt provided by the 1000 Fig. 2. F test values (see eq. (2.2)) for the case of figs. 1 and 4 (see text). The plus signs represent the cumu- lative regression, and the squares the binned regression. The dotted line represents the critical value to reject the null hypothesis at a significance level of 0.05. Note the logarithmic scale also on the vertical axis. Fig. 1. Average of bt (dashed line) calculated through cumulative LS technique in 1000 synthetic catalogs, as a function of the catalog size, for the cases b=1 (left) and b=2 (right). At each average is attached the 95% con- fidence interval. The solid line represents the true b-value. 332 Laura Sandri and Warner Marzocchi linear regressions. In particular, we are interest- ed in evaluating if the estimation of σt bt 2 is an un- biased estimator of the true variance of the b es- timation of bt around its central value. This can be done by simulating 1000 seismic catalogs as described above. For each catalog we calculat- ed bt and σt bt . Then, we compared the dispersion of bt around its average with the average of σt bt , through the Fisher test (e.g., Kalbfleisch, 1985) (2.2) The null hypothesis we tested is that the aver- age of σt bt 2 is equal to the variance of bt . The results, reported in fig. 2, are very inter- esting. In particular, the Fisher test shows that σt bt strongly underestimates the dispersion of b, the former being at least one order of magnitude smaller than the latter. This confirms that the un- certainty on the b-value estimated through the cu- mulative LS method is strongly underestimated. In general, the potential factors which can bias the estimations made through the LS tech- nique (both in the cumulative and binned form) are the logarithmic transformation of the number of events, and the presence of the measurement errors on the magnitude. By taking into account the criticism at the regression analysis applied to the cumulative form just reported, in the follow- ing we studied the impact of these factors only on the estimation of the b-value made by means of the regression analysis applied to the binned form. 2.2. Effects of the logarithmic transformation The expected number of events for a magni- tude M is (2.3) where n is the total number of earthquakes in the catalog, ∆M is the bin width used, and f(M) is given by eq. (2.1). The expected fluctuations around this value can be written as ( )np n f M dM /2 / i i M M M M 2 i i ν = = ∆ ∆+ − # .= Average of the square of the uncertainty Variance of the estimation of the value F b − . (2.4) In this frame, the observed number of events Ni for a binned magnitude Mi can be written as Ni = = νi(1 + ξi), where ξi is a random variable with zero mean and standard deviation equal to ∆Ni/νi. By taking into account eqs.((2.3) and (2.4)), we can write . (2.5) It is easy to demonstrate that ∆Ni/νi is a strictly monotonic decreasing function of pi. This means that the expected fluctuations of the variable ξ are larger for lower pi, that is for large magni- tudes. The estimation of b in the classical linear re- gression analysis is (see e.g., Draper and Smith, 1981) . (2.6) If we substitute Ni = νi(1 + ξi) we obtain (2.7) If we apply the expectation operator to both sides of eq. (2.7), it is possible to verify that the bias is zero only if E[Log(1 + ξi)] = 0. This is certainly not true because E[Log(1 + ξi)] 1, the term inside the round brackets cannot be neglected ( )MiO ( ) ( ) ( )N M N M P 10 10 10 2i i a bM b M b Mi= + + −ε ∆ ∆− −O Fig. 4. Average of bt (dashed line) calculated through binned LS technique in 1000 synthetic catalogs, as a func- tion of the catalog size, for the cases b=1 (left) and b=2 (right). At each average is attached the 95% confidence interval. The solid line represents the true b-value. Fig. 5. Effect of the measurement errors on the number of earthquakes with binned magnitude Mi. Adding the measurement errors, the number of events that come into the bin with central magnitude Mi from the adjacent left bin is mIL, while those coming from the adjacent right bin is mIR. The number of events that go out of the bin with central magnitude Mi towards the adjacent left bin is mOL, while those going out towards the adjacent right bin is mOR. 336 Laura Sandri and Warner Marzocchi and N(Mi) ≠N . This obviously introduces a further bias in the b-value estimation. Figure 6 reports the effect of the added measurement errors. As argued before, the ( )MiO largest difference with the case without meas- urement errors reported in fig. 4 is for b = 2, where the term of eq. (2.13) inside the round brackets is significantly different from zero. Fig. 6. Averages of bt (dashed lines) calculated through binned LS technique in 1000 synthetic catalogs as a function of the catalog size, for the cases b=1 (top) and b=2 (bottom). The magnitudes have measurement errors of standard deviation σε=0.1, 0.3, 0.5 (from left to right). At each average is attached the 95% confidence inter- val. The solid line represents the true b-value. 337 A technical note on the bias in the estimation of the b-value and its uncertainty through the Least Squares technique Since Pε depends on the amplitude of the measurement error (σε), eq. (2.13) explains al- so the proportionality of the bias with σε not- ed in fig. 6. As regards the estimation of the uncertainty of the b-value, the results reported in fig. 7 show that the addition of the meas- urement errors does not produce any further bias. Fig. 7. F test values (see eq. (2.2)) for the case of fig. 6 (see text). The dotted line represents the critical value to reject the null hypothesis at a significance level of 0.05. 338 Laura Sandri and Warner Marzocchi 3. Final remarks We have studied conceptually, analytically, and numerically the biases introduced by some factors (the cumulation operation, the logarith- mic transformation, and the measurement er- rors on the magnitudes) on the estimation of the b-value and of its uncertainty by means of the least squares technique. We have found many problems. First, besides violating a basic assumption of the regression analysis, the use of the cumula- tive form of the Gutenberg-Richter Law leads to a very strong underestimation of the uncertainty of the b-value, due to the filtering effect of the cumulation operation. Secondly, the logarithmic transformation of the discrete random number of events produces a significant bias in the estimation of the b-value both for the cumulative and binned form of the Gutenberg-Richter Law. Moreover, the bias strongly depends on the size of the data set; this means that two samples of different sizes coming from the same statistical distribution will have a significant difference in the b-value estimated. The same logarithmic transformation produces two en- demic bends of different signs in the Log(N) ver- sus M plot. These bends are not linked to the phys- ical process. This means that some departures from a straight line in the plot Log(N) versus M do not invalidate the Gutenberg-Richter Law. Finally, the influence of the measurement er- rors appears to be less important than the bias in- troduced by the logarithmic transformation. The main purpose of this paper was to show that the estimation of the b-value by means of the least squares method is strongly biased and its un- certainty results heavily underestimated. In this view, this work intended to make clear why this method should not be used to determine the slope of the Gutenberg-Richter Law, and to make infer- ences about its constancy or variations. In practice, any spatial or temporal variation of the b-value ob- tained by this method has to be regarded with a strong skepticism. On the other hand, unbiased es- timates of the b-value and of its uncertainty can be obtained by using the correct formulae, based on the maximum likelihood method, that have been analitically and numerically tested in Marzocchi and Sandri (2003) (see also references therein). REFERENCES BENDER, B. (1983): Maximum likelihood estimation of b- values for magnitude grouped data, Bull. Seismol. Soc. Am., 73, 831-851. DEL PEZZO, E., F. BIANCO, S. PETROSINO and G. SACCOROTTI (2004): Changes in the coda decay rate and shear-wave splitting parameters associated with seismic swarms at Mt. Vesuvius, Italy, Bull. Seismol. Soc. Am., 94 (2), 439- 452. DRAPER, N.R. and H. SMITH (1981): Applied Regression Analysis (John Wiley & Sons Ltd., New York), 2nd edi- tion, pp. 305. GROUP ON CALIFORNIA EARTHQUAKE PROBABILITIES (2003): Earthquake probabilities in the San Francisco Bay region: 2002-2031, U.S. Geol. Surv. Open File Rep. 03-214. GRUPPO DI LAVORO (2004): Redazione della mappa di peri- colositá sismica prevista dall’Ordinanza PCM 3274 del 20 marzo 2003, Rapporto Conclusivo per il Diparti- mento della Protezione Civile (INGV, Milano-Roma, aprile 2004), pp. 65. GUTENBERG, B. and C. RICHTER (1954): Seismicity of the Earth and Associated Phenomena (Princeton Universi- ty Press, Princeton, New Jersey), 2nd edition, pp. 310. KALBFLEISCH, J.D. (1985): Probability and Statistical Infer- ence (Springer-Verlag, New York), 2nd edition, pp. 364. KARNIK, V. and K. KLIMA (1993): Magnitude-frequency distribution in the European-Mediterranean earthquake regions, Tectonophysics, 220, 309-323. LEGRAND, D., D. VILLAGOMEZ, H. YEPES and A. CALAHORRA- NO (2004): Multifractal dimension and b-value analysis of the 1998-1999 Quito swarm related to Guagua Pichin- cha Volcano activity, Ecuador J. Geophys. Res., 109 (B1), B01307, doi: 10.1029/2003JB002572. LOPEZ PINEDA, L. and C.J. REBOLLAR (2005): Source char- acteristics of the Mw 6.2 Loreto earthquake of 12 March 2003 that occurred in a transform fault in the middle of the Gulf of California, Mexico, Bull. Seismol. Soc. Am., 95 (2), 419-430. MAIN, I. (2000): Apparent breaks in scaling in the earth- quake cumulative frequency-magnitude distribution: fact or artifact?, Bull. Seismol. Soc. Am., 90, 86-97. MANDAL, P., B.K. RASTOGI, H.V.S. SATYANARAYANA and M. KOUSALYA (2004): Results from local earthquake ve- locity tomography; implications toward the source process involved in generating the 2001 earthquake in the lower crust beneath Kachchh (India), Bull. Seismol. Soc. Am., 94 (2), 633-649. MARZOCCHI, W. and L. SANDRI (2003): A review and new insights on the estimation of the b-value and its uncer- tainty, Ann. Geophysics, 46 (6), 1271-1282. MURRU, M., C. MONTUORI, R. CONSOLE and A. LISI (2005): Mapping of the b-value anomalies beneath Mt. Etna, Italy, during JulyAugust 2001 lateral eruption, Geophys. Res. Lett., 32 (5), L05309, doi: 10.1029/ 2004GL021545. OKAL, E.A. and S.H. KIRBY (1995): Frequency-moment distribution of deep earthquake; implications for the seismogenic zone at the bottom of Slabs, Phys. Earth Planet. Inter., 92, 169-187. PACHECO, J.F. and L.R. SYKES (1992): Seismic moment cat- alog of large shallow earthquakes, 1900 to 1989, Bull. Seismol. Soc. Am., 82, 1306-1349. PACHECO, J.F., C.H. SCHOLZ and L.R. SYKES (1992): 339 A technical note on the bias in the estimation of the b-value and its uncertainty through the Least Squares technique Changes in frequencysize relationship from small to large earthquakes, Nature, 355, 71-73. PAGE, R. (1968): Aftershocks and microaftershocks of the Great Alaska earthquake of 1964, Bull. Seismol. Soc. Am., 58, 1131-1168. RATCHKOVSKI, N.A., S. WIEMER and R.A. HANSEN (2004): Seismotectonics of the Central Denali Fault, Alaska, and the 2002 Denali Fault earthquake sequence, Bull. Seismol. Soc. Am., 94 (6), Part B, 156-174. RIEDEL, C., T. PETERSEN, F. THEILEN and S. NEBEN (2003): High b-values in the leaky segment of the Tjornes frac- ture zone north of Iceland; are they evidence for shal- low magmatic heat sources?, J. Volcanol. Geotherm. Res., 128 (1-3), 15-29. SCHOLZ, C.H. (1997): Size distributions for large and small earthquakes, Bull. Seismol. Soc. Am., 87, 1074-1077. SCHORLEMMER, D., S. WIEMER and M. WYSS (2005): Varia- tions in earthquake-size distribution across different stress regimes, Nature, 437, doi: 10.1038/nature04094. TRIEP, E.G. and L.R. SYKES (1997): Frequency of occur- rence of moderate to great earthquakes in intracontine- tal regions: implications for changes in stress, earth- quake prediction, and hazard assessment, J. Geophys. Res., 102, 9923-9948. WYSS, M., C.G. SAMMIS, R.M. NAUDEAU and S. WIEMER (2004): Fractal dimension and b-value on creeping and locked patches of the San Andreas Fault near Parkfield, California, Bull. Seismol. Soc. Am., 94 (2), 410-421. (received December 27, 2006; accepted March 21, 2007)