Earthquake forecasting: statistics and information ANNALS OF GEOPHYSICS, 58, 6, 2015, S0658; doi:10.4401/ag-6816 S0658 Earthquake forecasting: statistics and information Vladimir Gertsik1,*, Mark Kelbert1,2,3, Anatoly Krichevets4 1 Institute of Earthquake Prediction Theory and Mathematical Geophysics RAS, Moscow, Russia 2 Swansea University, Department of Mathematics, Swansea, United Kingdom 3 Higher School of Economics, Moscow, Russia 4 Lomonosov MSU, Department of Psychology, Moscow, Russia ABSTRACT The paper presents a decision rule forming a mathematical basis of earth- quake forecasting problem. We develop an axiomatic approach to earth- quake forecasting in terms of multicomponent random fields on a lattice. This approach provides a method for constructing point estimates and confidence intervals for conditional probabilities of strong earthquakes under conditions on the levels of precursors. Also, it provides an approach for setting a multilevel alarm system and hypothesis testing for binary alarms. We use a method of comparison for different algorithms of earth- quake forecasts in terms of the increase of Shannon information. ‘Fore- casting’ (the calculation of the probabilities) and ‘prediction’ (the alarm declaring) of earthquakes are equivalent in this approach. 1. Introduction The methodology of selecting and processing of relevant information about the future occurrence of potentially damaging earthquakes has reached a rea- sonable level of maturity over the recent years. How- ever, the problem as a whole still lacks a comprehensive and generally accepted solution. Further efforts for op- timization of the methodology of forecasting would be productive and well justified. One of our objectives here is to address some drawbacks of the approach to earthquake prediction based on pattern recognition theory. It is developed by “Keilis-Borok team” and presented in Keilis-Borok and Soloviev [2003]. The pattern recognition is used in fore- cast with the hope that ‘training sets’ in which strong earthquakes have higher values of precursors may help to decide whether or not a strong earthquake will hap- pen in the future. As it is demonstrated by below pub- lished Krichevets’s theorem, this hope has no basis. A comprehensive review of the modern earth- quake forecasting state of knowledge and guidelines for utilization can be found in Jordan et al. [2011]. Note that all methods of evaluating the probabilities of earthquakes are based on a combination of geophysi- cal, geological and probabilistic models and consider- ations. Even the best and very detailed models used in practice are in fact only ‘caricatures’ of immensely complicated real processes. (For example, the equations of the theory of elasticity are valid only for infinitesi- mal strains, but used even at the edges of the cracks where strains tend to infinity. It is also clear that we do not know about the inhomogeneity of elastic proper- ties in the real medium, and we have no way of know- ing about it with sufficient accuracy.) A mathematical toolkit for earthquake forecasting is well presented by Harte and Vere-Jones [2005]. This work is based on the modeling of earthquake sequences in terms of the marked point processes. However, the mathematical technique used is quite sophisticated and does not provide direct practical tools to investigate the relations of the structure of temporal-spatial random fields of precursors to the appearance of strong earth- quakes. The use of the multicomponent lattice models (in- stead of marked point processes) gives a different novel way of investigating these relations by a more elemen- tary way. Discretization of space and time allows us to separate the problem in question into two separate tasks. The first task is the selection of relevant precur- sors, i.e., observable and theoretically explained physi- cal and geological facts which are causally related to a high probability of strong earthquakes. Particularly, this task involves the development of models of seismic events and computing probabilities of strong earth- quakes in the framework of these models. Such proba- bilities are used as precursors in the second task. The second task is the development of methodol- Article history Received June 25, 2015; accepted December 21, 2015. Subject classification: Earthquake forecasting, Lattice, Random process, Information. ogy of working with these precursors in order to extract the maximum information about the probabilities of strong earthquakes. This is the main topic of this paper. Our approach allows us to obtain the following results: - Estimates of probabilities of strong earthquakes for given values of precursors are calculated in terms of the frequencies of historic data. - Confidence intervals are also constructed to pro- vide reasonable bounds of precision for point estimates. - Methods of predictions (i.e., binary alarm an- nouncement [Keilis-Borok and Kossobokov 1990, Keilis- Borok 1996, Holliday et al. 2007]) and forecasting (i.e., calculating probabilities of earthquakes [Kagan and Jackson 2000, Harte and Vere-Jones 2005, WGNCEP 2007, Jordan et al. 2011]) are equivalent in the following sense: the setting of some threshold for the probability of earthquakes allows to update the alarm level. On the other hand, the knowledge of the alarm domain based on historical data allows us to evaluate the probabilities of earthquakes. In a sense, the prediction is equivalent to hypothesis testing as well (see Section 11). - In our scheme we propose a scalar statistic which is the ratio of the actual increment of information to the maximal possible increment of information. This statistic allows us to linearly order all possible forecast- ing algorithms. Nowadays the final judgement about the quality of earthquake forecasting algorithms is left to experts. This arrangement puts the problem outside the scope of natural sciences which are trying to avoid subjective judgements. The foundation of our proposed scheme is the as- sumption that the seismic process is random and can- not be described by a purely deterministic model. Indeed, if the seismic process is deterministic then the inaccu- racy of the forecast could be explained by the non-com- pleteness of our knowledge about the seismic events and non-precision of the available information. This may explain, at least in principle, attacks from the au- thorities addressed to geophysicists who failed to pre- dict a damaging earthquake. However, these attacks have no grounds if one accepts that the seismic process is random. At the end of the last century (February- April 1999) a group of leading seismologists organized a debate via the web to form a collective opinion of the scientific community on the topic: ‘Is the reliable pre- diction of individual earthquakes a realistic scientific goal?’ (see http://www.nature.com/nature/debates/ earthquake/). Despite a considerable divergence in peripheral is- sues all experts taking part in the debate agreed on the following main principles: - the deterministic prediction of an individual earthquake, within sufficiently narrow limits to allow a planned evacuation programme, is an unrealistic goal; - forecasting of at least some forms of time-depen- dent seismic hazard can be justified on both physical and observational grounds. The following facts form the basis of our agreement with this point of view. The string- block Burridge-Knopov model, generally accepted as a mathematic tool to demonstrate the power-like Guten- berg-Richter relationship between the magnitude and the number of earthquakes, involves the generators of chaotic behaviour or dynamic stochasticity. In fact, the nonlinearity makes the seismic processes stochastic: a small change in the shift force may lead to completely different consequences. If the force is below the thresh- old of static friction the block is immovable, if the force exceeds this threshold it starts moving, producing an avalanche of unpredictable size. This mechanism is widespread in the Earth. Suppose that the front propagation of the earthquake approaches a region of enhanced strength of the rocks. The earth- quake magnitude depends on whether this region will be destroyed or remains intact. In the first case the front moves further on, in the second case the earthquake re- mains localized. So, if the strength of the rocks is below the threshold the first scenario prevails, if it is above the threshold the second scenario is adapted. The whole situation is usually labelled as a butterfly effect: infini- tesimally small changes of strength and stress lead to macroscopic consequences which cannot be predicted because this infinitesimal change is below any precision of the measurement. For these reasons determinism of seismic processes looks more doubtful than stochasticity. The only comment we would like to contribute to this discussion is that the forecasting algorithms based exclusively on the empirical data without consistent physical models could hardly be effective in practice (see Section 13 for more details). Finally, note that our approach may be well-applic- able for the space-time forecasting of different extremal events outside the scope of earthquake prediction. 2. Critical analisis of the prediction based on pattern recognition The main objective of this paper is to present a mathematical formalism of the earthquake forecasting alternative to the approach based on the pattern recog- nition theory. So, it is appropriate to summarize the lat- ter approach. An advantage of the formalism of the Keilis-Borok team [Keilis-Borok and Kossobokov 1990, Keilis-Borok 1996, Keilis-Borok and Solovoev 2003] is a separation of the forecast into two separate tasks: the selection of relevant precursors and the technique of their process- GERTSIK ET AL. 2 3 ing. On the other hand, some drawbacks of the scheme should be pointed out. 2.1. Pattern recognition The method is relaying on the pattern recognition technique that can provide only empirical but not the- oretical foundation. The possibility to base a reliable forcast on the idea of supervised pattern recognition is highly questionable in principle. Indeed, the features se- lection based on the observations only and not related to the physics of earthquakes is not helpful and any hopes for valid ‘training sets’ are not grounded. A suc- cessful supervised recognition is possible only if the fea- tures has proved causal relation with pattern. This principle is illustrated by a simple but important theo- rem by A.N. Krichevets. Theorem. Let A be a finite set, B1 ∪ B2 ⊂ A; B1 ∩ B2= Ø. We say that B1 and B2 are finite educational samples. Let X ∈ A; X ∉ B1∪B2 be a new object. Then among all classifi- cations, i.e., subsets {A1, A2} such that B1 ⊂ A1, B2 ⊂ A2, A1∪ A2= A, A1∩ A2 = Ø satisfying condition that either B1 ∪ X ⊂ A1 or B2 ∪ X ⊂ A2 exactly a half classifies X as an ob- ject of sample B1 and a half classifies X as an object from B2. Proof. It is easy to define a one-to-one correspon- dence between classifications. Indeed, if {A1, A2} A1∪ A2= A; is a classification such that B1 ⊂ A1, X ∈ A1, B2 ⊂ A2, one maps it into the unique classification {A'1, A'2} such that B1⊂A'1, X∈A'2, B2⊂ A'2, where A'1= A1\X, A'2= A2∪X. Corollary. A supervised pattern recognition is impos- sible. After the training procedure the probability to classify correctly a new object is the same as before training. This result implies that a clairvoyance is involved in a successful forecast: one should know in advance (or guess correctly) a “relevant” set of precursors. 2.2. Voting procedure As a decision rule for a forecast a voting procedure is adopted: an alarm is announced when the number of precursors taking abnormally large values exceeds certain thershold. This procedure is well-justified only if the pattern in question is unique. Generically, this is not true: if two or more possible scenarios are present, different precursors may be strongly affected in differ- ent scenarious. In this case a combination of abnormal values for precursors related to different scenarios may produce a false alarm. 2.3. Alarm domain The values of precursors are computed at the point of some lattice, and are assigned a domain sur- rounding the point. The above scheme employs a circle with a center at a lattice site and a radius proportional to the minimal magnitude of the forecasted earth- quake. However, this choice leads to a contradiction. Indeed, suppose we announce an alarm for earth- quakes with minimal magnitude 6 in a domain A6. Note that outside the alarm domain we expected that the earthquake will not occur. But the announcement of the alarm with minimal magnitude 7 is also correct for this point and the alarm should be announced in the domain A7 as well. By definition A6 ⊂ A7 and we expect an earthquake with a magnitude of at least 7 and do not expect an earthquake with à magnitude of at least 6 in the domain A7\ A6. This is absurd. Futhermore, the alarm area is too large with this approach. 2.4. Molchan’s error diagram Below we demonstrate that the procedure of alarm announcement is equivalent to the classical prob- lem of hypothesis testing. So, the standard error dia- gram provides the dependence of the second type error (probability of missing a target) from the first type error (probability of false alarm). In line with the general case the well-known Molchan’s error diagram [Molchan 1990] displays the dependence of the first kind error probability versus the share of alarms. H0: the distribution of the precursor is equal to its unconditional distribution (earthquake “could either happen or not happen”). H1: the distribution of the precursor is equal to its conditional distribution under the condition that the earthquake will happen. Note that the rejection of hypothesis H0 leads to absurd results: “earthquake will not happen and will happen”. Of course, Molchan’s error diagram can be inter- preted not in terms of hypothesis testing. Then there is no contradiction, but the standard error diagram is dif- ferent from the Molchan’s error diagram by a small value proportional to the share of cells occupied by the strong earthquakes (in our approach the spacetime is divided into disjoint cells). 2.5. The loss function, risk function and the economic criterion for the quality of the forecast The Keilis-Borok team refers to the economic loss function for the quantitative evaluation of the quality of forecast. First, we calculate the expected loss cor- rectly. A natural economic measure for a quality of bi- nary forecast is the economic risk function (it should not be confused with seismic risk, there is the concept of the theory of statistical decisions) or damage r re- lated to the earthquakes and the necessary protective measures. In mathematical statistics the risk function is EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION defined as the expectation of the loss function, which is an expected losses for different outcomes. In our case there are two types of losses: damage and expenses re- lated to protection. For each cell of our grid (in our ap- proach the space-time is divided into disjoint cells) the risk may be specied by the formula r = bPr {no earthquake; alarm} + aPr {earthquake; no alarm} + cPr {earthquake; alarm} here the symbol Pr {·} denotes a probability, a stands for the average damage from a seismic event, b stands for the average expenses for protection after a seismic alarm is announced, c stands for the average damage after the alarm, c = a + b − d, where d is the damage prevented by the alarm. The coefficient in front of Pr {no earthquake; no alarm}, obviously, equals 0, be- cause in the absence both of a seismic event and an alarm there is no loss of any kind. Clearly, only the case when d > b is economically justified, i.e., the gain from the prevention measures is positive. Obviously, d should be less than a+b, i.e., an earthquake cannot be prof- itable. Taking into account that a, b and c depend on the geographical position of the cell, we write the total risk as the summation over all cells in the region of a given forecast. In the simplest case of the absence of the spacial component, when a single cell represents a region of forecast, the expression for the risk is simplied as follows r = amo+bx+cm(1 −o), where x is a spatio- temporal share of alarm, m is a share of cells occupied by the earthquakes, o is the share of missed targets. In Molchan’s theory related to the one-dimensional case, the third component is missing. However, the risk function r, which is very useful for economical considerations and as a basis for an ad- ministrative decision, could hardly be used as a criteria for the quality of seismic prediction. First of all, it can- not be computed in a consistent way because the coef- ficients a, b and c are not known in practice, and hence no effective way of its numerical evaluation is known. The computation of these coefficients is a difficult eco- nomic problem and goes far beyond of the competence of scientists. On the other hand, the readiness of the au- thority to commit resources to solving the problem de- pends on the quality of the geophysical forecast. This situation leads to a vicious circle. The second drawback of the economic risk as a cri- terion for the quality of prediction is related to the fact that it depends on many factors which have no relation to geophysics or earthquake prediction. These factors include the density of population, the number and size of industrial enterprises, infrastructure, etc. It also de- pends on subjective factors such as the willingness of authorities to use resources for prevention of the dam- age from earthquakes. The natural sciences could hardly accept the criteria for the forecast quality which depend on the type of state organization, priorities of ruling parties, results of the recent elections, etc. In this paper, we introduce a scalar efficiency (in- formation efficiency) forecast that only depends on in- formation about future earthquakes contained in the precursors. 2.6. The measure of an area In Molchan and Keilis-Borok [2008] the area of the alarm domain is defined in terms of a non-homoge- neous measure depending on spacial coordinates. Specifically, the measure of an area is proportional to the number of earthquake epicenters from a sample catalog. This approach is used to eliminate the decrease of the share of alarmed sites x with the extension of the domain when a purely safe and aseismic territory is included into consideration. It would be well-justied if the quantity x could be accepted as an adequate crite- rion of the quality of forecast in its own right. On the other hand, it can be demonstrated that the our infor- mation efficiency converges to a non-zero value 1 −o when the number of cells with an alarm is fixed but the total number of cells tends to infinity. An inhomoge- neous area of the territory under forecast which is pro- portional to “activity” does not enable us to calculate the informational efficiency. Moreover, it possesses a num- ber of unnatural features from the view of evaluating economical damage. A seismic event in the territory of low seismicity is more costly because no precautions are taken to prevent the damage of infrastructure. How- ever, in this inhomogeneous area an alarm announced in an aseismic territory will have a smaller contribution than an alarm in a seismically active territory where the losses would be in fact smaller. We conclude that this approach ‘hides’ the most costly events and does not provide a reasonable estimate of economic damage. Next, we build a forecasting method free from this kind of flaws. 3. Events and precursors on the lattice In order to define explicitly estimates of probabil- ities of strong earthquakes we discretize the two-di- mensional physical space and time, i.e., introduce a partition of three-dimensional space-time into rectan- gular cells with the space partition in the shape of squares and time partition in the shape of intervals. These cells should not intersect to avoid an ambiguity in computing the frequencies for each cell. In fact, the space cells should not be perfect squares because of the curvature of the Earth’s surface, but this may be neg- GERTSIK ET AL. 4 5 lected if the region of forecasting is not too large. So, we obtain a discrete set XK with N = I × J × K points which is defined as follows. Let us select a rec- tangular domain A of the two-dimensional lattice with I × J points x = (xi, yj), xi = a × i, i = 1, ..., I and yj = a × j, j = 1, ..., J, a is the step of the lattice. A cell in XK takes the shape of parallelepiped of height Dt with a square base. Clearly, any point in XK has coordinates (xi, yj, tk); tk = t + kDt, k = 0, ..., K. We say that a seismic event happens if an earth- quake with magnitude greater than some pre-selected threshold M0 is registered. For any cell in our space- time grid we define an indicator of an event, i.e., a bi- nary function h (i, j, k). This function takes the value 1 if at least one seismic event is registered in a given cell and 0 otherwise. Suppose that for all points (xi, yj, tk) the value of a vector precursor f (i, j, k) = { fq(i, j, k), q = 1, ..., Q} is given. The components of the precursor fq(i, j, k), q = 1, ..., Q are the scalar statistics constructed on the base of our understanding of the phenomena that precede a seismic event. 4. Mathematical assumptions A number of basic assumptions form the founda- tion of the mathematical technique of earthquake fore- casting. In the framework of mathematical theory they can be treated as axioms but are, in fact, an idealization and simplication with respect to the description of the real phenomena. Below we summarize the basic as- sumptions which are routinely used in existing studies of seismicity and algorithms of earthquake forecasting even the authors do not always formulate them explic- itly. We accept the following assumptions or axioms of the mathematical theory: (i) The multicomponent random process {h (i, j, k), f (i, j, k)}, describing the joint evolution of the vector precursors and the indicator of seismic events, is sta- tionary. This assumption provides an opportunity to in- vestigate the intrinsic relations between the precursors and the seismic events based on the historical data. In other words, the experience obtained by analysing the series of events in the past, is applicable to the future as the properties of the process do not depend on time. In reality, this assumption holds only approximately and for a restricted time period. Indeed, plate tectonics de- stroys the stationarity for a number of reasons including some purely geometrical considerations. For instance, the movements of the plates leads to their collisions, their partial destruction and also changes their shapes. Nevertheless, the seismic process can be treated as a quasi-stationary one for considerable periods of time. At the time when the system changes from one quasi- stationary regime to another (say, nowadays, many re- searchers speak about the abrupt climate change) the reliability of any prediction including the forecast of seismic events is severely restricted. (ii) The multicomponent random process {h (i , j, k), f (i, j, k)} is ergodic. Any quantitative characteristic of seismicity more representative than a registration of an individual event is, in fact, the result of averaging over time. For instance, the Gutenberg-Richter law, applied to a given region re- lates the magnitude with the average number of earth- quakes where the averaging is taken over a specific time interval. In order to associate with the time averaging a proper probabilistic characteristic of the process and make a forecast about the future one naturally needs the assumption of ergodicity. This exactly means that any averaging over a time interval [0, T] will converge to the stochastic average when T → ∞. In view of ergod- icity one can also construct unbiased and consistent es- timates of conditional probabilities of strong earthquakes under the condition that the precursors take values in some intervals. Naturally, these estimates are the fre- quencies of observed earthquakes, i.e., ratios of the number of cells with seismic events and prescribed val- ues of precursors to the total number of cells with the prescribed values of precursors. (Recall that an unbiased point estimate î of parameter i satisfies the condition E [i] = î and a consistent estimate converges to the true value i when the sample size tends to infinity). (iii) Any statement about the value of the indicator of a seismic event h (i, j, k) in the cell (i, j, k) or its prob- ability should be based on the values of the precursor f (i, j, k) only. This assumption means that the precursor in the given cell accumulates all the relevant information about the past and the information about the local properties of the area that may be used for the forecast of the seis- mic event in this cell. In other words, the best possible precursor is used (which is not always the case in prac- tice). As in the other cases, this assumption is only an approximation to reality, and the quality of a forecast depends on the quality of the selection and accumula- tion of relevant information in the precursors. Below we present some corollaries and further specifications. (iii-a) For any k the random variables h (i, j, k), i = 1, ..., I, j = 1, ..., J are conditionally independent under the condition that the values of any measurable func- tion u (f (i, j, k)) of the precursors f (i, j, k), i = 1, ..., I, j = 1, ..., J are fixed. In practice this assumption means that the forecast for the time tk = t0+ kDt cannot be affected by the values related to the future time intervals (tk, tk+Dt]. In reality EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION all of these events may be dependent, but our forecast does not use the information from the future after tk. (iii-b) The conditional distribution of the random variable h at a given cell depends on the values of the precursors f at this cell and is independent of all other variables. (iii-c) The conditional probabilities Prij {h|u ( f )} of the indicator of seismic events h in the cell (i, j, k), under condition u ( f ) in this cell do not depend on the posi- tion of the cell in space (the time index k related to this probability may be dropped due to the stationarity of the process). In other words, the rule for computing the condi- tional probability Pr based on the values of precursors is the same for all cells, and the space indices of proba- bility Pr may be dropped. This condition is widely ac- cepted in constructions of the forecasting algoritms but rarely formulated explicitly. However, the probability of a seismic event depends to a large extent on the local properties of the area. Hence, the quality of the fore- casting depends on how adequately these properties are summarized in the precursors. This formalism properly describes the space inhomogenuity of the physical space because the stationary joint distribution of Prij {h, f ≤ x} for an arbitrary precursor f depends, in general, on the position of the cell in the domain A. Below we will use the distributions of precursors and indicators of seis- mic events in domain A that do not depend on the spa- tial coordinates and have the following form (iii-d) Note that assumption (iii) implies that the conditional probabilities Prij {h|u ( f )} are computed via the probabilities PrA{h, f ≤ x} only. The properties listed above are sufficient to obtain the point estimates for the conditional probabilities of seis- mic events under conditions formulated in terms of the values of precursors. However, additional assumptions are required for a testing of the forecasting algorithm: (iv) The random variables f (i, j, k) are condition- ally independent under condition that h (i, j, k) = 1. Again, these conditions are not exactly true, how- ever they may be treated as a reasonable approxima- tion to reality. Indeed, if the threshold M0 is sufficiently high then the strong earthquakes may be treated as rare events, and the cells where they are observed are far apart with a high probability. Any two events re- lated to cells separated by the time intervals t are as- ymptotically independent as t → ∞ because the seismic process has decaying correlations (the mixing property in the language of random processes). The loss of de- pendence (or decaying memory) is related to the physical phemonema such as healing of the defects in the rocks, relaxation of strength due to viscosity, etc. As usual in physical theories, we accept an idealized model of the real phenomena applying this asymptotic property for large but finite intervals between localizations of seismic events. The independence of strong earthquakes is not a new assumption, in the case of continuous space-time it is equivalent to the assumption that the locations of these events form a Poisson random field. (Note that the dis- tribution of strong earthquake should be homogeneous in space a priori, because by condition there is no infor- mation about the heterogeneity outside precursors. The information about the location of epicenters should be contained in the precursors) The Poisson hypothesis is used in many papers, see, e.g. Harte and Vere-Jones [2005]. It is very natural for the analysis of the tails of the Gutenberg-Richter law for large magnitudes [Pisarenko et al. 2008]. Summing up, the development of the strict mathematical theory of earthquake forecasting does not require any additional assumption except those routinely accepted in the existing algorithms but usually not for- mulated explicitly. 5. The standard form for precursors The correct solution of the forecasting problem given the values of precursors f (i, j, k) is provided by the estimate of conditional probability Prij{h|f (i, j, k)} of the indicator of a seismic event in the cell (i, j, k). In prac- tice this solution may be difficult to obtain because the number of events in a catalog might be not sufficient. Indeed, the range of value of a scalar precursor is usually divided into a number M of intervals, and only a few events are registered for any such interval. For a Q-dimensional precursor the number of Q-dimensional rectangles, covering the range, is already MQ, and ma- jority of them contains zero events. Only a small num- ber of such rectangles contains one or more events, that is the precision of such an estimate of conditional prob- ability is usually too low to have a practical value. For this reason one constructs a new scalar precur- sor in the form of the scalar function of component of the vector precursor, and optimizes its predictive power. This approach leads to additional complication as the units of measurement and the physical sense of different components of a precursor are substantially different. In order to overcome this problem we use some transfor- mation to reduce all the components of the precursor to a standard form with the same sense and range of values. , ,PrA PrA PrA Pri PrA Pri Prih f x fE x I J fE x I J h I J h f x x p h 1 1 1 1 1 , , , rA rA rA rijij i j A rA rijij i j A A rijij i j A # # # # # # # / / = = = = = ! ! ! Q Q Q Q V V V V E E fE E E fE H H H H H H| | | GERTSIK ET AL. 6 7 Let us transform all the precursors fq(i, j, k), q = 1, ..., Q to variables with the values in [0, 1] providing es- timates of conditional probabilities. So, after some trans- formation F we obtain an estimate of Pr {h|u ( f (i, j, k) = 1}, where u is a characteristic function of some interval B, i.e., the probability of event h (i, j, k) = 1 under condi- tion that this precursor takes the value f (i, j, k)∈B. The transformation F of a scalar precursor f (i, j, k) is defined as follows. Fix an arbitrary small number f. Let L be a number of cells (i, j, k) such that h (i, j, k) = 1, and Zl, l = 1, ..., L, be the order statistics, i.e., the val- ues f (i, j, k) in these cells listed in non-decreasing order. Define a new sequence zm, m = 1, ..., M, from the or- dered statistics Zl by the following recursion: z0 = −∞, zm is defined as the first point in the sequence Zl, such that zm − zm−1 ≥ f. Next, construct the sequence zm* = zm + (zm +1− zm )/2, m = 1, ..., M − 1, and add the auxil- iary elements z0 = −∞, zM = 1. Define also a sequence nm, m = 1, ..., M, where nm equals to the number of val- ues in the sequence Zl, such that zm* −1 ≤ Zl < zm*. Finally, define the numbers Nm, m = 1, ..., M counting all cells such that zm* −1 ≤ f (i, j, k) < zm*, m = 1, ..., M. Observe that ∑Mm−1 nm= L, ∑ M m−1 Nm= N, and use the ratios as estimate of unconditional probability of a seismic event in a given cell The transformation F is defined as follows This definition implies that transformation F re- places the value of a precursor for the frequency, i.e., the ratio of a number of cells containing a seismic event and the values of a precursor from [zm* −1, zm* ) to the number of cells with the value of precursor in this range. These frequencies are the natural estimates of condi- tional probabilities Pr {h (i, j, k) = 1|zm* −1 ≤ f (i, j, k) < zm*}, m = 1, ..., M, computed with respect to stationary dis- tribution PA(x): (This conditional probability can be written as Pr {h = 1|u ( f )}, where u is the characteristic function of interval [zm* −1, zm* ). The function g has a stepwise shape, and the length of the step is bounded from below by f. It can be checked that there exists the limit ~g = lim—— f→∞ lim—— K→∞ g = Pr {h = 1|u ( f )}. The estimates of conditional probabilities in terms of the function g are quite rough because typically the numbers nm are of the order 1. As a final result we will present below more sharp but less detailed estimates of conditional probabilities and confidence intervals for them. 6. Combinations of precursors There are many ways to construct a single scalar precursor based on the vector precursor (F fq, q = 1, ..., Q). Each such construction inevitably contains a number of parameters or degrees of freedom. These parameters (including the parameters used for construction of the precursors themselves) should be selected in a way to optimize the predictive power of the forecasting algo- rithm. The optimization procedure will be presented below, its goal is to adapt the parameters of precursors to a given catalog of earthquakes, that is to obtain the best possible retrospective forecast. However, this adap- tation procedure creates a “ghost” information related with the specic features of the given catalog but not present in physical propertities of real seismicity. This “ghost” information will not be reproduced if the al- gorithm is applied to another catalog of earthquakes. It is necessary to increase the volume of the catalog and to reduce the number of free parameters to get rid of this “ghost” information. Clearly, the first goal requires the considerable increase of the observation period and may be achieved in the remote future only. So, one con- centrates on the reduction of numbers of degrees of freedom. The simplest variant including Q − 1 parame- ters is the linear combination As a strictly monotonic function of a precursor is a precursor itself the log-linear combination is an equally suitable choice Here cq, q = 1, ..., Q − 1 are free parameters. The re- sult of the procedure has the form g = F f *. 7. Alarm levels, point and interval estimations In view of Equation (5.3) the precursor g is the set of estimates for probabilities N L m= ; ; ; ; PrA Pr p h i j k h i j k x dPA x1 1A rA PA / = = = = 3 3 - # Q Q Q V V V E F H I - # , , if ; ; , , ..., . g F f i j k N n z f i j k z m M1 m m m m1 1# = = = =) )- Q Q V V ; ; ; ; ; ;Pr Pr h i j k h i j k x dPA x z f i j k z dPA x 1 1 PA m m PA z # z z # z 1 m m m m 1 1 1# = = = = ) ) - ) ) ) ) - - Q Q Q Q Q V V V V V F F I I z # z # f) F f1 q Q f1 2 = +f) = | .c F fqq fq1 2 -| .ln lnf) F f1 c F fqq q Q fqf1 1 2 = +f) - = Q QV V| EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION (5.1) (5.2) (6.1) (6.2) (5.3) (5.4) Its serious drawback is that typically { z*l−1 ≤ f (i; j; k) < z*l } correspond to single events, and therefore the precision of these estimates is very low (the confidence intervals discussed below may be taken as a convenient measure of precision). In order to increase the precision it is recommended to use the larger groups of cells con- taining a larger number of events, that is a more coarse covering of the space where the precursor takes its val- ues. In a sense, the precision of the estimation and the resolving power are related by a kind of “uncertainty principle”: the more precise estimate one wants to get the more coarse is the time-space range of their values and vice versa. We adapt the following approach in order to achieve a reasonable compromise. (1) For fixed thresholds as, s = 1, ..., S +1, a1 = 1, as< as+1; aS+1= 0, we define S possible alarm levels as+1 ≤ g (i; j; k) < as and subsets Xs, s = 1, ..., S , of the set XK corresponding to alarm levels, i.e., Xs is a set of cells of XK, such that as+1 ≤ g (i; j; k) < as. There are different ways to choose the number S of alarm levels and the thresholds as, s = 2, ..., S . Say, fix S = 5, and select as = 10 −a(s−1). This is a natural choice of the alarm level because at a = 1 it corresponds to decimal places of the estimate of the conditional prob- ability given by the precursor. The problem with S = 2, i.e. two-level alarm, may be reduced to the hypothesis testing and is discussed in more details below. (2) Compute the point estimates of probabili- ties obtained via the distribution PX(x) of a precursor g in the same way as in Equation (4.4). The property (iv) implies that for any domain Xs the binary random vari- ables h (i, j, k) are independent and identically distrib- uted, i.e. and the unbiased estimate of ps takes the form where ns stands for the number of cells in domain Xs, and by ms we denote the number of cells in Xs contain- ing seismic events. (3) The random variable ms takes integer values be- tween 0 and ns. The probabilities of these values are computed via the well-known Bernoulli formula Let us specify the confidence interval covering the unknown parameter ps with the confidence level c. In view of the integral Mouvre-Laplace theorem for ns large enough the statistics is approximately Gaussian N (0, 1) with zero mean and unit variance. Note that the values ns increase with time. Omitting straightforward calculations and re- placing the parameter ps by its estimate is we obtain that is − < is < +is +; where and tc is the solution of equation U(tc) = c — 2 . Here U stands for the standard Gaussian distribution function. (4) As a result of these considerations we intro- duce ‘the precursor of alarms’ which indicates the alarm level: R ( f (i, j, k) ) = s (i, j, k) . It will be used for cal- culations of point estimates and the confidence interval in the form This result will be used for a prospective forecast- ing procedure. 8. The information gain and the precursor quality The construction of a ‘combined’ precursor R in- volves parameters from formula (6.1) or (6.2) as well as parameters which appear in the definition of each in- dividual precursors fq. It is natural to optimize the fore- casting algorithm in such a way that the information gain related to the seismic events is maximal. In a one- dimensional case the information gain as a measure of the forecast efficiency was first intoduced by Vere-Jones [1998]. Here we exploit a modification of his ideas in the case of multi-dimensional space-time process. Re- mind the notions of the entropy and information. Put- ting aside the mathematical subtlety (see Kelbert and Suhov [2013] for details) we follow below an intuitive approach by Prohorov and Rozanov [1969]. The infor- mation containing in a given text is, basically, the length of the shortest compression of this text without the loss of its content. The smallest length S of the sequence of digits 0 and 1 (in a binary code) for counting N dif- ferent objects satises the relations 0 ≤ S − log2 N ≤ 1. So, ; ; , , ..., .Pr h z f i j k z l L f1 1l l1 1#= = ) ) - Q QV VF I , , , , ..., .Pr h a g i j k a s S1 1s s1 1#= =+ Q VF I , , ,Pr h a g i j k a p1 s s s1 1# /= + Q VF I n m s s i= .Pr m k n k p p1s s s k s nV ks= = - -Q S QV X nV p p p n 1 s s s s si - - Q Q V V , n t n t 1 1 s s s s s s s s s s i i i i i i i i = - - = - + c c - + Q Q V V , , , ,s i j k s i j1i i - Q QVF , , ,k s i j k1 i+QV VI GERTSIK ET AL. 8 (7.1) 9 the quantity S ~ log2 N characterizes the shortest length of coding the numbers of N objects. Consider an experiment that can produce one of N non-intersecting events A1, ..., AN with probabilities q1, ..., qN, respectively, q1+ ...+ qN = 1. A message in- forming about the outcomes of n such independent identical experiments may look as a sequence (Ai1, ..., Ain), where Aik is the outcome of the experiment k. But for long enough series of observations the frequency ni/n of event Ai is very close to its probability qi. It means that in our message (Ai1, ..., Ain) the event Ai ap- pears ni times. The number of such messages is By the Stirling formula the length of the shortest coding of these messages The quantity Sn measures the uncertainty of the given experiment before its start, in our case we are look- ing for one of the possible outcomes of n independent trials. The specific measure of uncertainty for one trial is known as Shannon’s entropy of distribution q1, ..., qN (in physical literature it is also known as a measure of chaos or disorder). After one trial the uncertainty about the future outcomes decreases by the value S = Sn− Sn−1, this decrement equals to the information gain I = S, ob- tained as a result of single trial. The quantity is the (unconditional) entropy of distribution for the in- dicator of the seismic event h in a space-time cell in the absence of any precursors. The conditional entropy S (h|as+1 ≤ g < as) under condition that in the cell (i, j, k) the alarm level s is set up equals The expected conditional entropy SR(h) of indica- tor of seismic events where the averaging in performed by the distribution of precursors R takes the form We conclude that the knowledge of the precursor values helps to reduce the uncertainty about the future experiment by S (h) − SR(h) which is precisely informa- tion I (R,h) obtained from the precursor. Taking into ac- count Equations (8.1), (8.2) and the fact that we specify the information gain as By analogy with the one-dimensional case [Kol- mogorov 1965] the quantity I (R,h) may be called the mutual information about the random field h that may be obtained from observations of random field R. It is known that the information I (R,h) is non-negative and equals to 0 if and only if the random fields h and R are independent. This mutual information I (R,h) takes its maximal value S (h) in an idealized case of absolutely exact forecast. The mutual information is the informa- tion that the distributions of precursors contribute to that of the indicator of a seismic event. For this reason it may be considered as an adequate scalar estimate for the quality of the forecast. The quantity I (R,h) depends on the cell size, i.e., on the space discretization length a and time interval Dt. We need a formal test to compare precursors de- fined for different size of the discretization cells. For this aim let us introduce the so-called ‘efficiency’ of pre- cursors as the ratio of information gains This efficiency varies between 0 and 1 and serves as a natural estimate of information quality of precur- sors. It allows comparing different forecasting algo- rithms and select the best one. A natural estimate of S (h) based on Equations (5.1) and (5.2) is defined as follows Taking into account Equation (8.1) and using an estimate of PA(as+1 ≤ g < as) in the form of ratio xs= ns— N , we construct an estimate of I (R,h) as follows Remark. It seems reasonable to introduce a penalty !... ! !N n n n m N1 = ! . .log logS Nn n q qn Nn i i N i2 1 2. .- = | , ..., logn S n S q q q q 1 1 n n N i i N i1 2 1 = =- = Q V | log logS h pp- p p 11 AA A A2 2 -= p- - -Q Q QV V V . log log S h p p! p p PA a g a11 R s s s S s s P$ A s s 1 2 2 1 1#- =- + + - = +Q Q Q Q V V V V p! P$ | p p PA a g aA s s S PA s s 1 1 1#= = +Q V| , . R log log I h p p p# p p p PA a g a1 1 1 A s s s S A s s PA s s 1 2 2 1 1#- - = + + - = + Q Q Q V V V p# & | log logS h a g a pp- p p 11s s ss s s1 2 21# -= p- - -+R Q QW V V ,R R r h S h I =Q QQV , . h h V V .log logS h 1 12 2m m m m=- - - -tQ Q QV V V , ,R log logI h 1 1 1 s s s s s S s2 2 1 i m i i m i x= + - - - = tQ QV V# &| ; ,R r R h S h I =t t tQ QQV , . h h V V EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION (8.1) (8.2) (8.3) (8.4) (8.5) related to the number of superficious parameters in eval- uating the quality of forecast pointing to the natural analogy with the Akaike test [Akaike 1974] and similar methods in information theory. In our context the main parameter of importance is r̂ (R; h) and its limit as t →∞. This quantity does not involve the number of parame- ters directly. Probably, the rate of convergence depends on the number of parameters but this dependence is not studied yet. 9. The forecasting procedure The number of time intervals, i.e., the number of observations N used in the construction of estimates increases with the growth of observation time. So, the computation procedure requires constant innovations. On the other hand some computation time is required to ‘adapt’ the model parameters to the updated infor- mation about seismic events via an iterative procedure. For these reasons we propose the following forecasting algorithm. (1) Given initial parameter values at the moment tK−1= t0 + (K − 2)Dt we optimize them to obtain the maximum of efficiencys r̂ (R; h) of a precursor in do- main XK−1. For this aim the Monte-Carlo methods is helpful: one perturbs the current values of parameters randomly and adapts the new values if the efficiency increases. The process continues before the value of ef- ficiency stabilized, this may give a local maximum, so the precedure is repeated for a sufficient number of times. The choice of the initial value on the first step of the optimization procedure is somewhat arbitrary but a reasonable iteration procedure usually leads to con- sistent results. The optimization procedure takes the period of time tK−1< t ≤ tK. (2) Next, we construct the forecast in the follow- ing way. At the moment tK the values of precursor ĝ in each cell (i, j, K + 1) is computed with optimized pa- rameters. Based on these parameter values the alarm levels, the point estimates and confidence intervals are computed in each cell as well as the values of efficiency of precursors. (3) The estimates of stationary probabilities of seismic events in the cell î(i, j) are defined as follows: they can be used for creation of maps of seismic danger (but not “hazard”, this term is reserved) in the region. 10. Retrospective and prospective informativities The efficiency of precursor which is achieved as a result of parameters optimization could be considered as retrospective as it is constructed by the precursors adaptation to the historical catalogs of seismic events. The prospective efficiency for the space-time domain containing the cell in the ‘future’ is based on the forecast. It is computed via formulas (8.3), (8.4), (8.5) with the only difference to the retrospective efficiency, however, ap- proaches this value with time. In principle, the prospec- tive efficiency is an ultimate criterion of the precursors quality and the retrospective efficiency could serve only for the preliminary selection of precursors and their adaptation to the past history of seismic events. 11. Testing of the forecasting algorithm The efficiency of precursor could be computed ex- actly only in an idealized case of infinite observation time. However, its estimate may be obtained based on the observation over a finite time interval. So, if an es- timate produces a non-zero value not necessarily the real effect is present. It may be simply a random fluc- tuation even if the precursor provides no information about the future earthquake. For this reason we would like to check the hypothesis H0 about the independence of a precursor and an event indicator with a reasonable level of confidence. In case the hypothesis is rejected one has additional assurance that the forecasting is real, not just a “ghost”. So, consider the distributions and The function PA(PA −1(y)) = y of variable y = PA(x) provides an uniform distribution F*(y) = Pr {p ≤ y} of some random variable p on [0,1]. Next, consider a dis- tribution function G(y) = P'A(PA −1(y)) on [0,1], and use a parametric representation for abscissa PA(x) and ordi- nate P'A(x). If random fields g and h are independent the distribution functions PA(x) = P'A(x) and G(y) are uniform. So, the hypothesis about the absence of forecasting, i.e., about the independence of g and h is equivalent to the hypothesis H0 that the distribution G(y) is uniform. The empirical distribution GL(y) related to G(y) is defined as follows. Denote by ul, l = 1, ..., L the values of the function g (i, j, k) sorted in the nondecreasing order and belonging to the cells where h (i, j, k) = 1. Let nl be the numbers of cells such that h (i, j, k) = 1, g (i, j, k) = ul. Denote by m (ul) the numbers of cells such that g (i, j, k) < ul, and define the empirical distribution GL(y) as a stepwise function with GL(0) = 0 and positive jumps of the size nl— L at points yl = m (ul)—— N . The well known methods of hypothesis testing re- quires that the function GL(y) has the same shape as for ;i j K 1 k K 1 i = = tQ V | ,, ,s i j k 1 i Q V| , , ,PriPA x g i j kI J x 1 ; PA A rijij i j #= ! Q Q Q V V V E H| , , , , .PrPA x I J g i j k x h i j k 1 1 ; PA ijij i j A #= = ! l Q Q Q Q V V V V F I| GERTSIK ET AL. 10 · · 11 independent trials, i.e., random variables ul, l = 1, ..., L are independent in view of axiom (iv). Naturally, we ac- cept the precursors such that the hypothesis H0 is re- jected with the reasonable level of confidence. (Remind, that the hypothesis is accepted if and only if its logical negation could be rejected based on the available ob- servations. The fact that the hypothesis cannot be re- jected does not mean at all that it should be accepted, it only means that the available observations do not con- tradict this hypothesis. Say, the well-known fact that “The Sun rises in the East” does not contradict to our hypoth- esis, however it may not be considered as a ground for its acceptance.) For large values of L the Kolmogorov sta- tistics [Kolmogorov 1933a] is helpful for this aim with an asymptotic distribution or Smirnov’s statistics [Smirnov 1939] with asymptotic distribution The asymptotic expressions for these statistics can be used for L > 20 [Bolshev and Smirnov 1965]. 12. The binary alarm and the hypothesis testing The prediction is the form of forecast when an alarm is announced in a given cell without a prelimi- nary evaluation of the probability of a seismic event. In this case we can estimate the probabilities of events, too. (If the alarm is announced in an arbitrary domain X we set up an alarm if at least half of the cell of our model is occupied by alarm.) Let M be the number of cells in X which are in the state of alarm, M0 be the number of cells where the seismic event is present but no alarm was announced (the number of ‘missed targets’). Denote by x = M— N the share of the cells with alarm announced, m = L— N the share of the cells with seismic events, and o = M0— M the share of missed targets. Let a random variable h (i, j, k) equal 1 if an alarm is announced in the cell (i, j, k) and 0 other- wise. Obviously, the estimate of conditional probability Pr {h (i, j, k) = 1|h (i; j; k) = 1} of the seismic event under the condition of alarm is m——(1 −o)——— x ; and the esti- mate of conditional probability Pr {h (i, j, k) = 1|h (i; j; k) = 0} of the seismic event under the condition of no alarm is mo——— 1−x . If the alarm is announced according to the pro- cedure described in Section 5 the threshold a1 specify- ing the acceptable domain of values for g (i, j, k) should be treated as a free parameter and selected by maxi- mizing the information efficiency r̂ (h; h). The esti- mate of information increase for given values of x and o equals The value of h (i; j; k) characterizes the results of checking two mutually exclusive simple hypothesis: H0: the distribution of f *(i, j, k) has the form implying ‘no seismic events’, or H1: the distribution of f *(i, j, k)) has the form implying the presence of seismic event. Statistics for checking of these hypothesis is the precursor g (i, j, k), and the critical domain for H0 has the form {g (i, j, k) ≥ a1}. (If usual method of alarm an- nouncement is used the relevant precursor plays the role of statistics and the critical domain is defined by the rule of the alarm announcement). The probability of a first type error it is estimated as x−m——— 1− (1−o) ——— m . The probability of second type error is estimated as o. (Note that due to condition (iii) any test used for checking these hypothesis should not de- pend on the coordinates of the cell). The Neyman-Pearson theory allows to define the domain of images of all possible criteria: in coordinates (a, b) it is a convex domain with a boundary C which corresponds to the set of uniformly most powerful tests. This family may be defined in terms of the likelihood ratio K(x) = p1A(x)——— p0A(x) under condition that the distribu- supD GL L= Q yy -V ( ) , ,lim Pr L D z e z1 0 L L k k k z2 2 2 2# = - "3 3 3 =- -F I | sup inf D G D G L L L L = =- + - Q! ! ,$ ,$y y y y L - -Q Q V V ,$ ,$ , . lim Pr lim PrL D z L D z e z1 0 L L L L z2 2 2 # #= = = - " "3 3 + - - F FI I , . log log log log I h 1 1 1 1 1 1 1 1 1 1 2 2 2 2 h m o x o mo x o x m o m x x m o x mo x m x mo - - - - - = - + - + + - - - - - + + - - tQ Q Q Q Q Q Q Q V V V V V V V V! $ , , , , ,PrAPA x f ) i j k i j kx h 0PA rA 0 #= =fF )Q Q QV V VfF I , , , , ,PrAPA x f ) i j k x h i j k 1PA rA 1 #= =fF )Q Q QV V VfF I , , , , ,Pr g i j k h i j k1 0a = = =Q QV VF I , , , , ,Pr g i j k h i j k0 1b= = =Q QV VF I EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION tions P1A(x) and P 0 A(x) have the densities p 1 A(x) and p0A(x). where ~ denotes the threshold. In Gercsik and Kelbert [2004] we demonstrated that among all the tests with the images on the boundary C there exists three different best tests. Here the term “best” may be understood in three different senses, i.e., maximizing the variational, correlational and informational efficiency. The most rel- evant criteria is the informational efficiency r̂ (h; h). 13. The choice of precursors We use the term ‘empiric precursor of an earth- quake’ for any observable characterisric derived from the catalog only, which provides for this catalog a rea- sonable retrospective forecast of seismic events, and which is not derived from basic physical concepts of seismicity (say, the periods of relatice calm, deviation of some basic characteristic from a long-time average , etc). In contrast, the physical precursors are derived from some physical processes and characterize physi- cal quantities (stress fields, strength, concentration of cracks, etc.) or well-defined physical processes (i.e., phase transitions, cracks propagations, etc.) In the me- teorological forecast the danger of using empirical pre- cursors was highlighted by A. Kolmogorov in 1933 [Kolmogorov 1933b]. From that time the meteorolog- ical forecast relies on the physical precursors which are theoretically justified by the models of atmospheric dy- namics. Below we will present Kolmogorov’s argument adapted to the case of seismic forecast. This demon- strates that the purely empirical precursors work well only for the given catalog from which they are derived. However, their efficiency deteriorates drastically when they are applied to any other independent catalog. Consider a group of k empirical precursors used for a forecast and selected from a set of n such groups. According to Kolmogorov’s remark the number k is typically rather small. This is related to the fact that a number of strong earthquakes in catalog is unlikely to exceed a few dozen. As the values of precursors are ran- dom there exists a small probability p that the efficiency of the forecast exceeds the given threshold C. Then the probability of event r̂ (R, h) ≤ C equals 1 − p, and the probability of event r̂ (R, h) > C for at least one collec- tion of precursors equals P = 1 − (1 − p)n and tends to 1 as n → ∞. (According to Kolmogorov some arbitrari- ness of the assumption of independence is compen- sated by the large number of collections.) Summing up, if the number of groups is large enough with probability close to 1 it is possible to find a group giving an effective retrospective forecast for a given catalog. In practice this is always the case as the number of empirical precursors could be increased in- definitely by variation of real parameters used in their construction. It is important to note that for such a group, which is highly efficient for the initial catalog, the probability that the efficiency is greater than C is still equal to p for any other catalog. In other words: the larger the number of the groups of empirical precur- sors the less reliable the forecast is. So, the collection of a large list of the empirical precursors is counter-pro- ductive. Much more reliable are the physical precursors intrinsicly connected with the physical processes which preserve their values with the change of sample. The probability to find such a set of precursors by pure em- pirical choice is negligible because they are very rare in the immense collection of all possible precursors. 14. Demonstration of algorithm A preliminary version of the forecast algorithm de- scribed above was used in the paper [Ghertzik 2008] for Califorrernia and the Sumatra-Andaman earthquake re- gion. These computations serve as a demonstration of the efficiency of the method but their actual results should be taken with a pinch of salt because the selec- tion of precursors does not appear well-justied from the modern point of view: the number of free param- eters (34) to be adapted in the precursor “stress indica- tor” is too large. For this reason the method was not fully exploited. Nowadays, the progress is resumed with a particular emphasis on the selection of a new set of precursors strongly related to the critical phenom- ena in geophysics. The following selection of precursors was used: (1) The stress indicator is a scalar predictor char- acterizing the local level of the stress field. It is con- ceivable as a stress tensor component. The predictor increases due to tectonic movements, decreases with viscosity and with increasing fracturing, and is affected by earthquake sources considered as cracks. (2) The damage indicator is a predictor character- izing the stress-induced microfracturing level. The pre- dictor increases with the value of applied stresses and decreases as cracks are healed. (3) The third predictor is the concentration Zhurkov-Sobolev criterion of seismically active faults based on the calculation of the parameter concentra- tion (density) seismogenic ruptures and modified in such a way that it can be calculated for a local cell. (4) The far-order predictor is the integral of the ra- dial correlation function for seismic energy released in a spatiotemporal cell. Its construction is based on the idea of the strong earthquake as a phase transition be- , , , , i j k i j k i i1 0 h h = =Q Q V VG iff x iff x 2 1 ~ ~ K K Q Q V V GERTSIK ET AL. 12 13 tween consolidation and fracture states of a medium. The phase transition is preceded by the appearance of a far order, an abrupt increase in the correlation radius for the energy of shocks released in a cell. For detailed description of precursors we refer to original, for it is only an illustration but not the topic of this work. It is useful, however, to summarize the main results of this paper. California region. The catalog Global Hypocen- ter Data Base CD-ROM NEIC/USGS, Denver, CO, 1989, together with data from the site NEIC/USGS PDE (ftp://hazard.cr.usgs.gov) for earthquakes with magni- tudes M ≥ 4.0 with epicenters between 113°- 129° of western longitude and 31°- 43° of northern latitude was used for parameter adaptation. The initial time t0 was selected by subtracting from the time of actual computation, 08.03.2006, an integer number of half- year intervals such that t0 fits the first half of the year 1936. (The final time 08.03.2006 could be considered as an initial moment for constructing half year forecast forward up to the date of the latest earthquake avail- able in the catalog). During the computation the time interval from the first half of 1936 to the first half of 1976 was used for relaxation of the zero initial data used for precursors. After this date the catalog for the earthquakes with magnitude M ≥ 4.0 was used to esti- mate the probabilities of strong earthquakes with mag- nitude M ≥ 6.0 up to the moment of actual forecast. Note that the adapted restriction to include into con- sideration only earthquakes with magnitudes M ≥ 4.0 is a severe restriction. It decreases the precision of pre- cursor computation and therefore, if a prediction is suc- cessful, increases the degree of condence in the predictor choice. We choose a = 150 km as a size of the spacial lattice, and Dt = 6 months as a time-step. Retro- spective forecast was performed with 5 alarm levels de- fined by the thresholds as = 10 −a(5− s), s = 1, ..., 4 and a = 0.75. (Due to too short time step no alarms was registered on the lowest level when parameter a = 1 was selected). In order to reduce the influence of the boundary conditions the large square covering all the seismic events in the catalog used in the computations was reduced by two layers of elementary cells from each boundary. As a result of optimization the forecast information efficiency of 0.526 was achieved, i.e., the forecasting algorithm applied to the given catalog ex- tracted from it about 53% of all available information about seismic events. It seems that this result could be only partially explained by a lucky selection of precur- sors: another contributor to the high efficiency of the algorithm is the adaptation of the parameters to the features of the specific catalog. The influence of this ar- tificial information may be reduced only with the in- crease of the observation interval. Accepting the rule of binary alarm announcement in the cells from group 1 and 2 from 5 levels possible one obtains that the space-time share of alarmed cells is 3.4% and the share of missed targets is 18.2%. This result is comparable with the best forecasts available in the literature and ob- tained by other methods (in the cases when the quan- titative parameters of algorithms are presented in the publications). When the forecast was constructed in the future we obtained that the estimate of probability of a strong earthquake anywhere in the area under study is 0.174, and the maximal point estimate of an event in any individual cell is 0.071. As a whole the seismic situ- ation in California did not look too alarming. Indeed, there were no strong earthquakes in California in the next half-year. EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION Alarm level Number of cells Number of events Left boundary of interval Point estimate Right boundary of interval 1 43 25 0.4186 0.5814 0.6998 2 155 11 0.0376 0.0710 0.1172 3 457 6 0.0063 0.0131 0.0291 4 1152 2 0.0005 0.001 0.0071 5 4049 0 0 0 0.001 Figure 1. Mapped areas of five levels of half-year forward forecast- ing for California on March 8, 2006. Table 1. Number of events and alarm cells for each level of the retrospective analysis for California. Table 1 presents the number of events and the total number of alarm cells for each level of the retro- spective analysis, as well as their ratios, which are well- grounded point estimates of the probability that at least one earthquake with a magnitude M ≥ 6.0 will occur in a cell at a given level of alarm. The 5% condence intervals are also given in the table for each of these probabilities. The true values of event probabilities lie within these intervals with a 95% probability (condence coefficient). In all, 5856 cells participated in the prediction, and 44 of these cells (0.75% of all cells) contained events. SAE region. We have conducted a retrospective forecast of strong earthquakes with magnitudes M ≥ 7.0 for the whole region where the Sumatra-Andaman earth- quake (SAE) happened on 26.12.2004 with magnitude M = 9.3. The catalog Global Hypocenter Data Base CD- ROM NEIC/USGS, Denver, CO, 1989, together with the data from the site NEIC/USGS PDE (ftp://hazard. cr. usgs.gov) for earthquakes with magnitudes M ≥ 5.5 with epicenters between 84.3°- 128° of eastern longitude and 20°- 26° of northern latitude was used for the pa- rameters adaptation. The initial moment of time t0 was selected by subtracting from the time of actual compu- tation, 10.11.2004, an integer number of half-year in- tervals such that t0 fits the first half of the year 1936. (The final time was selected in such a way that the next half-year period covers SAE and its powerful after- shock). During the computation the time interval from the first half of 1936 to the first half of 1976 was used for relaxation of the zero initial data used for precur- sors. After this date the catalog was used to estimate the probabilities of strong earthquakes with magnitude M ≥ 7.0 up to the moment of actual forecast. (For mag- nitude M ≥ 7.5 the number of seismic events was not sufficient for reliable forecast because the 5%-confi- dence intervals strongly overlapped). In this case the re- striction to include into consideration only earthquakes with magnitudes M ≥ 5.5 was adapted. As before, it de- creases the precision of precursor computation and therefore, if the prediction is successful, increases the degree of confidence to the predictor choice. We se- lected the size of the spacial grid as a = 400 km and the size of time-step Dt = 6 months. Retrospective analysis was conducted following the same scheme as in the previous case. In order to reduce the influence of the boundary conditions the large square covering all the seismic events in the catalog used in the computations was reduced by two layers of elementary cells from each boundary. As a result of optimization the forecast information efficiency was 0.549, i.e., the forecasting al- gorithm extracted around 55% of all available infor- mation about seismic events when applied to the given catalog. In the case of binary alarm announcement the space-time share of alarmed cells was 3.1% and the share of missed targets was 8.3%. This result is compa- rable with the best forecasts available in the literature and obtained by the other methods (in the cases when the quantitative parameters of algorithms are presented in the publications). In the case of forward forecast the two most powerful earthquakes, i.e., SAE and its major aftershock, happened in the alarm zone of the second level, and two other events with smaller magnitudes in the zone of alarm level 4. The results of the half-year forward prediction be- ginning from November 10, 2004, and M ≥ 7.0 earth- quakes that occurred in this period are plotted in Figure 2. As seen from the figure, the two strongest events (the SAE and its strongest aftershock) occurred in the level- 2 alarm zone, while two weaker events occurred in the level-4 alarm zone. It is noteworthy that, if monitoring GERTSIK ET AL. 14 Figure 2. Mapped areas of five levels of half-year forward forecasting for the SAE region on November 10, 2004. 15 in this region were conducted according to the pro- posed scheme, a nine-cell square cluster containing one cell of the level-1 alarm would be regarded as haz- ardous, giving every reason for anxiety in relation to the impending SAE. Table 2 presents the number of events and the total number of alarm cells for each level of the retro- spective analysis, as well as their ratios, which are well- grounded point estimates of the probability that at least one earthquake with a magnitude M 7.0 will occur in a cell at a given level of alarm. The 5% con- dence intervals are also given in the table for each of these probabilities. In all, 8352 cells participated in the forecasting and 36 of these cells (0.43% of all cells) con- tained events. Figure 3 plots the dependence of the fraction of target misses on the spatiotemporal fraction of alarms for points at which the boundaries between alarm lev- els divide the set of all cells into two classes: union of more anxiety cells and union of less anxiety cells. It is an estimate for Molchan’s error diagram. 15. Conclusions This article demonstrates that the prediction method based on the pattern recognition does not have sufficient foundation and contains a number of errors. We propoused here the alternative approach that is free of this errors and enhances the computing capability of the forecasting. Discretization of spacetime, using elements of the theory of multicomponent random processes and axiomatization of the problem allows to drastically simplify the procedure of earthquake fore- casting and to obtain the following results: (1) The method of announcement to declare a multi-level alarm was found. (2) The method of calculating the point estimate of probability of an event for each alarm level was found. (3) The method of calculating interval estimates of event probability for each level of anxiety was found. (4) The equivalence of the announcement of a two-phase alarm and solving the problem of testing hy- potheses was shown. (5) The method of testing an algorthim of forecast was found. (6) The scalar criterion for comparison of different algorithms of forecast was found. (7) The equivalence of the method of the announce- ment of alarms and method assessing the probability of event was shown. It appears to the authors, that a satisfactory math- ematical formalism of earthquake forecasting based on arbitrary set of precursors is fully fledged. The most im- portant result we believe is the discovery of the scalar quality criterion of forecast algorithms. The criterion allows to find out which one is the best from existing al- gorithms and to focus on finding the most effective pre- cursors. Acknowledgements. We would like to thank V. Pisarenko and G. Sobolev for stimulating discussions that gave us the impulse for writing this paper. M. Kelbert thanks the HSE for the support in the framework of the implementation of the Global Competitive- ness Program. EARTHQUAKE FORECASTING: STATISTICS AND INFORMATION Alarm level Number of cells Number of events Left boundary of interval Point estimate Right boundary of interval 1 36 18 0.3300 0.500 0.6700 2 222 15 0.0400 0.0676 0.1052 3 220 2 0.0015 0.0091 0.0281 4 602 1 0 0.0017 0.0070 5 7272 0 0 0 0.0006 Figure 3. Empirical Molchan's diagram: the share of missed targets vs the share of alarms. Table 2. Number of events and alarm cells for each level of the retrospective analysis for SAE region. References Akaike, H. (1974). A new look at the statistical model identification, IEEE T. Automat. Contr., 19 (6), 716- 723. Bolshev, L.N., and V.N. Smirnov (1965). Mathematical Statistical Tables, V.A. Steklov Matematical Insti- tute, Academy of Sciences, Moscow, USSR, 464 p. (in Russian). Gercsik, V., and M. Kelbert (2004). On comparision of hypothesis tests in Bayesian framework without a loss function, Journal of Modern Applied Statistical Methods, 3 (2), 399-405. Ghertzik, V. (2008). Physical concepts of fracture and prediction of probabilities of strong earthquakes, Phys. Solid Earth, 44 (3), 22-39. Harte, D., and D. Vere-Jones (2005). The entropy score and its uses in earthquake forecasting, Pure Appl. Geophys., 162 (6-7), 1229-1253. Holliday, J.R., C. Chen, K.F. Tiampo, J.B. Rundle, D.L. Turcott and A. Donnellan (2007). A RELM Earth- quake forecast based on pattern informatics, Seis- mol. Res. Lett., 78 (1), 87-93. Jordan, T.H., Y.-T. Chen, P. Gasparini, R. Madariaga, I. Main, W. Marzocchi, G. Papadopoulos, G. Sobolev, K. Yamaoka and J. Zschau (2011). Operational earth- quake forecasting: state of knowledge and guide- lines for utilization, Annals of Geophysics, 54 (4); doi:10.4401/ag-5350. Kagan, Y.Y., and D.D. Jackson (2000). Probabilistic fore- casting of earthquakes, Geophys. J. Int., 143, 438- 453. Keilis-Borok, V.I., and V.G. Kossobokov (1990). Premon- itory activation of earthquake flow: algorithm M8, Phys. Earth Planet. In., 61. 73-83. Keilis-Borok, V.I. (1996). Intermediate-term earthquake prediction, P. Natl. Acad. Sci. USA, 93 (9), 3748-3755. Keilis-Borok, V.I., and A.A. Soloviev (2003). Nonlinear Dynamics of the Lithosphere and Earthquake Pre- diction, Springer-Verlag, Berlin-Heidelberg, 338 p. Kelbert, M., and Y. Suhov (2013). Information Theory and Coding by Example, Cambridge Univ. Press, Cambridge, 530 p. Kolmogorov, A.N. (1933a). Sulla determinazione em- pirica di una legge distribuzione, Giornale dell’Isti- tuto Italiano degli Attuari, 4, 83-91 (in Italian). Kolmogorov, A.N. (1933b). On the suitability of statis- tically obtained prediction formulas, Zh. Geoz., 3, 7882. Kolmogorov, A.N. (1965). Three approaches to the defi- nition of the notion of information amount, Probl. Inf. Transm., 1 (1), 3-11. Molchan, G.M. (1990). Strategies in strong earthquake prediction, Phys. Earth Planet. In., 61, 84-98. Molchan, G.M., and V.I. Keilis-Borok (2008). Earth- quake prediction: probabilistic aspect, Geophys. J. Int., 173, 1012-1017. Pisarenko, V.F., A. Sornette, D. Sornette and M.V. Rod- kin (2008). New approach to the characterization of Mmax and of the tail of the distribution of earthquake magnitudes, Pure Appl. Geophys., 165 (5), 847-888. Prohorov, Yu.V., and Yu.A. Rozanov (1969). Probability theory, basic concepts. Limit theorems, random processes, Springer-Verlag, 401 p. Smirnov, N.V. (1939). On deviations from the empirical distribution curve, Mat. Sb., 6 (48), 1, 3-24. Vere-Jones, D. (1998). Probability and information gain for earthquake forecasting, In: Problems of Geody- namics and Seismology, Computational Seismology, Moscow, 30, 248-263. WGNCEP, Working Group on Northern California Earthquake Potential (2007). The Uniform California Earthquake Rupture Forecast, Version 3 (UCERF3) Project Plan: U.S. Geological Survey Open-File Report 1-176, 2011. * Corresponding author: Vladimir Gertsik, Institute of Earthquake Prediction Theory and Mathematical Geophysics RAS, Moscow, RF; email: gercsik@mitp.ru, gertzik@yandex.ru. © 2015 by the Istituto Nazionale di Geofisica e Vulcanologia. All rights reserved. GERTSIK ET AL. 16 << /ASCII85EncodePages false /AllowTransparency false /AutoPositionEPSFiles false /AutoRotatePages /None /Binding /Left /CalGrayProfile (Dot Gain 20%) /CalRGBProfile (sRGB IEC61966-2.1) /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2) /sRGBProfile (sRGB IEC61966-2.1) /CannotEmbedFontPolicy /Warning /CompatibilityLevel 1.3 /CompressObjects /Tags /CompressPages true /ConvertImagesToIndexed true /PassThroughJPEGImages true /CreateJobTicket false /DefaultRenderingIntent /Default /DetectBlends true /DetectCurves 0.1000 /ColorConversionStrategy /LeaveColorUnchanged /DoThumbnails false /EmbedAllFonts true /EmbedOpenType false /ParseICCProfilesInComments true /EmbedJobOptions true /DSCReportingLevel 0 /EmitDSCWarnings false /EndPage -1 /ImageMemory 1048576 /LockDistillerParams true /MaxSubsetPct 100 /Optimize false /OPM 1 /ParseDSCComments true /ParseDSCCommentsForDocInfo true /PreserveCopyPage true /PreserveDICMYKValues true /PreserveEPSInfo true /PreserveFlatness true /PreserveHalftoneInfo false /PreserveOPIComments false /PreserveOverprintSettings true /StartPage 1 /SubsetFonts true /TransferFunctionInfo /Apply /UCRandBGInfo /Preserve /UsePrologue false /ColorSettingsFile (None) /AlwaysEmbed [ true /AndaleMono /Apple-Chancery /Arial-Black /Arial-BoldItalicMT /Arial-BoldMT /Arial-ItalicMT /ArialMT /CapitalsRegular /Charcoal /Chicago /ComicSansMS /ComicSansMS-Bold /Courier /Courier-Bold /CourierNewPS-BoldItalicMT /CourierNewPS-BoldMT /CourierNewPS-ItalicMT /CourierNewPSMT /GadgetRegular /Geneva /Georgia /Georgia-Bold /Georgia-BoldItalic /Georgia-Italic /Helvetica /Helvetica-Bold /HelveticaInserat-Roman /HoeflerText-Black /HoeflerText-BlackItalic /HoeflerText-Italic /HoeflerText-Ornaments /HoeflerText-Regular /Impact /Monaco /NewYork /Palatino-Bold /Palatino-BoldItalic /Palatino-Italic /Palatino-Roman /SandRegular /Skia-Regular /Symbol /TechnoRegular /TextileRegular /Times-Bold /Times-BoldItalic /Times-Italic /Times-Roman /TimesNewRomanPS-BoldItalicMT /TimesNewRomanPS-BoldMT /TimesNewRomanPS-ItalicMT /TimesNewRomanPSMT /Trebuchet-BoldItalic /TrebuchetMS /TrebuchetMS-Bold /TrebuchetMS-Italic /Verdana /Verdana-Bold /Verdana-BoldItalic /Verdana-Italic /Webdings ] /NeverEmbed [ true ] /AntiAliasColorImages false /CropColorImages true /ColorImageMinResolution 150 /ColorImageMinResolutionPolicy /OK /DownsampleColorImages true /ColorImageDownsampleType /Bicubic /ColorImageResolution 300 /ColorImageDepth -1 /ColorImageMinDownsampleDepth 1 /ColorImageDownsampleThreshold 1.10000 /EncodeColorImages true /ColorImageFilter /DCTEncode /AutoFilterColorImages true /ColorImageAutoFilterStrategy /JPEG /ColorACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /ColorImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000ColorACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000ColorImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 150 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.10000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /GrayImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000GrayACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000GrayImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 1200 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.08250 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict << /K -1 >> /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName (http://www.color.org) /PDFXTrapped /Unknown /CreateJDFFile false /SyntheticBoldness 1.000000 /Description << /ENU (Use these settings to create PDF documents with higher image resolution for high quality pre-press printing. The PDF documents can be opened with Acrobat and Reader 5.0 and later. These settings require font embedding.) /JPN /FRA /DEU /PTB /DAN /NLD /ESP /SUO /NOR /SVE /KOR /CHS /CHT /ITA >> >> setdistillerparams << /HWResolution [2400 2400] /PageSize [595.000 842.000] >> setpagedevice