Knowledge Engineering and Data Science (KEDS), pISSN 2597-4602, eISSN 2597-4637, Vol 3, No 1, July 2020, pp. 28–39. https://doi.org/10.17977/um018v3i12020p28-39
©2020 Knowledge Engineering and Data Science | W: http://journal2.um.ac.id/index.php/keds | E: keds.journal@um.ac.id
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).

Earthquake Magnitude and Grid-Based Location Prediction using Backpropagation Neural Network

Bagus Priambodo a,1,*, Wayan Firdaus Mahmudy a,2, Muh Arif Rahman a,3
a Faculty of Computer Science, Brawijaya University, Jl. Veteran no. 8, Malang 65145, Indonesia
1 baguspria@student.ub.ac.id *; 2 wayanfm@ub.ac.id; 3 m_arif@ub.ac.id
* corresponding author

ARTICLE INFO
Article history: Received 30 June 2020; Revised 2 July 2020; Accepted 15 July 2020; Published online 17 August 2020

ABSTRACT
Earthquakes, a type of inevitable natural disaster, are responsible for the highest average yearly death toll among all types of natural disaster. Even though earthquakes are inevitable, their impact can be anticipated to minimize damage and casualties, for example by predicting earthquake magnitude with a neural network. In this study, a backpropagation algorithm is used to train a multilayer neural network to predict the weekly average magnitude of earthquakes in grid-based locations in Indonesia. Based on the findings of this research, the neural network is able to predict the magnitude of earthquakes in grid-based locations across Indonesia with a minimum error rate of 0.094 in 34.475 seconds. This best result is achieved when the neural network is trained for 210 epochs, with 16 neurons in the input and output layers, one hidden layer consisting of 5 neurons, and a learning rate of 0.1. The result shows that backpropagation has good generalization capability for mapping the relations between variables when a mathematical function is not explicitly available.

Keywords: Neural network; Resilient backpropagation; Prediction; Magnitude; Earthquake

I. Introduction

Natural disasters are among the inevitable kinds of disaster. They may come without prior notice and have been responsible for deaths on a massive scale [1]. The Centre for Research on Epidemiology of Disasters (CRED) reported an average of 77,144 deaths per year caused by natural disasters from 2000 to 2017 [2]. Natural disasters caused by seismic activity (earthquakes, tsunamis, and volcanic activity) disrupted 3.4 million lives in 2018 [2]. Earthquakes have caused the most deaths every year compared to other types of natural disaster, such as drought, flood, landslide, and wildfire, with a toll of 46,173 lives [2]. Even though earthquakes are inevitable, their impact can be anticipated to minimize damage and casualties.

Past research has been conducted to predict the level of impact caused by earthquakes in real time [2]. One line of this work concerns early earthquake warning systems (EEWS), which give an alert when an earthquake is detected [3]. Numerous architectures and algorithms have been developed in those studies: a neural network trained with the backpropagation algorithm and optimized using the Levenberg-Marquardt optimization method (LOM) to predict hypocenter location, moment magnitude, and the expansion of the earthquake [4]; a modification of LOM to minimize error in an EEWS [3]; and a neural tree to pick P and S waves [5].

To the best of the authors' knowledge, no study has yet used a backpropagation (BP) algorithm to predict earthquake magnitude and grid-based location in Indonesia. This algorithm is chosen because it has been proven to perform well on a broad range of problems, such as regression, pattern recognition, and prediction [6][7][8][9]. This study aims to measure the performance of a neural network trained with the backpropagation algorithm in predicting earthquake magnitude and grid-based location, using earthquake magnitude and location data recorded in Indonesia from 2000 to 2019.

II. Methods

Numerous studies on the application of neural networks to predict incoming natural disasters have been conducted before. One of them predicted seismic activity on a large scale (magnitude greater than 5 on the Richter scale) [6]. That research reports acceptable accuracy for a 3-layer perceptron model trained using backpropagation, which achieved 80.55%. Another study addressed tsunami forecasting [10] using multilayer perceptron neural networks and the backpropagation algorithm; it shows that backpropagation is much faster than many conventional models and produced high accuracy in predicting the height and travel time of a tsunami based on earthquake location and size. Neural networks are often hybridized with other techniques or algorithms. One example is a study from 2006 [5], which used a neural tree to pick P and S waves faster and more accurately, achieving a precision score of 0.96.

Numerous studies have shown the performance of neural networks in predicting earthquakes. Neural networks have been applied to an EEWS trained using backpropagation with a modified Levenberg-Marquardt algorithm to minimize the error rate of the system [3]; the error minimized was that of seismic data amplitude prediction based on the 1999 Chi-Chi earthquake in Taiwan. Other studies showed increased reliability and responsiveness of an EEWS when a neural network is applied [11]. A theoretical study [4] showed that a prediction of hypocenter location, moment magnitude, and rupture size of an earthquake can be generated almost instantaneously as soon as the P wave is picked up.

A. Neural Network and Backpropagation

A neural network is a network of connected neurons that process inputs and produce activation values [12]. The network mimics the structure and workings of the human brain, with the goal of an artificial intelligence that is almost as intelligent as a human: receiving, adapting, and transferring known and new knowledge and skills as a lifelong learning process [13]. A neural network consists of an input layer, zero or more hidden layers, and an output layer. Information on neurons is propagated through each layer, starting from the input layer all the way to the output layer [14][15].
The propagation is done by calculating a weighted sum on each neuron, which is then used as the input to the chosen activation function. The result of the activation function is then propagated to the neurons in the next layer as inputs [11]. Common examples of activation functions are the sigmoid function (1) and the hyperbolic tangent (tanh) function (2) [16].

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (2)

Backpropagation helps a neural network learn the relationship between variables without explicitly defining the mathematical function behind that relationship [17]. A neural network trained using backpropagation is based on gradient descent [14]. In the backpropagation algorithm, each neuron in the hidden and output layers processes its own inputs using the sum product of the neuron's input values and weights, which is then passed through an activation function (the sigmoid is the most popular) [17]. The error is then calculated backward from the output layer to the input layer for the weight update process; this is what is called backpropagation. The weight update usually uses gradient descent to direct the weight changes toward the minimum error [18]. During the weight update, a parameter called the learning rate (\eta) determines the "step width" of the update, which finally changes the weights so as to improve the neural network's generalization capability. The backpropagation process needs the derivatives of the activation function and of the error function to calculate the weight change. Equations (3) and (4) are the derivatives of the sigmoid and tanh functions, respectively.

\sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \qquad (3)

\tanh'(x) = 1 - \tanh^{2}(x) \qquad (4)

The algorithm starts with the initialization phase, in which all weights between neurons are randomly initialized (between 0 and 1) and the learning rate is set, usually to 0.1. After initialization, the first data record is fed to the neural network and feedforward is performed: the weighted sum is calculated as the input to the activation function on the next layer. This process is repeated until the output layer has produced the predicted value.

After the output layer has produced an output, the neural network begins its backpropagation phase. During this phase, the partial derivative of the error function with respect to each weight is calculated, since the weights contribute directly to the error. From the output layer's point of view, the weights from the hidden layer to the output layer contribute to the input of the activation function in the output layer, and hence to the error. From the hidden layer's point of view, the weights from the input layer to the hidden layer contribute to the input of the activation function in the hidden layer, which finally contributes to the error. Therefore, (5) is used to calculate the gradient (partial derivative) of the error with respect to the weights between the hidden and output layers, and (6) for the weights between the input and hidden layers. Writing x_i for an input value, h_j for the output of hidden neuron j, z_j and z_k for the weighted-sum inputs of hidden neuron j and output neuron k, o_k for the output of output neuron k, and w_{jk} for the weight from hidden neuron j to output neuron k:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial z_k}{\partial w_{jk}} \cdot \frac{\partial o_k}{\partial z_k} \cdot \frac{\partial E}{\partial o_k} = h_j \, \sigma'(z_k) \, \frac{\partial E}{\partial o_k} \qquad (5)

\frac{\partial E}{\partial w_{ij}} = x_i \, \sigma'(z_j) \sum_{k=1}^{m} w_{jk} \, \sigma'(z_k) \, \frac{\partial E}{\partial o_k} \qquad (6)

In (5), \partial E / \partial w_{jk} is the partial derivative of the error with respect to a weight between the hidden and output layers. It consists of \partial z_k / \partial w_{jk}, the partial derivative of the input to the output layer (the weighted-sum value) with respect to the weight; \partial o_k / \partial z_k, the partial derivative of the activation function in the output layer with respect to its input; and \partial E / \partial o_k, the partial derivative of the error function with respect to the output. When each component is broken down, the final result is as in (5): the product of the output of the hidden layer (h_j), the derivative of the activation function in the output layer (for example the sigmoid derivative, \sigma'(z_k)), and the derivative of the error function (7). Equation (6) is similar to (5) but gives the partial derivative for weights between the input and hidden layers. When the formula is broken down, its components are the input value from the training data (x_i), the partial derivative of the activation function in the hidden layer (\sigma'(z_j)), and the partial derivative of the error with respect to the output of the hidden layer, which needs further calculation: it is the sum over all m output neurons (m denotes the number of output neurons) of the partial derivatives of the error contributed through each output neuron. Thus, this last component is the product of the weights between the hidden and output layers with the terms \sigma'(z_k)\,\partial E / \partial o_k from (5). The partial derivatives for the biases are calculated likewise, but h_j in (5) and x_i in (6) are replaced by 1, because the derivative is taken with respect to the bias.

The derivative of the error function used in this study (MAE) is shown in (7). This function can be differentiated only when the difference between the predicted and target values is not 0; in (5) only one prediction and one target value are used, because (5) is differentiated with respect to a single output neuron. For programming purposes, to eliminate the possibility of an unhandled case, the derivative is taken to be 1 when the prediction is greater than or equal to the target.

\frac{\partial E}{\partial o_k} = \begin{cases} 1, & o_k \geq t_k \\ -1, & o_k < t_k \end{cases} \qquad (7)

After the partial derivative for each weight (biases included) has been calculated, the weights are updated following (8): the new weight for the next iteration, w(t+1), is the current weight w(t) reduced by the product of the learning rate (\eta) and the partial derivative of the error. This feedforward, backpropagation, and weight update process is repeated until the maximum number of epochs has been reached, where one epoch is counted when all of the training data has been fed to the neural network.

w(t+1) = w(t) - \eta \, \frac{\partial E}{\partial w} \qquad (8)
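As an illustration of equations (1) to (8), the following minimal sketch (not the authors' original code; the layer sizes, the omission of biases, and names such as train_step are assumptions for illustration) performs one feedforward pass and one gradient-descent weight update for a single-hidden-layer network with sigmoid activations and the piecewise MAE derivative of (7):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # equation (1)

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)                      # equation (3)

def train_step(x, t, w1, w2, lr=0.1):
    """One feedforward + backpropagation step; biases omitted for brevity.
    x: input vector, t: target vector, w1/w2: weight matrices (updated in place)."""
    # feedforward: weighted sums, then activations
    z1 = w1 @ x           # input to hidden layer
    h = sigmoid(z1)       # hidden layer output
    z2 = w2 @ h           # input to output layer
    o = sigmoid(z2)       # predicted output

    # equation (7): derivative of MAE per output, 1 when prediction >= target
    dE_do = np.where(o >= t, 1.0, -1.0)

    # equation (5): gradient for hidden-to-output weights
    delta_out = dE_do * dsigmoid(z2)
    grad_w2 = np.outer(delta_out, h)

    # equation (6): gradient for input-to-hidden weights
    delta_hidden = (w2.T @ delta_out) * dsigmoid(z1)
    grad_w1 = np.outer(delta_hidden, x)

    # equation (8): gradient-descent weight update
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1
    return o

# toy usage: 16 inputs, 5 hidden neurons, 16 outputs, weights in (0, 1)
rng = np.random.default_rng(0)
w1 = rng.random((5, 16))
w2 = rng.random((16, 5))
train_step(rng.random(16), rng.random(16), w1, w2)
```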
B. Feature Extraction of Earthquakes

An earthquake is a geological phenomenon that happens because of the shifting of the earth's plates, caused by excessive pressure that the earth's crust cannot withstand [19]. This excessive pressure results in a release of energy in the form of waves that propagate through the earth's crust, causing shaking that people can feel [20]. Those waves are picked up by seismographs. There are two types of waves: the P wave, the fastest wave, which is received first by the seismograph, followed by the S wave, which is stronger but slower than the P wave [21]. In a seismograph, these two waves produce data in three components: vertical, north-south, and east-west motion [22].
Based on P wave, S wave, epicenter, magnitude, and peak ground acceleration (PGA) data, an early warning can be issued to civilians and emergency action can be taken earlier; this is the purpose of an early earthquake warning system (EEWS).

In this study, the magnitude and location of earthquakes that happened in Indonesia from January 1, 2000 to December 31, 2019 are used as input features. They were chosen based on a previous study [23], which used four features: earthquake number, location (represented as numbers corresponding to grids), magnitude, and hypocenter depth. Date and time were not used in that study because a preliminary statistical analysis showed they were not representative enough as features; in fact, they contained too much unrelated information. Detailed location data (coordinates) was simplified into grids representing an area divided into 16 sub-areas, so the location feature is an integer ranging from 0 to 15. Following that approach, the current study also divides its location data into 16 grids, represented by an integer from 0 to 15, as seen in Figure 1. The area is divided into 16 rectangles of almost equal area, arranged in 2 rows and 8 columns: the latitude range from -10.909° to 5.907° is divided into 2 sectors, and the longitude range from 95.206° to 140.976° is divided into 8 sectors.

In another study, the magnitude of past earthquakes was used to predict earthquake magnitude for the following day [6]. In that study, location was not used as a feature because it was assumed to be known beforehand. Therefore, the magnitude and location (represented as grid numbers) of an earthquake are used as the input features in this study. Magnitude is normalized using min-max normalization to match the characteristics of the sigmoid and tanh functions, producing values in a more appropriate range. The result can then be denormalized to obtain the actual magnitude value predicted by the algorithm.

x' = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (9)

Fig. 1. Grid numbering as the location feature

In (9), the min-max normalization equation is shown. The variable x is the magnitude value to be normalized, min(x) is the minimum magnitude over all the data, and max(x) is the maximum magnitude over all the data. From (9), the denormalization equation can be inferred as in (10).

x = x' \left[ \max(x) - \min(x) \right] + \min(x) \qquad (10)

C. Evaluation Metrics

There are many methods that can be used to evaluate the performance of a neural network. In this study, the Mean Absolute Error (MAE) is used as the evaluation metric; it measures the distance between the predicted value and the target value. This metric is used based on the studies in [6] and [23]. The equation for MAE is given in (11), where n is the feature count, t_i is the target value of the i-th feature, and o_i is the predicted value of the i-th feature.

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| t_i - o_i \right| \qquad (11)

D. Proposed Method

The data is queried from the United States Geological Survey (USGS) website and obtained as a comma-separated values (CSV) file. The data collected covers earthquakes that happened in Indonesia, specifically inside these boundaries: latitude from -10.909° to 5.907° and longitude from 95.206° to 140.976°.
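These boundaries also delimit the 16 grids of Figure 1. As a concrete illustration, the following sketch (hypothetical helper names; the paper specifies only the boundaries, so the row-major 2 × 8 grid ordering here is an assumption) maps a coordinate to a grid number and applies the min-max normalization of (9) and denormalization of (10):

```python
LAT_MIN, LAT_MAX = -10.909, 5.907     # 2 latitude sectors (rows)
LON_MIN, LON_MAX = 95.206, 140.976    # 8 longitude sectors (columns)

def to_grid(lat, lon):
    """Map a coordinate to one of 16 grid numbers (0..15).
    Row/column ordering is an assumption for illustration."""
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * 2), 1)
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * 8), 7)
    return row * 8 + col

def minmax_normalize(x, x_min, x_max):
    return (x - x_min) / (x_max - x_min)          # equation (9)

def denormalize(y, x_min, x_max):
    return y * (x_max - x_min) + x_min            # equation (10)

print(to_grid(-7.8, 110.4))                       # sample coordinate in Java
print(minmax_normalize(4.9, 3.0, 8.0))            # sample magnitude bounds
```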
Earthquake events from January 1, 2000 up to January 1, 2019 are used as training data (36,453 records), and earthquake events from January 2, 2019 up to December 31, 2019 are used as testing data (2,358 records). The data obtained (38,811 records in total) is seismic data with 22 columns/attributes, 4 of which are taken as features in this study. Those 4 attributes are detailed in Table 1.

Table 1. Data attribute details
No | Name      | Description                                                    | Data Type    | Value
1  | time      | Date and time (UTC, in milliseconds) when the event occurred. | Long integer | [2000-01-02T12:46:58.770Z, 2019-12-31T09:50:41.876Z]
2  | latitude  | Decimal degrees latitude.                                      | Decimal      | [-90.0, 90.0]
3  | longitude | Decimal degrees longitude.                                     | Decimal      | [-180.0, 180.0]
4  | mag       | Magnitude of the event.                                        | Decimal      | [-1.0, 10.0]

After the data is obtained, two phases of preprocessing are performed. In the first phase, an 'id' attribute is added to index each record, and the 'time' column is converted to a 'date' column, because only the date is used as a feature. The 'latitude' and 'longitude' attributes are then mapped into grid numbers (as explained above) and represented as the 'grid' feature. The last feature, 'mag', is taken as-is and represents the magnitude of the event. A snippet of the first 5 records after this first phase of preprocessing can be seen in Table 2.

Table 2. Preliminary phase of preprocessing on the first 5 records
Id | Date     | Grid | Mag
1  | 1/2/2000 | 6    | 4.9
2  | 1/2/2000 | 6    | 4.4
3  | 1/3/2000 | 6    | 4.7
4  | 1/4/2000 | 14   | 4.4
5  | 1/4/2000 | 5    | 3.9

After the preliminary phase, the final phase creates a dataset that is ready to be used by the neural network. In this phase, the events are grouped into weekly periods, and for each week the average magnitude is calculated per grid; this becomes the 'avg_mag' feature, which is then normalized using min-max normalization. The final feature added is the 'target' feature, which is the 'avg_mag' of the following week. A snippet of the first week of preprocessed data (earthquake events from January 2, 2000 to January 8, 2000) can be seen in Table 3.

Table 3. Final phase of preprocessing on the first week of data
Grid | Avg_mag | Target
1    | 0.699   | 0.000
2    | 0.000   | 0.000
3    | 0.000   | 0.000
4    | 0.000   | 0.000
5    | 0.582   | 0.000
6    | 0.627   | 0.594
7    | 0.658   | 0.603
8    | 0.000   | 0.637
9    | 0.000   | 0.000
10   | 0.767   | 0.616
11   | 0.676   | 0.621
12   | 0.651   | 0.000
13   | 0.000   | 0.000
14   | 0.589   | 0.548
15   | 0.562   | 0.644
16   | 0.616   | 0.000

The neural network consists of 16 neurons in the input layer, 16 neurons in the output layer, and one hidden layer. Only one hidden layer is used because numerous studies have shown that a single hidden layer already gives good approximation of any continuous function [17]. Each neuron in the input and output layer represents the average magnitude in one grid: the input layer receives the average magnitudes of earthquake events in a week, and the output layer predicts the average magnitude for each grid in the following week. Each neuron in the hidden and output layer has a bias with a uniform value of 1.
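A minimal sketch of the final preprocessing phase described above, assuming pandas, the column names of Table 2, and grids numbered 0 to 15 (the paper does not specify its exact week boundaries, so calendar weeks are an assumption here):

```python
import pandas as pd

# toy rows in the Table 2 layout: columns 'date', 'grid', 'mag'
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-02", "2000-01-02", "2000-01-03"]),
    "grid": [6, 6, 14],
    "mag":  [4.9, 4.4, 4.7],
})

# label each event with its week, then average magnitude per (week, grid);
# grids with no events in a week get 0.0
df["week"] = df["date"].dt.to_period("W")
weekly = (df.groupby(["week", "grid"])["mag"].mean()
            .unstack(fill_value=0.0)                    # one column per grid
            .reindex(columns=range(16), fill_value=0.0))

# inputs: this week's 16 averages; targets: next week's 16 averages
# (min-max normalization per equation (9) would follow)
X = weekly.iloc[:-1].to_numpy()
y = weekly.iloc[1:].to_numpy()
```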
III. Results and Discussions

In order to achieve the best result, four components of the neural network configuration are tested: first the best maximum number of epochs allowed, then the learning rate, then the number of neurons needed in the hidden layer, and lastly the activation function to be used (sigmoid or tanh).

In the maximum epochs testing, the maximum number of epochs allowed is tested starting from 50 and increased by 10 until no significant change in error rate occurs. For this testing, the neural network uses 5 neurons in the hidden layer, the sigmoid function, and a learning rate of 0.1, and all weights (except the biases) are initialized with a uniform value of 0.5. Each maximum-epochs setting is tested ten times, and the averages of the error rate and the training duration are taken to determine how many epochs produce the best result. Figure 2 shows that the lowest average error rate achieved is 0.093 when the neural network is trained for 210 epochs, while the highest average error rate is 0.094 when it is trained for 160 epochs. Figure 3 shows that training for 50 epochs took only 12.93 seconds, but training for 500 epochs took 134.49 seconds.

Another parameter that may contribute to the neural network's generalization capability is the learning rate. If it is too large, the optimum may be missed; if it is too small, the learning process is too slow. In the learning rate testing, the learning rate is tested starting from 0.1 and increased by 0.1 until no significant change in error rate occurs. For this testing, the neural network is trained with the sigmoid function for 210 epochs (the best result from the previous testing) using 5 neurons in the hidden layer, and all weights (except the biases) are initialized with a uniform value of 0.5. Each learning rate is tested ten times and the average is taken. Figure 4 shows that the lowest average error rate achieved is 0.093 with a learning rate of 0.1, while the highest average error rate is 0.103 with a learning rate of 1.0. The average training duration is not measured in this testing phase, since the learning rate does not directly affect the training duration in terms of computation time.

The next test measures the number of neurons needed in the hidden layer, tested starting from 5 neurons and increased by 2 until no significant change in error rate occurs. For this testing, the neural network is trained with the sigmoid function for 210 epochs (the best result from the previous testing) with a learning rate of 0.1, and all weights (except the biases) are initialized with a uniform value of 0.5. Each neuron count is tested ten times, and the averages of the error rate and the training duration are taken to determine how many hidden neurons produce the best result. Figure 5 shows that the lowest average error rate achieved is 0.093 with 5 neurons in the hidden layer, while the highest average error rate is 0.095 with 23 neurons. The average training duration, shown in Figure 6, behaves like the result of the previous testing: as the number of neurons increases, the training duration also increases. The best training time achieved is 57.23 seconds with only 5 neurons, and the worst is 75.06 seconds with 23 neurons.
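The repeated-trials protocol used throughout these tests (each setting run ten times, then averaged) might be organized as in the following sketch; train_and_evaluate is an assumed helper that trains the network once with the given setting and returns the test-set error rate:

```python
import statistics
import time

def sweep(param_name, values, train_and_evaluate, repeats=10):
    """For each candidate value, run `repeats` training runs and report the
    average error rate and average duration, mirroring the testing protocol
    described above. `train_and_evaluate` is an assumed helper."""
    results = {}
    for v in values:
        errors, durations = [], []
        for _ in range(repeats):
            start = time.perf_counter()
            errors.append(train_and_evaluate(**{param_name: v}))
            durations.append(time.perf_counter() - start)
        results[v] = (statistics.mean(errors), statistics.mean(durations))
    return results

# e.g. maximum epochs tested from 50 upward in steps of 10:
# sweep("max_epochs", range(50, 510, 10), train_and_evaluate)
```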
Fig. 2. Average testing error rate for each maximum-epochs value
Fig. 3. Average training duration for each maximum-epochs value
Fig. 4. Average testing error rate for each learning rate value
Fig. 5. Average testing error rate for each neuron count in the hidden layer
Fig. 6. Average training duration for each neuron count in the hidden layer

The activation function testing compares the sigmoid function with the tanh function. For this testing, the neural network is trained for 210 epochs with a learning rate of 0.1 and 5 neurons in the hidden layer (the best result from the previous testing), but all weights (except the biases) are initialized with random values. Each activation function is tested ten times and the average is taken. The sigmoid function produced an average error rate of 0.094, better than the tanh function, which produced an average error rate of 0.881. The sigmoid function is also slightly better in terms of training duration, taking 60.81 seconds compared to 60.91 seconds with tanh. Thus, the sigmoid function is used as the activation function.

Based on the testing results, the neural network achieves its best performance, an error rate of 0.094 in 60.81 seconds, when trained with the sigmoid activation function for 210 epochs using 5 neurons in the hidden layer and a learning rate of 0.1.

As shown in Figure 2, there is an early tendency for the error rate to improve as the maximum number of epochs increases. This is also shown in Figure 7, which plots the error rate on the training data at each epoch: as the network learns the data over and over, its error rate decreases. This indicates that the backpropagation algorithm adapts to the nature of the data more accurately as it receives more training [24]. This is where the weight adaptation process in backpropagation takes part: through it, the network finally produces weights that are well suited to the training data and reaches better generalization capability when tested against new data in the testing phase.

As the learning rate increases, the performance of the neural network worsens. This may happen because the step width becomes so large that the local minimum is missed. It is also shown that increasing the number of neurons in the hidden layer does not improve performance. This may be the result of overfitting, a condition in which the neural network's generalization capability becomes weak and good results are achieved only on the training data; consequently, when the neural network is tested on the testing data, the error rate is higher.

The duration needed for the neural network to predict earthquake event magnitudes (for events in 2019) is the total duration from the training phase through the testing phase. The average prediction duration is 0.005 seconds, so the overall duration needed is 60.81 seconds (average training duration) plus 0.005 seconds (prediction duration): 60.815 seconds. This is the total duration needed to predict 51 rows of earthquake event magnitude data based on training on 988 rows of data, with 16 features each.
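For reference, a small sketch of how the error figures reported below might be computed from denormalized predictions: the MAE of (11) plus a count of predictions missing the target by 1.0 or more (the sample values are taken loosely from Table 4; this is an illustration, not the authors' evaluation code):

```python
import numpy as np

def evaluate(pred, target):
    """Return the MAE (equation (11)) and the number of large misses
    (absolute error of 1.0 or more), as in the error breakdown below."""
    abs_err = np.abs(pred - target)
    return abs_err.mean(), int((abs_err >= 1.0).sum())

pred = np.array([4.524, 0.001, 4.398, 0.004])
target = np.array([4.767, 0.000, 4.200, 4.400])
print(evaluate(pred, target))
```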
Table 4 shows a snippet of the prediction and target comparison for the first two weeks. In the prediction results, 12.99% of the predictions (106 events) missed the target by between 1 and 5.2, and the rest (about 87.01%, or 710 events) missed the target by less than 1, with the minimum recorded error being 2.56 × 10^-4.

Fig. 7. Error rate during training on each epoch

IV. Conclusion

There are two key findings in this study. First, to build input features based on the magnitude and location of earthquake events, detailed location information (such as latitude and longitude) needs to be mapped into grids, and the magnitude is then averaged weekly for each grid; the weekly average magnitude per grid forms the input features. Second, the lowest error rate achieved by the backpropagation algorithm (trained for 210 epochs with the sigmoid activation function, 5 neurons in the hidden layer, and a learning rate of 0.1) in predicting the magnitude of earthquake events in the following week is 0.094, with 60.815 seconds needed for the neural network to learn from 988 rows of data and predict 51 rows of data. Based on these key findings, further study is recommended on the importance and impact of other input features, on the configuration of the neural network, and on hybridization with other algorithms, which could further improve the accuracy of earthquake prediction within a small time window and thus increase preparedness.

Table 4. Comparison of prediction and target produced by the backpropagation algorithm
Date      | Grid | Prediction | Target
1/9/2019  | 1    | 4.524      | 4.767
1/9/2019  | 2    | 0.001      | 0.000
1/9/2019  | 3    | 0.001      | 0.000
1/9/2019  | 4    | 0.001      | 0.000
1/9/2019  | 5    | 4.398      | 4.200
1/9/2019  | 6    | 4.571      | 4.467
1/9/2019  | 7    | 0.004      | 4.400
1/9/2019  | 8    | 0.005      | 0.000
1/9/2019  | 9    | 0.002      | 0.000
1/9/2019  | 10   | 4.510      | 5.000
1/9/2019  | 11   | 4.285      | 4.300
1/9/2019  | 12   | 4.317      | 0.000
1/9/2019  | 13   | 4.482      | 4.540
1/9/2019  | 14   | 4.350      | 4.488
1/9/2019  | 15   | 4.308      | 4.467
1/9/2019  | 16   | 4.515      | 0.000
1/16/2019 | 1    | 4.519      | 4.300
1/16/2019 | 2    | 0.001      | 4.300
1/16/2019 | 3    | 0.001      | 0.000
1/16/2019 | 4    | 0.001      | 0.000
1/16/2019 | 5    | 4.386      | 4.433
1/16/2019 | 6    | 4.568      | 4.688
1/16/2019 | 7    | 0.003      | 0.000
1/16/2019 | 8    | 0.004      | 0.000
1/16/2019 | 9    | 0.001      | 0.000
1/16/2019 | 10   | 4.542      | 4.350
1/16/2019 | 11   | 4.302      | 4.300
1/16/2019 | 12   | 4.410      | 4.200
1/16/2019 | 13   | 4.499      | 4.975
1/16/2019 | 14   | 4.364      | 4.443
1/16/2019 | 15   | 4.316      | 4.450
1/16/2019 | 16   | 4.475      | 0.000

Declarations

Author contribution. All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.
Funding statement. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest. The authors declare no conflict of interest.
Additional information. No additional information is available for this paper.

References

[1] D. Guha-Sapir and F. Vos, "Earthquakes, an Epidemiological Perspective on Patterns and Trends," Advances in Natural and Technological Hazards Research, pp. 13–24, Dec. 2010, doi: 10.1007/978-90-481-9455-1_2.
[2] Centre for Research on Epidemiology of Disasters (CRED), "Press release: EMBARGO 11.00 CET, JANUARY 24," 2019. https://cred.be/sites/default/files/PressReleaseReview2018.pdf
[3] J. W. Lin, C. T. Chao, and J. S. Chiou, "Backpropagation neural network as earthquake early warning tool using a new modified elementary Levenberg-Marquardt Algorithm to minimise backpropagation errors," Geosci. Instrumentation, Methods Data Syst., vol. 7, no. 3, pp. 235–243, 2018, doi: 10.5194/gi-7-235-2018.
[4] M. Böse, F. Wenzel, and M. Erdik, "PreSEIS: A neural network-based approach to earthquake early warning for finite faults," Bull. Seismol. Soc. Am., vol. 98, no. 1, pp. 366–382, 2008, doi: 10.1785/0120070002.
[5] S. Gentili and A. Michelini, "Automatic picking of P and S phases using a neural tree," J. Seismol., vol. 10, no. 1, pp. 39–63, 2006, doi: 10.1007/s10950-006-2296-6.
[6] M. Moustra, M. Avraamides, and C. Christodoulou, "Artificial neural networks for earthquake prediction using time series magnitude data or Seismic Electric Signals," Expert Syst. Appl., vol. 38, no. 12, pp. 15032–15039, Nov. 2011, doi: 10.1016/j.eswa.2011.05.043.
[7] N. R. Sari, W. F. Mahmudy, and A. P. Wibawa, "Backpropagation on neural network method for inflation rate forecasting in Indonesia," Int. J. Adv. Soft Comput. its Appl., vol. 8, no. 3, 2016.
[8] F. A. Huda, W. F. Mahmudy, and H. Tolle, "Android malware detection using backpropagation neural network," Indones. J. Electr. Eng. Comput. Sci., vol. 4, no. 1, 2016, doi: 10.11591/ijeecs.v4.i1.pp240-244.
[9] H. Aini and H. Haviluddin, "Crude Palm Oil Prediction Based on Backpropagation Neural Network Approach," Knowl. Eng. Data Sci., vol. 2, no. 1, pp. 1–9, 2019, doi: 10.17977/um018v2i12019p1-9.
[10] M. Romano et al., "Artificial neural network for tsunami forecasting," J. Asian Earth Sci., vol. 36, no. 1, pp. 29–37, 2009, doi: 10.1016/j.jseaes.2008.11.003.
[11] C. J. Lin, Z. Shen, and S. Huang, "Predicting Structural Response with On-Site Earthquake Early Warning System Using Neural Networks," no. 226, 2011.
[12] J. Schmidhuber, "Deep Learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015, doi: 10.1016/j.neunet.2014.09.003.
[13] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, "Continual lifelong learning with neural networks: A review," Neural Networks, vol. 113, pp. 54–71, May 2019, doi: 10.1016/j.neunet.2019.01.012.
[14] G. T. Hicham, E. A. Chaker, and E. Lotfi, "Comparative study of neural networks algorithms for cloud computing CPU scheduling," Int. J. Electr. Comput. Eng., vol. 7, no. 6, pp. 3570–3577, 2017, doi: 10.11591/ijece.v7i6.pp3570-3577.
[15] C. Dewi, S. Sundari, and M. Mardji, "Texture Feature On Determining Quantity of Soil Organic Matter For Patchouli Plant Using Backpropagation Neural Network," J. Inf. Technol. Comput. Sci., vol. 4, no. 1, pp. 1–14, 2019, doi: 10.25126/jitecs.20194168.
[16] K. Chandrasekaran and S. P. Simon, "Binary/real coded particle swarm optimization for unit commitment problem," in International Conference on Power, Signals, Controls and Computation, Jan. 2012, no. 3, pp. 1–6, doi: 10.1109/EPSCICON.2012.6175240.
[17] A. T. C. Goh, "Back-propagation neural networks for modeling complex systems," Artif. Intell. Eng., vol. 9, no. 3, pp. 143–151, Jan. 1995, doi: 10.1016/0954-1810(94)00011-S.
[18] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: the RPROP algorithm," in IEEE International Conference on Neural Networks, 1993, pp. 586–591, doi: 10.1109/ICNN.1993.298623.
[19] K. Mogi, "Earthquake Prediction in Japan," J. Phys. Earth, vol. 43, no. 5, pp. 533–561, 1995, doi: 10.4294/jpe1952.43.533.
[20] U.S. Geological Survey, "What is an earthquake and what causes them to happen?," U.S. Department of the Interior, 2019. https://www.usgs.gov/faqs/what-earthquake-and-what-causes-them-happen
[21] Incorporated Research Institutions for Seismology (IRIS), "Seismic Wave Behavior — Effect on Buildings." https://www.iris.edu/hq/files/programs/education_and_outreach/aotm/6/SeismicWaveBehavior_Building.pdf
[22] Incorporated Research Institutions for Seismology (IRIS), "3-Component Seismograph," 2017. https://www.iris.edu/hq/files/programs/education_and_outreach/aotm/9/3-ComponentSeismograph.pdf
[23] A. S. N. Alarifi, N. S. N. Alarifi, and S. Al-Humidan, "Earthquakes magnitude predication using artificial neural network in northern Red Sea area," J. King Saud Univ. - Sci., vol. 24, no. 4, pp. 301–313, Oct. 2012, doi: 10.1016/j.jksus.2011.05.002.
[24] I. Wahyuni, N. R. Adam, W. F. Mahmudy, and A. Iriany, "Modeling backpropagation neural network for rainfall prediction in Tengger, East Java," in Proceedings - 2017 International Conference on Sustainable Information Engineering and Technology (SIET), 2018, doi: 10.1109/SIET.2017.8304130.