ABSTRACT
Accurately detecting outliers online is difficult when massive amounts of data are collected in real time in a strongly noisy environment, as in the complex geological mineral grade analysis process. To address this, an order self-learning ARHMM (Autoregressive Hidden Markov Model) algorithm is proposed for online outlier detection in this process. The algorithm uses an AR model to fit the time series obtained from the “Online X-ray Fluorescent Mineral Analyzer” and uses an HMM as the basic detection tool, which avoids the need to preset a threshold as traditional detection methods do. The structure of the traditional BDT (Brockwell-Dahlhaus-Trindade) algorithm is improved into a double iterative structure, in which iterative calculation over both time and model order is applied to update the ARHMM parameters online. To reduce the influence of outliers on the ARHMM parameter update, detection-before-update and detection-based-update strategies are adopted, which also improve the robustness of the algorithm. Simulations with model data and a practical application verify the accuracy, robustness, and online detection capability of the algorithm. The results indicate that the proposed algorithm is better suited to outlier detection in mineral grade analysis data in geology and mineral processing.
Keywords: ARHMM; BDT; KICvc; outlier detection; online detection.
Online Outlier Detection for Time-varying Time Series on Improved ARHMM in Geological Mineral Grade Analysis Process
ISSN 1794-6190 e-ISSN 2339-3459
http://dx.doi.org/10.15446/esrj.v21n3.65215
Jianjun Zhao a,b, Junwu Zhou b, Weixing Su c*, Fang Liu c
a School of Information Science & Engineering, Northeastern University, Shenyang 110004, China
b Beijing General Research Institute of Mining & Metallurgy, Beijing 100160, China
c School of Computer Science & Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China
*Email of Corresponding Author: 15900201597@163.com
Record
Manuscript received: 21/02/2017
Accepted for publication: 28/07/2017
How to cite item
Zhao, J., Zhou, J., Su, W., & Liu, F. (2017). Online Outlier Detection for Time-varying Time Series on Improved ARHMM in Geological Mineral Grade Analysis Process. Earth Sciences Research Journal, 21(3), 135-139. doi: http://dx.doi.org/10.15446/esrj.v21n3.65215
GEOSTATISTICS
EARTH SCIENCES RESEARCH JOURNAL
Earth Sci. Res. J. Vol. 21, No. 3 (September, 2017): 135-139

Introduction
Mineral composition analysis is a key factor in deciding whether to carry out mining. Over the years, many researchers have proposed new ideas and methods for accurate mineral grade assessment, many of which rely on chemical or physical test equipment to obtain the data for grade analysis (Kameshwara, Rao, & Narayana, 2014; De'nan, Naaim, & Leong, 2017). The accuracy of the data used for ore composition analysis is therefore critical to ore grade analysis. Automated test equipment is now used in ore grade analysis, for example the “BOX-A type on-stream x-ray fluorescence analyzer”, which irradiates the pulp with X-rays and derives the ore grade from the resulting spectrum. Notably, the BOX-A type on-stream x-ray fluorescence analyzer assumes by default that the spectral data it acquires are correct. However, both chemical and physical testing equipment inevitably produce abnormal data.
These outliers directly affect the results of mineral product analyzers (Clarke & Levis, 1998; Rivoirard, Demange, & Freulon, 2013). Detecting and eliminating these abnormal data is therefore the prerequisite for, and the key to, the ore grade analysis work described above. This paper proposes a new algorithm specifically for outlier detection in ore inspection data obtained from chemical or physical testing equipment. The algorithm uses an AR model to fit the time series and an HMM as the basic detection tool, which avoids the need to preset a threshold as traditional detection methods do. To update the ARHMM parameters online, the structure of the traditional BDT (Brockwell-Dahlhaus-Trindade) algorithm is improved into a double iterative structure, in which iterative calculation is applied over both time and model order. To reduce the influence of outliers on the ARHMM parameter update, detection-before-update and detection-based-update strategies are adopted, which also improve the robustness of the algorithm. Simulations with model data and a practical application verify the accuracy, robustness, and online detection capability of the algorithm. The innovations of this paper are as follows:
1. Unlike other outlier detection methods (such as the traditional AR model detection method), the proposed method does not require a detection threshold to be set.
2. Because the model order of chemical or physical testing equipment is hard to determine, the new residual-based detection method learns the model order by itself.
3. To avoid the influence of outliers on the test results, detection-before-update and detection-based-update strategies are proposed.
Previous Achievements in Outlier Detection
Many ideas and methods have been put forward for the outlier detection problem. Barnett and Lewis proposed a statistics-based outlier detection method in their book Outliers in Statistical Data (Barnett & Lewis, 1994). A distance-based outlier detection method was proposed by Knorr and Ng (Knorr & Ng, 1999; Edwin & Raymond, 1998), and a density-based detection method was suggested by Ramaswamy et al. (2000). For ore inspection data, however, detection methods based on distance, density, or variance are not feasible, because ore test data require an online, real-time detection method. As research on anomaly detection has progressed, new ideas and techniques have been introduced, such as cluster analysis (Almeida & Barbosa, 2007) and neural network methods (Bullen, Cornford, & Nabney, 2003; Prakobphol & Zhan, 2008). Cluster analysis, however, is also unsuitable for online outlier detection on large volumes of test data, and neural network methods require a large amount of data for model learning. In 1995, Raghavan and Agrawal put forward the concept of a "sequence anomaly" (Han & Micheline, 2001) and proposed a deviation-based detection method (Takeuchi & Yamanishi, 2006). Because this method requires the model order to be known, it cannot be used directly for outlier detection in mineral grade analysis data.

Structure of Double Iteration in BDT
To allow the BDT algorithm to be computed online, this paper proposes an improved BDT algorithm with a double iterative structure.
Traditional BDT algorithm
The traditional BDT algorithm, proposed by Brockwell et al. (2002), is an improvement on the Levinson-Durbin algorithm. It uses all of the data in an iterative calculation over model order to obtain the forward and backward AR models of each order.
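Because BDT builds on the Levinson-Durbin algorithm, the order recursion it inherits can be illustrated with the classical scalar case. The following is a minimal Python sketch under stated assumptions, not the BDT algorithm of Brockwell et al. (2002): it handles a single scalar series rather than m-dimensional vectors, fits full rather than subset AR models, and uses biased sample autocovariances. The function names `autocovariance` and `levinson_durbin` are hypothetical. Each pass of the loop obtains the order-k coefficients from the order-(k-1) ones, which is what makes iterating over model order cheap.

```python
def autocovariance(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a scalar series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t - lag] for t in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations one model order at a time.

    Returns (coeffs, errors): coeffs[k] lists a_k(1..k) for the model
    x_t = sum_i a_k(i) x_{t-i} + e_t, and errors[k] is the prediction
    error variance of the order-k model.
    """
    coeffs = [[]]
    errors = [r[0]]
    a = []        # coefficients of the current order
    err = r[0]    # prediction error variance of the current order
    for k in range(1, order + 1):
        # Reflection (partial autocorrelation) coefficient of order k.
        acc = r[k] - sum(a[i] * r[k - 1 - i] for i in range(k - 1))
        kappa = acc / err
        # Order update: a_k(i) = a_{k-1}(i) - kappa * a_{k-1}(k-i), a_k(k) = kappa.
        a = [a[i] - kappa * a[k - 2 - i] for i in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
        coeffs.append(a)
        errors.append(err)
    return coeffs, errors


if __name__ == "__main__":
    # Exact autocovariances of an AR(1) process with coefficient 0.5.
    coeffs, errors = levinson_durbin([1.0, 0.5, 0.25, 0.125], 3)
    print(coeffs[3], errors)  # [0.5, 0.0, 0.0] [1.0, 0.75, 0.75, 0.75]
```

In the usage example, the coefficients beyond order 1 are zero and the prediction error variance stops decreasing after order 1; this levelling-off is the kind of behaviour an order-selection criterion builds on.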
Let $x_t$, t = 1, 2, …, be the test data awaiting detection, where each $x_t$ is an m-dimensional vector. The forward AR model can then be expressed as Equation 1.
(1)
Here, $\varepsilon_k(t)$ is the forward residual of the k-order model, which follows a zero-mean Gaussian distribution, and $a_k(i)$ is a coefficient of the k-order forward AR model. Similarly, the backward AR model can be written as Equation 2.
(2)
Taking the minimization of the forward and backward residuals over all data as the target, the generalized objective function is written as Equation 3 (Trindade, 2003).
(3)
Here n is the number of data points, and ω1 and ω2 are the weighting coefficient matrices of the forward and backward AR models; in the BDT algorithm, ω1 and ω2 are set to 1. $\hat{\varepsilon}_k(t)$ and $\hat{\eta}_k(t)$ are the estimated residuals of the forward and backward AR models, respectively:
(4)
(5)
where the coefficient matrices are m × m. The traditional BDT algorithm can be written in full as follows:
(6) (7) (8) (9) (10) (11) (12) (13)
In Equations 6 to 13, $\hat{U}_k$ and $\hat{V}_k$ are the estimated variances of the forward and backward noise. The initial conditions of the traditional BDT algorithm are:
(14) (15) (16)
The subscript ∅ indicates that, at the initial iteration, the model order set is empty.
Double iteration BDT algorithm
The objective function of the improved BDT algorithm is also Equation 3. The dynamic performance of the algorithm is enhanced by a forgetting factor. The improved BDT algorithm has a double-loop structure in which the model order is the inner loop and time is the outer loop. Defining part of Equation 6 as a separate quantity gives:
(17)
In Equation 7, k runs up to a preset maximum model order. Because the model parameters are time-varying, the forgetting factor is applied in the updates of the outer (time) loop:
(18) (19)
In Equation 18, $R_i^t$ is the mean of the covariance matrix of $\hat{\varepsilon}_{i-1}(t)$ and $\hat{\eta}_{i-1}(t-i)$ at time t.
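The double-loop idea (time as the outer loop, model order as the inner loop, forgetting factor applied to the time updates) can be sketched with an exponentially weighted Burg-type lattice for a scalar series. This is an illustrative assumption, not the improved BDT algorithm itself: it uses scalar data and full AR models, the forgetting factor is called `lam`, and the class `OnlineBurgLattice`, its attributes, and the `demo` helper are hypothetical names. At stage i, the weighted cross term between the order-(i-1) forward residual at time t and the order-(i-1) backward residual from the previous sample plays a role loosely analogous to $R_i^t$ in Equation 18.

```python
import random


class OnlineBurgLattice:
    """Exponentially weighted Burg-type lattice for a scalar series (sketch).

    Outer loop: one call to update() per new sample (time).
    Inner loop: lattice stages i = 1..max_order (model order).
    """

    def __init__(self, max_order, lam=0.99, eps=1e-12):
        self.K = max_order
        self.lam = lam  # forgetting factor, 0 < lam <= 1
        self.eps = eps  # avoids division by zero before data arrive
        n = max_order + 1
        self.cross = [0.0] * n      # weighted sum of 2 * f_{i-1}(t) * b_{i-1}(t-1)
        self.energy = [0.0] * n     # weighted sum of f_{i-1}(t)^2 + b_{i-1}(t-1)^2
        self.kappa = [0.0] * n      # reflection coefficients, |kappa| <= 1
        self.b_prev = [0.0] * n     # backward residuals from the previous sample
        self.fwd_power = [0.0] * n  # weighted mean square of forward residuals

    def update(self, x):
        """Process one sample; return forward residuals f_0(t)..f_K(t)."""
        f = [0.0] * (self.K + 1)
        b = [0.0] * (self.K + 1)
        f[0] = b[0] = x
        for i in range(1, self.K + 1):  # inner loop over model order
            fi, bi = f[i - 1], self.b_prev[i - 1]
            self.cross[i] = self.lam * self.cross[i] + 2.0 * fi * bi
            self.energy[i] = self.lam * self.energy[i] + fi * fi + bi * bi
            self.kappa[i] = self.cross[i] / (self.energy[i] + self.eps)
            f[i] = fi - self.kappa[i] * bi
            b[i] = bi - self.kappa[i] * fi
        for i in range(self.K + 1):
            self.fwd_power[i] = (self.lam * self.fwd_power[i]
                                 + (1.0 - self.lam) * f[i] ** 2)
        self.b_prev = b  # used as b_i(t-1) at the next time step
        return f


def demo(n=20000, seed=1):
    """Outer loop over time on a simulated AR(2) series
    x_t = 0.75 x_{t-1} - 0.5 x_{t-2} + e_t, with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    lattice = OnlineBurgLattice(max_order=3, lam=0.9995)
    x1 = x2 = 0.0
    for _ in range(n):
        x = 0.75 * x1 - 0.5 * x2 + rng.gauss(0.0, 1.0)
        lattice.update(x)
        x1, x2 = x, x1
    return lattice


if __name__ == "__main__":
    lat = demo()
    # kappa[1] should be near 0.5, kappa[2] near -0.5, kappa[3] near 0,
    # and the forward residual power should level off from order 2 onward.
    print([round(k, 3) for k in lat.kappa[1:]])
    print([round(p, 3) for p in lat.fwd_power])
```

Each call to `update` costs O(K) operations regardless of how many samples have been seen, which is what makes the structure usable online; a `lam` closer to 1 gives smoother but slower-adapting estimates of time-varying parameters.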
Similarly, Equation 13 can be rewritten as:
(20)
The calculation process of the double iteration algorithm is illustrated in Figure 1.
Figure 1. Flow chart of double iteration algorithm

Implementation of Order Self-learning ARHMM Detection Algorithm
The traditional ARHMM structure is composed of two parts (Wang & Chiang, 2008; Rabiner, 1989). The first is a Markov chain, expressed by the initial state probability π and the state transition matrix $A = [a_{ij}]$, where $S_t$ is the state at time t, N is the total number of HMM states, and $a_{ij} = P(S_t = j \mid S_{t-1} = i)$ is a conditional probability. The second is the observation probability matrix, calculated from the AR model:
(21)
In Equation 21, N(·) is the Gaussian density function and $\hat{\Sigma}_k$ is the estimated covariance of the Gaussian distribution.
The ARHMM outlier detection algorithm likewise consists of two steps.
Step one: preliminary detection
Equation 1 implies a deviation between the process data estimated by the AR model and the real process data:
(22)
If the deviation $\varepsilon_k(t)$ is pure noise, it follows a Gaussian distribution. The preliminary criterion for outlier detection is therefore the probability that the deviation follows this Gaussian distribution:
(23)
In Equation 23, $s_t = 1$ indicates that the detected data point is normal and $s_t = 0$ that it is an outlier. The detection criterion can therefore be expressed as:
(24)
In Equation 23, the subscript p is the optimal model order, calculated with the KICvc criterion:
(25)
In Equation 25, the residual term is the mean of the residual $\varepsilon_k(t)$ under each model order (Bilmes, 2006):
(26)
(27)
Step two: final detection
In the final detection, the result of the preliminary detection serves as the observed value of the HMM, and the final detection result is obtained with the Viterbi algorithm (Abd-Krim, 2006):
(28)
In the improved ARHMM algorithm, by the time the data point at time t is detected, all data before time t have already been detected. The traditional Viterbi algorithm therefore reduces to:
(29)
(30)

Parameter Updating in the Presence of Outliers
The parameters of the order self-learning ARHMM algorithm must be updated online. They are the estimated residual mean, the state transition matrix A, and the recursive quantities of the improved BDT algorithm. The update rules are as follows:
(A) Estimated residual mean: if $x_t$ is normal data, it is maintained by Equations 30 and 31; otherwise, it is not updated at time t.
(B) State transition matrix A: since the ARHMM algorithm has two states, the update is:
(31)
In Equation 31, $N(a_{ij})$ is the number of times the transition $S_{t-1} = i$, $S_t = j$ has occurred (Lou, 1995).
(C) If $x_t$ is normal data, the corresponding quantities are calculated by Equation 14; otherwise, the data mean is used in place of $x_t$.
(D) If $x_t$ is normal data, the corresponding quantity is calculated by Equation 19; otherwise, it is calculated by Equation 32:
(32)
(E) If $x_t$ is normal data, $R_i^t$ is calculated by Equation 18; otherwise, it is calculated by Equation 33:
(33)

Results and Discussion
Model-Based Validations
To verify how accurately the algorithm detects the model order, a third-order model reflecting the interference process in ore detection is used to generate the data. The data are shown in Figure 2-(a), and the order detection results in Figure 2-(b).
Figure 2. Data and results of order detection
As the order detection results show, the proposed algorithm detects the model order accurately after a short adjustment period.
To verify that the proposed algorithm can not only identify the optimal model order but also detect abnormal data accurately, we modified the open-loop model described by Alexandridis, Sarimveis, and Bafas (2003) to obtain a second data set. The modified model is as follows:
(34)
where:
(35)
As Equation 34 shows, the denominator contains time-varying parameters. To verify the robustness of the algorithm, we add 10% white noise and eight anomalies. The data are shown in Figure 3-(a), and the order detection and outlier detection results in Figure 3-(b) and Figure 3-(c).
Figure 3. Data and detection results for model data
Fig. 3-(a) shows the data to be checked, Fig. 3-(b) the estimated order, and Fig. 3-(c) the outlier detection result. The results show that, for this third-order nonlinear time-varying system, the order self-learning algorithm proposed in this paper finds the optimal model order and accurately detects all the anomalies.

Application
To further verify the practicality of the improved ARHMM outlier detection method, it is applied to the on-stream x-ray fluorescence analyzer. This type of analyzer was first developed by the Finnish company Outotec and introduced into mineral processing practice in the 1970s, and Outotec still holds more than 80% of the market. In China, the Beijing General Research Institute of Mining and Metallurgy, following the same analytical principles, developed a grade analyzer with the same function in 2014: the BOXA type on-stream x-ray fluorescence analyzer (Hekimoglu, Eernoglu, & Kalina, 2009). According to reports from abroad, increasing the measurement accuracy of the analyzer by 1% effectively improves the metal recovery rate by 0.2 or more. Improving measurement accuracy through hardware is now very difficult, or the cost is seriously disproportionate to the gain, so more researchers have turned to analytical modeling techniques. In this situation, the accuracy of the data used for model learning is essential.
The comparative tests are as follows. The first set of data is obtained by using the improved ARHMM outlier detection algorithm proposed in this paper to pre-process the spectral signal that serves as the analyzer input, and taking the resulting ore grades as the first set of data.
The second set of data is obtained by using the traditional AR outlier detection algorithm proposed by Northey, Mohr, and Mudd (2014) to pre-process the spectral signal that serves as the analyzer input, and taking the resulting ore grades as the second set of data. Finally, both sets of data are compared with the laboratory ore grade test results; the errors of the two groups of results are shown in Table 1.
Table 1. Comparison of results with and without the outlier detection process
As the table shows, using the improved ARHMM outlier detection algorithm proposed in this paper to pre-process the spectrum yields higher accuracy than the traditional AR outlier detection method, because the improved ARHMM algorithm is more robust and better suited to nonlinear systems.

Conclusions
To address the shortcomings of the ARHMM algorithm in the ore grade analysis process, an order self-learning ARHMM algorithm is proposed in this paper. Its innovations are as follows. First, unlike other outlier detection methods (such as the traditional AR model detection method), the proposed method does not require a detection threshold to be set. Second, because the model order of a control system is hard to determine, the new residual-based detection method learns the model order by itself. Third, to avoid the influence of outliers on the test results, detection-before-update and detection-based-update strategies are proposed. With these improvements, the ARHMM algorithm can be applied more accurately to the data of the geological mineral grade analysis process; in other words, the application field of the ARHMM algorithm is expanded.
Simulations with model data and a practical application verify the accuracy, robustness, and online detection capability of the algorithm. The results indicate that the proposed algorithm is better suited to outlier detection in the geological mineral grade analysis process.

Acknowledgments
This research is partially supported by the National Natural Science Foundation of China under Grants 51607122 and 61602343.

References
Abd-Krim, S. (2006). Vector Autoregressive Model-Order Selection From Finite Samples Using Kullback’s Symmetric Divergence. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(10), 2327.
Alexandridis, A., Sarimveis, H., & Bafas, G. (2003). A new algorithm for online structure and parameter adaptation of RBF networks. Neural Networks, 16(7), 1003-1017.
Almeida, J. A. S., & Barbosa, L. M. S. (2007). A new method with outlier detection and automatic clustering. Chemometrics and Intelligent Laboratory Systems, 87(2), 208.
Barnett, V., & Lewis, T. (1994). Outliers in Statistical Data. John Wiley & Sons, Chichester.
Bilmes, A. J. (2006). What HMMs Can Do. IEICE Transactions on Information and Systems, E89-D(3), 1.
Brockwell, P. J., Dahlhaus, R., & Trindade, A. A. (2002). Modified Burg Algorithms for Multivariate Subset Autoregression. Technical Report 2002-015, Department of Statistics, University of Florida.
Bullen, R. J., Cornford, D., & Nabney, I. T. (2003). Outlier detection in scatterometer data: neural network approaches. Neural Networks, 16(3-4), 419.
Clarke, B. R., & Levis, T. (1998). An outlier problem in the determination of ore grade. Journal of Applied Statistics, 25(6), 751-662.
De'nan, F., Naaim, N., & Leong, L. C. (2017). Behaviour of flush end-plate connection for perforated section. Engineering Heritage Journal, 1, 11-20.
Edwin, M. K., & Raymond, T. N. (1998). Algorithms for Mining Distance-based Outliers. Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, C. 392.
Han, J. W., & Micheline, K. (2001). Data Mining: Concepts and Techniques. Machinery Industry Press, China.
Hekimoglu, S., Eernoglu, R. C., & Kalina, J. (2009). Outlier detection by means of robust regression estimators for use in engineering science. Journal of Zhejiang University-Science A, 10(6), 909.
Kameshwara, R., Rao, C. R., & Narayana, A. C. (2014). Assessing grade domain of iron ore deposit using geostatistical modelling: A case study. Journal of the Geological Society of India, 83(5), 549-554.
Knorr, E. M., & Ng, R. T. (1999). Finding Intensional Knowledge of Distance-based Outliers. Proceedings of the Twenty-fifth International Conference on Very Large Data Bases, C. 211.
Lou, H. L. (1995). Implementing the Viterbi Algorithm. IEEE Signal Processing Magazine, 1053-5888, 42.
Northey, S., Mohr, S., & Mudd, G. M. (2014). Modeling future copper ore grade decline based on a detailed assessment of copper resources and mining. Resources, Conservation and Recycling, 83, 190-201.
Prakobphol, K., & Zhan, J. T. (2008). A novel outlier detection scheme for network intrusion detection systems. Proceedings of the Second International Conference on Information Security and Assurance, C. 555.
Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, C. 427.
Rivoirard, J., Demange, C., & Freulon, X. (2013). A Top-Cut Model for Deposits with Heavy-Tailed Grade Distribution. Mathematical Geosciences, 45(8), 967-982.
Takeuchi, J., & Yamanishi, K. (2006). A Unifying Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering, 18(4), 482.
Trindade, A. A. (2003). Implementing Modified Burg Algorithms in Multivariate Subset Autoregressive Modeling. Department of Statistics, University of Florida.
Wang, J. S., & Chiang, J. C. (2008). A cluster validity measure with outlier detection for support vector clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(1), 78.