AP06_2.vp 1 Introduction Adaptive control systems typically work in a feedback re- gime, hence the quality of the control depends heavily on the quality of the measurements of the process data. However, the measured data are often corrupted by various disturbances caused by uncertain elements of the process, such as measure- ment noise, malfunctions of measuring devices, etc. These disturbances can negatively influence the performance of the resulting automatic system. Therefore, the task of data pre-processing (filtration) is of great importance in adaptive control, e.g. [1] or [2]. One of the most dangerous corrup- tions of the measured data is due to outliers. An outlier is an incorrect measurement of the process, which is significantly different from the real process data. Two types of outliers are distinguished: i) isolated outliers; which are caused by an iso- lated failure of the measurement; ii) blocks of outliers; which are caused by a temporary breakdown of a measuring device. The former type is relatively easy to detect. However, it is more challenging to detect the latter, since the block of outliers may have some characteristics of uncorrupted data. In this paper, we will to model corrupted data by a probabilistic mixture of dynamic (i.e. autoregressive) models [3, 4, 5]. The model is identified using a Bayesian approach [6, 7, 8, 9, 10]. Aim and outline of the solution The task addressed in this paper is to detect of outliers and to reconstruct the measured data. The proposed solution of this task is based on modelling of the corrupted data by a mixture model comprising two components. The first component models the uncorrupted data, while the second models the outliers. Detected outliers are replaced by predictions from the first component. 2 Principle of mixture model estimation The quasi-Bayes algorithm for recursive estimation of mixture model parameters was developed recently [11]. The algorithm is designed for estimating mixture model parame- ters with components in the form of linear regression models. The mixing weights of the components are considered to be unknown, and their estimates are also provided by the algorithm. Mixture models The mixture model is described as a conditional probabil- ity density f d d t f dt i i t t i i c ( ( ), , ) ( , )� � � � �1 1 1 � � � � � � (1) where f ( )� � denotes conditional probability density function (pdf), d is modelled (and filtered) variable; dt is actual value at time t, �t�1 is a vector of historical data on which dt depends, � � � �� [ , , , ]1 2 � �c are parameters of individual components, � � � �� [ , , , ]1 2 � �c is a vector of components weights, c � is number of components. The main advantage of this model is the ability to describe a system with a finite number of different states, even if the re- lations between the states are very complex. Bayes rule for mixture models Direct application of the well known Bayes rule f d t f d t d t f d t( , ( )) ( ( ) ( ), , ) ( , ( ))� � � � � �� � �1 1 (2) to the mixture model (1) yields intractable posterior distribu- tion. Specifically, application of the Bayes rule (2) to the mix- ture model (1) yields posterior distribution in the form of a mixture with ct components. Hence, the complexity of the posterior grows with time, t, which is prohibitive for on-line processing. Model approximation To solve the above problem, an approximate Bayesian estimation is used. It is achieved in three steps: (i) introducing a random variable ct that indicates the active component at time t, (ii) reformulation of the model of the active component into a product form: f d f dc t t i t t i i c i c t t( , , ) ( , ) ( )� � � � � �� � � � � �1 1 1 � , (3) and (iii) approximating the Kronecker delta function �(i � ct) in (3) by its conditional mean value 30 © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ Acta Polytechnica Vol. 46 No. 2/2006 Czech Technical University in Prague Mixture Based Outlier Filtration P. Pecherková, I. Nagy Success/failure of adaptive control algorithms – especially those designed using the Linear Quadratic Gaussian criterion – depends on the quality of the process data used for model identification. One of the most harmful types of process data corruptions are outliers, i.e. ‘wrong data’ lying far away from the range of real data. The presence of outliers in the data negatively affects an estimation of the dynamics of the system. This effect is magnified when the outliers are grouped into blocks. In this paper, we propose an algorithm for outlier detection and removal. It is based on modelling the corrupted data by a two-component probabilistic mixture. The first component of the mixture models uncorrupted process data, while the second models outliers. When the outlier component is detected to be active, a prediction from the uncorrupted data component is computed and used as a reconstruction of the observed data. The resulting reconstruction filter is compared to standard methods on simulated and real data. The filter exhibits excellent properties, especially in the case of blocks of outliers. Keywords: data filtration, system modelling, mixture models, Bayesian estimation, prediction. E i c d t i c f c d t c i d t t t i c t t [ ( ) ( )] ( ) ( ( )) Pr( ( ) � �� � � � � � � � 1 � ) ,,� wi t (4) where Pr(�) denotes probability. An evaluation of the weight for linear regression models is available, [6] or [12]. Effect of the approximation The mean value (4) is a vector of probabilities, wi, t, of the individual components. Thus, at each time instant, the statistics of all components are updated by the observed data. The contribution of the observed data to each component is given by the estimated weight (4). For components from an exponential family [6], the estimation is equivalent to the weighted least squares technique. Initiation of the estimation The Bayesian estimation (2) updates the parameter de- scription – represented by conditional pdf f d t( , ( ))� � – using the observed data, dt, for all times t � 1, 2, …, d � , where d � is the number of available data items. The recursion starts at t � 1 with pdf f d( , ( ))� � 0 , which is called the prior pdf. This pdf reflects our prior knowledge about parameters � and �. The prior can also be used to ensure that the estimated model has certain advantageous features [13]. Approximate estimation algorithm The algorithm for an approximate estimation of the mix- ture model parameters with exponential family components is outlined in the following scheme: A. Initial off-line part ˆ Choose the number of components of the mixture mod- el and their structure. ˆ Set initial statistics of parameters. B. On-line time loop ˆ Measure the current data. ˆ Compute probabilistic weights of all components using (4). The component with maximum weight is called the active component. ˆ Update parameter statistics for each component. C. Concluding off-line part ˆ Compute point estimates of the parameters from their statistics (if they are needed). 3 Mixture-based outlier filtration The process of Bayesian mixture estimation, indicated above, is adapted for outlier detection and reconstruction. Idea of the filter The main idea is to model the observed data by a proba- bilistic mixture with two components: 1) the data component; which models uncorrupted data, and 2) the outlier component; which models the outliers. Initiation of the filter The initial description of the components is formalized by the prior pdf. The prior variance of the data component is chosen from prior analysis of the filtered data and thanks to forgetting [14], the variance is not allowed to change much. The prior variance of the outlier component is chosen signifi- cantly larger that that of the data component. Moreover, it is left relatively free, in order to be able to “catch” whatever does not belong to the uncorrupted signal, i.e. the outliers. Naturally, better modelling of uncorrupted data allows better separation of the data from the corruptions. Dynamic models describe the variable in dependence on its historical values, while a static description is without this. Our experi- ence with data modelling [15] suggests that even data that is almost static deserves to be described by dynamic models in order to achieve high quality of the description. Therefore, the data component was chosen as a first order regression model. The structure of the outlier component is relatively loose and is chosen as static, i.e. a zero order regression model. Its only task is to “cover” all possible errors, mainly outliers. Operation of the filter As described in the previous paragraph, the estimation of a mixture model is based on weighting the data with respect to the individual components. Using the estimated weights the active component can be detected. This mechanism is used for outlier detection as follows: 1. if the dominant weight belongs to the data component, no action is taken, and 2. if the dominant weight belongs to the outlier component, the current data item is considered to be an outlier and the observed value is substituted by a simulated realization of the data component. A problem occurs if in the following step the current data is not an outlier. Then the dominant weight belongs to the data component and it would be influenced by the old data item, which is now the outlier, through its regression vector. This value going to the regression vector of the data compo- nent must therefore also be substituted by the filtered value. Algorithm of the filtration The filter can be summarized by the following modifica- tion of the above mixture estimation algorithm. A. Initial off-line part ˆ Set initial statistics of parameters for a two-components mixture model, � first component with small data covariance (the data component), � second component with large data covariance (the outlier component). ˆ Set the forgetting coefficient. B. On-line time loop ˆ Measure the current data. ˆ Compute the probabilistic weights of both components with respect to the measured data. ˆ Choose the component with the greater weight. ˆ If the chosen component is that of the data, � re-evaluate parameter statistics for each component separately. © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 31 Czech Technical University in Prague Acta Polytechnica Vol. 46 No. 2/2006 ˆ If the chosen component is that of the outliers, � generate data prediction using the data component, � use the predicted value as a reconstruction of the observed data, � use the predicted value in the (future) regression vector of the data component, � re-evaluate parameter statistics for the outlier com- ponent only. 4 Experiments In this section, we test the proposed algorithm on real data. A sample of data from a traffic miniregion in the center of Prague was chosen for the experiments. The data refers to formed by intensities of traffic flow measured at a single point in the miniregion. The noise corrupting the data is represented mainly represents standard traffic irregularities (interactions between neighbouring control lights, accidental accumulations of cars, minor accidents etc.). The data sample consist of 1000 data items measured with a period of 5 minutes. It involves data over approximately 3,5 days. The intensity maxima reflect the traffic load during each day. The noise causes dissimilarities of the courses for in- dividual days. Different daily courses (visible at the beginning of the fourth day) are caused by different types of days, such as weekdays and weekends. The outliers are not frequent disturbances, but they are very important due to their devastating effects on model esti- mation. They are caused either by accidental breakdowns of detectors or by detector failure for several periods of mea- surements. Especially the latter category are very difficult to distinguish automatically from the normal signal. In order to test the filter, the data was artificially corrupted by various types of outliers. Basically, single outliers and blocks of outliers are used in all experiments. Then, various outlier amplitudes are tested – big, medium, small – and a combination of these types in a single data sample. For all experiments, the results of the proposed filter are compared to those obtained using standard filters. These filters are based on a fixed-length window, moving along the current time, and evaluating some data characteristics for compari- son with the current data measurement to detect an outlier. These characteristics are either the mean value or the median computed over the window. These characteristics are com- puted either equally for all data or via a kind of forgetting algorithm. A description of such filters can be found e.g. in [16, 17, 18, 19]. Many preliminary experiments were per- formed to compare the suggested mixture filter to standard filters. All of them gave comparable results for isolated outli- ers but almost all standard filters were quite unsuitable for fil- tering block outliers. Typically, the standard filters failed to detect a block of outliers. Among all the standard filters, only two were found to be comparable with the proposed mixture filter. Standard filter No. 1 was designed to detect a block of out- liers [16]. After detecting the borders of a block outlier, it models the data before and after the outlier with a simple re- gression model and substitutes the outlying values by a combi- nation of predictions from both these models. Standard filter No. 2 is a median filter with window size 200 (time periods) and without forgetting. To demonstrate the results of filtering results in this paper, only these two standard filters are used. As mentioned above, the components have different co- variances. For these experiments, the data component has 32 © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ Acta Polytechnica Vol. 46 No. 2/2006 Czech Technical University in Prague Fig. 1: Uncorrupted transportation data covariance covd � 0.1 and the outlier component has covar- iance covo � 100. Example 1: All big outliers The first experiment follows a typical scenario, i.e. outliers with big amplitudes. The level of the outliers is about 5000, which is approximately 100 times the level of the uncor- rupted data. The mixture filter completely detects all outliers and leaves the normal data without any change. The filtered variable is plotted in Fig. 2. The filtering gives practically identical data (cf. Fig. 1), up to 20 isolated outliers and two short blocks (first 100–200 items and second 650–700 items) where groups of outliers were located. All substitutions for outliers are in an appropri- ate range. In order to evaluate the results in a non-visual way, and making use of the fact that the outliers were introduced artificially, the corrupted data is compared to its predictions from a model estimated on the basis of a filtered data sample. This quality evaluation is done through the prediction error (PE) coefficient, which is the square root of the sum of squares of the prediction error divided by the variance of the data. The results for the suggested mixture filter and the two cho- sen standard filters are given in the following table (Table 1). Example 2: All small outliers An outlier is a value lying “far” outside the range of the corrupted data. What happens if “far” is not so far as in the previous experiment? Now, outliers of an amplitude from 1 to 5 times the uncorrupted data amplitude are tested. The com- position of the data and outliers is the same. The results are summarized in the following table: Once again, the mixture filter outperforms the others. The absolute values of the differences of PE are smaller than in the previous experiment because the outliers are smaller © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 33 Czech Technical University in Prague Acta Polytechnica Vol. 46 No. 2/2006 Fig. 2: Filtered data Filter PE coefficient Mixture filter 0.49 Standard filter No. 1 0.72 Standard filter No. 2 4.80 Remark: The results of the PE coefficient for the other standard filters were from 8 to 170. The big difference is caused by the fact that the standard filters are not able to detect the blocks of outliers. Table 1: PE coefficients for all big outliers Filter PE coefficient Mixture filter 0.50 Standard filter No. 1 1.52 Standard filter No. 2 1.36 Table 2: PE coefficients for all small outliers and the failure to detect them results in a smaller contribution to the PE. Example 3: First big and then small outliers This last case is the most difficult, because the filter could “calibrate” the size of the outliers according to the first suspi- cious data and miss everything that is smaller than its pattern. Will the filter be able to recognize smaller outliers that follow the big- ger ones? The results are summarized in the following table: Also in this example, which is the most dificult for the mix- ture filter, the results are stable and superior to the other filters. 5 Conclusions A new type of filter detecting outliers and data recon- struction has been described and demonstrated on a serial of examples. The filter is based on modelling the data by a mix- ture model with two components: one for modelling uncor- rupted data, and the second for modelling the outliers. The main advantage of the filter is its ability to detect groups of outliers, which was demonstrated in a simulation. Outliers of this type arise from temporary breakdowns of measuring devices, which are rather frequent in transportation systems. In all the simulated examples, both single outliers and blocks of outliers were correctly detected and the data was recon- structed by reasonable values, generated by the uncorrupted data component. The comparison of the results with standard filters proved that reconstruction of a block of outliers is a difficult task. Typically, standard filters were not able to substi- tute the whole block of outliers. The best standard filters usually missed several outliers from the block before they “realized” that an outlier had occurred. The proposed mix- ture-based filter detects the outliers more correctly, and thus outperforms the standard filters in all simulated experiments. The particular disadvantage of the filter is when the blocks of outliers are very long. There can be a problem with the data component because its statistics were not recomput- ed during failure. Future work will be focus on solving this problem. Acknowledgment This research was partly supported by grants MŠMT ČR 1M6798555601, MDČR 1F43A/003/120. References [1] Zhao, F., Leong, T. Y: “A Data Preprocessing Frame- work for Supporting Probability-Learning in Dynamics Decision Modeling in Medicine”. J AM MED INFORM ASSN, Vol. suppl. S2000, 2000, p. 933–937. [2] Dzerovski, S., Gamberger, D., Lavrac, N.: “Noise Detec- tion and Elimination in Data Processing Experiments in Medical Domains.” APPL ARTIF INTELL, Vol. 14 (2000), No. 2, p. 205–223. [3] Titterington, D. M., Smith, A. F. M., Makov, U. E.: Sta- tistical Analysis of Finite Mixtures. John Wiley & Sons, Chichester, New York, Brisbane, Toronto, Singapore, 1985, ISBN 0 471 90763 4. [4] Richardson, S., Green, P. J.: “On Bayesian Analysis of Mixtures with an Unknown Number of Components, with Discussion”, Journal of the Royal Statistical Society, Se- ries B, Vol. 59 (1997), No. 4, p. 731–792. [5] McLachlan, G. J.: Finite Mixture Models. Wiley, New York, 1999. [6] Kárný, M., Nagy, I., Novovičová, J.: “Mixed-Data Multi- -Modelling for Fault Detection and Isolation”, Adaptive Control and Signal Processing, 2002, No. 1, p. 61–83. [7] Kárný, M.: “Probabilistic Support of Operators”. ERCIM News, 2000, No. 40, p. 25–26. [8] Ettler, P., Kárný, M., Nagy, I.: “Employing Information Hidden in Industrial Process Data”, in Preprints of Sym- posium on Intelligent Systems for Industry, Paisley, UK, Academic Press, 2001, p. 1814–1817. [9] Kárný, M., Nedoma, P., Nagy, I., Valečková, M.: “Initial Description of Multi-Modal Dynamic Models”, in Artifi- cial Neural Nets and Genetic Algorithms. Proceedings, Eds.: V. Kurková, R. Netruda, M. Kárný, N. C. Steele, Vi- enna, Springer, April 2001, p. 398–401. [10] Nagy, I., Nedoma, P., Kárný, M.: ”Factorized EM Algo- rithm for Mixture Estimation“, in Artificial Neural Nets and Genetic Algorithm. Proceedings, Eds.: V. Kurková, R. Netruda, M. Kárný, N. C. Steele, Vienna, Springer, April 2001, p. 402–405. [11] Kárný, M., Kadlec, J., Sutanto, E. L.: “Quasi-Bayes Esti- mation Applied to Normal Mixture”, in Preprints of the 3rd European IEEE Workshop on Computer-Intensive Methods in Control and Data Processing, Eds.: J. Ro- jíček, M. Valečková, M. Kárný, K. Warwick, Prague, September 1998, ÚTIA AV ČR, p. 77–82. [12] Nagy, I., Kárný, M., Nedoma, P., Voráčová, Š.: “Bayesian Estimation of Traffic Lane state”, International Journal of Adaptive Control and Signal Processing, Vol. 17 (2003), No. 1, p. 51–65. [13] Kárný, M., Khailova, N., Böhm, J., Nedoma, P.: “Quan- tification of Prior Information Revised”, International Journal of Adaptive Control and Signal Processing, Vol. 15 (2001), No. 1, p. 65–84. [14] Kulhavý, R., Zarrop, M. B.: “On General Concept of Forgetting”, International Journal of Control, Vol. 58 (1993), No. 4, p. 905–924. [15] Nagy, I.: “Estimation of Real Data with Dynamic Mix- tures”, Tech. Rep., research report No. 2066, ÚTIA AV ČR, Prague, 2002. [16] Tesař, L., Quinn, A.: “Detection and Removal of Outliers from Multidimensional AR Processes”, in Pro- ceedings of Irish Signal and Systems Conference, Maynooth, Ireland, August 2001. 34 © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ Acta Polytechnica Vol. 46 No. 2/2006 Czech Technical University in Prague Filter PE coefficient Mixture filter 0.49 Standard filter No. 1 1.52 Standard filter No. 2 1.36 Table 3: PE coefficients for first big and subsequent small outliers [17] Ko, Sung-Jea, Lee, Y. H.: “Center Weighted Median Fil- ters and their Applications to Image Enhancement”, IEEE Transactions on Circuits and Systems, Vol. 38, No. 9, September 1991, p. 984–993. [18] Tesař, L., Quinn, A.: “Method for Artefact Detection and Suppressing Using Alpha-Stable Distributions”, in Proceedings of ICANNGA Conference, Prague, Czech Republic, March 2001. [19] Cipra, T.: “Dynamic Credibility with Outliers”, Applica- tions of Mathematics, Vol. 41 (1996), No. 2, p. 149–159. Ing. Pavla Pecherková e-mail: nemcova@utia.cas.cz Doc. Ing. Ivan Nagy, CSc. phone: +420 224 890 732 e-mail: nagy@fd.cvut.cz Department of Applied Mathematics Czech Technical University in Prague Faculty of Transportation Sciences CTU Na Florenci 25 110 00 Prague 1 Institute of Information Theory and Automation AV ČR P.O.Box 18 182 08 Prague 8, Czech Republic © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 35 Czech Technical University in Prague Acta Polytechnica Vol. 46 No. 2/2006