AP06_2.vp


1 Introduction
Adaptive control systems typically work in a feedback re-

gime, hence the quality of the control depends heavily on the
quality of the measurements of the process data. However, the
measured data are often corrupted by various disturbances
caused by uncertain elements of the process, such as measure-
ment noise, malfunctions of measuring devices, etc. These
disturbances can negatively influence the performance of
the resulting automatic system. Therefore, the task of data
pre-processing (filtration) is of great importance in adaptive
control, e.g. [1] or [2]. One of the most dangerous corrup-
tions of the measured data is due to outliers. An outlier is an
incorrect measurement of the process, which is significantly
different from the real process data. Two types of outliers are
distinguished: i) isolated outliers; which are caused by an iso-
lated failure of the measurement; ii) blocks of outliers; which are
caused by a temporary breakdown of a measuring device. The
former type is relatively easy to detect. However, it is more
challenging to detect the latter, since the block of outliers may
have some characteristics of uncorrupted data. In this paper,
we will to model corrupted data by a probabilistic mixture of
dynamic (i.e. autoregressive) models [3, 4, 5]. The model is
identified using a Bayesian approach [6, 7, 8, 9, 10].

Aim and outline of the solution
The task addressed in this paper is to detect of outliers and

to reconstruct the measured data. The proposed solution of this
task is based on modelling of the corrupted data by a mixture
model comprising two components. The first component
models the uncorrupted data, while the second models the
outliers. Detected outliers are replaced by predictions from
the first component.

2 Principle of mixture model
estimation
The quasi-Bayes algorithm for recursive estimation of

mixture model parameters was developed recently [11]. The
algorithm is designed for estimating mixture model parame-
ters with components in the form of linear regression models.
The mixing weights of the components are considered to
be unknown, and their estimates are also provided by the
algorithm.

Mixture models
The mixture model is described as a conditional probabil-

ity density

f d d t f dt i i t t i
i

c
( ( ), , ) ( , )� � �

�

�1 1
1

� � � � �

�

(1)

where
f ( )� � denotes conditional probability density function (pdf),
d is modelled (and filtered) variable; dt is actual value at

time t, �t�1 is a vector of historical data on which dt
depends,

� � � �� [ , , , ]1 2 � �c are parameters of individual components,
� � � �� [ , , , ]1 2 � �c is a vector of components weights,

c
�

is number of components.

The main advantage of this model is the ability to describe
a system with a finite number of different states, even if the re-
lations between the states are very complex.

Bayes rule for mixture models
Direct application of the well known Bayes rule

f d t f d t d t f d t( , ( )) ( ( ) ( ), , ) ( , ( ))� � � � � �� � �1 1 (2)

to the mixture model (1) yields intractable posterior distribu-
tion. Specifically, application of the Bayes rule (2) to the mix-
ture model (1) yields posterior distribution in the form of a
mixture with ct components. Hence, the complexity of the
posterior grows with time, t, which is prohibitive for on-line
processing.

Model approximation
To solve the above problem, an approximate Bayesian

estimation is used. It is achieved in three steps: (i) introducing
a random variable ct that indicates the active component at
time t, (ii) reformulation of the model of the active component
into a product form:

f d f dc t t i t t i
i c

i

c

t
t( , , ) ( , ) ( )� � � � � �� �

�

�

� �1 1
1

�

, (3)

and (iii) approximating the Kronecker delta function �(i � ct) in
(3) by its conditional mean value

30 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 2/2006 Czech Technical University in Prague

Mixture Based Outlier Filtration
P. Pecherková, I. Nagy

Success/failure of adaptive control algorithms – especially those designed using the Linear Quadratic Gaussian criterion – depends on the
quality of the process data used for model identification. One of the most harmful types of process data corruptions are outliers, i.e. ‘wrong
data’ lying far away from the range of real data. The presence of outliers in the data negatively affects an estimation of the dynamics of the
system. This effect is magnified when the outliers are grouped into blocks. In this paper, we propose an algorithm for outlier detection and
removal. It is based on modelling the corrupted data by a two-component probabilistic mixture. The first component of the mixture models
uncorrupted process data, while the second models outliers. When the outlier component is detected to be active, a prediction from the
uncorrupted data component is computed and used as a reconstruction of the observed data. The resulting reconstruction filter is compared
to standard methods on simulated and real data. The filter exhibits excellent properties, especially in the case of blocks of outliers.

Keywords: data filtration, system modelling, mixture models, Bayesian estimation, prediction.


E i c d t i c f c d t

c i d t

t t
i

c

t

t

[ ( ) ( )] ( ) ( ( ))

Pr( ( )

� �� � � �

� �

�

�
1

�

) ,,� wi t

(4)

where Pr(�) denotes probability. An evaluation of the weight
for linear regression models is available, [6] or [12].

Effect of the approximation
The mean value (4) is a vector of probabilities, wi, t, of

the individual components. Thus, at each time instant, the
statistics of all components are updated by the observed data.
The contribution of the observed data to each component is
given by the estimated weight (4). For components from an
exponential family [6], the estimation is equivalent to the
weighted least squares technique.

Initiation of the estimation
The Bayesian estimation (2) updates the parameter de-

scription – represented by conditional pdf f d t( , ( ))� � –

using the observed data, dt, for all times t � 1, 2, …, d
�

, where d
�

is the number of available data items. The recursion starts at
t � 1 with pdf f d( , ( ))� � 0 , which is called the prior pdf. This

pdf reflects our prior knowledge about parameters � and �.
The prior can also be used to ensure that the estimated model
has certain advantageous features [13].

Approximate estimation algorithm
The algorithm for an approximate estimation of the mix-

ture model parameters with exponential family components
is outlined in the following scheme:
A. Initial off-line part

ˆ Choose the number of components of the mixture mod-
el and their structure.

ˆ Set initial statistics of parameters.

B. On-line time loop
ˆ Measure the current data.
ˆ Compute probabilistic weights of all components using

(4). The component with maximum weight is called the
active component.

ˆ Update parameter statistics for each component.

C. Concluding off-line part
ˆ Compute point estimates of the parameters from their

statistics (if they are needed).

3 Mixture-based outlier filtration
The process of Bayesian mixture estimation, indicated

above, is adapted for outlier detection and reconstruction.

Idea of the filter
The main idea is to model the observed data by a proba-

bilistic mixture with two components: 1) the data component;
which models uncorrupted data, and 2) the outlier component;
which models the outliers.

Initiation of the filter
The initial description of the components is formalized by

the prior pdf. The prior variance of the data component is

chosen from prior analysis of the filtered data and thanks to
forgetting [14], the variance is not allowed to change much.
The prior variance of the outlier component is chosen signifi-
cantly larger that that of the data component. Moreover, it is
left relatively free, in order to be able to “catch” whatever does
not belong to the uncorrupted signal, i.e. the outliers.

Naturally, better modelling of uncorrupted data allows
better separation of the data from the corruptions. Dynamic
models describe the variable in dependence on its historical
values, while a static description is without this. Our experi-
ence with data modelling [15] suggests that even data that is
almost static deserves to be described by dynamic models in
order to achieve high quality of the description. Therefore,
the data component was chosen as a first order regression
model. The structure of the outlier component is relatively
loose and is chosen as static, i.e. a zero order regression
model. Its only task is to “cover” all possible errors, mainly
outliers.

Operation of the filter
As described in the previous paragraph, the estimation of

a mixture model is based on weighting the data with respect
to the individual components. Using the estimated weights
the active component can be detected. This mechanism is
used for outlier detection as follows:
1. if the dominant weight belongs to the data component, no

action is taken, and
2. if the dominant weight belongs to the outlier component,

the current data item is considered to be an outlier and
the observed value is substituted by a simulated realization
of the data component.

A problem occurs if in the following step the current data
is not an outlier. Then the dominant weight belongs to the
data component and it would be influenced by the old data
item, which is now the outlier, through its regression vector.
This value going to the regression vector of the data compo-
nent must therefore also be substituted by the filtered value.

Algorithm of the filtration
The filter can be summarized by the following modifica-

tion of the above mixture estimation algorithm.
A. Initial off-line part

ˆ Set initial statistics of parameters for a two-components
mixture model,

� first component with small data covariance (the data
component),

� second component with large data covariance (the
outlier component).

ˆ Set the forgetting coefficient.
B. On-line time loop

ˆ Measure the current data.
ˆ Compute the probabilistic weights of both components

with respect to the measured data.
ˆ Choose the component with the greater weight.
ˆ If the chosen component is that of the data,

� re-evaluate parameter statistics for each component
separately.

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 31

Czech Technical University in Prague Acta Polytechnica Vol. 46  No. 2/2006


ˆ If the chosen component is that of the outliers,
� generate data prediction using the data component,
� use the predicted value as a reconstruction of the

observed data,
� use the predicted value in the (future) regression

vector of the data component,
� re-evaluate parameter statistics for the outlier com-

ponent only.

4 Experiments
In this section, we test the proposed algorithm on real

data. A sample of data from a traffic miniregion in the center
of Prague was chosen for the experiments. The data refers
to formed by intensities of traffic flow measured at a single
point in the miniregion. The noise corrupting the data is
represented mainly represents standard traffic irregularities
(interactions between neighbouring control lights, accidental
accumulations of cars, minor accidents etc.).

The data sample consist of 1000 data items measured with
a period of 5 minutes. It involves data over approximately
3,5 days. The intensity maxima reflect the traffic load during
each day. The noise causes dissimilarities of the courses for in-
dividual days. Different daily courses (visible at the beginning
of the fourth day) are caused by different types of days, such as
weekdays and weekends.

The outliers are not frequent disturbances, but they are
very important due to their devastating effects on model esti-
mation. They are caused either by accidental breakdowns of
detectors or by detector failure for several periods of mea-
surements. Especially the latter category are very difficult to
distinguish automatically from the normal signal.

In order to test the filter, the data was artificially corrupted
by various types of outliers. Basically, single outliers and
blocks of outliers are used in all experiments. Then, various
outlier amplitudes are tested – big, medium, small – and a
combination of these types in a single data sample. For all
experiments, the results of the proposed filter are compared
to those obtained using standard filters. These filters are
based on a fixed-length window, moving along the current
time, and evaluating some data characteristics for compari-
son with the current data measurement to detect an outlier.
These characteristics are either the mean value or the median
computed over the window. These characteristics are com-
puted either equally for all data or via a kind of forgetting
algorithm. A description of such filters can be found e.g. in
[16, 17, 18, 19]. Many preliminary experiments were per-
formed to compare the suggested mixture filter to standard
filters. All of them gave comparable results for isolated outli-
ers but almost all standard filters were quite unsuitable for fil-
tering block outliers. Typically, the standard filters failed to
detect a block of outliers. Among all the standard filters, only
two were found to be comparable with the proposed mixture
filter.

Standard filter No. 1 was designed to detect a block of out-
liers [16]. After detecting the borders of a block outlier, it
models the data before and after the outlier with a simple re-
gression model and substitutes the outlying values by a combi-
nation of predictions from both these models. Standard filter
No. 2 is a median filter with window size 200 (time periods)
and without forgetting. To demonstrate the results of filtering
results in this paper, only these two standard filters are used.

As mentioned above, the components have different co-
variances. For these experiments, the data component has

32 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 2/2006 Czech Technical University in Prague

Fig. 1: Uncorrupted transportation data


covariance covd � 0.1 and the outlier component has covar-
iance covo � 100.

Example 1: All big outliers
The first experiment follows a typical scenario, i.e. outliers

with big amplitudes. The level of the outliers is about 5000,
which is approximately 100 times the level of the uncor-
rupted data. The mixture filter completely detects all outliers
and leaves the normal data without any change. The filtered
variable is plotted in Fig. 2.

The filtering gives practically identical data (cf. Fig. 1), up
to 20 isolated outliers and two short blocks (first 100–200
items and second 650–700 items) where groups of outliers
were located. All substitutions for outliers are in an appropri-
ate range. In order to evaluate the results in a non-visual way,
and making use of the fact that the outliers were introduced
artificially, the corrupted data is compared to its predictions
from a model estimated on the basis of a filtered data sample.
This quality evaluation is done through the prediction error
(PE) coefficient, which is the square root of the sum of squares
of the prediction error divided by the variance of the data.
The results for the suggested mixture filter and the two cho-
sen standard filters are given in the following table (Table 1).

Example 2: All small outliers
An outlier is a value lying “far” outside the range of

the corrupted data. What happens if “far” is not so far as in the
previous experiment? Now, outliers of an amplitude from 1 to 5
times the uncorrupted data amplitude are tested. The com-

position of the data and outliers is the same. The results are
summarized in the following table:

Once again, the mixture filter outperforms the others.
The absolute values of the differences of PE are smaller than
in the previous experiment because the outliers are smaller

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 33

Czech Technical University in Prague Acta Polytechnica Vol. 46  No. 2/2006

Fig. 2: Filtered data

Filter PE coefficient

Mixture filter 0.49

Standard filter No. 1 0.72

Standard filter No. 2 4.80

Remark: The results of the PE coefficient for the
other standard filters were from 8 to 170.
The big difference is caused by the fact
that the standard filters are not able to
detect the blocks of outliers.

Table 1: PE coefficients for all big outliers

Filter PE coefficient

Mixture filter 0.50

Standard filter No. 1 1.52

Standard filter No. 2 1.36

Table 2: PE coefficients for all small outliers


and the failure to detect them results in a smaller contribution
to the PE.

Example 3: First big and then small outliers
This last case is the most difficult, because the filter could

“calibrate” the size of the outliers according to the first suspi-
cious data and miss everything that is smaller than its pattern.
Will the filter be able to recognize smaller outliers that follow the big-
ger ones? The results are summarized in the following table:

Also in this example, which is the most dificult for the mix-
ture filter, the results are stable and superior to the other
filters.

5 Conclusions
A new type of filter detecting outliers and data recon-

struction has been described and demonstrated on a serial of
examples. The filter is based on modelling the data by a mix-
ture model with two components: one for modelling uncor-
rupted data, and the second for modelling the outliers. The
main advantage of the filter is its ability to detect groups of
outliers, which was demonstrated in a simulation. Outliers of
this type arise from temporary breakdowns of measuring
devices, which are rather frequent in transportation systems.
In all the simulated examples, both single outliers and blocks
of outliers were correctly detected and the data was recon-
structed by reasonable values, generated by the uncorrupted
data component. The comparison of the results with standard
filters proved that reconstruction of a block of outliers is a
difficult task. Typically, standard filters were not able to substi-
tute the whole block of outliers. The best standard filters
usually missed several outliers from the block before they
“realized” that an outlier had occurred. The proposed mix-
ture-based filter detects the outliers more correctly, and thus
outperforms the standard filters in all simulated experiments.

The particular disadvantage of the filter is when the
blocks of outliers are very long. There can be a problem with
the data component because its statistics were not recomput-
ed during failure. Future work will be focus on solving this
problem.

Acknowledgment
This research was partly supported by grants MŠMT ČR

1M6798555601, MDČR 1F43A/003/120.

References
[1] Zhao, F., Leong, T. Y: “A Data Preprocessing Frame-

work for Supporting Probability-Learning in Dynamics
Decision Modeling in Medicine”. J AM MED INFORM
ASSN, Vol. suppl. S2000, 2000, p. 933–937.

[2] Dzerovski, S., Gamberger, D., Lavrac, N.: “Noise Detec-
tion and Elimination in Data Processing Experiments
in Medical Domains.” APPL ARTIF INTELL, Vol. 14
(2000), No. 2, p. 205–223.

[3] Titterington, D. M., Smith, A. F. M., Makov, U. E.: Sta-
tistical Analysis of Finite Mixtures. John Wiley & Sons,
Chichester, New York, Brisbane, Toronto, Singapore,
1985, ISBN 0 471 90763 4.

[4] Richardson, S., Green, P. J.: “On Bayesian Analysis of
Mixtures with an Unknown Number of Components,
with Discussion”, Journal of the Royal Statistical Society, Se-
ries B, Vol. 59 (1997), No. 4, p. 731–792.

[5] McLachlan, G. J.: Finite Mixture Models. Wiley, New York,
1999.

[6] Kárný, M., Nagy, I., Novovičová, J.: “Mixed-Data Multi-
-Modelling for Fault Detection and Isolation”, Adaptive
Control and Signal Processing, 2002, No. 1, p. 61–83.

[7] Kárný, M.: “Probabilistic Support of Operators”. ERCIM
News, 2000, No. 40, p. 25–26.

[8] Ettler, P., Kárný, M., Nagy, I.: “Employing Information
Hidden in Industrial Process Data”, in Preprints of Sym-
posium on Intelligent Systems for Industry, Paisley, UK,
Academic Press, 2001, p. 1814–1817.

[9] Kárný, M., Nedoma, P., Nagy, I., Valečková, M.: “Initial
Description of Multi-Modal Dynamic Models”, in Artifi-
cial Neural Nets and Genetic Algorithms. Proceedings,
Eds.: V. Kurková, R. Netruda, M. Kárný, N. C. Steele, Vi-
enna, Springer, April 2001, p. 398–401.

[10] Nagy, I., Nedoma, P., Kárný, M.: ”Factorized EM Algo-
rithm for Mixture Estimation“, in Artificial Neural Nets
and Genetic Algorithm. Proceedings, Eds.: V. Kurková,
R. Netruda, M. Kárný, N. C. Steele, Vienna, Springer,
April 2001, p. 402–405.

[11] Kárný, M., Kadlec, J., Sutanto, E. L.: “Quasi-Bayes Esti-
mation Applied to Normal Mixture”, in Preprints of the
3rd European IEEE Workshop on Computer-Intensive
Methods in Control and Data Processing, Eds.: J. Ro-
jíček, M. Valečková, M. Kárný, K. Warwick, Prague,
September 1998, ÚTIA AV ČR, p. 77–82.

[12] Nagy, I., Kárný, M., Nedoma, P., Voráčová, Š.: “Bayesian
Estimation of Traffic Lane state”, International Journal of
Adaptive Control and Signal Processing, Vol. 17 (2003),
No. 1, p. 51–65.

[13] Kárný, M., Khailova, N., Böhm, J., Nedoma, P.: “Quan-
tification of Prior Information Revised”, International
Journal of Adaptive Control and Signal Processing, Vol. 15
(2001), No. 1, p. 65–84.

[14] Kulhavý, R., Zarrop, M. B.: “On General Concept of
Forgetting”, International Journal of Control, Vol. 58
(1993), No. 4, p. 905–924.

[15] Nagy, I.: “Estimation of Real Data with Dynamic Mix-
tures”, Tech. Rep., research report No. 2066, ÚTIA AV
ČR, Prague, 2002.

[16] Tesař, L., Quinn, A.: “Detection and Removal of
Outliers from Multidimensional AR Processes”, in Pro-
ceedings of Irish Signal and Systems Conference,
Maynooth, Ireland, August 2001.

34 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 2/2006 Czech Technical University in Prague

Filter PE coefficient

Mixture filter 0.49

Standard filter No. 1 1.52

Standard filter No. 2 1.36

Table 3: PE coefficients for first big and subsequent small outliers


[17] Ko, Sung-Jea, Lee, Y. H.: “Center Weighted Median Fil-
ters and their Applications to Image Enhancement”,
IEEE Transactions on Circuits and Systems, Vol. 38,
No. 9, September 1991, p. 984–993.

[18] Tesař, L., Quinn, A.: “Method for Artefact Detection
and Suppressing Using Alpha-Stable Distributions”, in
Proceedings of ICANNGA Conference, Prague, Czech
Republic, March 2001.

[19] Cipra, T.: “Dynamic Credibility with Outliers”, Applica-
tions of Mathematics, Vol. 41 (1996), No. 2, p. 149–159.

Ing. Pavla Pecherková
e-mail: nemcova@utia.cas.cz

Doc. Ing. Ivan Nagy, CSc.
phone: +420 224 890 732
e-mail: nagy@fd.cvut.cz

Department of Applied Mathematics

Czech Technical University in Prague
Faculty of Transportation Sciences CTU
Na Florenci 25
110 00 Prague 1

Institute of Information Theory and Automation AV ČR
P.O.Box 18
182 08 Prague 8, Czech Republic

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 35

Czech Technical University in Prague Acta Polytechnica Vol. 46  No. 2/2006