Microsoft Word - 1_R.doc


HUNGARIAN JOURNAL 
OF INDUSTRIAL CHEMISTRY 

VESZPRÉM 
Vol. 37(1). pp. 59-67 (2009) 

PCA DRIVEN SIMILARITY FOR SEGMENTED UNIVARIATE TIME SERIES 

Z. BANKÓ, J. ABONYI  

University of Pannonia, Department of Process Engineering, P.O. Box 158. Veszprém H-8200, HUNGARY 
E-mail: abonyij@fmt.uni-pannon.hu 

 
Selection of the proper similarity measure is the cornerstone of all time series data mining task. In the recent years, many 
similarity measures have been introduced to fulfill the needs of chemical process engineering. These measures have been 
guided by data reduction methods due to the large amount of data. This data reduction can be done explicitly (by 
segmentation) as well as implicitly (by utilizing the latent variable space). Usually, the original multivariate data is 
projected into a single dimension with Principal Component Analysis (PCA) and segmentation is executed. However, the 
similarity measures which have been used to compare univariate, segmented representations of the original processes do 
not consider that the main information carried by the univariate representations is the correlation of the original variables. 
This paper introduces a PCA inspired similarity measure for these univariate segments. Finally, it is shown that the 
presented method can be considered as the logical extension of the Correlation Based Dynamic Time Warping (CBDTW) 
to univariate time series. 

Keywords: segmentation, Dynamic Time Warping, Principal Component Analysis, Piecewise Linear Approximation 

Introduction 

A time series is a sequence of values measured as a 
function of time. These kinds of data are widely used in 
the field of chemical process engineering, namely for 
process control, fault detection and analysis of process 
transitions. The increasing popularity of knowledge 
discovery and data mining tasks for discrete data has 
indicated the growing need for similarly efficient 
methods for time series data. These tasks share a 
common requirement: a similarity measure has to be 
defined between the elements of a given database. 
Moreover, the results of the data mining methods from 
simple clustering (partitioning the data into coherent but 
not predefined subsets) and classification (placing the 
data into predefined, labelled groups) to complex 
decision-making systems used for process control are 
highly dependent on the applied similarity measure.  

Related work 

The similarity of multivariate time series can be 
approached from two different perspectives. The first 
way is the application of metrics based warping measures 
such as Dynamic Time Warping (DTW) and Longest 
Common SubSequence (LCSS). These techniques are 
perfectly suitable for univariate tasks like speech 
recognition, where the analyzed process is represented 
by only one variable. In most cases, these methods can 
be easily generalized for the needs of the multivariate 
time series where the process depends on two or more 

variables. However, their application for correlated 
multivariate time series is often not as effective as it is 
expected.  

The direct comparison of the variables used by these 
approaches ignores the hidden process, i.e. the correlation 
between the process variables and this hidden process 
carries the real information in most process control task 
[1]. Hence, Principal Component Analysis (PCA) based 
similarity measures are used to overcome this problem. 
Krzanowski [2] defined the PCA similarity factor to 
measure the similarity between different data by 
comparing the hyperplanes (the dimensionality reduced 
latent variable spaces):  

p
UUUUtr

YXs pX
T

pYpY
T

pX
nnPCA

nnnn
)(

),( ,,,,=  

where: Xn – the first n-variable multivariate time series 
 Yn – the second n-variable multivariate time series 
 UXn,p and UYn,p – the matrices of eigenvectors 

which belong to the most important p ≤ n 
eigenvalues of covariance matrices of Xn and Yn, 
i.e. the two hyperplanes. 

 
The similarity factor has a geometrical explanation, 

because it measures the similarity between the two 
hyperplanes by computing the squared cosine values 
between all the combinations of the first p principal 
components from Xn and Yn: 

∑∑
= =

=
p

i

p

j
jinnPCA p

YXs
1 1

,
2cos

1
),( Θ  


60

where: Θi,j – the angle between the i
th principal component 

of Xn and the j
th principal component of Yn. 

 
The main advantage of Krzanowski's similarity factor 

is its optimal variable reduction property (from variance 
point of view) which makes it ideal for tasks with high 
number of variables. PCA similarity factor has also 
gained in popularity because of its outstanding feature 
which makes possible to recognize the direction of the 
changes in the distance between the variables, i.e. the 
rotation of the hyperplanes.  

On the other hand, PCA similarity factor weights all 
principal components equally, hence it may not capture 
well enough the degree of similarity between the 
sequences when only a few principal components explain 
most of the variance. Thus, it was natural to define a 
modified PCA similarity factor that weights each 
principal component by its explained variance. M. C. 
Johannesmeyer [3] defined this modified PCA similarity 
factor by weighting each principal component with its 
eigenvalue: 

∑

∑∑

=

= ==
p

i

Y
i

X
i

ji

p

i

p

j

Y
j

X
i

nnPCA
nn

nn

YXs

1

,
2

1 1

)(

Θcos)(

),(
λλ

λλ
λ  

where: λi
Xn and λi

Yn – the corresponding eigenvalues of 
the ith and jth principal component of Xn and Yn. 

 
This principle was developed by K. Yang [4] who 

presented the logical extension of PCA similarity factor 
and SλPCA called Eros (Extended Frobenius Norm): 

∑∑
==

==
n

i
ii

n

i

T
YXinnEros wiUiUwYXs nn

11
Θcos)()(),(  

where: Θi – the angle between the two corresponding 
principal components 

 wi – the weighting vector based on the eigenvalues 
of the sequences in the data set. 

 
Figure 1: Two corresponding principal components 
 
Generally speaking, the Eros measures the similarity 

of two multivariate time series by comparing the angle 
between the corresponding principal components and 
using the aggregated eigenvalues as weights, hence it 
takes into account the variance of each principal 
component. It has to be noted that Eros always computes 

the acute angle between the two principal components 
(eigenvectors). Therefore, as it is illustrated in Fig. 1, 
when the angle (α) between the two corresponding 
eigenvectors is not acute, the absolute value of it is 
taken and the similarity between the two corresponding 
eigenvectors is computed by using the acute angle (β). 
More specifically, the inner product of two normal 
vectors, a and b in Fig. 1, yields cos(α), while 
cos(β) = cos(π-α) = –cos(α) is needed. Therefore, the 
absolute value of cosine of the angle between the 
eigenvectors is taken, so that cos(α) is computed when 
α ≤ π/2, while -cos(α) is computed when α > π/2. 

The previously mentioned methods greatly improved 
the simple PCA similarity factor both in speed and in 
accuracy; however, they have not dealt with the biggest 
problem of every PCA related technique, i.e. PCA 
considers the time series as a whole but does not take 
into account the alterations of the correlation structure. 
This alternation affects the hyperplanes, therefore 
segmentation is required in most real-life applications to 
create homogeneous segments from correlation structure 
point of view. However, the segmentation raises another 
problem: Although in many real-life applications a lot 
of variables must be simultaneously tracked and 
monitored, most of the segmentation algorithms are used 
for the analysis of only one time-variant variable [5]. 
Hence, dimensionality reduction techniques are used, 
most likely PCA, to project the multivariate time series 
into the one dimensional space where any suitable 
univariate segmentation method such as Piecewise 
Aggregate Approximation (PAA) or Piecewise Linear 
Approximation (PLA) is executed. Finally, the segments 
are compared with a suitable similarity measure. 

Finding this suitable similarity measure is a difficult 
task. Besides the obvious Euclidean distance other, more 
advanced measures can be used such as DTW. It proved 
its adaptability and superiority over other similarity 
measures in wide range of time series applications from 
speech recognition [6] to fingerprint verification [7] and 
as final proof of this Yanikoglu won the Signature 
Verification Conference with a DTW based algorithm in 
2004 [8].  

Keogh and Pazzani presented modifications on 
DTW algorithm to handle PAA [9] and PLA [10] 
representations of univariate time series. Although these 
new algorithms can provide noticeably better results than 
Euclidean distance and speeds up the computationally 
expensive general DTW algorithm, they do not take the 
drift of the segments into account.  

Thus, the segmentation has to be framed in another 
way. Instead of compressing the multivariate data to a 
univariate time series applying PLA or PAA, the 
segmentation can be done in the multivariate space while 
the correlation is still considered. The authors introduced 
[11] two homogeneity measures as cost function for 
segmentation which are corresponding to the two typical 
applications of PCA models. The Q reconstruction error 
can be used to segment the time series according to the 
direct change of the correlation among the variables, 
while the Hotelling's T2 statistics can be utilized to 
segment the time series based on the drift of the center 
of the operating region.  

a

α
β

b


61

Based on these new segmentation methods a novel 
similarity measure, called Correlation Based Dynamic 
Time Warping was created [12]. It was proven that it 
outperforms all of the previously introduced correlation 
based similarity measures when the multivariate time 
series are highly correlated. This paper introduces the 
univariate version of CBDTW which can be used for 
PCA projected univariate time series. 

The rest of the paper is organized as follows. Section 
Background details the theoretical background of the 
proposed similarity measure, i.e. segmentation and DTW. 
In the next section the problems of the current DTW 
algorithms used for segmented representations are 
pointed out and the new similarity measure is presented. 
It is also discussed how it was derived from CBDTW. 
Section 4 conducts a detailed empirical comparison of 
the introduced PCA based method with currently used 
measures on verification databases widely used by the 
time series data mining community. Finally, validation 
is performed by clustering of temperature data from a 
sophisticated model of an industrial catalytic fixed bed 
tube reactor. 

Background  

An n-variable, m-element time series, Xn = [x1,x2,…,xn], is 
an m-by-n element matrix, where xi = [xi(1),xi(2),…,xi(n)]

T 
is the ith variable and xi(j) its j

th element. 
Xn(j) = [x1(j),x2(j),…,xn(j)] is the j

th sample of Xn. The 
similarity between Xn and Yn is denoted by s(Xn, Yn), 
where 0 ≤ s(Xn, Yn) ≤ 1, s(Xn, Yn) = s(Yn, Xn), and 
s(Xn, Xn) = 1. Obviously, the similarity is nothing more 
than a real number between zero and one which expresses 
the tightness of connection between the processes behind 
the time series. The closer the number is to one, the 
processes are treated more similar. In practice, the term 
distance or dissimilarity (d) is used instead of similarity. 
The value of the distance is given by a number ranged 
from zero to one. This can be associated with similarity: 
d(Xn, Yn) = 1 – s(Xn, Yn). 

Segmentation 

The ith segment of Xn is a set of consecutive time points, 
Si(a,b) = [Xn(a), Xn(a+1),…, Xn(b)]. The c-segmentation 
of time series Xn is a partition of Xn to c non-overlapping 
segments, ScXn = [S1(1,a),S2(a+1,b),…,Sc(k+1,m)]. In 
other words, a c-segmentation splits Xn to c disjoint time 
intervals, where 1 ≤ a and k ≤ m. 

The simplest but yet powerful segmentation technique 
for univariate time series is PAA. In this case, to reduce 
the m-length data from N, the time series are simply 
divided into N similar sized frames and each frame is 
represented by its mean value. Assuming that N is a 
factor of m, we get: 

∑
+−=

=
i

N
m

i
N
m

j

jx
m
N

ix
1)1(

11 )()(  

Besides filtering noise, PAA can compensate phase 
shifts of time axis and difference sampling rates of time 
series and the distance between two PAA representations 
can be chosen almost freely; however, it cannot handle 
“locally elastic” shifts of the time axis and it is not 
enough tight representation for most time series as it is 
shown in Fig. 2. 
 

Figure 2: The original signal (top) and its PAA (middle) 

and PLA representation (bottom) using 10 segments 
 

To correct these faults, more sophisticated methods 
are applied such as PLA which, however, arise the 
„segmentation problem”, i.e. how to segment a times 
series quickly and how to represent each segment tight 
enough. 

Segmentation can be framed in several ways [13], but 
its main goal is always the same: finding homogenous 
segments by the definition of a cost function, 
cost(Si(a,b)). This function can be any arbitrary function 
which projects from the space of the time series to the 
space of the non-negative real numbers. Usually, 
cost(Si(a,b)) is based on the distances between the actual 
values of the time series and the values given by a simple 
function f (constant or linear function, a polynomial of a 
higher but limited degree) fitted to the data of each 
segment: 

∑
=+−

=
b

al
nni lXflXdab

bas )))((),(((
1

1
)),((cost  

Thus, the segmentation algorithms simultaneously 
determine the parameters of the models and the borders 
of the segments by minimizing the sum of the costs of 
the individual segments: 

⎟
⎟
⎠

⎞
⎜
⎜
⎝

⎛
= ∑

=

c

i
i

c
X baSs N

1
))),((cost(min)(cost  

This segmentation cost of a time series can be 
minimized by dynamic programming, which is 


62

computationally intractable for many real datasets. 
Consequently, heuristic optimization techniques such as 
greedy top-down or bottom-up techniques are frequently 
used to find good but suboptimal c-segmentations: 
- Bottom-Up: Every element of Xn is handled as a 

segment. The costs of the adjacent elements are 
calculated and two elements with the minimum cost 
are merged. The merging cost calculation of adjacent 
elements and the merging are continued until some 
goal is reached. 

- Top-Down: The whole Xn is handled as a segment. 
The costs of every possible split are calculated and 
the one with the minimum cost is executed. The 
splitting cost calculation and splitting is continued 
recursively until some goal is reached. 

- Sliding Window: The first segment is started with 
the first element of Xn. This segment is grown until 
its cost exceeds a predefined value. The next segment 
is started at the next element. The process is repeated 
until the whole time series is segmented. 
 
All of these segmentation methods have their own 

specific advantages and drawbacks. Accordingly, the 
sliding window method is not able to divide up a 
sequence into predefined number of segments but this is 
the fastest method. The applied method depends on the 
acutal task. These heuristic optimization techniques 
were examined in detail through the application of 
Piecewise Linear Approximation [13]. It can be stated if 
real-time (on-line) segmentation is not required, the best 
result will be reached by Bottom-Up segmentation. 

While PAA represents an equal sized segment with 
only one value, PLA replaces the original data with not 
equally sized segments of straight lines, i.e. a PLA 
segment of x1 is a 4-taple:  

[ ]iiii ylxylxxrxxlxix )(,)(,)(,)()( 11111 =  
where: x1(xl)i and x1(xr)i – left and right time coordinates 

of the ith segment of x1 
 x1(yl)i and x1(yr)i – the values of x1 in x1(xl)i and 

x1(xr)i. 
 
Finding the optimal PLA a of time series and a 

suitable distance for this representation is a difficult 
task, that usually depends on the application. However, 
precison of the mostly used, traditional Euclidean 
distance (or any other Lp norm) can be significantly 
increased with the application of DTW. 

Dynamic Time Warping 

The traditional comparision approaches are rarely precise 
enough for the most applications. This is caused by the 
brittleness of the conventional similarity measures such 
as Euclidean distance. They are unable to handle the 
distortions in time axis, so these distortions almost 
randomly affect the distance between time series. The 
solution for this problem is the application of DTW 
which can “warp” the original time series (nonlinearly 

dilate or compress their time axes) to be similar in shape 
to the query series as much as possible.  

To align two univariate sequences (x1 and y1) with 
DTW, firstly a grid D have to be defined with size of 
the two time series (mx x my). Each cell of this matrix 
represents the actual distance between the appropriate 
indices of the two time series. In this step, any 
application-dependent distance like L1 and L∞ norms 
can be used but generally Euclidean distance is suggested 
because it allows of the efficient indexing of DTW. 
Considering this we have:  

2 2
11 ))()((),( jyixjiD −=  

Using grid D, many arbitrary mappings – called 
warping paths – can be created between x1 and y1. 
However, the construction of a warping path 
[p(1),p(2),…,p(l)] has to be restricted with the following 
constraints: 
- Boundary condition: The path has to start in D(1,1) 

and end in D(mx,my). 
- Monotonicity: The path has to be monotonous, i.e. 

always heading from D(1,1) to D(mx,my). If 
p(k) = D(i,j) and p(k+1) = D(i’,j’) then i’–i ≥ 0 and 
j’–j ≥ 0 . 

- Continuity: The path has to be continuous. If 
p(k) = D(i,j) and p(k+1) = D(i’,j’) then i’–i ≤ 1 and 
j’–j ≤ 1. 
 
To find the optimal warping path (the DTW distance 

of the two time series), every warping path has an 
assigned cost which is the sum of values of the affected 
cells divided by normalization constant K: 

⎟
⎟
⎟
⎟
⎟

⎠

⎞

⎜
⎜
⎜
⎜
⎜

⎝

⎛

=
∑
=

K

ip
yxd

l

i
DTW

1
11

)(
min),(  

The value of K depends on the application and in 
most cases this is the length of the path, but it can also 
be omitted. More information about the method of 
defining K and its significance can be found in [14]. 
Note that the Euclidean distance is a special case of 
DTW, i.e. we choose the path that is located on the 
diagonal of grid D and K = 1. 

 
Figure 3: Cumulative distance matrix D and the optimal 

warping path on it 


63

Obviously, the number of paths exponentially grows 
by the size of time series. Fortunately, the optimal path 
can be found in O(mxmy) time with the help of dynamic 
programming using cumulative distance matrix D. A 
cell of the cumulative matrix contains the sum of the 
appropriate cell value in matrix D and the minimum of 
the three cells from where the cell can be reached: 

⎥
⎥
⎥

⎦

⎤

⎢
⎢
⎢

⎣

⎡

+−−
+−
+−

=
),(D)1,1(D

),(D)1,(D
),(D),1(D

min),(D
jiji

jiji
jiji

ji  

The DTW distance between the two time series can 
be found in D(mx,my).  

Warping the representations 

DTW can be applied easily on the PAA representations 
of univariate time series. Each segment is represented 
by its mean, thus the DTW algorithm is the same as for 
the original time series but the computational time is 
lowered to O((m/N)2).  

Warping the PLA representation is much more 
difficult. Vullings et al. [15] were the first to use PLA 
based DTW on ECG data while Keogh and Pazzani [10] 
gave a generalized distance between PLA segments 
which can be used to fill matrix D: 

2

11 2
)()(

2
)()(

))(),(( ⎟
⎠
⎞

⎜
⎝
⎛ +

−
+

= iiii
yryylyyrxylx

jyixd  

As it can be seen this distance arises from PAA: it 
compares the means of the corresponding segments but 
it utilizes the tighter PLA representation. However, this 
tighter representation, i.e. the different length of the 
segments, introduces a new problem. The normalization 
constant K usually based on the length of the warping 
path but PLA, contrary to PAA, does not generate 
equidistant segments. Thus, Keogh and Pazzani 
suggested to recursively sum an additional variable on 
the warping path which stores the lengths of the visited 
segments.  

PCA driven similarity of PLA representation 

As it was mentioned before many problems arise when 
one would like to use data mining alogrithms on highly 
correlated mutivariate time series data. The high amount 
of data requires some reduction techniques. In chemical 
process engineering, usually PCA is used to create 
univariate time series from the multivariate data because 
PCA preserves the correlation of the original time series 
which is definitely an important factor to consider. 

The generated univariate time series make it possible 
to use traditional segmentation techniques which are 
often required for two reasons: first, the highly 
appreciated DTW algorithm computational extensive thus 
further data compression is advised and second, process 
transitions and frauds can be revealed by segmentation. 

Considering the previously mentioned, PAA is not 
suitable for our purposes because the segments are 
equidistant and they are represented by their means, 
hence PAA representation lost the direction of the latent 
variable in each segment. PLA generates a much tighter 
representation and it can preserve the direction 
information. In addition, it can be seen as the one 
dimensional version of PCA based segmentation presented 
in [12,13]. The authors used the Q reconstruction error, 
i.e. the goal of the segmentation was to minimize the 
sum of the squared Euclidean distances between the 
original and the reconstructed variables in each segment. 
PLA does the same, it minimizes the squared Euclidean 
distance between the original data point and their 
reconstructed pair (the closest point on the PLA 
segment). This reconstruction error is shown in case of 
PLA segmentation in Fig. 4. 

 
X

1
(t)

time

Q

 
Figure 4: Time series x1 (black dots) and its 

reconstruction using the PLA representation (gray dots) 
 
Unfortunately, the currently used DTW based 

measures do not take into consideration that the PLA 
created straight line is the latent variable of each segment 
while the mean of each segment are used. In [12], it is 
suggested to use any of the PCA based similarity 
measures to compare the hyperplanes of two segments 
when the segmentation was driven by PCA. This leads 
to represent each PLA segment with its angle of slope: 

⎟⎟
⎠

⎞
⎜⎜
⎝

⎛
−
−

=
ii

ii

xlxxrx
ylxyrx

ix
)()(
)()(

atan)(1  

Please note, none of the segments can be 
perpendicular to the time axis, hence no further 
restrictions are required. Now, any of the above 
presented PCA based similarity measures can be used to 
compare the segments based on the slope information. 
The interested reader might notice that the only 
difference between the reviewed PCA based similarity 
measures how they weights the angles between the 
hyperplanes. In the one dimensional space all of them 
can be reduced to the same equation: 

⎟
⎟

⎠

⎞

⎜
⎜

⎝

⎛

⎟
⎟

⎠

⎞

⎜
⎜

⎝

⎛

+

−
=

)()(1

)()(
atancos))(),((

11

11
11 iyix

iyix
iyixd  

where |cos(…)| – can be replaced by cos2(…). 
 

64

This would be a perfect measure for hyperplanes of 
a PCA projection, where the hyperplanes are the 
coordinates of the projected space, thus their signs are 
not only meaningless but considering them would destroy 
the precision. However, the PLA representation of a 
segment of a univariate time series (i.e., its latent 
variable) is also represents the original time series in the 
same space, thus the sign information cannot be 
neglected. For example, if one compares x1(i) = 57 (~89°) 
and y1(i) = –57 (~91°) as hyperplanes they are almost 
the same (only 2° is the difference); however, they are 
completely different from the time series point of view 
(increasing and decreasing trends). So, the basis distance 
of DTW should be a function which can measure the 
difference between the signed slopes. For this paper, the 
simple squared Euclidean distance was chosen: 

2
1111

))()(())(),(( jyixjyixd −=  

The last thing to handle is the K constant. If it can be 
treated that preprocessing steps are properly executed 
(i.e. data is filtered and it is proved that no additional 
noise is added to it), the role of the K constant is 
changed. The proper preprocessing ensures that all of 
the segments are detected due to the underlying process 
and they are not detected due to the noise; moreover, the 
minimum number of elements of a segment can also be 
defined. Thus, an additional warping is achieved if the 
weights of the segments are not considered and the 
length of the warping path can be used as the value of K 
or it can be omitted as it is done in this paper. 

Validation 

According to [16], the presented similarity measure was 
compared to other methods using the free and widely 
used datasets and classification algorithm of the UCR 
Time Series Classification/Clustering Homepage [17]. 
The datasets were kindly provided by Mr. Keogh on 
March 7, 2007. Please note, for easier reconstruction of 
the results, the algorihms were not executed on the four 
biggest dataset: Face (all), Two Patterns, Wafer, Yoga.  

In this test, the suggested 1-NN classification 
algorithm was executed on the databases and the error 
rate (proportion of the faulty classified time series) of 
each database for every measure was recorded in Table 1 
and Table 2. As a reference, the non-segmented time 
series was also compared with Euclidean distance and 
DTW, their results can be found in the third and fourth 
columns.  

The first four colored columns contain the results of 
the currently used similarity measures. The time series 
were segmented with PAA and PLA to 30 pieces using 
Bottom-Up technique and Euclidean or the reviewed 
DTW distance was applied. Please note, this means that 
the mean of the segments were used to compare two 
segments irrespectively of the applied segmentation 
method. 

 
A
ng

le
 3

0 
Se

gm
en

ts
 

D
T

W
 

0.
37

 
0.
64

 
0.
63

 
0.
59

 
0.
11

 
0.
19

 
0.
39

 
0.
15

 
A
ng

le
 3

0 
Se

gm
en

ts
 

E
uc

lid
ea

n 

0.
46

 
0.
71

 
0.
67

 
0.
64

 
0.
32

 
0.
23

 
0.
72

 
0.
46

 
PL
A

 3
0 

Se
gm

en
ts

 
D

T
W

 
0.
40

 
0.
66

 
0.
50

 
0.
04

 
0.
29

 
0.
20

 
0.
30

 
0.
38

 
PL
A

 3
0 

Se
gm

en
ts

 
E

uc
lid

ea
n 

0.
35

 
0.
76

 
0.
53

 
0.
13

 
0.
18

 
0.
20

 
0.
39

 
0.
51

 
PA
A

 3
0 

Se
gm

en
ts

 
D

T
W

 
0.
36

 
0.
43

 
0.
53

 
0.
02

 
0.
25

 
0.
21

 
0.
23

 
0.
22

 
PA
A

 3
0 

Se
gm

en
ts

 
E

uc
lid

ea
n 

0.
36

 
0.
41

 
0.
50

 
0.
07

 
0.
25

 
0.
13

 
0.
17

 
0.
21

 
N
o 

W
ar

pi
ng

 
W

in
do

w
 

D
T

W
 

0.
31

 
0.
40

 
0.
50

 
0.
00

 
0.
18

 
0.
23

 
0.
17

 
0.
17

 
E
uc

lid
ea

n 
D

is
ta

nc
e 

0.
37

 
0.
39

 
0.
47

 
0.
15

 
0.
25

 
0.
12

 
0.
22

 
0.
22

 
T
im

e 
Se

ri
es

 
L

en
gt

h 27
0 

17
6 

47
0 

12
8 

28
6 96

 
35
0 

46
3 

N
um

be
r o

f 
cl

as
se

s 5
0 37

 5 3 2 2 4 7 

N
am

e 

50
W

or
ds

 
A
di

ac
 

B
ee

f 

C
B

F 

C
of

fe
e 

E
C

G
 

Fa
ce

 (f
ou

r)
 

Fi
sh

 
Ta
bl

e 
1:

 R
es

ul
ts

 o
f 

th
e 

U
C

R
 t

im
e 

se
ri

es
 c

la
ss

if
ic

at
io

n 
al

go
ri

th
m

 o
n 

th
e 

fi
rs

t 
ei

gh
t 


65

A
ng

le
 3

0 
Se

gm
en

ts
 

D
T

W
 

0.
05

 
0.
44

 
0.
67

 
0.
37

 
0.
29

 
0.
22

 
0.
67

 
0.
01

 
A
ng

le
 3

0 
Se

gm
en

ts
 

E
uc

lid
ea

n 

0.
05

 
0.
39

 
0.
78

 
0.
40

 
0.
70

 
0.
51

 
0.
67

 
0.
17

 
PL
A

 3
0 

Se
gm

en
ts

 
D

T
W

 
0.
07

 
0.
20

 
0.
44

 
0.
30

 
0.
44

 
0.
28

 
0.
01

 
0.
03

 
PL
A

 3
0 

Se
gm

en
ts

 
E

uc
lid

ea
n 

0.
07

 
0.
31

 
0.
47

 
0.
33

 
0.
52

 
0.
44

 
0.
01

 
0.
19

 
PA
A

 3
0 

Se
gm

en
ts

 
D

T
W

 
0.
07

 
0.
26

 
0.
30

 
0.
13

 
0.
42

 
0.
21

 
0.
04

 
0.
10

 
PA
A

 3
0 

Se
gm

en
ts

 
E

uc
lid

ea
n 

0.
09

 
0.
25

 
0.
34

 
0.
13

 
0.
47

 
0.
21

 
0.
01

 
0.
29

 
N
o 

W
ar

pi
ng

 
W

in
do

w
 

D
T

W
 

0.
09

 
0.
13

 
0.
27

 
0.
13

 
0.
41

 
0.
21

 
0.
01

 
0.
00

 
E
uc

lid
ea

n 
D

is
ta

nc
e 

0.
09

 
0.
25

 
0.
43

 
0.
13

 
0.
48

 
0.
21

 
0.
12

 
0.
24

 
T
im

e 
Se

ri
es

 
L

en
gt

h 15
0 

63
7 

31
9 

57
0 

42
7 

12
8 60

 
27
5 

N
um

be
r o

f 
cl

as
se

s 

2 2 7 4 6 15
 6 4 

N
am

e 

G
un

-P
oi

nt
 

L
ig

ht
ni

ng
-2

 
L
ig

ht
ni

ng
-7

 
O
liv

eO
il 

O
su

 L
ea

f 

Sw
ed

is
h 

L
ea

f 

Sy
nt

he
tic

 C
on

tr
ol

 
T
ra

ce
 

‘Angle’ denotes the presented similarity measure. 

The time series were segmented to 30 pieces again but 
now the slope of each segment was considered. The 
provided representations were also compared with 

Euclidean distance and DTW. The best results are 
marked with dark grey for each database.  

As it can be seen the newly presented method kept 
up with the expectations. However, some notes have to 
be made on the results. One can realize that the reduction 
obtained by the segmentation was only about one order 
of magnitude which ensures the tight representation 
even with PAA and 30 segments is too much for most 
databases when PLA is used. Moreover, all of the 
databases require different number of segments to get 
the best results.  

The aim of this test was to show that it really makes 
sense to use the slope of a segment instead of its mean 
when the segment is provided by PLA. Obviously, the 
best PAA and PLA representation can be found for all 
distance measure but the idea behind the creation of 
such a test is to provide a unified test environment for 
all measures and to prevent the researcher from “over 
optimization”.  

Validation on an industrial fixed bed tube reactor 

The proposed PCA driven similaity measure has also 
been applied for clustering of temperature data from a 
sophisticated model of an industrial catalytic fixed bed 
tube reactor.  
 

coolant in

coolant out

reagents

product

x = 0

x = L

( )Wout,W B,T

( )Win,W B,T

( )out,Gout,Giout,Gout,G p,c,B,T

( )in,Gin,Giin,Gin,G p,c,B,T  
Figure 5: The simplified scheme of the studied reactor 

 
The studied vertically built up reactor contains a 

great number of tubes with catalyst. Highly exothermic 
reaction occurs as the reactants rising up the tubes pass 
the fixed bed of catalyst particles and the heat generated 

Ta
bl

e 
2:

 R
es

ul
ts

 o
f t

he
 U

C
R

 ti
m

e 
se

ri
es

 c
la

ss
if

ic
at

io
n 

al
go

ri
th

m
 o

n 
th

e 
se

co
nd

 e
ig

ht
 d

at
as

et
s 


66

by the reaction escapes through the tube walls into the 
cooling water. Due to this highly exothermic reaction 
which takes place in the catalyst bed the reactor very 
sensitive for the development of reactor runaway. 

Reactor runaway means a sudden and considerable 
change in the process variables. The development of 
runaway is in very close relationship with the stability 
of reactor/model. Runaway has two main important 
aspects. In one hand runaway forecast has a safety 
aspect, since it is important for avoiding the damage the 
constructional material or in the worst case scenario the 
explosion of reactor. On the other hand, runaway has a 
technology aspect, since the forecast of the runaway can 
be used for avoiding the development of hot spots in 
catalytic bed. The selection of operation conditions is 
important to avoid the development of reactor runaway 
and to increase the lifetime of catalyst at same time. The 
worked out mathematical model of the studied reactor has 
been presented in [5]. The model has been implemented 
in MATLAB and solved with a low order Runge-Kutta 
method.  

 
0 1 2 3 4
250

300

350

400

Reactor length [m]

T
e
m

p
e
ra

tu
re

 [
K

]

0 1 2 3 4
200

400

600

800

Reactor length [m]

T
e
m

p
e
ra

tu
re

 [
K

]

 
Figure 6: Normal (upper graph) and runaway (lower 

graph) profiles of the studied reactor 
 
The obtained simulator was applied to calculate two 

kinds of profiles. The inlet conditions were set to provide 
five profiles which correspond to the normal working 
conditions and other five profiles which describe the 
development of reactor runaway. The presented similarity 
measure has been used to classify these profiles. The 
result of clustering can be seen in the dendogram of  
Fig. 7. 

1 3 5 2 4 6 7 10 9 8 

D
is

ta
n
ce

   Normal                          Runaway  
Figure 7: Clustering result of the five Normal (number 

1-5) and five Runaway (number 6-10) profiles 

Conclusion 

We presented a new similarity measure for segmented 
univariate time series by considering the PLA 
representation of a segment as the latent variable. The 
proposed similarity was validated on many real world 
dataset used by data mining community and on the 
temperature profiles generated by a sophisticated model 
of an industrial reactor. Both evaluations show that it is 
worth to consider representing the segments by their 
slopes instead of their means and using this feature for 
the comparison. However, there is no decided difference 
between the two methods, thus we intend to combine 
these approaches in the future. 

ACKNOWLEDGEMENT 

János Abonyi is grateful for the support of the Bolyai 
Research Fellowship of the Hungarian Academy of 
Sciences. The financial support of the TÁMOP-4.2.2-
08/1/2008-0018 project is gratefully acknowledged. 
 

REFERENCES 

1. KOURTI T.: Application of Latent Variable Methods 
to Process Control and Multivariate Statistical 
Process Control in Industry, International Journal of 
Adaptive Control and Signal Processing 19 (2005), 
213-246 

2. KRZANOWSKI W.: Between-groups comparison of 
principal components, Journal of the American 
Statistical Society 74 (1979), 703-707 

3. JOHANNESMEYER M. C.: Abnormal Situation 
Analysis Using Pattern Recognition Techniques and 
Historical Data, M.Sc. Thesis, University of 
California, Santa Barbara, CA (1999) 


67

4. YANG K., SHAHABI C.: A PCA-based Similarity 
Measure for Multivariate Time Series, Proceedings 
of the 2nd ACM International Workshop on 
Multimedia Databases (2004), 65-74 

5. KIVIKUNNAS S.: Overview of Process Trend Analysis 
Methods and Applications, ERUDIT Workshop on 
Applications in Pulp and Paper Industry (1998), 
ERUDIT 

6. SAKOE H., CHIBA S.: Dynamic Programming 
Algorithm Optimization for Spoken Word 
Recognition, IEEE Transactions on Acoustics, 
Speech, and Signal Processing (1978) 

7. VAJNA Z. M. K.: A Fingerprint Verification System 
Based on Triangular Matching and Dynamic Time 
Warping, IEEE Trans. Pattern Anal. Mach. Intell. 
22 (2000), 1266-1276 

8. KHOLMATOV A., YANIKOGLU B. A.: Biometric 
Authentication Using Online Signatures, ISCIS 
(2004) 

9. KEOGH E. J., PAZZANI M. J.: Scaling up Dynamic 
Time Warping for Datamining Applications, 
Knowledge Discovery and Data Mining (2000) 

10. KEOGH E. J., PAZZANI M.: Scaling up Dynamic 
Time Warping to Massive Datasets, 3rd European 
Conference on Principles and Practice of Knowledge 
Discovery in Databases (1999) vol. 1704 

11. ABONYI J., FEIL B., NÉMETH S., ÁRVA P.: Principal 
Component Analysis Based Time Series 
Segmentation, IEEE International Conference on 
Computational Cybernetics (2005) 

12. BANKÓ Z., ABONYI J.: Correlation Based Dynamic 
Time Warping of Multivariate Time Series, 
Computational Statistics & Data Analysis (In press) 

13. KEOGH E. J., CHU S., HART D., PAZZANI M. J.: An 
Online Algorithm for Segmenting Time Series, 
ICDM (2001) 

14. SAKOE H., CHIBA S.: Dynamic Programming 
Algorithm Optimization for Spoken Word 
Recognition, IEEE Transactions on Acoustics, 
Speech, and Signal Processing (1978)  

15. VULLINGS H. J. L. M., VERHAEGEN M. H. G., 
VERBRUGGEN H. B.: ECG Segmentation Using 
Time-Warping, Proceedings of the Second 
International Symposium on Advances in Intelligent 
Data Analysis, Reasoning about Data (1997) 

16. KEOGH E. J., KASETTY S.: On the Need for Time 
Series Data Mining Benchmarks: a Survey and 
Empirical Demonstration, Proceedings of the 8th 
ACM SIGKDD International Conference on 
Knowledge Discovery and Data Mining (2002) 

17. KEOGH E. J., XI X., WEI L., RATANAMAHATANA C. 
A.: The UCR Time Series Classification/Clustering 
Homepage, 
http://www.cs.ucr.edu/~eamonn/time_series_data/, 
Riverside CA. University of California - Computer 
Science and Engineering Department (2006) 

18. VARGA T., SZEIFERT F., RÉTI J., ABONYI J.: Analysis 
of the runaway in an industrial heterocatalytic 
reactor, Computer-Aided Chemical Engineering 24, 
(2007), 751-756