CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 76, 2019 

A publication of 

 
The Italian Association 
of Chemical Engineering 
Online at www.aidic.it/cet 

Guest Editors: Petar S. Varbanov, Timothy G. Walmsley, Jiří J. Klemeš, Panos Seferlis 
Copyright © 2019, AIDIC Servizi S.r.l. 

ISBN 978-88-95608-73-0; ISSN 2283-9216 

Fault Detection Analysis Before and After Dynamic Model Reforming on the Benchmark Tennessee Eastman Process

Bo Chen, Zhu Wang, Xiong-Lin Luo* 

Department of Automation, China University of Petroleum Beijing, 102249, China  

luoxl@cup.edu.cn 

To bring the simulated plant closer to an actual chemical process, the authors improved the model of the Tennessee Eastman (TE) process in previous work. This paper analyses the effect of that improvement on fault detection performance by comparing fault data from the improved model and the original model. Based on principal component analysis (PCA), the detection rates of Hotelling's T2 statistic and the Q statistic, together with those of a support vector machine (SVM) tuned by particle swarm optimization (PSO), are used to quantify detection performance on the two models. The detection rates show that performance degrades on the improved model. The analysis indicates that when these methods are applied to faults in an actual chemical process, their performance may not be as good as reported in the literature.

1. Introduction 

With the rapid development of modern industrial technology, large-scale automation systems are becoming increasingly complex, and fault detection in such systems has long been a focus of academic attention (Amin et al., 2018). In practice, fault detection methods and control algorithms are almost always tested on simulation models, and these models are built on a large number of assumptions. The Tennessee Eastman (TE) process was proposed in 1993 on the basis of an actual chemical process of the Eastman Chemical Company, and its simulation model is widely used as a test object. However, modelling the TE process involved a trade-off between rigor and model stiffness arising from the vapour phase dynamics during pressure changes (Downs and Vogel, 1993). In other words, results obtained with many detection methods are open to question, because the TE model itself rests on certain assumptions and simplifications. For this reason, the authors (Chen et al., 2018) improved the model to restore the vapour phase dynamics during pressure changes, with the aim of making the TE model sufficiently complex and realistic to improve the quality of control algorithms and of fault detection and diagnosis methods. Moreover, studies usually compare different detection methods on one and the same model; there is hardly any literature discussing how the same detection method performs on a model and on the actual process.

This paper therefore builds on the previous work to analyse how a fault detection method behaves differently on an actual process and on a simplified model. In other words, the aim is to study whether an established fault detection method keeps the same performance when the model is sufficiently complex and realistic. To this end, fault data from the improved and the original model are studied. First, a control structure that stabilizes the system is used to generate the data; the data sets of the normal condition serve as training sets and the fault data as testing sets. Then, to avoid the curse of dimensionality and to reduce the computational load, principal component analysis (PCA) is applied to the data sets, and Hotelling's T2 statistic and the Q statistic are used to detect the faults. Their correct classification rate (CCR) and false alarm rate (FAR) are used to quantify detection performance on the two models. The detection results indicate that the fault detection methods perform worse when the model is more complex and realistic. To make these results more reliable, the PSO-tuned SVM approach from the recent literature is applied as well. Finally, the results show that when the detection methods are used to detect faults in an actual chemical process, their performance may not be as good as reported in the literature.


 
                                                                                                                                                                 DOI: 10.3303/CET1976108 
 
 
Paper Received: 01/03/2019; Revised: 04/04/2019; Accepted: 04/04/2019 
Please cite this article as: Chen B., Wang Z., Luo X.-L., 2019, Fault Detection Analysis Before and After Dynamic Model Reforming on the 
Benchmark Tennessee Eastman Process, Chemical Engineering Transactions, 76, 643-648  DOI:10.3303/CET1976108 
  

2. Preliminaries 

2.1 TE process 

The TE process is a plant-wide process control problem that is widely used as a test problem; it involves eight components. The process includes five major units: a reactor, a condenser, a separator, a recycle compressor and a product stripper. It has 53 monitored variables, comprising 41 measurements and 12 manipulated variables; once a fault occurs, all variables are affected. More details can be found in Downs and Vogel (1993). The process flow diagram of the TE process is shown in Figure 1. However, the original model simplifies the vapour phase dynamics during pressure changes: there is only one temperature variable, and the temperature within the whole device is represented by the liquid temperature. The resulting synchronization between vapour and liquid dynamics does not agree with a practical process. As mentioned in the introduction, the authors (Chen et al., 2018) therefore improved the TE model by taking the vapour phase energy balance into account, which restores the fast changes of pressure and vapour temperature. The improved model thus reflects the dynamic behaviour of the vapour and liquid phases better, and it also captures the differences in physical properties between the two phases, which are ignored in the original model. As a consequence, the responses of some variables differ from those of the original model, and the improved model is closer to the actual process.

 
Figure 1: Process flow diagram of TE process including a control structure 

2.2 PCA-based Hotelling's T2 statistic and Q statistic

PCA is a dimension reduction technique: it projects data from a high-dimensional space onto a lower-dimensional subspace and transforms a set of correlated variables into a set of uncorrelated variables. It is described briefly below. Let a data set with n observations of m process variables be denoted by the matrix $X^T=[x_1,\dots,x_n]\in\mathbb{R}^{m\times n}$, $x_i\in\mathbb{R}^m$; the covariance matrix $S=X^TX/(n-1)$ is then obtained. Next, singular value decomposition (SVD) is applied to S, which yields the loading matrix $P=[P_{pc},P_{res}]\in\mathbb{R}^{m\times m}$. Finally, the observations in X are projected onto the lower-dimensional space, $Z_{pc}^T=P_{pc}^TX^T$. Details can be found in Wang and Yin (2015). Hotelling's T2 statistic and the Q statistic are typical process monitoring statistics: T2 measures the magnitude of the variation inside the principal component (PC) subspace, whereas Q measures the variability that breaks the normal process correlation, which often indicates an abnormal situation. Both statistics are calculated from the PCA model, and their confidence limits are calculated for a given level of significance, α. More details can be found in Yin et al. (2012).
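For reference, the two statistics can be written out in the notation of this section. This is a sketch of the standard textbook forms, which are assumed (the text does not confirm it) to match those used in Yin et al. (2012). The symbols a (number of retained PCs), $\Lambda_{pc}$ (diagonal matrix of the a largest eigenvalues of S) and $F_{\alpha}$ (upper α quantile of the F distribution) are introduced here for illustration, and x is assumed to be a mean-centred, scaled sample.

```latex
\begin{align}
S &= P \Lambda P^T, \qquad
  \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_m),\quad
  \lambda_1 \ge \dots \ge \lambda_m \ge 0 \\
T^2 &= x^T P_{pc}\, \Lambda_{pc}^{-1}\, P_{pc}^T x \\
Q &= \lVert (I - P_{pc} P_{pc}^T)\, x \rVert^2 = x^T P_{res} P_{res}^T x \\
T^2_{\alpha} &= \frac{a\,(n^2 - 1)}{n\,(n - a)}\, F_{\alpha}(a,\; n - a)
\end{align}
```

A sample is flagged when T2 or Q exceeds its limit. The limit for Q is usually obtained from an approximation of its distribution under normal operation and is not reproduced here.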



2.3 PSO and SVM 

SVM is widely used for classification tasks and has been studied extensively. An SVM with a Gaussian radial basis function (RBF) kernel is adopted in this paper. The objective function and its constraints are as follows.

 
$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i \qquad (1)$$

$$\text{s.t.}\quad y_i\,(w^T x_i + b) \ge 1-\xi_i \qquad (2)$$

$$\xi_i \ge 0 \qquad (3)$$

where $(x_i,y_i)$ denotes the training data, $i=1,\dots,m$, $x_i\in\mathbb{R}^n$ and $y_i\in\{-1,1\}$; C is the penalty parameter of the error term, and the error is represented by the slack variable $\xi_i$.
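The RBF kernel itself is not written out in the text. A minimal sketch of its usual form, K(x, z) = exp(-γ‖x − z‖²), is given below. The function name `rbf_kernel` and the width parameter `gamma` are illustrative names chosen here, not taken from the paper.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    x, z: equal-length sequences of floats.
    gamma: positive width parameter (hypothetical name for the kernel
    hyperparameter; the paper does not name it).
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points give similarity 1; distant points decay towards 0.
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)
```

With this kernel, the penalty C and the kernel width are the quantities a tuning method would typically search over; which parameters were tuned in this study is not stated in the text.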

The PSO algorithm was originally inspired by the regular flocking behaviour of birds, from which a simplified model based on swarm intelligence was established. Let $X_i=(x_{i1},x_{i2},\dots,x_{in})$, $i=1,\dots,m$, denote the position of the ith particle, $P_i=(p_{i1},p_{i2},\dots,p_{in})$ the best position found by the ith particle, $P_g=(p_{g1},p_{g2},\dots,p_{gn})$ the best position found among all m particles, and $V_i=(v_{i1},v_{i2},\dots,v_{in})$ the velocity with which the ith particle moves to its next position. The particles are moved according to the following equations.

 
$$V_i(k+1) = wV_i(k) + c_1r_1\,(P_i - X_i(k)) + c_2r_2\,(P_g - X_i(k)) \qquad (4)$$

$$X_i(k+1) = X_i(k) + V_i(k+1) \qquad (5)$$

where k is the iteration index and w is the inertia weight; if w is chosen appropriately, the number of iterations required can be small. c1 and c2 are acceleration constants, and r1 and r2 are random numbers drawn between 0 and 1. PSO is used here to obtain the best parameters of the SVM. Both Liu (2016) and Li et al. (2016) used PSO to optimize parameters, and Zhang and Guo (2016) used PSO-SVM for diagnosis and reported that it is an effective method.
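The update rules in Eq(4) and Eq(5) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the swarm size, iteration count, the values w = 0.7 and c1 = c2 = 1.5, the box constraints and the toy `sphere` objective (standing in for an SVM validation error) are all assumptions made here.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using the updates of Eq(4) and Eq(5)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions X_i inside the box; zero initial velocities V_i.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                  # personal best positions P_i
    p_val = [objective(x) for x in X]      # personal best objective values
    g = min(range(n_particles), key=lambda i: p_val[i])
    Pg, g_val = P[g][:], p_val[g]          # global best position P_g

    for _ in range(n_iter):
        for i in range(n_particles):
            # One pair of random factors per particle, as written in Eq(4).
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                # Eq(4): inertia + pull towards P_i + pull towards P_g
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (Pg[d] - X[i][d]))
                # Eq(5): move the particle (clipping to the box is an
                # addition made here, not part of Eq(5)).
                lo, hi = bounds[d]
                X[i][d] = min(max(X[i][d] + V[i][d], lo), hi)
            val = objective(X[i])
            if val < p_val[i]:
                P[i], p_val[i] = X[i][:], val
                if val < g_val:
                    Pg, g_val = X[i][:], val
    return Pg, g_val

def sphere(x):
    """Toy objective with its minimum at the origin (illustrative only)."""
    return sum(v * v for v in x)

best_pos, best_val = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a PSO-SVM setting, `objective` would map a candidate parameter vector to a classification error estimated on held-out data, so that each particle position encodes one SVM parameter setting.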

3. Results and discussion 

In this section, to study the performance of the above detection methods on the two models, training and testing data sets are collected from the original model and from the improved model of the TE process. Because the TE process is open-loop unstable, the system must be operated in closed loop; the control structure is shown in Figure 1. The training and testing data sets contain 52 monitored variables, comprising 41 measurements and 11 manipulated variables; the agitation speed is excluded because it is not manipulated. Table 1 lists the final steady-state values of the 11 manipulated variables and shows that they are almost identical in the two models. The data set of the normal condition, with 500 observations, is used as the training data set. Each testing data set has 960 samples; in the fault data sets, only the first 160 samples correspond to normal operation. The FAR and CCR are used to evaluate detection performance. A test sample whose statistic exceeds the threshold is identified as a fault.

Table 1: The final steady state values of 11 manipulated variables in two models 

MV   OM (%)   IM (%)   MV   OM (%)   IM (%)   MV   OM (%)   IM (%)   MV   OM (%)   IM (%)
1    63.1     63.1     4    61.2     61.2     7    38.1     38.1     10   41.1     41.1
2    54.0     54.0     5    22.2     22.2     8    46.5     46.5     11   18.1     18.1
3    24.6     24.6     6    40.1     40.1     9    47.4     47.5

 
where MV denotes the manipulated variable, OM the original model and IM the improved model.

The study comprises four cases, in which the training and testing sets come from the two models in every combination, as shown in Table 2. Case 1: the training set comes from the original model (Training sets 1) and the testing set comes from the original model (Testing sets 1). Case 2: the training set comes from the original model and the testing set comes from the improved model (Testing sets 2). Case 3: the training set comes from the improved model (Training sets 2) and the testing set comes from the original model. Case 4: both the training set and the testing set come from the improved model. Case 1 is used to study detection performance on the simplified model and case 4 on the complex model; the other two cases (case 2 and case 3) serve to show the difference between the two models. The FAR and CCR of the cases are compared; they are calculated by Eq(6).

Table 2: Detail information of 4 cases 

Case     Training sets 1   Testing sets 1   Training sets 2   Testing sets 2
Case 1   √                 √
Case 2   √                                                    √
Case 3                     √                √
Case 4                                      √                 √

 
$$\mathrm{FAR} = \frac{\text{No. of normal samples identified as fault}}{\text{total No. of normal samples}}\times 100\%, \qquad \mathrm{CCR} = \frac{\text{No. of correctly classified samples}}{\text{total samples}}\times 100\% \qquad (6)$$

First, the normal condition is tested to determine the FAR. Figure 2a and Figure 2d show that detection works normally in case 1 and case 4: only a few data points exceed the threshold, and the FARs are close to 5%, in accordance with the level of significance (α=0.05). As shown in Table 3, the FARs of case 1 and case 4 differ little (the FAR of case 4 is slightly higher), because no fault occurs and all variables fluctuate near their steady-state values. Figure 2b and Figure 2c show that when the training set and the testing set come from different models, the FAR is 100%. Even though the two models have the same final steady-state values, adding the vapour phase energy balance equation, together with the presence of white noise, changes the correlation between the variables, and this change is reflected in the data. In other words, the covariance matrices obtained by PCA differ, so the two PCA models differ as well, and all data points exceed the threshold.

Table 3: The confidence limits and false alarm rates of T2 and Q

Case     Confidence limit of T2   FAR of T2 (%)   Confidence limit of Q   FAR of Q (%)
Case 1   11.3                     3.65            7.96                    3.65
Case 2   11.3                     100             7.96                    100
Case 3   7.92                     100             13.1                    100
Case 4   7.92                     7.92            13.1                    5.21

 
[Figure 2, panels a to d: T2 statistic versus sample number; legend: normal condition samples, threshold]
 

Figure 2: PCA based T2 plots of normal condition in 4 cases, a: case 1, b: case 2, c: case 3, d: case 4 

The FAR analysis indicates that the PCA-based T2 and Q statistics can be used to detect faults in either model, but that case 2 and case 3 cannot be used to compare detection performance. Therefore, only case 1 and case 4 are used below to examine the CCR.



Consider a fault such as fault 4 (reactor cooling water inlet temperature). The training data comprise 500 samples of the normal condition, and the testing data comprise 960 samples of fault 4. The T2 and Q plots are shown in Figure 3. In case 4, T2 misclassifies more data points: many samples that should be identified as faulty are identified as normal. The CCRs are listed in Table 4. The CCR of T2 is 98.4 % in case 1 and 60.8 % in case 4, and the CCR of Q is also lower in case 4. Detection performance is therefore clearly worse in case 4.

Table 4: The CCR of T2 and Q statistic for fault 4 

Case  CCR of T2 (%)  CCR of Q (%)  

Case 1 98.4  99.5  

Case 4 60.8  98.6  

 
[Figure 3, panels a to d: T2 and Q statistics versus sample number; legend: fault 4 samples, threshold]
 

Figure 3: PCA based T2 and Q plots of fault 4 in case 1 and case 4, a and b: case 1, c and d: case 4 

To rule out the possibility that the above result is specific to one detection method, PSO-SVM is also used to diagnose fault 4. The training data set has 1460 samples: 500 of the normal condition, 480 of fault 4 and 480 of fault 5 (condenser cooling water inlet temperature). The testing data set is the fault 4 set with 960 samples. Before detection, the data are processed by PCA. The prediction results are shown in Figure 4. In Figure 4a, samples predicted to be from the normal condition are labelled ‘0’, samples predicted to be from fault 4 are labelled ‘1’, and samples predicted to be from fault 5 are labelled ‘2’. While the condition is normal, both cases diagnose accurately and assign the same number (160) of ‘0’ labels. After fault 4 occurs, case 1 yields 742 labels ‘1’ and 58 labels ‘2’, whereas case 4 yields 690 labels ‘1’ and 110 labels ‘2’, so more fault samples are diagnosed correctly in case 1. As shown in Figure 4b, the CCRs of case 1 and case 4 are 94% and 88.5% respectively. The lower CCR of case 4 again shows that the fault detection method performs worse when applied to the improved model.
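As a consistency check, the CCR values reported for Figure 4b follow from the label counts through Eq(6), assuming that the 160 normal samples labelled ‘0’ and the fault samples labelled ‘1’ are the correctly classified ones:

```latex
\begin{align}
\mathrm{CCR}_{\text{case 1}} &= \frac{160 + 742}{960} \times 100\% \approx 94.0\% \\
\mathrm{CCR}_{\text{case 4}} &= \frac{160 + 690}{960} \times 100\% \approx 88.5\%
\end{align}
```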

For some faults, both case 1 and case 4 give a high CCR, because these are ordinary faults with pronounced symptoms, which are easy to detect compared with incipient faults. Incipient faults are usually difficult to detect, because their magnitudes are extremely small and tend to be buried by either the process trend or the measurement noise (He et al., 2018). To examine detection performance on small faults for the two models, the magnitudes of faults 1, 3, 5 and 11 are reduced; the resulting CCRs are shown in Table 5.

As shown in Table 5, in case 1 the detection method still achieves a higher CCR. In case 4, however, faults 3, 5 and 11 cannot be detected effectively once their magnitudes are reduced: roughly half of the data points are not correctly classified. The results indicate that, for the same incipient faults, good results on the simulation model do not guarantee the same results in a more complex actual chemical process.



Figure 4: The prediction results of PSO-SVM for fault 4, a: the number of different predicted class labels for case 1 and case 4, b: the CCR of case 1 and case 4 for fault 4

Table 5: The CCR of other faults of case 1 and case 4 

Fault description                                       Fault type   Case 1 (%)   Case 4 (%)
1: A Feed Ratio, B Composition Constant in stream 4     Step         98.4         98.3
3: D Feed Temperature in stream 2                       Step         98.3         57.1
5: Condenser Cooling Water Inlet Temperature            Step         80.0         49.7
11: Cooling Water Inlet Temperature                     Random       96.8         46.7

4. Conclusions 

This paper compares the performance of the same fault detection methods on a simplified model and on a more complex model. Performance is consistently good on the simplified model, but the results indicate that it degrades when the model is more complex and realistic. In other words, when the detection methods are used to detect faults in a more complex actual chemical process, their performance may not be as good as reported in the literature. More generally, many methods cannot be applied to an actual process directly.

Acknowledgments 

This work is supported by the National Natural Science Foundation of China (21676295). 

References 

Amin M.T., Imtiaz S., Khan F., 2018, Process System Fault Detection and Diagnosis Using a Hybrid Technique, Chemical Engineering Science, 192(2), 191–211.

Chen B., Lan F.W., Wang Z., Luo X.L., 2018, Dual-time scale based extending of the benchmark Tennessee Eastman process, Computer Aided Chemical Engineering, Elsevier, 44, 529–534.

Downs J.J., Vogel E.F., 1993, A plant-wide industrial process control problem, Computers and Chemical Engineering, 17(3), 245–255.

He Z., Shardt Y.A.W., Wang D., Hou B., Zhou H., Wang J., 2018, An incipient fault detection approach via detrending and denoising, Control Engineering Practice, 74, 1–12.

Li W., Wang X.C., Wang X.S., Wang H., 2016, Endpoint Prediction of BOF Steelmaking based on BP Neural Network Combined with Improved PSO, Chemical Engineering Transactions, 51, 475–480.

Liu Y.Y., 2016, The Design and Application of Quantum-Behaved Particle Swarm Optimization Based on Levy Flight, Chemical Engineering Transactions, 51, 499–504.

Wang G., Yin S., 2015, Quality-related fault detection approach based on orthogonal signal correction and modified PLS, IEEE Transactions on Industrial Informatics, 11(2), 398–405.

Yin S., Ding S.X., Haghani A., Hao H., Zhang P., 2012, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, Journal of Process Control, 22(9), 1567–1581.

Zhang Z., Guo H., 2016, Research on Fault Diagnosis of Diesel Engine Based on PSO-SVM, Proceedings of the 6th International Asia Conference on Industrial Engineering and Management Innovation, Atlantis Press, DOI: 10.2991/978-94-6239-145-1_48.
