

CHEMICAL ENGINEERING TRANSACTIONS
VOL. 61, 2017
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet

Guest Editors: Petar S Varbanov, Rongxin Su, Hon Loong Lam, Xia Liu, Jiří J Klemeš
Copyright © 2017, AIDIC Servizi S.r.l. 
ISBN 978-88-95608-51-8; ISSN 2283-9216 

Data-driven Online Operating Performance Assessment for 
Multi-Datasets Multivariable Industrial Processes 

Yupeng Du^a, Zhenlei Wang^a,*, Xin Wang^b
^a Key Laboratory of Advanced Control and Optimization for Chemical Processes, East China University of Science and Technology, Shanghai 200237, China
^b Center of Electrical & Electronic Technology, Shanghai Jiao Tong University, Shanghai 200240, China
wangzhen_l@ecust.edu.cn

In this study, a novel online operating performance assessment method based on a multi-sets two-step basis vector extraction and artificial neural network (MTBVE-ANN) strategy is proposed for industrial applications. The MTBVE-ANN method focuses on finding the common and specific information contained in multiple datasets, and the introduction of artificial neural networks improves the accuracy with which nonlinear data are characterized. The optimality-related variations are extracted from each operating performance grade by analysing the common and unique variations over the steady performance grades. The online operating performance assessment is then performed based on the similarity between the optimality-related variations of the test data and those of the historical training data. The earlier operating performance assessment method based on total projection to latent structures (T-PLS) requires both input and output data to be available. The method proposed in this paper uses an artificial neural network to assess the operating performance grade of online test data without output data. The validity and precision of the proposed operating performance assessment method are illustrated with industrial data from a multi-dataset multivariable industrial process.

1. Introduction  
Maximizing economic efficiency while protecting the environment as far as possible has become a focus of process operation. Owing to the complexity of chemical processes, maintaining excellent operating performance of the chemical process control system is crucial. Optimal control strategies have long been a hot spot of process control research (Hwang et al., 2013). However, because of process disturbances, equipment wear, instrument failure, changes in the operating environment and other uncertain factors, the operating state may gradually deviate from the optimal operating point, resulting in deteriorating economic performance and environmental damage. Online process performance monitoring and optimization methods are therefore of great practical value. In 1989, Harris proposed the concept of the "feedback invariant" to solve the control performance assessment (CPA) problem of single-variable control systems, which marks the beginning of CPA theory (Harris, 1989). To further assess system operating performance from a data-driven perspective, Yu and Qin (2008) proposed a data-based covariance benchmark for control performance monitoring. Since then, data-driven performance assessment methods have gradually become a hot spot. Zhou et al. (2010) proposed total projection to latent structures (T-PLS) for process monitoring to reduce the false alarm and missed alarm rates of faults related to output data and to improve performance monitoring of nonlinear processes. Liu et al. (2014) applied the T-PLS algorithm to operating performance assessment by comparing the similarity between the online data and the offline model.
However, the immediacy of online operating performance assessment is limited by the scarcity of online output data, and the nonlinearity of the data degrades the accuracy of the assessment. In this study, a novel online operating performance assessment method called MTBVE-ANN is proposed to solve these problems. For data feature extraction, the paper applies a multi-sets two-step basis vector extraction (MTBVE) strategy to the input data, which focuses on finding their common subspace and specific subspaces over the sets. For online assessment, a sliding time window and an artificial neural network are used as the performance grade classifier. The similarity indices between the test data and each performance grade take values between 0 and 1 and reflect the grade membership.

DOI: 10.3303/CET1761286
Please cite this article as: Du Y., Wang Z., Wang X., 2017, Data-driven online operating performance assessment for multi-datasets multivariable industrial processes, Chemical Engineering Transactions, 61, 1729-1734, DOI:10.3303/CET1761286

The remainder of this article proceeds as follows. Section 2 briefly summarizes the multi-sets two-step basis vector extraction strategy. Section 3 presents the offline extraction modelling and the online operating performance assessment procedure. In Section 4, the MTBVE-ANN method is applied to real online input data from a chemical plant. Finally, the main conclusions and acknowledgements are presented.

2. Summary of Multi-sets Two-step Basis Vector Extraction Strategy 
To analyse the common variable correlation shared among multiple datasets, Zhao et al. (2011) proposed the two-step basis vector extraction strategy. This method finds the basis vectors by solving a constrained optimization problem. Assume there are C input datasets, and the i-th dataset is denoted as

$$\mathbf{X}_i = [\mathbf{x}_{i,1}, \mathbf{x}_{i,2}, \ldots, \mathbf{x}_{i,N_i}]^T \in R^{N_i \times J} \tag{1}$$

where $i = 1, 2, \ldots, C$, and $N_i$ and $J$ are the numbers of samples and process variables, respectively. According to industrial production experience, the input data are divided into different datasets. Each dataset is represented by sub-basis vectors that are similar or closely related to each other across the performance grades. Each sub-basis vector is defined as a linear combination of the original samples:

$$\mathbf{p}_{i,j} = \sum_{n=1}^{N_i} a_{i,j,n}\,\mathbf{x}_{i,n} = \mathbf{X}_i^T \mathbf{a}_{i,j} \tag{2}$$

where $\mathbf{p}_{i,j} \in R^{J \times 1}$ denotes the j-th sub-basis vector of $\mathbf{X}_i$ and $\mathbf{a}_{i,j} = [a_{i,j,1}, a_{i,j,2}, \ldots, a_{i,j,N_i}]^T$ is the vector of combination coefficients. To simplify the interrelationships over sets, $\mathbf{p}_g \in R^{J \times 1}$ is introduced as a pseudo (C+1)-th sub-basis vector. The extraction problem is then transformed into a constrained optimization problem:

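$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \begin{cases} \mathbf{p}_g^T \mathbf{p}_g = 1 \\ \mathbf{a}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{a}_i = 1 \end{cases} \tag{3}$$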

To solve the optimization problem, the Lagrangian function is constructed as

$$F(\mathbf{p}_g, \mathbf{a}_i, \lambda_g, \lambda_i) = \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 - \lambda_g\left(\mathbf{p}_g^T \mathbf{p}_g - 1\right) - \sum_{i=1}^{C} \lambda_i\left(\mathbf{a}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{a}_i - 1\right)$$

in which $\lambda_g$ and $\lambda_i$ are constant scalars, so that the optimization problem reduces to an unconstrained extremum problem. The optimization problem is finally converted to an analytical solution as

$$\sum_{i=1}^{C} \left(\mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i\right) \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{4}$$
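The step from the Lagrangian to Eq. (4) is not spelled out in the text. A short sketch of it follows, under two assumptions that are not stated in the paper: the sign of each $\mathbf{a}_i$ is chosen so that $\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i > 0$, and $\mathbf{X}_i \mathbf{X}_i^T$ is invertible.

```latex
\begin{align*}
\frac{\partial F}{\partial \mathbf{a}_i} = 0
  &\;\Rightarrow\; (\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i)\,\mathbf{X}_i \mathbf{p}_g
     = \lambda_i\, \mathbf{X}_i \mathbf{X}_i^T \mathbf{a}_i \\
\text{left-multiplying by } \mathbf{a}_i^T
  &\;\Rightarrow\; (\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i)^2 = \lambda_i \\
  &\;\Rightarrow\; \mathbf{a}_i = \frac{1}{\sqrt{\lambda_i}}
     \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i \mathbf{p}_g \\
\frac{\partial F}{\partial \mathbf{p}_g} = 0
  &\;\Rightarrow\; \sum_{i=1}^{C} (\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i)\,
     \mathbf{X}_i^T \mathbf{a}_i = \lambda_g\, \mathbf{p}_g \\
  &\;\Rightarrow\; \sum_{i=1}^{C} \mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1}
     \mathbf{X}_i\, \mathbf{p}_g = \lambda_g\, \mathbf{p}_g
\end{align*}
```

The last line substitutes the expression for $\mathbf{a}_i$ together with $\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i = \sqrt{\lambda_i}$, so the factors of $\sqrt{\lambda_i}$ cancel and $\mathbf{p}_g$ appears as an eigenvector of the summed matrix.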
Therefore, the sub-basis vector in each dataset can be calculated by 

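$$\mathbf{p}_i = \mathbf{X}_i^T \mathbf{a}_i = \frac{1}{\sqrt{\lambda_i}} \mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i \mathbf{p}_g \tag{5}$$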

where $\lambda_i = \mathbf{p}_g^T \mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i \mathbf{p}_g$, and the eigenvector corresponding to the maximum eigenvalue is the first common basis vector $\mathbf{p}_g$. On this basis, the two-step basis vector extraction is carried out. In the first step, the problem is formulated as

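$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \begin{cases} \mathbf{p}_g^T \mathbf{p}_g = 1 \\ \mathbf{a}_i^T \mathbf{a}_i = 1 \end{cases} \tag{6}$$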

The problem is simplified by the Lagrangian operator as 

$$\sum_{i=1}^{C} \left(\mathbf{X}_i^T \mathbf{X}_i\right) \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{7}$$
Correspondingly, the sub-basis vector can be calculated by 

$$\mathbf{p}_i = \mathbf{X}_i^T \mathbf{a}_i = \frac{1}{\sqrt{\lambda_i}} \mathbf{X}_i^T \mathbf{X}_i \mathbf{p}_g \tag{8}$$

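where $\lambda_i = \mathbf{p}_g^T \mathbf{X}_i^T \mathbf{X}_i \mathbf{p}_g$. A new subspace $\mathbf{P}_i = [\mathbf{p}_{i,1}, \mathbf{p}_{i,2}, \ldots, \mathbf{p}_{i,R}]^T \in R^{R \times J}$ is thereby spanned.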
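The first step, Eqs. (7) and (8), can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `first_step_basis` is hypothetical, and taking the R leading eigenvectors in a single eigendecomposition (instead of extracting them one at a time) is an assumption.

```python
import numpy as np

def first_step_basis(datasets, n_vectors):
    """Sketch of Eqs. (7)-(8): global vectors p_g, then sub-basis vectors p_i.

    datasets  : list of C arrays X_i, each of shape (N_i, J)
    n_vectors : number R of basis vectors to keep (assumed choice)
    """
    J = datasets[0].shape[1]
    S = np.zeros((J, J))
    for X in datasets:
        S += X.T @ X                      # sum_i X_i^T X_i, left side of Eq. (7)
    # S is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_vectors]
    P_g = eigvecs[:, order]               # J x R, unit-norm columns p_g

    sub_bases = []
    for X in datasets:
        G = X.T @ X
        # lambda_i = p_g^T X_i^T X_i p_g, evaluated for every column of P_g
        lam = np.sum(P_g * (G @ P_g), axis=0)
        P_i = (G @ P_g) / np.sqrt(lam)    # columns are p_i from Eq. (8)
        sub_bases.append(P_i.T)           # stored as R x J
    return P_g, sub_bases
```

Each row of a returned sub-basis equals $\mathbf{X}_i^T \mathbf{a}_i$ with $\mathbf{a}_i = \mathbf{X}_i \mathbf{p}_g / \sqrt{\lambda_i}$, which has unit length as the constraint in Eq. (6) requires.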



In the second step, the common basis vector extraction turns into a simple analytical solution as 

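$$\sum_{i=1}^{C} \left(\mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right) \left(\mathbf{P}_g^T \mathbf{X}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right)^{-1} \left(\mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right)^T \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{9}$$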

The common subspace $\mathbf{P}_g = [\mathbf{p}_{g,1}, \mathbf{p}_{g,2}, \ldots, \mathbf{p}_{g,R_c}] \in R^{J \times R_c}$ is spanned by multiple $\mathbf{p}_g$, and it characterizes all the common data information over sets.
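A minimal NumPy sketch of the second step is given below, assuming Eq. (9) is read as an eigenproblem on a sum of projectors onto the column spaces of $\mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g$, with $\mathbf{P}_g$ inside the sum taken from the first step. The function name `second_step_basis`, the use of a pseudo-inverse and the choice of how many vectors to keep are assumptions made for illustration.

```python
import numpy as np

def second_step_basis(datasets, P_g_first, n_common):
    """Sketch of Eq. (9): common basis vectors from first-step results.

    datasets  : list of C arrays X_i, each of shape (N_i, J)
    P_g_first : J x R matrix of first-step global basis vectors
    n_common  : number R_c of common basis vectors to keep (assumed choice)
    """
    J = P_g_first.shape[0]
    M = np.zeros((J, J))
    for X in datasets:
        B = X.T @ X @ P_g_first           # J x R, proportional to the first-step sub-basis
        # B (B^T B)^{-1} B^T projects onto the column space of B;
        # pinv is used so that a singular inner matrix does not raise an error.
        M += B @ np.linalg.pinv(B.T @ B) @ B.T
    # Symmetrize to remove round-off asymmetry before the eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(eigvals)[::-1][:n_common]
    return eigvecs[:, order], eigvals[order]
```

Because the summed matrix is a sum of C orthogonal projectors, its eigenvalues lie between 0 and C. An eigenvalue close to C indicates a direction contained in the sub-basis of every dataset, which is one way to judge how many common vectors to retain.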
 

3.  The Offline Extraction Modelling and Online Artificial Neural Networks Operating 
Performance Assessment 
With expert experience and process knowledge, the training data can readily be divided into C classes corresponding to different performance grades. According to the idea of just-in-time learning, similar inputs produce similar outputs (Cheng and Chiu, 2005). In this paper, industrial field input data are organized as training datasets, which carry the implicit information of the different performance grades.

3.1 The Offline Extraction Modelling on Training Data 

Assume that the chemical production process contains a total of C steady performance grades, and the training data corresponding to the i-th grade are denoted as $\mathbf{X}_i$, $i = 1, 2, \ldots, C$. Each $\mathbf{X}_i$ should be normalized to zero mean and unit variance. After this pre-processing, $\mathbf{X}_i$ is divided into two orthogonal subspaces as

$$\mathbf{X}_i = \mathbf{X}_i^c + \mathbf{X}_i^s = \mathbf{X}_i \mathbf{P}_g \mathbf{P}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right) \tag{10}$$
where $\mathbf{X}_i^c$ includes the process variations determined by the variable correlation common to the whole process, while $\mathbf{X}_i^s$ contains the variable correlation specific to grade $i$. The common subspace is spanned by $V$ basis vectors as $\mathbf{P}_g = [\mathbf{p}_{g,1}, \mathbf{p}_{g,2}, \ldots, \mathbf{p}_{g,V}] \in R^{J \times V}$. The amplitudes of the variations along each basis vector are then analysed in turn, and the basis vectors with approximately equal variation amplitudes are separated from $\mathbf{P}_g$ to constitute a subspace that contains the common variable correlation and common amplitude. For this selection, the variations are calculated as score vectors $\mathbf{t}_{i,v} = \mathbf{X}_i \mathbf{p}_{g,v}$, $i = 1, 2, \ldots, C$. If $\|\mathbf{t}_{1,v}\|_\infty, \|\mathbf{t}_{2,v}\|_\infty, \ldots, \|\mathbf{t}_{C,v}\|_\infty$ are approximately equal, $\mathbf{p}_{g,v}$ is of no use for identifying the operating performance; otherwise $\mathbf{p}_{g,v}$ is retained as a column vector for operating performance identification. To complete the screening, an index is defined as $\eta_i = \|\mathbf{t}_{i,v}\|_\infty / \|\mathbf{t}_{C,v}\|_\infty$, $i = 1, 2, \ldots, C-1$. If $\eta_1, \eta_2, \ldots, \eta_{C-1}$ are all within $1 \pm \delta$, then $\|\mathbf{t}_{1,v}\|_\infty, \|\mathbf{t}_{2,v}\|_\infty, \ldots, \|\mathbf{t}_{C,v}\|_\infty$ are almost equal to one another. $\delta$ is a relaxation factor that can be selected by trial and adapted to the specific training data. Conversely, $\mathbf{p}_{g,v}$ is retained when a corresponding $\eta_i$ is out of this range. As a result, the basis vectors in $\mathbf{P}_g$ are divided into two new subspaces $\tilde{\mathbf{P}}_g$ and $\breve{\mathbf{P}}_g$, with $\tilde{\mathbf{P}}_g = [\tilde{\mathbf{p}}_{g,1}, \tilde{\mathbf{p}}_{g,2}, \ldots, \tilde{\mathbf{p}}_{g,\tilde{V}}] \in R^{J \times \tilde{V}}$ reflecting the common variable correlation and common amplitude, while $\breve{\mathbf{P}}_g = [\breve{\mathbf{p}}_{g,1}, \breve{\mathbf{p}}_{g,2}, \ldots, \breve{\mathbf{p}}_{g,\breve{V}}] \in R^{J \times \breve{V}}$ denotes the subspace in which the variable correlation is common but the amplitudes are unequal across performance grades, with $\breve{V} = V - \tilde{V}$. Finally, the industrial training data are separated into three parts as follows:

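$$\mathbf{X}_i = \mathbf{X}_i^c + \mathbf{X}_i^s = \mathbf{X}_i \tilde{\mathbf{P}}_g \tilde{\mathbf{P}}_g^T + \mathbf{X}_i \breve{\mathbf{P}}_g \breve{\mathbf{P}}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right) = \tilde{\mathbf{X}}_i^c + \tilde{\mathbf{X}}_i^s \tag{11}$$

where $\tilde{\mathbf{X}}_i^c = \mathbf{X}_i \tilde{\mathbf{P}}_g \tilde{\mathbf{P}}_g^T$ represents the optimality-unrelated variations, which are irrelevant to operating performance assessment, while $\tilde{\mathbf{X}}_i^s = \mathbf{X}_i \breve{\mathbf{P}}_g \breve{\mathbf{P}}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right)$ covers the operating performance identification information specific to grade $i$.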
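The screening of the common basis vectors and the three-part separation of Eq. (11) can be sketched as below. The sketch assumes that the datasets are already normalized, that the columns of `P_g` are orthonormal, and that the amplitude of a score vector is measured by its infinity norm (largest absolute score). The function names `split_common_basis` and `decompose` are hypothetical.

```python
import numpy as np

def split_common_basis(datasets, P_g, delta):
    """Split P_g into equal-amplitude and unequal-amplitude columns.

    datasets : list of C normalized arrays X_i, each of shape (N_i, J)
    P_g      : J x V matrix of common basis vectors
    delta    : relaxation factor for the ratio test
    """
    # amplitude[i, v] = infinity norm of the score vector t_{i,v} = X_i p_{g,v}
    amplitude = np.array([np.abs(X @ P_g).max(axis=0) for X in datasets])
    # eta[i, v] = amplitude of grade i relative to the last grade C
    eta = amplitude[:-1] / amplitude[-1]
    # A vector counts as equal-amplitude only if every ratio lies within 1 +/- delta
    equal = np.all(np.abs(eta - 1.0) <= delta, axis=0)
    return P_g[:, equal], P_g[:, ~equal]

def decompose(X, P_g, P_equal, P_unequal):
    """Eq. (11): optimality-unrelated part and grade-specific part of X."""
    J = X.shape[1]
    X_common = X @ P_equal @ P_equal.T
    X_specific = X @ P_unequal @ P_unequal.T + X @ (np.eye(J) - P_g @ P_g.T)
    return X_common, X_specific
```

With orthonormal columns in `P_g`, the two returned parts add up to the original data, so nothing is discarded by the separation; only the first part is set aside as irrelevant to the assessment.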

3.2 Online Operating Performance Assessment Based on Artificial Neural Networks 

In online operating performance assessment, a single sample is far from sufficient to characterize the true operating performance. Liu et al. (2009) used a data window to trace the time-varying dynamics of a chemical process. In this paper, the test data are analysed in units of a data window of width H, whose value is determined by the actual situation of the chemical process. The step size of the sliding window is determined by the test data. Data-driven operating performance assessment uses the similarity between the test data and the historical training data to indicate the performance grade of the online data. Because simulation data are less perturbed and often lack nonlinear features, a Euclidean distance similarity index based on the score matrix is feasible when the process data are relatively stable and the noise is relatively small (Qin, 2012). However, industrial field online data involve numerous input variables and may contain a certain degree of nonlinearity. Therefore, artificial neural network classification is used to measure the similarity of the online test data to the training data extracted offline. The online operating performance assessment based on artificial neural networks proceeds as follows:
Step 1: At moment k, construct the online test data as

$$\mathbf{X}_{test,k} = [\mathbf{x}_{test}(k-H+1), \ldots, \mathbf{x}_{test}(k)]^T \tag{12}$$

Step 2: Assume that $\mathbf{X}_{test,k}$ belongs to one specific grade. Since the grade is unknown, there are C normalized versions of $\mathbf{X}_{test,k}$, one per grade. Over the data window of width H, denote the i-th normalized version and its mean vector as $\mathbf{X}^i_{test,k}$ and $\bar{\mathbf{x}}^i_{test,k} = \sum_{h=k-H+1}^{k} \mathbf{x}^i_{test}(h) / H$, respectively.
Step 3: Remove the common information over sets from the online data as

$$\bar{\mathbf{x}}^{i,s}_{test,k} = \bar{\mathbf{x}}^{iT}_{test,k} - \bar{\mathbf{x}}^{iT}_{test,k} \tilde{\mathbf{P}}_g \tilde{\mathbf{P}}_g^T \tag{13}$$

where $\bar{\mathbf{x}}^{i,s}_{test,k}$ indicates the specific variations of the online test data.

Step 4: Apply the simplest single-hidden-layer feed-forward artificial neural network as a classifier, with $\bar{\mathbf{x}}^{i,s}_{test,k}$ as the input data. The number of input neurons is J and the number of output neurons is C, i.e. the number of input variables and the number of performance grades, respectively. Then an index vector is defined as $\boldsymbol{\gamma}_{test,k} = [\gamma^1_{test,k}, \ldots, \gamma^q_{test,k}, \ldots, \gamma^C_{test,k}]^T$, $q = 1, 2, \ldots, C$, where $\gamma^q_{test,k} \in [0, 1]$ is the label value given by the neural network classifier, representing the similarity of the online test data to grade $q$. The closer the value of $\gamma^q_{test,k}$ is to 1, the higher the belief that the online test unit belongs to performance grade $q$.

Step 5: An assessment threshold $\beta$ ($0 < \beta < 1$) is introduced to distinguish whether the process is running in a specific performance grade or in a conversion between grades.
Case 1: If $\max_{1 \le q \le C} \gamma^q_{test,k} \ge \beta$, the online test data unit is operating in grade $q$.
Case 2: If Case 1 is not satisfied, i.e. $0 \le \max_{1 \le q \le C} \gamma^q_{test,k} < \beta$, and $\gamma^q_{test,k} > \gamma^p_{test,k}$ with $p = \arg\max \{\gamma^i_{test,k},\ i = 1, 2, \ldots, C,\ i \ne q\}$, it can be determined that the current process is converting from grade $q$ to grade $p$.
Case 3: If neither Case 1 nor Case 2 is satisfied, noise or uncertainties may be responsible, and the assessment result is kept unchanged.
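Steps 1 to 3 and the decision rule of Step 5 can be sketched as follows. The neural network of Step 4 is not implemented here; the vector `gamma` is assumed to come from an already trained classifier. All names are hypothetical, the time index `k` is 0-based (so `k >= H - 1`), and `mean_i` and `std_i` stand for the training mean and standard deviation of grade i, which is one reading of the grade-wise normalization in Step 2.

```python
import numpy as np

def window_specific_mean(x_stream, k, H, mean_i, std_i, P_equal):
    """Steps 1-3 for one candidate grade i at time index k.

    x_stream : array of shape (T, J) holding the online samples in time order
    P_equal  : J x V~ matrix of equal-amplitude common basis vectors
    """
    window = x_stream[k - H + 1 : k + 1]        # H x J test unit, Eq. (12)
    normalized = (window - mean_i) / std_i      # normalization with grade-i statistics
    x_bar = normalized.mean(axis=0)             # window mean vector, Step 2
    # Remove the part lying in the common equal-amplitude subspace, Eq. (13)
    return x_bar - x_bar @ P_equal @ P_equal.T

def assess(gamma, beta):
    """Step 5 decision on the classifier outputs gamma (length C, values in [0, 1]).

    Returns a tuple with 1-based grade numbers:
      ("grade", q), ("converting", q, p) or ("unchanged",).
    """
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(gamma)[::-1]
    q, p = int(order[0]), int(order[1])         # largest and second-largest outputs
    if gamma[q] >= beta:                        # Case 1
        return ("grade", q + 1)
    if gamma[q] > gamma[p]:                     # Case 2
        return ("converting", q + 1, p + 1)
    return ("unchanged",)                       # Case 3: tie, keep the previous result
```

For example, with `beta = 0.9` the output `[0.05, 0.02, 0.95]` is assigned to grade 3, while `[0.6, 0.3, 0.1]` is read as a conversion from grade 1 to grade 2.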

 
Figure 1: Amplitudes of score matrix corresponding to the variables

4. Case study 
4.1 Description of Depth Optimization Process of Ethylene Cracking Furnace  

Ethylene is a cornerstone of the petrochemical industry, with many implications for process safety management in view of sustainable production (Fabiano et al., 2015). In the ethylene production process, the cracking furnace is the core unit. Its operating performance affects not only the entire ethylene production but also the downstream processes. In this paper, actual process data are collected from a cracking furnace of a domestic ethylene plant. After preheating in the convection section, the cracking feedstock flows through a Venturi flow distributor into the furnace tubes of the radiation section, where the cracking reaction takes place. Next, the high-temperature pyrolysis gas flows into the quench heat exchanger, where heat transfer to boiler water cools it quickly and ends the cracking reaction. The pyrolysis gas leaving the quench heat exchanger enters the quench tank and is cooled by quench oil to about 200 °C. Finally, the cracker gas is separated into the gasoline by the pyrolysis gas. The ethylene cracking process improves the assessment accuracy by maintaining the stability of the input variables to maximize production efficiency. In this study, 18 process variables of the actual process are used for operating performance assessment. The operating performance of the multivariable system is evaluated by the high absorption rate, which acts as the single output. According to expert knowledge, the ethylene production process can be divided into three steady performance grades, i.e. poor, general, and optimal. The training datasets are divided into three categories, $\mathbf{X}_1$, $\mathbf{X}_2$ and $\mathbf{X}_3$, corresponding to the poor, general, and optimal grades, respectively. There are 1,200 historical process samples in total. The relaxation factor is set as $\delta = 0.1$.

  
Figure 2: A comparison curve of the assessment results 

4.2 Accuracy Analysis and discussion  

To illustrate the superiority of the proposed assessment method, the PCA-ANN operating performance assessment method is used for comparison. The assessment results of the two methods are shown in Figure 2. The relevant parameters are set as $H = 30$ and $\beta = 0.9$. Because the optimal operating performance grade is relatively stable, the two methods differ little there. In the general grade, the assessments differ owing to data disturbance. After sampling time 1143, the PCA-ANN method produces two serious state misjudgments and its assessment index is jittery over the whole process, whereas the MTBVE-ANN method still performs relatively well in the poor performance grade without misjudgment, indicating that the MTBVE-ANN method is superior to the traditional feature extraction assessment method. The MTBVE-ANN method is not only better than traditional methods at assessing operating performance degradation, but also has the advantage of selecting the input variables relevant to performance degradation, which provides a basis for adjusting control strategies. According to the index in Figure 2, process variables such as F_feed, F_DS, F_ss, T_ss and C_OLE are screened out, indicating that these variables bear less responsibility for performance degradation.

Table 1: Comparison of several operating performance assessment methods

Grade        Actual Grade   PCA-ANN        MTBVE-ANN
Optimal      1-490          1-484          1-491
Conversion   491-502        485-501        492-505
General      503-1,055      502-1,127      506-1,071
Conversion   1,056-1,062    1,128-1,147    1,072-1,142
Poor         1,063-1,371    1,148-1,234    1,143-1,371
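One way to turn the ranges in Table 1 into a single agreement figure is to expand each column into per-sample labels and count matching samples. The sketch below does this for the MTBVE-ANN column against the actual grades. The helper `labels_from_segments` is hypothetical, both conversion periods share the label "conversion", and the resulting tally is plain arithmetic on the table rather than a figure reported in the paper.

```python
def labels_from_segments(segments, n_samples):
    """Expand (label, start, end) ranges, 1-based and inclusive, into a label list."""
    labels = [None] * n_samples
    for name, start, end in segments:
        for t in range(start, end + 1):
            labels[t - 1] = name
    return labels

# Ranges copied from Table 1
actual_segments = [
    ("optimal", 1, 490),
    ("conversion", 491, 502),
    ("general", 503, 1055),
    ("conversion", 1056, 1062),
    ("poor", 1063, 1371),
]
mtbve_segments = [
    ("optimal", 1, 491),
    ("conversion", 492, 505),
    ("general", 506, 1071),
    ("conversion", 1072, 1142),
    ("poor", 1143, 1371),
]

n_samples = 1371
actual = labels_from_segments(actual_segments, n_samples)
mtbve = labels_from_segments(mtbve_segments, n_samples)

# Number and fraction of sampling times at which the two labelings coincide
matches = sum(a == m for a, m in zip(actual, mtbve))
agreement = matches / n_samples
```

The PCA-ANN column can be handled the same way, although its last range in Table 1 ends at 1,234, so the remaining samples would stay unlabeled and need a decision on how to count them.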

Table 2: Variables definitions

Variables  Descriptions                       Unit       Variables  Descriptions                        Unit
ρ_NAP      cracking raw material density      kg·m^-3    F_Sfuel    side fuel flow                      m^3·h^-1
C_NP       concentration of normal paraffin   %          C_O2       concentration of flue oxygen        %
C_IP       concentration of iso-paraffin      %          T_g1       flue gas temperature 1              °C
C_OLE      concentration of olefin            %          T_g2       flue gas temperature 2              °C
C_BTX      concentration of arene             %          F_ss       high-pressure steam flow            kg·h^-1
C_NAP      concentration of naphthenic        %          T_ss       high-pressure steam temperature     °C
F_feed     feed flow                          t·h^-1     COT        cracking outlet temperature         °C
F_DS       dilution steam flow                t·h^-1     T_HK       initial boiling point temperature   °C
F_Bfuel    bottom fuel flow                   m^3·h^-1   T_KK       final boiling point temperature     °C



5. Conclusion 
A novel online operating performance assessment method based on MTBVE-ANN is proposed for industrial processes. The advantages of the MTBVE-ANN assessment method are summarized as follows: (a) The common and specific variations over the steady performance grades are measured separately, which highlights the distinction between different performance grades. (b) Online assessment with an artificial neural network adapts to the nonlinear characteristics of the data to some extent. (c) The proposed operating performance assessment method is applied to real industrial production data, which is more credible than simple simulation data. However, the method still has some shortcomings, such as the assessment of the transitional grades, which is a subject for further research.

Acknowledgments  

This work was supported by the National Natural Science Foundation of China (61590922, 61533003, 
21376077, 61673268), the Key Project of NSFC (61533012), the Shanghai Natural Science Foundation 
(14ZR1421800), the State Key Laboratory of Synthetical Automation for Process Industries (PAL-N201404). 

References 

Cheng C., Chiu M.S., 2005, Nonlinear process monitoring using JITL-PCA, Chemometrics and Intelligent 
Laboratory Systems, 76(1), 1-13.  

Fabiano B., Pistritto F., Reverberi A., Palazzi E., 2015, Ethylene–air mixtures under flowing conditions: a 
model-based approach to explosion conditions, Clean Technologies and Environmental Policy, 17(5), 
1261-1270. 

Guo L., Gao J., Yang J., Kang L., 2009, Criticality evaluation of petrochemical equipment based on fuzzy 
comprehensive evaluation and a BP neural network, Journal of Loss Prevention in the Process Industries, 
22(4), 469-476.  

Harris T.J., 1989, Assessment of control loop performance, The Canadian Journal of Chemical Engineering, 
67(5), 856-861.  

Hwang J.H., Roh M.I., Lee K.Y., 2013, Determination of the optimal operating conditions of the dual mixed 
refrigerant cycle for the LNG FPSO topside liquefaction process, Computers & Chemical Engineering, 49, 25-36.  

Huang B., Shah S.L., 1998, Practical issues in multivariable feedback control performance assessment, 
Journal of Process Control, 8(5-6), 421-430.  

Krämer N., Sugiyama M., Braun M.L., 2009, Lanczos Approximations for the Speedup of Kernel Partial Least 
Squares Regression, AISTATS, 288-295.  

Li G., Qin S.J., Zhou D., 2010, Output relevant fault reconstruction and fault subspace extraction in total 
projection to latent structures models, Industrial & Engineering Chemistry Research, 49(19), 9175-9183.  

Liu Y., Hu N., Wang H., Li P., 2009, Soft chemical analyzer development using adaptive least-squares support 
vector regression with selective pruning and variable moving window size, Industrial & Engineering 
Chemistry Research, 48(12), 5731-5741.  

Liu Y., Chang Y., Wang F., 2014, Online process operating performance assessment and nonoptimal cause 
identification for industrial processes, Journal of Process Control, 24(10), 1548-1555.  

Peng K., Zhang K., Li G., 2013, Quality-related process monitoring based on total kernel PLS model and its 
industrial application, Mathematical Problems in Engineering, 1-14. 

Qin S.J., 2012, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control, 
36(2), 220-234.  

Tarafder A., Rangaiah G.P., Ray A.K., 2007, A study of finding many desirable solutions in multiobjective 
optimization of chemical processes, Computers & Chemical Engineering, 31(10), 1257-1271.  

Yu J., Qin S.J., 2008, Statistical MIMO controller performance monitoring. Part I: Data-driven covariance 
benchmark, Journal of Process Control, 18(3), 277-296.  

Yu J., Qin S.J., 2008, Statistical MIMO controller performance monitoring. Part II: Performance diagnosis, 
Journal of Process Control, 18(3), 297-319.  

Xu Y., Yang J.Y., Jin Z., 2004, A novel method for Fisher discriminant analysis, Pattern Recognition, 37(2), 
381-384.  

Zhao C., Gao F., Niu D., Wang F., 2011, A two-step basis vector extraction strategy for multiset variable 
correlation analysis, Chemometrics and Intelligent Laboratory Systems, 107(1), 147-154.  

Zhou D., Li G., Qin S.J., 2010, Total projection to latent structures for process monitoring, AIChE Journal, 
56(1), 168-178.  
