CHEMICAL ENGINEERING TRANSACTIONS, VOL. 61, 2017
A publication of The Italian Association of Chemical Engineering. Online at www.aidic.it/cet
Guest Editors: Petar S Varbanov, Rongxin Su, Hon Loong Lam, Xia Liu, Jiří J Klemeš
Copyright © 2017, AIDIC Servizi S.r.l. ISBN 978-88-95608-51-8; ISSN 2283-9216

Data-driven Online Operating Performance Assessment for Multi-Datasets Multivariable Industrial Processes

Yupeng Du (a), Zhenlei Wang (a,*), Xin Wang (b)
(a) Key Laboratory of Advanced Control and Optimization for Chemical Processes, East China University of Science and Technology, Shanghai 200237, China
(b) Center of Electrical & Electronic Technology, Shanghai Jiao Tong University, Shanghai 200240, China
wangzhen_l@ecust.edu.cn

In this study, a novel online operating performance assessment method based on a multi-sets two-step basis vector extraction artificial neural network (MTBVE-ANN) strategy is proposed for industrial applications. The MTBVE-ANN method focuses on finding the common and specific information contained in multiple datasets, and introduces artificial neural networks to characterize the nonlinearity of the data more accurately. The optimality-related variations are extracted from each operating performance grade by analysing the common and unique variations over the steady performance grades. Online operating performance assessment is then performed based on the similarity between the optimality-related variations of the test data and those of the historical training data. The earlier assessment method based on total projection to latent structures (T-PLS) requires both input and output data to be available. The method proposed in this paper uses an artificial neural network to assess the operating performance grade of online test data without output data.
The validity and precision of the proposed operating performance assessment method are illustrated with industrial data from a multi-dataset multivariable industrial process.

1. Introduction
Maximizing economic efficiency while protecting the environment as far as possible has become a central concern. Owing to the complexity of chemical processes, maintaining excellent operating performance of the process control system is crucial. Research on optimization control strategies has long been a focus of process control (Hwang et al., 2013). However, owing to process disturbances, equipment wear, instrument failure, changes in the operating environment and other uncertain factors, the operating state may gradually deviate from the optimal operating point, which degrades economic performance and can damage the environment. Online monitoring and optimization of process performance is therefore valuable.
In 1989, Harris proposed the concept of the "feedback invariant" to solve the control performance assessment (CPA) problem for single-variable control systems, which marks the beginning of CPA theory (Harris, 1989). To assess system operating performance from a data-driven perspective, Yu and Qin (2008) proposed a data-based covariance benchmark for control performance monitoring. Since then, data-driven performance assessment methods have attracted growing attention. Zhou et al. (2010) proposed total projection to latent structures (T-PLS) for process monitoring, to reduce the false alarm and missed alarm rates for faults related to the output data and to improve performance monitoring for nonlinear processes. Liu et al. (2014) applied the T-PLS algorithm to operating performance assessment by comparing the similarity between the online data and the offline model.
However, the immediacy of online operating performance assessment is limited by the scarcity of online output data, and the nonlinearity of the data degrades the accuracy of the assessment. In this study, a novel online operating performance assessment method called MTBVE-ANN is proposed to address these problems. For feature extraction, a multi-sets two-step basis vector extraction (MTBVE) strategy is applied to the input data to find the common subspace and the specific subspaces over the datasets. For online assessment, a sliding time window and an artificial neural network are used as the performance grade classifier. The similarity indices between the test data and each performance grade take values between 0 and 1 and reflect the grade membership.
The remainder of this article proceeds as follows. In Section 2, the multi-sets two-step basis vector extraction strategy is briefly summarized. In Section 3, the offline extraction modelling and the online operating performance assessment procedure are described. In Section 4, the MTBVE-ANN method is applied to real online input data from a chemical plant. Finally, the main conclusions and acknowledgements are presented.

Please cite this article as: Du Y., Wang Z., Wang X., 2017, Data-driven online operating performance assessment for multi-datasets multivariable industrial processes, Chemical Engineering Transactions, 61, 1729-1734 DOI:10.3303/CET1761286

2. Summary of Multi-sets Two-step Basis Vector Extraction Strategy
To analyse the common variable correlation shared among multiple datasets, Zhao et al. (2011) proposed a two-step basis vector extraction strategy. The method finds the basis vectors by solving a constrained optimization problem.
Assume there are C input datasets, and the i-th dataset is denoted as
$$\mathbf{X}_i = [\mathbf{x}_{i,1}, \mathbf{x}_{i,2}, \ldots, \mathbf{x}_{i,N_i}]^T \in R^{N_i \times J} \tag{1}$$
where $i = 1, 2, \ldots, C$, and $N_i$ and $J$ are the numbers of samples and process variables, respectively. Based on industrial production experience, the input data are divided into different datasets. Each dataset is represented by sub-basis vectors that are similar or closely related to each other across the performance grades. Each sub-basis vector is defined as a linear combination of the original samples:
$$\mathbf{p}_{i,j} = \sum_{n=1}^{N_i} a_{i,j,n}\,\mathbf{x}_{i,n} = \mathbf{X}_i^T \mathbf{a}_{i,j} \tag{2}$$
where $\mathbf{p}_{i,j} \in R^{J \times 1}$ is the j-th sub-basis vector of $\mathbf{X}_i$ and $\mathbf{a}_{i,j} = [a_{i,j,1}, \ldots, a_{i,j,N_i}]^T$ holds the combination coefficients. To simplify the interrelationships over the sets, a pseudo (C+1)-th sub-basis vector $\mathbf{p}_g \in R^{J \times 1}$ is introduced. The extraction problem then becomes the constrained optimization problem
$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{p}_g^T \mathbf{p}_g = 1, \;\; \mathbf{a}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{a}_i = 1 \tag{3}$$
To solve this problem, the Lagrange function
$$F(\mathbf{p}_g, \mathbf{a}_i, \lambda_g, \lambda_i) = \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 - \lambda_g\left(\mathbf{p}_g^T \mathbf{p}_g - 1\right) - \sum_{i=1}^{C} \lambda_i\left(\mathbf{a}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{a}_i - 1\right)$$
is constructed, in which $\lambda_g$ and $\lambda_i$ are constant scalars (Lagrange multipliers), so that the problem reduces to an unconstrained extremum problem. Its solution is given analytically by the eigenvalue problem
$$\sum_{i=1}^{C} \left(\mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i\right) \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{4}$$
The sub-basis vector in each dataset can therefore be calculated by
$$\mathbf{p}_i = \mathbf{X}_i^T \mathbf{a}_i = \frac{1}{\sqrt{\lambda_i}} \mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i \mathbf{p}_g \tag{5}$$
where $\lambda_i = \mathbf{p}_g^T \mathbf{X}_i^T \left(\mathbf{X}_i \mathbf{X}_i^T\right)^{-1} \mathbf{X}_i \mathbf{p}_g$, and the eigenvector corresponding to the maximum eigenvalue is the first common basis vector $\mathbf{p}_g$.
On this basis, the two-step basis vector extraction is carried out. In the first step, the problem is formulated as
$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{p}_g^T \mathbf{X}_i^T \mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{p}_g^T \mathbf{p}_g = 1, \;\; \mathbf{a}_i^T \mathbf{a}_i = 1 \tag{6}$$
With the Lagrange function, the problem simplifies to
$$\sum_{i=1}^{C} \mathbf{X}_i^T \mathbf{X}_i \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{7}$$
Correspondingly, the sub-basis vector can be calculated by
$$\mathbf{p}_i = \mathbf{X}_i^T \mathbf{a}_i = \frac{1}{\sqrt{\lambda_i}} \mathbf{X}_i^T \mathbf{X}_i \mathbf{p}_g \tag{8}$$
where $\lambda_i = \mathbf{p}_g^T \mathbf{X}_i^T \mathbf{X}_i \mathbf{p}_g$. A new subspace $\mathbf{P}_i = [\mathbf{p}_{i,1}, \mathbf{p}_{i,2}, \ldots, \mathbf{p}_{i,R}]^T \in R^{R \times J}$ is thus spanned.
In the second step, the common basis vector extraction reduces to the analytical solution
$$\sum_{i=1}^{C} \left(\mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right) \left(\mathbf{P}_g^T \mathbf{X}_i^T \mathbf{X}_i \mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right)^{-1} \left(\mathbf{X}_i^T \mathbf{X}_i \mathbf{P}_g\right)^T \mathbf{p}_g = \lambda_g \mathbf{p}_g \tag{9}$$
The common subspace $\mathbf{P}_g = [\mathbf{p}_{g,1}, \mathbf{p}_{g,2}, \ldots, \mathbf{p}_{g,R_c}] \in R^{J \times R_c}$ is spanned by multiple $\mathbf{p}_g$ and characterizes all the common data information over the sets.

3. The Offline Extraction Modelling and Online Artificial Neural Networks Operating Performance Assessment
With expert experience and process knowledge, the training data can readily be divided into C classes corresponding to different performance grades. According to the idea of just-in-time learning, similar inputs produce similar outputs (Cheng and Chiu, 2005). In this paper, industrial field input data are organized as training datasets, which carry the implicit information of the different performance grades.

3.1 The Offline Extraction Modelling on Training Data
Assume that the chemical production process contains a total of C steady performance grades, and the training data corresponding to the i-th grade are denoted as $\mathbf{X}_i$, $i = 1, 2, \ldots, C$. Each $\mathbf{X}_i$ should be normalized to zero mean and unit variance. After this pre-processing, $\mathbf{X}_i$ is divided into two orthogonal subspaces as
$$\mathbf{X}_i = \mathbf{X}_i^c + \mathbf{X}_i^s = \mathbf{X}_i \mathbf{P}_g \mathbf{P}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right) \tag{10}$$
where $\mathbf{X}_i^c$ includes the process variations determined by the common variable correlation among the whole process, while $\mathbf{X}_i^s$ contains the distinct variable correlation specific to grade i. The common subspace is spanned by V basis vectors as $\mathbf{P}_g = [\mathbf{p}_{g,1}, \mathbf{p}_{g,2}, \ldots, \mathbf{p}_{g,V}] \in R^{J \times V}$.
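The first extraction step (Eq. 7 and Eq. 8) and the orthogonal split of Eq. (10) can be illustrated with a minimal NumPy sketch. The function names, the synthetic random data, and the choice to take the leading eigenvectors all at once are assumptions made for illustration only; the second step (Eq. 9) and the exact procedure of Zhao et al. (2011) are not reproduced here.

```python
import numpy as np

def normalize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def first_step_common_basis(datasets, n_vectors):
    """Leading eigenvectors of sum_i X_i^T X_i (Eq. 7), returned as columns (J x n_vectors)."""
    J = datasets[0].shape[1]
    S = np.zeros((J, J))
    for X in datasets:
        S += X.T @ X
    eigvals, eigvecs = np.linalg.eigh(S)            # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_vectors]   # indices of the largest eigenvalues
    return eigvecs[:, order]

def sub_basis_vector(X, p_g):
    """Eq. 8: p_i = X^T X p_g / sqrt(lambda_i), with lambda_i = p_g^T X^T X p_g."""
    lam = float(p_g @ X.T @ X @ p_g)
    return X.T @ X @ p_g / np.sqrt(lam)

def split_common_specific(X, P_g):
    """Eq. 10: X = X P_g P_g^T + X (I - P_g P_g^T)."""
    X_c = X @ P_g @ P_g.T
    return X_c, X - X_c

# Synthetic stand-in for C = 3 grades with J = 6 variables (not the paper's data).
rng = np.random.default_rng(0)
datasets = [normalize(rng.normal(size=(n, 6))) for n in (40, 50, 60)]

P_g = first_step_common_basis(datasets, n_vectors=3)
p_1 = sub_basis_vector(datasets[0], P_g[:, 0])
X_c, X_s = split_common_specific(datasets[0], P_g)
```

Because the columns of P_g returned by `numpy.linalg.eigh` are orthonormal, the two parts produced by `split_common_specific` sum back to the original data and are mutually orthogonal, which is the property Eq. (10) relies on.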
The amplitudes of the variations along each basis vector are then analysed in turn. Basis vectors along which the variation amplitudes are approximately equal across the grades are separated from $\mathbf{P}_g$ and form a subspace that contains both the common variable correlation and the common amplitude. For this selection, the variations are calculated as score vectors $\mathbf{t}_{i,v} = \mathbf{X}_i \mathbf{p}_{g,v}$, $i = 1, 2, \ldots, C$. If the score amplitudes $\|\mathbf{t}_{1,v}\|_\infty, \|\mathbf{t}_{2,v}\|_\infty, \ldots, \|\mathbf{t}_{C,v}\|_\infty$ are approximately equal, $\mathbf{p}_{g,v}$ is of no use for identifying the operating performance; otherwise $\mathbf{p}_{g,v}$ is retained as a column vector for operating performance identification. An index is therefore defined as
$$\eta_i = \frac{\|\mathbf{t}_{i,v}\|_\infty}{\|\mathbf{t}_{C,v}\|_\infty}, \quad i = 1, 2, \ldots, C-1$$
to carry out the screening. If $\eta_1, \eta_2, \ldots, \eta_{C-1}$ are all within $1 \pm \delta$, the amplitudes $\|\mathbf{t}_{1,v}\|_\infty, \|\mathbf{t}_{2,v}\|_\infty, \ldots, \|\mathbf{t}_{C,v}\|_\infty$ are regarded as almost equal. Here $\delta$ is a relaxation factor that can be selected by trial and adapted to the specific training data. Conversely, $\mathbf{p}_{g,v}$ is retained when any corresponding $\eta_i$ is out of this range. As a result, the basis vectors in $\mathbf{P}_g$ are divided into two new subspaces $\bar{\mathbf{P}}_g$ and $\tilde{\mathbf{P}}_g$, where $\bar{\mathbf{P}}_g = [\bar{\mathbf{p}}_{g,1}, \bar{\mathbf{p}}_{g,2}, \ldots, \bar{\mathbf{p}}_{g,\bar{V}}] \in R^{J \times \bar{V}}$ reflects the common variable correlation and common amplitude, while $\tilde{\mathbf{P}}_g = [\tilde{\mathbf{p}}_{g,1}, \tilde{\mathbf{p}}_{g,2}, \ldots, \tilde{\mathbf{p}}_{g,\tilde{V}}] \in R^{J \times \tilde{V}}$ denotes the subspace with common variable correlation but unequal amplitudes in different performance grades, with $\tilde{V} = V - \bar{V}$. Finally, the industrial training data are separated into three parts as follows:
$$\mathbf{X}_i = \mathbf{X}_i^c + \mathbf{X}_i^s = \mathbf{X}_i \bar{\mathbf{P}}_g \bar{\mathbf{P}}_g^T + \mathbf{X}_i \tilde{\mathbf{P}}_g \tilde{\mathbf{P}}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right) = \bar{\mathbf{X}}_i^c + \bar{\mathbf{X}}_i^s \tag{11}$$
where $\bar{\mathbf{X}}_i^c = \mathbf{X}_i \bar{\mathbf{P}}_g \bar{\mathbf{P}}_g^T$ represents the optimality-unrelated variations, which are irrelevant to operating performance assessment, while $\bar{\mathbf{X}}_i^s = \mathbf{X}_i \tilde{\mathbf{P}}_g \tilde{\mathbf{P}}_g^T + \mathbf{X}_i \left(\mathbf{I} - \mathbf{P}_g \mathbf{P}_g^T\right)$ covers the operating performance identification information specific to grade i.
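The screening of Section 3.1 can be sketched as follows, reading the amplitude of a score vector as its infinity norm and the index as the ratio to the amplitude in grade C. The function names and the small hand-built datasets are hypothetical and chosen only so that the outcome is easy to follow; they are not the paper's data.

```python
import numpy as np

def screen_basis_vectors(datasets, P_g, delta=0.1):
    """Split the columns of P_g into a common-amplitude part and an unequal-amplitude part."""
    common_amp, unequal_amp = [], []
    for v in range(P_g.shape[1]):
        # Score amplitude of every grade along basis vector v (infinity norm of X_i p_{g,v}).
        amps = np.array([np.max(np.abs(X @ P_g[:, v])) for X in datasets])
        eta = amps[:-1] / amps[-1]              # ratio to the amplitude of grade C
        if np.all(np.abs(eta - 1.0) <= delta):  # all ratios inside 1 +/- delta
            common_amp.append(v)
        else:
            unequal_amp.append(v)
    return P_g[:, common_amp], P_g[:, unequal_amp]

def optimality_related_part(X, P_common):
    """Remove the common-correlation, common-amplitude directions from X."""
    return X - X @ P_common @ P_common.T

# Hand-built example: three grades, three variables.
# Variable 1 varies identically in all grades; variable 2 scales with the grade.
t = np.array([1.0, -1.0, 0.5])
datasets = [np.column_stack([t, s * t, 0.1 * t]) for s in (1.0, 2.0, 3.0)]
P_g = np.eye(3)[:, :2]                          # assumed common basis: first two axes

P_common, P_unequal = screen_basis_vectors(datasets, P_g, delta=0.1)
X1_specific = optimality_related_part(datasets[0], P_common)
```

In this toy case the first axis is screened out as carrying no grade information, while the second axis is kept because its amplitude differs between grades.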

3.2 Online Operating Performance Assessment Based on Artificial Neural Networks
In online operating performance assessment, a single sample is far from sufficient to characterize the true operating performance. Liu et al. (2009) used a moving data window to track the time-varying dynamics of a chemical process. In this paper, the test data are analysed in units of a data window of width H, whose value is determined by the actual situation of the industrial process. The step size of the sliding window is determined by the test data. Data-driven operating performance assessment uses the similarity between the test data and the historical training data to indicate the performance grade of the online data. Simulation data are less perturbed and often lack nonlinear features, so a Euclidean distance similarity index based on the score matrix is feasible when the process data are relatively stable and the noise is relatively small (Qin, 2012). However, industrial field data involve numerous input variables and may contain a certain degree of nonlinearity. Therefore, artificial neural network classification is used to measure the similarity of the online test data to the training data extracted offline. The online operating performance assessment based on artificial neural networks is arranged as follows:
Step 1: At moment k, construct the online test data as
$$\mathbf{X}_{test,k} = [\mathbf{x}_{test}(k-H+1), \ldots, \mathbf{x}_{test}(k)]^T \tag{12}$$
Step 2: Assume that $\mathbf{X}_{test,k}$ belongs to one specific grade. Since the grade is unknown, there are C normalized versions of $\mathbf{X}_{test,k}$, one for each grade. Denote the version normalized for grade i as $\mathbf{X}_{test,k}^{i}$ and its mean vector over the data window of width H as
$$\bar{\mathbf{x}}_{test,k}^{i} = \frac{1}{H} \sum_{h=k-H+1}^{k} \mathbf{x}_{test}^{i}(h)$$
Step 3: Remove the common information over the sets from the online data as
$$\mathbf{x}_{test,k}^{i,s} = \bar{\mathbf{x}}_{test,k}^{iT} - \bar{\mathbf{x}}_{test,k}^{iT} \bar{\mathbf{P}}_g \bar{\mathbf{P}}_g^T \tag{13}$$
where $\mathbf{x}_{test,k}^{i,s}$ indicates the specific variations of the online test data and $\bar{\mathbf{P}}_g$ is the common-correlation, common-amplitude subspace obtained offline.
Step 4: Apply the simplest single-hidden-layer feed-forward artificial neural network as a classifier, with $\mathbf{x}_{test,k}^{i,s}$ as the input data of the neural network. The number of input neurons is J and the number of output neurons is C, i.e. the number of input variables and the number of performance grades, respectively. An index vector is then defined as $\boldsymbol{\gamma}_{test,k} = [\gamma_{test,k}^{1}, \ldots, \gamma_{test,k}^{q}, \ldots, \gamma_{test,k}^{C}]^T$, $q = 1, 2, \ldots, C$, where $\gamma_{test,k}^{q} \in [0, 1]$ is the output of the neural network classifier for grade q and represents the similarity of the online test data to grade q. The closer $\gamma_{test,k}^{q}$ is to 1, the higher the belief that the online test unit belongs to performance grade q.
Step 5: An assessment threshold $\beta$ $(0 < \beta < 1)$ is introduced to distinguish whether the process is running in a specific performance grade or in a conversion between grades.
Case 1: If $\max_{1 \le q \le C} \gamma_{test,k}^{q} \ge \beta$, the online test data unit is operating in the grade q that attains the maximum.
Case 2: If Case 1 is not satisfied, i.e. $0 \le \max_{1 \le q \le C} \gamma_{test,k}^{q} < \beta$, and $\gamma_{test,k}^{q} > \gamma_{test,k}^{p}$ with $\gamma_{test,k}^{p} = \max\{\gamma_{test,k}^{i},\ i = 1, 2, \ldots, C,\ i \ne q\}$, the current process is judged to be converting from grade q to grade p.
Case 3: If neither Case 1 nor Case 2 is satisfied, noise or other uncertainties may be responsible, and the assessment result is left unchanged.

Figure 1: Amplitudes of score matrix corresponding to the variables

4. Case study
4.1 Description of Depth Optimization Process of Ethylene Cracking Furnace
Ethylene is a cornerstone of the petrochemical industry, with many implications for process safety management and sustainable production (Fabiano et al., 2015).
In the ethylene production process, the cracking furnace is the core unit. Its operating performance affects not only the ethylene production as a whole but also the downstream processes. In this paper, actual process data are collected from a cracking furnace of a domestic ethylene plant. After preheating in the convection section, the cracking feedstock flows through a Venturi flow distributor into the furnace tubes of the radiation section, where the cracking reaction takes place. Next, the hot pyrolysis gas flows into the quench heat exchanger, where heat transfer to boiler water cools it quickly and terminates the cracking reaction. The pyrolysis gas leaving the quench heat exchanger enters the quench tank and is cooled by quench oil to about 200 °C. Finally, the gasoline fraction is separated from the pyrolysis gas. In the ethylene cracking process, keeping the input variables stable helps to maximize production efficiency and also improves the assessment accuracy.
In this study, 18 process variables of the actual process are used for operating performance assessment. The operating performance of the multivariable system is evaluated by the high absorption rate, which acts as a single output. According to expert knowledge, the ethylene production process can be divided into three steady performance grades, i.e. poor, general, and optimal. The training data are accordingly divided into three datasets $\mathbf{X}_1$, $\mathbf{X}_2$ and $\mathbf{X}_3$, corresponding to the poor, general, and optimal grades, respectively. The historical process data contain 1,200 samples in total. The relaxation factor is set as $\delta = 0.1$.

Figure 2: A comparison curve of the assessment results

4.2 Accuracy Analysis and discussion
To illustrate the advantage of the assessment method proposed in this study, the PCA-ANN operating performance assessment method is used for comparison. The assessment results of the two methods are shown in Figure 2. The relevant parameters are set as $H = 30$ and $\beta = 0.9$.
Because the optimal operating performance grade is relatively stable, the two methods differ little there. In the general grade, the assessment results differ because of data disturbances. After sampling time 1143, the PCA-ANN method makes two serious state misjudgments and its assessment index jitters over the whole process, whereas the MTBVE-ANN method still performs relatively well in the poor performance grade with no misjudgment. This indicates that the MTBVE-ANN method is superior to the traditional feature extraction assessment method. Besides assessing performance degradation better than the traditional method, the MTBVE-ANN method can also select the input variables relevant to performance degradation, which provides a basis for adjusting control strategies. According to the index in Figure 2, process variables such as Ffeed, FDS, Fss, Tss and COLE are screened out, indicating that these variables bear less responsibility for the performance degradation.
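With the case-study settings H = 30 and β = 0.9, the window averaging of Step 2 and the threshold logic of Step 5 in Section 3.2 can be sketched as below. This is a minimal illustration under stated assumptions: the similarity vector is taken as given (the neural network itself is not implemented), grades are numbered from 0, and the function names and return format are hypothetical.

```python
import numpy as np

def window_mean(samples, k, H=30):
    """Mean vector of the H most recent samples ending at row index k (0-based, inclusive)."""
    window = samples[k - H + 1 : k + 1]
    return window.mean(axis=0)

def assess_grade(gamma, beta=0.9, previous=None):
    """Apply the three cases of Step 5 to a similarity vector gamma (one entry per grade).

    Returns a tuple (state, grade, target):
      ("steady", q, None)        Case 1: largest similarity reaches beta
      ("conversion", q, p)       Case 2: below beta, converting from grade q to grade p
      ("hold", previous, None)   Case 3: keep the previous assessment result
    """
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(gamma)[::-1]          # grade indices by decreasing similarity
    q, p = int(order[0]), int(order[1])      # best grade and runner-up
    if gamma[q] >= beta:
        return ("steady", q, None)
    if gamma[q] > gamma[p]:
        return ("conversion", q, p)
    return ("hold", previous, None)

# Small illustrative inputs (not taken from the paper's data).
samples = np.arange(12, dtype=float).reshape(6, 2)
mean_vec = window_mean(samples, k=5, H=3)    # average of the last three rows

steady = assess_grade([0.05, 0.02, 0.93])
converting = assess_grade([0.60, 0.35, 0.05])
held = assess_grade([0.40, 0.40, 0.20], previous=1)
```

Returning the runner-up grade in the conversion case mirrors the description of Case 2; in practice the held result of Case 3 would be whatever the previous window produced.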

Table 1: Comparison of several operating performance assessment methods (sample ranges)
Grade        Actual Grade   PCA-ANN        MTBVE-ANN
Optimal      1-490          1-484          1-491
Conversion   491-502        485-501        492-505
General      503-1,055      502-1,127      506-1,071
Conversion   1,056-1,062    1,128-1,147    1,072-1,142
Poor         1,063-1,371    1,148-1,234    1,143-1,371

Table 2: Variables definitions
Variables   Descriptions                         Unit
ρNAP        cracking raw material density        kg·m-3
CNP         concentration of normal paraffin     %
CIP         concentration of iso-paraffin        %
COLE        concentration of olefin              %
CBTX        concentration of arene               %
CNAP        concentration of naphthenic          %
Ffeed       feed flow                            t·h-1
FDS         dilution steam flow                  t·h-1
FBfuel      bottom fuel flow                     m3·h-1
FSfuel      side fuel flow                       m3·h-1
Co2         concentration of flue oxygen         %
Tg1         flue gas temperature 1               °C
Tg2         flue gas temperature 2               °C
Fss         high-pressure steam flow             kg·h-1
Tss         high-pressure steam temperature      °C
COT         cracking outlet temperature          °C
THK         initial boiling point temperature    °C
TKK         final boiling point temperature      °C

5. Conclusion
A novel online operating performance assessment method based on MTBVE-ANN is proposed for industrial processes. The advantages of the MTBVE-ANN assessment method are summarized as follows:
(a) Common and specific variations over the steady performance grades are measured separately, which highlights the distinction between different performance grades.
(b) Online assessment with an artificial neural network adapts, to some extent, to the nonlinear characteristics of the data.
(c) The proposed operating performance assessment method is applied to real industrial production data, which is more credible than simple simulation data.
However, the method still has some shortcomings, such as the assessment of transitional grades, which is a topic for further research.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (61590922, 61533003, 21376077, 61673268), the Key Project of NSFC (61533012), the Shanghai Natural Science Foundation (14ZR1421800), the State Key Laboratory of Synthetical Automation for Process Industries (PAL-N201404).

References
Cheng C., Chiu M.S., 2005, Nonlinear process monitoring using JITL-PCA, Chemometrics and Intelligent Laboratory Systems, 76(1), 1-13.
Fabiano B., Pistritto F., Reverberi A., Palazzi E., 2015, Ethylene–air mixtures under flowing conditions: a model-based approach to explosion conditions, Clean Technologies and Environmental Policy, 17(5), 1261-1270.
Guo L., Gao J., Yang J., Kang L., 2009, Criticality evaluation of petrochemical equipment based on fuzzy comprehensive evaluation and a BP neural network, Journal of Loss Prevention in the Process Industries, 22(4), 469-476.
Harris T.J., 1989, Assessment of control loop performance, The Canadian Journal of Chemical Engineering, 67(5), 856-861.
Hwang J.H., Roh M.I., Lee K.Y., 2013, Determination of the optimal operating conditions of the dual mixed refrigerant cycle for the LNG FPSO topside liquefaction process, Computers & Chemical Engineering, 49, 25-36.
Huang B., Shah S.L., 1998, Practical issues in multivariable feedback control performance assessment, Journal of Process Control, 8(5-6), 421-430.
Krämer N., Sugiyama M., Braun M.L., 2009, Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression, AISTATS, 288-295.
Li G., Qin S.J., Zhou D., 2010, Output relevant fault reconstruction and fault subspace extraction in total projection to latent structures models, Industrial & Engineering Chemistry Research, 49(19), 9175-9183.
Liu Y., Hu N., Wang H., Li P., 2009, Soft chemical analyzer development using adaptive least-squares support vector regression with selective pruning and variable moving window size, Industrial & Engineering Chemistry Research, 48(12), 5731-5741.
Liu Y., Chang Y., Wang F., 2014, Online process operating performance assessment and nonoptimal cause identification for industrial processes, Journal of Process Control, 24(10), 1548-1555.
Peng K., Zhang K., Li G., 2013, Quality-related process monitoring based on total kernel PLS model and its industrial application, Mathematical Problems in Engineering, 1-14.
Qin S.J., 2012, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control, 36(2), 220-234.
Tarafder A., Rangaiah G.P., Ray A.K., 2007, A study of finding many desirable solutions in multiobjective optimization of chemical processes, Computers & Chemical Engineering, 31(10), 1257-1271.
Yu J., Qin S.J., 2008, Statistical MIMO controller performance monitoring. Part I: Data-driven covariance benchmark, Journal of Process Control, 18(3), 277-296.
Yu J., Qin S.J., 2008, Statistical MIMO controller performance monitoring. Part II: Performance diagnosis, Journal of Process Control, 18(3), 297-319.
Xu Y., Yang J.Y., Jin Z., 2004, A novel method for Fisher discriminant analysis, Pattern Recognition, 37(2), 381-384.
Zhao C., Gao F., Niu D., Wang F., 2011, A two-step basis vector extraction strategy for multiset variable correlation analysis, Chemometrics and Intelligent Laboratory Systems, 107(1), 147-154.
Zhou D., Li G., Qin S.J., 2010, Total projection to latent structures for process monitoring, AIChE Journal, 56(1), 168-178.