CHEMICAL ENGINEERING TRANSACTIONS VOL. 39, 2014
A publication of The Italian Association of Chemical Engineering
www.aidic.it/cet
Guest Editors: Petar Sabev Varbanov, Jiří Jaromír Klemeš, Peng Yen Liew, Jun Yow Yong
Copyright © 2014, AIDIC Servizi S.r.l., ISBN 978-88-95608-30-3; ISSN 2283-9216
DOI: 10.3303/CET1439121

Please cite this article as: Farsang B., Nemeth S., Abonyi J., 2014, Synergy between data reconciliation and principal component analysis in energy monitoring, Chemical Engineering Transactions, 39, 721-726 DOI:10.3303/CET1439121

Synergy between Data Reconciliation and Principal Component Analysis in Energy Monitoring
Barbara Farsang*, Sandor Nemeth, Janos Abonyi
University of Pannonia, Department of Process Engineering, P.O. Box 158, H-8201, Hungary
farsangb@fmt.uni-pannon.hu

Monitoring of energy consumption is of central importance for the energy-efficient operation of chemical processes. Fault detection and process monitoring systems can reduce the environmental impact and enhance the safety and energy efficiency of chemical processes. These solutions are based on the analysis of process data. Data reconciliation is a model-based technique that checks the consistency of measurements with balance equations. Principal component analysis is a similar multivariate model-based technique, but it utilises a data-driven statistical model. We investigate how information can be transferred between these models to get a more sensitive tool for energy monitoring. To illustrate the capability of the proposed method in energy monitoring, we provide a case study for heat balance analysis in the well-known Tennessee Eastman benchmark problem. The results demonstrate how balance equations can improve energy management of complex process technologies.

1. Introduction
Over the last ten years the global energy consumption increased by 30 % (Rühl, 2013). Chemical companies are faced with rising energy and material costs.
Increasing energy efficiency and reducing energy usage have become the most important factors of competitiveness. Energy efficiency can be defined in several ways (Xia and Zhang, 2010):
• Performance efficiency is characterised by production, cost, energy sources and environmental impact.
• Operational efficiency is evaluated by considering the proper coordination of different system components.
• Equipment efficiency is indicated by capacity, specifications and standards, constraints and maintenance.
• Technology efficiency includes the reduction of life cycle cost and the coefficients of the conversion/processing/transmission rates, in addition to improving the novelty and optimality of processes.

Energy monitoring systems can be used to improve energy efficiency and reduce energy consumption. The purpose of energy management is “to enable understanding of energy consumption data; identify underlying factors which impact upon consumption; and set appropriate targets that allow you to review performance” (Carbon Trust, 2010). Monitoring systems improve the energy efficiency of processes, because these systems calculate the actual energy use, estimate the energy needed for normal operation and highlight where energy use can be improved. These systems can detect energy wastage caused by human error, equipment malfunction or poor process control. A significant difference between the measured and the theoretically required values indicates abnormal behaviour. Such abnormal situations can have a significant impact on the safety and economy of the process industry. When the monitoring system responds fast and supports the control of the unusual situation, the economic loss can be significantly reduced. A detailed overview of these systems is given in Bayindir et al. (2011). In chemical processes robust methods are required for process monitoring because of safety, economic operation and production specifications.
Multivariate statistical data-based modelling techniques are powerful tools for process monitoring. Although multivariate statistical models do not directly reduce operation costs, when financial indicators are calculated based on exact and accurate process values a more realistic picture is available for the decision makers. The most commonly used model is Principal Component Analysis (PCA). PCA is applied in various areas of chemical engineering, e.g. process monitoring, quality control, disturbance detection, sensor fault diagnosis and process fault diagnosis (Misra et al., 2002). When an a priori model is not available, the measured values implicitly contain the model, because process variables are linked by a set of constraints, e.g. balance equations. Using PCA, the correlation among variables under normal operating conditions can be found and information can be extracted from process data. The main idea of PCA is to replace a large number of interrelated variables by a few uncorrelated variables (Wold et al., 1987).

The performance of model-based process monitoring systems depends strongly on the quality of the model. Hence, good PCA-based solutions require accurate and validated historical process data with high information content. Measurements are always affected by errors due to imperfect instruments, signal transmission, power fluctuation, improper instrument installation and miscalibration. To minimize random errors, pre-processing of the data is necessary. Data reconciliation (DR) is a useful tool here, because it uses the balance equations and physical-chemical laws, so the consistency of the data is ensured. Jiang et al. (2013) summarized the principle of DR and presented a study to illustrate the capability of data reconciliation for improving the accuracy of operational data. Sometimes it is difficult to measure important variables which influence the energy use, e.g. steam flow. In this case, DR is used to reconcile the measurements and to estimate the unmeasured variables.
DR and PCA have already been combined in some applications. It has been shown that data reconciliation can improve the quality and sensitivity of a PCA model by reducing the number of principal components (Amand et al., 2001). In this paper we show a stronger relationship between the PCA and DR techniques and we propose a multivariate model-based energy monitoring system using the synergistic combination of PCA tools, data reconciliation and a flowsheeting simulator.

The paper is organised as follows: in Section 2 we describe the synergy between PCA and DR. The application of the proposed fault diagnosis system is illustrated on the energy balances of the Tennessee Eastman Process in Section 3. In Subsection 3.1 the analysed process is introduced. The results are presented in Subsection 3.2. Section 4 summarizes the paper with some key results.

2. Similarity of PCA and DR projections
PCA and DR both perform an optimal projection of the process data onto a (linear) multivariate model. The model of PCA is defined by the covariance matrix of the data, while the model of DR is defined by material and energy balance equations, usually given as a system of linear equations, $Ax = b$ (A is the incidence matrix, the vector x contains the variables and b is the source vector). The classical data reconciliation is formulated by the following equation:

$\hat{x} = \left[I - \Sigma A^{T}\left(A \Sigma A^{T}\right)^{-1} A\right] x + \Sigma A^{T}\left(A \Sigma A^{T}\right)^{-1} b = P_{DR}\,x + c$ (1)

where x represents the measured variables, $\hat{x}$ the reconciled values, I is an identity matrix, $\Sigma$ is the variance matrix of the error, $P_{DR}$ is the projection matrix and c is a constant shift vector.

The projection matrix of PCA is determined from the covariance matrix of the normalized data. The covariance matrix (F) is decomposed into three matrices by singular value decomposition: $F = U S V^{T}$, where the columns of U are the eigenvectors of the covariance matrix, the diagonal elements of S are the eigenvalues, and the columns of V are the right singular vectors.
According to the number of principal components (p), the first p eigenvectors (columns of U) are selected, and the projection matrix is formulated from these p vectors as:

$P_{PCA} = U_{p} U_{p}^{T}$ (2)

To compare the projection matrices, the Krzanowski similarity factor is applied. Krzanowski (1979) defined a factor to measure the similarity between matrices by comparing the hyperplanes spanned by their eigenvectors. This factor characterizes the angle (Θ) between two hyperplanes, because it is built from the squared cosines between all combinations of the first p principal components of the two matrices (X, Y):

$S(X, Y) = \frac{1}{p}\,\mathrm{trace}\left(U_{X,p}^{T} U_{Y,p} U_{Y,p}^{T} U_{X,p}\right) = \frac{1}{p}\sum_{i=1}^{p}\sum_{j=1}^{p}\cos^{2}\Theta_{ij}$ (3)

where the matrix $U_{p}$ contains the eigenvectors and p is the number of principal components.

Figure 1: The normal vector of the plane is determined by the neglected principal component (PC3)

The concept is illustrated in three dimensions (see Figure 1). In this example the third principal component (PC3) is perpendicular to the plane determined by the data, so the neglected eigenvector, as the normal vector of the plane, defines the coefficients in the equation of the plane, e.g. $a_{1}x_{1} + a_{2}x_{2} + a_{3}x_{3} = 0$. This interpretation is useful when a data-driven model is needed, so that balance equations have to be detected from the correlation between variables. It should be noted that this approach amounts to applying the total least squares (TLS) technique. TLS is a type of errors-in-variables regression, in which observational errors on both dependent and independent variables are taken into account (Ganger, 2008). With this approach not only can the projection matrix $P_{PCA}$ be calculated and its similarity to the projection matrix $P_{DR}$ be evaluated, but the parameters of the balance equations can also be (re)calculated from the TLS coefficients.

These approaches are verified on a case study detailed in Section 3. The projection matrices of data reconciliation and principal component analysis are compared based on the Krzanowski similarity factor.
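The comparison outlined in this section can be tried on synthetic data. The following Python/NumPy sketch is only an illustration under stated assumptions, not the authors' implementation: the function names `pca_basis` and `krzanowski`, the single toy balance x1 = x2 + x3 and the noise level are hypothetical. It builds the PCA basis from standardized data as in Eq(2), takes the null space of the scaled balance matrix as the subspace in which the reconciled points lie, and evaluates the mean squared cosine between the two subspaces as in Eq(3).

```python
import numpy as np

def pca_basis(X, p):
    """Return the column standard deviations and the first p eigenvectors of
    the covariance matrix of the standardized data (columns = variables)."""
    sd = X.std(axis=0, ddof=1)
    Z = (X - X.mean(axis=0)) / sd
    F = np.cov(Z, rowvar=False)
    U, S, Vt = np.linalg.svd(F)      # F = U S V^T, singular values descending
    return sd, U[:, :p]

def krzanowski(U_x, U_y):
    """Mean squared cosine between two p-dimensional subspaces, in [0, 1]."""
    p = U_x.shape[1]
    M = U_x.T @ U_y
    return float(np.trace(M @ M.T) / p)

# Hypothetical toy process: one balance x1 - x2 - x3 = 0 plus small noise.
rng = np.random.default_rng(0)
x23 = rng.normal(loc=[5.0, 3.0], scale=[1.0, 0.5], size=(400, 2))
x1 = x23.sum(axis=1, keepdims=True)
X = np.hstack([x1, x23]) + rng.normal(scale=0.01, size=(400, 3))

A = np.array([[1.0, -1.0, -1.0]])    # assumed incidence matrix of the toy balance
m, n = A.shape
sd, U_pca = pca_basis(X, p=n - m)

# In standardized coordinates the balance reads (A * sd) z = const, so the
# reconciled points lie in a plane parallel to the null space of A * sd.
_, _, Vt = np.linalg.svd(A * sd)
U_dr = Vt[m:].T                      # orthonormal basis of that null space

similarity = krzanowski(U_pca, U_dr)
print(f"Krzanowski similarity: {similarity:.4f}")
```

A value close to 1 means that the plane found by PCA nearly coincides with the plane defined by the balance equation; a value close to 0 would mean the two subspaces are nearly orthogonal.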
Based on data coming from the process we determine the relationship between the variables. The results demonstrate how balance equations can improve energy management of complex process technologies.

3. Results and discussion
In this work we use the energy balances of the Tennessee Eastman Process to illustrate the synergy between the projection matrices of data reconciliation and principal component analysis. In this section we use the nomenclature of the Tennessee Eastman Process, so users of the Tennessee Eastman model can easily identify the variables if they wish to reproduce our experiments. The numbers after the variable names identify streams in the system (Figure 2 helps to match the streams and numbers).

The operating cost (TC) of the technology is determined by the loss of raw materials (purge stream, byproducts and dissolved reactant in the product), compressor work and steam flow:

$TC = PC \cdot FTM(9) + PrC \cdot FTM(13) + CC \cdot CW + SC \cdot FS$ (4)

where PC, PrC, CC and SC are the costs of purge, product stream, compressor and steam, FTM(9) and FTM(13) are the component flows of the purge and product streams, CW is the compressor work and FS is the steam rate. In this study we do not examine how optimization and decision support techniques rely on the examined variables or how these variables influence the manipulated variables. We only deal with the reliability of the measurements and how their uncertainty appears in the estimated operating costs. Eq(4) draws attention to the importance of accurate flow rate measurement. In this paper we analyse the effect of flow rates on the total cost and present the role of DR and PCA in energy balances. In Subsection 3.1 we present the process and highlight the analysed parts of the technology. Simulation results can be found in Subsection 3.2.

Figure 2: Analysed streams in Tennessee Eastman Process (numbers identify streams in Eq(5))

3.1 Tennessee Eastman Process
Downs and Vogel (1993) prepared a process model of an industrial chemical process to develop, study and evaluate process control technology.
This model is often used to evaluate and compare different data analysis methods. The system includes five major unit operations: reactor, condenser, gas-liquid separator, compressor and stripper. The gaseous reactants are fed to the reactor. Four reactions take place in the reactor; all of them are exothermic and irreversible. The reactor product stream (in gas phase) passes through a cooler to condense the products, and in the separator the two phases (products and reactants) are separated. The reactants are recirculated; a purge stream removes inert components and byproducts from the system. The liquid stream of the separator contains dissolved reactants, which are removed in the stripper. The bottom stream of the stripper is the product; the overhead stream is recycled to the reactor feed. In Figure 2 we highlight the analysed streams and heat sources. The energy balances of the operation units can be defined easily in matrix form:

$A x = b$ (5)

with $x^{T} = [FTM(1), FTM(2), FTM(3), FTM(4), FTM(5), FTM(7), FTM(8), FTM(9), FTM(10), FTM(11), FTM(13)]$. Each of the four rows of A is the heat balance of one unit and holds the specific enthalpies of the streams involved: HST(7) and HST(8) in the first row; HST(8), HST(9), HST(10) and HST(11) in the second row; HST(4), HST(5), HST(11) and HST(13) in the third row; HST(1), HST(2), HST(3), HST(5), HST(7) and HST(9) in the fourth row. All other entries of A are zero. The source vector b collects the heat terms of the units, built from RH, UAR, UAS, UAC and the corresponding temperatures.

Here FTM is the mole flow of the streams, HST is the specific enthalpy of the streams (it depends on the composition and temperature of the stream), RH is the released reaction heat, UAR and UAS are the heat transferred to water in the reactor and the separator, UAC is the heat transferred from steam, TWR is the reactor temperature, TWS is the separator temperature and TCC is the stripper temperature. Measurements of process variables are always affected by errors, so the measured variables do not satisfy the energy balances. Data reconciliation can minimize the balance error ($Ax - b$) by optimal projection of the process variables onto the model equations.
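How reconciliation removes the balance error can be sketched on a small example. The Python/NumPy code below is a minimal illustration, assuming the standard weighted least-squares form of linear data reconciliation; the four-stream network, the measured values and the 2 % flow error are hypothetical and are not the Tennessee Eastman balances. The function name `reconcile` is likewise only illustrative.

```python
import numpy as np

def reconcile(x, A, b, Sigma):
    """Project the measurements x onto {x : A x = b}, weighting the
    corrections by the error variance matrix Sigma (assumed known)."""
    ASA = A @ Sigma @ A.T
    # K = Sigma A^T (A Sigma A^T)^-1, obtained with a linear solve
    K = np.linalg.solve(ASA, A @ Sigma).T
    P_dr = np.eye(A.shape[1]) - K @ A     # projection matrix
    c = K @ b                             # constant shift vector
    return P_dr @ x + c, P_dr, c

# Hypothetical network: stream 1 splits into streams 2 and 3,
# and stream 3 passes through a unit and leaves as stream 4.
A = np.array([[1.0, -1.0, -1.0,  0.0],
              [0.0,  0.0,  1.0, -1.0]])
b = np.zeros(2)

x_meas = np.array([10.3, 6.1, 3.8, 4.1])          # made-up noisy measurements
nominal = np.array([10.0, 6.0, 4.0, 4.0])
Sigma = np.diag((0.02 * nominal) ** 2)            # assumed 2 % standard error

x_hat, P_dr, c = reconcile(x_meas, A, b, Sigma)

err_raw = np.sum((A @ x_meas - b) ** 2)           # squared balance error before
err_rec = np.sum((A @ x_hat - b) ** 2)            # and after reconciliation
print("reconciled flows:", np.round(x_hat, 4))
print("squared balance error:", err_raw, "->", err_rec)
```

The reconciled vector satisfies the balances up to rounding error, and less accurately measured flows (larger variance) receive larger corrections.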
Table 1: Square balance error (sum(mean((Ax-b)^2))) and specific total cost (TC) for raw values and for values reconciled in different ways

| x vectors | Square balance error | TC ($/h) |
|---|---|---|
| Raw measurements | 12.0662 | 55,975 |
| Reconciled values with time-dependent A and b matrices | 6.72 × 10^-29 | 54,271 |
| Reconciled values with time-independent A and b matrices | 1.21 × 10^-28 | 54,924 |
| Projected points with PCA (5 principal components) | 0.5899 | 56,084 |
| Projected points with PCA (7 principal components) | 0.5974 | 55,974 |

[Figure 3b axes: Number of Principal Component (0 to 10) versus Krzanowski-similarity (0 to 1)]

Figure 3: a) Comparison of measured, reconciled with time-dependent and time-independent projection matrix and projected values in case of product stream. b) Similarity of eigenvectors of projection matrix of PCA and DR

These model equations represent the dependency of the process variables (e.g. stream 11 is the liquid product of the separator and the inlet stream of the stripper, so the enthalpy of the outlet and inlet stream should be identical, apart from the heat loss in the pipeline). PCA can also detect such relationships, so the combined application of PCA and DR can increase the sensitivity and accuracy of energy balance based energy monitoring.

3.2 Results and discussion
First we compared the projections of DR and PCA and analysed how they influence the estimation of the total cost of the process. Since temperature is measured more accurately (0.1 - 0.5 %) than flow rates (1.5 - 3 %) (Lipták, 2003), we assumed perfect temperature measurements and focused on balancing the flow rates. This assumption allows the application of linear data reconciliation. Since temperature varies in time, the elements of the incidence and source matrices (which are non-linear functions of the temperature) can change over time, resulting in a linear parameter varying (LPV) model.
Table 1 shows that the projections based on time-dependent and time-independent A and b matrices are almost identical, so the process is almost linear in the studied operating regime. Principal component based projection also increases the reliability of the data, so indirectly it also reduces the balance error. The last column of Table 1 highlights that the value of the total cost depends on the quality of the variables. It should be noted that neither PCA nor DR reduces the total cost, but these techniques give a more realistic cost estimate. To illustrate the similar effects of these techniques, Figure 3a shows the reconciled and projected values of the mole flow of the A stream (FTM(1)). We also calculated the Krzanowski similarity factor (see Figure 3b) to compare the eigenvectors of the projection matrices of DR and PCA.

Finally, we examined how information about a balance equation can be obtained from data using the TLS-based interpretation of the eigenvectors. Data from steady-state operation were analysed, because we wanted to avoid the unhandled effects of process dynamics. The heat balance of the separator (the 2nd model equation in Eq(5), which contains four variables) is used to demonstrate the approach. We collected the necessary data from the simulator (FTM(8:11)). The eigenvalues of the covariance matrix of the data show that there is a strong connection between the variables. The first three principal components define a hyperplane, and the remaining eigenvector defines the parameters of the equation of this plane. TLS can estimate these parameters: (6)

Since PCA is based on normalized process values, the extracted equation describes the relationship between the normalized variables. When the effect of the normalization is taken into account, it can be shown that the extracted model is identical to the balance equation, so the PCA-based Eq(6) defines the same hyperplane as the balance equation based data reconciliation.
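The step from the neglected eigenvector back to a balance equation in the original units can be sketched as follows. This Python/NumPy example is an assumption-laden illustration, not the calculation behind Eq(6): the data are synthetic, the four-variable balance (one inlet equal to the sum of three outlets) only mimics the structure of a four-variable heat balance, and all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 500

# Hypothetical balance a . x = 0 with a = [1, -1, -1, -1]:
# one inlet stream equals the sum of three outlet streams.
a_true = np.array([1.0, -1.0, -1.0, -1.0])
outlets = rng.uniform(low=[2.0, 5.0, 10.0], high=[4.0, 9.0, 20.0],
                      size=(n_samples, 3))
inlet = outlets.sum(axis=1)
X = np.column_stack([inlet, outlets])
X = X + rng.normal(scale=0.01, size=X.shape)      # measurement noise on all variables

# PCA on normalized (standardized) data
mu = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)
Z = (X - mu) / sd
eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))   # eigenvalues ascending

# The eigenvector of the smallest eigenvalue is the normal vector of the
# hyperplane in normalized coordinates: w . z = 0.
w = eigvec[:, 0]

# Undo the normalization: w . z = sum_i (w_i / sd_i) (x_i - mu_i)
a_est = w / sd
a_est = a_est / a_est[0]          # fix scale and sign so the first coefficient is 1
offset = a_est @ mu               # extracted plane: a_est . x = offset

print("eigenvalues:", np.round(eigval, 5))
print("estimated coefficients:", np.round(a_est, 3))
print("offset:", round(float(offset), 3))
```

One eigenvalue that is orders of magnitude smaller than the others signals a linear relationship among the variables, and after rescaling by the standard deviations the corresponding eigenvector reproduces the balance coefficients up to the noise level.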
The proposed technique can be used to verify PCA and DR models and to detect significant changes in the process that affect energy efficiency.

[Figure 3a axes: Time (h) versus input mole flow of A stream; series: measured, reconciled, projected with 5 PC, projected with 7 PC, projected with 11 PC]

4. Conclusions
Energy monitoring requires validated data and informative alarms related to abnormal operations. Principal component analysis and data reconciliation are widely used techniques to improve the accuracy and reliability of data. We found a strong relationship between these techniques, and we presented how the coefficient matrix of DR can be inferred from the projection matrix of PCA. The whole concept is illustrated on the well-known Tennessee Eastman case study. In this study we assumed perfect temperature measurement and balanced the flow rates. The resulting linear parameter varying model gave almost the same performance as a global linear model, so we showed that linear data reconciliation can be applied effectively in the studied operating regime. The operating cost of the technology was calculated to show the effect of the projections. It has been shown that increasing the reliability of the data considerably modifies the estimated cost, so PCA and DR are useful tools when the estimated cost is used in real-time control or optimization. In further work we will analyse how the model equations of DR and PCA can be merged, how the analogy of the two techniques can be demonstrated in more complex examples, and how temperature measurements can be balanced to further improve the accuracy of cost estimation.

Acknowledgements
The research of Barbara Farsang and Janos Abonyi was realized in the frames of TÁMOP 4.2.4.A/2-11-1-2012-0001 “National Excellence Program – Elaborating and operating an inland student and researcher personal support system convergence program”.
The infrastructure of the research is supported in the frame of the TÁMOP 4.2.4.A/11/1-KONV-2012-0071 project. The projects were subsidized by the European Union and co-financed by the European Social Fund.

References
Amand T., Heyen G., Kalitventzeff B., 2001, Plant monitoring and fault detection: Synergy between data reconciliation and principal component analysis, Computers & Chemical Engineering, 25, 501-507, DOI: 10.1016/S0098-1354(01)00630-5.
Bayindir R., Irmak E., Colak I., Bektas A., 2011, Development of a real time energy monitoring platform, International Journal of Electrical Power & Energy Systems, 33, 137-146, DOI: 10.1016/j.ijepes.2010.06.018.
Carbon Trust, 2010, Monitoring and Targeting, Accessed 07.02.2014.
Downs J.J., Vogel E.F., 1993, A plant-wide industrial process control problem, Computers & Chemical Engineering, 17, 3, 245-255, DOI: 10.1016/0098-1354(93)80018-I.
Ganger W., 2008, The Singular Value Decomposition, <http://www.math.ethz.ch/education/bachelor/lectures/hs2012/other/linalg_INFK/svdneu.pdf>, Accessed 15.03.2014.
Jiang X., Liu P., Li Z., 2013, A data reconciliation based approach to accuracy enhancement of operational data in power plants, Chemical Engineering Transactions, 35, 1213-1218, DOI: 10.3303/CET1335202.
Krzanowski W., 1979, Between-groups comparison of principal components, Journal of the American Statistical Association, 74, 367, 703-707, DOI: 10.1080/01621459.1979.10481674.
Lipták B.G., 2003, Instrument Engineers’ Handbook, Volume 1, Fourth Edition: Process Measurement and Analysis, CRC Press, Boca Raton, Florida, United States of America, ISBN: 0-8493-1082-0 (v. 1).
Misra M., Yue H.H., Qin S.J., Ling C., 2002, Multivariate process monitoring and fault diagnosis by multiscale PCA, Computers & Chemical Engineering, 26, 1281-1293, DOI: 10.1016/S0098-1354(02)00093-5.
Rühl C., 2013, BP Statistical Review of World Energy 2013, Accessed 07.02.2014.
Wold S., Esbensen K., Geladi P., 1987, Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, 2, 37-52, DOI: 10.1016/0169-7439(87)80084-9.
Xia X., Zhang J., 2010, Energy efficiency and control systems-from a POET perspective, Methodologies and Technology for Energy Efficiency, 1, 255-260, DOI: 10.3182/20100329-3-PT-3006.00047.