CHEMICAL ENGINEERING TRANSACTIONS, VOL. 43, 2015
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet
Chief Editors: Sauro Pierucci, Jiří J. Klemeš
Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-34-1; ISSN 2283-9216

PCA Based Data Reconciliation in Soft Sensor Development – Application for Melt Flow Index Estimation

Barbara Farsang*(a), Imre Balogh(b), Sándor Németh(a), Zoltán Székvölgyi(b), János Abonyi(a)

(a) University of Pannonia, Department of Process Engineering, Veszprém, P.O. Box 158, H-8201, Hungary
(b) Tisza Chemical Group Public Limited Company, Tiszaújváros, P.O. Box 20, H-3581, Hungary
farsangb@fmt.uni-pannon.hu

Melt flow index (MFI) is a very important property of thermoplastic polymers. Laboratory measurements follow standard methods (ASTM D1238 or ISO 1133) and give accurate MFI values, but they are available only at 2-4 hour sampling intervals. Soft sensors make real-time estimation of MFI available for process control and monitoring. When detailed knowledge about the process is not available, data-driven soft sensors can be applied: historical process data are used to build statistical models that describe the relationship between inputs and outputs. Since these methods are based on measurements, the performance of the soft sensor depends on the quality of the data. Measurements are always affected by errors, so the data must be pre-processed. Measurement noise and process variables can be correlated with each other, so one way to improve measurement accuracy is to use multivariate statistical methods (PCA, PLS). Statistical methods can be improved further when phenomenological knowledge (e.g. balance equations) is taken into account. The aim of the presented research is to propose a methodology to support the data-driven development of process monitoring systems.
We developed a method that improves the effectiveness of a data-based MFI soft sensor. The method combines the advantages of a priori knowledge based models and data-driven multivariate statistical process monitoring tools. As a case study we developed a soft sensor to estimate the MFI of the products of an industrial polypropylene reactor at TVK Plc. The proposed method is able to detect undesirable operating states and can be used for fault detection.

1. Introduction

Advanced process control and monitoring algorithms rely on process variables that are not always measurable or are measured only off-line. Hence, the effective application of these tools requires estimation algorithms based on a model of the process. In the last few years several methods have been developed to estimate unmeasured variables. A soft sensor is a mathematical model that estimates unmeasured but important variables from other, easily measured variables. The advantage of such inferential measurements over laboratory measurements is that the estimation happens in real time, so information is continuously available to improve process control and optimization and to increase process safety. Kadlec published a very comprehensive review of the applications of soft sensors in the process industry (Kadlec, 2009).

In the polymerization industry several important variables cannot be measured in real time, e.g. the conversion rate in a reactor. Teran and Machado built a neural network based soft sensor that estimates the conversion rate online from the temperature and pressure of a polystyrene reactor, the percentage of blowing agent, the residual monomer and the average molecular weight (Teran, 2011).

Melt flow index (MFI) is an important variable for determining product quality in industrial ethylene or propylene polymerization processes. It gives information about the flow of molten thermoplastic polymers; a high MFI means low viscosity and low molecular weight.
It is defined as the mass of polymer, in grams, flowing in ten minutes through a capillary of a specific diameter and length under a pressure applied via prescribed alternative gravimetric weights at alternative prescribed temperatures. The exact MFI value is therefore determined by an accurate flow measurement, which can easily be carried out in the laboratory. Since laboratory measurements are only periodic, continuous information is not available for the monitoring system. However, MFI carries a lot of information about product quality, so online monitoring of MFI is recommended so that operators can get information about changes in the process.

Please cite this article as: Farsang B., Balogh I., Nemeth S., Szekvolgyi Z., Abonyi J., 2015, Pca based data reconciliation in soft sensor development – application for melt flow index estimation, Chemical Engineering Transactions, 43, 1555-1560 DOI: 10.3303/CET1543260

In the last few years several algorithms have been developed to estimate MFI. Feil and co-workers developed a semi-mechanistic (neural network) model to estimate the MFI of polyethylene (Feil, 2004). Sharmin used the PLS algorithm to build a soft sensor that predicts the melt flow index (Sharmin, 2006). Zhang and Liu gave a detailed summary of the history of neural network based soft sensors and improved this method: they used an adaptive fuzzy neural network to estimate the MFI of polypropylene products (Zhang, 2013). A new research area is least squares support vector machine based soft sensors, which are applied in the chemical industries (Cheng, 2015).

This topic is significant not only in the academic world. Its practical significance is shown by a new project at the VIKKK Research Center of the University of Pannonia, supported by the largest Hungarian polymer production company (TVK Plc.). Our task is to develop a software sensor that continuously calculates the melt flow index from other, online measured process variables.
The laboratory measurements can be used to validate the soft sensor and, later on, to check the reliability of the model. We developed a hybrid model to estimate the melt flow index from other, easily measured variables. This technique combines empirical parameters with an a priori information based model. The aim of this article is to show how data-based models can improve the effectiveness of hybrid models for estimating the MFI of polypropylene. In Section 2 we introduce the Principal Component Analysis (PCA) based methods and show how probabilistic PCA can support the estimation algorithm. In Section 3 we introduce the principle of the polypropylene process in the case of the Spheripol technology and summarize the results. Conclusions are drawn in Section 4.

2. Role of Principal Component Analysis in soft sensor development

PCA maps the data points into a lower dimensional space, which is useful in the analysis and visualization of correlated high-dimensional data. PCA-based methods assume that the examined process is linear; many processes are not linear, and in those cases standard PCA should not be used. In addition, the original PCA method assumes that there is no time dependency between data points. This drawback can be overcome by making PCA dynamic:

$$Y(k+1) = A_1 Y(k) + \dots + A_m Y(k - t_m) + B_1 U(k) + \dots + B_n U(k - t_n) \tag{1}$$

where A1…Am and B1…Bn are coefficient matrices, tm and tn express the time dependence, U(k) is the kth sample of the input and Y(k) is the kth sample of the output. In the standard PCA method the variables are arranged as X=[Y,U]. Ku suggested that the data matrix X should be formed by considering the process dynamics at every sample point (Ku, 1995). Generally speaking, every sample point is augmented with the points it can depend on, i.e. the past n values (n-order model):

$$X = \begin{bmatrix} Y(k) & U(k) & \cdots & Y(k-n) & U(k-n) \\ Y(k+1) & U(k+1) & \cdots & Y(k+1-n) & U(k+1-n) \\ \vdots & \vdots & & \vdots & \vdots \\ Y(k+m) & U(k+m) & \cdots & Y(k+m-n) & U(k+m-n) \end{bmatrix} \tag{2}$$

Tipping and Bishop (1999) developed a method called Probabilistic Principal Component Analysis (PPCA).
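Returning to the dynamic PCA data matrix of Eq. (2), the construction of the lagged rows can be illustrated with a minimal sketch. It assumes scalar output and input signals of equal length; the function name `lagged_matrix` and the example values are hypothetical and are not taken from the plant data or from the authors' implementation.

```python
def lagged_matrix(y, u, n):
    """Build rows [Y(k), U(k), Y(k-1), U(k-1), ..., Y(k-n), U(k-n)].

    One row is produced for every sample index k that has n past
    values available, i.e. k = n, n+1, ..., len(y) - 1.
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    if n < 0 or n >= len(y):
        raise ValueError("lag order n must satisfy 0 <= n < len(y)")
    rows = []
    for k in range(n, len(y)):
        row = []
        for lag in range(n + 1):
            # Append the output and input that are `lag` samples old.
            row.extend([y[k - lag], u[k - lag]])
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Illustrative signals only (assumed values).
    y = [1.0, 2.0, 3.0, 4.0]
    u = [10.0, 20.0, 30.0, 40.0]
    for row in lagged_matrix(y, u, n=1):
        print(row)
```

With n = 0 the function returns the static arrangement X=[Y,U]; ordinary PCA applied to the lagged matrix then captures correlations across time as well as across variables.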
The probability model offers the potential to extend the scope of the conventional PCA method. One of the most important advantages of PPCA models is that they can be combined into a mixture of models (Abonyi, 2005).

Figure 1: Proposed approach to develop a soft sensor (blocks: DPCA, PPCA with clustering, empirical model, a priori model, estimated parameter)

PPCA is based on an isotropic error model. It seeks to relate a p-dimensional observation vector y to a corresponding k-dimensional vector of latent (or unobserved) variables x. The relationship is

$$y^{T} = W x^{T} + \mu + \varepsilon$$

where y is the row vector of observed variables, x is the row vector of latent variables, W is the loading matrix, the vector μ permits the model to have a nonzero mean and ε is the isotropic error term. Standard principal component analysis, where the residual variance is zero, is the limiting case of PPCA. PPCA assumes that values are missing at random throughout the data set. This means that whether a data value is missing or not does not depend on the latent variable, given the observed data values. There is no closed-form analytical solution for the model parameters such as W, so their estimates are determined by iterative maximization of the corresponding log-likelihood using an expectation-maximization (EM) algorithm.

Figure 1 shows our proposed approach to developing a soft sensor that predicts the melt flow index. First, a credibility test is made using the dynamic PCA method. Then the whole operating range is separated into smaller groups, called clusters, if necessary (the reason can be a difference in magnitude). Then, in each cluster, the projected data are used to identify the coefficients of the empirical model, which estimates the current MFI value of the produced polymer at the current operating parameters. Finally, these values are used in the a priori model. In the next section we introduce this approach through a case study.

3. Application in Melt Flow Index estimation

The adequacy of the proposed approach is demonstrated through the development of a data-driven soft sensor that estimates the melt flow index (MFI) in real time. The first subsection describes the technology; the results of the analysis are then introduced.

3.1 Description of the process

Tisza Chemical Group Plc. (TVK) is the largest petrochemical company of Hungary, where polymer raw materials (ethylene, propylene, butylenes, etc.) are produced by steam cracking of naphtha or gasoline. In the PP-3 plant polypropylene is produced by the Spheripol process. This process consists of the following operation steps: catalyst activation, pre-polymerization, polymerization (loop reactor), separation, polymerization (gas phase), steaming, drying. Typical process parameters are summarized in Table 1.

As shown in Figure 2, there are two loop reactors in the TVK plant. The final MFI value is mainly determined in the two loop reactors, so our task was to develop a soft sensor that estimates the MFI value online from other measured parameters. In the Spheripol process MFI is controlled by the hydrogen concentration in the reactors, but this is not the only important variable. So we first isolated the most important variables, those with a large effect on the MFI values. We found that the hydrogen concentration, the ethylene mass flow and the triethyl-aluminium (TEAL):donor ratio are the main variables that have to be taken into account. Then we separated the measured data into 4 groups, because a single model cannot describe the whole process accurately owing to the differences in magnitude; the grouping was based on the magnitude of the MFI value and the type of catalyst. Then we built an empirical polynomial model (3) to estimate the MFI produced in the reactors for each of these intervals. Since parallel measurements were available, we could validate these models. We had to build a filtering algorithm into the soft sensor, because the empirical models are based on noisy data and so the estimation cannot be exact.
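The group-wise identification of empirical models described above can be sketched as follows. This is only an illustration under stated assumptions: the model form used here (a quadratic polynomial in a single input, taken to be the hydrogen concentration), the group labels, the function names and the synthetic data are all hypothetical and are not the authors' model or plant data.

```python
from collections import defaultdict


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    # Power sums of x and moment sums of y define the 3x3 system.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_by_group(samples):
    """samples: iterable of (group, x, mfi); returns {group: [c0, c1, c2]}."""
    grouped = defaultdict(lambda: ([], []))
    for group, x, mfi in samples:
        grouped[group][0].append(x)
        grouped[group][1].append(mfi)
    return {g: fit_quadratic(xs, ys) for g, (xs, ys) in grouped.items()}


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x


if __name__ == "__main__":
    # Synthetic, noise-free samples for two hypothetical product groups.
    data = [("small_mfi", x, 1.0 + 2.0 * x + 0.5 * x * x) for x in range(5)]
    data += [("large_mfi", x, 20.0 + 10.0 * x) for x in range(5)]
    for group, coeffs in fit_by_group(data).items():
        print(group, [round(c, 6) for c in coeffs])
```

Fitting one model per group keeps the large-MFI products from dominating the least-squares residuals of the small-MFI products, which is the motivation the text gives for the grouping. Each group needs at least three distinct input values for the quadratic fit to be well defined.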
Table 1: Typical operating parameters of Spheripol process (Csernyik, 2010)

Process step                     Temperature (°C)   Pressure (bar)
Catalyst activation              10                 40
Pre-polymerization               20                 35
Polymerization (loop reactor)    70                 34
High pressure separation         90                 18
Polymerization (gas phase)       75-80              10-14
Steaming                         105                1.2
Drying                           90                 1.1

[Equation (3), the empirical polynomial model: not recoverable from the source]

In order to describe product changes accurately in time, we have to analyse the dynamic behaviour of the system. We take into account the MFI produced in each reactor, the effect of blending, and the input and output MFI of the reactors. The dynamic model can be written as:

[Equations (4) and (5): not recoverable from the source]

where Pr is the productivity in the loop reactors (calculated from the production rate and conversion), MFIp is the resulting MFI in the reactors (calculated from the empirical model), Mpp is the mass of polymer in the reactor at any given moment, and a is an identified parameter (value: -0.282). In other words, the results of the empirical model are used in the a priori information based model. Since measured, erroneous data are used to identify the parameters of the empirical model, the results of the soft sensor can be inaccurate and a filtering algorithm is needed. We therefore aimed to improve the effectiveness of the soft sensor. The first step is the elimination of measurement noise from the basic process measurements. We used two types of PCA method: the original PCA method and dynamic PCA. Then we used probabilistic PCA and a clustering method to check the adequacy of the grouping. In the next subsection we introduce the results of the analysis.

3.2 Results and discussion

Measured process data are available from the PP-3 plant. The data cover a one-year interval: the sampling interval of the continuous variables is 1 min, while laboratory measurements are taken every 2-3 hours. Numerous polymer grades are produced in this plant. Both the steady state and the dynamic state of the technology can be analysed. To increase the reliability of the measurements we used PCA and first-order dynamic PCA.
We have 22 measured variables, but we found that most of the variance of the system can be described by the first 8 principal components, so we projected the 22-dimensional data onto an 8-dimensional space. Figure 3 shows the measured and the PCA-projected values. Since they are industrial process data, we scaled them to a 0-100 interval. The illustrated data set contains data from both steady and dynamic states.

Figure 2: Flowsheet diagram of Spheripol technology with loop reactors to process polypropylene (Csernyik, 2010)

Figure 3: Results of PCA (A) and dynamic PCA (B)

If we use only the original PCA, the accuracy of the measured data can be increased, but the time dependency is not taken into account. If we analyse data from the dynamic state and apply dynamic PCA, we can see that some of the data are outliers. When we built the soft sensor based on measured data, identifying the empirical model was difficult for the large MFI group: these values are much higher than the others, so the estimated data were far from the measured data. When the outliers were skipped, the identification of the model parameters was more successful and the efficiency of the soft sensor increased.

Figure 4 A shows the ordered measured MFI values. Two groups can be separated easily, the small and the large MFI range, because they clearly differ from the other groups. In the middle MFI range the separation is not unequivocal, so we separated the products based on catalyst type. We used probabilistic PCA and a clustering algorithm to see which groups are recommended based on the main variables (principal components). The result of the analysis is shown in Figure 4 B, which gives the probability that a given point belongs to a group. The small and large MFI ranges form clearly separate groups. The middle range can be separated into 2 clusters, but in this case the basis of grouping is the MFI range.
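The membership probabilities plotted in Figure 4 B can be illustrated with a simplified sketch. It computes the posterior probability that a point belongs to each component of a one-dimensional Gaussian mixture. This is a deliberate simplification of the mixture of probabilistic PCA models used in the paper, and the component weights, means and standard deviations below are assumed values chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, components):
    """Posterior probability of each mixture component given x.

    components: list of (weight, mean, std) tuples.
    Returns a list of probabilities that sums to 1.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("x is too far from every component to be assigned")
    return [v / total for v in weighted]


if __name__ == "__main__":
    # Four hypothetical clusters on the scaled 0-100 MFI axis (assumed values).
    clusters = [(0.25, 10.0, 4.0), (0.25, 35.0, 5.0),
                (0.25, 50.0, 5.0), (0.25, 90.0, 4.0)]
    for value in (10.0, 42.5, 90.0):
        probs = membership_probabilities(value, clusters)
        print(value, [round(p, 3) for p in probs])
```

A point deep inside the small or large range receives a probability close to 1 for a single cluster, whereas a point between two overlapping middle clusters receives split probabilities, which mirrors the less clear separation reported for the middle MFI range.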
Then we compared the results of the soft sensor based on measured data with those of the soft sensor based on projected data. The projected data give a more reliable and accurate soft sensor for estimating the MFI value in time: the mean square error decreased by more than 10 %.

Figure 4: A) Scaled measured MFI values and groups according to the magnitude of MFI. B) Results of probabilistic principal component analysis and clustering algorithm. (Axes: ordered points vs. scaled MFI in A; ordered points vs. probability for clusters 1-4 in B. Figure 3 axes: points vs. scaled MFI; series: measured and projected.)

4. Conclusions

In chemical processes some important variables cannot be measured, or online measurements are not available. Some properties can be measured in the laboratory, but the data are available only periodically. Soft sensors can be used to estimate these values from other, easily measured values. Our aim was to develop a soft sensor that estimates the melt flow index of the polypropylene product in real time. First we built a soft sensor based on measured data. We developed a hybrid model, in which both empirical and a priori information are used for the estimation. Since measured data contain errors, measurement noise can influence the effectiveness of the hybrid model, so the data should be pre-processed before the hybrid model is applied. We compared the original PCA and dynamic PCA methods. Steady-state PCA can increase the accuracy of the measurements, but it does not take the time dependency into account. Therefore dynamic principal component analysis was used; this technique can isolate outliers or gross errors. Several products with different melt flow indexes are produced in the polymer plant. Since there is a difference in magnitude between them, the data should be grouped, because one model cannot describe the whole process. First we separated the products based on the type of catalyst and the MFI values.
Then we used probabilistic principal component analysis with a clustering algorithm to check the correctness of the groups. We found that the clusters are similar to our groups. Then we re-identified the empirical parameters based on the projected data for all clusters and found that the soft sensor based on projected data gives more reliable MFI values.

Acknowledgements

This research of Barbara Farsang and Janos Abonyi was realized in the frames of TÁMOP 4.2.4. A/2-11-1-2012-0001 „National Excellence Program – Elaborating and operating an inland student and researcher personal support system convergence program”. The project was subsidized by the European Union and co-financed by the European Social Fund. This research was supported by the European Union and financed by the European Social Fund in the frame of the TÁMOP 4.2.2.A/11/1-KONV-2012-0071 project.

References

Abonyi J., Feil B., Nemeth S., Arva P., 2005, Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series, Fuzzy Sets and Systems, 149, 39–56, DOI: 10.1016/j.fss.2004.07.008.
Cheng Z., Liu X., 2015, Optimal online soft sensor for product quality monitoring in propylene polymerization process, Neurocomputing, 149, 1216–1224, DOI: 10.1016/j.neucom.2014.09.006.
Csernyik I., PP technology, accessed 05.12.2014.
Feil B., Abonyi J., Pach P., Nemeth S., Arva P., Nemeth M., Nagy G., 2004, Semi-mechanistic Models for State-Estimation – Soft Sensor for Polymer Melt Index Prediction, Artificial Intelligence and Soft Computing, 3070, 1111–1117.
Kadlec P., Gabrys B., Strandt S., 2009, Data-driven soft sensors in the process industry, Computers and Chemical Engineering, 33(4), 795–814, DOI: 10.1016/j.compchemeng.2008.12.012.
Ku W., Storer R.H., Georgakis C., 1995, Disturbance Detection and Isolation by Dynamic Principal Components Analysis, Chemometrics and Intelligent Laboratory Systems, 30, 179–196, DOI: 10.1016/0169-7439(95)00076-3.
Sharmin R., Sundararaj U., Shah S., Griend L.V., Sun Y.J., 2006, Inferential sensors for estimation of polymer quality parameters: Industrial application of a PLS-based soft sensor for a LDPE plant, Chemical Engineering Science, 61(19), 6372–6384, DOI: 10.1016/j.ces.2006.05.046.
Teran R.A.C., Machado R.A.F., 2011, Soft-sensor based on artificial neuronal network for the prediction of physic-chemical variables in suspension polymerization reactions, Chemical Engineering Transactions, 24, 529–534, DOI: 10.3303/CET1124089.
Tipping M.E., Bishop C.M., 1999, Mixtures of probabilistic principal component analyzers, Neural Comput., 11(2), 443–482.
Zhang M., Liu X., 2013, A soft sensor based on adaptive fuzzy neural network and support vector regression for industrial melt index prediction, Chemometrics and Intelligent Laboratory Systems, 126, 83–90, DOI: 10.1016/j.chemolab.2013.04.018.