001.docx DOI: 10.3303/CET2189026 Paper Received: 9 July 2021; Revised: 23 September 2021; Accepted: 16 November 2021 Please cite this article as: Rashid N.A., Shamsuddin A., Khu W.H., Lee M.H., Ibrahim N., Abd Hamid M.K., 2021, Quality Prediction Improvement through Adaptive Nonlinear Principal Component Regression, Chemical Engineering Transactions, 89, 151-156 DOI:10.3303/CET2189026 CHEMICAL ENGINEERING TRANSACTIONS VOL. 89, 2021 A publication of The Italian Association of Chemical Engineering Online at www.cetjournal.it Guest Editors: Jeng Shiun Lim, Nor Alafiza Yunus, Jiří Jaromír Klemeš Copyright © 2021, AIDIC Servizi S.r.l. ISBN 978-88-95608-87-7; ISSN 2283-9216 Quality Prediction Improvement through Adaptive Nonlinear Principal Component Regression Nor Adhihah Rashida, Azmer Shamsuddinb, Wai Hong Khua, Muhammad Hisyam Leec, Norazana Ibrahima,*, Mohd Kamaruddin Abd Hamidd a b School of Chemical and Energy Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Skudai. Johor Bahru, Johor, Malaysia. Lahad Datu Edible Oils Sdn. Bhd., KM 2, Jalan Minyak Off Jalan POIC, Locked Bag No. 16, 91109 Lahad Datu, Sabah, Malaysia. c d Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310 Skudai. Johor Bahru, Johor, Malaysia. JMH Integrated Services Sdn. Bhd., 34, Jalan PI 10/3, Taman Pulai Indah, 81300 Johor Bahru, Johor Malaysia. norazana@utm.my The purpose of this paper is to present the predictor improvement for the refined palm oil quality based on the adaptive multivariate statistical process control. The time-varying behaviour of the palm oil refinery process has made it difficult for the industrial personnel to monitor and sustain the production of high quality refined palm oil. It will be very costly for the palm oil industries to repeat the refining process for the low quality refined palm oil, to meet the customer’s preference in the market. Alternatively, the quality of the refined palm oil can be measured before the process through a systematic quality monitoring, where the information from the quality analysis and process condition is integrated to develop an efficient quality prediction tool. The hybrid of time- series expansion methods namely recursive window (RW) analysis and exponentially weighted recursive window (EWRW) analysis along with the nonlinear principal component regression based on the nonlinear iterative partial least squares algorithm (NIPALS-PCR) are proposed to develop the adaptive prediction model. The predictor coefficient is then used to predict the refined palm oil quality based on the input quality and process variables. Through the validation with the online data, both NIPALS-PCR RW and NIPALS-PCR EWRW perform better than the NIPALS-PCR static, where the prediction is improved by 95 %. 1. Introduction In real industrial process, accurate prediction of quality variables is important for process control, decision making, safety and reliability of industrial processes (Xu et al., 2017). Due to the complexity of batch process, natural variation of raw material, process setting for different quality production and chemical interactions at different time-series may degrade the quality of the final product (Zhang et al., 2015). The weather and other natural factor (e.g., rain, soil acidity, etc.) may affect the quality of Crude Palm Oil (CPO), and consequently affecting the production of high quality Refined Bleached Deodorized Palm Oil (RBDPO).To maintain the RBDPO quality during the refinery process, an accurate quality control is needed. In current practices, the RBDPO quality can only be determined via chemical analysis after the end product exit the refining process. If the RBDPO quality did not satisfied the specification, the refining plant opts to go for the recycle to reprocess the off-specification RBDPO which eventually increasing the processing cost and lead to processing delay. Alternatively, the recycle stream can be prevented using quality prediction tool where the RBDPO quality can be predicted at the beginning of the process. The plant operator can perform corrective action on the defect quality before the product exits the refining process and the RBDPO quality can be systematically guaranteed. Multivariate statistical prediction model such as principal component analysis (PCA) is the most common method for modeling the relationships between variables. Due to the multicollinearity in the regression coefficient, PCA requires to combine with other regression method such as Principal Component Regression (PCR) to reduce the variance and covariance between the variables. Conventional PCR model is incapable to 151 trace the nonlinear relation and time-varying behaviour of the refinery process, given it is a linear regression model and the predictor coefficient is developed with the assumption that the process mean and variance are constant over time-series (Rashid et al., 2017). The accuracy of static prediction model is reduced when different time-series data is fed to the model, due to the fact that the trained model is no longer represent the current process status (Jeng, 2010). The prediction model must be frequently updated over time to achieve more accurate prediction such that the trained model representing the current process status (Jungmittag, 2014). This paper would like to improve the quality prediction through adaptive nonlinear PCR prediction model using time-series expansion method for designing better prediction tool’s framework. The nonlinear PCR is developed through the combination of nonlinear PCA with the ordinary least square regression, where in this paper, nonlinear iterative partial least square algorithm (NIPALS) is used as nonlinear PCA. The nonlinear PCR is then integrated with the time series expansion methods namely recursive window (RW) analysis and exponentially weighted recursive window (EWRW) analysis to develop the adaptive prediction model. 2. Methodology The main idea of this paper is to develop the enhanced framework through the integration of time-series expansion method with the nonlinear PCR model as illustrated as in Figure 1. The data is collected from a palm oil refinery located in Sabah, Malaysia for three months. The refining plant is producing the RBDPO according to the three major quality specifications which are Vietnam, China and Palm Oil Refiners Association Malaysia (PORAM). Figure 1: Flowchart of prediction model’s framework The requirement of statistical analysis is that, the data must be random and the processing delay must be augmented to ensure the data is competent during the development of prediction model. The data is pre- processed using autocorrelation analysis to ensure the data is random and exhibit zero correlation between previous and current data (Aue et al., 2014). The autocorrelation analysis is conducted by calculating the correlation of the same time series twice; once in its original form and once is a lagged version of itself over successive time interval. The cross-correlation analysis is performed to identify the processing delay, where the output data is staggered by the processing lag while the input variables remain intact, to ensure the input variables are aligned and matched to the output variables (Rosely et al., 2016). The cross-correlation analysis is conducted by identifying the relationship of the data series by calculating a set of correlation values at increasing time delay. The predictor coefficient is generated using NIPALS-PCR model through the combination of nonlinear iterative partial least squares algorithm (NIPALS) and Multiple Linear Regression. NIPALS algorithm is one of the nonlinear Principal Component Analysis (PCA) which decomposed the input variables data into principal component (PC) through the iteration of score and loading. The dimension reduction has been utilized in this study by retaining only relevant PC with optimum amount of cumulative variance percentages which is 100 % and 95 % variation, to prevent the overfitting and underfitting of the prediction model’s performance (Kracík and Strnadel, 2018). The retained PC from NIPALS algorithm is used as input to the multiple linear regressions to generate the predictor coefficients. The adaptive prediction model is developed using time-series expansion method, where the new sample observation is included in the model, and the predictor coefficient is continuously updated over time-series (Zou et al., 2018). For Recursive Window (RW) analysis, the number of observation per window is expanding and increases over time-series since the new observation is included without removing the old observation (Pan and Lee, 2003). For example, in the first window, the first predictor coefficient, k1 is generated from 1st to nth observation number to predict the next observation, n+1th. Then, the data is updated in second window by including the n+1th observation without excluding the 1st observation. The predictor coefficient generated from the second window is then used to predict the n+2th observation. The process is repeated until all out-of-sample observations are predicted. The prediction equation can be expressed as in Eq(1). STAGE 1 PRE-PROCESSING DATA • Data standardization • Autocorrelation analysis • Cross-correlation analysis STAGE 2 DEVELOPMENT OF PREDICTION MODEL • Nonlinear Principal Component Regression based on Nonlinear Iterative Partial Least Squares Algorithm (NIPALS-PCR) STAGE 3 DEVELOPMENT OF TIME- SERIES EXPANSION ANALYSIS • Recursive Window (RW) analysis • Exponentially Weighted Recursive Window (EWRW) analysis STAGE 4 VALIDATION AND VERIFICATION • Testing (online) data • Quality monitoring • Mean Absolute Error (MAE) calculation 152 y�i=k(i-1),1xi,1+k(i-1),2xi,2+k(i-1),3xi,3+⋯+k(n-1),pxn,p (1) where 𝑦𝑦� is the predicted output variable; 𝑥𝑥 is the input variables; k is the predictor coefficient between input and output variables; i is the observation number (i=1,2,3…,n); j is the input variables (j=1,2,3,…,p). As expressed in Eq(1), the previous predictor coefficient is used to predict the next or future observation. The Recursive Window analysis can be improved via Exponential Weight Recursive Window (EWRW) analysis. Exploiting the advantage of Exponential Weight Moving Average (EWMA) statistic, the EWRW analysis is developed using the predictor coefficients from recursive window analysis, by assigning the weights to the coefficients such that the present predictor coefficient gets a larger weight, while previous predictor coefficient gets smaller weight. The EWRW predictor coefficient sign statistic can be expressed as in Eq(2). kEWRW,i= λkRW,i + (1-λ)kEWRW,(i-1) (2) where λ is the smoothing constant between 0 and 1. Larger λ indicates little memory on the past predictor coefficient, giving more weight to the recent predictor coefficient, while smaller λ indicates the statistic gives high importance to past predictor coefficient. Monitoring chart is a graph used to monitor the process or quality changes over time (Hrehova, 2016) and constructed based on the three RBDPO quality specifications which are China, Vietnam and Palm Oil Refiners Association Malaysia (PORAM). The actual and predicted output values are plotted on the chart to visually monitor the possible quality deterioration that is the point located beyond the specification limit. The reliability of the prediction models can be analyzed from the consistency of the prediction error from training to testing. The prediction error is calculated using Mean Absolute Error (MAE) as expressed in Eq(3). 𝑀𝑀𝑀𝑀𝑀𝑀 = ∑|𝑦𝑦 − 𝑦𝑦�| 𝑛𝑛 (3) where y� is the predicted output variable; y is the actual output variables; n is the total number of sample. The improvement of adaptive prediction models from the static model is measured using Unscaled Mean Bounded Relative Absolute Error (UMBRAE) as expressed in Eq(4) (Chen et al., 2017). The percentage improvement is calculated as (1-UMBRAE)*100 %, where the positive percentage value indicates the adaptive model performs better than the static model and vice versa. UMBRAE= 1 n ∑ � ��y�i - yi�adaptive� ��y�i - yi�adaptive�+��y�i - yi�static� �ni=1 1-�1 n ∑ � ��y�i - yi�adaptive� ��y�i - yi�adaptive�+��y�i - yi�static� �ni=1 � (4) 3. Results and discussion The data is statistically analyzed to ensure data is random and the input-output data is matched to each other. Figure 2a is a plot of autocorrelation value against the time lag, where the point falls inside the confidence band (blue horizontal line) is said to be statistically insignificant or random. In time-series, the data is sequentially taken in time order and cannot be solely used, since the current output quality is not the actual output value for the current input. Figure 2b is a plot of cross-correlation value against the time lag, where the three lags (i.e. first lag touches the zero correlation value) are determined as the processing lag or processing delay. The input- output data is matched to each other, when the output data is shifted by the three lags. Figure 2: (a) Autocorrelation plot, (b) Cross-correlation plot 0 10 20 -0.5 0 0.5 1 Lag A ut oc or re la tio n FFAin A t l ti (a) (b) 0 2 4 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 IVin with IVout 153 The performance of the prediction models is evaluated in terms of deviation error between the actual and predicted output quality using mean absolute error (MAE) as shown in Figure 3. A smaller MAE value, approaching to zero value indicates the predicted output is less deviated from the actual output value. The comparisons of MAE are made for the NIPALS-PCR static model, NIPALS-PCR RW model and NIPALS-PCR EWRW model. The performance of the prediction models was also compared in term of percentage of retained variance, by using 100 % and 95 % variation. The qualities of interest for RBDPO are Free Fatty Acid (FFAout) content, Moisture content (MOISTout), Iodine Value (IVout) and Colour (COLORout). Figure 3: Mean Absolute Error of (a) NIPALS-PCR using 100 % PC Variation, (b) NIPALS-PCR using 95 % PC Variation The performance consistency is described as the consistency of MAE range value from the training data to testing data. If the prediction model produced a consistent error range, the model is said to be well developed and does not suffer from overfitting or underfitting, and any type of data can fit into the prediction model. Figure 3a shows that the NIPALS-PCR with 100 % PC variation produced inconsistent error from training to testing data. The smallest MAE value (approximately to zero error value) during the training indicates that the predicted output value perfectly fit and follows the trend of the actual output value, but performed poorly when testing data is used as depicted by high MAE value for all prediction models. The NIPALS-PCR model using 100 % PC variation retained all the variability in the training data. The model learns the noise in the training data too well but does not generalized. The model is said to be ‘overfitted’ and negatively affects the prediction performance on new data (Kracík and Strnadel, 2018). The NIPALS-PCR with 95 % PC variation shows a consistent error range from training to testing data (Figure 3b), and is said to be well developed and can fit to any type of data. The NIPALS-PCR model using 95 % PC variation retained only 95 % of the variability in the training data. Based on the prediction error performance, the prediction model with reduced percentage variation is sufficient to produce a reliable prediction performance. In comparison of prediction analysis performance, the adaptive prediction models which are NIPALS-PCR RW model and NIPALS-PCR EWRW model, is outperformed the NIPALS-PCR static model. Although the prediction performance for NIPALS-PCR with 100 % PC variation (Figure 3a) shows a significance improvement from static model to adaptive model, still there is a large gap of error range from the training, which indicates that the model’s overfitting cannot be reduced using adaptive prediction model. For NIPALS-PCR with 95 % PC variation (Figure 3b), the performance of adaptive prediction model specifically the NIPALS-PCR RW model, improved slightly from the NIPALS-PCR static model, but still producing the error within the error range of training data. Table 1 shows the percentage improvement of adaptive prediction model from the static model for NIPALS- PCR using 95 % PC variations. The adaptive prediction models show significance improvement from static model for all RBDPO quality. Both NIPALS-PCR RW and NIPALS-PCR EWRW models give similar average percentage improvement that is 95 % which indicates that the two adaptive models have equivalence prediction performances. Table 1: Percentage improvement of adaptive prediction performance using 95 % PC variation Percentage Improvement FFAout MOISTout IVout COLORout Average Percentage Improvement RW model 93.91 % 91.77 % 99.89 % 94.52 % 95.02 % EWRW model 93.74 % 92.02 % 99.89 % 94.59 % 95.06 % 0.00E+00 2.00E-02 4.00E-02 6.00E-02 8.00E-02 1.00E-01 1.20E-01 1.40E-01 1.60E-01 1.80E-01 FFAout MOISTout IVout COLORout Training Data Testing data-Static model Testing data-RW model Testing data-EWRW model 2. 97 91 E -1 6 3. 81 63 9E -1 7 1. 42 10 9E -1 4 3. 49 35 E -1 5 0.00E+00 5.00E-02 1.00E-01 1.50E-01 2.00E-01 2.50E-01 FFAout MOISTout IVout COLORout Training Data Testing data-Static model Testing data-RW model Testing data-EWRW model (a) (b) 2. 98 x1 0- 16 3. 82 x1 0- 17 1. 42 x1 0- 14 3. 49 x1 0- 15 154 Monitoring chart is constructed for the actual and predicted value of the RBDPO quality, to observe the accuracy of the prediction, by comparing the trend and pattern of the actual and predicted value. The chart is also used to determine the specification of the predicted output quality. The quality is categorized in three specifications which are Vietnam, China and PORAM. These three specifications have similar quality value for MOIST(< 0.10 %), IV (50 - 55 Wijs) and COLOR (3 red colour), but for FFA quality, there are two quality value which are < 0.10 % (PORAM and Vietnam) and < 0.07 % (China). Figure 4 compares the actual and predicted output value of FFA quality between the NIPALS-PCR using 100 % PC variation 95 % PC variation. As can be seen from Figure 4, the deviation of predicted value from actual value is very large for 100 % PC variation (Figure 4a) compared to 95 % PC variation (Figure 4b). The predicted value using 100 % variations shows a random trend where the predicted output from three prediction models exceeding the China specifications, and to be specific, the predicted output of NIPALS-PCR static model also exceeding the PORAM and Vietnam specification. By referring to this prediction performance, the refining plant can produce RBDPO quality with PORAM and Vietnam specification using NIPALS-PCR RW and NIPALS-PCR EWRW prediction models. Several monitoring and process adjustment actions need to be planned by the plant manager if the refining plant wants to produce the RBDPO quality with China specification. The findings show that the integration of time-series expansion method can improve the accuracy and reliability of the prediction performance. Figure 4: Monitoring chart of actual and predicted free fatty acid quality of (a)NIPALS-PCR using 100 % PC variation, (b)NIPALS-PCR using 95 % PC variation 0 10 20 30 40 50 60 70 80 90 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 Sample number Fr ee F at ty A ci d (F FA ),% y USL-PORAM & VIETNAM USL-CHINA FFA Actual Static model RW model EWRW model (a) 0 10 20 30 40 50 60 70 80 90 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 Sample number Fr ee F at ty A ci d (F FA ),% y USL-PORAM & VIETNAM USL-CHINA FFA Actual Static model RW model EWRW model (b) 155 The predicted value using 95 % variations shows a predictable trend where the predicted output from three prediction models are not exceeding the PORAM, Vietnam and China specifications. The NIPALS-PCR RW and NIPALS-PCR EWRW models show a better prediction trend which is close to the actual output value than the NIPALS-PCR static model. This excellent performance shown by the NIPALS-PCR using 95 % variation is important to help the refining plant in predicting the incoming product quality with no false alarm. By referring to this prediction performance, there is no critical action required during the production since the RBDPO quality is successfully predicted within all quality specification. The findings show that the prediction models can be improved through dimension reduction by retaining optimum percentage of variance. The monitoring chart acts as guidelines to the plant operators at the beginning of the refinery process. Action plans such as identifying the time where off-specification input is expected to come into the process and actions for adjustment on the off- specification products should be prepared, specifically during the production of high-quality specification end product such as China specification. 4. Conclusions The prediction using static model shows random prediction trend, and is incapable to predict the actual quality specification, which can bring in false alarm to the process. Through the integration of MSPC prediction model with time-series expansion method, the process variation is reduced and the predicted output follows the trends of the actual output quality. The prediction performance is improved by 95 % which depicts that the adaptive prediction model is more capable to predict the actual quality specification of the RBDPO quality. The improvement of quality prediction can bring many benefits to the palm oil refinery plants in the form of monetary and energy savings through the prevention of the product recycle. This allows the refining plant to operate at optimum level and minimizing the process downtime. As data sample increases, the proposed models lead to a slower speed of model adaptation. This limitation can be overcome through the hybrid of two time-series expansion method such as recursive window analysis with moving window analysis. Acknowledgments The authors would like to acknowledge the financial support by Universiti Teknologi Malaysia (R.J130000.7351.4B572). References Aue A., Norinho D.D., Hörmann S., 2014, On the prediction of stationary functional time series, Journal of The American Statistical Association, 110(509), 378-392. Chen C., Twycross J., Garibaldi J.M., 2017, A new accuracy measure based on bounded relative error for time series forecasting, PLoS ONE, 12(3), 1–23. Hrehova S., 2016, Predictive model to evaluation quality of the manufacturing process using MATLAB tools, Procedia Engineering, 149, 149-154. Jeng J.C., 2010, Adaptive process monitoring using efficient recursive PCA and moving window PCA algorithms, Journal of the Taiwan Institute of Chemical Engineers, 41, 475-481. Jungmittag A., 2014, Combination of forecasts across estimation windows: An application to air travel demand, Journal of Forecasting, 35(4), 373-380. Kracík J., Strnadel B., 2018, A statistical model for lifespan prediction of large steel structures, Engineering Structures, 176, 20-27. Pan Y., Lee J.H., 2003, Recursive data-based prediction and control of product quality for a PMMA batch process, Chemical Engineering Science, 58, 3215-3221. Rashid N., Rosely N.M., Noor M.M., Shamsuddin A., Hamid M.M., Ibrahim K., 2017, Forecasting of refined palm oil quality using principal component regression, Energy Procedia, 142, 2977-2982. Rosely N., Rashid N., Noor M., Hawi N., Sepuan S., Shamsuddin A., Abd. Hamid, M.K., 2017, Product sampling time and process residence time prediction of palm oil refining process, Chemical Engineering Transactions, 56, 1411-1416. Xu Y., Mi C., Zhu Q.-X., Gao J.-Y., He Y.-L., 2017, An effective high-quality prediction intervals construction method based on parallel bootstrapped RVM for complex chemical processes, Chemometrics and Intelligent Laboratory Systems, 171, 161-169. Zhang L., Li Y., Wang Q., Yan C., 2015, Prediction model for steel/slag interfacial instability in continuous casting process, Ironmaking & Steelmaking, 42(9), 705-713. Zou M., Zhao L., Wang S., Chang Y., Wang F., 2018, Quality analysis and prediction for start-up process of injection molding processes, 10th IFAC Symposium on Advanced Control of Chemical Process ADCHEM 2018, 25th-27th July, Shenyang China, 51(18), 233-238. 156 026.pdf Quality Prediction Improvement through Adaptive Nonlinear Principal Component Regression