CHEMICAL ENGINEERING TRANSACTIONS, VOL. 94, 2022
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Petar S. Varbanov, Yee Van Fan, Jiří J. Klemeš, Sandro Nižetić
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-93-8; ISSN 2283-9216
DOI: 10.3303/CET2294194

Paper Received: 12 April 2022; Revised: 20 May 2022; Accepted: 02 June 2022
Please cite this article as: Du J., Zheng J., Liang Y., Liao Q., Wang B., 2022, A Hybrid Intelligent Method for Predicting Gasoline Octane Number and Optimising Operation Parameters, Chemical Engineering Transactions, 94, 1165-1170 DOI:10.3303/CET2294194

A Hybrid Intelligent Method for Predicting Gasoline Octane Number and Optimising Operation Parameters

Jian Du(b), Jianqin Zheng(b), Yongtu Liang(b), Qi Liao(b), Bohong Wang(a,*)
(a) National-Local Joint Engineering Laboratory of Harbor Oil & Gas Storage and Transportation Technology/Zhejiang Provincial Key Laboratory of Petrochemical Pollution Control, Zhejiang Ocean University, Zhoushan, 316022, PR China
(b) National Engineering Laboratory for Pipeline Safety/MOE Key Laboratory of Petroleum Engineering/Beijing Key Laboratory of Urban Oil and Gas Distribution Technology, China University of Petroleum-Beijing, Fuxue Road No. 18, Changping District, Beijing 102249, PR China
wangbh@zjou.edu.cn

The Research Octane Number (RON) is a significant indicator of the combustion performance of gasoline, and the products of gasoline combustion have a major influence on the atmospheric environment. It is therefore crucial to measure RON during gasoline refining. However, current methods generally rely on laboratory instruments to measure RON, which is time-consuming and expensive. This paper proposes a hybrid intelligent method for predicting RON and optimising operating parameters to reduce RON loss, aiming to improve combustion performance and reduce pollutant emissions.
Because feature engineering and model optimisation are time-consuming, the RON prediction model is generated and optimised automatically via an automatic machine learning technique. To reduce RON loss, the particle swarm optimisation (PSO) algorithm is applied to optimise the operation parameters based on the predicted RON. A real-world industrial dataset is then used to test the accuracy and effectiveness of the proposed model. The results suggest that the proposed model is more accurate than other advanced models, such as Support Vector Machine (SVM) and Decision Tree (DT), with a mean absolute error of 0.162 and a root mean squared error of 0.225, reductions of 46 % and 44 % relative to SVM and DT. The experiment also shows that the RON loss of the data samples is reduced by more than 40 % after optimisation.

1. Introduction

1.1 Background

As one of the most widely utilised petroleum products, gasoline provides the fuel supply for small vehicles (Dabbagh et al., 2013). With the rapid development of society and the economy, gasoline demand has increased continuously in recent years (Demirbas, 2007). The increase in gasoline consumption has also brought serious environmental problems, caused by the exhaust emissions from gasoline combustion (Cui et al., 2021). As a result, many countries have formulated increasingly stringent gasoline quality standards (Pan et al., 2018).

Many countries, particularly China, depend on imports for their crude oil supply. In China, about 40 %-60 % of imported crude oil is heavy oil, which contains sulphur and olefins and is hard to use directly. For this reason, China has rapidly developed lightweight technology to convert heavy oil into gasoline, diesel, and light olefins. The research octane number (RON) is the most significant indicator of the combustion performance of gasoline. Each unit reduction of RON results in an economic loss of 22 $/t.
In China, catalytic cracking produces more than 70 % of gasoline and contributes more than 95 % of the sulphur and olefins in refined gasoline. As a consequence, it is necessary to refine the catalytic cracking gasoline to reduce its sulphur and olefin content. However, the RON decreases when sulphur and olefins are removed in this process. To improve the quality of refined gasoline, the catalytic gasoline adsorption desulphurisation technology (S-Zorb) has been adopted. Nevertheless, the RON of refined gasoline is difficult to acquire because of the complexity of the operating conditions and the measurement equipment. Given this circumstance, it is worthwhile to develop an accurate model of the gasoline refining process to predict the RON of refined products.

The refining process is complex and the measurements are diverse, which leads to high nonlinearity and strong coupling among the control variables involved. Moreover, the historical operating data of the industrial refining process are under-used. In summary, the difficulties in modelling the gasoline refining process are as follows:
(1) The refining equipment contains many important control variables, and their influence on the RON of refined products is hard to quantify.
(2) It is difficult to establish an accurate model of the refining process because of the high nonlinearity and strong coupling of the control variables.

1.2 Literature review

In the past decades, a lot of research has been conducted to model the industrial process of gasoline refining. Among these studies, infrared (IR) spectroscopy, which has become the most common technique, is combined with chemometric algorithms for remote monitoring and for extracting the rich information content of spectra. Kelly et al. (1989) applied multiple linear regression (MLR) and partial least squares (PLS) to demonstrate the possibility of predicting RON from IR data in the 660-1,215 nm spectral range.
Based on selected wavenumbers from the mid-IR spectrum, Rashid et al. (1989) proposed a prediction model to determine clear RON, heat of formation, gravity, and mole fraction. Parisi et al. (1990) employed near-IR spectrometry in combination with multivariate calibration to measure gasoline characteristics online, such as RON, cetane number, paraffin, and olefin. Although these traditional models have achieved certain results, their predictions are less than satisfactory because of the high nonlinearity and strong coupling of the control variables.

In the context of Industry 4.0 and Big Data, many researchers show increasing interest in the huge amount of industrial process data (Dias et al., 2020). The excellent nonlinear approximation capability of machine learning can be used to model complex and nonlinear chemical processes. Doicin and Onuțu (2014) proposed a neural network method to predict the octane number of petroleum mixtures. However, the input properties did not cover the influencing factors broadly. Based on the chemical features acquired from IR, Ibrahim and Farooq (2019) utilised an artificial neural network (ANN) and Principal Component Analysis (PCA) to predict octane number. Kubic et al. (2017) developed an ANN-based model built on chemical composition to predict the octane number of hydrocarbons and oxygenated compounds. Although machine learning models can better capture the nonlinear relationship between control variables and RON, feature engineering is time-consuming and few researchers have provided an optimisation plan to increase the RON of refined products.

Despite the great progress made in the above-mentioned research, some limitations remain, summarised as follows:
(1) Feature engineering is time-consuming because of the large number of operation parameters in the refining process, and the process of optimising control variables further increases the complexity and difficulty of modelling.
(2) Few researchers have focused on process optimisation to increase the RON of refined products based on prediction models and intelligent algorithms.

To overcome the drawbacks of conventional chemical modelling of gasoline refining, this paper proposes a novel hybrid intelligent framework for RON prediction and process parameter optimisation. First, the Pearson correlation coefficient (PCC) is calculated to select the input variables. Subsequently, the RON prediction model is established with the automatic machine learning (AML) technique. To increase the RON of refined products, the prediction model is combined with the PSO algorithm to optimise the control variables.

2. Methodology

In this section, the hybrid framework is developed to predict RON and optimise the operating parameters of the refining process. The fundamental theories of AML and PSO are briefly introduced first. Then a hybrid framework combining the RON prediction model with the parameter optimisation method is proposed.

2.1 Basic principles

2.1.1 Automatic machine learning

Despite the advances in machine learning, model construction is time-consuming and model performance depends on how well the parameters are optimised (Zhao et al., 2021). Given this circumstance, the AML technique has been proposed to perform repetitive work, such as model selection, model combination, and model optimisation, aiming to improve modelling efficiency (Balaji and Allen, 2018). In this paper, the AML technique is applied for automatic model construction. To find the best pipeline for the research dataset, genetic programming is used to automate the most tedious part of machine learning.

2.1.2 PSO algorithm

The PSO algorithm was first proposed by Eberhart and Kennedy in 1995 (Poli et al., 2007) and has received increasing attention and application in recent years. In this work, PSO is applied to optimise the operation parameters of the refining process.
At first, the size of the population is determined to form the first generation. The dimension of each individual is determined by the number of input features of the RON prediction model. Then the RON loss (Eq (1)), which is the difference between the RON of the material gasoline and the RON of the product gasoline, is calculated as the fitness value. The individual best and population best positions, namely those with the lowest RON loss, are used to update the particles. The best individual of the swarm is eventually found, as the best value converges within the maximum number of iterations, and it provides the optimised operation parameters.

$$RON_{loss} = RON_{material} - RON_{product} \tag{1}$$

where $RON_{material}$ represents the RON of the refining material, which is the catalytic cracking gasoline, and $RON_{product}$ is the RON of the gasoline products.

2.2 A hybrid framework for RON prediction and parameters optimisation

The main process of the proposed hybrid framework is depicted in Figure 1. The historical operating data of the S-Zorb equipment are collected to form a database. The operating data are pre-processed to select the input control variables; the output variable is the RON of the product gasoline. Data pre-processing mainly comprises dimensionality reduction, normalisation, and division into training and test sets. The prediction model is then generated by automating model selection, model combination, and model optimisation, and the historical operating data are used to train it. Once training is finished, the RON prediction model is complete. Real-time operating parameters are then fed to the model as input control variables, and the RON of the refined products is estimated. Based on the prediction model and the intelligent algorithm, the operating parameters are optimised by maximising the RON loss reduction.
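The PSO procedure of Section 2.1.2, with the RON loss of Eq (1) as fitness, can be illustrated with a minimal sketch. It is not the authors' implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, the bounds, the feed RON, and the stand-in function `predict_ron` are all assumptions introduced for illustration. In the framework, the trained AML model would take the place of `predict_ron`, and the swarm would be initialised from the original control variable values rather than at random.

```python
import numpy as np


def pso_minimise(fitness, lower, upper, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; every default value here is illustrative."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # First generation: random positions inside the bounds, zero velocity.
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    # Individual bests and the population best.
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep variables in range

        val = np.array([fitness(p) for p in pos])
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]

        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    return gbest, gbest_val


def predict_ron(x):
    """Hypothetical stand-in for the trained RON prediction model."""
    return 89.0 - np.sum((x - 0.6) ** 2)


RON_MATERIAL = 90.0  # hypothetical feed RON, for illustration only


def ron_loss(x):
    # Eq (1): RON_loss = RON_material - RON_product
    return RON_MATERIAL - predict_ron(x)


best_x, best_loss = pso_minimise(ron_loss, lower=[0.0, 0.0], upper=[1.0, 1.0])
print(best_x, best_loss)
```

With this toy model the smallest possible loss is 1.0, reached at `x = [0.6, 0.6]`, so the returned `best_loss` indicates how closely the swarm has converged.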
To increase the RON of refined products and thereby improve economic benefits, the initial values of the control variables are used to initialise the particle swarm. After several iterations, the best operating parameter values and the RON loss reduction converge. Eventually, the operating parameters of the refining process that decrease RON loss are obtained.

Figure 1: The hybrid intelligent framework of this work

2.3 Evaluation metrics

To evaluate the performance of the prediction models, several evaluation metrics are applied: mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and R-squared (R2). The metrics are defined as follows:

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\,\% \tag{2}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{3}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{4}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\bar{y} - y_i\right)^2}, \quad -\infty \le R^2 \le 1 \tag{5}$$

where $y_i$, $\hat{y}_i$ and $\bar{y}$ are the observed values, the predicted values, and the average of the observed values of product RON.

3. Case study

In this section, a real-world industrial dataset is established by collecting operating data from S-Zorb refining equipment. The performance of the proposed hybrid intelligent framework is then verified by testing the accuracy of the RON prediction model and by optimising the operating parameters to increase product RON.

3.1 Data description

Historical operating data accumulated over 4 years from the S-Zorb refining equipment of a petrochemical corporation in China are collected for the experiments. The dataset contains the following information: 7 material properties, 2 raw adsorbent properties, 2 regenerated adsorbent properties, 2 product properties, and 354 operation parameters. Before constructing the proposed framework, it is necessary to carry out data pre-processing, which includes data normalisation and data splitting. In this paper, the Min-Max normalisation method (as shown in Eq (6)) is employed to normalise the input data to values between 0 and 1.
$$\tilde{x}_i = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{6}$$

where $x_i$ is the original input value, $\tilde{x}_i$ is the normalised value, and $x_{min}$ and $x_{max}$ denote the minimum and maximum of the original input values.

After normalisation, the data are split to ensure the robustness of the proposed framework. The input data are divided into two sets: 80 % for training the prediction model (260 data samples) and 20 % for verifying the prediction performance after training (65 data samples). To predict the product RON, the operation variables are analysed to determine the main control variables. The PCC between each operation variable and the product RON is calculated for dimension reduction. A total of 16 variables with high correlation (PCC ≥ 0.4) are selected as the input variables. The input data are then ready for model construction.

3.2 RON prediction via the hybrid framework

After the input data for RON prediction are prepared, the experiments are repeated ten times to obtain the average prediction errors of the different models. The prediction models are implemented in Python. The ANN, decision tree regression (DTR), and support vector regression (SVR) models are tuned by trial and error. For the ANN, two hidden layers with 15 and 20 nodes are used, the learning rate is 0.001, and the Adam optimiser is applied. The optimal model constructed by the AML technique is an ensemble of extreme gradient boosting, stochastic gradient descent, and k-nearest neighbours. The prediction errors of the different models are shown in Table 1. Comparing SVR and DTR, DTR gives the worst overall prediction performance, with the highest MAPE and MAE and the lowest R2, which suggests that DTR is not suitable for RON prediction. The RON curves in Figure 2 show that the RON predicted by ANN is closer to the observed RON, and the prediction errors of ANN are lower than those of SVR.
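The pre-processing of Section 3.1 and the error metrics of Section 2.3 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the PCC threshold is applied to the absolute value of the coefficient (the paper states only PCC ≥ 0.4), the split is a plain random 80/20 split, and the data are synthetic.

```python
import numpy as np


def min_max_normalise(X):
    """Eq (6): scale every column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span


def select_by_pcc(X, y, threshold=0.4):
    """Column indices whose |PCC| with y reaches the threshold (assumed absolute)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    pcc = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(pcc) >= threshold)


def split_80_20(X, y, seed=0):
    """Random 80 % / 20 % split into training and test sets."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(0.8 * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


def regression_metrics(y_true, y_pred):
    """MAPE (%), RMSE, MAE and R2 as defined in Eqs (2)-(5)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true.mean() - y_true) ** 2)
    return {
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - ss_res / ss_tot),
    }


# Synthetic stand-in data: 325 samples, 5 candidate variables,
# only the first of which actually drives the (fake) product RON.
rng = np.random.default_rng(1)
X = rng.normal(size=(325, 5))
y = 88.0 + 0.5 * X[:, 0] + 0.1 * rng.normal(size=325)

X_norm = min_max_normalise(X)
selected = select_by_pcc(X_norm, y)
X_train, X_test, y_train, y_test = split_80_20(X_norm[:, selected], y)
print(selected, X_train.shape, X_test.shape)
```

With 325 samples the split reproduces the 260/65 partition quoted in the text. `regression_metrics` can be applied to the test-set predictions of any fitted model to fill a table such as Table 1.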
The proposed RON prediction method constructs the model automatically, and the model parameters are optimised by genetic programming. This not only reduces the time spent on feature engineering and parameter optimisation but also gives the proposed model better performance: it achieves the lowest prediction errors among the machine learning models compared and the previous work. The RON values predicted by the proposed model are much closer to the observed values than those of the other models.

Figure 2: The prediction results of different models

Table 1: The predicted errors of different models

Metrics     AML     ANN     SVR     DTR     Previous work (Li and Qin, 2021)
MAPE / %    0.207   0.310   0.328   0.347   0.305
RMSE        0.225   0.344   0.390   0.374   0.335
MAE         0.175   0.274   0.290   0.307   -
R2          0.934   0.785   0.445   0.258   0.860

3.3 Parameters optimisation for increasing product RON

A decrease in product RON brings economic loss and reduces combustion performance, so the control variables are adjusted to increase the product RON. At first, the original values of the control variables are used to initialise the particle swarm, and the percentage of RON loss reduction (as shown in Eq (7)) is set as the optimisation target for the control variables. The experimental results of three cases are depicted in Figure 3 and Table 2. For case A, the original product RON is 88.65. To increase the product RON, the reduction percentage is set to 40 % to obtain the expected optimised RON used in Eq (8). As the operation variables are optimised, the product RON increases gradually. When the objective function converges, the optimised product RON is obtained. Similarly, cases B and C set the RON loss reduction to 50 % and 60 %. The original and optimised values are compared in Table 2. Each operation variable is adjusted within its corresponding value range.
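How a target loss reduction translates into an expected product RON can be made concrete by rearranging Eq (7). The sketch below is illustrative only: the function names are hypothetical, and the feed RON of 89.90 is an assumed value that the paper does not report. It is chosen so that a 40 % reduction from the case A product RON of 88.65 gives 89.15, the optimal value listed in Table 2.

```python
def expected_ron(ron_material, ron_product, reduction):
    """Rearranged Eq (7): product RON that achieves a given loss reduction (0 to 1)."""
    return ron_product + reduction * (ron_material - ron_product)


def ron_loss_reduction(ron_material, ron_product, ron_optimised):
    """Eq (7): fraction of the original RON loss recovered by optimisation."""
    return (ron_optimised - ron_product) / (ron_material - ron_product)


RON_MATERIAL_A = 89.90  # assumed feed RON, not given in the paper
RON_PRODUCT_A = 88.65   # case A original product RON (Table 2)

target = expected_ron(RON_MATERIAL_A, RON_PRODUCT_A, 0.40)
achieved = ron_loss_reduction(RON_MATERIAL_A, RON_PRODUCT_A, 89.15)
print(f"expected RON: {target:.2f}, achieved reduction: {achieved:.0%}")
```

The value returned by `expected_ron` plays the role of the expected optimised RON in Eq (8), towards which the optimiser drives the predicted product RON.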
Through the parameter optimisation process, the operation variables are optimised to increase the product RON.

$$RON\ reduction = \frac{RON_{optimised} - RON_{product}}{RON_{material} - RON_{product}} \tag{7}$$

$$Objective\ function = \min\left(RON_{optimised} - RON_{expected}\right) \tag{8}$$

where $RON_{optimised}$ is the optimised product RON and $RON_{expected}$ is the expected optimised product RON.

Figure 3: The iteration curves of product RON for these three cases

4. Conclusions

An accurate and efficient intelligent model for analysing the petrochemical refining process is of great significance for improving combustion performance and reducing pollutant emissions. This paper proposes a novel intelligent framework for predicting the RON of refining products and increasing product RON by optimising operation parameters. The proposed framework overcomes the shortcomings of conventional prediction methods, whose manual optimisation is time-consuming and yields poor accuracy, and fills the gap in operation parameter optimisation. Compared to other advanced models, such as ANN, SVR, and DTR, the proposed framework achieves better prediction results, with a MAPE of 0.207 %. Through the iterative optimisation of the intelligent algorithm, the RON loss in the three cases is reduced by at least 40 %. Applied to the industrial process of gasoline refining, the proposed method can estimate the product RON accurately in real time. To further improve economic efficiency and combustion performance, operators can employ the proposed framework to increase product RON.
Table 2: The optimisation results of model variables

Model variables             Case A                Case B                Case C
                            Original   Optimal    Original   Optimal    Original   Optimal
Operation variables
S-ZORB.FC_1203.PV (t/h)     9.86       11.22      10.41      13.02      10.00      12.11
S-ZORB.FT_1004.PV (t/h)     72.24      77.03      75.09      58.63      76.54      79.81
…                           …          …          …          …          …          …
Product property
Product RON                 88.65      89.15      88.12      88.67      87.95      88.54

Nomenclature

RONmaterial – RON of refining material
RONproduct – RON of refining products
RONoptimised – The optimised product RON
yi – Observed RON
xi – The input original values
xmax – The maximum of the input value
xmin – The minimum of the input value

Acknowledgments

This work was funded by the Science Foundation of Zhejiang Ocean University (11025092122).

References

Ibrahim A.E., Farooq A., 2019. Octane prediction from infrared spectroscopic data. Energy & Fuels, 34, 817-826.
Balaji A., Allen A., 2018. Benchmarking automatic machine learning frameworks. arXiv preprint arXiv:1808.06492.
Cui S., Qiu H., Wang S., Wang Y., 2021. Two-stage stacking heterogeneous ensemble learning method for gasoline octane number loss prediction. Applied Soft Computing, 113, 107989.
Dabbagh H.A., Ghobadi F., Ehsani M.R., Moradmand M., 2013. The influence of ester additives on the properties of gasoline. Fuel, 104, 216-223.
Demirbas A., 2007. Importance of biodiesel as transportation fuel. Energy Policy, 35, 4661-4670.
Dias T., Oliveira R., Saraiva P., Reis M.S., 2020. Predictive analytics in the petrochemical industry: Research Octane Number (RON) forecasting and analysis in an industrial catalytic reforming unit. Computers & Chemical Engineering, 139, 106912.
Doicin B., Onuțu I., 2014. Octane number estimation using neural networks. Revista de Chimie, 65, 599-602.
Kelly J.J., Barlow C.H., Jinguji T.M., Callis J.B., 1989. Prediction of gasoline octane numbers from near-infrared spectral features in the range 660-1215 nm. Analytical Chemistry, 61, 313-320.
Kubic Jr W.L., Jenkins R.W., Moore C.M., Semelsberger T.A., Sutton A.D., 2017. Artificial neural network based group contribution method for estimating cetane and octane numbers of hydrocarbons and oxygenated organic compounds. Industrial & Engineering Chemistry Research, 56, 12236-12245.
Li B., Qin C., 2021. Predictive Analytics for Octane Number: A Novel Hybrid Approach of KPCA and GS-PSO-SVR Model. IEEE Access, 9, 66531-66541.
Pan L., Liu P., Li Z., 2018. A discussion on China's vehicle fuel policy: Based on the development route optimization of refining industry. Energy Policy, 114, 403-412.
Parisi A.F., Nogueiras L., Prieto H., 1990. On-line determination of fuel quality parameters using near-infrared spectrometry with fibre optics and multivariate calibration. Analytica Chimica Acta, 238, 95-100.
Poli R., Kennedy J., Blackwell T., 2007. Particle swarm optimization. Swarm Intelligence, 1, 33-57.
Rashid H.A., Dekran S.B., Fakhri N.A., Aziz H.J., Hamoudi N.A., 1989. Determination of several physical properties of light petroleum products using IR. Fuel Science & Technology International, 7, 237-250.
Zhao W., Zhang H., Zheng J., Dai Y., Huang L., Shang W., Liang Y., 2021. A point prediction method based automatic machine learning for day-ahead power output of multi-region photovoltaic plants. Energy, 223, 120026.