Microsoft Word - CET--006.docx CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 2017 A publication of The Italian Association of Chemical Engineering Online at www.aidic.it/cet Guest Editors: Zhuo Yang, Junjie Ba, Jing Pan Copyright © 2017, AIDIC Servizi S.r.l. ISBN 978-88-95608- 49-5; ISSN 2283-9216 Study on Computer-assisted Infrared Spectroscopy for Identification of Chemical Structure System Daping Denga, Yameng Baib, Xiaohong Denga, Xiaoyun Xiea, Jiuming Yana a College of Applied Science, Jiangxi University of Science and Technology, Ganzhou 341000, China b College of information engineering, Jiaozuo University, Jiaozuo 454000, China apd21105@21.cn In the past decades, people are trying to search the way to analyze the infrared spectra. Along with the computerization of the commercialized infrared spectroscopy, there are many computer-assisted identification methods of infrared spectroscopy. For decades, people have been exploring the empirical analysis of infrared spectroscopy. These methods can be divided into three categories: expert system; spectrum retrieval system and pattern recognition method. The most commonly used pattern identification methods are artificial neural network and partial least squares. The literature shows that the prediction accuracy of the structural fragments is not very high, and the neural network is still unstable, easy to fall into the local optimal and slow convergence and other issues. In this paper, the support vector machine is used to analyze the sub-structure of infrared spectroscopy. The vector machine is a good machine learning algorithm for small sample system. For most of the substructures, the predictive ability of support vector machines is better. The support vector machine also has the advantages of stability and fast training speed. It is a good tool for assistant analysis of infrared spectrum. 1. Introduction Infrared test technology is gradually built and developed in 1800 by the physicist W. Herschel after found the infrared radiation (Zhang, et al., 2016). In the early 1950s, the infrared spectrometer came out, and the infrared spectroscopy was widely developed, which opened a new stage in the identification of organic structure. A wealth of infrared spectral data was accumulated till the late 1950s. The infrared spectroscopy has been the most important method to identify the organic compounds till the mid-70s (Mecozzi & Sturchio, 2017). In recent decades, the advent of Fourier transform infrared spectroscopy and the emergence of some new technologies (such as emission spectra, photoacoustic spectroscopy, color-red combination, etc.) have made the infrared spectrum more widely used. The infrared spectrum has a wide adaptability to the samples, regardless of solid, liquid or gaseous samples. In addition, infrared spectroscopy has the characteristics of fast, high sensitivity and less sample amount. Therefore, it has become the most commonly used and indispensable tool for modern structural chemistry and analytical chemistry (Allen, et al., 2016). The wave length of infrared light covers 0.76µm~1000 µm, and the corresponding wave number is in the range of 13330~10 cm-1. Usually, the infrared region is divided into near infrared region (13330~4000 cm-1), middle infrared region (4000~650 cm-1) and far infrared region (650~10cm-1). Because the vibration frequency of most of the organic compounds is in the mid infrared region. The study on the mid infrared spectra is the most. The data collection, collation and induction of the absorption peak area has become quite perfect. The application of infrared spectroscopy in chemistry is various. It can be used not only for basic research structure, such as determining the molecular space structure, and calculating chemical bond force constants, bond lengths and angles. It is also widely used in the qualitative and quantitative analysis of compounds and chemical reaction mechanism research. The widest use of infrared spectroscopy is the structural identification of unknown compounds. DOI: 10.3303/CET1759106 Please cite this article as: Daping Deng, Yameng Bai, Xiaohong Deng, Xiaoyun Xie, Jiuming Yan, 2017, Study on computer-assisted infrared spectroscopy for identification of chemical structure system, Chemical Engineering Transactions, 59, 631-636 DOI:10.3303/CET1759106 631 The infrared spectrum is very complex. The different atomic mass of compounds, different chemical bond properties, and different order and spatial location of the atoms will cause the difference of infrared spectrum. In recent years, people have been looking for a better pattern identification method to analyze the structure of infrared spectrum. Vapnik et al proposed support vector machine (SVM) based on the Statistical Learning Theory (SLT) in 1995. According to the limited sample information, it found the best compromise between model complexity and learning ability in order to obtain the best generalization ability. In this paper, support vector machine (SVM) is used to analyze the sub-structure of infrared spectrum, and compared with back propagation artificial neural network. Support vector machine (SVM) is similar to multilayer feedforward network in form, and can also be used for pattern recognition and nonlinear regression (Nguyen, et al., 2016). 2. Application of support vector regression for carbon black process modeling 2.1 Carbon black Carbon black is produced by the incomplete combustion or pyrolysis of hydrocarbons (solid, liquid or gaseous. Carbon black is an important reinforcing agent and filler for rubber products (mainly tires). It not only can improve the strength of the rubber products, but also can improve the technical performance of the rubber material, and can endow the products with the advantages of wear resistance, tear resistance, heat resistance, cold resistance, oil resistance, etc., and can prolong the service life of the product. About 75% of the carbon black is used to make all kinds of tires. Therefore, the production of carbon black is closely related to the development of the automobile industry. In addition, carbon black can also be used for ink, plastic and batteries and so on (Moriya, et al., 2016; Torrado et al., 2016; Torrado et al., 2016). At present, the carbon black which the annual production capacity is 10000 tons mostly realized automatic control, such as DSC (Dynamic Stability Control) control system, being able to adjust the technological parameters on the computer. But how to adjust the technical process according to the carbon black products information is not clear, even some factories still mainly depends on the experience. This is not beneficial to improve the quality of the carbon black to better meet the development of rubber industry and other related industries. Especially after China joined WTO, foreign carbon black entered the Chinese market, such as the United States, Japan, Western Europe, they have established carbon black production base in China, which occupied the market share, and China's carbon black industry is facing strong competition and challenge. Therefore, it is necessary to introduce advanced science and technology to optimize the technological parameter which plays a decisive role in the production of carbon black, and establish a reliable prediction model between the carbon black product index and process parameters, improve the carbon black production technology, and improve the ability of producing high quality carbon black. Therefore, the significance and necessity of carbon black process modeling are summarized into three points: (1) To change the current status that carbon black production mainly depends on the long-term accumulated experience. (2) To meet the requirements of high quality carbon black for the development of rubber industry. (3) To enhance the competitive ability of carbon black at home and abroad after China's accession to the WTO. 2.2 Data processing and hardware and software equipment 2.2.1 Data sources and pretreatment Data is from the original records of carbon black production experiment workshop of carbon black industry research and design institute of China Rubber Group, a total of 112 samples. According to the experience that the operating variable participated in optimal control totally have 7 (Ullah, et al., 2016), which are natural gas flow, raw oil flow, the first chilled water flow, the second emergency water flow rate and dosage of carbon black, the dryer outlet temperature and granulating machine power. Learned from the production experience, the iodine absorption value and the DBP oil absorption value are mainly detected in the carbon black production workshop. The data was processed with standardization, and the treated variables were the same as the weight, and the mean value was 0, the variance was 1. The Cluster analysis method (CA) (Li and Liu, et al., 2016; Li and Hou et al., 2016), and Multi-discrimination vector (MDV) were used and combined with the actual production situation. 3 outliers were removed, remaining 109 samples. 2.3 Hardware and software equipment The hardware and software environment of the experimental data processing: Hardware: 60G disk, 1.0G Celeron CPU, 256M memory. Software: Windows XP, Office XP, MATLAB 6.5. 632 2.4 Application of support vector machine (SVM) in carbon black process modeling 2.4.1 Application of support vector machine for carbon black process modeling After 3 outliers are removed from 112 original samples, according to 10~20% of the number of prediction set samples to the number of calibration set samples, the original samples are divided into two groups: one group is the calibration set (training set), a total of 89 samples, which is used in the construction of carbon black production model; another is the prediction set, a total of 20 samples which is used for detecting model. The 20 prediction samples are randomly generated in 109 samples by using the rands function in MATLAB. The carbon black production process obtained from the previous work in our laboratory has a very strong nonlinear (Zampieri, et al., 2016). In this paper, support vector machine (SVM) is used to establish the model of carbon black process, and compared with the back propagation artificial neural network and radial basis function neural network modeling method. When using support vector machine regression, the parameter group which has a great influence on the training result is (σ, C), the error will be increased if it is too large or too small. ε is not sensitive to the loss function. If ε is too small, it is easy to produce the phenomenon of over fitting, and if too large, it is easy to produce less fitting. C is a penalty factor, which controls the degree of penalty for misclassification samples. The greater the C, the greater the penalty for the error. σ is the width of the kernel function. According to the experience, σ and C are adjusted respectively in 0.1-512, and the optimal parameter set (σ, C) is found, which make the regression have the best prediction ability. After debugging, the prediction effect of the iodine absorption is the best when ε is 0.01, C is 274 and ε is 3. When ε is 0.01, C is 3, σ is 1.1, the prediction effect of the oil absorption is the best when ε is 0.01, C is 3 and σ is 1.1. The fitting of the network output and actual production value of the carbon black iodine absorption and oil absorption value in the training set is shown in Figure 1. Figure 1: Fitting of the SVR prediction and actual production values in the training set It can be seen from Figure 1 that the fitting of the predicted value and the actual production value is very good. The model is used to predict the prediction set, and the fitting of the predicted value and the actual production value is shown in Figure 2. Figure 2: Fitting of SVR model prediction value and actual production value 633 It can be seen from Figure 2, the prediction of SVR on the carbon black iodine absorption and oil absorption value is better, and for the oil absorption value, the prediction error on the 5,11, 19 point is larger. 2.4.2 Application of neural network on carbon black process modeling The training method of back propagation artificial neural network and radial basis function neural network is the same as the literature. The fitting of the network output and actual production value of the carbon black iodine absorption and oil absorption value in the two training sets is shown in Figure 3. Figure 3: Fitting of forecast and actual production value of BPx and RBFN model of prediction set As can be seen from Figure 3, the fitting of the two neural networks to the carbon black training set is also very good. Figure 4 is the the prediction value and the actual production value fitting of the prediction set of BPx and RBF to the two models. From Figure 2 and Figure 4 it can be seen that the prediction ability of three models of carbon black oil absorption value is higher than the corresponding prediction ability of iodine absorption on carbon black. For the iodine absorption, the difference between the predictive value and the actual production value of BPx is the largest, and only a few points’ prediction effect is good. The prediction of RBFN is better than BPx, but not as good as SVR. For the oil absorption value, the overall prediction effect of SVR and RBFN is better than BPx. The difference between SVR and RBFN two prediction model is not very large. 634 Figure 4: Fitting of prediction value and actual production value of BPx and RBFN model of prediction set 2.5 Comparison of prediction results support vector machine and neural network In order to compare the modeling results of ANN-BPx, RBFN and SVR more intuitively, we use the average prediction error, the average relative error and the square error of the prediction set as the evaluation criteria of three indicators. The comparison results of the three models are shown in table 1 and table 2. Table 1: Prediction results of iodine absorption value Models Average prediction error Average relative error Error sum of squares BPx 3.05 2.54 292.99 RBFN 2.19 1.85 146.47 SVR 1.93 1.62 109.40 Table 2: Prediction results of oil absorption value Models Average prediction error Average relative error Error sum of squares BPx 2.00 1.64 136.07 RBFN 1.68 1.38 105.30 SVR 1.60 1.31 97.79 From table 1 and table 2, relative prediction errors of iodine absorption value and oil absorption value of SVR on carbon black are 1.62% and 1.31%, the model prediction accuracy is significantly higher than that of ANN- BPx (2.54%, 1.64%), and is slightly better than that of RBFN (1.85%, 1.38%). The prediction ability of three models on the oil absorption value of carbon black are higher than the corresponding prediction ability of iodine absorption of carbon black, which is the same as the conclusion obtained in Figure 2 and Figure 3. 635 3. Conclusion Support vector machine is a machine learning method of small sample theory. It can use the limited data to get the optimal solution. In this paper, it was applied to carbon black process modeling. Compared with RBFN and BPx-ANN method, the results show that the prediction accuracy of SVR models is better than that of ANN-BPx, and slightly better than the RBFN, solving the model building problems in the process of carbon black production, which has great significance for optimizing the operating conditions of carbon black production. Reference Allen F., Pon A., Greiner R., Wishart D., 2016, Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification. Analytical chemistry, 88(15), 7689-7697. Li W., Liu Y., Sun H., Pan Y., Qian Z., 2016, Monitoring reduced scattering coefficient in pedicle screw insertion trajectory using near-infrared spectroscopy. Medical & biological engineering & computing, 54(10), 1533-1539. Li Y., Hou S.H., Yao L.M., 2016, Profitability assessment using data envelopment with cluster analysis: a case for different types of gas stations, Chemical Engineering Transactions, 51, 727-732, DOI: 10.3303/CET1651122 Mecozzi M., Sturchio E., 2017, Computer Assisted Examination of Infrared and Near Infrared Spectra to Assess Structural and Molecular Changes in Biological Samples Exposed to Pollutants: A Case of Study. Journal of Imaging, 3(1), 11. Moriya Y., Yamada T., Okuda S., Nakagawa Z., Kotera M., Tokimatsu T., Goto S., 2016, Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate–Product Pairs. Journal of chemical information and modeling, 56(3), 510-516. Nguyen S.C., Zhang Q., Manthiram K., Ye X., Lomont J. P., Harris C. B., Alivisatos A.P. (2016). Study of heat transfer dynamics from gold nanorods to the environment via time-resolved infrared spectroscopy. ACS nano, 10(2), 2144-2151. Torrado D., Cuervo N., Pacault S., Dufour A., Glaude P., Murillo C., Dufaud O., 2016, Explosion of gas/carbon blacks nanoparticles mixtures: an approach to assess the role of soot formation, Chemical Engineering Transactions, 48, 379-384, DOI: 10.3303/CET1648064 Ullah I., Ahmad I., Nisar H., Khan S., Ullah R., Rashid R., Mahmood H., 2016, Computer assisted optical screening of human ovarian cancer using Raman spectroscopy. Photodiagnosis and photodynamic therapy, 15, 94-99. Zampieri D., Vio L., Fermeglia M., Pricl S., Wünsch B., Schepmann D., Laurini E., 2016, Computer-assisted design, synthesis, binding and cytotoxicity assessments of new 1-(4-(aryl (methyl) amino) butyl)- heterocyclic sigma 1 ligands. European Journal of Medicinal Chemistry, 121, 712-726. Zhang Z., Cao T., Liu H., Shu J., Li Z., 2016, September. A Computer-Assisted Learning System in the Teaching of Infrared Spectroscopy Course. In Educational Innovation through Technology (EITT), 2016 International Conference, 91-95. IEEE. 636