CHEMICAL ENGINEERING TRANSACTIONS VOL. 66, 2018
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet
Guest Editors: Songying Zhao, Yougang Sun, Ye Zhou
Copyright © 2018, AIDIC Servizi S.r.l. ISBN 978-88-95608-63-1; ISSN 2283-9216

Application of Data Mining Technology in Chemical Engineering Optimization

Yongmei Niu
Nanyang Institute of Technology, Henan 473000, China
yongmeiniu7456@21cn.com

To study the application of data mining technology in chemical engineering optimization, data mining techniques were adopted in this paper to discretize and analyze the experimental data. The results show that, after discretization of the experimental database, the deposition rate and recovery rate of the chemical equipment reached their best values while the expansion rate remained below its optimal value. It is concluded that the application of data mining technology to chemical engineering optimization is practical, can greatly improve a company's economic benefits, and therefore merits wide promotion and application.

1. Introduction

With the evolution of the Internet, data mining technology has found application in many areas, and as a hot topic in the chemical industry it has been studied by a number of domestic and foreign scholars. Dating back to the 1980s and 1990s, data mining refers to finding target information quickly and accurately within large volumes of data through appropriate techniques. In chemical manufacturing, production equipment in operation is affected by several factors and may suffer disturbances from the environment, rendering the original design scheme or operating procedure inapplicable. It is difficult to understand the relationships between variables accurately by tuning parameters manually, which hinders chemical companies from further enhancing production efficiency. To this end, applying data mining technology to chemical engineering optimization is essential.
Energy consumption and recovery rate are important indicators for evaluating the production efficiency of chemical companies, and both are often affected by product quality and manufacturing time. Drawing on a large number of related domestic and foreign publications, this paper uses data mining technology to discretize and analyze experimental data and studies its application in chemical engineering optimization, with the purpose of better understanding parameter optimization in the chemical industry and enhancing the economic efficiency of companies. This study is therefore of real practical significance.

DOI: 10.3303/CET1866151. Please cite this article as: Niu Y., 2018, Application of data mining technology in chemical engineering optimization, Chemical Engineering Transactions, 66, 901-906.

2. Literature review

The research methods of data mining are mainly built on the theories and methods of artificial intelligence, computational intelligence, and statistics, and mainly include: computational intelligence (neural networks and genetic algorithms), statistical methods (principal component analysis and partial least squares), fuzzy theory, rough set theory, machine learning methods (e.g., decision trees), and so on. In chemical process modelling, artificial neural networks (ANN), genetic algorithms (GA), principal component analysis (PCA), and partial least squares (PLS) are widely used. A neural network structurally imitates biological neural networks: a large number of simple neurons are connected into a network system according to certain rules, with the goal of simulating human intuitive, image-based thinking. A neural network exploits nonlinear mapping and parallel processing, and uses its own structure to express knowledge relating inputs to outputs.
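The nonlinear-mapping idea described above can be shown with a minimal, self-contained sketch (illustrative only; the network size, learning rate, and the target function y = sin(x) are assumptions, not taken from the paper): a one-hidden-layer network trained by plain gradient descent.

```python
import numpy as np

# Minimal sketch of a one-hidden-layer neural network learning the
# nonlinear map y = sin(x) by plain gradient descent. All sizes and
# hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

W1 = rng.normal(0.0, 0.5, (1, 16)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)    # linear readout
lr = 0.1

for _ in range(10000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    pred = h @ W2 + b2                  # network output
    err = pred - y                      # prediction error
    # Backpropagate the (half) mean-squared-error gradient
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1.0 - h ** 2)  # tanh derivative
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```

After training, `mse` is well below the variance of sin(x) (about 0.5), i.e. the network has learned the nonlinear input-output relation purely from data, which is the property that makes such models attractive for chemical process modelling.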
The neural network method is particularly advantageous for nonlinear data and data containing noise, and it can accomplish a variety of data mining tasks, such as classification, clustering, and feature mining. Because neural networks can approximate arbitrary nonlinear mappings to arbitrary accuracy and provide a non-traditional modelling tool, they are widely used in nonlinear chemical process modelling, and many researchers have studied how to improve their predictive ability. Saon proposed a combined neural network method based on combinatorial generalization, which significantly improved the predictive ability of the model; the selection of combination weights is critical to the performance of such composite networks (Saon, 2018). In recent years, many methods have therefore been proposed to select combination weights reasonably, such as multiple linear regression, principal component regression, and information-inference combination. Qiu and others used a combined neural network to predict the polymer mass in a batch reactor: the combined network represents the relationship between the polymerization recipe and the trajectory of the polymer mass variable. The predictive confidence interval of the combined model was calculated to improve its generalization ability, and the approach was successfully applied to a batch isobutylene methyl ester polymerization reactor (Qiu et al., 2016). Ostad-Ali-Askari and others carried out hybrid neural network modelling of an industrial wastewater treatment process: first, a simplified mechanism model was established from process experience and knowledge; then, a neural network model was established from actual operating data.
Finally, the neural network model and the simplified mechanism model were combined in parallel. Comparison with conventional methods showed that the hybrid neural network model had better predictive power and extrapolation performance (Ostad-Ali-Askari et al., 2016). The genetic algorithm is an adaptive, heuristic, probabilistic, iterative global search algorithm. It is robust, has good global search ability for nonlinear problems, does not depend on the characteristics of the problem model during solution, and is parallel and efficient. Owing to its novelty and the rapid development of computer technology, it has been widely used: the optimization of complex problems, pattern recognition, engineering design, and control system optimization have all achieved good results. In chemical processes, genetic algorithms are mainly used in modelling, control, and optimization. Liu and others used genetic algorithms for steady-state modelling of a chemical process system; that is, genetic programming was adopted to establish the input-output model of a complex chemical process (Liu et al., 2016). The advantage of this method is that a simple model accurately reflecting the characteristics of the process can be obtained without any modelling hypothesis, and the complexity of the model can be reduced by the genetic algorithm. The method was successfully applied to the steady-state modelling of two typical chemical processes. Ivanov and others also used genetic programming to establish the dynamic model of a chemical system and proposed a nonlinear model predictive control strategy; the results showed that this predictive control strategy outperformed the usual linear model (Ivanov et al., 2016).
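As a minimal, hypothetical illustration of the selection-crossover-mutation loop that genetic algorithms use (none of this code is from the cited works; the objective function, population size, and operators are illustrative choices), the sketch below minimizes a simple nonlinear function:

```python
import random

# Illustrative genetic algorithm minimizing the nonlinear function
# f(x, y) = (x - 1)^2 + (y + 2)^2, whose optimum is at (1, -2).
def fitness(ind):
    x, y = ind
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)   # higher is better

def evolve(pop_size=50, generations=100, mut_rate=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover: child is the midpoint of two parents
            child = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]
            if rng.random() < mut_rate:         # Gaussian mutation
                child[rng.randrange(2)] += rng.gauss(0, 0.5)
            children.append(child)
        pop = parents + children                # parents survive (elitism)
    return max(pop, key=fitness)

best = evolve()
```

Because the surviving parents are carried over unchanged, the best solution found so far is never lost, and the population converges toward the optimum at (1, -2) without any gradient information; this independence from the problem model is the robustness property noted above.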
Principal component analysis and partial least squares are both newer methods of multivariate statistical data analysis. Principal component analysis can achieve data simplification, data compression, modelling, and variable selection. Kamilov and Mansour applied an iterative nonlinear PLS method to the modelling of nonlinear chemical processes and successfully applied it to three typical chemical processes; comparison with other similar methods showed that it is better suited to nonlinear process modelling and significantly improves the prediction ability of the resulting model (Kamilov and Mansour, 2016). Since each data mining technique is proposed for a specific background, some methods have shortcomings they cannot overcome by themselves; these techniques can therefore be combined to achieve a better effect than any single technique used alone. Two such integrated data mining technologies and their applications in chemical process modelling are introduced below. Velásco-Mejía and others modelled a complex biochemical process by combining neural networks with genetic programming; compared with mechanism modelling, the advantage of the method is that process experience and knowledge are not required during modelling, yet the resulting model can still accurately predict the system response (Velásco-Mejía et al., 2016). Pablo and others applied neural networks and genetic algorithms to model complex processes with unknown or complicated mechanisms and to optimize their operating conditions; the genetic algorithm quickly searched the optimized region, and the results proved the feasibility of the proposed method (Pablo et al., 2016).
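The data simplification and compression role of PCA mentioned above can be sketched in a few lines (a generic NumPy implementation with synthetic data, not tied to any cited work): center the data, take the singular value decomposition, and project onto the leading components.

```python
import numpy as np

# Generic PCA via SVD, used for data simplification/compression.
def pca(X, k):
    Xc = X - X.mean(axis=0)                        # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                         # projection onto top-k PCs
    explained = float((S[:k] ** 2).sum() / (S ** 2).sum())
    return scores, Vt[:k], explained

# Synthetic example: three process variables driven by one latent factor
# plus small noise, so one principal component should explain almost all
# of the variance.
rng = np.random.default_rng(1)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2.0 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
scores, components, evr = pca(X, 1)
```

Here `evr` (the explained variance ratio) is close to 1, so the three correlated variables can be replaced by a single score vector with almost no information loss, which is exactly the kind of variable reduction that precedes process modelling.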
In recent years, data mining technology has been widely applied in chemical process optimization, and methods with great application prospects, such as ANN, GA, PLS, and PCA, have been proposed. But data mining, as a relatively new technology, is still at an initial stage both at home and abroad, and its theories, methods, and applications are not yet mature. Many data mining techniques were proposed for specific backgrounds, so there are limitations in applying them to chemical process optimization, and many, such as rough set theory and decision trees, have not yet been widely used in the chemical industry. Nevertheless, data mining is an active research field: with the further development of chemical process monitoring and control technology, large volumes of process data are collected and stored by computer control systems, and new applications of data mining in chemical process optimization will continue to emerge.

3. Methods

Database technology merely organizes and stores data efficiently and supports some simple analyses; much of the useful information hidden inside the data cannot be obtained this way. The fields of machine learning, pattern recognition, and statistics offer many methods for knowledge extraction, but these largely operate on experimental data or in academic research and are not combined with the massive data of practical applications. Data mining combines database technology, machine learning, pattern recognition, and statistics from a new perspective and extracts deep, effective, novel, potentially useful, and ultimately understandable patterns from data. The commonly used methods of data mining are as follows: (1) Decision tree. Decision tree technology is mainly used for predictive modelling of classification, clustering, and prediction.
It uses mutual information (information gain) from information theory to find the field with the largest amount of information in the database, establishes a node of the decision tree, then creates branches of the tree according to the different values of that field, and repeats the construction of lower nodes and branches within each branch subset until a decision tree is generated. (2) Pattern recognition is one of the main methods of data mining; it is a mathematical-statistical method that uses computers to process information and judge classifications. (3) The artificial neural network method is used for classification, clustering, feature mining, prediction, and pattern recognition, as shown in Figure 1. The neural network method mimics the neuronal structure of the animal brain, based on the M-P model and Hebb learning rules; in essence, it is a distributed matrix structure whose connection weights are gradually computed (by iterative or cumulative calculation) through mining of the training data.

Figure 1: Artificial neural network method

In this experiment, an acidic aqueous copper sulfate solution was used as the electrolyte to electrolyze copper ions. The copper ion concentration was 0.011 mol/L and the sulfuric acid concentration was 0.714 mol/L. The electrolysis of a narrow fraction of spherical copper powder was studied while the copper solution flowed through the fluidized bed. The porosity of the bed during operation of the fluidized bed electrode was measured, along with the other parameters of the electrolysis of copper powders of different diameters, such as current efficiency, recovery rate, deposition rate, and power consumption.
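The information-gain criterion used by the decision-tree method described earlier can be computed as follows (a small hypothetical data set; the attribute names and labels are made up for illustration and are not from the paper's experiments):

```python
import math
from collections import Counter

# Shannon entropy of a list of class labels, in bits.
def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Information gain of splitting on one attribute: base entropy minus the
# weighted entropy of the subsets induced by each attribute value.
def info_gain(rows, labels, attr):
    gain = entropy(labels)
    n = len(labels)
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / n * entropy(sub)
    return gain

rows = [{"expansion": "low", "size": "small"},
        {"expansion": "low", "size": "large"},
        {"expansion": "high", "size": "small"},
        {"expansion": "high", "size": "large"}]
labels = ["good", "good", "poor", "poor"]
```

Here `expansion` separates the two classes perfectly (gain of 1 bit), while `size` carries no information (gain 0), so a decision tree would pick `expansion` as its first split, exactly the "field with the largest amount of information" rule stated above.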
With narrow-fraction spherical copper powder as the conductive particles, constant-potential electrolysis was used to electrolyze low-concentration copper ions, and the effects of different bed expansion rates on current efficiency, recovery rate, deposition rate, and energy consumption were examined. The average particle sizes were 0.33, 0.40, 0.52, 0.69, and 1.00 mm, a total of five commonly used sizes. The bed mass was 100 g, and the bed expansion ratio ranged from 10% to 50%, giving five expansion ratios. The experimental device is shown in Figure 2.

1. Sink; 2. Magnetic pump; 3. Rotameter; 4. Fluidized bed; 5. Distribution board; 6. Predistributor; 7. U-tube differential pressure meter
Figure 2: Experimental apparatus

Recovery rate: θ = 1 − Ct/C0, where C0 is the concentration of the reactants at the beginning of the reaction and Ct is the concentration of the reactants at the end of the reaction.

4. Research results and discussion

Table 1 shows the experimental results obtained when copper particles of the five particle sizes commonly used in electrolytic copper processing are fluidized.

Table 1: Electrolysis of copper in a fluidized bed electrode containing mono-component particles
Expansion ratio (%) | Current efficiency (%) | Recovery rate (%) | Deposition rate (g/h) | Particle size (mm)
7 | 57 | 32 | 1.3 | 0.3
24 | 71 | 47 | 1.3 | 0.3
33 | 85 | 56 | 1.7 | 0.3

To evaluate a data mining tool, consider the following aspects. 1. The number of patterns generated and the ability to solve complex problems. Growing data volumes and higher requirements for model fineness and accuracy increase the complexity of the problem, and a data mining system can offer the following solutions. The first is multiple modes: combining patterns of several categories helps discover useful patterns and reduces problem complexity.
For example, grouping data by clustering and then mining predictive patterns for each group is more efficient and accurate than simply operating on the entire dataset. The second is multiple algorithms: many models, especially those related to classification, can be implemented with different algorithms, each with its own advantages and disadvantages and suited to different needs and environments. The third is data selection and conversion: patterns are often hidden by a large number of data items; some data is redundant and some completely irrelevant, and the presence of these items hinders the discovery of valuable models. A very important function of a data mining system is therefore to handle data complexity and provide tools to select the correct data items and convert data values. 2. Ease of operation. Ease of operation is an important factor. Some tools have graphical interfaces that guide users through tasks semi-automatically, and some use scripting languages. Some tools also provide data mining APIs that can be embedded into programming languages such as C, Visual Basic, and PowerBuilder. Patterns can be applied to existing or newly added data; some tools have a graphical interface, and some allow the pattern to be exported to a program written in a language such as C, or to a database as a set of SQL rules.

iDA supports business and technical analysis with a complete set of visual data mining tools, including a preprocessor, three data mining tools, and a report generator, and provides a Microsoft Excel-based user interface. The components of iDA are shown in Figure 3.

Figure 3: iDA system structure

Before the preprocessor submits the data in a file to iDA's mining engine, it scans the file for several types of errors, including illegal values, blank lines, and null values. The preprocessor corrects several kinds of errors but does not fix errors in the numeric data.
The preprocessor outputs a file ready to be mined, or reports the location of errors it cannot resolve. The heuristic agent handles the display of data files containing thousands of records and lets us decide whether to extract a representative subset for analysis or analyze the entire data set. iDA contains two neural network architectures: a back-propagation neural network that supports supervised learning, and an unsupervised clustering self-organizing feature map.

In the data of the narrow-fraction fluidized bed electrode electrolysis experiment we selected, the 21st and 22nd data sets were outliers due to a failure of experimental condition control and were manually removed. The initial bed height and fluidized bed height were used only to calculate the voidage, the flow rate only to calculate the superficial velocity, and the power consumption was linearly related to the current efficiency; therefore, these three attributes were removed when the target data set was created. The selected data set is shown in Table 2.

Table 2: Data set after data cleaning for narrow-fraction fluidized bed electrode electrolysis experiments
Expansion ratio (%) | Current efficiency (%) | Recovery rate (%) | Deposition rate (g/h) | Particle size (mm)
7 | 57 | 32 | 1.4 | 0.5
24 | 71 | 47 | 1.5 | 0.5
33 | 85 | 56 | 1.9 | 0.5

Table 3: Discretized data set for narrow-fraction fluidized bed electrolysis experiments
Expansion ratio (%) | Current efficiency (%) | Recovery rate (%) | Deposition rate | Particle size (mm)
0-20 | 50-60 | 30-40 | M | 0.3
20-40 | 70-80 | 40-50 | M | 0.3
20-40 | 80-90 | 50-60 | H | 0.3

In general, the attributes in a database can be divided into two types.
One type is the continuous (quantitative) attribute, which represents a measurable property of the object being described and takes its value from a continuous interval, such as temperature or length; the other is the discrete (qualitative) attribute, whose value is expressed in language or as a small number of discrete values, such as gender or color. In most cases, the same database contains both continuous and discrete attributes. Discretization may be univariate or multivariate: univariate discretization handles one continuous attribute at a time, while multivariate discretization can handle multiple continuous attributes simultaneously. A typical univariate discretization process is as follows: (1) Sort the data on the continuous attribute to be discretized; (2) Preliminarily determine candidate cut points for the continuous attribute; (3) Split or merge at the candidate points according to some criterion; (4) If step (3) reaches the stopping condition, the discretization process terminates; otherwise, step (3) is repeated. We used the "3-4-5" rule to discretize the experimental data, as shown in Table 3.

The expansion ratio is one of the important parameters of the fluidized bed electrode electrolyzer. When the expansion rate is very low, the conductive particles may adhere and agglomerate, which is equivalent to a completely "on" state between the copper particles of a packed bed; discharge then occurs only near the semi-permeable membrane closest to the anode. When the expansion rate is too large, the particles may lose contact with each other, reducing the effective working area and leaving the particle bed in an "open-circuit" state, so that copper ions are discharged only in the vicinity of the feed electrode.
Either case reduces the recovery rate and current efficiency, so the expansion rate should be controlled between 10% and 30%; in this range copper removal is fastest and most efficient and the current efficiency increases significantly, thereby reducing power consumption.

5. Conclusion

The rapid development of information technology has led to the emergence of database technology, which is now widely used in data information management to improve the efficiency of data processing and storage. Facing today's severe environmental problems, applying database technology in chemical companies can further optimize production processes, save energy, reduce emissions, and help companies better handle wastewater and waste residues containing heavy metals. The results of this study show that, after the database is discretized, the deposition rate and recovery rate of the chemical equipment are optimal when the expansion rate is below its optimal value. It can be seen that the application of data mining technology in chemical engineering optimization is practical and can greatly increase the economic efficiency of enterprises, with high promotion and application value. Owing to the author's limited knowledge, this paper may have some deficiencies; moreover, the experiment conducted here is small in scale and may differ considerably from actual production conditions in chemical enterprises, so further study is required.
References

Ivanov G., Nikolov N., Nikova S., 2016, Reversed genetic algorithms for generation of bijective s-boxes with good cryptographic properties, Cryptography & Communications, 8(2), 247-276, DOI: 10.1007/s12095-015-0170-5
Kamilov U.S., Mansour H., 2016, Learning optimal nonlinearities for iterative thresholding algorithms, Signal Processing Letters, 23(5), 747-751, DOI: 10.1109/LSP.2016.2548245
Liu S., Pan J., Yang M.H., 2016, Learning recursive filters for low-level vision via a hybrid neural network, Computer Vision–ECCV 2016, Springer International Publishing, 560-576, DOI: 10.1007/978-3-319-46493-0_34
Ostad-Ali-Askari K., Shayannejad M., Ghorbanizadeh-Kharazi H., 2016, Artificial neural network for modeling nitrate pollution of groundwater in marginal area of Zayandeh-rood River, Isfahan, Iran, KSCE Journal of Civil Engineering, 21(1), 1-7, DOI: 10.1007/s12205-016-0572-8
Pablo B.M.J., Piedad C.I.C.H., Santiago S.V., 2016, Neural networks and genetic algorithms applied for implementing the management model "Triple A" in a supply chain, case: collection centers of raw milk in the Azuay Province, 68, 06-08, DOI: 10.1051/matecconf/20166806008
Qiu J., Wang J., Yao S., Guo K., Li B., Zhou E., 2016, Going deeper with embedded FPGA platform for convolutional neural network, ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 26-35, DOI: 10.1145/2847263.2847265
Saon G.A., 2018, Speaker adaptation of neural network acoustic models using i-vectors, 55-59, DOI: 10.1109/ASRU.2013.6707705
Velásco-Mejía A., Vallejo-Becerra V., Chávez-Ramírez A.U., Torres-González J., Reyes-Vidal Y., Castañeda-Zaldivar F., 2016, Modeling and optimization of a pharmaceutical crystallization process by using neural networks and genetic algorithms, Powder Technology, 292, 122-128, DOI: 10.1016/j.powtec.2016.01.02