HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY VESZPREM Vol. 30, pp. 241-245 (2002)

CLASSIFICATION NEURAL NETWORKS IMPROVEMENT USING GENETIC ALGORITHMS

A. WOINAROSCHY, V. PLESU and K. WOINAROSCHY

(Department of Chemical Engineering, University Politehnica of Bucharest, 1-5 Polizu Street, Bucharest 78126, ROMANIA)

Received: April 8, 2002

The aim of the present work is the development of a procedure capable of producing a highly accurate classification neural network with a reduced number of neurons. In order to avoid local minima, the weights of a proposed three-layer network were computed using a common genetic algorithm. The performance of the genetic algorithm was strongly improved by several means that establish a better balance between exploration and exploitation of the search space. Thus, in order to decrease the number of genes (i.e., of weights), the dimension of the output vector was minimized by identifying the classes with binary numbers. A very favorable effect was obtained by seeding the initial population with a good chromosome obtained by the classical "delta-rule" learning procedure. The effects of the initial population size, of the bounds imposed on the weights of the inter-neuronal connections, of the number of neurons in the hidden layer, of the fitness expression, etc. were also investigated.

Keywords: neural networks, genetic algorithms, classification

Introduction

An important field of neural network applications in bioprocessing and chemical engineering is classification, covering problems such as process fault detection and diagnosis, detection and location of gross errors in experimental data sets, spectral analysis, model discrimination, and identification of model parameters. Several types of neural networks are used for classification problems: back-propagation, radial-basis-function, learning-vector-quantization, probabilistic networks, networks based on adaptive-resonance theory, and so on. Each type has advantages and, consequently, disadvantages, depending on the nature of the problem. Representative for chemical and biochemical processes are problems with non-uniform decision regions, where the data points of each class are scattered. For these problems, radial-basis-function and back-propagation networks perform better [1,2]. The use of the gradient of the error function during the learning stage limits the performance of these networks, because the search may end in a local minimum. In fact, the gradient technique is an example of a hill-climbing strategy, which exploits the best solution for possible improvement but neglects exploration of the search space. Random search is a typical example of a strategy which explores the search space while ignoring the exploitation of its promising regions. For small spaces, classical exhaustive methods usually suffice; for larger spaces, special artificial intelligence techniques must be employed. Genetic algorithms are among such techniques, being a class of general-purpose (domain-independent) search methods which strike a remarkable balance between exploration and exploitation of the search space [3]. For several years, genetic algorithms have been used to evolve the structure of neural networks as well as the weights of the inter-neuronal connections (e.g. see [4-7]). The aim of the present work is an extended investigation of the features of genetic algorithms in order to obtain a highly accurate classification neural network with a reduced number of neurons.
Methodology

The investigations were done on three-layer networks with a linear transfer function for the input layer and a sigmoid transfer function for the hidden and output layers. Because the number of neurons in the input layer is imposed by the number of elements of the vectors to be classified, the structural variables of these networks are the numbers of neurons in the hidden and output layers. The number of neurons in the output layer depends on the rule of class codification. There are no theoretical guidelines for establishing the number of hidden neurons. Differently from the works cited above, here each chromosome corresponds to a complete set of weights of the inter-neuronal connections for a given structure of the network (i.e., each gene represents one weight). The structural variables, namely the numbers of neurons in the hidden and output layers, were modified step by step in an external loop: the procedure starts with the minimum numbers of the corresponding neurons, and these are progressively increased up to imposed limits.

The genetic algorithm used is a MATLAB implementation [8] that can be downloaded at ftp://ftp.eos.ncsu.edu/pub/simul/GAOT. Float representation of chromosomes has been used. The selection of candidate chromosomes for crossover and mutation is made according to a ranking selection function based on the normalized geometric distribution. Three types of crossover are applied: simple, interpolated, and extrapolated. In the simple crossover, the crossover point is randomly selected. The interpolated crossover performs an interpolation along the line formed by the two parents. The extrapolated crossover performs an extrapolation along the line formed by the two parents, in the direction of the better parent. Four types of mutation are applied: boundary, multi-nonuniform, nonuniform, and uniform. Boundary mutation changes one gene of the selected chromosome randomly to either its upper or its lower bound. Multi-nonuniform mutation changes all genes, whereas nonuniform mutation changes one of the genes in a chromosome, on the basis of a non-uniform probability distribution. This Gaussian distribution starts wide and narrows to a point distribution as the current generation approaches the maximum generation. Uniform mutation changes one of the genes based on a uniform probability distribution. The numbers of applications of the different crossover and mutation operators are imposed as parameters of the genetic algorithm. The investigation of the effects of these parameters was outside the aims of the present work, and their default values have been used, namely, for each generation: 2 simple, 2 interpolated, and 2 extrapolated crossovers; 4 boundary, 6 multi-nonuniform, 4 nonuniform, and 4 uniform mutations. Because a maximization algorithm is used, the chromosome fitness corresponds to the negative value of the RMS error.
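To illustrate this encoding, the sketch below (Python; the actual work used the MATLAB GAOT toolbox [8], so the function and variable names here are ours) decodes a flat chromosome into the weight matrices of a three-layer network with a linear input layer and sigmoid hidden and output layers, and evaluates the fitness as the negative RMS error over a training set:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(chromosome, n_in, n_hid, n_out):
    """Split a flat chromosome into the two weight matrices (bias column included)."""
    k = n_hid * (n_in + 1)                         # genes of the input->hidden layer
    w_ih = chromosome[:k].reshape(n_hid, n_in + 1)
    w_ho = chromosome[k:].reshape(n_out, n_hid + 1)
    return w_ih, w_ho

def forward(x, w_ih, w_ho):
    """Linear input layer; sigmoid hidden and output layers."""
    h = sigmoid(w_ih @ np.append(x, 1.0))          # hidden activations (bias appended)
    return sigmoid(w_ho @ np.append(h, 1.0))       # output activations

def fitness(chromosome, X, T, n_in, n_hid, n_out):
    """Negative RMS error over the training pairs (X, T); the GA maximizes this."""
    w_ih, w_ho = decode(chromosome, n_in, n_hid, n_out)
    Y = np.array([forward(x, w_ih, w_ho) for x in X])
    return -np.sqrt(np.mean((Y - T) ** 2))

With this representation a network has Nh(Nin + 1) + No(Nh + 1) genes, which is why reducing the number of output neurons directly shrinks the search space.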
Attempts to use other expressions for the chromosome fitness (e.g. the average relative error) did not give meaningfully improved results.

Applications

Chemical reactor fault diagnosis

This application and the corresponding data are presented in [9]. The input vector contains the reactor inlet temperature, the reactor inlet pressure, and the feed flowrate. The elements of the output vector are three fault classes: low conversion, low catalyst selectivity, and catalyst sintering. The following factors were studied: a) the size of the initial population (Sip); b) the bounds imposed on the weights of the inter-neuronal connections (B); c) the number of neurons in the hidden layer (Nh); d) the number of neurons in the output layer (No). The last factor corresponds to different rules of class codification. There are four types of outputs, corresponding to four classes: the correct operating regime and the three fault regimes. The classical codification uses four output neurons, as follows: [1 0 0 0]; [0 1 0 0]; [0 0 1 0]; [0 0 0 1]. Two other codification methods are also possible: a codification using three output neurons, namely [0 0 0]; [1 0 0]; [0 1 0]; [0 0 1], and a codification with two output neurons: [0 0]; [0 1]; [1 0]; [1 1]. The last codification is in fact the binary representation of each class number. Our investigations in the field of classification neural networks indicated that the activity, and the corresponding response, of the output neurons must be close to 0 or 1. A neural network with one output neuron (having the smallest number of inter-neuronal connection weights), for which the above-mentioned four classes are codified with the activity domains 0-0.25, 0.25-0.5, 0.5-0.75, and 0.75-1, cannot be trained to give a correct classification.

Due to the random generation of the initial population of the genetic algorithm, each test was repeated five times. The results presented in Table 1 are the corresponding average values. In all tests, for a correct comparison of the results, the total number of generations (the stopping criterion) was the same, namely 1000.

Table 1 The effects of several factors on the performance of the chemical reactor fault-diagnosis network

Sip    B             Nh   No   RMS error   Wrong classifications (%)
100    [-100; 100]   3    4    0.3478      27.20
200    [-100; 100]   3    4    0.3418      26.25
500    [-100; 100]   3    4    0.2974      25.00
500    [-50; 50]     3    4    0.3827      26.25
500    [-100; 100]   3    4    0.2974      25.00
500    [-250; 250]   3    4    0.4840      37.50
500    [-100; 100]   3    4    0.2974      25.00
500    [-100; 100]   5    4    0.2701      21.25
500    [-100; 100]   7    4    0.1845      16.25
500    [-100; 100]   9    4    0.2395      18.75
500    [-100; 100]   3    4    0.2974      25.00
500    [-100; 100]   3    3    0.1411      20.00
500    [-100; 100]   3    2    0.2398      21.25

Fig.1 The evolution of the mean (dotted) and best (solid) fitness for the optimum neural network

The analysis of these results indicates that:
- Expansion of the initial population size has a favorable effect, since it increases the probability of generating individuals with a good fitness. Unfortunately, this effect is present only in simple cases, for well-defined problems with a smooth response surface. For highly complex problems, the search space in which the genetic algorithm operates is so large that adding a few chromosomes to the initial population has a very small, or no detectable, effect unless the problem surface is very smooth.
- A too small or a too large range of the weight values has an unfavorable effect: in the first situation the optimization solution is arbitrarily restricted, and in the second the search space is unreasonably enlarged.
- As in the case of gradient-based learning algorithms, there is an optimum number of neurons in the hidden layer for each neural network application, and the corresponding value must be established through an iterative procedure.
- Similarly, the best rule of class codification and the corresponding number of neurons in the output layer must be found by iteration.
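The three codification rules discussed above can be sketched as follows (a Python illustration of ours, not part of the original work; the nearest-target decoding rule is also our assumption, since the paper only requires outputs close to 0 or 1):

import numpy as np

# Target output vectors for the four reactor regimes under the three rules.
CODIF_4 = np.eye(4)                                                # four output neurons
CODIF_3 = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
CODIF_2 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # binary class numbers

def predicted_class(y, codification):
    """Return the index of the target vector nearest to the network output y."""
    return int(np.argmin(np.linalg.norm(codification - y, axis=1)))

# e.g. a network output of [0.1, 0.9] under the two-neuron rule reads as class 1:
print(predicted_class(np.array([0.1, 0.9]), CODIF_2))              # -> 1

The two-neuron rule halves the number of output weights relative to the classical one-neuron-per-class codification, at the price of targets that are no longer mutually orthogonal.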
With the best values of the investigated factors from Table 1, namely Sip = 500, B = [-100; 100], Nh = 7, and No = 3, the weights of the corresponding neural network were established with the aid of the genetic algorithm. The resulting network exhibits excellent results: RMS error = 0.007 and no wrong classifications (related to the training set, due to the absence of a test set in the original reference [9]). The corresponding evolution of the genetic algorithm is represented in Fig.1. It is remarkable that the best values of the RMS error obtained for this example by BAUGHMAN and LIU [9] (using the classical "delta rule" as learning procedure), at the end of 50,000 training iterations, were 0.088 for a network with 3 neurons in the hidden layer and a sigmoid transfer function, and 0.037 for a network with 5 neurons in the hidden layer and a hyperbolic tangent transfer function.

Consistency analysis of fuzzy sets

This case study was already presented in the literature [10] as an application of a modified back-propagation neural network. The reason for its selection here is that it represents a difficult classification problem involving non-linearly separable patterns. In many chemical processes, for several reasons, there are some variables that cannot be measured directly on-line (e.g. biological or catalyst activity). In these cases the values of the measured variables are fuzzy sets. With respect to the system constraints and the feasible domains of the variables, these fuzzy sets must be consistent. In order to use a graphic representation, a hypothetical example with two measured variables, x1 and x2, and one unmeasured variable, x3, all restricted by a system of three strongly non-linear constraints, was considered; the third of these constraints is

-0.5·x3 + 0.275·x1·x3 + 0.261·x1²·x3 + 0.18·x1³·x3 + x2 = 0    (3)

The variables are additionally restricted to feasible domains. Incidentally, the number of variables here is the same as the number of constraints; usually, the number of variables is greater than the number of constraints. The measured values of the variables, correlated with the feasible domains of all variables and with the constraint violations, generate several classes of errors, depending on the combination of violated restrictions. For this application the following four classes were considered: A - all restrictions are satisfied; B - constraint (1) is violated, regardless of constraints (2) and (3); C - constraint (2) is violated while constraints (1) and (3) are respected; D - constraint (3) is violated, regardless of constraints (1) and (2). These four domains, the data from the training set (points numbered from 1 to 124), and the test data (prefix T) are indicated in Fig.2. It can be observed that all learning data correspond to points placed on or near the boundaries.
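The class-assignment logic can be sketched as below (Python, our illustration; the flags g1..g3 would come from evaluating Eqs. (1)-(3), and the precedence of class B over class D when both constraints (1) and (3) are violated is our assumption, since the definitions above overlap in that case):

def error_class(g1_violated: bool, g2_violated: bool, g3_violated: bool) -> str:
    """Map constraint-violation flags to the classes A-D defined above."""
    if not (g1_violated or g2_violated or g3_violated):
        return "A"                 # all restrictions satisfied
    if g1_violated:
        return "B"                 # (1) violated, regardless of (2) and (3)
    if g3_violated:
        return "D"                 # (3) violated, regardless of (2)
    return "C"                     # only (2) violated; (1) and (3) respected

# e.g. a point violating only constraint (2) belongs to class C:
print(error_class(False, True, False))   # -> C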
The best results obtained with the aid of the genetic algorithm at the end of 500 generations are given in Table 2.

Table 2 The best results obtained with the genetic algorithm with no seeding, and with seeding of the initial population with a chromosome having an RMS error of 0.1012 (Sip = 500; B = [-130; 130]; Nh = 5; No = 2)

Procedure     Trial   RMS error   Wrong classifications (%)
No-seeding    1       0.2816      25.00
No-seeding    2       0.2506      22.58
No-seeding    3       0.2466      20.77
No-seeding    4       0.1445      12.10
No-seeding    5       0.2446      20.97
No-seeding    Mean    0.2336      20.32
Seeding       1       0.0907      3.22
Seeding       2       0.0895      6.45
Seeding       3       0.0902      5.65
Seeding       4       0.0894      6.45
Seeding       5       0.0902      4.84
Seeding       Mean    0.0900      5.32

Fig.2 The four-pattern classification example

The best result obtained in [10], using the classical "delta rule" as learning procedure, corresponds to a percentage of wrong classifications of 9.70%, and was given by a modified back-propagation neural network (a larger network than the present one) with a hyperbolic tangent transfer function. It is obvious that in this case the results given by the genetic algorithm alone are worse than those obtained using the classical "delta rule" learning procedure. Due to this fact, a hybrid procedure was proposed: in the first stage the search is made with the aid of the "delta rule" learning procedure; then, in the second stage, the initial population of chromosomes is seeded with the best solution from the end of the first stage. The results obtained with this "seeding" procedure are encouraging. Thus, for the considered network, the "delta rule" learning procedure gives the weights corresponding to an RMS error of 0.1012. This solution was improved by about 10% by the genetic algorithm. The results obtained with seeding of the initial population are also given in Table 2. The improved solution was obtained mainly through changes of the weights of the inter-neuronal connections between the hidden and output layers (Figs.3 and 4). For this difficult classification problem, the mean percentage of wrong classifications given by this hybrid procedure is 82% better than that of the best solution, corresponding to a larger neural network, obtained in [10]. The corresponding evolutions of the genetic algorithm for the "no-seeding" and "seeding" procedures are represented in Fig.5 and Fig.6, respectively.

Fig.3 Changes of the weights of the inter-neuronal connections between the input and hidden layers (I - input; H - hidden; B - bias)

Fig.4 Changes of the weights of the inter-neuronal connections between the hidden and output layers (H - hidden; O - output; B - bias)

Fig.5 The evolution of the mean (dotted) and best (solid) fitness for the "no-seeding" procedure

Fig.6 The evolution of the mean (dotted) and best (solid) fitness for the "seeding" procedure
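Operationally, the seeding used in this hybrid procedure simply injects the delta-rule weight vector into the randomly generated initial population of the genetic algorithm. A minimal sketch (Python, with names of our own; the actual work did this inside the MATLAB GAOT framework [8]) is:

import numpy as np

def seeded_population(size, n_genes, bounds, seed_chromosome=None, rng=None):
    """Random initial population within the weight bounds B; if a seed
    chromosome (e.g. the best delta-rule weight set) is given, it replaces
    the first individual ("seeding")."""
    if rng is None:
        rng = np.random.default_rng()
    low, high = bounds                                   # e.g. (-130.0, 130.0)
    population = rng.uniform(low, high, size=(size, n_genes))
    if seed_chromosome is not None:
        population[0] = seed_chromosome                  # inject delta-rule solution
    return population

# Usage (hypothetical): delta_w holds the delta-rule weights with RMS error 0.1012
# population = seeded_population(500, delta_w.size, (-130.0, 130.0), seed_chromosome=delta_w)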
Conclusions

Genetic algorithms are an exciting way to obtain highly accurate classification multilayer neural networks. Differently from other works, we search for the network structure in an external iterative loop. We consider that this two-stage procedure gives better control of the results. It also allows the selection of a new structural variable, namely the number of neurons in the output layer; this implies different rules of class codification and the corresponding modifications of the training data sets. To realize all of this in a single optimization loop seems rather complicated. The two-stage procedure may consume more computer time, but our interest was focused on the accuracy of the results. It is obvious that, in order to realize highly accurate classification multilayer neural networks for practical applications (e.g. process control or expert systems), the network performance is much more important than the computer time spent to obtain it.

Another direction of investigation in this work was the performance of a hybrid algorithm composed of the "delta rule" and genetic algorithms. In fact, this hybridization was realized by seeding the initial population of the genetic algorithm with the best solution obtained with the classical "delta rule" learning procedure. Despite the warning of some authors [11] that, by seeding, the genetic algorithm chases after a local minimum and takes time to find its way out, very good results were obtained with this procedure in a difficult classification problem. Our research, carried out on two applications, finished with good results, so we can conclude that the use of genetic algorithms represents a promising way to realize highly accurate classification multilayer neural networks.

SYMBOLS

B    bounds imposed on the weights of the inter-neuronal connections
Nh   number of neurons in the hidden layer
No   number of neurons in the output layer
Sip  size of the initial population
xi   variables in Eqs. (1)-(3)

REFERENCES

1. ALDRICH C. and VAN DEVENTER J. S. J.: Chem. Eng. Sci., 1994, 49(9), 1357-1368
2. ALDRICH C. and VAN DEVENTER J. S. J.: Ind. Eng. Chem. Res., 1995, 34(1), 216-224
3. MICHALEWICZ Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer-Verlag, Berlin, 1996
4. HARP S. A., SAMAD T. and GUHA A.: Towards the Genetic Synthesis of Neural Networks, in: Davis L. (Ed.): Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991
5. MANIEZZO V.: IEEE Trans. Neural Networks, 1994, 5(1), 39-53
6. GAO F., LI M., WANG F., WANG B. and YUE P. L.: Ind. Eng. Chem. Res., 1999, 38(11), 4330-4336
7. BOOZARJOMEHRY R. B. and SVRCEK W. Y.: Comput. Chem. Engng., 2001, 25(10), 1075-1088
8. HOUCK C. R., JOINES J. A. and KAY M. G.: NCSU-IE Technical Report 95-09, North Carolina State University, 1995
9. BAUGHMAN D. R. and LIU Y. A.: Neural Networks in Bioprocessing and Chemical Engineering, Academic Press, San Diego, 1995
10. WOINAROSCHY A., ISOPESCU R., NITA I. and DINU S.: Data Filtering via Artificial Neural Nets, 5th World Congress of Chemical Engineering, San Diego, Vol. I, 1011-1016, 1996
11. HAUPT R. L. and HAUPT S. E.: Practical Genetic Algorithms, Wiley, New York, 1998