BRAIN. Broad Research in Artificial Intelligence and Neuroscience
ISSN: 2068-0473 | e-ISSN: 2067-3957
Covered in: Web of Science (WOS); PubMed.gov; IndexCopernicus; The Linguist List; Google Academic; Ulrichs; getCITED; Genamics JournalSeek; J-Gate; SHERPA/RoMEO; Dayang Journal System; Public Knowledge Project; BIUM; NewJour; ArticleReach Direct; Link+; CSB; CiteSeerX; Socolar; KVK; WorldCat; CrossRef; Ideas RePeC; Econpapers; Socionet.
2020, Volume 11, Issue 2, pages: 82-103 | https://doi.org/10.18662/brain/11.2/76

Biogeography-Based Optimization for Weight Optimization in Elman Neural Network Compared with Meta-Heuristics Methods

Habib DHAHRI 1,2
1 College of Applied Computer Sciences (CACS), King Saud University, Riyadh, Saudi Arabia;
2 Faculty of Sciences and Techniques of Sidi Bouzid, University of Kairouan, Tunisia.

Abstract: In this paper, we present a learning algorithm for the Elman Recurrent Neural Network (ERNN) based on Biogeography-Based Optimization (BBO). The proposed algorithm computes the weights, the initial inputs of the context units and the self-feedback coefficient of the Elman network. The method is applied to four benchmark problems: the Mackey-Glass and Lorenz equations, which produce chaotic time series, and two real-life classification tasks, the Iris and Breast Cancer datasets. Numerical results show that the proposed algorithm improves on many heuristic algorithms in terms of accuracy and MSE error.

Keywords: Biogeography-Based Optimization; time series prediction; classification.

How to cite: Dhahri, H. (2020). Biogeography-Based Optimization for Weight Optimization in Elman Neural Network Compared with Meta-Heuristics Methods. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 11(2), 82-103. https://doi.org/10.18662/brain/11.2/76

1. Introduction

Among the various types of neural networks, the Recurrent Neural Network (RNN) is able to produce highly accurate forecasts (Senjyu et al., 2006). In an RNN, fixed back-connections save a copy of the previous values of the hidden units in the context units. RNNs have been applied in many areas, such as pattern recognition (Hori et al., 2016), robotic control (Sharma et al., 2016) and genetic data prediction (Baldi & Pollastri, 2003), and they are widely used for data classification (Nawi et al., 2015) and time series prediction (Chandra, 2015; Koskela et al., 1996). There are two types of RNN: fully recurrent networks, used by Kechriotis, Zervas and Manolakos (1994), and partially recurrent networks (PRNN), used by Robinson and Fallside (1991). In a fully recurrent network, each unit is connected to every other unit; the Bidirectional Associative Memory (BAM) (Kosko, 1988) and the Hopfield network (1982) are examples. Fully recurrent networks remain difficult to train on complex problems, whereas training a partially recurrent network is faster. Recent research shows that PRNNs can be highly effective forecasting methods in fields such as electricity consumption and wind speed (Cao, Ewing, & Thompson, 2012; Marvuglia & Messineo, 2012). PRNNs thus combine recurrence with tractable training: this topology is suitable for non-linear applications and for modelling time series data (Müller-Navarra, Lessmann & Voß, 2015).
The Elman Neural Network (ENN) (Elman, 1990) is the most widely used PRNN architecture. Its structure is often preferred to the Jordan network (Jordan, 1997) because its hidden layer is wider than the output layer; feeding the hidden layer back to the input therefore makes more values, and hence more information, available to the network (Venayagamoorthy, Welch & Ruffing, 2009). Optimization of such networks can be performed by metaheuristic methods (Yoo & Kim, 2014). This class of network can be trained with heuristic algorithms because gradient-based algorithms suffer from drawbacks such as becoming trapped in local minima. In general, there are three tasks in RNN optimization: weight and bias optimization, architecture optimization and learning-parameter optimization. This work addresses the first task, with the aim of finding the minimum training error.

A metaheuristic is formally defined as an iterative generation process which guides a subordinate heuristic by combining intelligently different concepts for exploring and exploiting the search space, using learning strategies to structure information in order to find efficiently near-optimal solutions (Osman & Laporte, 1996). In general, nature-inspired algorithms are classified into three major groups: evolutionary algorithms, ecology-based algorithms and bio-inspired algorithms. Evolutionary computation algorithms design solutions based on Darwinian biological evolution; they include Genetic Algorithms (Pham & Karaboga, 1999), Differential Evolution (Storn & Price, 1997) and Evolutionary Strategies (Kawada, Yamamoto, & Mada, 2004). Ecology-based algorithms solve problems by modelling ecosystems; this group includes Biogeography-Based Optimization (BBO) and the Invasive Weed Optimization (IWO) algorithm (Mehrabian & Lucas, 2006). Bio-inspired algorithms are inspired by the interactions between and with species; based on the behaviours of species, different algorithms have been invented. This category includes Particle Swarm Optimization (Kennedy & Eberhart, 1995), Artificial Bee Colony (Karaboga & Basturk, 2007), the Fish Swarm Algorithm (Li, Shao & Qian, 2002), the Firefly Algorithm (Yang, 2009), the Bacterial Foraging Algorithm (Passino, 2002), Ant Colony Optimization (Zhipeng et al., 2012), Cuckoo Search Optimization (Yang & Deb, 2009), the Fruit Fly Algorithm (Pan, 2012) and the Bat Algorithm (Yang, 2010). Although most of these approaches share the idea of evolving candidate solutions, each has its own way of exploring and exploiting the search space of the problem.

The BBO algorithm is considered one of the more powerful algorithms because its exploration and exploitation strategies depend on the two BBO operators, migration and mutation. The main objective of the mutation operator is to enhance the diversity of the population; through this operator, solutions with a low HSI can be improved, as can solutions with a high HSI, so the probabilistic operator can be applied to any candidate solution. Unlike other evolutionary algorithms, the solutions at each generation are a combination of the parent solutions and their offspring. The emigration rate evolves from one generation to another, and a habitat with a high emigration rate can share information with habitats that have a low emigration rate.
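To make the migration and mutation operators concrete, the following minimal Python sketch shows one BBO generation over real-valued habitats. The function and variable names are ours, the linear migration model and roulette-wheel selection follow the standard BBO formulation rather than any implementation detail given in the paper, and the mutation rate (0.005) and search range [-10, 10] mirror the experimental settings reported later.

```python
import numpy as np

def bbo_step(habitats, hsi, mutation_prob=0.005, bounds=(-10.0, 10.0), rng=None):
    """One illustrative BBO generation (here the HSI is an error to be minimized)."""
    rng = rng or np.random.default_rng()
    n, d = habitats.shape
    # Rank habitats: the best solution (lowest error) gets the highest
    # emigration rate mu and the lowest immigration rate lam (linear model).
    rank = np.empty(n, dtype=int)
    rank[np.argsort(hsi)] = np.arange(n)
    mu = (n - rank) / (n + 1)     # emigration rates
    lam = 1.0 - mu                # immigration rates

    new_habitats = habitats.copy()
    for i in range(n):
        for j in range(d):
            if rng.random() < lam[i]:
                # Immigrate feature j from a habitat chosen by roulette-wheel
                # selection over the emigration rates.
                k = rng.choice(n, p=mu / mu.sum())
                new_habitats[i, j] = habitats[k, j]
            if rng.random() < mutation_prob:
                # Mutation re-samples the feature to keep the population diverse.
                new_habitats[i, j] = rng.uniform(*bounds)
    return new_habitats
```

In this scheme, good habitats mainly export features while poor habitats mainly import them, which is exactly what the rates mu and lam encode.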
Several techniques have been used to optimize the performance of the Elman RNN, such as the Genetic Algorithm (GA) (Pham & Karaboga, 1999), Particle Swarm Optimization (PSO) (Xiao, Venayagamoorthy, & Corzine, 2007), Ant Colony Optimization (ACO) (Zhipeng et al., 2012), Evolutionary Strategies (ES) (Kawada, Yamamoto, & Mada, 2004) and Population-Based Incremental Learning (PBIL) (Palafox & Iba, 2012). Both BBO and GA are evolutionary algorithms, but each has specific characteristics. In Simon et al. (2011), the authors argue that BBO and GA have similar chances of finding the optimal solution, but that BBO is better able to conserve this optimum once it has been found, thanks to the immigration rate, which decreases with fitness and thus helps retain good solutions in the population. In addition, applying mutation to each individual in the population enhances the exploitation capability of BBO compared with GA, which applies a single mutation rate to the entire population. Simon, Ergezer and Du (2009) show that the advantage of BBO over GA is more marked on larger, higher-dimensional problems. PSO is based on the behaviour of birds seeking food, while BBO uses the principle of migration between islands; despite this difference, the two algorithms share characteristics such as the sharing of information within the population. The strength of BBO is that it retains solutions from one iteration to the next and improves them through the migration mechanism. BBO also uses a mutation mechanism, which is a strong point compared with swarm intelligence techniques such as PSO and ACO. Hordri, Yuhaniz and Nasien (2013) compare the performance of BBO, PSO and GA on fourteen benchmark functions and find that BBO succeeds in terms of convergence time and performs well at avoiding local minima.

In this work, the BBO algorithm is used to optimize the weights of the ENN. We also examine the advantages of this algorithm when training the ENN for the classification and prediction of benchmark problems, and compare its performance with other well-known heuristic algorithms. The results indicate that the BBO algorithm is effective for training the Elman Neural Network. The rest of the paper is organized as follows: Section 2 presents a broad description of the Elman Neural Network (ENN); Section 3 explains the basic concepts of the BBO algorithm and its use in designing the ENN; Section 4 gives the experimental results; the last section draws the conclusions.

2. Elman Neural Network

The Elman Neural Network (ENN), proposed by Elman (1990), is designed with an input layer, a hidden layer, a recurrent link known as the context layer, and an output layer. It relies on the context layer, which holds a copy of the hidden-layer activations that are subsequently fed back as input. The main advantage of this layer is that it stores the information of the hidden layer and thus preserves a memory, giving the network more information at its input. As is well known, this simple recurrent network has many advantages, such as faster convergence, more accurate mapping and nonlinear prediction capability (Chandra, 2015). Let us assume that x_i (i = 1..m) is the input vector, y_k the output of the ENN and z_j (j = 1..n) the output of the hidden layer.
b_j and b_k are the biases of the hidden layer and the output layer, respectively. u_j denotes the context-layer neurons. w_ij is the weight connecting input node i to hidden node j, c_j is the weight connecting context node j to the hidden layer, and v_jk is the weight connecting hidden node j to output node k. The output of the hidden layer is computed as

z_j(t) = f\Big( \sum_{i=1}^{m} w_{ij} x_i(t) + c_j u_j(t) + b_j \Big)    (1)

where u_j is the context node value, calculated with the self-feedback coefficient \alpha as

u_j(t) = \alpha\, u_j(t-1) + z_j(t-1)    (2)

The activation function selected for the hidden layer is the sigmoid function, defined as

f(x) = \frac{1}{1 + e^{-x}}    (3)

The output of the ENN is given by

y_k(t) = \sum_{j=1}^{n} v_{jk} z_j(t) + b_k    (4)

The architecture of the ENN is presented in Figure 1.

Fig. 1. Elman Neural Network architecture.

3. BBO Trained Elman RNN

Biogeography-Based Optimization (BBO), proposed by Simon (2008), is an evolutionary algorithm (EA) based on the emigration and immigration of species between islands. The BBO algorithm has recently proved efficient at supplying globally optimal solutions to different problems (Ma et al., 2015; Mirjalili, Mirjalili, & Lewis, 2014; Rodan, Faris & Alqatawna, 2016; Zhang et al., 2019). The general idea of the algorithm is to model the relations between species through emigration, immigration and mutation. Similarly to the chromosomes of a GA, BBO represents candidate solutions as habitats; each habitat is described by a vector of habitants (the analogue of genes in a GA), which are the decision variables of the optimization problem. To evaluate solutions, BBO uses the Habitat Suitability Index (HSI) as a performance index. A high HSI represents a good solution hosting a large number of species, whose features are more likely to emigrate to islands with a low HSI; poor solutions have a low HSI and a higher immigration rate. The BBO algorithm is therefore characterized by its emigration, immigration and mutation rates.

The time complexity of BBO depends on the resources it uses; using the O-notation, it is expressed as an asymptotic upper bound. The computational complexity of the BBO algorithm depends on the number of species (habitats), the number of generations, the migration operator (selection of the solutions), the mutation operator and the search for the best solution. Therefore, at each iteration, the computational complexity of BBO is

O(BBO) = O(Initialization) + O(migration) + O(mutation) + O(best habitat).

The time complexity of the initialization is O(nmd), where d is the dimension of the habitants, m is the number of habitants and n is the number of habitats; in our implementation d equals one. In the migration operation, roulette-wheel selection is used to choose the candidate solution from which features immigrate, so the computational complexity of migration is O(mn^2). The mutation operation is applied to each habitant, so its computational complexity is O(nm). The selection of the best solution is based on the fitness value of each habitat, so its computational complexity is O(n^2). The final computational complexity of the proposed method is therefore O(BBO) = O(g(mn^2 + mn + n^2)), where g is the number of generations. In this expression, constant terms are ignored because they contribute only constant time.
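Before turning to the training procedure, the forward pass described by Eqs. (1)-(4) can be sketched in a few lines of Python. This is an illustrative sketch only: the array names (W, c, V, alpha, u0) are ours, the output layer is assumed linear as in the reconstructed Eq. (4), and one context weight per hidden node is assumed, following the notation above.

```python
import numpy as np

def sigmoid(x):
    # Eq. (3): logistic activation used in the hidden layer.
    return 1.0 / (1.0 + np.exp(-x))

def enn_forward(x_seq, W, c, V, b_hidden, b_out, alpha=0.0, u0=None):
    """Forward pass of an Elman network over a sequence of input vectors.

    x_seq : (T, m) inputs;   W : (m, n) input-to-hidden weights (w_ij)
    c     : (n,) context-to-hidden weights (c_j);   V : (n, p) hidden-to-output weights (v_jk)
    alpha : self-feedback coefficient of the context units
    u0    : (n,) initial context values (also optimized by BBO in the paper)
    """
    n = W.shape[1]
    u = np.zeros(n) if u0 is None else np.asarray(u0, dtype=float).copy()
    outputs = []
    for x in x_seq:
        z = sigmoid(x @ W + c * u + b_hidden)   # Eq. (1): hidden-layer output
        y = z @ V + b_out                       # Eq. (4): output layer (assumed linear)
        u = alpha * u + z                       # Eq. (2): context update with self-feedback
        outputs.append(y)
    return np.array(outputs)
```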
Generally, the main steps of BBO, as defined in Simon, Ergezer and Du (2009), are the following:

1. Initialize the habitat values and the BBO parameters.
2. Calculate the HSI of each island.
3. Update the immigration, emigration and mutation rates.
4. Modify the habitats according to these rates.
5. Mutate some randomly selected habitants of different habitats.
6. Save the best habitats as elites.
7. Replace the worst habitats with the elites.
8. If the stopping condition is satisfied, terminate; otherwise, repeat from step 2.

The BBO algorithm is applied to the ENN to find the best combination of biases and weights, based on two phases:

i. Encoding strategy: represent the weights and biases in a scheme (habitats) suitable for BBO.
ii. Fitness evaluation: calculate the HSI, a fitness function defined by the error of the ENN, to evaluate habitat performance.

Encoding scheme of the ENN trained by BBO

The optimization algorithm evolves the parameters of the ENNs. In BBO, a habitat is encoded as a vector, defined for the example network of Figure 2 as

ENN = [ W12  W32  W24  b1  b2 ].

Fig. 2. ENN with the structure 1-1-1.

In the example of Figure 2, each layer (input, hidden and output) consists of a single node. W12 denotes the weight between the input node and the hidden node, W32 the weight between the context node and the hidden node, and W24 the weight between the hidden node and the output node; b1 and b2 are the bias values of the hidden node and the output node, respectively. In general, the encoding vector contains the list of weights between the input and hidden layers, the list of weights between the context and hidden layers, the list of weights between the hidden and output layers, and the bias values.

Fitness function

Training RNNs remains a challenging optimization problem, so each candidate solution must be evaluated by a fitness measure. For each individual, the HSI function is assigned according to the desired optimization. In this work, the Mean Square Error (MSE) of the network output is used as the HSI function:

MSE = \frac{1}{S} \sum_{i=1}^{S} \sum_{k=1}^{m} \big( y_k^i - d_k^i \big)^2    (5)

where S is the number of training samples, m denotes the number of outputs, y_k^i is the obtained output and d_k^i the desired output of the k-th output unit for the i-th input sample. The proposed algorithm aims to minimize this error. The computational complexity of the whole method can be written as

O(BBO-ENN) = O\big( i \, ( x(z + y) + hH^2 + Hh + H^2 ) \big)    (6)

where i is the number of iterations, x is the number of input training samples, z and y are the numbers of nodes in the hidden layer and the output layer respectively, h is the number of habitants (weights and biases) and H is the number of habitats (ENNs). Here H^2 is the elitism complexity, Hh the mutation complexity, hH^2 the migration complexity and x(z + y) the complexity of evaluating the ENN. The proposed BBO-ENN model is given in Figure 3.

Fig. 3. BBO-ENN model.

The first step of the proposed model is to generate a random set of ENNs as habitats and to initialize the weights and biases (habitants) randomly. The second step is to calculate the MSE of each ENN by Eq. (5), in order to distinguish the best habitats from the worst. The third step is to update the emigration, immigration and mutation rates. Once the good and poor solutions have been identified, the different islands are combined through migration and some habitats are selected in which habitants are mutated; a sketch of the encoding and fitness evaluation is given below.
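As an illustration of the two phases above, the following sketch encodes an ENN into a habitat vector and evaluates Eq. (5) as the HSI. The flattening order and the helper names are ours, and enn_forward refers to the forward-pass sketch given earlier; this is not the author's actual implementation.

```python
import numpy as np

def encode_enn(W, c, V, b_hidden, b_out):
    # Flatten all weights and biases into a single habitat vector
    # (input-to-hidden, context-to-hidden, hidden-to-output, biases).
    return np.concatenate([W.ravel(), c, V.ravel(), b_hidden, b_out])

def decode_enn(habitat, m, n, p):
    # Inverse of encode_enn for an m-n-p Elman network.
    i = 0
    W = habitat[i:i + m * n].reshape(m, n); i += m * n
    c = habitat[i:i + n];                   i += n
    V = habitat[i:i + n * p].reshape(n, p); i += n * p
    b_hidden = habitat[i:i + n];            i += n
    b_out = habitat[i:i + p]
    return W, c, V, b_hidden, b_out

def hsi(habitat, x_seq, d_seq, m, n, p, alpha=0.0):
    # Eq. (5): mean squared error of the decoded ENN over the training set,
    # used as the Habitat Suitability Index (lower is better).
    # enn_forward is the forward-pass sketch shown after Section 3.
    W, c, V, bh, bo = decode_enn(habitat, m, n, p)
    y = enn_forward(x_seq, W, c, V, bh, bo, alpha=alpha)
    return np.mean(np.sum((y - d_seq) ** 2, axis=1))
```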
The last step is to keep the best solutions as elites for future generations. These steps are repeated until the stopping condition, which can be a number of iterations or an error rate, is satisfied. Figure 4 presents a conceptual picture of the BBO-ENN.

Fig. 4. Conceptual picture of the BBO-ENN model.

As seen in this figure, there are three habitats (ENNs). Habitat 1 has the lowest HSI (i.e. the lowest MSE error), the highest emigration rate and the lowest immigration rate; it represents a good solution, so it is more likely to share its weights and biases with Habitat 2 and Habitat 3. Habitat 2 has the highest HSI, the lowest emigration rate and the highest immigration rate; it represents a poor solution, so it is more likely to accept shared features (weights and biases) from Habitat 1 and Habitat 3. Theoretically, the proposed BBO-ENN model can improve the training phase thanks to the emigration and immigration rates, evolutionary mechanisms specific to each habitat that encourage exploration; BBO is thus less likely to fall into local optima. In addition, thanks to the migration of better weights and biases towards the worse ENNs, the MSE error (HSI) of a habitat can improve over the generations. The mutation mechanism further provides each habitat with additional exploitation capability. Finally, the elitism phase allows the proposed method to keep some of the best solutions, which are never lost. Having described the theoretical functioning of the proposed method, the following section presents the practical results, followed by a comparative study of the different algorithms.

4. Experiments

To verify the performance of the BBO algorithm for training the Elman NN, it is compared with PSO, GA, ACO, ES and PBIL over four benchmark problems: Breast Cancer (Wolberg & Mangasarian, 1990) and Iris (Fisher, 1936) for classification, and the Mackey-Glass (Mackey & Glass, 1977) and Lorenz attractor (Lorenz, 1963) series for time series prediction. The classification datasets are evaluated on two performance criteria: (a) the MSE value and (b) the classification accuracy. Increasing the population size and the number of iterations could improve the performance of the algorithms, but in this work we are interested in comparing the six algorithms over a fixed number of iterations; we therefore do not search for the best parameters, and simply use the same network settings, such as the number of nodes, the weight initialization range and the population size, for all methods. In this architecture, the log-sigmoid is used as the activation function. For all algorithms, the habitats are initialized randomly in the range [-10, 10]. The population size is 200 for each dataset. For all the experiments, the performance was computed over 30 runs of 300 generations for each method. According to Shamsuddin (2004), there is no standard rule for determining the suitable number of hidden nodes; we fixed it using the rule that one hidden layer with 2N + 1 hidden neurons is sufficient for N inputs (see the sketch below).
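For illustration only, the shared experimental settings described above can be collected as follows; the dictionary layout and the helper name are ours, and the hidden-layer sizes follow from the 2N + 1 rule.

```python
# Shared settings used for all six training algorithms, as reported in the text.
COMMON = {
    "population_size": 200,
    "generations": 300,
    "runs": 30,
    "init_range": (-10.0, 10.0),
}

def hidden_nodes(n_inputs):
    # Rule used in the paper: one hidden layer with 2N + 1 neurons for N inputs.
    return 2 * n_inputs + 1

# The hidden-layer sizes of Table 1 follow from the rule above:
datasets = {"Iris": 4, "Breast Cancer": 9, "Mackey-Glass": 4, "Lorenz": 3}
structures = {name: (n, hidden_nodes(n)) for name, n in datasets.items()}
# -> Iris: (4, 9), Breast Cancer: (9, 19), Mackey-Glass: (4, 9), Lorenz: (3, 7)
```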
Table 1 shows the number of input, hidden and output nodes used for each dataset.

Table 1. Structure of each dataset

Dataset          Input nodes   Hidden nodes   Output nodes
Iris             4             9              3
Breast Cancer    9             19             1
Mackey-Glass     4             9              1
Lorenz           3             7              1

The initial parameters of the meta-heuristic algorithms are fixed in Table 2, which shows the initialization settings of the optimization methods. All parameter values are chosen from the literature.

Table 2. Parameter settings of the algorithms

Method   Parameter                            Value
BBO      Max immigration/emigration rate      1
         Mutation probability                 0.005
         Immigration bounds per gene          [0, 1]
ACO      Initial pheromone (tau_0)            1e-06
         Pheromone update constant (Q)        20
         Pheromone constant (q_0)             1
         Global pheromone decay rate (p_g)    0.9
         Local pheromone decay rate (p_t)     0.5
         Pheromone sensitivity (alpha)        1
         Visibility sensitivity (beta)        5
GA       Selection mechanism                  Roulette wheel
         Crossover probability                1
         Mutation probability                 0.01
PSO      Cognitive constant (c_1)             1
         Social constant (c_2)                1
         Inertia weight (w)                   0.3
ES       Lambda                               10
         Sigma                                1
PBIL     Learning rate                        0.05
         Elitism parameter                    1
         Mutation probability                 0.1

A. Breast Cancer

This dataset was obtained from the UCI Machine Learning Repository. It contains 699 instances and 9 attributes, with 458 benign and 241 malignant instances. The first 599 patterns are used for the training phase and the remaining ones for testing. The convergence of the different algorithms is presented in Figure 5, and Table 3 presents their experimental results.

Table 3. Experimental results for the Breast Cancer dataset

Algorithm   MSE error   Accuracy (%)
BBO         0.0024175   99.99
GA          0.0025149   98.45
PSO         0.0043705   94.50
ACO         0.0073633   76.25
ES          0.0062843   73.00
PBIL        0.032001    3.99

From Table 3 it can be seen that the MSE value of BBO-ENN is lower than those of the PSO, GA, ACO, ES and PBIL algorithms, which demonstrates the efficacy of BBO-ENN for data classification. The proposed algorithm achieves the smallest MSE (0.0024175) and the highest accuracy (99.99%). The other methods (PSO, ACO, ES and PBIL) converge with larger MSE and lower accuracy, whereas the MSE value of GA is close to that of BBO. As shown in Figure 5, the BBO technique has the fastest and lowest convergence curve among all the methods for Breast Cancer. From these simulation results, the BBO algorithm proves its superiority in terms of MSE and accuracy.

Fig. 5. Convergence of the Elman RNN on the Breast Cancer dataset.

B. Iris dataset

The Iris plants dataset contains 150 samples and four attributes (sepal length, sepal width, petal length, petal width), grouped into three classes: Setosa, Versicolour and Virginica. In this experiment, we used four inputs, nine hidden nodes and three outputs. The first 150 patterns are selected for the training phase and the remaining 150 for testing. Table 4 presents the results of the training algorithms and compares the performance of BBO-ENN with the GA, PSO, ACO, ES and PBIL algorithms. From Table 4 it can be seen that the proposed algorithm achieves the lowest MSE (0.017371) and the highest accuracy (93.96%).

Table 4. Experimental results for the Iris dataset

Algorithm   MSE error   Accuracy (%)
BBO         0.017371    93.96
GA          0.029781    91.22
PSO         0.21166     66.53
ACO         0.40017     38.56
ES          0.30877     66.00
PBIL        0.1116      57.66
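The paper reports accuracy alongside MSE but does not spell out how class labels are read off the network outputs; a plausible, purely illustrative convention is sketched below (argmax over the three Iris output nodes, a 0.5 threshold on the single Breast Cancer output). Both helper names are ours.

```python
import numpy as np

def accuracy_multiclass(y_pred, y_true_onehot):
    # Iris-style case: three output nodes, predicted class = largest output.
    pred = np.argmax(y_pred, axis=1)
    true = np.argmax(y_true_onehot, axis=1)
    return 100.0 * np.mean(pred == true)

def accuracy_binary(y_pred, y_true, threshold=0.5):
    # Breast-Cancer-style case: a single output thresholded at 0.5.
    pred = (y_pred.ravel() >= threshold).astype(int)
    return 100.0 * np.mean(pred == y_true.ravel().astype(int))
```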
Fig. 6. Convergence of the Elman RNN on the Iris dataset.

Figure 6 shows the convergence of each algorithm and illustrates the success of BBO compared with the other methods. From these results, the BBO algorithm achieves the highest performance.

C. Mackey-Glass time series prediction

The Mackey-Glass time series is defined by the following equation:

\frac{dx(t)}{dt} = \frac{a\, x(t-\tau)}{1 + x^{c}(t-\tau)} - b\, x(t)    (7)

In our work, the input of the ENN consists of four data points, x(t), x(t-6), x(t-12) and x(t-18), and the output is defined by

x(t+6) = f\big( x(t), x(t-6), x(t-12), x(t-18) \big)    (8)

The first 500 samples are selected for the training phase and the remaining 500 for testing. After 300 generations of the training process, the convergence of the different algorithms is as presented in Figure 7. Table 5 compares the MSE error of BBO-ENN with the other meta-heuristic algorithms. In this experiment, BBO and GA achieve the smallest test MSE errors (0.009702 and 0.009851, respectively), and in some runs the GA and BBO values are equal. BBO-ENN nevertheless remains the more promising method because its convergence to the best solution is faster than that of the other methods.

Table 5. Experimental results for the Mackey-Glass dataset

Algorithm   MSE training error   MSE test error
BBO         0.00925              0.009702
PSO         0.01043              0.071985
GA          0.01130              0.009851
ACO         0.04851              0.093598
ES          0.04748              0.082876
PBIL        0.02293              0.091885

Fig. 7. Convergence of the Elman RNN on the Mackey-Glass dataset.

D. Lorenz attractor

The Lorenz system is given by the following differential equations:

\frac{dx(t)}{dt} = \sigma \big( y(t) - x(t) \big), \qquad
\frac{dy(t)}{dt} = x(t) \big( \rho - z(t) \big) - y(t), \qquad
\frac{dz(t)}{dt} = x(t)\, y(t) - \beta z(t)    (9)

where \sigma, \rho and \beta are positive real parameters. In these three equations, the component x is used as the time series. In this work, the input of the ENN is defined by x(t), x(t-1) and x(t-2), and the output is given by

x(t+1) = f\big( x(t), x(t-1), x(t-2) \big)    (10)

The first 500 samples of 1000 simulated data points are chosen for the training phase and the remaining 500 for testing. The convergence curve of each algorithm is summarized in Figure 8, and Table 6 lists the MSE training and testing errors of each algorithm. From this table it can be seen that BBO achieves the lowest MSE errors for both the training and testing phases (0.14278 and 0.241291). The other algorithms (PSO, GA, ACO, ES and PBIL) have training MSEs of 0.21271, 0.48498, 1.27504, 0.29753 and 0.28753, considerably larger than BBO. Figure 8 likewise shows the MSE convergence for the Lorenz problem and demonstrates that the proposed BBO-ENN obtains better results than the other algorithms; BBO-ENN again shows its efficiency for the prediction of the Lorenz time series.

Table 6. Experimental results for the Lorenz dataset

Algorithm   MSE training error   MSE test error
BBO         0.14278              0.241291
PSO         0.21271              0.253047
GA          0.48498              0.242105
ACO         1.27504              0.962869
ES          0.29753              0.293404
PBIL        0.28753              0.283424

Fig. 8. Convergence of the Elman RNN on the Lorenz dataset.

Throughout these experiments, BBO shows good performance compared with the other applied algorithms (a sketch of how the two chaotic series can be generated numerically is given below).
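For reproducibility, the two chaotic series can be generated numerically as sketched below. The parameter values (a = 0.2, b = 0.1, c = 10, tau = 17 for Mackey-Glass; sigma = 10, rho = 28, beta = 8/3 for Lorenz) are the commonly used ones and are assumptions on our part, since the paper does not list them, and simple Euler integration is used purely for illustration.

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=10, tau=17, dt=1.0, x0=1.2):
    # Euler integration of Eq. (7): dx/dt = a*x(t-tau)/(1 + x(t-tau)**c) - b*x(t).
    history = int(tau / dt)
    x = np.full(n + history, x0)
    for t in range(history, n + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t])
    return x[history:]

def lorenz(n, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01, state=(1.0, 1.0, 1.0)):
    # Euler integration of Eq. (9); only the x component is kept as the time series.
    x, y, z = state
    xs = np.empty(n)
    for t in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[t] = x
    return xs

# Illustrative usage: 1000 points each, split 500 training / 500 testing as in the paper.
mg_series = mackey_glass(1000)
lorenz_series = lorenz(1000)
```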
The obtained results can be explained by the philosophy of the BBO technique compared with the other evolutionary algorithms. During the generations, the BBO solutions are maintained and weighted by their emigration rates; at each iteration, BBO improves the habitats by changing some of their features, and the poor solutions can be improved by the good solutions, which share their suitability index variables (SIVs, i.e. attributes). In the GA, ACO and PBIL techniques, by contrast, the worse solutions are discarded from the population and only the best candidate solutions are maintained, so the population evolves using the elite solutions. BBO is also clearly similar to the PSO and DE approaches in the way it maintains solutions: each solution learns from its neighbours and evolves based on the movement of the surrounding particles.

Conclusion

In this work, a Biogeography-Based Optimization (BBO) algorithm is proposed to train the Elman Neural Network (ENN) on four benchmark problems. The experimental results show that the BBO-ENN model can effectively classify data such as the Breast Cancer and Iris datasets. The method was also applied to the Mackey-Glass and Lorenz equations, which produce chaotic time series. The statistical results show that the proposed algorithm outperforms the GA, PSO, ES, ACO and PBIL algorithms. The performance and success of BBO-ENN are mainly due to the Biogeography-Based Optimization algorithm, which successfully optimizes the weight parameters of the Elman Neural Network; BBO-ENN succeeds in terms of convergence time and performs well at avoiding local minima. Although BBO shows good performance when applied to classification and time series prediction, it inherently lacks the exploration ability needed to increase the diversity of habitats, which slows down the convergence of the algorithm. The expansion of BBO algorithms to many types of problems opens several research areas: one suggestion for future work is to automate parameter tuning, and another is to apply the BBO algorithm to more complicated problems.

Acknowledgment

The authors extend their appreciation to the Deanship of Scientific Research and RSSU at King Saud University for funding this work through research group no. RG-1438-071.

References

Baldi, P., & Pollastri, G. (2003). The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research, 4(4), 575-602. https://doi.org/10.1162/153244304773936054

Cao, Q., Ewing, B. T., & Thompson, M. A. (2012). Forecasting wind speed with recurrent neural networks. European Journal of Operational Research, 221(1), 148-154. https://doi.org/10.1016/j.ejor.2012.02.042

Chandra, R. (2015). Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Transactions on Neural Networks and Learning Systems, 26(12), 3123-3136. https://doi.org/10.1109/TNNLS.2015.2404823

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211. https://doi.org/10.1016/0364-0213(90)90002-E

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2), 179-188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79(8), 2554-2558. https://doi.org/10.1073/pnas.79.8.2554
Hordri, N. F., Yuhaniz, S. S., & Nasien, D. (2013). A comparison study of biogeography-based optimization for optimization problems. International Journal of Advances in Soft Computing and its Applications, 5(1), 1-16.

Hori, T., Hori, C., Watanabe, S., & Hershey, J. R. (2016). Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 5990-5994. https://doi.org/10.1109/ICASSP.2016.7472827

Jordan, M. I. (1997). Serial order: A parallel distributed processing approach. Advances in Psychology, 121, 471-495. https://doi.org/10.1016/S0166-4115(97)80111-2

Karaboga, D., & Basturk, B. (2007). A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39, 459-471. https://doi.org/10.1007/s10898-007-9149-x

Kawada, K., Yamamoto, T., & Mada, Y. (2004). A design of evolutionary recurrent neural-net based controllers for an inverted pendulum. 5th Asian Control Conference (IEEE Cat. No.04EX904), 3, 1419-1422.

Kechriotis, G., Zervas, E., & Manolakos, E. S. (1994). Using recurrent neural networks for adaptive communication channel equalization. IEEE Transactions on Neural Networks, 5(2), 267-278. https://doi.org/10.1109/72.279190

Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN'95 - International Conference on Neural Networks, Perth, WA, Australia, 4, 1942-1948. https://doi.org/10.1109/ICNN.1995.488968

Koskela, T., Lehtokangas, M., Saarinen, J., & Kaski, K. (1996). Time series prediction with multilayer perceptron, FIR and Elman neural networks. Proceedings of the World Congress on Neural Networks, 491-496. https://pdfs.semanticscholar.org/82c8/e5d0cd4a7467f7f54ad823b2136b973eeb6e.pd

Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 49-60. https://doi.org/10.1109/21.87054

Li, X. L., Shao, Z. J., & Qian, J. X. (2002). An optimizing method based on autonomous animats: Fish-swarm algorithm. Systems Engineering - Theory & Practice, 22(11), 32-38. https://doi.org/10.12011/1000-6788(2002)11-32

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130-141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

Ma, H., Fei, M., Simon, D., & Chen, Z. (2015). Biogeography-based optimization in noisy environments. Transactions of the Institute of Measurement and Control, 37(2), 190-204. https://doi.org/10.1177/0142331214537015
Mackey, M. C., & Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300), 287-289. https://doi.org/10.1126/science.267326

Marvuglia, A., & Messineo, A. (2012). Using recurrent artificial neural networks to forecast household electricity consumption. Energy Procedia, 14, 45-55.

Mehrabian, A. R., & Lucas, C. (2006). A novel numerical optimization algorithm inspired from weed colonization. Ecological Informatics, 1(4), 355-366. https://doi.org/10.1016/j.ecoinf.2006.07.003

Mirjalili, S., Mirjalili, S., & Lewis, A. (2014). Let a biogeography-based optimizer train your multi-layer perceptron. Information Sciences, 269, 188-209. https://doi.org/10.1016/j.ins.2014.01.038

Müller-Navarra, M., Lessmann, S., & Voß, S. (2015). Sales forecasting with partial recurrent neural networks: Empirical insights and benchmarking results. 48th Hawaii International Conference on System Sciences, Kauai, HI, 1108-1116. https://doi.org/10.1109/HICSS.2015.135

Nawi, N. M., Khan, A., Rehman, G., Syed, M., Chiroma, H., & Herawan, T. (2015). Weight optimization in recurrent neural networks with hybrid metaheuristic Cuckoo Search techniques for data classification. Mathematical Problems in Engineering. https://doi.org/10.1155/2015/868375

Osman, I. H., & Laporte, G. (1996). Metaheuristics: A bibliography. Annals of Operations Research, 63, 511-623. https://doi.org/10.1007/BF02125421

Palafox, L., & Iba, H. (2012). On the use of population based incremental learning to do reverse engineering on gene regulatory networks. IEEE Congress on Evolutionary Computation, Brisbane, QLD, 1-8. https://doi.org/10.1109/CEC.2012.6256580

Pan, W.-S. (2012). A new fruit fly optimization algorithm: Taking the financial distress model as an example. Knowledge-Based Systems, 26, 69-74. https://doi.org/10.1016/j.knosys.2011.07.001

Passino, K. M. (2002). Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems Magazine, 22(3), 52-67. https://doi.org/10.1109/MCS.2002.1004010

Pham, D. T., & Karaboga, D. (1999). Training Elman and Jordan networks for system identification using genetic algorithms. Artificial Intelligence in Engineering, 13(2), 107-117. https://doi.org/10.1016/S0954-1810(98)00013-2

Robinson, T., & Fallside, F. (1991). A recurrent error propagation network speech recognition system. Computer Speech & Language, 5(3), 259-274. https://doi.org/10.1016/0885-2308(91)90010-N

Rodan, A., Faris, H., & Alqatawna, J. (2016). Optimizing feedforward neural networks using biogeography based optimization for e-mail spam identification. International Journal of Communications, Network and System Sciences, 9, 19-28. https://doi.org/10.4236/ijcns.2016.91002
Senjyu, T., Yona, A., Urasaki, N., & Funabashi, T. (2006). Application of recurrent neural network to long-term-ahead generating power forecasting for wind power generator. IEEE PES Power Systems Conference and Exposition, Atlanta, GA, 1260-1265. https://doi.org/10.1109/PSCE.2006.296487

Shamsuddin, M. (2004). Lecture note advanced artificial intelligence: Number of hidden neurons [Unpublished doctoral thesis]. Universiti Teknologi Malaysia.

Sharma, R., Kumar, V., Gaur, P., & Mittal, A. P. (2016). An adaptive PID like controller using mix locally recurrent neural network for robotic manipulator with variable payload. ISA Transactions, 62, 258-267. https://doi.org/10.1016/j.isatra.2016.01.016

Simon, D. (2008). Biogeography-based optimization. IEEE Transactions on Evolutionary Computation, 12(6), 702-713. https://doi.org/10.1109/TEVC.2008.919004

Simon, D., Ergezer, M., & Du, D. (2009). Population distributions in biogeography-based optimization algorithms with elitism. 2009 IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, 991-996. https://doi.org/10.1109/ICSMC.2009.5346058

Simon, D., Rarick, R., Ergezer, M., & Du, D. (2011). Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms. Information Sciences, 181(7), 1224-1248. https://doi.org/10.1016/j.ins.2010.12.006

Storn, R., & Price, K. (1997). Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11, 341-359. https://doi.org/10.1023/A:1008202821328

Venayagamoorthy, G. K., Welch, R. L., & Ruffing, S. M. (2009). Comparison of feedforward and feedback neural network architectures for short term wind speed prediction. IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks, 3141-3146. https://doi.org/10.1109/IJCNN.2009.5179034

Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences of the United States of America, 87(23), 9193-9196. https://doi.org/10.1073/pnas.87.23.9193

Xiao, P., Venayagamoorthy, G. K., & Corzine, K. A. (2007). Combined training of recurrent neural networks with particle swarm optimization and backpropagation algorithms for impedance identification. IEEE Swarm Intelligence Symposium, Honolulu, HI, 9-15. https://doi.org/10.1109/SIS.2007.368020
Yang, X.-S. (2009). Firefly algorithms for multimodal optimization. International Symposium on Stochastic Algorithms, 169-178. https://doi.org/10.1007/978-3-642-04944-6_14

Yang, X.-S. (2010). A new metaheuristic bat-inspired algorithm. In J. R. González, D. A. Pelta, C. Cruz, G. Terrazas, & N. Krasnogor (Eds.), Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), Studies in Computational Intelligence, vol. 284. Springer.

Yang, X.-S., & Deb, S. (2009). Cuckoo search via Lévy flights. World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 210-214. https://doi.org/10.1109/NABIC.2009.5393690

Yoo, D. G., & Kim, J. H. (2014). Meta-heuristic algorithms as tools for hydrological science. Geoscience Letters, 1, 4. https://doi.org/10.1186/2196-4092-1-4

Zhang, X., Kang, Q., Tu, Q., Cheng, J., & Wang, X. (2019). Efficient and merged biogeography-based optimization algorithm for global optimization problems. Soft Computing, 23, 4483-4502. https://doi.org/10.1007/s00500-018-3113-1

Zhipeng, Y., Minfang, P., Hao, H., & Xianfeng, L. (2012). Fault locating of grounding grids based on ant colony optimizing Elman neural network. 2012 Third International Conference on Digital Manufacturing & Automation, GuiLin, 406-409. https://doi.org/10.1109/ICDMA.2012.97