BRAIN. Broad Research in Artificial Intelligence and Neuroscience
ISSN: 2068-0473 | e-ISSN: 2067-3957
Covered in: Web of Science (WOS); PubMed.gov; IndexCopernicus; The Linguist List; Google Academic; Ulrichs; getCITED; Genamics JournalSeek; J-Gate; SHERPA/RoMEO; Dayang Journal System; Public Knowledge Project; BIUM; NewJour; ArticleReach Direct; Link+; CSB; CiteSeerX; Socolar; KVK; WorldCat; CrossRef; Ideas RePeC; Econpapers; Socionet.
2020, Volume 11, Issue 2, pages: 82-103 | https://doi.org/10.18662/brain/11.2/76

Biogeography-Based Optimization for Weight Optimization in Elman Neural Network Compared with Meta-Heuristics Methods

Habib DHAHRI 1,2
1 College of Applied Computer Sciences (CACS), King Saud University, Riyadh, Saudi Arabia;
2 Faculty of Sciences and Techniques of Sidi Bouzid, University of Kairouan, Tunisia.

Abstract: In this paper, we present a learning algorithm for the Elman Recurrent Neural Network (ERNN) based on Biogeography-Based Optimization (BBO). The proposed algorithm computes the weights, the initial inputs of the context units and the self-feedback coefficient of the Elman network. The method is applied to four benchmark problems: the Mackey-Glass and Lorenz equations, which produce chaotic time series, and two real-life classification tasks, the Iris and Breast Cancer datasets. Numerical results show that the proposed algorithm improves on many heuristic algorithms in terms of accuracy and MSE error.

Keywords: Biogeography-Based Optimization; time series prediction; classification.

How to cite: Dhahri, H. (2020). Biogeography-Based Optimization for Weight Optimization in Elman Neural Network Compared with Meta-Heuristics Methods. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 11(2), 82-103. https://doi.org/10.18662/brain/11.2/76

1. Introduction

Among the various types of neural networks, the Recurrent Neural Network (RNN) is able to produce highly accurate forecasts (Senjyu et al., 2006). In an RNN, fixed back-connections save a copy of the previous values of the hidden units in the context units. RNNs have been applied in many areas, such as pattern recognition (Hori et al., 2016), robotic control (Sharma et al., 2016) and genetic data prediction (Baldi & Pollastri, 2003), and they are widely used for data classification (Nawi et al., 2015) and time series prediction (Chandra, 2015; Koskela et al., 1996). There are two types of RNN: fully recurrent networks, used by Kechriotis, Zervas and Manolakos (1994), and partially recurrent networks (PRNN), used by Robinson and Fallside (1991). In a fully recurrent network, each unit is connected to every other unit; the Bidirectional Associative Memory (BAM) (Kosko, 1988) and the Hopfield network (1982) are examples. Fully recurrent networks remain difficult to train on complex problems, whereas training a partially recurrent network is faster. Recent research shows that PRNNs can be highly effective forecasting methods in fields such as electricity consumption and wind speed (Cao, Ewing, & Thompson, 2012; Marvuglia & Messineo, 2012). PRNNs thus combine recurrence with tractable training: this topology is suitable for non-linear applications and for modelling time series data (Müller-Navarra, Lessmann & Voß, 2015).
The Elman Neural Network (ENN) (Elman, 1990) is the most widely used PRNN architecture. Its structure is often preferred to the Jordan network (Jordan, 1997) because its hidden layer is wider than the output layer; feeding the hidden layer back to the input therefore makes more values, and hence more information, available to the network (Venayagamoorthy, Welch & Ruffing, 2009). Optimization of such networks can be performed by metaheuristic methods (Yoo & Kim, 2014). This class of network can be trained with heuristic algorithms because gradient-based algorithms suffer from drawbacks such as becoming trapped in local minima. In general, there are three tasks in RNN optimization: weight and bias optimization, architecture optimization and learning-parameter optimization. This work addresses the first task, with the aim of finding the minimum training error.

A metaheuristic is formally defined as an iterative generation process which guides a subordinate heuristic by combining intelligently different concepts for exploring and exploiting the search space, using learning strategies to structure information in order to find efficiently near-optimal solutions (Osman & Laporte, 1996). In general, nature-inspired algorithms are classified into three major groups: evolutionary algorithms, ecology-based algorithms and bio-inspired algorithms. Evolutionary computation algorithms design solutions based on Darwinian biological evolution; they include Genetic Algorithms (Pham & Karaboga, 1999), Differential Evolution (Storn & Price, 1997) and Evolutionary Strategies (Kawada, Yamamoto, & Mada, 2004). Ecology-based algorithms solve problems by modelling ecosystems; this group includes Biogeography-Based Optimization (BBO) and the Invasive Weed Optimization (IWO) algorithm (Mehrabian & Lucas, 2006). Bio-inspired algorithms are inspired by the interactions between and with species; based on the behaviours of species, different algorithms have been invented. This category includes Particle Swarm Optimization (Kennedy & Eberhart, 1995), Artificial Bee Colony (Karaboga & Basturk, 2007), the Fish Swarm Algorithm (Li, Shao & Qian, 2002), the Firefly Algorithm (Yang, 2009), the Bacterial Foraging Algorithm (Passino, 2002), Ant Colony Optimization (Zhipeng et al., 2012), Cuckoo Search Optimization (Yang & Deb, 2009), the Fruit Fly Algorithm (Pan, 2012) and the Bat Algorithm (Yang, 2010). Although most of these approaches share the idea of evolving candidate solutions, each has its own way of exploring and exploiting the search space of the problem.

The BBO algorithm is considered one of the more powerful algorithms because its exploration and exploitation strategies depend on the two BBO operators, migration and mutation. The main objective of the mutation operator is to enhance the diversity of the population; through this operator, solutions with a low HSI can be improved, as can solutions with a high HSI, so the probabilistic operator can be applied to any candidate solution. Unlike other evolutionary algorithms, the solutions at each generation are a combination of the parent solutions and their offspring. The emigration rate evolves from one generation to another, and a habitat with a high emigration rate can share information with habitats that have a low emigration rate.
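To make the migration and mutation operators concrete, the following minimal Python sketch shows one BBO generation over real-valued habitats. The function and variable names are ours, the linear migration model and roulette-wheel selection follow the standard BBO formulation rather than any implementation detail given in the paper, and the mutation rate (0.005) and search range [-10, 10] mirror the experimental settings reported later.

```python
import numpy as np

def bbo_step(habitats, hsi, mutation_prob=0.005, bounds=(-10.0, 10.0), rng=None):
    """One illustrative BBO generation (here the HSI is an error to be minimized)."""
    rng = rng or np.random.default_rng()
    n, d = habitats.shape
    # Rank habitats: the best solution (lowest error) gets the highest
    # emigration rate mu and the lowest immigration rate lam (linear model).
    rank = np.empty(n, dtype=int)
    rank[np.argsort(hsi)] = np.arange(n)
    mu = (n - rank) / (n + 1)     # emigration rates
    lam = 1.0 - mu                # immigration rates

    new_habitats = habitats.copy()
    for i in range(n):
        for j in range(d):
            if rng.random() < lam[i]:
                # Immigrate feature j from a habitat chosen by roulette-wheel
                # selection over the emigration rates.
                k = rng.choice(n, p=mu / mu.sum())
                new_habitats[i, j] = habitats[k, j]
            if rng.random() < mutation_prob:
                # Mutation re-samples the feature to keep the population diverse.
                new_habitats[i, j] = rng.uniform(*bounds)
    return new_habitats
```

In this scheme, good habitats mainly export features while poor habitats mainly import them, which is exactly what the rates mu and lam encode.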
Several techniques have been used to optimize the performance of the Elman RNN, such as the Genetic Algorithm (GA) (Pham & Karaboga, 1999), Particle Swarm Optimization (PSO) (Xiao, Venayagamoorthy, & Corzine, 2007), Ant Colony Optimization (ACO) (Zhipeng et al., 2012), Evolutionary Strategies (ES) (Kawada, Yamamoto, & Mada, 2004) and Population-Based Incremental Learning (PBIL) (Palafox & Iba, 2012). Both BBO and GA are evolutionary algorithms, but each has specific characteristics. In Simon et al. (2011), the authors argue that BBO and GA have similar chances of finding the optimal solution, but that BBO is better able to conserve this optimum once it has been found, thanks to the immigration rate, which decreases with fitness and thus helps retain good solutions in the population. In addition, applying mutation to each individual in the population enhances the exploitation capability of BBO compared with GA, which applies a single mutation rate to the entire population. Simon, Ergezer and Du (2009) show that the advantage of BBO over GA is more marked on larger, higher-dimensional problems. PSO is based on the behaviour of birds seeking food, while BBO uses the principle of migration between islands; despite this difference, the two algorithms share characteristics such as the sharing of information within the population. The strength of BBO is that it retains solutions from one iteration to the next and improves them through the migration mechanism. BBO also uses a mutation mechanism, which is a strong point compared with swarm intelligence techniques such as PSO and ACO. Hordri, Yuhaniz and Nasien (2013) compare the performance of BBO, PSO and GA on fourteen benchmark functions and find that BBO succeeds in terms of convergence time and performs well at avoiding local minima.

In this work, the BBO algorithm is used to optimize the weights of the ENN. We also examine the advantages of this algorithm when training the ENN for the classification and prediction of benchmark problems, and compare its performance with other well-known heuristic algorithms. The results indicate that the BBO algorithm is effective for training the Elman Neural Network. The rest of the paper is organized as follows: Section 2 presents a broad description of the Elman Neural Network (ENN); Section 3 explains the basic concepts of the BBO algorithm and its use in designing the ENN; Section 4 gives the experimental results; the last section draws the conclusions.

2. Elman Neural Network

The Elman Neural Network (ENN), proposed by Elman (1990), is designed with an input layer, a hidden layer, a recurrent link known as the context layer, and an output layer. It relies on the context layer, which holds a copy of the hidden-layer activations that are subsequently fed back as input. The main advantage of this layer is that it stores the information of the hidden layer and thus preserves a memory, giving the network more information at its input. As is well known, this simple recurrent network has many advantages, such as faster convergence, more accurate mapping and nonlinear prediction capability (Chandra, 2015). Let us assume that x_i (i = 1..m) is the input vector, y_k the output of the ENN and z_j (j = 1..n) the output of the hidden layer.
b_j and b_k are the biases of the hidden layer and the output layer, respectively. u_j denotes the context-layer neurons. w_ij is the weight connecting input node i to hidden node j, c_j is the weight connecting context node j to the hidden layer, and v_jk is the weight connecting hidden node j to output node k. The output of the hidden layer is computed as

z_j(t) = f\Big( \sum_{i=1}^{m} w_{ij} x_i(t) + c_j u_j(t) + b_j \Big)    (1)

where u_j is the context node value, calculated with the self-feedback coefficient \alpha as

u_j(t) = \alpha\, u_j(t-1) + z_j(t-1)    (2)

The activation function selected for the hidden layer is the sigmoid function, defined as

f(x) = \frac{1}{1 + e^{-x}}    (3)

The output of the ENN is given by

y_k(t) = \sum_{j=1}^{n} v_{jk} z_j(t) + b_k    (4)

The architecture of the ENN is presented in Figure 1.

Fig. 1. Elman Neural Network architecture.

3. BBO Trained Elman RNN

Biogeography-Based Optimization (BBO), proposed by Simon (2008), is an evolutionary algorithm (EA) based on the emigration and immigration of species between islands. The BBO algorithm has recently proved efficient at supplying globally optimal solutions to different problems (Ma et al., 2015; Mirjalili, Mirjalili, & Lewis, 2014; Rodan, Faris & Alqatawna, 2016; Zhang et al., 2019). The general idea of the algorithm is to model the relations between species through emigration, immigration and mutation. Similarly to the chromosomes of a GA, BBO represents candidate solutions as habitats; each habitat is described by a vector of habitants (the analogue of genes in a GA), which are the decision variables of the optimization problem. To evaluate solutions, BBO uses the Habitat Suitability Index (HSI) as a performance index. A high HSI represents a good solution hosting a large number of species, whose features are more likely to emigrate to islands with a low HSI; poor solutions have a low HSI and a higher immigration rate. The BBO algorithm is therefore characterized by its emigration, immigration and mutation rates.

The time complexity of BBO depends on the resources it uses; using the O-notation, it is expressed as an asymptotic upper bound. The computational complexity of the BBO algorithm depends on the number of species (habitats), the number of generations, the migration operator (selection of the solutions), the mutation operator and the search for the best solution. Therefore, at each iteration, the computational complexity of BBO is

O(BBO) = O(Initialization) + O(migration) + O(mutation) + O(best habitat).

The time complexity of the initialization is O(nmd), where d is the dimension of the habitants, m is the number of habitants and n is the number of habitats; in our implementation d equals one. In the migration operation, roulette-wheel selection is used to choose the candidate solution from which features immigrate, so the computational complexity of migration is O(mn^2). The mutation operation is applied to each habitant, so its computational complexity is O(nm). The selection of the best solution is based on the fitness value of each habitat, so its computational complexity is O(n^2). The final computational complexity of the proposed method is therefore O(BBO) = O(g(mn^2 + mn + n^2)), where g is the number of generations. In this expression, constant terms are ignored because they contribute only constant time.
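Before turning to the training procedure, the forward pass described by Eqs. (1)-(4) can be sketched in a few lines of Python. This is an illustrative sketch only: the array names (W, c, V, alpha, u0) are ours, the output layer is assumed linear as in the reconstructed Eq. (4), and one context weight per hidden node is assumed, following the notation above.

```python
import numpy as np

def sigmoid(x):
    # Eq. (3): logistic activation used in the hidden layer.
    return 1.0 / (1.0 + np.exp(-x))

def enn_forward(x_seq, W, c, V, b_hidden, b_out, alpha=0.0, u0=None):
    """Forward pass of an Elman network over a sequence of input vectors.

    x_seq : (T, m) inputs;   W : (m, n) input-to-hidden weights (w_ij)
    c     : (n,) context-to-hidden weights (c_j);   V : (n, p) hidden-to-output weights (v_jk)
    alpha : self-feedback coefficient of the context units
    u0    : (n,) initial context values (also optimized by BBO in the paper)
    """
    n = W.shape[1]
    u = np.zeros(n) if u0 is None else np.asarray(u0, dtype=float).copy()
    outputs = []
    for x in x_seq:
        z = sigmoid(x @ W + c * u + b_hidden)   # Eq. (1): hidden-layer output
        y = z @ V + b_out                       # Eq. (4): output layer (assumed linear)
        u = alpha * u + z                       # Eq. (2): context update with self-feedback
        outputs.append(y)
    return np.array(outputs)
```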
Generally, the main steps of BBO, as defined in Simon, Ergezer and Du (2009), are the following:

1. Initialize the habitat values and the BBO parameters.
2. Calculate the HSI of each island.
3. Update the immigration, emigration and mutation rates.
4. Modify the habitats according to these rates.
5. Mutate some randomly selected habitants of different habitats.
6. Save the best habitats as elites.
7. Replace the worst habitats with the elites.
8. If the stopping condition is satisfied, terminate; otherwise, repeat from step 2.

The BBO algorithm is applied to the ENN to find the best combination of biases and weights, based on two phases:

i. Encoding strategy: represent the weights and biases in a scheme (habitats) suitable for BBO.
ii. Fitness evaluation: calculate the HSI, a fitness function defined by the error of the ENN, to evaluate habitat performance.

Encoding scheme of the ENN trained by BBO

The optimization algorithm evolves the parameters of the ENNs. In BBO, a habitat is encoded as a vector, defined for the example network of Figure 2 as

ENN = [ W12  W32  W24  b1  b2 ].

Fig. 2. ENN with the structure 1-1-1.

In the example of Figure 2, each layer (input, hidden and output) consists of a single node. W12 denotes the weight between the input node and the hidden node, W32 the weight between the context node and the hidden node, and W24 the weight between the hidden node and the output node; b1 and b2 are the bias values of the hidden node and the output node, respectively. In general, the encoding vector contains the list of weights between the input and hidden layers, the list of weights between the context and hidden layers, the list of weights between the hidden and output layers, and the bias values.

Fitness function

Training RNNs remains a challenging optimization problem, so each candidate solution must be evaluated by a fitness measure. For each individual, the HSI function is assigned according to the desired optimization. In this work, the Mean Square Error (MSE) of the network output is used as the HSI function:

MSE = \frac{1}{S} \sum_{i=1}^{S} \sum_{k=1}^{m} \big( y_k^i - d_k^i \big)^2    (5)

where S is the number of training samples, m denotes the number of outputs, y_k^i is the obtained output and d_k^i the desired output of the k-th output unit for the i-th input sample. The proposed algorithm aims to minimize this error. The computational complexity of the whole method can be written as

O(BBO-ENN) = O\big( i \, ( x(z + y) + hH^2 + Hh + H^2 ) \big)    (6)

where i is the number of iterations, x is the number of input training samples, z and y are the numbers of nodes in the hidden layer and the output layer respectively, h is the number of habitants (weights and biases) and H is the number of habitats (ENNs). Here H^2 is the elitism complexity, Hh the mutation complexity, hH^2 the migration complexity and x(z + y) the complexity of evaluating the ENN. The proposed BBO-ENN model is given in Figure 3.

Fig. 3. BBO-ENN model.

The first step of the proposed model is to generate a random set of ENNs as habitats and to initialize the weights and biases (habitants) randomly. The second step is to calculate the MSE of each ENN by Eq. (5), in order to distinguish the best habitats from the worst. The third step is to update the emigration, immigration and mutation rates. Once the good and poor solutions have been identified, the different islands are combined through migration and some habitats are selected in which habitants are mutated; a sketch of the encoding and fitness evaluation is given below.
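As an illustration of the two phases above, the following sketch encodes an ENN into a habitat vector and evaluates Eq. (5) as the HSI. The flattening order and the helper names are ours, and enn_forward refers to the forward-pass sketch given earlier; this is not the author's actual implementation.

```python
import numpy as np

def encode_enn(W, c, V, b_hidden, b_out):
    # Flatten all weights and biases into a single habitat vector
    # (input-to-hidden, context-to-hidden, hidden-to-output, biases).
    return np.concatenate([W.ravel(), c, V.ravel(), b_hidden, b_out])

def decode_enn(habitat, m, n, p):
    # Inverse of encode_enn for an m-n-p Elman network.
    i = 0
    W = habitat[i:i + m * n].reshape(m, n); i += m * n
    c = habitat[i:i + n];                   i += n
    V = habitat[i:i + n * p].reshape(n, p); i += n * p
    b_hidden = habitat[i:i + n];            i += n
    b_out = habitat[i:i + p]
    return W, c, V, b_hidden, b_out

def hsi(habitat, x_seq, d_seq, m, n, p, alpha=0.0):
    # Eq. (5): mean squared error of the decoded ENN over the training set,
    # used as the Habitat Suitability Index (lower is better).
    # enn_forward is the forward-pass sketch shown after Section 3.
    W, c, V, bh, bo = decode_enn(habitat, m, n, p)
    y = enn_forward(x_seq, W, c, V, bh, bo, alpha=alpha)
    return np.mean(np.sum((y - d_seq) ** 2, axis=1))
```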
The last step is to keep the best solutions as elites for future generations. These steps are repeated until the stopping condition, which can be a number of iterations or an error rate, is satisfied. Figure 4 presents a conceptual picture of the BBO-ENN.

Fig. 4. Conceptual picture of the BBO-ENN model.

As seen in this figure, there are three habitats (ENNs). Habitat 1 has the lowest HSI (i.e. the lowest MSE error), the highest emigration rate and the lowest immigration rate; it represents a good solution, so it is more likely to share its weights and biases with Habitat 2 and Habitat 3. Habitat 2 has the highest HSI, the lowest emigration rate and the highest immigration rate; it represents a poor solution, so it is more likely to accept shared features (weights and biases) from Habitat 1 and Habitat 3. Theoretically, the proposed BBO-ENN model can improve the training phase thanks to the emigration and immigration rates, evolutionary mechanisms specific to each habitat that encourage exploration; BBO is thus less likely to fall into local optima. In addition, thanks to the migration of better weights and biases towards the worse ENNs, the MSE error (HSI) of a habitat can improve over the generations. The mutation mechanism further provides each habitat with additional exploitation capability. Finally, the elitism phase allows the proposed method to keep some of the best solutions, which are never lost. Having described the theoretical functioning of the proposed method, the following section presents the practical results, followed by a comparative study of the different algorithms.

4. Experiments

To verify the performance of the BBO algorithm for training the Elman NN, it is compared with PSO, GA, ACO, ES and PBIL over four benchmark problems: Breast Cancer (Wolberg & Mangasarian, 1990) and Iris (Fisher, 1936) for classification, and the Mackey-Glass (Mackey & Glass, 1977) and Lorenz attractor (Lorenz, 1963) series for time series prediction. The classification datasets are evaluated on two performance criteria: (a) the MSE value and (b) the classification accuracy. Increasing the population size and the number of iterations could improve the performance of the algorithms, but in this work we are interested in comparing the six algorithms over a fixed number of iterations; we therefore do not search for the best parameters, and simply use the same network settings, such as the number of nodes, the weight initialization range and the population size, for all methods. In this architecture, the log-sigmoid is used as the activation function. For all algorithms, the habitats are initialized randomly in the range [-10, 10]. The population size is 200 for each dataset. For all the experiments, the performance was computed over 30 runs of 300 generations for each method. According to Shamsuddin (2004), there is no standard rule for determining the suitable number of hidden nodes; we fixed it using the rule that one hidden layer with 2N + 1 hidden neurons is sufficient for N inputs (see the sketch below).
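For illustration only, the shared experimental settings described above can be collected as follows; the dictionary layout and the helper name are ours, and the hidden-layer sizes follow from the 2N + 1 rule.

```python
# Shared settings used for all six training algorithms, as reported in the text.
COMMON = {
    "population_size": 200,
    "generations": 300,
    "runs": 30,
    "init_range": (-10.0, 10.0),
}

def hidden_nodes(n_inputs):
    # Rule used in the paper: one hidden layer with 2N + 1 neurons for N inputs.
    return 2 * n_inputs + 1

# The hidden-layer sizes of Table 1 follow from the rule above:
datasets = {"Iris": 4, "Breast Cancer": 9, "Mackey-Glass": 4, "Lorenz": 3}
structures = {name: (n, hidden_nodes(n)) for name, n in datasets.items()}
# -> Iris: (4, 9), Breast Cancer: (9, 19), Mackey-Glass: (4, 9), Lorenz: (3, 7)
```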
Table 1 shows the number of input, hidden and output nodes used for each dataset.

Table 1. Structure of each dataset

Dataset          Input nodes   Hidden nodes   Output nodes
Iris             4             9              3
Breast Cancer    9             19             1
Mackey-Glass     4             9              1
Lorenz           3             7              1

The initial parameters of the meta-heuristic algorithms are fixed in Table 2, which shows the initialization settings of the optimization methods. All parameter values are chosen from the literature.

Table 2. Parameter settings of the algorithms

Method   Parameter                            Value
BBO      Max immigration/emigration rate      1
         Mutation probability                 0.005
         Immigration bounds per gene          [0, 1]
ACO      Initial pheromone (tau_0)            1e-06
         Pheromone update constant (Q)        20
         Pheromone constant (q_0)             1
         Global pheromone decay rate (p_g)    0.9
         Local pheromone decay rate (p_t)     0.5
         Pheromone sensitivity (alpha)        1
         Visibility sensitivity (beta)        5
GA       Selection mechanism                  Roulette wheel
         Crossover probability                1
         Mutation probability                 0.01
PSO      Cognitive constant (c_1)             1
         Social constant (c_2)                1
         Inertia weight (w)                   0.3
ES       Lambda                               10
         Sigma                                1
PBIL     Learning rate                        0.05
         Elitism parameter                    1
         Mutation probability                 0.1

A. Breast Cancer

This dataset was obtained from the UCI Machine Learning Repository. It contains 699 instances and 9 attributes, with 458 benign and 241 malignant instances. The first 599 patterns are used for the training phase and the remaining ones for testing. The convergence of the different algorithms is presented in Figure 5, and Table 3 presents their experimental results.

Table 3. Experimental results for the Breast Cancer dataset

Algorithm   MSE error   Accuracy (%)
BBO         0.0024175   99.99
GA          0.0025149   98.45
PSO         0.0043705   94.50
ACO         0.0073633   76.25
ES          0.0062843   73.00
PBIL        0.032001    3.99

From Table 3 it can be seen that the MSE value of BBO-ENN is lower than those of the PSO, GA, ACO, ES and PBIL algorithms, which demonstrates the efficacy of BBO-ENN for data classification. The proposed algorithm achieves the smallest MSE (0.0024175) and the highest accuracy (99.99%). The other methods (PSO, ACO, ES and PBIL) converge with larger MSE and lower accuracy, whereas the MSE value of GA is close to that of BBO. As shown in Figure 5, the BBO technique has the fastest and lowest convergence curve among all the methods for Breast Cancer. From these simulation results, the BBO algorithm proves its superiority in terms of MSE and accuracy.

Fig. 5. Convergence of the Elman RNN on the Breast Cancer dataset.

B. Iris dataset

The Iris plants dataset contains 150 samples and four attributes (sepal length, sepal width, petal length, petal width), grouped into three classes: Setosa, Versicolour and Virginica. In this experiment, we used four inputs, nine hidden nodes and three outputs. The first 150 patterns are selected for the training phase and the remaining 150 for testing. Table 4 presents the results of the training algorithms and compares the performance of BBO-ENN with the GA, PSO, ACO, ES and PBIL algorithms. From Table 4 it can be seen that the proposed algorithm achieves the lowest MSE (0.017371) and the highest accuracy (93.96%).

Table 4. Experimental results for the Iris dataset

Algorithm   MSE error   Accuracy (%)
BBO         0.017371    93.96
GA          0.029781    91.22
PSO         0.21166     66.53
ACO         0.40017     38.56
ES          0.30877     66.00
PBIL        0.1116      57.66
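The paper reports accuracy alongside MSE but does not spell out how class labels are read off the network outputs; a plausible, purely illustrative convention is sketched below (argmax over the three Iris output nodes, a 0.5 threshold on the single Breast Cancer output). Both helper names are ours.

```python
import numpy as np

def accuracy_multiclass(y_pred, y_true_onehot):
    # Iris-style case: three output nodes, predicted class = largest output.
    pred = np.argmax(y_pred, axis=1)
    true = np.argmax(y_true_onehot, axis=1)
    return 100.0 * np.mean(pred == true)

def accuracy_binary(y_pred, y_true, threshold=0.5):
    # Breast-Cancer-style case: a single output thresholded at 0.5.
    pred = (y_pred.ravel() >= threshold).astype(int)
    return 100.0 * np.mean(pred == y_true.ravel().astype(int))
```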
Fig. 6. Convergence of the Elman RNN on the Iris dataset.

Figure 6 shows the convergence of each algorithm and illustrates the success of BBO compared with the other methods. From these results, the BBO algorithm achieves the highest performance.

C. Mackey-Glass time series prediction

The Mackey-Glass time series is defined by the following equation:

\frac{dx(t)}{dt} = \frac{a\, x(t-\tau)}{1 + x^{c}(t-\tau)} - b\, x(t)    (7)

In our work, the input of the ENN consists of four data points, x(t), x(t-6), x(t-12) and x(t-18), and the output is defined by

x(t+6) = f\big( x(t), x(t-6), x(t-12), x(t-18) \big)    (8)

The first 500 samples are selected for the training phase and the remaining 500 for testing. After 300 generations of the training process, the convergence of the different algorithms is as presented in Figure 7. Table 5 compares the MSE error of BBO-ENN with the other meta-heuristic algorithms. In this experiment, BBO and GA achieve the smallest test MSE errors (0.009702 and 0.009851, respectively), and in some runs the GA and BBO values are equal. BBO-ENN nevertheless remains the more promising method because its convergence to the best solution is faster than that of the other methods.

Table 5. Experimental results for the Mackey-Glass dataset

Algorithm   MSE training error   MSE test error
BBO         0.00925              0.009702
PSO         0.01043              0.071985
GA          0.01130              0.009851
ACO         0.04851              0.093598
ES          0.04748              0.082876
PBIL        0.02293              0.091885

Fig. 7. Convergence of the Elman RNN on the Mackey-Glass dataset.

D. Lorenz attractor

The Lorenz system is given by the following differential equations:

\frac{dx(t)}{dt} = \sigma \big( y(t) - x(t) \big), \qquad
\frac{dy(t)}{dt} = x(t) \big( \rho - z(t) \big) - y(t), \qquad
\frac{dz(t)}{dt} = x(t)\, y(t) - \beta z(t)    (9)

where \sigma, \rho and \beta are positive real parameters. In these three equations, the component x is used as the time series. In this work, the input of the ENN is defined by x(t), x(t-1) and x(t-2), and the output is given by

x(t+1) = f\big( x(t), x(t-1), x(t-2) \big)    (10)

The first 500 samples of 1000 simulated data points are chosen for the training phase and the remaining 500 for testing. The convergence curve of each algorithm is summarized in Figure 8, and Table 6 lists the MSE training and testing errors of each algorithm. From this table it can be seen that BBO achieves the lowest MSE errors for both the training and testing phases (0.14278 and 0.241291). The other algorithms (PSO, GA, ACO, ES and PBIL) have training MSEs of 0.21271, 0.48498, 1.27504, 0.29753 and 0.28753, considerably larger than BBO. Figure 8 likewise shows the MSE convergence for the Lorenz problem and demonstrates that the proposed BBO-ENN obtains better results than the other algorithms; BBO-ENN again shows its efficiency for the prediction of the Lorenz time series.

Table 6. Experimental results for the Lorenz dataset

Algorithm   MSE training error   MSE test error
BBO         0.14278              0.241291
PSO         0.21271              0.253047
GA          0.48498              0.242105
ACO         1.27504              0.962869
ES          0.29753              0.293404
PBIL        0.28753              0.283424

Fig. 8. Convergence of the Elman RNN on the Lorenz dataset.

Throughout these experiments, BBO shows good performance compared with the other applied algorithms (a sketch of how the two chaotic series can be generated numerically is given below).
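For reproducibility, the two chaotic series can be generated numerically as sketched below. The parameter values (a = 0.2, b = 0.1, c = 10, tau = 17 for Mackey-Glass; sigma = 10, rho = 28, beta = 8/3 for Lorenz) are the commonly used ones and are assumptions on our part, since the paper does not list them, and simple Euler integration is used purely for illustration.

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=10, tau=17, dt=1.0, x0=1.2):
    # Euler integration of Eq. (7): dx/dt = a*x(t-tau)/(1 + x(t-tau)**c) - b*x(t).
    history = int(tau / dt)
    x = np.full(n + history, x0)
    for t in range(history, n + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t])
    return x[history:]

def lorenz(n, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01, state=(1.0, 1.0, 1.0)):
    # Euler integration of Eq. (9); only the x component is kept as the time series.
    x, y, z = state
    xs = np.empty(n)
    for t in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[t] = x
    return xs

# Illustrative usage: 1000 points each, split 500 training / 500 testing as in the paper.
mg_series = mackey_glass(1000)
lorenz_series = lorenz(1000)
```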
The obtained results can be explained by the philosophy of the BBO technique compared with the other evolutionary algorithms. During the generations, the BBO solutions are maintained and weighted by their emigration rates; at each iteration, BBO improves the habitats by changing some of their features, and the poor solutions can be improved by the good solutions, which share their suitability index variables (SIVs, i.e. attributes). In the GA, ACO and PBIL techniques, by contrast, the worse solutions are discarded from the population and only the best candidate solutions are maintained, so the population evolves using the elite solutions. BBO is also clearly similar to the PSO and DE approaches in the way it maintains solutions: each solution learns from its neighbours and evolves based on the movement of the surrounding particles.

Conclusion

In this work, a Biogeography-Based Optimization (BBO) algorithm is proposed to train the Elman Neural Network (ENN) on four benchmark problems. The experimental results show that the BBO-ENN model can effectively classify data such as the Breast Cancer and Iris datasets. The method was also applied to the Mackey-Glass and Lorenz equations, which produce chaotic time series. The statistical results show that the proposed algorithm outperforms the GA, PSO, ES, ACO and PBIL algorithms. The performance and success of BBO-ENN are mainly due to the Biogeography-Based Optimization algorithm, which successfully optimizes the weight parameters of the Elman Neural Network; BBO-ENN succeeds in terms of convergence time and performs well at avoiding local minima. Although BBO shows good performance when applied to classification and time series prediction, it inherently lacks the exploration ability needed to increase the diversity of habitats, which slows down the convergence of the algorithm. The expansion of BBO algorithms to many types of problems opens several research areas: one suggestion for future work is to automate parameter tuning, and another is to apply the BBO algorithm to more complicated problems.

Acknowledgment

The authors extend their appreciation to the Deanship of Scientific Research and RSSU at King Saud University for funding this work through research group no. RG-1438-071.

References

Baldi, P., & Pollastri, G. (2003). The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research, 4(4), 575-602. https://doi.org/10.1162/153244304773936054

Cao, Q., Ewing, B. T., & Thompson, M. A. (2012). Forecasting wind speed with recurrent neural networks. European Journal of Operational Research, 221(1), 148-154. https://doi.org/10.1016/j.ejor.2012.02.042

Chandra, R. (2015). Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Transactions on Neural Networks and Learning Systems, 26(12), 3123-3136. https://doi.org/10.1109/TNNLS.2015.2404823

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211. https://doi.org/10.1016/0364-0213(90)90002-E

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2), 179-188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79(8), 2554-2558. https://doi.org/10.1073/pnas.79.8.2554
Hordri, N. F., Yuhaniz, S. S., & Nasien, D. (2013). A comparison study of biogeography-based optimization for optimization problems. International Journal of Advances in Soft Computing and its Applications, 5(1), 1-16.

Hori, T., Hori, C., Watanabe, S., & Hershey, J. R. (2016). Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 5990-5994. https://doi.org/10.1109/ICASSP.2016.7472827

Jordan, M. I. (1997). Serial order: A parallel distributed processing approach. Advances in Psychology, 121, 471-495. https://doi.org/10.1016/S0166-4115(97)80111-2

Karaboga, D., & Basturk, B. (2007). A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39, 459-471. https://doi.org/10.1007/s10898-007-9149-x

Kawada, K., Yamamoto, T., & Mada, Y. (2004). A design of evolutionary recurrent neural-net based controllers for an inverted pendulum. 5th Asian Control Conference (IEEE Cat. No.04EX904), 3, 1419-1422.

Kechriotis, G., Zervas, E., & Manolakos, E. S. (1994). Using recurrent neural networks for adaptive communication channel equalization. IEEE Transactions on Neural Networks, 5(2), 267-278. https://doi.org/10.1109/72.279190

Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN'95 - International Conference on Neural Networks, Perth, WA, Australia, 4, 1942-1948. https://doi.org/10.1109/ICNN.1995.488968

Koskela, T., Lehtokangas, M., Saarinen, J., & Kaski, K. (1996). Time series prediction with multilayer perceptron, FIR and Elman neural networks. Proceedings of the World Congress on Neural Networks, 491-496. https://pdfs.semanticscholar.org/82c8/e5d0cd4a7467f7f54ad823b2136b973eeb6e.pd

Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 49-60. https://doi.org/10.1109/21.87054

Li, X. L., Shao, Z. J., & Qian, J. X. (2002). An optimizing method based on autonomous animats: Fish-swarm algorithm. Systems Engineering - Theory & Practice, 22(11), 32-38. https://doi.org/10.12011/1000-6788(2002)11-32

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130-141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

Ma, H., Fei, M., Simon, D., & Chen, Z. (2015). Biogeography-based optimization in noisy environments. Transactions of the Institute of Measurement and Control, 37(2), 190-204. https://doi.org/10.1177/0142331214537015
Mackey, M. C., & Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300), 287-289. https://doi.org/10.1126/science.267326

Marvuglia, A., & Messineo, A. (2012). Using recurrent artificial neural networks to forecast household electricity consumption. Energy Procedia, 14, 45-55.

Mehrabian, A. R., & Lucas, C. (2006). A novel numerical optimization algorithm inspired from weed colonization. Ecological Informatics, 1(4), 355-366. https://doi.org/10.1016/j.ecoinf.2006.07.003

Mirjalili, S., Mirjalili, S., & Lewis, A. (2014). Let a biogeography-based optimizer train your multi-layer perceptron. Information Sciences, 269, 188-209. https://doi.org/10.1016/j.ins.2014.01.038

Müller-Navarra, M., Lessmann, S., & Voß, S. (2015). Sales forecasting with partial recurrent neural networks: Empirical insights and benchmarking results. 48th Hawaii International Conference on System Sciences, Kauai, HI, 1108-1116. https://doi.org/10.1109/HICSS.2015.135

Nawi, N. M., Khan, A., Rehman, G., Syed, M., Chiroma, H., & Herawan, T. (2015). Weight optimization in recurrent neural networks with hybrid metaheuristic Cuckoo Search techniques for data classification. Mathematical Problems in Engineering. https://doi.org/10.1155/2015/868375

Osman, I. H., & Laporte, G. (1996). Metaheuristics: A bibliography. Annals of Operations Research, 63, 511-623. https://doi.org/10.1007/BF02125421

Palafox, L., & Iba, H. (2012). On the use of population based incremental learning to do reverse engineering on gene regulatory networks. IEEE Congress on Evolutionary Computation, Brisbane, QLD, 1-8. https://doi.org/10.1109/CEC.2012.6256580

Pan, W.-S. (2012). A new fruit fly optimization algorithm: Taking the financial distress model as an example. Knowledge-Based Systems, 26, 69-74. https://doi.org/10.1016/j.knosys.2011.07.001

Passino, K. M. (2002). Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems Magazine, 22(3), 52-67. https://doi.org/10.1109/MCS.2002.1004010

Pham, D. T., & Karaboga, D. (1999). Training Elman and Jordan networks for system identification using genetic algorithms. Artificial Intelligence in Engineering, 13(2), 107-117. https://doi.org/10.1016/S0954-1810(98)00013-2

Robinson, T., & Fallside, F. (1991). A recurrent error propagation network speech recognition system. Computer Speech & Language, 5(3), 259-274. https://doi.org/10.1016/0885-2308(91)90010-N

Rodan, A., Faris, H., & Alqatawna, J. (2016). Optimizing feedforward neural networks using biogeography based optimization for e-mail spam identification. International Journal of Communications, Network and System Sciences, 9, 19-28. https://doi.org/10.4236/ijcns.2016.91002
Senjyu, T., Yona, A., Urasaki, N., & Funabashi, T. (2006). Application of recurrent neural network to long-term-ahead generating power forecasting for wind power generator. IEEE PES Power Systems Conference and Exposition, Atlanta, GA, 1260-1265. https://doi.org/10.1109/PSCE.2006.296487

Shamsuddin, M. (2004). Lecture note advanced artificial intelligence: Number of hidden neurons [Unpublished doctoral thesis]. Universiti Teknologi Malaysia.

Sharma, R., Kumar, V., Gaur, P., & Mittal, A. P. (2016). An adaptive PID like controller using mix locally recurrent neural network for robotic manipulator with variable payload. ISA Transactions, 62, 258-267. https://doi.org/10.1016/j.isatra.2016.01.016

Simon, D. (2008). Biogeography-based optimization. IEEE Transactions on Evolutionary Computation, 12(6), 702-713. https://doi.org/10.1109/TEVC.2008.919004

Simon, D., Ergezer, M., & Du, D. (2009). Population distributions in biogeography-based optimization algorithms with elitism. 2009 IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, 991-996. https://doi.org/10.1109/ICSMC.2009.5346058

Simon, D., Rarick, R., Ergezer, M., & Du, D. (2011). Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms. Information Sciences, 181(7), 1224-1248. https://doi.org/10.1016/j.ins.2010.12.006

Storn, R., & Price, K. (1997). Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11, 341-359. https://doi.org/10.1023/A:1008202821328

Venayagamoorthy, G. K., Welch, R. L., & Ruffing, S. M. (2009). Comparison of feedforward and feedback neural network architectures for short term wind speed prediction. IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks, 3141-3146. https://doi.org/10.1109/IJCNN.2009.5179034

Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences of the United States of America, 87(23), 9193-9196. https://doi.org/10.1073/pnas.87.23.9193

Xiao, P., Venayagamoorthy, G. K., & Corzine, K. A. (2007). Combined training of recurrent neural networks with particle swarm optimization and backpropagation algorithms for impedance identification. IEEE Swarm Intelligence Symposium, Honolulu, HI, 9-15. https://doi.org/10.1109/SIS.2007.368020
Yang, X.-S. (2009). Firefly algorithms for multimodal optimization. International Symposium on Stochastic Algorithms, 169-178. https://doi.org/10.1007/978-3-642-04944-6_14

Yang, X.-S. (2010). A new metaheuristic bat-inspired algorithm. In J. R. González, D. A. Pelta, C. Cruz, G. Terrazas, & N. Krasnogor (Eds.), Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), Studies in Computational Intelligence, vol. 284. Springer.

Yang, X.-S., & Deb, S. (2009). Cuckoo search via Lévy flights. World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 210-214. https://doi.org/10.1109/NABIC.2009.5393690

Yoo, D. G., & Kim, J. H. (2014). Meta-heuristic algorithms as tools for hydrological science. Geoscience Letters, 1, 4. https://doi.org/10.1186/2196-4092-1-4

Zhang, X., Kang, Q., Tu, Q., Cheng, J., & Wang, X. (2019). Efficient and merged biogeography-based optimization algorithm for global optimization problems. Soft Computing, 23, 4483-4502. https://doi.org/10.1007/s00500-018-3113-1

Zhipeng, Y., Minfang, P., Hao, H., & Xianfeng, L. (2012). Fault locating of grounding grids based on ant colony optimizing Elman neural network. 2012 Third International Conference on Digital Manufacturing & Automation, GuiLin, 406-409. https://doi.org/10.1109/ICDMA.2012.97