Levy Enhanced Cross Entropy-based Optimized Training of Feedforward Neural Networks

Kartik Pandya, M & V Patel Department of Electrical Engineering, FTE, CSPIT, CHARUSAT, Changa, India, kartikpandya.ee@charusat.ac.in
Dharmesh Dabhi, M & V Patel Department of Electrical Engineering, FTE, CSPIT, CHARUSAT, Changa, India, dharmeshdabhi.ee@charusat.ac.in
Pratik Mochi, M & V Patel Department of Electrical Engineering, FTE, CSPIT, CHARUSAT, Changa, India
Vipul Rajput, Department of Electrical Engineering, Dr. Jivraj Mehta Institute of Technology, Mogar, India, vipulrajput16986@gmail.com

Corresponding author: Kartik Pandya

Received: 11 July 2022 | Revised: 27 July 2022 | Accepted: 1 August 2022

Abstract-An Artificial Neural Network (ANN) is one of the most powerful tools for predicting the behavior of a system on unforeseen data. The feedforward neural network is the simplest, yet most efficient, topology and is widely used in the computer industry. Training feedforward ANNs is an integral part of any ANN-based system. Typically, an ANN system has inherent non-linearity, with multiple parameters such as weights and biases that must be optimized simultaneously. To solve such a complex optimization problem, this paper proposes the Levy Enhanced Cross Entropy (LE-CE) method, a population-based meta-heuristic method. Unlike traditional meta-heuristic methods, in each iteration this method produces a "distribution" of prospective solutions and updates the parameters of that distribution to obtain the optimal solutions. As a result, it reduces the chances of getting trapped in local minima, which is the typical drawback of any AI method. To further improve the global exploration capability of the CE method, it is combined with the Levy flight, which takes large step lengths during intermediate iterations. The performance of the LE-CE method is compared with state-of-the-art optimization methods, and the results show its superiority. A statistical ANOVA test confirms that the proposed LE-CE is statistically superior to the other algorithms.

Keywords-artificial neural networks; cross entropy method; feedforward neural networks; Levy step; training

I. INTRODUCTION

The Artificial Neural Network (ANN) [1] is one of the most popular Artificial Intelligence methods. It is inspired by the communication and computation abilities of the human brain and mimics its learning mechanisms to find relationships between the input and output (target) variables of a test system. The human brain consists of billions of neurons interconnected in a complex network that takes input signals to perform voice recognition, image classification, etc. with remarkable speed and accuracy. Similarly, an ANN may consist of many neurons that receive input signals through connection links. Every connection link has an associated weight, which is multiplied with the signal. The weights are the control variables of the optimization problem. Weights and biases are updated in every iteration to finally obtain the optimal values that minimize the Mean Square Error (MSE) function. This procedure is known as the training (learning) of an ANN.
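To make this training objective concrete, the following minimal Python sketch evaluates a small two-layer feedforward network and the MSE that training must minimize. The layer sizes, function names, and random data here are illustrative assumptions, not the exact network used later in this paper.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """One feedforward pass: input -> sigmoid hidden layer -> linear output."""
    hidden = 1.0 / (1.0 + np.exp(-(w1 @ x - b1)))  # sigmoid hidden activations
    return w2 @ hidden - b2                        # output node(s)

def mse(params, inputs, targets):
    """Mean square error over all training samples: the training objective."""
    w1, b1, w2, b2 = params
    errors = [np.sum((forward(x, w1, b1, w2, b2) - d) ** 2)
              for x, d in zip(inputs, targets)]
    return sum(errors) / len(errors)

# Toy example: 2 inputs, 3 hidden nodes, 1 output (shapes for illustration only).
rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 2)), rng.normal(size=3),
          rng.normal(size=(1, 3)), rng.normal(size=1))
X = [rng.normal(size=2) for _ in range(5)]
D = [rng.normal(size=1) for _ in range(5)]
print(mse(params, X, D))  # the quantity LE-CE will minimize
```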
A. Related Work

Feedforward Neural Networks (FNNs) with two layers are a widely used ANN topology [2]. It has been proved that two-layer FNNs can approximate any linear or nonlinear function with good accuracy [3]. Training is an integral part of FNNs, and various optimization methods have been suggested in the literature to train them. Back Propagation (BP) is a popular gradient-based classical optimization method for training FNNs, but it suffers from slow convergence [4] and may get trapped in local minima [5]. Newton's method [6] is another classical method, with quadratic convergence that largely depends on the choice of the initial starting point (guess): a wrong initial guess leads to a local optimum of the problem under consideration. Over the last two decades, population-based meta-heuristic methods that solve the FNN training problem effectively have emerged. Authors in [7] proposed the use of ANNs to detect the vibration of the rotor shaft of a gas turbine. Authors in [8] suggested the use of the Genetic Algorithm (GA) to precisely recognize sign gestures using feature extraction, but its mutation strategy makes GA very time-consuming, it is preferred for binary solution sets, and it is vulnerable to premature convergence. Authors in [9] proposed a new approach to the optimal design of an FNN using self-adaptive penalty functions. Authors in [10] proposed a Particle Swarm Optimization (PSO)-based neuro-fuzzy model to enhance the dynamic voltage stability of a wind-connected grid. Authors in [11] proposed a PSO-powered back-propagation neural network load-shedding strategy for the post-fault condition in a microgrid. Another AI method is the Gravitational Search Algorithm (GSA) [12], which is inspired by the law of gravity and mass interactions. Even though it is a simple method, the imbalance between the application of the gravitational law and the mass interactions may cause premature convergence. The Differential Evolution (DE) method [13] is another powerful method, which uses mutation, crossover, and selection operators on various vectors to obtain optimal solutions, but the choice of the crossover rate greatly affects its convergence. PSO [14] is one of the most popular meta-heuristic methods, as it is easy to implement and has few parameters to be tuned.

B. Aims and Objectives

The related work (literature survey) reveals that many classical optimization methods suffer from premature convergence, whereas artificial intelligence methods are population-based with randomly generated initial solutions and, as a result, give different optimal solutions in each simulation run. The aim and objective of this research is therefore to suggest a solution methodology that trains an ANN more effectively and with better accuracy. This paper proposes the Levy Enhanced Cross Entropy (LE-CE) optimization method, which is an extension of the Cross Entropy (CE) method. The contributions of this paper are summarized below.

C. Research Outline

• The LE-CE method is proposed for the optimization of the weights and biases of the ANN with the aim to minimize the MSE function.
• The incorporation of the Levy flight increases the global exploration capability of the CE method, which improves the quality of the solution.
• The LE-CE method has fewer parameters to be tuned, so it is a fast method.
• The practical Iris classification system is used to check the effectiveness of the LE-CE method.
• The performance of the LE-CE method is compared with the WCCI 2018 award-winning EVDEPSO [15] and the GECCO 2019 award-winning HL_PS_VNSO [16] computational intelligence methods.
• The LE-CE method outperformed the compared methods in terms of optimal solutions.
• The ANOVA statistical test and Tukey's HSD test also proved that the proposed method is statistically different from the other compared methods.

II. LEVY ENHANCED CROSS ENTROPY METHOD FOR TRAINING FNNs AND ENCODING STRATEGY

The CE method was proposed by Rubinstein [17]. It is a population-based meta-heuristic optimization method, similar to Differential Evolution. However, unlike traditional meta-heuristic methods, which directly update prospective solutions (particles) to obtain sub-optimal solutions, this method produces a distribution of prospective solutions and updates the parameters of that distribution, such as the mean and standard deviation of each dimension, to obtain sub-optimal solutions. As a result, it decreases the probability of getting stuck in local minima. It has very few parameters to be tuned and can be easily implemented. The basics of the LE-CE method are explained below.

The population of particles is randomly generated and obeys the probability density function (pdf) $f(\cdot;\Phi)$, where $\Phi$ is the vector of parameters to be optimized. Generally, $f(\cdot;\Phi)$ is the Gaussian distribution parameterized by its mean $m$ and variance $\sigma^2$, i.e. $\Phi = (m, \sigma^2)$. Secondly, a threshold value $\chi$ of the fitness function is selected and only those particles whose fitness values are less than the set threshold, i.e. $f(x) < \chi$, are considered in the subsequent iterations. Such particles are known as elite particles $\mu_e$. Then, the new parameterized distribution function $f(\cdot;\phi_n)$ over the elite particles is updated in such a way that it coincides with the target distribution function $f(\cdot;\phi^*)$ by minimizing the Kullback-Leibler (KL) divergence. This completes one iteration. In the subsequent iterations, a family of distribution functions $f(\cdot;\phi^{(1)}), \ldots, f(\cdot;\phi^{(*)})$ is produced in accordance with $\chi^{(1)}, \ldots, \chi^{(*)}$ to reach the sub-optimal distribution function $f(\cdot;\phi^*)$. The following steps show the step-by-step implementation of LE-CE.

1) Step 1: Initialization of Mean and Standard Deviation for Each Dimension

Assume iteration $iter = 0$, total number of iterations $iter_{max} = 500$, number of particles $N$, problem dimension (control variables) $D$, elite particles $\mu_e$ (20% of $N$), and smoothing parameters $\alpha$ and $\beta$. $m_e^{(iter)} \in \mathbb{R}^D$ is the mean of the search distribution for each dimension at iteration $iter$, and $\sigma^{(iter)} \in \mathbb{R}^+$ is the standard deviation at iteration $iter$. $x_{min}$ and $x_{max}$ are the minimum and maximum limits of the particles in each of the $D$ dimensions. The mean and standard deviation of the population are initialized as:

$m_e^{(0)} = 0.5\,(x_{min} + x_{max})$    (1)

$\sigma^{(0)} = 0.25\,(x_{max} - x_{min})$    (2)

2) Step 2: Generate the Population of Particles

The population of particles is generated from the sampling distribution $\mathrm{Normal}\big(m_e^{(iter)}, (\sigma^{(iter)})^2\big)$ as:

$x_i^{(iter+1)} = m_e^{(iter)} + \sigma^{(iter)}\,\mathrm{randn()}$ for $i = 1, \ldots, N$    (3)
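For illustration, Steps 1 and 2 translate almost directly into Python. This minimal sketch assumes box constraints shared across all dimensions and uses variable names of our choosing:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_distribution(x_min, x_max):
    """Eqs. (1)-(2): initialize the mean and standard deviation per dimension."""
    m = 0.5 * (x_min + x_max)
    sigma = 0.25 * (x_max - x_min)
    return m, sigma

def sample_population(m, sigma, N):
    """Eq. (3): draw N particles from Normal(m, sigma^2), dimension-wise."""
    D = m.shape[0]
    return m + sigma * rng.standard_normal((N, D))

# Example: D = 10 control variables, all bounded in [-1, 1].
x_min, x_max = -np.ones(10), np.ones(10)
m, sigma = init_distribution(x_min, x_max)
population = sample_population(m, sigma, N=50)   # array of shape (50, 10)
```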
where $N$ is the population size, $x_i^{(iter+1)} \in \mathbb{R}^D$ is the $i$-th particle obtained at iteration $iter+1$, $m_e^{(iter)} \in \mathbb{R}^D$ is the mean of the search distribution for each dimension at iteration $iter$, $\sigma^{(iter)} \in \mathbb{R}^+$ is the step size (standard deviation) at iteration $iter$, and randn() is a normally distributed random variable with parameters Normal(0, 1).

3) Step 3: Fitness Function Evaluations

The fitness values of the whole population are determined as follows. The FNN consists of $n_i$ input nodes, $h_i$ hidden nodes, and $o_t$ output nodes. In each iteration of learning, the output of each hidden node is calculated using (4):

$f(y_k) = \dfrac{1}{1 + \exp\left(-\left(\sum_{j=1}^{n_i} w_{jk}\,x_j - \theta_k\right)\right)}, \quad k = 1, 2, \ldots, h_i$    (4)

where $y_k = \sum_{j=1}^{n_i} w_{jk}\,x_j - \theta_k$, $w_{jk}$ is the connection weight from the $j$-th node of the input layer to the $k$-th node of the hidden layer, $\theta_k$ is the bias of the $k$-th hidden node, and $x_j$ is the $j$-th input. The final output is calculated from the outputs of the hidden nodes as given by (5):

$o_l = \sum_{k=1}^{h_i} w_{lk}\,f(y_k) - \theta_l, \quad l = 1, 2, \ldots, o_t$    (5)

where $w_{lk}$ is the connection weight from the $k$-th hidden node to the $l$-th output node and $\theta_l$ is the bias of the $l$-th output node. Eventually, the squared error $e_l$ of the $l$-th training sample is calculated from:

$e_l = \sum_{j=1}^{o_t} \left(o_j^l - d_j^l\right)^2$    (6)

$e = \dfrac{1}{t}\sum_{l=1}^{t} e_l$    (7)

where $t$ is the number of training samples and $d_j^l$ is the desired value of the $j$-th output when the $l$-th training sample is considered. So, the fitness of the $j$-th particle can be defined as:

Fitness$(x_j) = e(x_j)$    (8)

4) Step 4: Ranking of Fitness Functions

Sort (rank) all fitness values in ascending order as given in (9):

$f(x_1) < f(x_2) < \ldots < f(x_N)$    (9)

where $f(x_j)$ is the fitness of the $j$-th particle and $x_1$ is the Global Best ($G_{Best}$) particle, having the best (minimum) fitness among all particles. The top 20%-30% of the particles are considered elite particles.

5) Step 5: Updating the Mean and Standard Deviation of the Elite Particles

The mean $m_{\mu_t}^{(iter+1)}$ of the selected elite particles is found by:

$m_{\mu_t}^{(iter+1)} = \dfrac{1}{\mu_e} \sum_{j=1}^{\mu_e} x_{j:N}^{(iter+1)}$    (10)

where $x_{j:N}^{(iter+1)}$ is the $j$-th best particle of the whole population at iteration $iter+1$; the index $j{:}N$ denotes the index of the $j$-th ranked particle. The standard deviation $\sigma_{\mu_t}^{(iter+1)}$ (step length) of the elite particles is found by (11):

$\sigma_{\mu_t}^{(iter+1)} = \sqrt{\dfrac{1}{\mu_e - 1} \sum_{j=1}^{\mu_e} \left(x_{j:N}^{(iter+1)} - m_{\mu_t}^{(iter+1)}\right)^2}$    (11)

6) Step 6: Apply Smoothing to the Mean and Standard Deviation of the Whole Population

As the elite particles are in the vicinity of the optimal solutions, more smoothing (weightage) is applied to their mean than to the mean of the whole population, as per (12):

$m_e^{(iter+1)} = \alpha\, m_{\mu_t}^{(iter+1)} + (1 - \alpha)\, m_e^{(iter)}$    (12)

where $\alpha = 0.9$ and $\beta = 0.1$ are the smoothing parameters. Similarly, the standard deviation of the elite particles should be updated only to a small extent, as they already lie near the optimal solutions. So, less smoothing is applied to them compared to the standard deviation of all particles, which requires more exploration of the search space and thus more smoothing, as given in (13):

$\sigma^{(iter+1)} = \beta\, \sigma_{\mu_t}^{(iter+1)} + (1 - \beta)\, \sigma^{(iter)}$    (13)
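The core update of Steps 4-6 can be sketched in a few lines of Python. The 20% elite fraction and α = 0.9, β = 0.1 follow the text above; the per-dimension standard deviation and the helper names are our assumptions:

```python
import numpy as np

def ce_update(population, fitness, m, sigma, elite_frac=0.2, alpha=0.9, beta=0.1):
    """Steps 4-6: rank particles, take the elite, and apply smoothed updates."""
    N = population.shape[0]
    order = np.argsort(fitness)                  # eq. (9): ascending fitness
    mu_e = max(2, int(elite_frac * N))           # elite count (20% of N)
    elite = population[order[:mu_e]]
    m_elite = elite.mean(axis=0)                 # eq. (10): elite mean
    s_elite = elite.std(axis=0, ddof=1)          # eq. (11): elite std, 1/(mu_e - 1)
    m_new = alpha * m_elite + (1 - alpha) * m    # eq. (12): heavy weight on elite mean
    s_new = beta * s_elite + (1 - beta) * sigma  # eq. (13): light weight on elite std
    g_best = population[order[0]]                # global best particle
    return m_new, s_new, g_best

# Usage, continuing the earlier sketches (fitness = MSE of the decoded FNN):
# fitness = np.array([mse(decode(x), X, D) for x in population])
# m, sigma, g_best = ce_update(population, fitness, m, sigma)
```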
7) Step 7: Apply Levy Flight for Global Exploration

The Levy flight [18, 19] is a random walk whose step length is drawn from the Levy distribution, as described in (14), where $u$ and $v$ are obtained from the normal distribution. Many species (e.g. swordfish and silky sharks) use Levy flights to search for food. The function of the Levy step is to efficiently explore the search space by taking a large step size during the intermediate iterations so as to reach the global optimum solution:

$\mathrm{Levy\text{-}step} = \dfrac{u}{v^{1/\beta}}$    (14)

where:

$u = \mathrm{rand}(0,1) \cdot \mathrm{Sigma}$    (15)

$v = \mathrm{rand}(0,1)$    (16)

$\mathrm{Sigma} = \left[\dfrac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\beta\, 2^{(\beta-1)/2}}\right]^{1/\beta}$    (17)

where $\beta$ (= 3/2) is the Levy coefficient, and:

$m_e^{(iter+1)} = m_e^{(iter+1)} + \mathrm{Levy\text{-}step}$    (18)

8) Step 8: Increment of the Iteration Count

Set $iter := iter + 1$ and go to Step 2 until the maximum number of iterations is reached. The detailed flowchart of the LE-CE method for training FNNs is shown in Figure 1.

Fig. 1. Flowchart of the LE-CE method for training FNNs.
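Before moving on, the Levy perturbation of Step 7 can be sketched as follows. Since the text states that u and v come from the normal distribution, we read rand(0,1) in (15)-(16) as standard normal draws (the Mantegna scheme that (17) resembles); the absolute value on v is our addition to keep the fractional power real:

```python
import math
import numpy as np

rng = np.random.default_rng(7)

def levy_step(D, beta=1.5):
    """Eqs. (14)-(17): Mantegna-style Levy step for each of the D dimensions."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(D) * sigma   # eq. (15): normal draw scaled by Sigma
    v = rng.standard_normal(D)           # eq. (16): normal draw
    return u / np.abs(v) ** (1 / beta)   # eq. (14); abs() is our safeguard

# Eq. (18): perturb the smoothed mean to explore the search space globally.
# m = m + levy_step(m.shape[0])
```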
III. ANN ENCODING STRATEGY

Figure 2 shows the typical structure of an ANN. It consists of 2 input nodes, 3 hidden nodes, and 1 output node. For the training of the ANN, the popular matrix encoding method is used in this paper, as follows:

$\mathrm{particle}(:, i) = \left[w_1,\; B_1,\; w_2^T,\; B_2\right]$    (20)

$w_1 = \begin{bmatrix} w_{13} & w_{23} \\ w_{14} & w_{24} \\ w_{15} & w_{25} \end{bmatrix}, \quad B_1 = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}, \quad w_2^T = \begin{bmatrix} w_{36} & w_{46} & w_{56} \end{bmatrix}, \quad B_2 = \left[\theta_4\right]$    (21)

where $w_1$ is the weight matrix of the hidden layer, $B_1$ is the bias matrix of the hidden layer, $w_2$ is the weight matrix of the output layer, $w_2^T$ is the transpose of the $w_2$ matrix, and $B_2$ is the bias matrix of the output layer.

Fig. 2. ANN typical structure.
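One possible Python realization of this matrix encoding for the 2-3-1 network of Figure 2 is sketched below; the flattening order and helper names are illustrative assumptions:

```python
import numpy as np

def encode(w1, B1, w2, B2):
    """Eq. (20): flatten the layer matrices into a single particle vector."""
    return np.concatenate([w1.ravel(), B1.ravel(), w2.ravel(), B2.ravel()])

def decode(particle, n_in=2, n_hidden=3, n_out=1):
    """Inverse mapping: recover w1, B1, w2, B2 from a particle vector."""
    i = 0
    w1 = particle[i:i + n_hidden * n_in].reshape(n_hidden, n_in); i += n_hidden * n_in
    B1 = particle[i:i + n_hidden]; i += n_hidden
    w2 = particle[i:i + n_out * n_hidden].reshape(n_out, n_hidden); i += n_out * n_hidden
    B2 = particle[i:i + n_out]
    return w1, B1, w2, B2

# The 2-3-1 network of Figure 2 needs 2*3 + 3 + 3*1 + 1 = 13 control variables,
# so each particle sampled by LE-CE is a vector of dimension D = 13.
```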
IV. SIMULATION RESULTS AND DISCUSSION

The performance of the proposed LE-CE algorithm is not tested on a small system because, as per the no-free-lunch theorem [20], the average performance of all optimization methods remains the same on small test systems with a small number of control variables. So, in order to check the effectiveness of the proposed LE-CE algorithm, it is tested on the practical Iris classification [21] problem, and its output results, convergence, and statistical results are compared with the WCCI 2018 international award-winning EVDEPSO [15] and the GECCO 2019 international award-winning HL_PS_VNSO [16] computational intelligence methods. Both methods secured 2nd rank in the aforementioned conference competitions.

EVDEPSO (Enhanced Velocity Differential Evolution Particle Swarm Optimization) is a meta-heuristic iterative method inspired by the behavior of bird flocking and fish schooling. Mathematically, each bird represents a prospective solution of the optimization problem, and in every iteration its position is adjusted depending on the position of the best bird and on its own past best position, to obtain optimal solutions. The process continues until no significant improvement in the optimal solutions is observed. The detailed theory of EVDEPSO is available in [15].

The HL_PS_VNSO (Hybrid Levy Particle Swarm Variable Neighbourhood Search Optimization) algorithm is a hybridization of PSO and the variable neighborhood search algorithm. Its key elements are Perception, Cooperation, and the Levy step. It consists of a Perception term for the local exploitation of the search space, in which a particle follows its personal best position, a Cooperation term, in which the particle follows the global best particle with mutated weights, and the Levy step, which globally explores the search space. Its detailed theory is given in [16].

A. Practical Iris Classification Problem

The Iris classification problem was introduced in [21]. The Iris dataset contains 4 features (the length and width of sepals and petals) of 3 species of Iris (Iris setosa, Iris virginica, and Iris versicolor), as shown in [22, 23]. The dataset contains 150 samples. These measurements were initially used to create a linear model to classify the species in machine learning. Different numbers of hidden nodes (H = 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15) were set to test the performance of the algorithms. The simulations were performed in the Matlab 2017a environment on an Intel Core i5 machine with 16 GB RAM. All tested methods search for the optimal combination of weights and biases that results in the minimum error of the FNN. A total of 20 trials were executed, as LE-CE is a meta-heuristic method and therefore yields different optimal solutions in every simulation run. Finally, the mean recognition rate was obtained from the results of the 20 trials.

Table I shows that the proposed LE-CE outperforms the other methods in recognizing the output correctly for all numbers of hidden nodes. This is due to the fact that the LE-CE method's updating of the mean and standard deviation with smoothing parameters yields better solutions, as the elite particles are in the vicinity of the optimal solution (as per (12) and (13)). Also, CE is powered by the Levy flight, which can take larger step sizes during optimization, so the search space is explored more efficiently. Consequently, LE-CE yields better solutions than the original CE method and the other tested algorithms.

TABLE I. MEAN RECOGNITION RATE WITH DIFFERENT HIDDEN NODES

Hidden nodes (H) | LE-CE (%) | CE (%) | EVDEPSO (%) | HL_PS_VNSO (%)
4  | 99.92 | 98.32 | 90.12 | 89.14
5  | 100   | 97.25 | 91.18 | 87.17
6  | 99.45 | 95.36 | 92.78 | 89.35
7  | 100   | 96.65 | 90.07 | 90.14
8  | 99.32 | 93.45 | 91.95 | 87.88
9  | 100   | 92.32 | 90.45 | 83.41
10 | 99.12 | 93.85 | 88.35 | 87.12
11 | 99.85 | 92.65 | 91.98 | 83.45
12 | 100   | 96.36 | 89.45 | 76.30
13 | 99.85 | 94.36 | 90.42 | 86.32
14 | 100   | 93.95 | 88.01 | 83.25
15 | 99.36 | 94.15 | 91.05 | 87.05

It is clear from Table II that the proposed LE-CE outperforms the other methods in all 12 cases for the average MSE, standard deviation of the MSE, and best MSE. The number of hidden nodes increases from 4 to 15 across the cases, so the complexity of the ANN increases, as more mathematical equations have to be solved by the algorithm. Also, the convergence characteristics of LE-CE for different numbers of hidden nodes are better than those of the compared algorithms, as shown in Figures 3-5, because, unlike other meta-heuristic methods, the LE-CE method is practically tuning-free, and its adaptive mean and standard deviation updating strategy makes it well suited to obtaining optimal solutions. Traditional CE does not have a Levy flight step, so its performance remains inferior to that of LE-CE. EVDEPSO's and HL_PS_VNSO's performance is worse than LE-CE's because both methods have a large number of parameters to be tuned, which affects convergence, whereas LE-CE has very few parameters to tune and searches for solutions by updating the mean and standard deviation of the elite particles. As a result, it achieves better solutions than the other tested algorithms. Figures 3-5 clearly show that LE-CE has better convergence characteristics than the other methods for the set termination criteria. The other methods have many step-length updating operators, which increase the computational burden of the algorithm.

TABLE II. AVERAGE, STANDARD DEVIATION, AND BEST MSE IN 20 INDEPENDENT RUNS WITH DIFFERENT HIDDEN NODES

Hidden nodes (H) | Algorithm | Average MSE | Std dev MSE | Best MSE
4  | LE-CE      | 1.9021e-02 | 1.1905e-03 | 1.3538e-02
4  | CE         | 2.1926e-02 | 2.0907e-03 | 1.4538e-02
4  | EVDEPSO    | 2.7048e-02 | 2.0825e-02 | 7.0291e-03
4  | HL_PS_VNSO | 4.1936e-02 | 1.0413e-02 | 4.0013e-02
5  | LE-CE      | 1.8127e-02 | 1.6252e-03 | 1.3085e-02
5  | CE         | 1.9457e-02 | 1.7267e-03 | 1.6525e-02
5  | EVDEPSO    | 2.4756e-02 | 1.7638e-02 | 1.3629e-02
5  | HL_PS_VNSO | 4.4355e-02 | 1.1071e-02 | 4.0770e-02
6  | LE-CE      | 1.3593e-02 | 2.2182e-03 | 1.2013e-02
6  | CE         | 1.6932e-02 | 2.0082e-03 | 1.5113e-02
6  | EVDEPSO    | 1.8453e-02 | 7.2653e-03 | 1.6253e-02
6  | HL_PS_VNSO | 1.0441e-01 | 1.0374e-01 | 4.4510e-02
7  | LE-CE      | 1.3682e-02 | 1.2446e-03 | 1.2128e-02
7  | CE         | 2.6122e-02 | 1.0746e-03 | 2.4358e-02
7  | EVDEPSO    | 1.7691e-02 | 4.0023e-03 | 1.4523e-02
7  | HL_PS_VNSO | 5.5226e-02 | 6.6222e-03 | 5.0238e-02
8  | LE-CE      | 1.7042e-02 | 3.2045e-03 | 1.2511e-02
8  | CE         | 1.8142e-02 | 6.2545e-03 | 1.3011e-02
8  | EVDEPSO    | 2.1454e-02 | 6.0198e-03 | 2.0378e-02
8  | HL_PS_VNSO | 4.7541e-02 | 7.2863e-03 | 3.8607e-02
9  | LE-CE      | 1.3294e-02 | 3.2993e-03 | 1.0070e-02
9  | CE         | 1.6302e-02 | 6.5093e-03 | 1.0670e-02
9  | EVDEPSO    | 5.0615e-02 | 1.2549e-02 | 5.0301e-02
9  | HL_PS_VNSO | 3.8314e-02 | 1.3757e-02 | 3.0293e-02
10 | LE-CE      | 1.2521e-02 | 1.0021e-03 | 1.1021e-02
10 | CE         | 1.8221e-02 | 1.1821e-03 | 1.4321e-02
10 | EVDEPSO    | 3.9265e-02 | 1.3342e-02 | 3.0427e-02
10 | HL_PS_VNSO | 4.4245e-02 | 1.3204e-02 | 4.3230e-02
11 | LE-CE      | 1.2646e-02 | 1.8009e-03 | 1.0840e-02
11 | CE         | 1.5246e-02 | 2.1709e-03 | 1.0880e-02
11 | EVDEPSO    | 1.4203e-02 | 1.2120e-02 | 1.0393e-03
11 | HL_PS_VNSO | 4.1536e-02 | 6.7323e-03 | 3.7538e-02
12 | LE-CE      | 1.2006e-02 | 2.2091e-03 | 1.0049e-02
12 | CE         | 1.6806e-02 | 2.9391e-03 | 1.3849e-02
12 | EVDEPSO    | 1.7889e-02 | 6.9515e-03 | 1.3409e-02
12 | HL_PS_VNSO | 1.2098e-01 | 1.3702e-01 | 1.0316e-02
13 | LE-CE      | 1.1753e-02 | 1.3090e-03 | 1.0021e-02
13 | CE         | 2.3453e-02 | 1.8740e-03 | 2.2422e-02
13 | EVDEPSO    | 1.8013e-02 | 5.0128e-03 | 1.5113e-02
13 | HL_PS_VNSO | 3.1724e-01 | 1.3744e-01 | 3.0250e-02
14 | LE-CE      | 1.1270e-02 | 1.0086e-03 | 1.0592e-02
14 | CE         | 2.6250e-02 | 1.0186e-03 | 2.3592e-02
14 | EVDEPSO    | 1.5241e-02 | 1.0171e-03 | 1.1297e-02
14 | HL_PS_VNSO | 4.3257e-02 | 1.0214e-02 | 4.1200e-02
15 | LE-CE      | 1.2189e-02 | 1.3530e-03 | 1.1509e-02
15 | CE         | 3.3589e-02 | 2.7530e-03 | 3.1725e-02
15 | EVDEPSO    | 2.1450e-02 | 2.5846e-03 | 2.1153e-02
15 | HL_PS_VNSO | 5.3539e-02 | 6.2441e-02 | 5.2546e-02

Fig. 3. Convergence curve with 5 hidden nodes.

Fig. 4. Convergence curve with 11 hidden nodes.

Fig. 5. Convergence curve with 15 hidden nodes.

V. STATISTICAL ANALYSIS

A. One-Way ANOVA Statistical Test

The ANOVA statistical test [24] is used to verify whether the MSE values of all algorithms, evaluated for each simulation run, show any significant difference. In this test, the degree of significance is set to 0.05 to check the statistical difference between the tested algorithms. If the P-value is found to be less than 0.05, it is inferred that the tested algorithms differ significantly from one another. From Table III, it is seen that the F-ratio value is 6.75541 and the P-value is 0.000758, so the result is significant at p < 0.05. This implies that at least one of the group means is significantly different from the others. However, it is not known which group(s) contribute to this difference, hence Tukey's Honestly Significant Difference (HSD) test was carried out.
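For readers wishing to reproduce this kind of analysis, the following sketch runs a one-way ANOVA with SciPy and computes the pairwise Q statistic of (22) introduced in the next subsection. The per-run MSE samples below are synthetic placeholders, not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Placeholder per-run MSE samples for 4 algorithms over 12 runs each
# (48 observations, DF_within = 48 - 4 = 44, matching Table III).
groups = {name: rng.normal(loc, 0.02, size=12)
          for name, loc in [("LE-CE", 0.013), ("CE", 0.020),
                            ("EVDEPSO", 0.025), ("HL_PS_VNSO", 0.08)]}

f_ratio, p_value = stats.f_oneway(*groups.values())  # one-way ANOVA
print(f"F = {f_ratio:.4f}, p = {p_value:.6f}")

# Eq. (22): Tukey Q statistic for each algorithm paired with LE-CE.
n_runs = 12
ms_within = np.mean([np.var(g, ddof=1) for g in groups.values()])  # pooled MS
for name, g in list(groups.items())[1:]:
    q = abs(groups["LE-CE"].mean() - g.mean()) / np.sqrt(ms_within / n_runs)
    print(f"Q(LE-CE vs {name}) = {q:.2f}")  # compare with Qcrit = 3.777
```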
B. Tukey's Honestly Significant Difference Test

To further check the statistical difference between pairs of algorithms, the pairwise comparison test known as Tukey's HSD test [25] was carried out. In this test, the first step is to find the critical value $Q_{crit}$ from the studentized range distribution table [26], based on the $a = 4$ treatments (algorithms) and DF = 44 degrees of freedom. The critical value obtained from the studentized range distribution table is 3.777. Then, $Q_{i,j}$ is calculated as per (22) for all pairwise comparisons with the proposed algorithm. The resulting values are given in Table IV.

$Q_{i,j} = \dfrac{\left|\bar{y}_i - \bar{y}_j\right|}{\sqrt{MS / N_{Runs}}}$    (22)

where $i, j = 1, \ldots, a$, $i \neq j$, and $\bar{y}_i - \bar{y}_j$ is the difference between the mean optimal fitness values of the compared pair of algorithms.

TABLE III. RESULTS OF THE ONE-WAY ANOVA STATISTICAL TEST

Variation | Sum of squares due to source (SS) | DF | Mean sum of squares due to source (MS) | F ratio | P-value | F crit
Between algorithms | 0.03278 | 3  | 0.0109 | 6.7554 | 0.00075 | 2.816
Within algorithms  | 0.07117 | 44 | 0.0016 |        |         |
Total              | 0.10395 | 47 |        |        |         |

TABLE IV. TUKEY HSD TEST RESULTS

Pairwise comparisons | Q (statistic)
LE-CE and CE | 0.61
LE-CE and EVDEPSO | 0.85
LE-CE and HL_PS_VNSO | 5.64

Comparing the values in Table IV with the critical value of 3.777 shows that the proposed algorithm is substantially statistically different from HL_PS_VNSO, whose Q value is well above the critical value, while the Q values for the CE and EVDEPSO pairs remain below it. The algorithm was developed in the Matlab environment, and the source code can be provided by the main author upon request.

VI. CONCLUSION

The Levy Enhanced Cross Entropy method for training feedforward neural networks was proposed in this paper. The LE-CE method has fewer parameters to be tuned, and its adaptive updating of the mean and standard deviation of the solutions makes it quite powerful in terms of local exploitation and global exploration of the solution space. To further improve its global exploration, the CE method is powered by the Levy flight. The simulations on the practical Iris test system reveal that the proposed LE-CE method outperforms contemporary optimization methods in terms of solution quality, iterations, average MSE, standard deviation of the MSE, and best MSE. Statistical testing confirmed significant differences between the proposed method and the compared optimization methods. The proposed method can be used in highly complex nonlinear ANN systems. In the future, the same algorithm will be applied to train more realistic neural network training sets with hundreds of variables and constraints.

REFERENCES

[1] C.-J. Lin, C.-H. Chen, and C.-Y. Lee, "A self-adaptive quantum radial basis function network for classification applications," in International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, Jul. 2004, vol. 4, pp. 3263–3268, https://doi.org/10.1109/IJCNN.2004.1381202.
[2] S. Mirjalili, S. Z. Mohd Hashim, and H. Moradian Sardroudi, "Training feedforward neural networks using hybrid particle swarm optimization and gravitational search algorithm," Applied Mathematics and Computation, vol. 218, no. 22, pp. 11125–11137, Jul. 2012, https://doi.org/10.1016/j.amc.2012.04.069.
[3] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, Jan. 1989, https://doi.org/10.1016/0893-6080(89)90020-8.
[4] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, Feb. 2007, https://doi.org/10.1016/j.amc.2006.07.025.
[5] M. Gori and A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76–86, Jan. 1992, https://doi.org/10.1109/34.107014.
[6] L. V. Kantorovich, "Functional analysis and applied mathematics," Uspekhi Matematicheskikh Nauk, vol. 3, no. 6, pp. 89–185, 1948.
[7] E. A. Ogbonnaya, E. M. Adigio, H. U. Ugwu, and M. C. Anumiri, "Advanced Gas turbine rotor shaft fault diagnosis using artificial neural network," International Journal of Engineering and Technology Innovation, vol. 3, no. 1, pp. 58–69, 2013.
[8] R. Kaluri and P. Reddy CH, "Optimized feature extraction for precise sign gesture recognition using self-improved genetic algorithm," International Journal of Engineering and Technology Innovation, vol. 8, no. 1, pp. 25–37, 2018.
[9] M. Njah and R. E. Hamdi, "A Constrained Multi-Objective Learning Algorithm for Feed-Forward Neural Network Classifiers," Engineering, Technology & Applied Science Research, vol. 7, no. 3, pp. 1685–1693, Jun. 2017, https://doi.org/10.48084/etasr.968.
[10] D. N. Truong and V. T. Bui, "Hybrid PSO-Optimized ANFIS-Based Model to Improve Dynamic Voltage Stability," Engineering, Technology & Applied Science Research, vol. 9, no. 4, pp. 4384–4388, Aug. 2019, https://doi.org/10.48084/etasr.2833.
[11] L. T. H. Nhung, T. T. Phung, H. M. V. Nguyen, T. N. Le, T. A. Nguyen, and T. D. Vo, "Load Shedding in Microgrids with Dual Neural Networks and AHP Algorithm," Engineering, Technology & Applied Science Research, vol. 12, no. 1, pp. 8090–8095, Feb. 2022, https://doi.org/10.48084/etasr.4652.
[12] E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, "GSA: A Gravitational Search Algorithm," Information Sciences, vol. 179, no. 13, pp. 2232–2248, Jun. 2009, https://doi.org/10.1016/j.ins.2009.03.004.
[13] R. Storn and K. Price, "Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, Dec. 1997, https://doi.org/10.1023/A:1008202821328.
[14] J. Kennedy and R. Eberhart, "Particle swarm optimization," in International Conference on Neural Networks, Perth, WA, Australia, Dec. 1995, vol. 4, pp. 1942–1948, https://doi.org/10.1109/ICNN.1995.488968.
[15] D. Dabhi and K. Pandya, "Enhanced Velocity Differential Evolutionary Particle Swarm Optimization for Optimal Scheduling of a Distributed Energy Resources With Uncertain Scenarios," IEEE Access, vol. 8, pp. 27001–27017, 2020, https://doi.org/10.1109/ACCESS.2020.2970236.
[16] D. Dabhi and K. Pandya, "Uncertain Scenario Based MicroGrid Optimization via Hybrid Levy Particle Swarm Variable Neighborhood Search Optimization (HL_PS_VNSO)," IEEE Access, vol. 8, pp. 108782–108797, 2020, https://doi.org/10.1109/ACCESS.2020.2999935.
[17] R. Y. Rubinstein, "Optimization of computer simulation models with rare events," European Journal of Operational Research, vol. 99, no. 1, pp. 89–112, May 1997, https://doi.org/10.1016/S0377-2217(96)00385-2.
[18] C. T. Brown, L. S. Liebovitch, and R. Glendon, "Levy Flights in Dobe Ju/'hoansi Foraging Patterns," Human Ecology, vol. 35, no. 1, pp. 129–138, Feb. 2007, https://doi.org/10.1007/s10745-006-9083-4.
[19] X. S. Yang, "Random Walks and Levy Flights," in Nature-Inspired Metaheuristic Algorithms, 2nd ed., Cambridge, UK: Luniver Press, 2010, pp. 11–19.
[20] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, Apr. 1997, https://doi.org/10.1109/4235.585893.
[21] R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936, https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.
[22] "The Iris Dataset," Gist. https://gist.github.com/curran/a08a1080b88344b0c8a7.
[23] K. Thirunavukkarasu, A. S. Singh, P. Rai, and S. Gupta, "Classification of IRIS Dataset using Classification Based KNN Algorithm in Supervised Learning," in 4th International Conference on Computing Communication and Automation, Greater Noida, India, Dec. 2018, pp. 1–4, https://doi.org/10.1109/CCAA.2018.8777643.
[24] E. Ostertagova and O. Ostertag, "Methodology and Application of One-way ANOVA," American Journal of Mechanical Engineering, vol. 1, no. 7, pp. 256–261, Jan. 2013, https://doi.org/10.12691/ajme-1-7-21.
[25] H. Abdi and L. J. Williams, "Tukey's Honestly Significant Difference (HSD) Test," in Encyclopedia of Research Design, Thousand Oaks, CA, USA: SAGE, 2010.
[26] H. L. Harter, "Critical Values for Duncan's New Multiple Range Test," Biometrics, vol. 16, no. 4, pp. 671–685, 1960, https://doi.org/10.2307/2527770.