INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 12(2):265-275, April 2017.

Balancing Between Exploration and Exploitation in ACO

A.E. Negulescu, S.C. Negulescu, I. Dzitac

Alina Eugenia Negulescu
University POLITEHNICA of Bucharest
Romania, 060042 Bucharest, sector 6, Splaiul Independentei, 313
alina.lascu@gmail.com

Sorin Constantin Negulescu
Lucian Blaga University of Sibiu
Romania, 550024 Sibiu, Victoriei Bvd., 10
sorin.negulescu@ulbsibiu.ro

Ioan Dzitac
1. Aurel Vlaicu University of Arad
Romania, 310330 Arad, Elena Dragoi, 2
ioan.dzitac@uav.ro
2. Agora University of Oradea
Romania, 410526 Oradea, Piata Tineretului, 8
rector@univagora.ro

Abstract: In order to balance the preference of the artificial entities towards exploration or exploitation (in their transition rule), a novel technique is proposed for replacing the random function used by classical Ant Colony Optimization (ACO) algorithms for solving the Traveling Salesman Problem (TSP). The proposed Beta distribution function (B), or random.betavariate(a,b), has the proven capability (depicted through test-runs) of influencing the algorithm's solution quality and convergence speed. Consequently, this paper introduces the classical ACO algorithm, with a focus on the transition rule used for choosing the next node in the problem's associated graph, followed by related work on this topic, and continues with the introduction of the B function, presented from both a theoretical and a practical perspective in relation with the goal: balancing between exploration and exploitation in order to improve the performance of the ACO algorithm for the TSP. The paper concludes that B-EAS has the ability to find better solutions than EAS for a set of benchmarks from the TSPLib library.

Keywords: ACO, TSP, B, Elitist Ant System (EAS), Beta-Elitist Ant System (B-EAS), transition rule.

1 Introduction

Swarm-inspired algorithms, and the ACO family in particular, have attracted steady interest over the past five years, as reflected in search volumes recorded by Google Trends [7]. The constant focus on this family of algorithms is due to their simplicity in terms of modeling and, implicitly, the ease of understanding their functionality, which makes them very appealing for computer scientists aiming to solve optimization problems.

Since they are inspired from nature, the principles of ACO algorithms are easy to grasp. Their best-known application is solving the TSP, which is easily stated but, unfortunately, not that simple to solve: a traveling salesman has a list of cities which he must visit before returning to the starting point; he must pass through each city exactly once, and he must find the shortest route that fulfills this task, as longer routes are associated with bigger costs. This problem can be extrapolated to the world of ants which, foraging for food, need to find the shortest path between the nest and the food source. Because of these resemblances in terms of tasks, ant colony inspired algorithms are directly applicable to solving the TSP. Regarding the parameters, by mirroring the natural ant colony model, the classical ACO algorithm operates with artificial entities (i.e.
the ants), positive feedback through pheromone deposited in the environment, heuristic information regarding distances in the search-space, and negative feedback resulting from pheromone evaporation. For all intents and purposes, these nature-inspired parameters are the most influential ones in any ACO algorithm.

Other common traits between ACO algorithm variants are the initialization steps, which include:

• loading the problem's associated graph;
• setting up the initial parameters' values;
• initializing the pheromone trails on the graph's arcs (using a minimum quantity of pheromone) so as to "kick-start" the algorithm; otherwise, it would take a couple of iterations until the artificial ants deposited this pheromone themselves, resulting in additional computational cost.

Even though the last step is obviously not characteristic of real ants, the model has been slightly altered so as to increase the convergence speed of the algorithm towards a solution. After initialization, as long as the algorithm's stop condition is not met (represented by, for instance, a maximum number of iterations or a time limit), every artificial ant performs a set of steps as part of the construct solutions function:

• choose a starting node on the graph – predictably (ant 1 in node 1, ant 2 in node 2 and so on) or randomly;
• choose the next node until all nodes have been visited exactly once (the function on which this paper is centered), during which the "artificial ant has to decide" which nodes are more attractive and choose one of them based on a probability formula that will be presented shortly.

After all artificial ants have constructed solutions:

• the update pheromone trail function deposits a quantity of pheromone on all the arcs of the graph visited by the artificial ants during a tour, which, following the natural model, represents the positive feedback, or reinforcement; in this way the arcs that have been used by multiple ants acquire a greater pheromone intensity;
• the evaporate pheromone trail function acts as negative feedback so that the algorithm can "forget" old and inefficient trails; this step is paramount, as without it the algorithm would converge too fast and get trapped in local optima;
• the store best solution function compares the tour lengths of all artificial ants and records the smallest length along with the order in which the nodes were visited, for future comparisons.

(A schematic sketch of this whole loop, in Python, closes this section.)

Having presented the context, the paper focuses on the transition rule of ACO algorithms, which is based on a probability formula (as mentioned earlier). The state of the art is presented in "Origins and related work", while the "Rationale and approach" section pinpoints the drawbacks of current approaches and presents the beta distribution function as a possible solution. The section entitled "The model" describes the changes that have been applied to the EAS algorithm in order to support the identified improvement: the Beta-Elitist Ant System (B-EAS) algorithm. The "Conclusions and intentions" section rounds up the paper by presenting the strengths and weaknesses of the B-EAS algorithm, together with the identified future research intentions in this direction.
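To make the steps above concrete, the following is a minimal, schematic sketch of the shared ACO skeleton. This is our illustration, not the authors' code: all names (aco_skeleton, choose_next, tour_length) and the parameter defaults are assumptions, and the transition rule is left as a placeholder that the next section fills in.

```python
import random

def tour_length(tour, dist):
    """Length of a closed tour over a distance matrix."""
    return sum(dist[i][j] for i, j in zip(tour, tour[1:] + [tour[0]]))

def choose_next(i, unvisited, tau, dist, alpha, beta):
    # Placeholder: uniform random choice. The AS transition rule that
    # normally goes here is sketched in the next section.
    return random.choice(sorted(unvisited))

def aco_skeleton(dist, n_ants=10, n_iters=100, alpha=1.0, beta=2.0,
                 rho=0.5, tau0=1.0):
    """Schematic ACO main loop (illustrative only)."""
    n = len(dist)
    tau = [[tau0] * n for _ in range(n)]       # pheromone "kick-start"
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):                   # stop condition: iteration limit
        # Construct solutions: every ant builds a complete tour.
        tours = []
        for _ in range(n_ants):
            start = random.randrange(n)        # random starting node
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:                   # visit every node exactly once
                nxt = choose_next(tour[-1], unvisited, tau, dist, alpha, beta)
                tour.append(nxt)
                unvisited.discard(nxt)
            tours.append(tour)

        # Negative feedback: evaporate pheromone on every arc.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)

        # Positive feedback: deposit pheromone along each tour,
        # and store the best solution found so far.
        for tour in tours:
            length = tour_length(tour, dist)
            for i, j in zip(tour, tour[1:] + [tour[0]]):
                tau[i][j] += 1.0 / length
                tau[j][i] += 1.0 / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len
```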
2 Origins and related work

Initial experiments conducted with real ants (Linepithema humile) proved that when ants have to choose between two possible paths of different lengths (when traveling between a food source and the nest), they will initially travel on either of them with equal probability but, as time passes, the shorter one will carry a higher amount of deposited pheromone (reinforced by multiple, shorter trips) whereas the longer one will carry a lower amount (due to fewer, longer trips). Moreover, considering that the evaporation speed of the pheromone is equal on both paths, the intensity of the pheromone on the longer path will decrease faster. Even though the pheromone intensity on the two paths was different, a very small number of ants still chose the one with smaller pheromone intensity from time to time, which led to the conclusion that the ants were deciding in a probabilistic manner. The findings of this "double-bridge experiment" [8] were adopted when the initial Ant System (AS) algorithm was modeled [5]. As such, almost all ACO algorithms use a probability formula for determining the next node. The formula coined by AS, reproduced below (unaltered), defines what its authors call a random-proportional rule:

$$
p_{ij}^{k}(t) =
\begin{cases}
\dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{l \in N_i^k} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}(t)]^{\beta}} & \text{if } j \in N_i^k\\[2mm]
0 & \text{if } j \notin N_i^k
\end{cases}
\tag{1}
$$

Formula (1) gives the probability p at moment t for an ant k located in node i to choose node j as the next node it will visit. α and β are two general parameters used for weighting the pheromone intensity τij and the visibility ηij (defined as 1/distance) between nodes i and j when computing (at moment t) the cost of choosing node j. The set N^k_i contains the nodes ant k is still allowed to choose (i.e., all the nodes not yet visited, as constrained by the TSP). The formula can thus be read as the ratio between the cost of choosing node j and the sum of the costs of all the other nodes not yet visited; naturally, this probability is 0 if node j has already been visited.

Before the formula above is applied, a random real number between 0 and 1 is generated, representing the probability value. Afterwards, p^k_ij(t) is computed for as many nodes j ∈ N^k_i as necessary, until the running sum of these values equals or exceeds the generated number, and the last such node j is chosen as the next move. In this way, in many cases the formula is not applied to all the unvisited nodes, sparing some time. These steps are repeated until no more nodes are left to visit, resulting in the list of nodes visited by ant k.

As a further improvement of the initial AS algorithm, Dorigo and Gambardella proposed in the Ant Colony System (ACS) algorithm a new transition rule that includes explicit exploration and exploitation traits [6]. This improvement, as described by its authors, is due to the pseudo-random-proportional rule for choosing the next node in the itinerary: a threshold q0 (0 ≤ q0 ≤ 1) is selected a priori when the algorithm starts, and the next node is selected by applying one of the following rules depending on a randomly generated value q (both branches are sketched in code right after this list):

• if q < q0, the node for which τij(t) · [ηij(t)]^β is maximum is chosen (exploitation);
• else, the node is selected using the same random-proportional formula introduced in the AS algorithm (biased exploration).
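A minimal Python sketch of both transition rules discussed above; this is our illustration rather than the original authors' code, and it assumes tau (pheromone) and eta (visibility, 1/distance) are pre-computed matrices:

```python
import random

def choose_next_as(i, unvisited, tau, eta, alpha, beta):
    """AS random-proportional rule (formula 1) with roulette-wheel selection.
    The paper notes the running sum can stop early once it reaches the random
    draw; here all weights are computed first, for clarity."""
    weights = [(j, (tau[i][j] ** alpha) * (eta[i][j] ** beta))
               for j in unvisited]
    total = sum(w for _, w in weights)
    q, acc = random.random() * total, 0.0   # draw scaled by the total weight
    for j, w in weights:
        acc += w
        if acc >= q:
            return j
    return j                                # guard against round-off

def choose_next_acs(i, unvisited, tau, eta, alpha, beta, q0):
    """ACS pseudo-random-proportional rule: with probability q0 exploit the
    best arc, otherwise fall back to the AS rule (biased exploration)."""
    if random.random() < q0:
        return max(unvisited, key=lambda j: tau[i][j] * (eta[i][j] ** beta))
    return choose_next_as(i, unvisited, tau, eta, alpha, beta)
```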
As described in [6], "ACS algorithm's state transition rule provides a direct way to balance between exploration of new edges and exploitation of a priori and accumulated knowledge about the problem". Along with several other improvements, ACS' efficiency was demonstrated by its authors through a series of experiments whose results showed that this algorithm outperformed other nature-inspired algorithms such as simulated annealing and evolutionary computation.

Another improvement of AS was the Elitist Ant System (EAS) [5], which reinforces the pheromone intensity on the arcs belonging to the tours of the ants that have found the best solutions at moment t; in this way one can say that the elitist ants deposit a greater amount of pheromone. The experiments of the aforementioned authors indicated that a relatively small number of elitist ants can help the performance of the algorithm. Nevertheless, the author of [1] claims that "the transition rule suggested in the original Ant System and used extensively in modified versions [...] uses a power law scaling procedure to define the relative importance of the pheromone intensity and local information. [...] This form of transition rule, however, often leads to premature convergence of the ant algorithms to suboptimal solution when used with elitist strategy of pheromone updating". The author proposed, as a result, an alternative transition rule to be used with the elitist strategy of pheromone updating in ant algorithms. Moreover, the author of [1] considers the classical transition rule in formula (1) to be the source of the local-optima issue that called for the design of the MAX-MIN Ant System (MMAS) algorithm. As further explained in [1], the reason the transition rule is not efficient is that the pheromone intensity (weighted by α) comes to dominate the decision-making process of an ant choosing the next node to visit, while the heuristic information ends up having no impact; this causes the stagnation of the search. Consequently, the author proposes an alternative transition rule that prevents the pheromone intensity from "dominating the ants decision table at all stages of computation":

$$
p_{ij}^{k}(t) =
\begin{cases}
\dfrac{\alpha\,\tau_{ij}(t) + \beta\,\eta_{ij}}{\sum_{l \in N_i^k} \left[\alpha\,\tau_{il}(t) + \beta\,\eta_{il}\right]} & \text{if } j \in N_i^k\\[2mm]
0 & \text{if } j \notin N_i^k
\end{cases}
\tag{2}
$$

The author of [1] claims that this new transition rule, based on addition, ensures that for "the properly adjusted values of the pheromone, and local heuristics sensitivity parameters, the local heuristic term ηij will always have a saying in forming the decision table, irrespective of the distribution of pheromone intensities". Moreover, this method has the advantage of "reducing the number of controlling parameters, as one can always assume the value of unity for one of these parameters and adjust the value of the other one for the best performance of the method", as described by the same author. (A code sketch of this additive rule is given at the end of this section.)

Other solutions targeting the improvement of the original AS algorithm were the Rank-Based Ant System (RBAS) [3], the Best-Worst Ant System (BWAS) [4], the Hyper-Cube Ant Colony Optimization (HC-ACO) [2] and the Graph-Based Ant System (GBAS) [9], to name a few of the most common. Also, a respectable number of approaches went beyond the initial paradigm, implementing pheromone correction strategies [14] or creating hybrid solutions involving evolutionary computation [15] and neural networks [13].
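Returning to formula (2), here is the promised sketch of the additive rule; again, this is our illustration under the same assumed tau and eta matrices, not code from [1]:

```python
import random

def choose_next_additive(i, unvisited, tau, eta, alpha, beta):
    """Additive transition rule (formula 2): a weighted sum keeps the
    heuristic term eta influential even when pheromone intensities dominate,
    unlike the power-law product of formula 1."""
    weights = [(j, alpha * tau[i][j] + beta * eta[i][j]) for j in unvisited]
    total = sum(w for _, w in weights)
    q, acc = random.random() * total, 0.0   # roulette wheel on raw weights
    for j, w in weights:
        acc += w
        if acc >= q:
            return j
    return j
```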
Whereas in the classical ACO algorithm the α and β parameters are global, in the model presented in [11] they become local and "personal" to each individual ant. In this way, ants can behave differently in their transition rule. Considering that the resource costs associated with this approach are minimal (in terms of memory), the benefits are various: this variant of the ACO algorithm has the ability to self-tune depending on the problem type and size (if the local α and β parameters are adjusted based on the tour lengths found by each ant), the convergence speed can be improved, and the parameter values can easily be adjusted automatically. All these benefits can assist in addressing the stagnation issue (i.e., local optima) that is typical of all meta-heuristic algorithms.

3 Rationale and approach

As can be seen from the selected examples in the previous section, there is an ongoing interest in improving ACO algorithms, due to their proven potential for solving combinatorial optimization problems. Some improvement approaches focus on the pheromone update rule, others look at the setting of the parameters' values, while this paper investigates the transition rule which the ants use to move through the nodes of the graph.

All algorithms in the meta-heuristic class rely on probabilities, which implies the use of randomly generated numbers in different manners. Without this intrinsic randomness, the behavior would tend to be the same as that of a best-first algorithm, and the quality of the solutions would be poor. The approach presented in this paper, which allows changing the algorithm's behavior towards exploration or exploitation, is based on another type of probability distribution: B, which enables a beta-proportional rule.

The beta distribution can be understood, according to [10], as "a mean of distribution probabilities" that essentially represents "all the possible values of an unknown probability". It is routinely employed in statistics and probability theory, is defined on the [0, 1] domain, and has two positive shape parameters: a and b. It is widely used in diverse areas as a technique for influencing the behavior of random variables limited to ranges of finite length. Given these features, it can serve as a proper model for random behavior (pertaining to the uncertainty of an outcome). For all practical purposes, a trial may have only two possible results: either success (with probability x) or failure (with probability 1 − x). As explained by the same author, this can be modeled probabilistically by assigning to x a uniform distribution on the [0, 1] domain: x being a probability, it can only take values between 0 and 1, and the uniform distribution assigns the same probability density to every point in the domain, which effectively means that no possible value of x is a priori more likely than another. For n independent trials performed during an experiment, there result k successful outcomes and n − k failed ones. Consequently, the conditional distribution of x, given the observation of k successful outcomes in n trials, is in fact a beta distribution with parameters k + 1 and n − k + 1.
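A quick numerical illustration of this last fact (our addition, using Python's random.betavariate, the sampler also named in the abstract):

```python
import random

# Posterior of an unknown success probability x after observing k successes
# in n independent trials: Beta(k + 1, n - k + 1).
k, n = 7, 10
samples = [random.betavariate(k + 1, n - k + 1) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to (k + 1) / (n + 2) = 8/12
```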
Formally, as presented in [10], for an absolutely continuous random variable x with support Rx = [0, 1] and a, b ∈ R++, x has a beta distribution with shape parameters a and b if its probability density function is the following, where B(a, b) denotes the beta function:

$$
f(x) =
\begin{cases}
\dfrac{1}{B(a,b)}\, x^{a-1} (1 - x)^{b-1} & \text{if } x \in R_x\\[2mm]
0 & \text{if } x \notin R_x
\end{cases}
\tag{3}
$$

To visually grasp the behavior of this function for different parameter values, several histograms are presented below. The data set consists of 10000 samples, with the values scaled to the [0, 500] domain on the abscissa and the number of randomly generated samples displayed on the ordinate.

[Figure 1: Histogram for the beta distribution function with one sub-unitary and one unitary argument (a = 0.7, b = 1.0 and a = 1.0, b = 0.7)]

[Figure 2: Histogram for the beta distribution function with equal argument values (a = b = 1.0; a = b = 0.7; a = b = 5.0)]

According to the visual representation of B in figures 1 and 2, one can easily see that this function is highly flexible: depending on the values of its parameters, it can generate a higher number of values at one end or the other of the interval, a uniform distribution, or a normal (or inverse) distribution. It is also important to mention that if the a and b parameters both equal 1, the function behaves exactly like the classic random function, where all values have an equal probability of being generated, and the algorithm would perform exactly like the AS algorithm.

4 The model

Based on the previously introduced approach, consisting in replacing the random-proportional transition rule with the beta-proportional one, this section describes in detail the model and the method used for testing it. The foundation of the model is the EAS algorithm, whose transition rule has been modified to include the following steps (a code sketch follows the list):

• Determine the costs of moving from the current node to all the nodes not yet visited by computing p^k_ij(t) with the same formula used by EAS (as presented in the "Origins and related work" section).
• Store these costs together with the corresponding node numbers in a vector, inserting them in such a way that the vector is ordered increasingly (so the nodes with smaller costs are at the beginning of the vector). Obviously, generating the vector in an ordered fashion is much faster than generating it and then sorting it.
• Generate a random number according to B that indicates the position (in the vector) of the node selected as the next node to move to. The parameter values for the function must be chosen such that small values are generated with a higher probability, so the elements at the beginning of the vector have a higher chance of being selected.

[Figure 3: Example of an associated graph with heuristic (η) and pheromone (τ) information on the arcs between nodes 1-5]

For a better understanding of the steps involved in the new model, called the Beta-Elitist Ant System (B-EAS), the graph in figure 3 is provided as an example, together with the code sketch below.
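A minimal Python sketch of the beta-proportional selection step, under one plausible reading of the three steps above (our illustration, not the authors' published implementation; tau and eta are the assumed pheromone and visibility matrices used in the earlier sketches):

```python
import random

def choose_next_beta(i, unvisited, tau, eta, alpha, beta, a=0.4, b=1.0):
    """B-EAS selection sketch: rank the candidate nodes with the EAS formula,
    smallest cost (most attractive) first, and let random.betavariate(a, b)
    pick an index into the ordered vector. With a < 1 and b = 1, indices
    near 0 dominate (exploitation), while larger indices remain possible
    (exploration); a = b = 1 makes the draw uniform, like the classic
    random function."""
    ranked = sorted(unvisited,
                    key=lambda j: (tau[i][j] ** alpha) * (eta[i][j] ** beta),
                    reverse=True)                    # smallest cost first
    idx = int(random.betavariate(a, b) * len(ranked))
    return ranked[min(idx, len(ranked) - 1)]         # guard the x == 1.0 edge
```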
Presuming that an ant starts in node 1, the cost of going to nodes 2, 3, 4 and 5 takes into consideration the pheromone intensity and the visibility on each arc. The ordered vector resulting after computing the costs is presented in figure 4, together with the probability of selection according to the B function. As the example shows, node 3, which involves the smallest cost in terms of visibility and pheromone intensity, has a higher probability of being selected than nodes 2, 4 and 5 if the arguments of B are properly chosen.

[Figure 4: Example of selection probability for the ordered vector's elements (nodes 3, 2, 4 and 5, each with its normalized cost (τ1j + η1j) / Σl (τ1l + η1l))]

Test-runs were conducted as a full factorial experiment in order to determine appropriate values for the a and b parameters of B. In order to generate smaller values with a higher probability, the values for parameter a were tested within the [0.1, 1] domain, while b was kept constant at 1. The results are presented in figure 5, where, for a set of problems with known solutions, the algorithm was constrained to run for 100 iterations. The quality of the obtained solutions was averaged and compared, as a percentage, to the known solutions, so an optimal solution deviates 0% from the best known one (represented in darker color in the graph).

[Figure 5: Tour length deviation from optimum (%) versus the first parameter of the beta distribution function and the number of iterations]

The conclusion of this experiment is that the optimal parameter values of the B function, used for generating random numbers, are a = 0.4 and b = 1. As displayed in figure 5 (where the dependent variable, tour length deviation, is plotted against parameter a and the number of iterations), these values drive the algorithm towards finding better solutions (with shorter lengths) and with a higher convergence speed.

A comparison between the EAS and B-EAS algorithms on standard benchmarks from the TSPLib [12], in regard to solution quality and convergence speed, is presented in figures 6 and 7.

[Figure 6: Comparison between the EAS and B-EAS algorithms running on the eil51 benchmark (solution length versus iterations)]

[Figure 7: Comparison between the EAS and B-EAS algorithms running on the d198 benchmark (solution length versus iterations)]

We have observed that, without modifying the a and b parameters during the test-runs, B-EAS converged more slowly than EAS, but it continued to find better solutions where EAS would have stopped in a local optimum. As stated before, when the parameters have the values a < 1 and b = 1, ants tend to choose with a higher probability the nodes that give smaller costs of travel (exploitation), but there is also a probability of choosing nodes that imply higher costs (exploration).
This is why the graphs display large initial improvements that are not developed much further over a high number of iterations, especially when a is much smaller than 1.

5 Conclusions and intentions

The paper presents a new algorithm (B-EAS) that can be steered towards exploitation or exploration by modifying the a and b parameters used in the beta-proportional transition rule. Specifically, the authors propose replacing the classical random function for choosing the next node with the beta distribution function, which allows a more flexible way of generating random numbers. To validate the proposed model, the optimal parameter values of B were determined for a set of benchmark problems using a full factorial experiment. The next logical step was to compare, through test-runs, the performance of B-EAS with that of the original algorithm from which it was derived (EAS). The results have shown that B-EAS finds better solutions than EAS, given enough time or iterations.

Considering the approach introduced by the same authors in [11], where the global parameters of the algorithm (α, β and ρ) were transformed into local parameters for each artificial ant, thus creating a "population" of agents with individual characteristics (personalities), the same approach could be applied to the a and b parameters of the B function as well. We consider that this research direction has the potential to create a genetic-featured ACO, where the algorithm could evolve and "tune" itself.

Bibliography

[1] Afshar, M.H. (2005); A new transition rule for ant colony optimization algorithms: application to pipe network optimization problems, Engineering Optimization, 37(5): 525-540.

[2] Blum, C.; Roli, A.; Dorigo, M. (2001); HC-ACO: The Hyper-Cube Framework for Ant Colony Optimization, IEEE Transactions on Systems, Man, and Cybernetics - Part B, 399-403.

[3] Bullnheimer, B.; Hartl, R.F.; Strauss, C. (1999); A New Rank Based Version of the Ant System: A Computational Study, Central European Journal for Operations Research and Economics, 25-38.

[4] Cordón, O.; de Viana, I.F.; Herrera, F.; Moreno, L. (2000); A new ACO model integrating evolutionary computation concepts: The best-worst Ant System. In M. Dorigo, M. Middendorf & T. Stützle (Eds.), Abstract Proceedings of ANTS 2000 - From Ant Colonies to Artificial Ants: Second International Workshop on Ant Algorithms, Brussels, IRIDIA, Université Libre de Bruxelles, 22-29.

[5] Dorigo, M.; Maniezzo, V.; Colorni, A. (1996); Ant system: optimization by a colony of cooperating agents, IEEE Transactions on Systems, Man, and Cybernetics - Part B, 26(1): 29-41.

[6] Dorigo, M.; Gambardella, L.M. (1997); Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem, IEEE Transactions on Evolutionary Computation, 1(1): 53-66.

[7] Google (2016); Google Trends, retrieved from https://www.google.com/trends/explore?q=AntColonyOptimization.

[8] Goss, S.; Aron, S.; Deneubourg, J.L.; Pasteels, J.M. (1989); Self-organized shortcuts in the Argentine ant, Naturwissenschaften, 76: 579-581, Springer-Verlag.

[9] Gutjahr, W.J. (2000); A Graph-based Ant System and its convergence, Future Generation Computer Systems, 873-888.

[10] Kerns, G. (2011); Introduction to Probability and Statistics Using R, IPSUR, ISBN: 978-0-557-24979-4.

[11] Negulescu, S.; Dzitac, I.; Lascu, A. (2010); Synthetic Genes for Artificial Ants. Diversity in Ant Colony Optimization Algorithms, International Journal of Computers Communications & Control, 5(2): 216-223.

[12] Reinelt, G.
(1991); TSPLIB - A Traveling Salesman Problem Library, ORSA Journal on Computing, 376-384.

[13] Susnea, I.; Axenie, C. (2015); Cognitive maps for indirect coordination of intelligent agents, Studies in Informatics and Control, 24(1): 111-118.

[14] Tuba, M.; Jovanovic, R. (2013); Improved ACO Algorithm with Pheromone Correction Strategy for the Traveling Salesman Problem, International Journal of Computers Communications & Control, 8(3): 477-485.

[15] Wei, X.; Han, L.; Hong, L. (2014); A modified ant colony algorithm for traveling salesman problem, International Journal of Computers Communications & Control, 9(5): 633-643.