Acta Polytechnica 56(1):18–26, 2016
doi:10.14311/APP.2016.56.0018
© Czech Technical University in Prague, 2016
available online at http://ojs.cvut.cz/ojs/index.php/ap

AD HOC TEAMWORK BEHAVIORS FOR INFLUENCING A FLOCK

Katie Genter*, Peter Stone

Department of Computer Science, The University of Texas at Austin, Austin, TX, USA 78712
* corresponding author: katie@cs.utexas.edu

Abstract. Ad hoc teamwork refers to the challenge of designing agents that can influence the behavior of a team without prior coordination with their teammates. This paper considers influencing a flock of simple robotic agents to adopt a desired behavior within the context of ad hoc teamwork. Specifically, we examine how the ad hoc agents should behave in order to orient a flock towards a target heading as quickly as possible when given knowledge of, but no direct control over, the behavior of the flock. We introduce three algorithms which the ad hoc agents can use to influence the flock, and we examine the relative importance of coordinating the ad hoc agents versus planning farther ahead when given fixed computational resources. We present detailed experimental results for each of these algorithms, concluding that in this setting, inter-agent coordination and deeper lookahead planning are no more beneficial than short-term lookahead planning.

Keywords: ad hoc teamwork; agent cooperation; coordination; flocking.

1. Introduction

Consider a flock of migrating birds that is flying directly towards a dangerous area, such as an airport or a wind farm. It would be better for both the flock and the humans if the path of the migratory birds were altered slightly, such that the flock avoids the dangerous area but still reaches its migratory destination at approximately the same time.

The above scenario is a motivating example for our work on orienting a flock using ad hoc teamwork. We assume that each bird in the flock dynamically adjusts its heading based on that of its immediate neighbors. We assume further that we control one or more ad hoc agents — perhaps in the form of robotic birds or ultralight aircraft (www.operationmigration.org) — that are perceived by the rest of the flock as one of their own.

Flocking is an emergent behavior found in different species in nature, including flocks of birds, schools of fish, and swarms of insects. In each of these cases, the animals follow a simple local behavior rule that results in a group behavior that appears well organized and stable. Research on flocking behavior has appeared in various disciplines such as physics [1], graphics [2], biology [3, 4], and distributed control theory [5–7]. In each of these disciplines, the research has focused mainly on characterizing the emergent behavior.

In this paper, we consider the problem of leading a team of flocking agents in an ad hoc teamwork setting. An ad hoc teamwork setting is one in which a teammate — which we call an influencing agent — must determine how best to achieve a team goal given a set of possibly suboptimal teammates. In this work, we are given a team of flocking agents following a known, well-defined rule characterizing their flocking behavior, and we wish to examine how the influencing agents should behave. Specifically, the main question addressed in this paper is: how should influencing agents behave so as to orient the rest of the flock towards a target heading as quickly as possible?

The remainder of this paper is organized as follows.
Section 2 introduces our problem and the necessary terminology for this paper. The main contribution of this paper is the 1-step lookahead algorithm for orienting a flock to travel in a particular direction. This algorithm is presented in Section 3, while variations of it are presented in Sections 4 and 5. We present the results of running experiments using these algorithms in the MASON simulator [8] in Section 6. Section 7 situates this research in the literature, and Section 8 concludes.

2. Problem Definition

In this work we use a simplified version of Reynolds' Boid algorithm for flocking [2]. We assume that each agent calculates its orientation for the next time step to be the average heading of its neighbors. Throughout this paper, an agent's neighbors are the agents located within some set radius of the agent. In order to calculate its orientation for the next time step, each agent computes the vector sum of the velocity vectors of each of its neighbors and adopts a scaled version of the resulting vector as its new orientation. An agent is not considered to be a neighbor of itself, so an agent's current heading is not considered when calculating its orientation for the next time step. Figure 1 shows an example of how an agent's new velocity vector is calculated. At each time step, each agent moves one step in the direction of its current vector and then calculates its new heading based on those of its neighbors, keeping a constant speed.

Figure 1. An example of how an agent's new velocity vector would be calculated. In this example, the black dot represents the agent in question, the solid arrows represent the velocity vectors of the agent's neighbors, and the dotted circle represents the area of the agent's neighborhood. The agent's new velocity vector is calculated as shown at the bottom of the figure — the three vectors are first summed and then scaled to maintain constant speed.

Over time, agents behaving as described above will gather into one or more groups, and these groups will each travel in some direction. However, in this work we add a small number of influencing agents to the flock. These influencing agents attempt to influence the flock to travel in a pre-defined direction — throughout this paper we refer to this desired direction as θ∗. Note that designing influencing agent behaviors in a dynamic flocking system is difficult because the action space is continuous. Hence, in our work we make the simplifying assumption of only considering a limited number (numAngles) of discrete angle choices for each influencing agent.

2.1. Simulation Environment

We situate our research on flocking using ad hoc teamwork within the MASON simulator, a concrete simulation environment [8]. A picture of the Flockers domain is shown in Figure 2. Each agent points and moves in the direction of its current velocity vector. The MASON Flockers domain is toroidal, so agents that move off one edge of the domain reappear on the opposite edge, moving in the same direction.

We conclude that the flock has converged to θ∗ when every agent (that is not an influencing agent) is facing within 0.1 radians of θ∗. Other stopping criteria, such as requiring that 90 % of the agents face within 0.1 radians of θ∗, could also have been utilized.
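To make the dynamics concrete, the following is a minimal Python sketch of the alignment-only update rule and the stopping criterion described above. It is our illustration, not the authors' MASON code: agents are assumed to be simple objects with position (x, y), velocity (vx, vy), and an is_influencing flag, and the toroidal wrap-around of the real domain is omitted for brevity.

```python
import math

SPEED = 0.7             # units moved per time step (MASON default used in the paper)
NEIGHBOR_RADIUS = 10.0  # neighborhoods are 20 units in diameter

def neighbors_of(agent, flock):
    """All other agents within the set radius (an agent is not its own neighbor)."""
    return [a for a in flock
            if a is not agent
            and math.hypot(a.x - agent.x, a.y - agent.y) <= NEIGHBOR_RADIUS]

def next_velocity(agent, flock):
    """Sum the neighbors' velocity vectors, then rescale to the constant speed."""
    nbrs = neighbors_of(agent, flock)
    vx = sum(n.vx for n in nbrs)
    vy = sum(n.vy for n in nbrs)
    norm = math.hypot(vx, vy)
    if norm == 0.0:                  # no neighbors, or their vectors cancel:
        return agent.vx, agent.vy    # keep the current heading
    return SPEED * vx / norm, SPEED * vy / norm

def angle_diff(a, b):
    """Absolute angular difference, wrapped into [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

def converged(flock, theta_star, tol=0.1):
    """Stopping criterion: every non-influencing agent faces within
    0.1 radians of the target heading theta_star."""
    return all(angle_diff(math.atan2(a.vy, a.vx), theta_star) <= tol
               for a in flock if not a.is_influencing)
```

The agent representation and the angle_diff helper above are reused by the sketches in later sections.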
Figure 2. Pictures of (a) the start of a trial and (b) the end of a trial in the MASON Flockers simulation environment. The grey agents are influencing agents, while the black agents are other members of the flock.

Table 1. Variables used in Algorithm 1.

  bestDiff   — the smallest difference found so far between the average orientation vectors of neighOfIA and θ∗
  bestOrient — the vector representing the orientation adopted by the influencing agent to obtain bestDiff
  neighOfIA  — the neighbors of the influencing agent
  nOrient    — the predicted next-step orientation vector of neighbor n of the influencing agent if the influencing agent adopts iaOrient
  nOrients   — a set of the predicted next-step orientation vectors of all of the neighbors of the influencing agent, assuming the influencing agent adopts iaOrient

3. 1-Step Lookahead Behavior

In this section we present Algorithm 1, a 1-step lookahead algorithm for determining the individual behavior of each influencing agent. This algorithm considers all of the influences on neighbors of the influencing agent at a particular point in time, so that the influencing agent can determine the best orientation to adopt based on this information.

The 1-step lookahead algorithm is a greedy, myopic approach for determining the best individual behavior for each influencing agent, where 'best' is defined as the behavior that will exert the most influence on the next time step. Note that if the algorithm only considered the current orientations of the neighbors (instead of the influences on these neighbors) when determining the next orientation for the influencing agent to adopt, it would only be estimating the state of each neighbor, and hence the resulting orientation adopted by the influencing agent would not be 'best'.

Algorithm 1: bestOrient = 1StepLookahead(neighOfIA)

 1: bestOrient ← (0, 0)
 2: bestDiff ← ∞
 3: for each influencing agent orientation vector iaOrient do
 4:   nOrients ← ∅
 5:   for n ∈ neighOfIA do
 6:     nOrient ← (0, 0)
 7:     for n' ∈ n.neighbors do
 8:       if n' is an influencing agent then
 9:         nOrient ← nOrient + iaOrient
10:       else
11:         nOrient ← nOrient + n'.vel
12:     nOrient ← nOrient / |n.neighbors|
13:     nOrients ← {nOrient} ∪ nOrients
14:   diff ← avg difference between vectors in nOrients and θ∗
15:   if diff < bestDiff then
16:     bestDiff ← diff
17:     bestOrient ← iaOrient
18: return bestOrient

The variables used throughout Algorithm 1 are defined in Table 1. Two functions are used in Algorithm 1: neighbor.vel returns the velocity vector of neighbor, and neighbor.neighbors returns a set containing the neighbors of neighbor.

Note that Algorithm 1 is called on each influencing agent at each time step, and that the neighbors of the influencing agent at that time step are provided as a parameter to the algorithm. The output from the algorithm is the orientation that, if adopted by this influencing agent, is predicted to influence its neighbors to face closer to θ∗ than any of the other numAngles discrete influencing orientations considered.

Conceptually, Algorithm 1 is concerned with how the neighbors of the influencing agent are influenced if the influencing agent adopts a particular orientation at this time step. Algorithm 1 considers each of the numAngles discrete influencing agent orientation vectors. For each orientation vector, the algorithm considers how each of the neighbors of the influencing agent will be influenced if the influencing agent adopts that orientation vector (lines 3–13).
To do so, Algorithm 1 considers all of the neighbors of each neighbor of the influencing agent (lines 7–11). If a neighbor of a neighbor is itself an influencing agent, the algorithm assumes that it has the same orientation as the influencing agent (line 9), even though, in fact, each influencing agent orients itself based on its own set of neighbors. Otherwise, the algorithm uses that agent's current velocity (line 11). Using this information, the algorithm calculates how each neighbor of the influencing agent will be influenced by averaging the orientation vectors of each neighbor's neighbors (lines 12–13). The algorithm then picks the influencing agent orientation vector that results in the least difference between θ∗ and the neighbors' predicted orientation vectors (lines 14–18).

If there are numAgents agents in the flock, the worst-case complexity of Algorithm 1 is calculated as follows. Line 3 executes numAngles times, line 5 executes at most numAgents times, and line 7 executes at most numAgents times. Hence, the complexity of Algorithm 1 is O(numAngles × numAgents²). Results regarding how Algorithm 1 performs in terms of the number of time steps needed for the flock to converge to θ∗ can be found in Section 6.
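The evaluation loop of Algorithm 1 can be sketched compactly in Python. This is a schematic of the published pseudocode, not the authors' implementation; it reuses angle_diff and the agent representation from the sketch in Section 2, and the candidate_orients helper that discretizes the unit circle into numAngles headings is our own naming.

```python
import math

NUM_ANGLES = 50  # value used in the experiments (see Section 6.2)

def candidate_orients(speed=0.7, num_angles=NUM_ANGLES):
    """numAngles headings evenly spaced around the unit circle,
    starting at 0 radians and scaled to the constant speed."""
    return [(speed * math.cos(2 * math.pi * k / num_angles),
             speed * math.sin(2 * math.pi * k / num_angles))
            for k in range(num_angles)]

def one_step_lookahead(neigh_of_ia, theta_star):
    """Algorithm 1: return the candidate orientation predicted to pull the
    influencing agent's neighbors closest to theta_star on the next step."""
    best_orient, best_diff = (0.0, 0.0), float('inf')
    for ia_orient in candidate_orients():
        diffs = []
        for n in neigh_of_ia:
            vx, vy = 0.0, 0.0
            # n.neighbors is never empty here: the influencing agent itself
            # is one of n's neighbors, since n is in neigh_of_ia.
            for n2 in n.neighbors:
                # Every influencing agent is assumed to adopt ia_orient (line 9);
                # other agents contribute their current velocities (line 11).
                dvx, dvy = ia_orient if n2.is_influencing else (n2.vx, n2.vy)
                vx += dvx
                vy += dvy
            vx /= len(n.neighbors)
            vy /= len(n.neighbors)
            diffs.append(angle_diff(math.atan2(vy, vx), theta_star))
        diff = sum(diffs) / len(diffs) if diffs else float('inf')
        if diff < best_diff:
            best_diff, best_orient = diff, ia_orient
    return best_orient
```

Here theta_star is the target heading in radians, and line 14's 'average difference' is interpreted as the mean angular deviation — one plausible reading of the pseudocode.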
4. 2-Step Lookahead Behavior

Whereas the 1-step lookahead behavior presented in the previous section optimizes each influencing agent's orientation to best influence its neighbors on the next step, it fails to consider longer-term effects. Hence, in this section we present a 2-step lookahead behavior in Algorithm 2. This 2-step lookahead behavior considers influences on the neighbors of the neighbors of the influencing agent, so that the influencing agent can make a more informed decision when determining the best orientation to adopt.

The variables used in Algorithm 2 that were not used in Algorithm 1 are defined in Table 2.

Table 2. Variables used in Algorithm 2 that were not used in Algorithm 1.

  n'Orient  — the predicted next-step orientation vector of a neighbor n' of a neighbor of the influencing agent if the influencing agent adopts iaOrient
  nOrient2  — the predicted '2 steps in the future' orientation vector of neighbor n of the influencing agent if the influencing agent adopts iaOrient on the first time step and iaOrient2 on the second time step
  nOrients2 — a set containing the predicted '2 steps in the future' orientation vectors of all of the neighbors of the influencing agent, assuming the influencing agent adopts iaOrient on the first time step and iaOrient2 on the second time step

Like Algorithm 1, Algorithm 2 is called on each influencing agent at each time step, takes in the neighbors of the influencing agent at that time step, and returns the orientation that, if adopted by this influencing agent, is predicted to influence the flock to face closer to θ∗ than any of the other numAngles influencing orientations considered.

Algorithm 2: bestOrient = 2StepLookahead(neighOfIA)

 1: bestOrient ← (0, 0)
 2: bestDiff ← ∞
 3: for each influencing agent orientation iaOrient do
 4:   nOrients ← ∅
 5:   for n ∈ neighOfIA do
 6:     nOrient ← (0, 0)
 7:     for n' ∈ n.neighbors do
 8:       if n' is an influencing agent then
 9:         nOrient ← nOrient + iaOrient
10:       else
11:         nOrient ← nOrient + n'.vel
12:     nOrient ← nOrient / |n.neighbors|
13:     nOrients ← {nOrient} ∪ nOrients
14:   for each influencing agent orientation iaOrient2 do
15:     nOrients2 ← ∅
16:     for n ∈ neighOfIA do
17:       nOrient2 ← (0, 0)
18:       for n' ∈ n.neighbors do
19:         n'Orient ← (0, 0)
20:         for n'' ∈ n'.neighbors do
21:           if n'' is an influencing agent then
22:             n'Orient ← n'Orient + iaOrient
23:           else
24:             n'Orient ← n'Orient + n''.vel
25:         n'Orient ← n'Orient / |n'.neighbors|
26:         if n' is an influencing agent then
27:           nOrient2 ← nOrient2 + iaOrient2
28:         else
29:           nOrient2 ← nOrient2 + n'Orient
30:       nOrient2 ← nOrient2 / |n.neighbors|
31:       nOrients2 ← {nOrient2} ∪ nOrients2
32:     diff ← avg difference between vectors in nOrients and θ∗ and between vectors in nOrients2 and θ∗
33:     if diff < bestDiff then
34:       bestDiff ← diff
35:       bestOrient ← iaOrient
36: return bestOrient

Conceptually, Algorithm 2 is concerned with (1) how the neighbors of the influencing agent are influenced if the influencing agent adopts a particular orientation at this time step (lines 5–13), and (2) how the neighbors of each neighbor of the influencing agent are influenced if the influencing agent adopts a particular orientation at this time step (lines 19–25), since they in turn influence the neighbors of the influencing agent on the next time step (lines 16–31).

Algorithm 2 starts by considering each of the numAngles discrete influencing agent orientation vectors, and by considering how each of the neighbors of the influencing agent will be influenced if the influencing agent adopts that particular orientation vector. For each neighbor of the influencing agent, this requires considering all of its neighbors and calculating how each neighbor of the influencing agent will be influenced on the first time step (lines 5–13). Next, Algorithm 2 considers the effect of the influencing agent adopting each of the numAngles influencing agent orientation vectors on a second time step (lines 14–31). As before, this requires considering all of the neighbors of each neighbor of the influencing agent, and calculating how each neighbor of the influencing agent will be influenced (lines 18–31). However, in order to do this the algorithm must first consider how the neighbors of the neighbors of the influencing agent were influenced by their neighbors on the first time step (lines 20–25). Finally, Algorithm 2 selects the first-step influencing agent orientation vector that results in the least difference between θ∗ and the neighbors' orientation vectors after both the first and second time steps (lines 32–36).

In Algorithm 2 we make the simplifying assumption that agents do not change neighborhoods within the horizon of our planning. Because movements are relatively small with respect to each agent's neighborhood size, the effects of this simplification are negligible over the small number of future steps that the 2-step lookahead behavior considers.

The complexity of Algorithm 2 can be calculated as follows. Line 3 executes numAngles times, line 14 executes at most numAngles times, line 16 executes at most numAgents times, line 18 executes at most numAgents times, and line 20 executes at most numAgents times. Hence, the complexity of Algorithm 2 is O(numAngles² × numAgents³).
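The nested second step can be sketched by extending the 1-step sketch; again this is a schematic under the same assumptions (static neighborhoods over the horizon, candidate_orients and angle_diff reused from earlier sketches), not the authors' implementation.

```python
import math
from itertools import product

def predict_orient(agent, ia_orient):
    """Predicted next-step orientation of `agent`: the average of its neighbors'
    velocities, with every influencing neighbor assumed to adopt ia_orient.
    agent.neighbors is nonempty for every agent reached below."""
    vx = sum(ia_orient[0] if n.is_influencing else n.vx for n in agent.neighbors)
    vy = sum(ia_orient[1] if n.is_influencing else n.vy for n in agent.neighbors)
    k = len(agent.neighbors)
    return vx / k, vy / k

def two_step_lookahead(neigh_of_ia, theta_star):
    """Algorithm 2 (sketch): score each (iaOrient, iaOrient2) pair by the
    average deviation from theta_star after steps one and two."""
    best_orient, best_diff = (0.0, 0.0), float('inf')
    for ia1, ia2 in product(candidate_orients(), repeat=2):
        diffs = []
        for n in neigh_of_ia:
            # Step 1: n averages its neighbors' current velocities (lines 5-13).
            o1 = predict_orient(n, ia1)
            diffs.append(angle_diff(math.atan2(o1[1], o1[0]), theta_star))
            # Step 2: n averages its neighbors' *predicted* step-1 orientations,
            # with influencing agents now assumed to adopt ia2 (lines 16-31).
            vx, vy = 0.0, 0.0
            for n2 in n.neighbors:
                ox, oy = ia2 if n2.is_influencing else predict_orient(n2, ia1)
                vx += ox
                vy += oy
            k = len(n.neighbors)
            diffs.append(angle_diff(math.atan2(vy / k, vx / k), theta_star))
        diff = sum(diffs) / len(diffs) if diffs else float('inf')
        if diff < best_diff:
            best_diff, best_orient = diff, ia1
    return best_orient  # the first-step orientation is what gets executed
```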
5. Coordinated Behavior

The influencing agent behaviors presented in Sections 3 and 4 were for individual influencing agents, where each influencing agent calculated its behavior independently of the other influencing agents. In this section, we consider whether influencing agents can exert more influence on the flock by working in a coordinated fashion. In particular, coordination is potentially useful in cases where a flocking agent is in the neighborhoods of multiple influencing agents.

Ideally, all of the influencing agents would coordinate their behaviors to influence the flock to reach θ∗ as quickly as possible. However, such a joint calculation is computationally infeasible in this work. Instead, we pair influencing agents that share some neighbors. These pairs then work in a coordinated fashion to influence their neighbors to orient towards θ∗. We opted to use pairs for simplicity and for computational reasons, but our approach could also be applied to larger groups of influencing agents that share neighbors.

We select the influencing agents to pair by first finding all pairs of influencing agents with one or more neighbors in common. Then we do a brute-force search over every possible disjoint combination of these pairs. For each such combination, we calculate the sum of the number of shared neighbors across all the pairs, and we select the combination with the greatest sum. This combination of chosen pairs is called selectedPairs. Note that selectedPairs is recalculated at each time step; a sketch of this selection procedure follows.
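The following Python sketch is our reconstruction of the brute-force pair selection described above, with names of our choosing. With 20 influencing agents the search space is small enough for this to be practical, although the worst case grows exponentially in the number of overlapping pairs.

```python
from itertools import combinations

def select_pairs(influencing_agents):
    """Compute selectedPairs: among all disjoint combinations of influencing-
    agent pairs that share at least one neighbor, return the combination that
    maximizes the total number of shared neighbors."""
    candidate_pairs = []
    for a, b in combinations(influencing_agents, 2):
        shared = len(set(a.neighbors) & set(b.neighbors))
        if shared > 0:
            candidate_pairs.append((a, b, shared))

    best_combo, best_score = [], 0

    def search(start, used, combo, score):
        """Depth-first enumeration of disjoint combinations of pairs."""
        nonlocal best_combo, best_score
        if score > best_score:
            best_combo, best_score = list(combo), score
        for i in range(start, len(candidate_pairs)):
            a, b, s = candidate_pairs[i]
            if a not in used and b not in used:   # keep the combination disjoint
                combo.append((a, b))
                search(i + 1, used | {a, b}, combo, score + s)
                combo.pop()

    search(0, set(), [], 0)
    return best_combo
```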
The behavior of each influencing agent depends on whether it is part of a pair in selectedPairs or not. If it is part of a pair, it follows Algorithm 3 and coordinates with a partner influencing agent. If it is not part of a pair, it follows Algorithm 1 and performs a 1-step lookahead search for the best individual behavior.

Only one new variable is used in Algorithm 3 that was not used in Algorithm 1 or Algorithm 2: nOrientsP, a set that holds the predicted next-step orientation vectors of all the neighbors of the influencing agent's partner, assuming the influencing agent adopts iaOrient and the influencing agent's partner adopts iaOrientP.

Algorithm 3 is called at each time step on each influencing agent that is part of a pair in selectedPairs. It takes in the neighbors of the influencing agent and the neighbors of the partner of the influencing agent, and returns the orientation that, if adopted by the influencing agent, is predicted to influence the flock to face closer to θ∗ than any of the other numAngles influencing agent orientations considered for both the influencing agent and its partner.

Algorithm 3: bestOrient = 1StepCoordinated(neighOfIA, neighOfP)

 1: bestOrient ← (0, 0)
 2: bestDiff ← ∞
 3: for each influencing agent orientation iaOrient do
 4:   for each influencing agent orientation iaOrientP do
 5:     nOrients ← ∅
 6:     for n ∈ neighOfIA do
 7:       nOrient ← (0, 0)
 8:       for n' ∈ n.neighbors do
 9:         if n' is the influencing agent then
10:           nOrient ← nOrient + iaOrient
11:         else if n' is the influencing agent's partner then
12:           nOrient ← nOrient + iaOrientP
13:         else
14:           nOrient ← nOrient + n'.vel
15:       nOrient ← nOrient / |n.neighbors|
16:       nOrients ← {nOrient} ∪ nOrients
17:     nOrientsP ← ∅
18:     for n ∈ neighOfP do
19:       nOrient ← (0, 0)
20:       for n' ∈ n.neighbors do
21:         if n' is the influencing agent then
22:           nOrient ← nOrient + iaOrient
23:         else if n' is the influencing agent's partner then
24:           nOrient ← nOrient + iaOrientP
25:         else
26:           nOrient ← nOrient + n'.vel
27:       nOrient ← nOrient / |n.neighbors|
28:       if n ∉ neighOfIA then
29:         nOrientsP ← {nOrient} ∪ nOrientsP
30:     diff ← avg difference between vectors in nOrients and θ∗ and between vectors in nOrientsP and θ∗
31:     if diff < bestDiff then
32:       bestDiff ← diff
33:       bestOrient ← iaOrient
34: return bestOrient

Conceptually, Algorithm 3 considers each of the numAngles influencing agent orientations for both the influencing agent and the influencing agent's partner, and performs two 1-step lookahead searches. The main difference between Algorithm 1 and Algorithm 3 is that the coordinated algorithm takes into account that another influencing agent is also influencing all of the agents that are in both the influencing agent's neighborhood and its partner's neighborhood. Hence, the influencing agent may choose to behave in a way that influences the other agents in its neighborhood closer to θ∗, while relying on its partner to more strongly influence the agents that lie in both of the paired influencing agents' neighborhoods towards θ∗.

Specifically, Algorithm 3 executes as follows. For each potential influencing agent orientation, the algorithm considers how each of the neighbors of the influencing agent will be influenced if the influencing agent adopts that orientation (lines 6–16). Then Algorithm 3 considers how each of the neighbors of the influencing agent's partner will be influenced if the partner adopts each potential partner orientation (lines 18–29). Finally, the algorithm selects the influencing agent orientation that results in the least difference between θ∗ and the predicted orientations of the neighbors of both the influencing agent and its partner (lines 30–34). Note that agents that are neighbors of both the influencing agent and its partner are only counted once (lines 28–29).

The complexity of Algorithm 3 can be calculated as follows. Line 3 executes numAngles times, line 4 executes numAngles times, line 6 executes at most numAgents times, line 8 executes at most numAgents times, line 18 executes at most numAgents times, and line 20 executes at most numAgents times. Hence, the complexity of Algorithm 3 is O(numAngles² × numAgents²).

Results for how Algorithm 3, as well as Algorithms 1 and 2, performed in our experiments can be found in the next section.
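Algorithm 3's joint evaluation can be sketched in the same style as the earlier listings. Again, this is our schematic (reusing candidate_orients and angle_diff from the previous sketches), not the authors' code; agent and partner are the paired influencing agents.

```python
import math
from itertools import product

def predicted_diff(n, agent, partner, ia_orient, p_orient, theta_star):
    """Angular deviation of n's predicted next-step orientation from theta_star,
    assuming `agent` adopts ia_orient and `partner` adopts p_orient."""
    vx, vy = 0.0, 0.0
    for n2 in n.neighbors:
        if n2 is agent:
            dvx, dvy = ia_orient
        elif n2 is partner:
            dvx, dvy = p_orient
        else:
            dvx, dvy = n2.vx, n2.vy
        vx += dvx
        vy += dvy
    # atan2 is scale-invariant, so dividing by |n.neighbors| is unnecessary here
    return angle_diff(math.atan2(vy, vx), theta_star)

def one_step_coordinated(agent, partner, theta_star):
    """Algorithm 3 (sketch): jointly score candidate orientation pairs;
    neighbors shared by both paired agents are counted once (lines 28-29)."""
    best_orient, best_diff = (0.0, 0.0), float('inf')
    agent_neigh = set(agent.neighbors)
    for ia_orient, p_orient in product(candidate_orients(), repeat=2):
        diffs = [predicted_diff(n, agent, partner, ia_orient, p_orient, theta_star)
                 for n in agent.neighbors]
        diffs += [predicted_diff(n, agent, partner, ia_orient, p_orient, theta_star)
                  for n in partner.neighbors if n not in agent_neigh]
        diff = sum(diffs) / len(diffs) if diffs else float('inf')
        if diff < best_diff:
            best_diff, best_orient = diff, ia_orient
    return best_orient
```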
6. Experiments

In this section we describe our experiments testing the three influencing agent behaviors presented in Sections 3, 4, and 5 against the baseline methods described below. Our original hypothesis was that Algorithms 1, 2, and 3 would all perform significantly better than the baseline methods. We also believed that Algorithms 2 and 3 would perform better than Algorithm 1.

6.1. Baseline Ad Hoc Agent Behaviors

In this subsection we describe two behaviors which we use as comparison baselines for the lookahead and coordinated influencing agent behaviors presented in Sections 3, 4, and 5.

6.1.1. Face Desired Orientation Behavior

When following this behavior, the influencing agents always orient towards θ∗. Note that under this behavior the influencing agents do not consider their neighbors or anything else about their environment when determining how to behave.

This behavior is modeled after work by Jadbabaie, Lin, and Morse [6], who show that a flock with a controllable agent will eventually converge to the controllable agent's heading. Hence, the Face Desired Orientation behavior is essentially the behavior described in their work, except that in our experiments we include multiple controllable agents facing θ∗.

6.1.2. Offset Momentum Behavior

Under this behavior, each influencing agent calculates the vector sum V of the velocity vectors of its neighbors and then adopts an orientation along the vector V′ such that the vector sum of V and V′ points towards θ∗. See Figure 3 for an example calculation. In Figure 3, the velocity vectors of each neighbor are summed in the first line of calculations. In the second line, the vector sum of the influencing agent's orientation and the result of the first line must point along θ∗, which in this example is directly south. From the equation on the second line, the new influencing agent orientation vector can be found by vector subtraction. This vector is displayed, and then scaled to maintain constant speed, on the third line of calculations.

Figure 3. An example of how the Offset Momentum influencing agent behavior works. The influencing agent is the black dot, the circle represents the influencing agent's neighborhood, and the three arrows inside the circle represent the influencing agent's neighbors.

This influencing agent behavior was inspired by our previous work [9], in which we showed how to optimally orient a stationary agent to a desired orientation using a set of stationary influencing agents. In particular, we presented an algorithm which the influencing agents could utilize to orient the agent to the desired orientation in the least number of steps possible. Our Offset Momentum behavior implements this algorithm. However, the algorithm assumes that the agent is only influenced by influencing agents within its neighborhood. Hence, it is not optimal in our experimental setting, because each agent being influenced by an influencing agent is usually also being influenced by other agents.
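A small Python sketch of this baseline follows, reusing the agent representation from Section 2. The magnitude of the target vector along θ∗ is not specified by the figure, so the choice below (|V| plus one step's speed) is an assumption of this sketch, made so that V′ is well defined even when V already points at θ∗.

```python
import math

def offset_momentum(agent, theta_star, speed=0.7):
    """Offset Momentum baseline: pick V' so that V + V' points along
    theta_star, where V is the sum of the neighbors' velocities, then
    rescale V' to the constant speed."""
    vx = sum(n.vx for n in agent.neighbors)
    vy = sum(n.vy for n in agent.neighbors)
    mag = math.hypot(vx, vy) + speed            # assumed target magnitude
    tx, ty = mag * math.cos(theta_star), mag * math.sin(theta_star)
    ox, oy = tx - vx, ty - vy                   # V' = target - V (vector subtraction)
    norm = math.hypot(ox, oy)
    if norm == 0.0:                             # degenerate case: just face theta_star
        return speed * math.cos(theta_star), speed * math.sin(theta_star)
    return speed * ox / norm, speed * oy / norm
```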
6.2. Experimental Setup

We utilize the MASON simulator [8] for our experiments. The MASON simulator was introduced in Section 2.1; here we present the details of the environment that are important for fully understanding our experimental setup.

We use the default simulator setting of 150 units for the height and width of our experimental domain. Likewise, we use the default setting in which each agent moves 0.7 units during each time step.

The number of agents in our simulation (numAgents) is 200, meaning that there are 200 agents in our flock. 10 % of the flock, or 20 agents, are influencing agents. The neighborhood of each agent is 20 units in diameter. Both numAgents and the neighborhood size are default values for MASON. We chose to make 10 % of the flock influencing agents as a trade-off between providing enough influencing agents to influence the flock and keeping the influencing agents few enough that intelligent behavior is required to influence the flock effectively.

We only consider numAngles discrete angle choices for each influencing agent. In all of our experiments, numAngles is 50, meaning that the unit circle is divided equally into 50 segments beginning at 0 radians, and each of these orientations is considered as a possible orientation for each influencing agent. numAngles = 50 was chosen after some experimentation using the 1-step lookahead algorithm, in which numAngles = 20 resulted in a higher average number of steps for the flock to converge to θ∗, while numAngles = 100 and numAngles = 150 did not require significantly fewer steps for convergence than numAngles = 50.

In all of our experiments, we run 50 trials for each experimental setting. For the purpose of variance reduction, we use the same 50 random seeds to determine the starting positions and orientations of both the flocking agents and the influencing agents in each set of experiments.

6.3. Experimental Results

Table 3 shows the number of time steps needed for the flock to converge to θ∗ for the two baseline algorithms, the 1-step lookahead algorithm presented in Algorithm 1, the 2-step lookahead algorithm presented in Algorithm 2, and the coordinated algorithm presented in Algorithm 3, using the experimental setup described in Section 6.2.

Table 3. The number of time steps required for the flock to converge to θ∗ using the experimental setup described in Section 6.2. We show the 95 % confidence intervals.

  Algorithm                  Time Steps
  Face Desired Orientation   34.82 ± 3.85
  Offset Momentum            36.70 ± 4.63
  1-Step Lookahead           26.02 ± 3.10
  2-Step Lookahead           25.94 ± 3.16
  Coordinated                25.76 ± 3.15

The results in Table 3 clearly show that the 1-Step Lookahead Behavior, the 2-Step Lookahead Behavior, and the Coordinated Behavior all perform significantly better than the two baseline methods. However, these results did not show the 2-Step Lookahead Behavior and the Coordinated Behavior performing significantly better than the 1-Step Lookahead Behavior, as we had expected. Hence, we present additional experimental results below in which we alter, one at a time, the percentage of the flock that are influencing agents and the number of agents in the flock (numAgents), to further investigate the dynamics of this domain.

6.3.1. Altering the Composition of the Flock

Now we consider the effect of decreasing the percentage of influencing agents in the flock to 5 %, as well as increasing it to 20 %. In both cases, the remainder of the experimental setup is as described in Section 6.2. Altering the percentage of influencing agents in the flock clearly alters the number of agents we can control, which affects the amount of influence we can exert over the flock. Hence, as can be seen in Figure 4, flocks with higher percentages of influencing agents will, on average, converge to θ∗ in fewer time steps than flocks with lower percentages of influencing agents.

Figure 4. Results from experiments using the experimental setup described in Section 6.2, except that we varied the percentage of influencing agents in the flock. The values are averaged over 50 trials and the error bars represent the 95 % confidence intervals.

6.3.2. Altering the Size of the Flock

In this section we evaluate the effect of changing the size of the flock while keeping the rest of the experimental setup as presented in Section 6.2.
Changing the flock size alters the number of influencing agents, but not the ratio of influencing agents to non-influencing agents. We expected that increasing the flock size would lead to the Coordinated Behavior performing comparatively better, as, with a larger flock, more agents are likely to be in multiple influencing agents' neighborhoods at any given time. However, the Coordinated Behavior did not perform significantly differently from the lookahead behaviors, and actually performed slightly worse in the experiment with a larger flock size. The results of our experiments on altering the flock size can be seen in Figure 5.

Figure 5. Results from experiments using the experimental setup described in Section 6.2, except that we varied the number of agents in the flock. The values are averaged over 50 trials and the error bars represent the 95 % confidence intervals.

The difference between the 1-Step Lookahead Behavior, the 2-Step Lookahead Behavior, and the Coordinated Behavior versus the baseline behaviors was not significant in the experiment utilizing a smaller flock. This may have been caused by the agents being more sparsely distributed in the environment, and hence having less of an effect on each other.

6.4. Discussion

Our hypothesis was that Algorithms 1, 2, and 3 would all perform significantly better than the baseline methods. This was indeed the case in all of our experiments except when the flock size was decreased from 200 agents to 100 agents. Apparently, having 100 agents in a 150 by 150 unit environment resulted in the agents being too dispersed for our lookahead and coordinated behaviors to be effective.

Our original research question — how influencing agents should behave so as to orient the rest of the flock towards a target heading as quickly as possible — was partially answered by this work. Although it is possible that better algorithms could be designed, given the algorithms and the experimental setting presented in this paper, we found that it is best for influencing agents to perform the 1-step lookahead behavior presented in Algorithm 1. This behavior is more computationally efficient than the other two algorithms presented, and it performed significantly better than the baseline methods in most cases.

In many cases, the Coordinated Behavior and the 1-step lookahead behavior led the flock to converge to θ∗ in the same number of time steps. This is because the two behaviors are identical whenever no agent is in the neighborhoods of two paired influencing agents at the same time. Additionally, even when a pair of influencing agents shared one or more neighbors, these influencing agents often behaved similarly, and hence did not exert significantly different types of influence.

There are, of course, cases in which each of the lookahead and coordinated behaviors performs noticeably better than the others. For example, when the flock size is decreased to 100, the 2-step lookahead takes only 44 time steps to converge to θ∗ when a particular random seed (93) is used in the simulator, whereas the 1-step lookahead takes 67 steps and the coordinated approach takes 61 steps.

7. Related Work

Although there has been a significant amount of work in the field of multiagent teamwork, there has been relatively little work towards getting agents to collaborate with teammates that cannot be explicitly controlled. Most prior multiagent teamwork research requires explicit coordination protocols or communication protocols (e.g.
SharedPlans, STEAM, and GPGP) [10–12]. In our work, however, we do not assume that any protocol is known by all agents.

Han, Li, and Guo studied how one agent can influence the direction in which an entire flock of agents is moving [5]. As in our work, each of their agents follows a simple control rule based on its neighbors. However, unlike our work, they consider only one influencing agent, and it has unlimited, non-constant velocity. This allows their influencing agent to move to any position in the environment within one time step, which we believe is unrealistic.

As mentioned in Section 2, Reynolds introduced the original flocking model [2]. However, his work focused on creating graphical models that looked and behaved like real flocks, so he did not address adding controllable agents to the flock, as we do. Vicsek et al. considered just the alignment aspect of Reynolds' model [1]. Hence, like our work, they use a model in which all of the particles move at a constant velocity and adopt the average direction of the particles in their neighborhood. However, like Reynolds' work, they were only concerned with simulating flock behavior, and not with adding controllable agents to the flock.

Jadbabaie, Lin, and Morse build on Vicsek et al.'s work [6]. They use a simpler direction update than Vicsek et al., and they show that a flock with a controllable agent will eventually converge to the controllable agent's heading. Like us, they show that a controllable agent can be used to influence the behavior of the other agents in a flock. However, they are only concerned with getting the flock to converge eventually, whereas we attempt to do so as quickly as possible. Su, Wang, and Lin also present work concerned with using a controllable agent to make the flock converge eventually [7].

8. Conclusion

In this work, we have set out to determine how influencing agents should behave in order to orient a flock towards a target heading as quickly as possible. Our work is situated in a limited ad hoc teamwork domain: although we have knowledge of the behavior of the flock, we are only able to influence the flock indirectly, via the behavior of the influencing agents within it.

This paper introduces three algorithms that the influencing agents can use to influence the flock — a greedy lookahead behavior, a deeper lookahead behavior, and a coordinated behavior. We ran extensive experiments using these algorithms in a simulated flocking domain, where we observed that in such a setting a greedy lookahead behavior is an effective behavior for the influencing agents to adopt.

Although we have begun to consider coordinated algorithms in this work, there is room for more extensive coordination, as well as for different types of coordination. Additionally, as this work focused on a limited version of Reynolds' flocking model, a promising direction for future work is to extend the algorithms presented here to Reynolds' complete flocking model. Finally, it would be interesting to empirically examine the effect of influencing agent placement.

Acknowledgements

This work has been carried out in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin.
LARG research is supported in part by grants from the National Science Foundation (CNS-1330072, CNS-1305287), ONR (21C184-01), AFRL (FA8750-14-1-0070), AFOSR (FA9550-14-1-0087), and from Yujin Robot.

References

[1] T. Vicsek, A. Czirok, E. Ben-Jacob, et al. Novel type of phase transition in a system of self-driven particles. Physical Review Letters 75(6), 1995. doi:10.1103/PhysRevLett.75.1226.

[2] C. W. Reynolds. Flocks, herds and schools: A distributed behavioral model. SIGGRAPH 21:25–34, 1987. doi:10.1145/37401.37406.

[3] W. Bialek, A. Cavagna, I. Giardina, et al. Statistical mechanics for natural flocks of birds. Proceedings of the National Academy of Sciences 109(11), 2012. doi:10.1073/pnas.1118633109.

[4] C. K. Hemelrijk, H. Hildenbrandt. Some causes of the variable shape of flocks of birds. PLoS ONE 6(8), 2011. doi:10.1371/journal.pone.0022479.

[5] J. Han, M. Li, L. Guo. Soft control on collective behavior of a group of autonomous agents by a shill agent. Journal of Systems Science and Complexity 19(1):54–62, 2006. doi:10.1007/s11424-006-0054-z.

[6] A. Jadbabaie, J. Lin, A. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control 48(6):988–1001, 2003. doi:10.1109/TAC.2003.812781.

[7] H. Su, X. Wang, Z. Lin. Flocking of multi-agents with a virtual leader. IEEE Transactions on Automatic Control 54(2):293–307, 2009. doi:10.1109/TAC.2008.2010897.

[8] S. Luke, C. Cioffi-Revilla, L. Panait, et al. MASON: A multi-agent simulation environment. Simulation: Transactions of the Society for Modeling and Simulation International 81(7):517–527, 2005. doi:10.1177/0037549705058073.

[9] K. Genter, N. Agmon, P. Stone. Ad hoc teamwork for leading a flock. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013), 2013.

[10] B. J. Grosz, S. Kraus. Collaborative plans for complex group action. Artificial Intelligence Journal 86(2):269–357, 1996. doi:10.1016/0004-3702(95)00103-4.

[11] M. Tambe. Towards flexible teamwork. Journal of Artificial Intelligence Research 7:83–124, 1997.

[12] K. S. Decker, V. R. Lesser. Designing a family of coordination algorithms. In Readings in Agents, pp. 450–457. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.