Acta Polytechnica 55(3):154–161, 2015, doi:10.14311/AP.2015.55.0154
© Czech Technical University in Prague, 2015, available online at http://ojs.cvut.cz/ojs/index.php/ap

COMPARISON OF CLASSICAL AND INTERACTIVE MULTI-ROBOT EXPLORATION STRATEGIES IN POPULATED ENVIRONMENTS

Nassim Kalde^{a,b,c,∗}, Olivier Simonin^{d}, François Charpillet^{a,b,c}

a Inria, Villers-lès-Nancy, 54600, France
b Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, 54506, France
c CNRS, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, 54506, France
d CITI-Inria Laboratory, INSA Lyon, 69621 Villeurbanne, France
∗ corresponding author: nassim.kalde@loria.fr

Abstract. Multi-robot exploration consists in coordinating robots to map an unknown environment. It raises several issues concerning task allocation, robot control, path planning and communication. We study exploration in populated environments, in which pedestrian flows can severely impact performance. However, humans have adaptive skills that let them take advantage of these flows while moving. Therefore, in order to exploit these human abilities, we propose a novel exploration strategy that explicitly allows for human-robot interactions. Our model for exploration in populated environments combines the classical frontier-based strategy with our interactive approach. We implement interactions in which robots can locally choose a human guide to follow, and we define a parametric heuristic to balance interaction and frontier assignments. Finally, we evaluate to what extent human presence impacts our exploration model in terms of coverage ratio, travelled distance and elapsed time to completion.

Keywords: artificial intelligence, multi-agent system, robotic exploration, human-robot interaction.

1. Introduction
Mobile robots intervene in our daily life, providing services (e.g., guidance, assistance [1, 2]) or even leisure activities (e.g., providing company, dancing [3, 4]). As mobile robots enter citizens' day-to-day lives, they must take people into account and seek social compliance. Human activities and motion patterns have already been studied [5], so that a robot can learn a model of human behaviour and use it to generate socially compliant control. For example, by observing pedestrians walking nearby, a robot could model the pedestrian dynamics and derive its own navigation control for moving efficiently in populated environments.

Multi-Robot Exploration (MRE) consists in reconstructing all the reachable space of an unknown environment by controlling mobile robots. Introducing human-presence awareness into a robotic exploration system for populated environments is an interesting research direction: it paves the way for exploration approaches based on human-robot interaction (HRI). In populated environments, the exploration task raises new concerns regarding clean reconstruction and efficient robot coordination. Concerning reconstruction quality, it is particularly difficult to separate the static aspects (background) from the dynamic aspects (people, robots) of the scene [6]; mobile robot perceptions are biased by the dynamics of the environment, which hinders localisation and mapping. Regarding the selection of targets to explore, pedestrian movements make the reachability of known and unknown areas spatio-temporal, which complicates exploration: the reachable space evolves dynamically with the density of the human presence.
Nevertheless, humans can understand the dynamics of their environment, and can sense, decide and act adequately. In this sense, we can assume that every person has an adaptive heuristic, depending on the local environment, that allows him or her to walk readily through dense areas (e.g., crowds). We are interested in exploiting such human skills as possible heuristics for the exploration task, and we propose a weighted heuristic that incorporates human presence for selecting areas to explore or for initiating human-robot interactions.

The paper is organised as follows. The next section gives a brief state of the art in MRE and situates our approach among HRI applications in mobile robotics. The third section formalises the multi-agent system for exploration in populated environments and presents the framework of our study. The fourth section defines the mixed exploration approach (robot-frontier/interaction) and proposes a human-aware exploration heuristic for establishing human-following interactions. We then report several simulation experiments with our mixed approach, underlining how performance varies with the environment. Finally, we discuss our results and perspectives regarding machine learning for adaptive heuristic parameterisation.

Figure 1. Multi-Agent System simulated in V-REP [19].

2. Related Work
This section first presents previous work in the field of MRE, and then situates our study among mobile robotic applications of HRI.

2.1. Multi-Robot Exploration
The MRE problem consists in acquiring an accurate representation of an environment by efficiently coordinating the actions of robots within it. Representation accuracy refers to the degree of closeness to the ground truth. Coordination of the robots arises from the teamwork involved in solving the task; coordination efficiency can be evaluated at several levels, e.g., energy consumption, trajectory overlapping, etc. Thus, MRE solutions design efficient robot control for accurately completing a chosen representation of the environment (e.g., a graph). The proposed solutions can be roughly classified into reactive, goal-based and utility-based agent designs.

In reactive and bio-inspired approaches, the actions of an agent are hardwired to its perceptions, and simple navigation rules can be created, e.g., following walls, or moving in circular or boustrophedon patterns [7–10]. These approaches are usually concerned only with coverage; they do not always consider mapping, as typical reactive agents are memory-less.

For goal-based agents, frontier-based methods offer a good computation/performance tradeoff, making them particularly suitable for deployment in real embedded systems. The idea is to incrementally assign robots to frontiers (which separate the known free space from the unknown space), thus serialising the exploration task into subgoals. Various frontier evaluation and target selection methods are discussed in the literature [11–13].

In utility-based approaches, an agent makes decisions according to the value of world states. An information gain value is proposed in [14], and [15] considers curiosity, surprise and hunger as agent motivations.

In the present study, we consider goal-based agents, with frontiers to reach and humans to interact with as subgoals during exploration.
We are looking for a parametric heuristic that evaluates and balances frontiers and interactions, so as to exploit human adaptive skills.

2.2. Human-Robot Interaction
HRI is defined as the study of humans, robots and their mutual influences [16]. To the best of our knowledge, the office-conversant robot Jijo-2 is the only HRI application in mobile robotic exploration. This robot exhibits socially embedded learning mechanisms [17], gathering information by conversing with people; it thus realises semi-supervised learning by incorporating local oracle heuristics while exploring.

We present an application in mobile robotics considering close interactions, established by proximity or direct perception, between humans and robots. This type of interaction belongs to the intimate interaction class of Takeda's HRI classification [18]. Our study bridges intimate HRI applications and goal-based MRE algorithms.

3. Multi-Agent System Formalisation
3.1. Modelling the Environment and the Agents
We propose a model of the multi-agent system for populated environments (Fig. 1). In (1), the environment to explore is described by a navigation map E, which evolves over time; this evolution results from the actions of the agents (humans H and robots R). At each time t, each robot R_i from R has a configuration from which it observes the environment; O_i^t ⊂ E is the robot's observation at time t. Formally, let

E be a navigation map,
R = {R_1, …, R_n} be a set of robots,
H = {H_1, …, H_m} be a set of humans,
O_i^t ⊂ E be the observation of R_i at time t.  (1)

3.2. Exploration and Completion
For the exploration task, we must represent the environment explored by the robots over time (2). Let θ_i^{0:t} be the set of observations, namely the local t-time history, that agent R_i has accumulated up to time t. Similarly, Θ^{0:t} is the global t-time history, which aggregates the local t-time histories of all robots in R. Thus, we have

θ_i^{0:t} = θ_i^{0:t−1} ∪ O_i^t,   Θ^{0:t} = ⋃_{i=1}^{n} θ_i^{0:t}.  (2)

It is of fundamental importance for the robots to know when the exploration is finished. The completion criterion determines this moment and can be defined locally on each robot: robots determine exploration completion from the already explored space Θ. The mission is over as soon as no configuration in the already explored space allows for new observations.

3.3. Instantiating the Multi-Agent System
We represent E as a discrete occupancy grid of l × w square cells. Each cell has four possible states: unknown (not observed), occupied (walls, objects), animated (humans, robots) and free (empty). State transitions are illustrated in Fig. 2.

Figure 2. Cell state transition diagram (unknown, occupied, free, animated).

In this grid representation, R becomes the set of cells animated by the robots, and R_i describes the position of one robot on the grid. The observation area of each robot is a limited circle.

An environment, a robot and a human are represented in Fig. 3a: the robot is located on cell R_1 at (1, 1) and the human on cell H_1 at (1, 2). The maximum field of view of R_1 is within the dashed arc in Fig. 3b. O_1^1 consists of 7 cells: 3 are free, 2 are occupied and 2 are animated. The explored environment θ_1^{0:1} is limited to this first observation.

Figure 3. Full grid (a: E, R_1, H_1) and robot observation (b: θ_1^{0:1} = O_1^1) at t = 1.
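To make this instantiation concrete, the following minimal Python sketch (our own illustration, not the authors' implementation; all names are hypothetical) shows the four cell states, the observation unions of (2), and a completion test that searches for a remaining frontier cell:

```python
from enum import Enum

class CellState(Enum):
    UNKNOWN = "?"   # never observed
    OCCUPIED = "#"  # walls, static objects
    ANIMATED = "A"  # humans or robots (transient)
    FREE = "."      # observed empty

def merge_observation(theta, observation):
    """Local history update of (2): θ_i^{0:t} = θ_i^{0:t-1} ∪ O_i^t.
    Histories and observations map cell coordinates (x, y) to a CellState;
    fresh readings overwrite old ones, so an ANIMATED cell can become FREE."""
    merged = dict(theta)
    merged.update(observation)
    return merged

def global_history(local_histories):
    """Global history of (2): Θ^{0:t}, the union of all local histories."""
    merged = {}
    for theta in local_histories:
        merged.update(theta)
    return merged

def exploration_complete(grid):
    """Completion criterion: the mission is over when no explored free cell
    borders an unknown cell, i.e., no frontier allows a new observation.
    `grid` is assumed to map every cell of the l × w map to a CellState."""
    for (x, y), state in grid.items():
        if state is not CellState.FREE:
            continue
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if grid.get((x + dx, y + dy)) is CellState.UNKNOWN:
                return False
    return True
```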
We have thus provided an instance of a multi-agent system for exploration in populated environments: the environment is represented by a discrete occupancy grid, agents are characterised by their identifier and their coordinates on the grid, and observations are made by casting rays within the viewing range of a robot. Our study is based on this representation of a multi-agent system. In the next section, we present the frontier/interaction exploration approach.

4. Mixed Exploration Approach by Frontiers and Interactions
First, let us consider the MRE problem defined as a target allocation problem for robots in an unknown environment [12–14]. A solution to the MRE problem defines a way to explore an unknown space, i.e., how to assign robots from R to tasks/targets from T. To achieve this, we look for an assignment matrix A_RT that is optimal with respect to the cost matrix C_RT (cf. Fig. 4).

Figure 4. Multi-robot exploration as a task allocation problem: a cost matrix C_RT = (c_{R_i T_j}) is optimised to obtain an assignment matrix A_RT = (a_{R_i T_j}).

4.1. Various Approaches for Multi-Robot Exploration
We show how different target sets define the classical frontier-based exploration, our new interactive approach, and the mixed approach (frontier/interaction).

4.1.1. Frontier-Based Exploration
A frontier is the observed boundary between explored space and unexplored space [11]. Classical frontier-based exploration is defined by choosing the targets from the set of frontiers F (3). Let

c_{R_i F_j} be the cost for R_i to reach F_j,
a_{R_i F_j} = 1 if R_i must go to F_j, 0 otherwise.  (3)

In populated environments this approach can fail when the path to a chosen frontier is congested by humans.

4.1.2. Interactive Exploration
Human-robot interaction is defined as the reciprocal influence between a human and a robot, followed by one or more effects. We introduce an interactive approach that takes the presence of humans into account in order to establish human-robot interactions (opening a door, guiding through a crowd, etc.). Targets are now chosen from the set of humans H (4). Let

c_{R_i H_j} be the cost for R_i to interact with H_j,
a_{R_i H_j} = 1 if R_i must interact with H_j, 0 otherwise.  (4)

A purely interactive approach can be inefficient in sparsely populated environments: without any perception of human presence, the robots adopt a wait-and-see policy and pause the exploration.

4.1.3. Mixed Exploration
Mixed exploration enables robots both to initiate interactions and to reach frontiers. Thus, we combine the two target sets (frontiers and humans) into a new set G (5). Let

c_{R_i G_j} be the mixed cost for R_i to target G_j,
a_{R_i G_j} = 1 if R_i is assigned to G_j, 0 otherwise.  (5)

This approach requires smartly balancing interaction and frontier assignments, so as to overcome the two issues mentioned above (the wait-and-see policy and congested frontiers).
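For illustration, one way to read Fig. 4 and (3)–(5) is as a linear assignment over the mixed target set G = F ∪ H. The sketch below is ours (the paper's own allocation rules are the greedy methods described in Section 5); it computes an optimal one-to-one assignment matrix A_RG from a given cost matrix using SciPy's Hungarian solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment(C):
    """Given the mixed cost matrix C_RG (n robots × |G| targets, G = F ∪ H),
    return a 0/1 assignment matrix A_RG minimising the summed cost, with
    each robot assigned to at most one target and vice versa."""
    A = np.zeros(C.shape, dtype=int)
    rows, cols = linear_sum_assignment(C)  # Hungarian algorithm
    A[rows, cols] = 1
    return A

# Columns 0..|F|-1 of C index frontiers and the remaining columns index
# humans, so A directly tells each robot whether its subgoal is a frontier
# or an interaction.
```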
4.2. Mixed Cost Model
In this study, robots can interact only by following pedestrians. The optimisation criterion is to explore a possibly populated environment in minimum distance and time. Thus, we define mixed costs using distances and weighted penalties (see Fig. 5); the weight σ balances interaction and frontier penalties. First, we detail the distances; then we introduce the penalties and explain the different weights used in the cost formula.

Figure 5. Distances (a: d_{R_i F_j}, d_{R_i H_k}) and penalties (b: σ · p_{R_i F_j}, (1 − σ) · p_{R_i H_k}) considered in the system.

4.2.1. Distance
First, we incorporate the distances between robots and targets as immediate costs (Fig. 5a). Thus, we initialise C_RG with normalised robot-frontier and robot-human distances (D_RF, D_RH) in (6):

D_RG = (D_RF | D_RH),  where D_RX = (d_{R_i X_j}) is the n × |X| matrix of distances from each robot R_i to each target X_j.  (6)

Distance costs have multiple drawbacks; two examples follow.

If a robot travels towards a frontier but a crowd hinders its navigation, the robot cannot adapt the exploration to the feasibility of navigation. Remote but reachable frontiers are not re-evaluated as good options: their distance cost is prohibitive, so the next target is always chosen between the last frontier and humans who are nearby. A solution is to use a planned distance, which is set to infinity when a target is momentarily unreachable.

If a robot follows a pedestrian walking nearby but the person stops to talk with other people, the robot cannot decide whether to maintain or stop the current interaction depending on the human activity. Because of the distance costs, the robot resumes exploration only when a person moves again, which also causes growing unease among the people. A solution is to evaluate an interaction a priori and to update this evaluation a posteriori, while the interaction is taking place.

4.2.2. Penalty
We tackle these two drawbacks with a heuristic that associates a penalty with each robot-frontier/human pair (Fig. 5b). A penalty p_{R_i X_j} is defined as the sum of a time penalty and an orientation penalty. The time penalty t_{R_i X_j} is the time elapsed since a frontier was discovered, or during which a human has remained idle. The orientation penalty o_{R_i X_j} is the smallest unsigned angle between the orientation of the robot and that of the frontier/human (a frontier is oriented towards the unknown). Thus, we define P_RG with normalised robot-frontier and robot-human penalties (P_RF, P_RH) in (7):

P_RG = (σ · P_RF | (1 − σ) · P_RH), σ ∈ [0, 1],  where P_RX = (p_{R_i X_j}) and p_{R_i X_j} = t_{R_i X_j} + o_{R_i X_j}.  (7)

The parameter σ puts more or less weight on the frontier penalties relative to the interaction penalties. When σ is high, it increases the frontier costs and decreases the interaction costs; this favours interactions over frontiers.

4.2.3. Distance and Penalty
The mixed cost matrix C_RG, which combines the distances D_RG and the penalties P_RG, is given in (8):

C_RG = α · D_RG + (1 − α) · P_RG, α ∈ [0, 1].  (8)

The parameter α modulates the immediate distance cost against the information coming from the penalty heuristic: when α is high (resp. low), the importance of the penalties is reduced (resp. increased). Distances and penalties are thus counterbalanced by α, while σ sets more or less focus on frontiers or on interactions. Figure 6 summarises the influence of α and σ on the cost formula: each parameter ranges from 0 to 1, and the formula obtained when one parameter is set to an extreme value is written on the corresponding side of the figure.

Figure 6. C_RG according to α and σ: at σ = 1 the cost reduces to α · D_RG + (1 − α) · (P_RF | 0), and at σ = 0 to α · D_RG + (1 − α) · (0 | P_RH).
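The cost model of (6)–(8) translates directly into code. A minimal sketch, assuming the distance and penalty matrices are already normalised to [0, 1] (function and variable names are ours, not the authors'):

```python
import numpy as np

def penalty(idle_time, angle):
    """Penalty of (7): p = t + o, the idle time of the target (since a
    frontier was discovered, or since a human stopped moving) plus the
    smallest unsigned angle between robot and target orientations."""
    return idle_time + angle

def mixed_cost_matrix(D_RF, D_RH, P_RF, P_RH, alpha, sigma):
    """Mixed cost C_RG of (8) for n robots, |F| frontiers and |H| humans.
    D_RF, P_RF: (n, |F|) arrays; D_RH, P_RH: (n, |H|) arrays; all in [0, 1].
    A high sigma penalises frontiers and favours interactions; a high alpha
    favours raw distances over the penalty heuristic."""
    D_RG = np.hstack((D_RF, D_RH))                        # (6)
    P_RG = np.hstack((sigma * P_RF, (1 - sigma) * P_RH))  # (7)
    return alpha * D_RG + (1 - alpha) * P_RG              # (8)
```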
We have thus adopted a mixed approach and defined a parametric cost matrix based on a penalty heuristic. We now evaluate the exploration performance of this heuristic for two greedy optimisation methods and different values of α and σ.

5. Experimental Framework
We use the V-REP robotic simulator for our experiments [19]. The environment is discretised into 0.5 m square cells. The robots share their exploration map, so every discovered frontier is known to every robot. Contiguous frontier cells are grouped into a frontier area; inside a frontier area, the targeted cell is the one minimising the sum of distances to the other cells. Assignments are computed locally by each robot: to optimise its assignment, a robot takes into account the entire set of frontiers known so far, but only the robots and pedestrians it perceives locally (within a 2 m radius). Planning is done using a potential field propagated on the grid.

Figure 7. Environments: (a) empty (100 m²); (b) cave (144 m²); (c) structured (242 m²).

5.1. Protocol
The parameters are as follows:
• Map: Three environments are considered: the first contains no obstacles (empty), the second has unstructured obstacles (unstructured), and the third is composed of a corridor and three rooms (structured). The maps are shown in Fig. 7.
• Population density: Environments are populated by humans at 0 % or 30 % occupation. Each human agent moves in a straight line and avoids obstacles by stopping and rotating.
• Number of robots: Two explorers, represented as cylinders, are used in each experiment.
• Optimisation method: We use two different cost optimisation methods. The first is a local greedy method, in which each robot chooses the minimum-cost target among only its own possible targets, as in [11] for distances. The second is a group greedy method, in which, over all locally visible robots at each time step, the minimum-cost robot-target pair is iteratively assigned and removed from consideration until the local robot itself is assigned. Both rules are sketched after this list.
• Modulators: α and σ are discretised from 0 to 1 with a step of 0.25.
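The two allocation rules can be sketched as follows (our reading of the description above; the names are hypothetical). Local greedy looks only at the robot's own row of the cost matrix, while group greedy repeatedly fixes the globally cheapest remaining robot-target pair among the locally visible robots until the deciding robot is assigned:

```python
import numpy as np

def local_greedy(C, i):
    """Robot i picks the minimum-cost target among its own targets only."""
    return int(np.argmin(C[i]))

def group_greedy(C, i):
    """Among the locally visible robots, repeatedly assign the globally
    cheapest remaining (robot, target) pair and remove both, until robot i
    itself is assigned.  Assumes at least as many targets as robots."""
    robots = list(range(C.shape[0]))
    targets = list(range(C.shape[1]))
    while True:
        r, t = min(((r, t) for r in robots for t in targets),
                   key=lambda pair: C[pair[0], pair[1]])
        if r == i:
            return t
        robots.remove(r)
        targets.remove(t)
```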
5.2. Metrics
Each scenario is evaluated with classical MRE metrics: coverage, distance and time. In addition, we use a common HRI metric called Robotic Attention Demand (RAD), which measures the autonomy of a robot during its task [20, 21]; here, we consider the number of interactions initiated during exploration.

5.3. Results
First, let us consider environments without humans. We study the influence of α while fixing σ to 1, so that only the balance between distances and frontier penalties is adjusted. This is legitimate, since the absence of humans implies no interaction penalties. The performances averaged over 10 runs are plotted in Fig. 8 for local greedy and in Fig. 9 for group greedy.

Figure 8. Local greedy, not dense: empty (top), unstructured (middle), structured (bottom); coverage (%), travelled distance (m) and elapsed time (s) as functions of α.

Figure 9. Group greedy, not dense: empty (top), unstructured (middle), structured (bottom); coverage (%), travelled distance (m) and elapsed time (s) as functions of α.

For local greedy (Fig. 8), on the empty and structured maps we distinguish two steps: in the first, distance and time increase; in the second, they both decrease until α = 1. The unstructured map for local greedy, and all maps for group greedy (Fig. 9), show only one step, in which distance and time decrease as α increases. In these cases, when α is high, the penalties fade and the robots make fewer round trips between remote frontiers in the scene. Thus, in non-populated environments, our heuristic does not improve performance.

Now we consider the presence of pedestrians. The maps are populated at 30 %, up to 1 human/m², thus enabling human-robot interactions. Figures 10 and 11 give the mean performances of local greedy and group greedy, respectively, over 10 runs of each (α, σ) combination. When σ increases, the penalty on interactions is reduced, favouring interactions to the detriment of frontiers.

Figure 10. Local greedy, dense: empty (top), unstructured (middle), structured (bottom); coverage (%), travelled distance (m) and elapsed time (s) as functions of α and σ.

For the empty case (Fig. 10), full coverage with the shortest distance and time is obtained at (α, σ) = (0, 0): only penalties are used, and frontiers are preferred over interactions. No interaction was initiated (RAD), but an average of 28 frontiers were assigned. In the unstructured case, the best average performance is at (0.75, 0): distances are overweighted compared to penalties, and interactions are heavily penalised; nevertheless, an average of 18 interactions (RAD) were initiated, against 26 frontier assignments. In the structured environment, the best performances and the lowest standard deviations are at (0.5, 0): distances and penalties are equally weighted, and again interactions are penalised. An average of 31 frontier assignments and 18 interaction assignments took place. Here, interactions are interesting because a robot can discover the corridor by following someone.

Figure 11. Group greedy, dense: empty (top), unstructured (middle), structured (bottom); coverage (%), travelled distance (m) and elapsed time (s) as functions of α and σ.

For group greedy (Fig. 11), the best performance on the empty scene is located at (0.25, 0): penalties have more weight than distances, and frontiers are preferred over interactions; an average of 16 frontier assignments and 7 interaction assignments is observed. The unstructured environment reaches maximum coverage, with minimum travelled distance and time, at (0.5, 0); the average number of frontiers assigned is 8 times the number of interactions (only 4). For the last map, the best average performance is at (0.25, 0), with a frontier/interaction ratio of 29/4.

These results show that, in the presence of humans, distance alone does not suffice for choosing the best targets; instead, a smart equilibrium with our penalty heuristic always gives the best performance (α ≠ 1). Here, with σ = 0, frontiers were chosen considering only distances, while interactions, being heavily penalised, were chosen only carefully. Thus, our heuristic already suffices to select interactions only when necessary, but it is not yet able to actively promote them.

6. Conclusion
In this paper, we have defined interaction-based exploration, in which robots target the humans they perceive. Interactive exploration paves the way for exploiting natural human heuristics, towards a better understanding of the dynamics of a populated environment.
The mixed approach, based upon frontier and interactive exploration, aims at bringing out the best of both. For this purpose, we designed a parametric heuristic to balance frontier and interaction (pedestrian-following) assignments. This heuristic applies penalties based on the idle state of the targets (frontiers, humans) and on their orientation. We have shown in simulation that, in some cases, incorporating an interactive aspect into exploration can be beneficial, even with this simplistic heuristic. To enable efficient dynamic exploration, it is therefore paramount to identify these particular cases. In this sense, machine learning and online tuning of the weights might be of interest for achieving adaptive robotic heuristics. This work opens up prospects for exploiting human adaptiveness in the robotic exploration of populated environments.

References
[1] W. Burgard, A. B. Cremers, D. Fox, et al. Experiences with an interactive museum tour-guide robot. Artificial Intelligence 114(1):3–55, 1999.
[2] K. Kosuge, Y. Hirata. Human-robot interaction. In Proceedings of the IEEE International Conference on Robotics and Biomimetics, pp. 8–11. 2004.
[3] R. C. Arkin, M. Fujita, T. Takagi, R. Hasegawa. An ethological and emotional basis for human-robot interaction. Robotics and Autonomous Systems 42(3):191–201, 2003.
[4] M. P. Michalowski, S. Sabanovic, H. Kozima. A dancing robot for rhythmic social interaction. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. 2007.
[5] M. Bennewitz, W. Burgard, G. Cielniak, S. Thrun. Learning motion patterns of people for compliant robot motion. The International Journal of Robotics Research 24(1):31–48, 2005.
[6] A. Dubois, A. Dib, F. Charpillet. Using HMMs for discriminating mobile from static objects in a 3D occupancy grid. In Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI). 2011.
[7] D. Baronov, J. Baillieul. Reactive exploration through following isolines in a potential field. In Proceedings of the American Control Conference. 2007.
[8] R. Morlok, M. Gini. Dispersing robots in an unknown environment. In Distributed Autonomous Robotic Systems, pp. 253–262. 2007.
[9] M. Andries, F. Charpillet. Multi-robot exploration of unknown environments with identification of exploration completion and post-exploration rendez-vous using ant algorithms. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013.
[10] E. Ferranti, N. Trigoni, M. Levene. Brick&Mortar: an on-line multi-agent exploration algorithm. In Proceedings of the IEEE International Conference on Robotics and Automation. 2007.
[11] B. Yamauchi. A frontier-based approach for autonomous exploration. In Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation. 1997.
[12] J. Faigl, M. Kulich, L. Preucil. Goal assignment using distance cost in multi-robot exploration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2012.
[13] A. Bautin, O. Simonin, F. Charpillet. MinPos: A Novel Frontier Allocation Algorithm for Multi-robot Exploration. In Proceedings of the 5th International Conference on Intelligent Robotics and Applications. 2012.
[14] W. Burgard, M. Moors, C. Stachniss, F. E. Schneider. Coordinated multi-robot exploration. IEEE Transactions on Robotics 21(3):376–386, 2005.
[15] L. Macedo, A. Cardoso. Exploration of unknown environments with motivational agents. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems. 2004.
[16] M. A. Goodrich, A. C. Schultz. Human-robot interaction: a survey. Foundations and Trends in Human-Computer Interaction, 2007.
[17] H. Asoh, Y. Motomura, F. Asano, et al. Jijo-2: An office robot that communicates and learns. IEEE Intelligent Systems 16(5):46–55, 2001.
[18] H. Takeda, N. Kobayashi, Y. Matsubara, T. Nishida. Towards ubiquitous human-robot interaction. In Working Notes of the IJCAI Workshop on Intelligent Multimodal Systems. 1997.
[19] E. Rohmer, S. P. Singh, M. Freese. V-REP: A versatile and scalable robot simulation framework. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013.
[20] D. R. Olsen, M. A. Goodrich. Metrics for evaluating human-robot interactions. In Proceedings of PERMIS. 2003.
[21] A. Steinfeld, T. Fong, D. Kaber, et al. Common metrics for human-robot interaction. In Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction. 2006.