Acta Polytechnica 55(3):169–176, 2015, doi:10.14311/AP.2015.55.0169
© Czech Technical University in Prague, 2015, available online at http://ojs.cvut.cz/ojs/index.php/ap

DECENTRALIZED MULTI-ROBOT PLANNING TO EXPLORE AND PERCEIVE

Laetitia Matignon^a,*, Laurent Jeanpierre^b, Abdel-Illah Mouaddib^b
a Université Lyon 1, LIRIS UMR5205, F-69622, France
b Université Caen Basse-Normandie, GREYC UMR6072, F-14032, France
* corresponding author: laetitia.matignon@univ-lyon1.fr

Abstract. In a recent French robotic contest, the objective was to develop a multi-robot system able to autonomously map and explore an unknown area while also detecting and localizing objects. As a participant in this challenge, we proposed a new decentralized Markov decision process (Dec-MDP) resolution based on distributed value functions (DVF) to compute multi-robot exploration strategies. The idea is to take advantage of sparse interactions by allowing each robot to calculate locally a strategy that maximizes the explored space while minimizing interactions between robots. In this paper, we propose an adaptation of this method to also improve object recognition, by integrating into the DVF the interest in covering explored areas with photos. The robots will then act to maximize the explored space and the photo coverage, ensuring better perception and object recognition.

Keywords: Cooperative multi-robot systems, robot coordination, robot planning, multi-robot exploration, active perception.

1. Introduction

Some key challenges of robotics reported in the recent roadmap for U.S. robotics [1], e.g., planetary missions and service robotics, require mobile robots to travel autonomously around unknown environments and to augment metric maps with higher-order semantic information such as the location and the identity of objects in the environment. The ability of a mobile robot to gather the information needed to build a map that is useful for navigation is called autonomous exploration. This was the central topic of a DGA (Defense Procurement Agency) / NRA (French National Research Agency) robotic challenge, in which multiple robots have to explore and map some unknown indoor area while recognizing and localizing objects in this area. The scientific issues of this project involve SLAM (Simultaneous Localization and Mapping), object recognition and multi-robot collaboration for exploration. As a participant in this challenge, we mainly focused on multi-robot collaboration for exploration. We were particularly interested in multi-robot exploration strategies. We proposed a new Dec-MDP (decentralized Markov decision process) resolution technique based on the distributed value function (DVF) to consider sparse interactions. Our Dec-MDP model for exploration and its resolution based on DVF were applied for the contest. It allowed robots to explore an unknown area cooperatively by reducing the overlap between the explored areas of each robot. In a second phase, we focused on improving object recognition by integrating into the planning stage the interest in covering explored areas with photos. Thus each robot will act to explore and to perceive. The objective of this paper is to present our interaction-sparse Dec-MDP resolution adapted to achieve improved perception. In the following, we first present the context of this work, with details concerning the robotic challenge and the system that we developed to participate in the competition.
Second, related works on multi-robot exploration, active perception and interaction-oriented models are introduced. Then we present our Dec-MDP resolution, based on DVF, and its application to multi-robot exploration. We then introduce and show experiments aimed at extending DVF to improve photo coverage of the space. Finally, some concluding remarks are made.

2. Context

2.1. The CaRotte Challenge

This work was carried out within the framework of a French robotic contest called Defi CaRotte (Cartography by a Robot of a territory; http://www.defi-carotte.fr/), which involved exploring and mapping an unknown indoor area (an enclosed arena made up of a set of rooms) with one or several autonomous robots. The competition took place in an arena of approximately 120 m², in which objects had been laid out. The arena contained several rooms, typically 10 or more, with variable grounds and various difficulties to be dealt with (fitted carpet, grid, sand, etc.). Several kinds of objects were present, isolated or gathered together, e.g., chairs, books, fans, furniture, etc. The CaRotte challenge proceeded over a period of three years, with an increase in the level of difficulty over the years. The required outcome was to produce 2D, 3D and topological maps of the arena and to detect, localize and classify objects present in the arena. The trials within the framework of the competition consisted of several missions with a time limit. During each mission, robots navigated autonomously to map, detect and recognize objects. The robots were required to return to their starting position before the end of the mission. Five teams entered this challenge [2, 3]; the goal was to maximize the explored space, the precision of the map and the quality of object detection.

2.2. The Robots_Malins System

We developed the Robots_Malins (Robots for Mapping And Localization, using Intelligent Navigation and Search; https://robots_malins4carotte.greyc.fr/) system for the CaRotte challenge. Our system uses Wifibot (www.wifibot.com) µ-trooper M robots. These are six-wheel robots characterized by great flexibility, which allows the robots to cover rough terrain. Each µ-trooper embeds an Intel Core 2 Duo processor, 2 GB of RAM and 4 GB of flash storage, and is equipped with a Hokuyo 30 m LIDAR (www.hokuyo-aut.jp) laser range scanner for localization and mapping and a Hokuyo 4 m LIDAR tilted toward the ground, plus an ultrasonic rangefinder for detecting nearby obstacles and glass walls. An AVT Marlin firewire camera is used for object detection. The software running on-board these robots is based on a Data Distribution System (DDS) implementation from OpenSplice (http://www.opensplice.com). This middleware allows several programs to run concurrently, even on different computers. In our architecture, this implies that various modules can run asynchronously: Laser acquisition, SLAM, Decision, Mobility and Object recognition. The architecture allows the robots to exchange their laser scans and their poses. Thus each robot knows the areas explored by the others, and updates its local map with local and distant scans. In particular, the SLAM module, based on [4], receives laser readings and provides the other modules and the other robots with the robot pose (location and heading). So each robot knows the relative positions of all the robots and the map of the zones explored by the team.
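To make this scan-and-pose exchange concrete, the following Python sketch shows one way a robot could fold its own laser scans and those received from teammates into a single occupancy grid. It is a minimal illustration only: the ScanMessage format, the grid resolution and the endpoint-only update are assumptions, not the actual Robots_Malins/DDS implementation.

```python
import math
from dataclasses import dataclass

UNKNOWN, FREE, OCCUPIED = -1, 0, 1

@dataclass
class ScanMessage:
    """Hypothetical exchanged message: sender pose plus its laser readings."""
    x: float            # robot position in the shared frame (metres)
    y: float
    theta: float        # robot heading (radians)
    ranges: list        # measured distances (metres)
    bearings: list      # beam angles relative to the heading (radians)

class SharedOccupancyGrid:
    """Very simplified map: beam endpoints become OCCUPIED, robot cells FREE.

    (A real implementation would also carve free space along each beam.)"""
    def __init__(self, cols, rows, resolution=0.1):
        self.res = resolution
        self.grid = [[UNKNOWN] * cols for _ in range(rows)]

    def _mark(self, wx, wy, value):
        r, c = int(wy / self.res), int(wx / self.res)
        if 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0]):
            self.grid[r][c] = value

    def integrate(self, scan: ScanMessage):
        self._mark(scan.x, scan.y, FREE)
        for rng, bearing in zip(scan.ranges, scan.bearings):
            self._mark(scan.x + rng * math.cos(scan.theta + bearing),
                       scan.y + rng * math.sin(scan.theta + bearing),
                       OCCUPIED)

# Each robot would call integrate() on its own scans and on every ScanMessage
# received from teammates, so all local maps cover the same explored area.
```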
The mobility module implements an advanced point and shoot algorithm, along with a backtrack feature that prevents the robot from getting stuck by reverting back along its own trajectory. The point and shoot algorithm consists of turning to the desired heading and then going forward for a specified distance, while correcting the heading drift. The object recognition module uses the different pictures taken by the camera to recognize predefined classes of objects. Pictures are taken every 4 seconds. Objects to be detected are known in advance and a database has been built containing each object over different points of view. Object detection was performed using a shape/template matching technique based on Dominant Orientation Template models [5]. The decision module runs asynchronously, computing a new strategy about every second on average. In this paper, we focus on the decision module. Details on the algorithm used to compute a joint policy of the robots to efficiently explore the arena and cover it with pictures are given in Sections 5 and 6.

3. State of the art

3.1. Multi-robot exploration

Multi-robot exploration has received considerable attention in recent years due to its obvious advantages over single-robot systems: it is faster, more robust, fault-tolerant, etc. Most multi-robot approaches assume that robots share the information that they have gathered to build a shared map and know their locations in this map. The robots cooperate through the shared map, but coordination techniques are required to exploit the parallelism inherent to the team. In [6], each robot uses the greedy approach in the shared map, i.e., each robot chooses the nearest exploration frontiers, defined as regions on the boundary between open space and unexplored space. Therefore, there is no coordination and multiple robots can be assigned to the same frontier. In [7–9], the coordination is centralized. A cost-utility model is used, where the gain expected at a target is the expected area discovered when the target is reached, taking into account the possible overlap between robot sensors. Coordination is accomplished by assigning different targets to the robots, thus maximizing the coverage and reducing the overlap between the explored areas of each robot. Coordination can also be decentralized. In [10] robots bid on targets to negotiate their assignments. Classically, frontiers are rated using the distance to the robot, but Bautin et al. [11] chose to favor a well-balanced spatial distribution of robots in the environment: for each robot-frontier pair, the cost function is the number of robots closer than it to the considered frontier. Accordingly, each robot is allocated to the frontier for which it has the lowest rank. All these strategies have been devised, but few efforts have been made to compare them. An exception is a recent article [12] that compares some methods for autonomous exploration and mapping using various criteria, e.g., exploration time and map quality.

3.2. Active perception combined with exploration

Early works on object recognition were based on passive approaches. During the last decade, some approaches have investigated the field of active object recognition: the viewpoint of the camera can be controlled to improve the recognition rate. For example, looking at an object from different poses decreases ambiguities in object recognition, thanks to the choice of different viewpoints.
Several active approaches have been proposed for planning optimal sequences of views for a camera. An approach for viewpoint selection based on reinforcement learning is proposed in [13]. In [14], the objective is to plan the minimum number of actions and observations required to achieve recognition, and active perception planning is formulated as a POMDP. However, the issue of active planning combined with exploration is not addressed in most of these works, as the location of the object to be recognized is known.

Other works are interested in adding other objectives to the trajectories planned for the exploration. For example, integrated exploration [15, 16] involves integrating the path planned for exploration with SLAM in order to plan trajectories that favour the creation of a high-quality map. Some actions, e.g., closing a loop or returning to previous positions, may reduce the uncertainty of the robot pose and the uncertainty of the map. A utility function is used which trades off the cost of exploring new terrain against the potential reduction of uncertainty by making measurements at selected positions. In these works, there is no verification of the assumption of having an accurate position estimate during exploration. Integrated exploration considers the problem of acting for better localization, but not for better recognition.

4. Interaction-oriented Models

Decision-theoretic models based on Dec-MDPs provide an expressive means for modeling cooperative teams of decision makers. In this section, we present this formalism and recent advances in resolving it.

4.1. Background on Dec-MDP

Dec-MDP [17] is an extension of MDP [18] for decentralized control domains. A Dec-POMDP is defined by a tuple ⟨I, S, A, T, R, Ω, O⟩. I is the number of agents, S is the set of joint states and A = {A_i} is the set of joint actions. (A state of the problem can be written as a tuple s = (s_1, ..., s_I) such that s_j is the state of robot j, and A_j defines the set of actions a_j of robot j.) T : S × A × S → [0; 1] is a transition function and T(s, a, s') is the probability of the I robots transitioning from joint state s to s' after performing joint action a. R : S → \mathbb{R} is a reward function that represents the global immediate reward for the robots being in s. Ω is a set of joint observations that agents can receive about the environment and O : S × A × S × Ω → [0; 1] is an observation function giving the probability of receiving o ∈ Ω after performing joint action a and transitioning from s to s'. If the global state of the system is collectively totally observable, the Dec-POMDP reduces to a Dec-MDP. We can see an MDP as a Dec-MDP where I = 1. It is defined by a tuple ⟨S, A, T, R⟩.

The goal of MDP planning is to find a sequence of actions maximizing the long-term expected reward. Such a plan is called a policy π : S → A. An optimal policy π* specifies for each state s the optimal action to execute in the current step, assuming that the agent will also act optimally in future time steps. The value of π* is defined by the optimal value function V* that satisfies the Bellman optimality equation:

V^*(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s') \, V^*(s'),   (1)

where γ is the discount factor. An MDP is solved using dynamic programming, with polynomial time complexity. A Dec-POMDP is solved similarly by computing the optimal joint policy. However, its time complexity is NEXP-complete [17], which is extremely hard.
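To make the Bellman recursion of Eq. (1) concrete, here is a standard value iteration solver for a small finite MDP, written as a Python sketch; the tabular representation (dictionaries keyed by states and state-action pairs) is an illustrative assumption rather than the authors' implementation.

```python
def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    """Solve a finite MDP <S, A, T, R> by value iteration.

    T[(s, a)] is a list of (s_next, prob) pairs and R[s] is the immediate
    reward for being in s, matching Eq. (1).
    Returns an approximation of V* and the associated greedy policy.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * V[s2] for s2, p in T[(s, a)])
                for a in actions
            )
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            break
    # Greedy policy: pick the action maximizing the expected next value.
    policy = {
        s: max(actions, key=lambda a: sum(p * V[s2] for s2, p in T[(s, a)]))
        for s in states
    }
    return V, policy
```

Iterating until the largest update falls below the threshold yields the value function whose greedy policy approximates π*.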
4.2. Interaction-oriented Models

When faced with real-world applications such as multi-robot systems, Dec-(PO)MDP models are very difficult to apply, due to their high complexity. Recent advances in Dec-(PO)MDP resolution have allowed a notable increase in the size of the problems that can be solved. An interesting direction that has emerged recently involves taking advantage of local or sparse interactions between agents. These methods relax the most restrictive and complex assumption, which considers that the agents interact permanently with all the others. The complexity of solving Dec-(PO)MDPs is then reduced by solving a set of interactive individual decision-making problems. The ND-POMDP model [19] is a static interaction approach, i.e., an agent always interacts with the same subset of neighbors. However, this assumption is not realistic. Models have therefore been proposed that use dynamic interactions, so that each agent interacts with an evolving set of agents. The Dec-SIMDP model assumes full local observability, and unlimited, free communication between agents interacting together in some specific states [20]. DyLIM is similar but applies to partial observation and no communication [21]. It considers Dec-POMDPs as a set of POMDPs, and interactive situations are solved separately by deriving joint policies. For non-interactive situations, each agent has its local policy to behave solely. These are promising approaches for real-world applications of decentralized decision makers.

5. DVF for multi-robot exploration

During the robotic contest, we were particularly interested in multi-robot exploration strategies. In this section, we present our fully decentralized approach based on DVF and its application to multi-robot exploration.

5.1. Interaction Sparse Dec-MDP with DVF

To improve the complexity of solving Dec-MDPs, we proposed an interaction-oriented resolution based on distributed value functions (DVF). DVFs were introduced by [22] as a way to distribute reinforcement learning knowledge through different agents. Our approach decouples the multi-agent problem into a set of individual agent problems, and considers possible interactions among a team as a separate layer. This currently seems one of the main tracks for tackling scalability in Dec-(PO)MDPs (cf. Section 4.2). We represent a Dec-MDP with two classes (a structural sketch follows the list below):

• The global interaction class, defined as a collection of augmented MDPs {MDP_1^Aug, ..., MDP_I^Aug}. There is one augmented MDP per agent, defined by MDP_i^Aug = ⟨S_i, A_i, T_i, R_i, Γ_i⟩, where ⟨S_i, A_i, T_i, R_i⟩ individually models agent i in the absence of other agents and Γ_i is some additional information. Γ_i can be communicated by other agents or can be inferred locally. It provides the agent with global information, enabling interaction between the MDPs of the global interaction class. Each agent then resolves its augmented MDP on its own, so that interactions are minimized. The global Dec-MDP is solved as a set of local MDPs augmented by information from the other agents. This significantly reduces the computational complexity: the NEXP complexity of solving a Dec-MDP is reduced to the complexity of solving one MDP (polynomial) per agent.

• The local interaction class is for close interactions. Indeed, each agent computes strategies with DVF in its augmented MDP to minimize the interactions. However, when situations of interaction do occur, DVF does not handle them and the local coordination must be resolved separately with another technique. For example, joint policies can be computed off-line for the specific joint states of close interactions, including only the interacting agents.
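The following sketch illustrates this two-class decomposition in Python: each robot owns an augmented MDP ⟨S_i, A_i, T_i, R_i, Γ_i⟩ and plans on it alone, switching to a separate close-interaction routine only when a teammate is nearby. All names (AugmentedMDP, plan_step, solve_local, resolve_close_interaction) and the placeholder bodies are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AugmentedMDP:
    """Local model of robot i, augmented with information about the others."""
    states: list
    actions: list
    T: dict                      # T[(s, a)] -> list of (s_next, prob)
    R: dict                      # R[s] -> immediate reward
    gamma: float = 0.95
    others_last_state: dict = field(default_factory=dict)  # Gamma_i: j -> s_j

def solve_local(mdp, state):
    """Placeholder: greedy one-step lookahead in the augmented MDP."""
    return max(mdp.actions,
               key=lambda a: sum(p * mdp.R[s2] for s2, p in mdp.T[(state, a)]))

def resolve_close_interaction(state, close_robots):
    """Placeholder for the separate close-interaction coordination layer,
    e.g. a precomputed joint policy for the interacting agents only."""
    return "wait"

def plan_step(mdp: AugmentedMDP, my_state, interaction_radius, distance):
    """One decision step of the interaction-oriented resolution (sketch)."""
    close = [j for j, s_j in mdp.others_last_state.items()
             if distance(my_state, s_j) <= interaction_radius]
    if close:
        # Local interaction class: coordinate explicitly with nearby robots.
        return resolve_close_interaction(my_state, close)
    # Global interaction class: solve the augmented MDP on its own,
    # e.g. with the distributed value function described below (Eq. (2)).
    return solve_local(mdp, my_state)
```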
In the exploration context, the additional information of the augmented MDP is limited to the last known state of other agents: agent i knows at each step the state s_i^j ∈ S_i of the other agents j. Then it computes its distributed value function DVF_i according to

DVF_i(s_i) = R_i(s_i) + \gamma \max_{a_i \in A_i} \sum_{s' \in S_i} T_i(s_i, a_i, s') \Big( DVF_i(s') - \sum_{j \neq i} f_{ij} \Pr(s' \mid s_i^j) \, V_j(s') \Big)   (2)

for all s_i ∈ S_i, where Pr(s' | s_i^j) is the probability of agent j transitioning from state s_i^j to state s'; V_j(s') is the value function of agent j, computed locally by agent i; and f_{ij} is a weighting factor that determines how strongly the value function of agent j reduces the value function of agent i. The DVF technique allows each agent to choose a goal which should not be considered by the others, and in this way the interactions are minimized. The value of a goal depends on the expected rewards at this goal and on the fact that it is unlikely to be selected by other agents. More details about DVF and its extension under communication constraints can be found in [23].

Figure 1. Reward propagation mechanisms, with stars as resulting rewards. a) From frontier hexagons (top): an unknown hexagon (grey) propagates its reward over a radius (dotted circle) onto free neighborhood hexagons (white). Propagation is stopped if occupied (black) or unknown hexagons are encountered. White arrows show impossible propagations, and black arrows represent active propagations. b) From free and non-covered hexagons (bottom): a free and non-covered hexagon (yellow) propagates varying rewards at a best-view-point distance (solid circle).
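A minimal sketch of this computation is given below: it evaluates Eq. (2) by value iteration over robot i's augmented MDP, discounting states that teammates are expected to visit. The tabular data structures, and the way teammate information is packed into `others`, are assumptions made for the sake of the example.

```python
def dvf_iteration(states, actions, T, R, others, gamma=0.95, eps=1e-6):
    """Distributed value function of Eq. (2) for robot i.

    T[(s, a)] : list of (s_next, prob) pairs of robot i's augmented MDP.
    R[s]      : exploration reward of state s.
    others    : list of (f_ij, P_j, V_j) per teammate j, where
                P_j[s_next] approximates Pr(s_next | s_i^j) and
                V_j[s_next] is teammate j's value function (computed locally).
    """
    dvf = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (dvf[s2]
                         - sum(f * P_j.get(s2, 0.0) * V_j.get(s2, 0.0)
                               for f, P_j, V_j in others))
                    for s2, p in T[(s, a)])
                for a in actions
            )
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - dvf[s]))
            dvf[s] = new_v
        if delta < eps:
            break
    return dvf
```

The greedy action at the robot's current state with respect to the resulting DVF_i then points towards a goal that is valuable for robot i but unlikely to be chosen by the others.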
5.2. DVF applied to multi-robot exploration

In the second year of the contest, DVF was used in our decision module to compute multi-robot exploration strategies in a decentralized way. Thanks to the SLAM module (cf. Section 2.2), each robot has access to a map updated with all explored areas and to the positions of all the robots. We can then assume that the location of the robot and of the other robots is known at decision time.

5.2.1. MDP Model

Each robot generates its local augmented MDP from a four-layer grid. The first layer is the real-world layer where the robots move. The pixel layer is an occupancy grid of pixels, where each pixel is initialized as unknown and updated as free (no obstacle) or occupied (something there) by the data acquisition process. The hexagon layer is an occupancy grid of hexagons; each hexagon is composed of a set of pixels and is considered as unknown, free or occupied according to the value of its pixels. The hexagon and Voronoi layers are each computed from the pixel layer, and are used to generate the data structures of the local augmented MDP. States and rewards are based on the hexagonal layer, while actions and transitions are based on the hexagonal and Voronoi layers.

Figure 2. (a) Simulation environment. (b) Simulation screenshots. In b), areas that have been explored but not yet covered with photos are in yellow; explored areas covered with photos are in white; non-explored areas are in grey.

The exploration reward function is computed with a reward propagation mechanism based on the expected information gain in each state, as in [24]. We propagate rewards in some radius around the frontier hexagons, taking into account line-of-sight constraints (see Fig. 1a). Further details about our model are given in [25].

5.2.2. DVF

To apply DVF (2), we consider that the robots are homogeneous. The value functions V_j of the other robots j can therefore be computed only once by robot i (if the robots are not homogeneous, we just need to compute one value function for each type of robot). Robot i computes an empathic value function with the standard value iteration algorithm [26] in its local augmented MDP. To evaluate the transition probability of another robot j, robot i applies a wavefront propagation algorithm from the last known state s_j of robot j.
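The wavefront propagation mentioned above can be sketched as a breadth-first distance transform over the free hexagons, started from the last known state of robot j; the conversion of travel distances into a normalized likelihood Pr(s' | s_i^j) shown here (a geometric decay) is an assumption for illustration, not the exact scheme used on the robots.

```python
from collections import deque

def wavefront_distances(free_neighbors, start):
    """Breadth-first wavefront from `start`; free_neighbors(s) yields the
    traversable neighbors of state s (e.g. adjacent free hexagons)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in free_neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist

def other_robot_distribution(free_neighbors, last_known_state, decay=0.5):
    """Turn wavefront distances into a rough Pr(s' | s_i^j): states close to
    robot j's last known position are the ones it is most likely to reach."""
    dist = wavefront_distances(free_neighbors, last_known_state)
    weights = {s: decay ** d for s, d in dist.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```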
5.2.3. Decision step

A decision step involves building the model, computing the policy from DVF, and producing a smoothed trajectory. The agent plans continuously, executing a decision step as it perceives changes in its environment. We can observe that exploration rewards will never be gained by the robot. Indeed, as the robot comes close enough to the frontier, it will gather new information with its sensors and unknown cells will become known before they are reached. Therefore, the exploration rewards will disappear before the robot can claim them, and the frontier between known and unknown areas, which is the source of the rewards, retreats as the robot approaches it. The action plan must then be updated quickly to react as soon as possible to this kind of information gained en route. However, this requires the decision step to be quick enough for on-line use. Given that the model will be updated at each decision step, we use the greedy approach, which plans on a short-term horizon.

5.2.4. Experiments

Experimental results from simulation and real-world scenarios can be found in [23, 25]. Videos (available at http://liris.cnrs.fr/laetitia.matignon/research.html) present various exploration tasks with real robots, and some interesting situations are underlined, such as global task repartition or local coordination. The experiments show that this method is able to coordinate a team of robots effectively during exploration. The global interaction class addresses the allocation of exploration goals, and also minimizes close interactions between robots. Each robot locally computes a strategy that minimizes the interactions between the robots and maximizes the explored space.

6. Planning to explore and perceive

In the first two years of the contest, pictures were taken to locate and recognize objects. This was done separately from the decision module. Pictures taken by the camera and analyzed by the object recognition module were gathered along the way, i.e., the camera took pictures at a specified rate along the trajectory planned for the exploration. However, this led to poor performance in terms of object recognition results, primarily because some objects in the arena were not photographed. Indeed, the pictures that are taken depend on the trajectory computed for the exploration, so some objects were not covered by photos, while some areas without objects were photographed several times. To improve the coverage of the objects with pictures, we extended the decision module so that it does not only explore the arena but must also ensure that the explored areas are covered by photos. Indeed, if all the explored space has been photographed, each object must be in at least one picture and the recognition module will handle it. In this section, we introduce a DVF extension to improve the photo coverage of the space, and we also present some experiments.

6.1. DVF extension to improve photo coverage of the space

The decision module must simultaneously combine the interest in exploring an area with the interest in covering it with pictures. To take into account both the exploration criterion and the picture coverage criterion, we modified the reward function R_i of the augmented MDP. We introduced a specific reward for areas explored and covered with photos, in addition to the exploration reward. MDPs allow planning while optimizing several criteria at once: in our case, the expected information gain, the picture-taking and the cost to reach the chosen location. During the challenge, we also optimized the return-to-base criterion at the end of the mission and the ball-pushing feature when the ball was detected. These two criteria are not considered in this paper. The DVF equation is then adapted to:

DVF_i(s_i) = (1 - \alpha) R_{i,exp}(s_i) + \alpha R_{i,cov}(s_i) + \gamma \max_{a_i \in A_i} \sum_{s' \in S_i} T(s_i, a_i, s') \Big( DVF_i(s') - \sum_{j \neq i} f_{ij} \Pr(s' \mid s_i^j) \, V_j(s') \Big)  for all s_i ∈ S_i,   (3)

where R_{i,exp}(s_i) is the reward function for exploring a state s_i and R_{i,cov}(s_i) is the reward function for taking a photo of s_i. α ∈ [0, 1] is the picture coverage rate for balancing the exploration and picture coverage behaviors of the robots. With α = 0, the robots will only optimize the exploration without covering the space with photos. With α = 1, the robots will only optimize photo coverage; exploration then occurs only as a side-effect.

To compute the cover reward function R_{i,cov}, a cover grid of pixels counts the number of times that each pixel has been photographed. Each time the robot takes a picture, the cover grid is updated by tracing a set of rays covering the optimal recognition area of the camera. The pixels are updated with Bresenham's algorithm [27]. The cover reward is then computed with a hexagonal reward propagation mechanism similar to the exploration rewards, multiplied by a bell-shaped factor to boost the rewards at the optimal recognition distance (see Fig. 1b). This allows the MDP to select the best viewpoint for the next picture. Regarding communications, this new reward implies that new data is sent: each robot needs to know where a picture has been taken. Thus, each robot sends the locations of its photos to all the other robots. They can then update their coverage grid and maintain an accurate reward function.

In order to keep the computing complexity low, we chose not to add an action for taking a picture. When the policy brings the robot to a location where a picture must be taken, the robot will stop there and face the reward. When a picture is taken, the reward disappears and the new policy drives the robot farther. In order to avoid blurred pictures, on which the recognition process would fail, we specify that photos must be taken only when the robot velocity is low.
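The sketch below illustrates two ingredients of this extension under simple assumptions: a cover grid of pixel counts updated along Bresenham rays each time a photo is taken, and the convex combination (1 − α)R_exp + αR_cov of Eq. (3) used as the per-state reward of the augmented MDP. The helper names, the camera model implied by `ray_end_cells`, and the dictionary-based grids are illustrative only, not the contest code.

```python
def bresenham(x0, y0, x1, y1):
    """Integer cells traversed between (x0, y0) and (x1, y1) (Bresenham [27])."""
    cells, dx, dy = [], abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def update_cover_grid(cover, cam_cell, ray_end_cells):
    """Count how many photos have seen each pixel of the cover grid."""
    for ex, ey in ray_end_cells:                     # rays spanning the camera's
        for cx, cy in bresenham(*cam_cell, ex, ey):  # optimal recognition area
            cover[(cx, cy)] = cover.get((cx, cy), 0) + 1

def combined_reward(r_exp, r_cov, alpha):
    """Per-state reward of Eq. (3): (1 - alpha) * R_exp + alpha * R_cov."""
    return {s: (1 - alpha) * r_exp.get(s, 0.0) + alpha * r_cov.get(s, 0.0)
            for s in set(r_exp) | set(r_cov)}
```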
6.2. Experiments

6.2.1. Simulated robots

The Stage simulator (http://playerstage.sourceforge.net/) was used with an architecture that mimics the real robots. DDS is replaced by an Inter Process Communication shared memory segment. Laser acquisition is simulated by a “ranger” virtual sensor. A “position” virtual device simulates both the SLAM module, by providing odometric data, and the mobility module, by executing the point and shoot algorithm. The Stage blobfinder model is used to simulate (poorly) the camera and object detection, as it can track areas of color in a simulated 2D image, giving the location and the size of the color blobs. Thanks to this architecture, the decision module of the real robots can be used with the simulator without modification.

6.2.2. Results

We conducted a set of experiments in the autolab environment (see Fig. 2a). Figure 2b shows successive screenshots of one simulation. The robots are initially in the starting zone and objects are regularly positioned in the environment. For each coverage rate, we compute the number of objects detected and the mission time (see Fig. 3).

Figure 3. Number of objects detected and mission time (in seconds) versus coverage rate (produced from 340 simulations).

With α = 0, the mission is fastest, given that the robot does not care about taking pictures. The only goal of the robot is to map the entire arena, and it only takes a few pictures when it stops between two actions. Few objects are therefore detected. With α > 0, all objects are detected but the mission time varies with α. Indeed, α modifies the priority of the two tasks. When α is low, pictures are taken at the end of the mission, if there is sufficient time remaining.

Figure 4. Percentage of objects detected versus percentage of exploration: environment coverage by pictures for object recognition against environment coverage by laser for exploration/mapping, for several coverage rates.

This is illustrated in Fig. 4, with α = 0.0001, where the number of detected objects increases and 100% of the arena has been explored. With such a low α, the robot takes pictures once the exploration has been finished, so it travels the arena twice and the mission time is high. When α is high, pictures are taken throughout the mission, and the robot deals with both tasks at the same time. With α = 1, we obtain the second fastest mission time and all the objects are detected. So this new approach, based on exploration and photo coverage of the space, allows all of the objects to be photographed. The α factor must be chosen to balance the priority of exploration versus object detection. For example, if the time is limited and α is high, the robot may not explore the whole map, since taking pictures is time consuming, but it will photograph all objects in the explored area.

7. Conclusion and Perspectives

We have shown a new algorithm for coordinating multiple robots that explore some unknown area. DVF enables the expensive Dec-MDP to be solved as a set of augmented MDPs in a distributed way. The versatile reward function can be adapted so that the robots explore the whole area, but also so that they choose the right positions to take meaningful pictures. As a result, we can observe that the robots take longer to explore, but that they also take many more pictures, efficiently covering the whole space. We can then expect a better detection rate, as any object should be in at least one picture. An immediate perspective is therefore to compare object recognition results with different parameters on real robots, to confirm through a real experiment that our method provides improved perception. Another perspective of our work is to plan to perceive, by interlacing the detection and decision modules to achieve more robust object recognition.
The idea is not to take more pictures, but to take better pictures. The objective would be to plan viewpoints where the recognition process would be more reliable. To achieve this kind of active perception planning, the decision module should use information from the recognition module. In our work, the recognition module is based on the Dominant Orientation Template method [5]. For each object, a set of views is defined, and the method gives a weighting for each view that is equivalent to its precision (or positive predictive value). A positive detection is reliable when the point of view has a high weighting. For each object detected, the module also gives the location of the object and scores (matching results) for each view. Thus the decision module could manage a set of hypotheses about the probability of the presence of each object. These hypotheses could then be confirmed or discarded by taking pictures of the object from viewpoints that maximize the precision. This could be done easily by generating specific high rewards at these viewpoints, so that the decision module will adapt the planned trajectory to ensure that the robots take pictures from there.

Acknowledgements

This work has been supported by NRA and DGA (ANR-09-CORD-103) and was developed jointly with the members of the Robots Malins consortium. It has also received support from the INSA Foundation CROME project.

References

[1] From Internet to Robotics: A Roadmap for U.S. Robotics. 2013.
[2] A. Bautin, P. Lucidarme, R. Guyonneau, et al. Cart-O-matic project: autonomous and collaborative multi-robot localization, exploration and mapping. In Proc. of 5th Workshop on Planning, Perception and Navigation for Intelligent Vehicles, pp. 210–215. 2013.
[3] D. Filliat, E. Battesti, S. Bazeille, et al. RGBD object recognition and visual texture classification for indoor semantic mapping. In Proc. of TePRA. 2012. doi:10.1109/TePRA.2012.6215666.
[4] J. Xie, F. Nashashibi, N. M. Parent, O. Garcia-Favrot. A Real-Time Robust SLAM for Large-Scale Outdoor Environments. In 17th ITS World Congress. 2010. doi:10.1028/ITS.SLAM.Nashashibi.
[5] S. Hinterstoisser, V. Lepetit, S. Ilic, et al. Dominant orientation templates for real-time detection of texture-less objects. In CVPR, pp. 2257–2264. 2010.
[6] B. Yamauchi. Frontier-based exploration using multiple robots. In Proceedings of the Second International Conference on Autonomous Agents, AGENTS '98, pp. 47–53. 1998.
[7] R. Simmons, D. Apfelbaum, W. Burgard, et al. Coordination for Multi-Robot Exploration and Mapping. In Proc. of the AAAI National Conf. on Artificial Intelligence. 2000.
[8] W. Burgard, M. Moors, C. Stachniss, F. Schneider. Coordinated Multi-Robot Exploration. IEEE Transactions on Robotics 21:376–386, 2005. doi:10.1109/TRO.2004.839232.
[9] K. M. Wurm, C. Stachniss, W. Burgard. Coordinated multi-robot exploration using a segmentation of the environment. In Proc. of IROS, pp. 1160–1165. 2008. doi:10.1109/IROS.2008.4650734.
[10] R. Zlot, A. Stentz, M. Dias, S. Thayer. Multi-Robot Exploration Controlled By A Market Economy. In Proc. of ICRA, vol. 3, pp. 3016–3023. 2002. doi:10.1109/ROBOT.2002.1013690.
[11] A. Bautin, O. Simonin, F. Charpillet. MinPos: A Novel Frontier Allocation Algorithm for Multi-robot Exploration. In ICIRA, vol. 7507 of Lecture Notes in Computer Science, pp. 496–508. 2012.
[12] M. Julia, A. Gil, Ó. Reinoso. A comparison of path planning strategies for autonomous exploration and mapping of unknown environments. Autonomous Robots 33(4):427–444, 2012. doi:10.1007/s10514-012-9298-8.
[13] F. Deinzer, C. Derichs, H. Niemann, J. Denzler. A framework for actively selecting viewpoints in object recognition. IJPRAI 23(4):765–799, 2009.
[14] S. Brandao, M. Veloso, J. P. Costeira. Active object recognition by offline solving of POMDPs. In Proc. of the International Conference on Mobile Robots and Competitions, pp. 33–38. 2011.
[15] A. A. Makarenko, S. B. Williams, F. Bourgault, H. F. Durrant-Whyte. An Experiment in Integrated Exploration. In Proceedings of IROS, pp. 534–539. 2002. doi:10.1109/IRDS.2002.1041445.
[16] C. Stachniss, G. Grisetti, W. Burgard. Information gain-based exploration using Rao-Blackwellized particle filters. In Proc. of Robotics: Science and Systems (RSS). Cambridge, MA, USA, 2005.
[17] D. S. Bernstein, R. Givan, N. Immerman, S. Zilberstein. The Complexity of Decentralized Control of Markov Decision Processes. Mathematics of Operations Research 27:819–840, 2002.
[18] M. L. Puterman. Markov Decision Processes. 1994.
[19] R. Nair, P. Varakantham, M. Tambe, M. Yokoo. Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs. In Proc. of AAAI, pp. 133–139. 2005.
[20] F. S. Melo, M. M. Veloso. Decentralized MDPs with sparse interactions. Artificial Intelligence 175(11):1757–1789, 2011. doi:10.1016/j.artint.2011.05.001.
[21] A. Canu, A.-I. Mouaddib. Collective Decision-Theoretic Planning for Planet Exploration. In Proc. of ICTAI. 2011.
[22] J. Schneider, W.-K. Wong, A. Moore, M. Riedmiller. Distributed Value Functions. In Proc. of ICML, pp. 371–378. 1999.
[23] L. Matignon, L. Jeanpierre, A.-I. Mouaddib. Coordinated multi-robot exploration under communication constraints using decentralized Markov decision processes. In Proc. of AAAI. 2012.
[24] S. Le Gloannec, L. Jeanpierre, A.-I. Mouaddib. Unknown Area Exploration with an Autonomous Robot using Markov Decision Processes. In Proc. of TAROS, pp. 119–125. 2010.
[25] L. Matignon, L. Jeanpierre, A.-I. Mouaddib. Distributed Value Functions for Multi-Robot Exploration. In Proc. of ICRA. 2012. doi:10.1109/ICRA.2012.6224937.
[26] R. Bellman. Dynamic programming: Markov decision process. 1957.
[27] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal 4(1):25–30, 1965.