Occupancy grid mapping for rover navigation based on semantic segmentation

ACTA IMEKO, ISSN: 2221-870X, December 2021, Volume 10, Number 4, 155 - 161

Sebastiano Chiodini1,2, Marco Pertile1,2, Stefano Debei1,2
1 CISAS "Giuseppe Colombo" - Università degli Studi di Padova, Via Venezia 15, 35131 Padova, Italy
2 Dipartimento di Ingegneria Industriale - Università degli Studi di Padova, Via Gradenigo 6/a, 35131 Padova, Italy

Section: RESEARCH PAPER
Keywords: Convolutional neural network; SLAM; semantic segmentation; robot vision; Martian environment
Citation: Sebastiano Chiodini, Marco Pertile, Stefano Debei, Occupancy grid mapping for rover navigation based on semantic segmentation, Acta IMEKO, vol. 10, no. 4, article 25, December 2021, identifier: IMEKO-ACTA-10 (2021)-04-25
Section Editor: Roberto Montanini, Università di Messina and Alfredo Cigada, Politecnico di Milano, Italy
Received July 26, 2021; In final form December 6, 2021; Published December 2021
Copyright: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Project BIRD181070 funded by the program BIRD 2018 sponsored by the University of Padova.
Corresponding author: Sebastiano Chiodini, e-mail: sebastiano.chiodini@unipd.it

ABSTRACT
Obstacle mapping is a fundamental building block of the autonomous navigation pipeline of many robotic platforms, such as planetary rovers. Nowadays, occupancy grid mapping is a widely used tool for obstacle perception: the environment is represented as evenly spaced cells, whose posterior probability of being occupied is updated based on range sensor measurements. In more classical approaches, a cell is marked as occupied at the point where the ray emitted by the range sensor encounters an obstacle, such as a wall. The main limitation of such methods is that they cannot identify planar obstacles, such as slippery, sandy, or rocky soils. In this work, we use the measurements of a stereo camera combined with a pixel-labelling technique based on Convolutional Neural Networks to identify rocky obstacles in a planetary environment. Once identified, the obstacles are converted into a scan-like model. The relative pose between successive frames is estimated using the ORB-SLAM algorithm. The final step consists of updating the occupancy grid map using Bayes' update rule. To evaluate the metrological performance of the proposed method, images from a Mars-analogue dataset, the ESA Katwijk Beach Planetary Rover Dataset, have been used. The evaluation has been performed by comparing the generated occupancy map with a manually segmented orthomosaic map, obtained from a drone survey of the area, used as reference.

1. INTRODUCTION
The problem of perceiving the environment is fundamental for safe robot navigation. Planetary rover exploration has some peculiarities that are not found in other autonomous robotics applications: (1) there is no GPS that can help with the localization process, and (2) terrain assessment deals with an unstructured environment characterised by sharp rocks (large or small), sand, and bedrock, which are often confused with the noise of a stereo point cloud. A review of learning-based perception and navigation methods for rescue robotics, planetary exploration, and agricultural robotics can be found in [1]. It is not only safe navigation that benefits from the autonomous navigation capabilities of a rover, but also its scientific output; as an example, the distance travelled per sol by the NASA MSL rover has increased from a few meters to 100 m [2]. The NASA MER rovers were equipped with the GESTALT (Grid-based Estimation of Surface Traversability Applied to Local Terrain) system [3], one of the first autonomous terrain assessment systems for planetary rovers. This system was able to detect geometric hazards such as rocks, ditches, and cliffs by processing the 3D point clouds generated from the rover stereo images; it looked mainly at geometric characteristics such as steps, slopes, and terrain roughness. An alternative method for estimating the traversability of a terrain is presented in [4], where the unevenness of the terrain is analysed by means of the power spectral density (PSD) of the surface profile as measured by a stereo camera. In [5], an evaluation system for the traversability of rough terrain by a rover, based on an aerial UAV survey, is presented.
Looking at the context of indoor navigation, many methods involve the construction of occupancy grid maps that are updated via the Bayesian occupancy filter with range measurements coming from LiDAR sensors. As an example, [6] presents a corridor detection method based on 2D LiDAR and occupancy filtering. Thanks to semantic segmentation, which is the process of partitioning an image into regions belonging to different classes, it is possible to add information to the map that cannot be deduced from the three-dimensional model alone, such as whether the terrain is sandy or slippery. Compared to other methods, such as classification, semantic segmentation allows for pixel-by-pixel labelling of the image, see Figure 1. This enables a more detailed interpretation of the surrounding environment. Cadena et al. [7] state that Simultaneous Localization and Mapping (SLAM) is now entering its third stage, characterised by robust perception. This stage requires robust performance, high-level understanding, resource awareness, and task-driven perception. Semantics is an important tool in the pursuit of improved robustness, intuitive visualization, and efficient human-robot-environment interaction. The term Semantic SLAM can be used to identify methods that comprise either semantic-based robustness/accuracy enhancements or semantic mapping. For details, see [8], which presents a detailed review of recent advances in semantic simultaneous localization and mapping, considering multi-scale map representations, object SLAM systems, and deep neural network-based SLAM. DeepLab networks make a significant contribution to semantic segmentation, see, e.g., [9], [10], which introduce the concept of "atrous convolution" in CNN models and exhibit excellent accuracy in semantic segmentation [8]. In the literature, it is possible to find many works on semantic mapping in the urban environment.
As an example, [11] generates a dense 3D reconstruction with associated semantic labelling from stereo camera images, while [12] and [13] propose multimodal sensor-based semantic 3D mapping systems using a 3D LiDAR combined with a camera. In the context of maritime navigation, the authors of [14] present a Water Obstacle Detection network based on Image Segmentation (WODIS) for autonomous surface vehicles. Gan et al. [15] describe a method that provides a unified probabilistic model for both occupancy and semantic probabilities, producing a Bayesian continuous 3D semantic occupancy map from noisy point clouds. Wang et al. [16] describe a joint method combining a priori convolutional neural networks at the superpixel level with soft restricted context transfer. In [17], the authors describe a method to build a complete semantic map by merging the segmentation results from street-view and satellite-view images. In [18], satellite images are segmented using a SegNet network. Li et al. [19] present a fast 3D semantic mapping system based on monocular vision by fusing localization, mapping, and scene parsing; the method is based on an improved version of DeepLabv3+ [9], [10]. [20] presents an approach to generate an outdoor large-scale 3D dense semantic map based on binocular stereo vision and the SegNet deep learning architecture. Paz et al. [21] fuse image and pre-built point cloud map information to perform automatic and accurate labelling of static landmarks such as roads, sidewalks, crosswalks, and lanes; in that work, semantic segmentation is also performed on 2D images using the DeepLabv3+ network [9], [10]. However, these techniques have rarely been applied to the case of planetary exploration and environments with a low number of features.
This paper proposes a terrain assessment method for the Martian environment based on semantic mapping. The method takes as input a set of stereo images which are pixel-wise labelled with the state-of-the-art Convolutional Neural Network labeller DeepLabv3+ [9]. Thanks to the stereo reconstruction of the scene, it is possible to associate the labels with the 3D points; therefore, obstacles are represented through a scan-like model [22]. All scans are combined into an occupancy grid, taking into account the trajectory travelled by the rover. The trajectory is calculated using the ORB-SLAM2 algorithm [23]. To evaluate the metrological performance of the proposed method, images from the publicly available ESA Katwijk Beach Planetary Rover Dataset [24] are used. The dataset was created for the validation of localization and navigation algorithms in a Mars-like environment, and it provides trajectory and map ground truth. The method evaluation has been performed using a manually segmented orthomosaic map, taken by a drone, as reference, and the trajectory reconstruction performance has been evaluated by comparison with a differential GPS (DGPS).
The major contributions of this work are:
• the application of CNNs to identify obstacles characteristic of the planetary environment;
• the generation of a laser-scan-like point cloud representing the obstacles;
• the application of Bayes filtering to obtain global obstacle maps;
• the comparison of the generated map with the ground truth for method validation.
The paper is organized as follows: Section 2 describes the proposed terrain assessment method, Section 3 evaluates the algorithm performance in a Mars-analogue environment, and Section 4 reports the concluding remarks.
2. METHOD
The proposed method takes as input a rectified stereo image $(I_l, I_r)$, which is used to (1) identify the obstacles with a CNN labeller (DeepLabv3+), (2) estimate the scene point cloud, and (3) estimate the rover trajectory. Afterwards, the point cloud is fused with the labels to obtain a scan-like representation of the obstacles. The scans associated with the rover poses that compose the trajectory are merged into an occupancy grid map using Bayesian filtering. A diagram summarizing the various steps is shown in Figure 2.

Figure 1. Compared to other methods, such as classification, semantic segmentation allows for a pixel-by-pixel labelling of the image. This enables the creation of more detailed maps.

2.1. Obstacle Identification
The purpose of the obstacle identification step is to label each pixel $(u_l, v_l)$ of the image as obstacle or free area. Image labelling is performed using a Convolutional Neural Network (CNN) technique, as CNNs have demonstrated a significant improvement in semantic segmentation tasks. Moreover, compared to other methods, such as classification, semantic segmentation enables a more detailed interpretation of the surrounding environment. Among the state-of-the-art methods (FCN [25], SegNet [18], and U-net [26]), DeepLabv3+ [9] has been chosen because (1) it has been demonstrated to outperform similar methods in several applications, and (2) it is available as a pre-trained network; the latter feature is necessary to reduce the training time. Network pre-training has been performed on ImageNet, a dataset containing more than one million images divided into 1000 classes. Pre-training allows the deeper layers of the network to learn the most characteristic features, whereas the final part of the training phase is application dependent and is used to train the outer layers of the network. For this application, three classes have been chosen: sand, rocks, and background. Figure 5 shows the obstacle detection step applied to the images of the dataset used for testing. We have taken advantage of a pre-trained network to reduce the number of training images. Data augmentation techniques, in particular cropping, mirroring, and resizing, have been applied to improve the training accuracy and the robustness to image variation. The number of images used for training is 400: 50 original images and 350 obtained with data augmentation; more details are given in [27], [28], and [29].
Since the boundaries of the identified classes may present imperfections or smudges, morphological operations have been applied to filter the identified regions: binary erosion, binary dilation, and small-region removal [30]. For collision avoidance purposes, the main interest is limited to the potential point of collision with the obstacle. For this reason, for each identified region, only the lower boundary is considered, which corresponds to the intersection between the ray starting from the camera centre and the obstacle. This contour, combined with the range measurements obtained with the method described in Section 2.2, provides a scan-like representation of the obstacle (see Figure 3).
2.2. Range Measurement
First, images are stereo-rectified using the camera intrinsic and extrinsic parameters.
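A minimal sketch of this rectification step, based on OpenCV, is given below. It is illustrative only: the calibration values are placeholders (in practice the intrinsic and extrinsic parameters of the LocCam would be used), and the actual implementation may differ.

```python
import cv2
import numpy as np

# Placeholder calibration values; the real intrinsics/extrinsics come from the
# stereo camera calibration and will differ from these assumed numbers.
img_size = (1024, 768)                      # (width, height)
K_l = np.array([[830.0, 0.0, 512.0],
                [0.0, 830.0, 384.0],
                [0.0, 0.0, 1.0]])
K_r = K_l.copy()
dist_l = np.zeros(5)
dist_r = np.zeros(5)
R = np.eye(3)                               # rotation of the right camera w.r.t. the left
T = np.array([-0.12, 0.0, 0.0])             # 12 cm baseline along the x axis

# Rectification transforms that make epipolar lines horizontal in both images.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, dist_l, K_r, dist_r,
                                            img_size, R, T)
map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, dist_l, R1, P1,
                                             img_size, cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, dist_r, R2, P2,
                                             img_size, cv2.CV_32FC1)

def rectify_pair(img_l, img_r):
    """Warp a raw stereo pair into the rectified pair (I_l, I_r)."""
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r
```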
Then, the Semi-Global Block Matching algorithm [31] is used to compute the disparity map. The Semi-Global Block Matching algorithm computes the scene disparity by comparing the sum of absolute differences (SAD) for each block of pixels in the image and enforces a similar disparity on neighbouring blocks. Knowing the disparity map and the intrinsic parameters of the camera, it is possible to estimate the depth of the scene and compute the related point cloud relative to the rover frame ($\mathbf{X}_j = [X_j, Y_j, Z_j]$), where $j = 1, \dots, n$ and $n$ is the total number of pixels with an associated disparity.

Figure 2. Scheme overview of the proposed terrain assessment algorithm. The algorithm takes as input a rectified stereo image, which is used to (1) identify the obstacles with a CNN labeller (DeepLabv3+), (2) estimate the scene point cloud, and (3) estimate the rover trajectory. Afterwards, the point cloud is fused with the labels to obtain a scan-like representation of the obstacles. The scans associated with the rover poses that compose the trajectory are merged into an occupancy grid map using Bayesian filtering.

Figure 3. The 3D obstacle map is broken down into a laser scan-like model.

The 3D point coordinates in the left camera frame are given by:

$X_j = \frac{u_l Z_j}{f}, \quad Y_j = \frac{v_l Z_j}{f}, \quad Z_j = \frac{b f}{u_l - u_r}$ ,   (1)

where $(u_l, v_l)$ and $(u_r, v_r)$ are respectively the pixel coordinates in the left and right image, $b$ is the camera baseline, and $f$ is the camera focal length. Each 3D point of the point cloud ($\mathbf{X}_j$) is associated with a pixel of the image $(u_l, v_l)$. Only the 3D points that correspond to the lower boundary of the obstacles are used to update the occupancy grid map; see Section 2.4.
2.3. Relative Trajectory Reconstruction
Sections 2.1 and 2.2 show how to determine the position of the obstacles with respect to the rover frame. However, to build a global map of obstacles and filter the measurements belonging to different rover poses, the obstacle positions $\mathbf{X}_i$ need to be expressed with respect to an absolute reference frame as $\mathbf{X}_i^W$, where $W$ is the world reference frame. In this work, the frame $W$ corresponds to the first stereo-image frame rotated by the pan and tilt angles of the PTU (Pan and Tilt Unit). To perform the trajectory reconstruction step, only the stereo camera visual input has been used, in combination with the state-of-the-art visual SLAM algorithm ORB-SLAM2 [23]. For the purposes of this work, the camera pose issued by the tracking step was used; this pose belongs to SE(3) (the Special Euclidean group in three dimensions). Assuming that the rover mainly moves in a planar environment, the SE(3) pose has been transformed into the corresponding SE(2) pose $\mathbf{T}_k^W = [\mathbf{R}_k^W \,|\, \mathbf{t}_{k,W}]$, which for completeness is reported in the following relation:

$\mathbf{T}_k^W = \begin{bmatrix} \cos(\psi_k^W) & -\sin(\psi_k^W) & t_{x(k,W)} \\ \sin(\psi_k^W) & \cos(\psi_k^W) & t_{y(k,W)} \\ 0 & 0 & 1 \end{bmatrix}$ ,   (2)

where $\psi_k^W$ is the absolute heading angle and $(t_{x(k,W)}, t_{y(k,W)})$ is the rover absolute position. Finally, the obstacle scan-line points are transformed from the rover frame to the $W$ frame using the following equation:

$\mathbf{X}_i^W = \mathbf{R}_k^W \mathbf{X}_i + \mathbf{t}_{k,W}$ .   (3)

2.4. Sensor Fusion
The last step is the update of the occupancy grid. The probability of each grid cell being occupied is updated following the standard Bayes update rule [29], [32] and using the obstacle scan $\mathbf{X}_i^W$. In the experimental part, a grid resolution of 0.2 m × 0.2 m has been considered.
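Before detailing the grid update, a minimal sketch summarizing Sections 2.2 and 2.3 is reported below: it back-projects the lower-boundary obstacle pixels of Section 2.1 with Equation (1) and transforms the resulting scan points to the world frame $W$ with Equation (3). All variable names are hypothetical, and the principal-point offset, implicitly assumed at the image origin in Equation (1), is made explicit here.

```python
import numpy as np

def scan_points_from_boundary(boundary_px, disparity, f, b, cu, cv):
    """Back-project lower-boundary obstacle pixels (u_l, v_l) to 3D via Eq. (1).

    boundary_px : (n, 2) array of (u, v) pixels on the obstacle lower boundary
    disparity   : dense disparity map (u_l - u_r) from semi-global matching
    f, b        : focal length [px] and stereo baseline [m]
    cu, cv      : principal point [px] (assumed here; Eq. (1) omits it for brevity)
    """
    u = boundary_px[:, 0].astype(int)
    v = boundary_px[:, 1].astype(int)
    d = disparity[v, u]
    valid = d > 0                        # keep only pixels with a valid disparity
    Z = b * f / d[valid]
    X = (u[valid] - cu) * Z / f
    Y = (v[valid] - cv) * Z / f
    return np.stack([X, Y, Z], axis=1)   # points X_j in the camera/rover frame

def to_world_frame(points_xy, psi, t):
    """Apply the SE(2) pose of Eq. (2)-(3): X_i^W = R_k^W X_i + t_k,W.

    points_xy : (n, 2) obstacle scan points on the ground plane (rover frame)
    psi       : absolute heading angle of the rover
    t         : (2,) absolute rover position in the world frame W
    """
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    return points_xy @ R.T + t
```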
Assuming that the cells of the grid map are independent of each other, and given a series of rock observations $z_{1:j}$, the probability belief of a single cell being occupied by an obstacle or not, $p(m_{x,y} \,|\, z_{1:j})$, is reported in Equation (4):

$p(m_{x,y} \,|\, z_{1:j}) = \frac{p(m_{x,y} \,|\, z_j)\, p(z_j)}{p(m_{x,y})} \, \frac{p(m_{x,y} \,|\, z_{1:j-1})}{p(z_j \,|\, z_{1:j-1})}$ ,   (4)

where $p(m_{x,y} \,|\, z_j)$ is the inverse measurement model of the depth data retrieved from the stereo camera, $p(m_{x,y} \,|\, z_{1:j-1})$ is the belief given the depth measurements of the previous rover poses, and $p(m_{x,y})$ is the prior map. To avoid difficult-to-calculate probabilities, we use the binary Bayes filter in log-odds form:

$l_j = l_{j-1} + \log \frac{p(m_{x,y} \,|\, z_j)}{1 - p(m_{x,y} \,|\, z_j)} - \log \frac{p(m_{x,y})}{1 - p(m_{x,y})}$ ,   (5)

where $l_j = \log \frac{p(m_{x,y} \,|\, z_{1:j})}{1 - p(m_{x,y} \,|\, z_{1:j})}$. The log-odds form of the inverse measurement model, $\log \frac{p(m_{x,y} \,|\, z_j)}{1 - p(m_{x,y} \,|\, z_j)}$, assigns an occupancy value $l_\mathrm{occ}$ to all cells that contain 3D labelled points $\mathbf{X}_j$. In the experiments, the occupied threshold is $l_\mathrm{occ} = 0.65$ and the free threshold is $l_\mathrm{free} = 0.2$. The grid is initialized without prior knowledge of the map, i.e., $p(m_{x,y}) = 0.5$.
3. EXPERIMENTAL RESULTS
The metrological performances of the proposed method have been evaluated using the ESA Katwijk Beach Planetary Rover Dataset [24], which provides images of a Mars-analogue environment. Of the overall dataset, the LocCam stereo images have been used for semantic map generation, while the site orthomosaic combined with differential GPS has been used as ground truth. The characteristics of the sensors used in the experimental part are summarized in Table 1. The DGPS has been used to register the generated maps to the orthomosaic taken by the drone, which has been used as the map reference. The map ground truth has been obtained by manually labelling the drone orthomosaic, and the trajectory ground truth is given directly by the DGPS. Figure 4 shows the occupancy grid map generated with the proposed method; the occupancy grid is superimposed on the drone image used for performance evaluation.
The metrics normally used to evaluate object detection algorithms were used: the accuracy and the Intersection over Union (IoU):

Figure 4. Occupancy grid map generated with the proposed method. Free cells are shown in green, obstacle-occupied cells in red, and yellow arrows show the trajectory of the rover. The occupancy grid is superimposed on the drone image used for performance evaluation.

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ ,   (6)

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$ ,   (7)

where $TP$ represents the True Positives, $TN$ the True Negatives, $FP$ the False Positives, and $FN$ the False Negatives. In the case of an ideal classifier, $FP = FN = 0$ and the accuracy metric is equal to 1. IoU is the ratio of correctly classified cells to the union of the cells labelled as obstacles in the ground truth and the cells classified as obstacles by the method. Table 2 summarises the performances (Accuracy and IoU) of the proposed method.
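As an illustration of how Equations (6) and (7) can be evaluated on gridded maps, the following minimal sketch assumes that the generated occupancy map and the manually labelled orthomosaic have been rasterized onto the same grid as boolean obstacle masks; the evaluation tooling is not specified in the text, and all names here are hypothetical.

```python
import numpy as np

def grid_map_metrics(pred_occupied, gt_occupied, evaluated_mask=None):
    """Accuracy and IoU of Eq. (6)-(7) between predicted and ground-truth grids.

    pred_occupied : boolean array, cells classified as obstacle by the method
    gt_occupied   : boolean array, obstacle cells in the labelled orthomosaic
    evaluated_mask: optional boolean array restricting the score to observed cells
    """
    if evaluated_mask is None:
        evaluated_mask = np.ones_like(pred_occupied, dtype=bool)
    p = pred_occupied[evaluated_mask]
    g = gt_occupied[evaluated_mask]
    tp = np.count_nonzero(p & g)
    tn = np.count_nonzero(~p & ~g)
    fp = np.count_nonzero(p & ~g)
    fn = np.count_nonzero(~p & g)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 0.0
    return accuracy, iou
```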
Figure 5. Left column: labelled images with rock, sand, and background classes. Right column: occupancy grid map and trajectory estimated with the proposed method. From top to bottom, successive images of the sequence.

Table 1. Sensors characteristics.
Sensor | Description | Data logged
LocCam | PointGrey Bumblebee2 (BB2-08S2C-38), 12 cm baseline stereo camera | 1024 × 768 images
RTK GPS | Trimble BD 970 receiver with Zephyr Model 2 antenna (rover); Trimble BX 982 receiver with Zephyr Geodetic antenna (base station) | Latitude, longitude, and altitude expressed on the WGS84 ellipsoid

Table 2. Global mapping performances in terms of accuracy and IoU metrics.
Labeller | Trajectory | Accuracy | IoU
DeepLabv3+ | ORB-SLAM2 | 0.987 | 0.282

The left column of Figure 5 shows the images labelled with the rock, sand, and background classes, and the right column shows the occupancy grid map and trajectory estimated with the proposed method. As can be observed in the first row of Figure 5, the two rocky obstacles are a few meters away from the camera and not easily recognizable; however, they are correctly labelled by the DeepLabv3+ labeller. The effectiveness of the labeller can also be observed in the middle and bottom rows of Figure 5: although the shadow of the rover falls on both rock and sand, it does not affect the labelling performance.
4. CONCLUSIONS
In this paper, a hazard mapping method for planetary rover navigation is presented. The method is based on occupancy grid mapping. The posterior probability of the grid cells is updated using the inverse measurement model of a scan-like obstacle detector. Objects are identified as obstacles by means of the DeepLabv3+ deep neural network. The estimation of the relative pose between successive frames is carried out using the state-of-the-art visual SLAM method ORB-SLAM2. The proposed method shows the ability to produce accurate occupancy grid maps with associated labels up to a dozen meters from the camera, even when the rover shadow is present in the image field of view. Finally, the method has been tested on a publicly available dataset of a Mars-analogue environment.
REFERENCES
[1] D. C. Guastella, G. Muscato, Learning-Based Methods of Perception and Navigation for Ground Vehicles in Unstructured Environments: A Review, Sensors, 21 (2021), 73.
DOI: 10.3390/s21010073
[2] M. Bajracharya, M. W. Maimone, D. Helmick, Autonomy for Mars Rovers: Past, Present, and Future, Computer, 41(12) (2008), pp. 44-50.
DOI: 10.1109/MC.2008.479
[3] M. W. Maimone, P. C. Leger, J. J. Biesiadecki, Overview of the Mars exploration rovers autonomous mobility and vision capabilities, Proc. of the IEEE International Conference on Robotics and Automation (ICRA) space robotics workshop, April 2007.
[4] G. Reina, A. Leanza, A. Milella, A. Messina, Mind the ground: a power spectral density-based estimator for all-terrain rovers, Measurement, 151 (2020), 107136.
DOI: 10.1016/j.measurement.2019.107136
[5] D. C. Guastella, L. Cantelli, D. Longo, C. D. Melita, G. Muscato, Coverage path planning for a flock of aerial vehicles to support autonomous rovers through traversability analysis, Acta IMEKO, 8(4) (2019), pp. 9-12.
DOI: 10.21014/acta_imeko.v8i4.680
[6] L.-H. Chen, C. Peng, A Robust 2D-SLAM Technology With Environmental Variation Adaptability, IEEE Sensors Journal, 19(23) (2019), pp. 11475-11491.
DOI: 10.1109/JSEN.2019.2931368
[7] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, J. J. Leonard, Past, present, and future of simultaneous localization and mapping: toward the robust-perception age, IEEE Trans. Robot., 32(6) (2016), pp. 1309-1332.
DOI: 10.1109/TRO.2016.2624754
[8] L. Xia, J. Cui, R. Shen, X. Xu, Y. Gao, X. Li, A survey of image semantics-based visual simultaneous localization and mapping: Application-oriented solutions to autonomous navigation of mobile robots, Int. J. of Advanced Robotic Systems (2020), pp. 1-17.
DOI: 10.1177/1729881420919185
[9] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (eds), Computer Vision – ECCV 2018, Lecture Notes in Computer Science, vol. 11211, Springer, Cham, 2018, pp. 801-818.
DOI: 10.1007/978-3-030-01234-2_49
[10] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Analysis and Machine Intelligence, 40(4) (2017), pp. 834-848.
DOI: 10.1109/TPAMI.2017.2699184
[11] S. Sengupta, E. Greveson, A. Shahrokni, P. H. Torr, Urban 3D semantic modelling using stereo vision, Proc. 2013 IEEE International Conference on Robotics and Automation, IEEE, 2013, pp. 580-585.
DOI: 10.1109/ICRA.2013.6630632
[12] J. Jeong, T. S. Yoon, J. B. Park, Multimodal sensor-based semantic 3D mapping for a large-scale environment, Expert Systems with Applications, 105 (2018), pp. 1-10.
DOI: 10.1016/j.eswa.2018.03.051
[13] Z. Qiu, Y. Zhuang, F. Yan, H. Hu, W. Wang, RGB-DI Images and Full Convolution Neural Network-Based Outdoor Scene Understanding for Mobile Robots, IEEE Transactions on Instrumentation and Measurement, 68(1) (2019), pp. 27-37.
DOI: 10.1109/TIM.2018.2834085
[14] X. Chen, Y. Liu, K. Achuthan, WODIS: Water Obstacle Detection Network Based on Image Segmentation for Autonomous Surface Vehicles in Maritime Environments, IEEE Transactions on Instrumentation and Measurement, 70 (2021), pp. 1-13, 7503213.
DOI: 10.1109/TIM.2021.3092070
[15] L. Gan, R. Zhang, J. W. Grizzle, R. M. Eustice, M. Ghaffari, Bayesian Spatial Kernel Smoothing for Scalable Dense Semantic Mapping, IEEE Robotics and Automation Letters, 5(2) (2020).
DOI: 10.1109/LRA.2020.2965390
[16] Q. Wang, J. Gao, Y. Yuan, A Joint Convolutional Neural Networks and Context Transfer for Street Scenes Labeling, IEEE Trans. on Intelligent Transportation Systems, 19(5) (2018), pp. 1457-1470.
DOI: 10.1109/TITS.2017.2726546
[17] V. Balaska, L. Bampis, I. Kansizoglou, A. Gasteratos, Enhancing satellite semantic maps with ground-level imagery, Robotics and Autonomous Systems, 139 (2021), 103760.
DOI: 10.1016/j.robot.2021.103760
[18] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Analysis Mach. Intell., 39(12) (2017), pp. 2481-2495.
DOI: 10.1109/TPAMI.2016.2644615
[19] X. Li, D. Wang, H. Ao, R. Belaroussi, D. Gruyer, Fast 3D Semantic Mapping in Road Scenes, Appl. Sci., 9(4) (2019), 631.
DOI: 10.3390/app9040631
[20] Y. Yang, F. Qiu, H. Li, L. Zhang, M.-L. Wang, M.-Y. Fu, Large-scale 3D Semantic Mapping Using Stereo Vision, International Journal of Automation and Computing, 15(2) (2018), pp. 194-206.
DOI: 10.1007/s11633-018-1118-y
[21] D. Paz, H. Zhang, Q. Li, H. Xiang, H. I. Christensen, Probabilistic Semantic Mapping for Urban Autonomous Driving Applications, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 25-29, 2020.
DOI: 10.1109/IROS45743.2020.9341738
[22] F. Andert, Drawing stereo disparity images into occupancy grids: Measurement model and fast implementation, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2009, pp. 5191-5197.
DOI: 10.1109/IROS.2009.5354638
[23] R. Mur-Artal, J. D.
Tardós, ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras, IEEE Transactions on Robotics, 33(5) (2017), pp. 1255-1262.
DOI: 10.1109/TRO.2017.2705103
[24] R. A. Hewitt, E. Boukas, M. Azkarate, M. Pagnamenta, J. A. Marshall, A. Gasteratos, G. Visentin, The Katwijk beach planetary rover dataset, The International Journal of Robotics Research, 37(1) (2018), pp. 3-12.
DOI: 10.1177/0278364917737153
[25] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[26] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
DOI: 10.1007/978-3-319-24574-4_28
[27] S. Chiodini, L. Torresin, M. Pertile, S. Debei, Evaluation of 3D CNN Semantic Mapping for Rover Navigation, Proc. IEEE 7th International Workshop on Metrology for AeroSpace (MetroAeroSpace), 2020, pp. 32-36.
DOI: 10.1109/MetroAeroSpace48742.2020.9160157
[28] L. Torresin, Sviluppo ed applicazione di reti neurali per segmentazione semantica a supporto della navigazione di rover marziani [Development and application of neural networks for semantic segmentation in support of Martian rover navigation], Master's Thesis Report, 2019.
[29] S. Chiodini, M. Pertile, S. Debei, Semantic Mapping for Rover Navigation, Proc. IV Forum Nazionale delle Misure, 2020.
[30] R. Van den Boomgard, R. van Balen, Methods for Fast Morphological Image Transforms Using Bitmapped Images, Computer Vision, Graphics, and Image Processing: Graphical Models and Image Processing, 54(3) (1992), pp. 254-258.
DOI: 10.1016/1049-9652(92)90055-3
[31] H. Hirschmuller, Accurate and efficient stereo processing by semi-global matching and mutual information, Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 05), 2005, pp. 807-814.
DOI: 10.1109/CVPR.2005.56
[32] S. Thrun, Probabilistic robotics, Communications of the ACM, 45(3) (2002), pp. 52-57.
DOI: 10.1145/504729.504754