International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol 16 No 21 (2022)

An Intelligent Autonomous Document Mobile Delivery Robot Using Deep Learning

https://doi.org/10.3991/ijim.v16i21.32071

Thittaporn Ganokratanaa1, Mahasak Ketcham2()
1 Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
2 Faculty of Information Technology and Digital Innovation, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
mahasak.k@itd.kmutnb.ac.th

Abstract—This paper presents an intelligent autonomous document mobile delivery robot based on a deep learning approach. The robot is built as a prototype document delivery service for use in offices. It can adaptively move across different surfaces, such as terrazzo, canvas, and wooden floors. In this work, we introduce a convolutional neural network (CNN) to recognize traffic lanes and stop signs under the assumption that all surfaces share identical traffic lanes. We train the model on a custom indoor traffic-lane and stop-sign dataset labeled with motion directions. The CNN extracts a direction-of-motion feature to estimate the robot's direction and to stop the robot based on an input image from a monocular camera. These predictions are used to adjust the robot's direction and speed. The experimental results show that the robot can move across different surfaces that share the same structured traffic lanes, achieving a model accuracy of 96.31%. The proposed robot facilitates document delivery for office workers, allowing them to work on other tasks more efficiently.

Keywords—autonomous, mobile, delivery robot, convolutional neural network

1 Introduction

Advanced technology plays an important role in the daily lives of people of all generations. Robots are an advanced technology widely used in factories across various systems and have gained popularity in logistics systems within production processes. Transportation is a key function of a logistics system: it transfers goods from the place of origin to another place according to customer demand at a given time. Robot technologies have also been introduced into transportation systems to replace human labor and reduce the cost and time of warehouse transfers.

To support such transportation systems, automated guided vehicles (AGVs) have been used in modern factories for over a decade [1]. AGVs are important because they can move around in restricted environments [2-3]. AGV processing can be based on computer vision techniques together with automatic guidance and landmark recognition systems. However, environmental changes, including lighting brightness, floor patterns, and landmark positions, can affect these vision systems [4,5]. Recent works try to improve robot efficiency by reducing unnecessary movement between shelves and by using linear motors to reduce control complexity. Some works install robots in a factory to move cartons and pallets and to load and unload goods [6-9]. To ensure safe operation, a cloud-based communication system [10,11] was developed to enable communication among autonomous vehicles carrying out transportation functions within a factory.
A safety laser scanner or wireless communication system is used to reduce the possibility of collisions [12,13]. Each AGV detects obstacles in the surrounding area and then either stops or changes its direction of movement to avoid collisions [14,15]. The computer vision techniques used in autonomous robots are important for object detection [16,17] in conjunction with a human operator. Computer vision techniques have been used in various models of autonomous robots, such as for road data collection [18-24] and target tracking [25-28]. However, although using autonomous robots to help manage an inventory system is beneficial, path planning is also required for this task. Path planning algorithms have been well developed and improved to be more efficient; examples include the Field D* algorithm and the improved chaotic motion path planner [29,30].

In this work, we propose an intelligent autonomous mobile delivery robot (MDR) using a deep learning approach. Our proposed method can recognize various traffic signs on different surfaces based on our dataset. The main contributions of this study are as follows:

1. We present the development of an autonomous mobile delivery robot that combines a package-delivery feature, a human-assistance feature, and an automatic movement feature. It uses a camera as its only sensor to capture images and deep learning techniques to recognize traffic lane markings.

2. Our autonomous mobile delivery robot can be driven across various surfaces, including a terrazzo floor, a canvas floor, and a wooden floor, in an environment with road markings similar to those in the training images, thus obviating the need for new road-marking recognition technologies. We hypothesize that if a car were driven to many places across different road surfaces and in different environments but with the same traffic lanes, the robot would be able to move along designated directions without needing to be retrained for road-marking recognition.

3. We combine the parcel-handling features of a factory assistant robot and the accessibility features of human-assistance robots in the proposed autonomous mobile delivery robot. These two features respectively provide the robot with the ability to transport packages and to interact with humans. The developed robot not only moves automatically and delivers packages but can also navigate complicated office environments.

The remainder of this paper is organized as follows: related works are reviewed in section two; the proposed system is presented in detail in section three; the experimental results and a discussion are given in section four; finally, conclusions are drawn in section five.

2 Related works

2.1 Autonomous robots

Autonomous robots are machines with the ability to move around in environments such as factories, houses, hospitals, apartments, and public places. Such robots can perceive objects in these environments and can make decisions. Generally, studies on robotic perception systems employ camera sensor technology in conjunction with other sensors [31]. Sensors are critical devices that enable robots to receive input [32,33]. A camera is a sensor that can work with LiDAR, radar, and ultrasonic technologies.
It can help in understanding the environment when provided with stable infrastructure, such as traffic lane signs, speed signs, and speed limit signs [34, 35].

Regarding automatic robot movement, researchers have examined the benefits of and conducted studies on road recognition. Typically, an autonomous robot is used to learn public and unstructured roads that do not have traffic lanes. Image processing techniques have been used to recognize structured and unstructured roads under shadow and illumination changes, and neural networks have been chosen to handle lane recognition tasks. Qingji Gao and Qijun Luo [19] proposed the rough set based unstructured road detection (RSURD) method, which employs the HSV color model to convert camera images into a color spectrum with 256 levels. This method enables the robot to examine roads with complicated structures using a low-complexity algorithm. A histogram is used to build the color model of the road, and each pixel is compared against this model. However, the method relies on tracking results from one image to the next and is unable to recognize roads with different color properties. Kuo-Yu Chiu et al. [36] presented a color-based segmentation method for lane detection that selects a region of interest (ROI), groups the traffic lines by choosing a white line and applying a color threshold to separate the road and lane-marking colors, and finally uses the line boundaries to detect the traffic line (a rough sketch of such a color-threshold pipeline is given at the end of this subsection). This method can easily eliminate the influence of sunlight, shadows on sidewalks, and obstacles such as vehicles and pedestrians, but it depends on manually set parameters. Zhang, F., et al. [37] introduced a lane-marking separation technique that uses random finite set statistics to estimate the position of road markings together with a probability hypothesis density (PHD) filter. They match the visual coordinate (u, v) of each tracked pixel to the corresponding point (x, y) on the ground plane (vehicle coordinates) to extract the characteristics of the traffic line. The PHD filter is used to reduce the computational difficulty of Bayes filters, though it comes at the cost of higher time complexity. Amaradi, P., et al. [38] presented a Hough-transform technique for lane tracking and obstacle detection, using LIDAR sensors to measure the distance drifted from the center of the lane and to detect obstacles. Kim, Z.W. [39] presented the random sample consensus (RANSAC) algorithm to find lane boundary hypotheses in real time. A probabilistic clustering algorithm is applied to group the hypotheses into left and right lane boundaries, which are hypothesized separately, and the traffic lines are identified with intensity-bump detection, ANNs, naive Bayes classifiers (NBCs), and SVMs. The results show that SVMs perform better than the other classifiers, while ANNs perform worst.

Our research differs from the aforementioned works in that we do not consider the color of the surface on which the robot moves. We use deep learning techniques to learn traffic lanes, allowing the robot to move on different surfaces. The proposed method is designed to allow the robot to move on various surfaces, including terrazzo, canvas, and wooden floors. The model is trained on a dataset collected in a terrazzo floor environment and tested on canvas and wooden floors.
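For reference, the classical color-threshold lane detection pipeline reviewed above (ROI selection, white-line thresholding, and line fitting) can be sketched roughly as follows. This is only an illustrative sketch: the ROI bounds, threshold values, and Hough parameters are assumptions and do not come from the cited papers or from our system.

```python
import cv2
import numpy as np

def detect_lane_lines(frame_bgr):
    """Rough sketch of a classical color-threshold lane detector.

    The thresholds, ROI bounds, and Hough parameters below are illustrative
    guesses, not values from the cited works.
    """
    h, w = frame_bgr.shape[:2]
    roi = frame_bgr[h // 2:, :]                      # keep the lower half as the region of interest
    # isolate near-white lane markings with a simple color threshold
    mask = cv2.inRange(roi, (180, 180, 180), (255, 255, 255))
    edges = cv2.Canny(mask, 50, 150)                 # edge map of the thresholded markings
    # fit straight segments to the edge map
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=30, minLineLength=20, maxLineGap=10)
    return lines  # array of (x1, y1, x2, y2) segments, or None if nothing is found
```

In contrast, the proposed approach replaces such hand-tuned thresholds with features learned by a CNN, as described in the next subsection.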
2.2 Recognition using a convolutional neural network

The movement of the robot along the traffic lane is addressed by recognition using a convolutional neural network (CNN) to achieve end-to-end training. An end-to-end platform [21, 22], a new model for self-driving (i.e., autonomous) cars, was presented as a brain-inspired cognitive model with attention (CMA). This model simulates the operation of the human brain: it describes the relationship between complex traffic scenes and a recurrent neural network updated using an attention mechanism and short-term memory. Eleftheriou, G., et al. [40] presented the development of a line-following robot based on a fuzzy PID closed-loop control system. The rule-based method works in tandem with a computer vision system that detects and recognizes the lines in front of the robot using a Pixy2 camera and reports them to the microcontroller board. These visual data enhance the PID controller's ability to respond in time to rapid nonlinear deviations from preset values. Simmons, B., et al. [41] proposed an end-to-end learning approach to help small remote-controlled cars run in indoor environments. A deep artificial neural network algorithm (CUA-DNN) and a convolutional neural network algorithm (CUA-CNN) were used in training to map prototypes that presented the mechanical, electrical, and software design of self-driving cars. The accuracy and loss of the two neural networks were compared with VGG16 and DenseNet models, and a finite state machine was used to control vehicle behavior when changing lanes and stopping. Do, T.D., et al. [42] focused on finding a model that maps the dataset to a predicted steering angle using deep neural networks. The work has two parts: i) building a 1/10-scale RC car platform with a computer, a Raspberry Pi 3 Model B, and a front camera; and ii) building a mock test road on which the Raspberry Pi car drives autonomously in an outdoor environment around an oval and a figure-eight track with a traffic sign. The results demonstrated the efficiency and robustness of the model in lane keeping tasks.

3 Proposed methods

3.1 Data collection

We collected a total of 120,000 images, including 70,000 forward, 24,500 left-turn, 24,500 right-turn, and 1,000 stop-sign images. We placed the robot in the lane and operated it by hand control via a smartphone. The camera is set to capture 160×120-pixel images at a frame rate of 20 fps. The camera is connected to a Raspberry Pi and performs two functions: collecting image datasets for training and receiving image signals as input for testing the model. While collecting the dataset, the microcontroller calculates the motion linear velocity and wheel angle for each image frame.

The main factors considered in this work are appearance and motion. Appearance refers to the presence of the traffic lanes and stop signs in an image. We use RGB and grayscale image datasets of the same size (i.e., 160×120 pixels) to train the model. Grayscale images have been reported to improve recognition performance [43, 44]; therefore, we train the model with both RGB and grayscale images to clarify which type of image affects the recognition performance of the model. While manipulating the robot to collect the images, the dataset is generated with one motion linear velocity value and one wheel angle value per image.
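A minimal sketch of this per-frame collection loop is given below. The camera interface, the smartphone control-reading helper, and the file layout are assumptions made for illustration; the paper does not specify the actual implementation.

```python
import csv
import os
import time
import cv2

def read_control_state():
    # Placeholder for the manual smartphone control link: in the real robot this
    # would return the current motion linear velocity and wheel angle.
    return 0.0, 0.0

def collect_dataset(out_dir="dataset", max_frames=1000, fps=20, frame_size=(160, 120)):
    """Sketch of the data-collection loop: save each frame together with the
    motion linear velocity and wheel angle recorded at the same instant."""
    os.makedirs(out_dir, exist_ok=True)
    cam = cv2.VideoCapture(0)
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, frame_size[0])
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, frame_size[1])
    with open(os.path.join(out_dir, "labels.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "linear_velocity", "wheel_angle"])
        for idx in range(max_frames):
            ok, frame = cam.read()
            if not ok:
                break
            velocity, wheel_angle = read_control_state()
            path = os.path.join(out_dir, f"frame_{idx:06d}.png")
            cv2.imwrite(path, frame)
            writer.writerow([path, velocity, wheel_angle])
            time.sleep(1.0 / fps)  # approximate the 20 fps capture rate
    cam.release()
```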
The RGB color images are converted to grayscale images as shown in Eq. (1):

Grayscale image = (W1 × R) + (W2 × G) + (W3 × B), (1)

where W1 + W2 + W3 = 1 and W1, W2, W3 > 0. R, G, and B represent the red, green, and blue values at the conversion point, respectively, while W1, W2, and W3 are the weights of the red, green, and blue channels, respectively.

While testing the robot's autonomous movement, it was found that when the robot encounters an obstacle, it chooses to move along an unobstructed path, which allows it to avoid obstacles.

For motion, the features used in the movement of the robot include the linear velocity, the wheel angle, and the motion characteristics. The motion linear velocity and wheel angle values are produced while collecting the image dataset of traffic lanes and stop signs; these values are controlled manually via a smartphone and are calculated by the program installed on the robot. The motion linear velocity is the velocity of the robot as it moves in a plane. As the robot moves through each point, a linear coordinate is obtained and represented in the form of Eq. (2):

v0 = mx + b, (2)

where v0 is the value of the line at position x, m is the slope of the line, and b is the y-intercept.

The robot can move forward, turn left, or turn right. Forward motion is carried out in a straight line and can be calculated from the linear velocity as shown in Eq. (3):

v = v0 + at, (3)

where v is the resulting linear velocity, v0 is the initial linear velocity, a is the acceleration, and t is time.

The wheel angle is the degree of wheel rotation caused by robot movement in the left and right directions, as shown in Eq. (4). The angle of the robot wheel can be changed through a nonlinear operation based on the input from the front axle to the nearest route point (cx, cy). The front-wheel steering position provides easy handling and helps the wheels to follow the path correctly without slipping off.

δ(t) = θ(t) + arctan(kx(t) / vx(t)), (4)

where δ(t) is the degree of inclination of the robot's front wheels, θ(t) is the angle describing the trajectory of the robot, kx(t) is the increase in speed of the robot, and vx(t) is the speed of the robot at time t. When the center of the robot's front wheel is at the nearest point along the edge, θ(t) is set to zero. Here, θ is the variable wheel angle, which can be calculated from Eq. (5):

θ = ωt, (5)

where θ represents the wheel angle offset (in degrees), ω is the angular velocity, and t is time.

We define the images as the input data for the model and the motion linear velocity and wheel angle as the output data. The motion linear velocity and wheel angle are used as a label for each image frame, so that the movement of the robot in each frame is classified as forward, left, right, or stop. The labels are represented as [1 0 0 0], [0 1 0 0], [0 0 1 0], and [0 0 0 1], denoting forward, left, right, and stop, respectively.
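As an illustration, the grayscale conversion of Eq. (1), the steering relation of Eq. (4), and the one-hot motion labels can be written as the short sketch below. The specific weight values W1–W3 are a common luminance choice assumed here; the paper only requires that they sum to one.

```python
import numpy as np

# Assumed luminance weights with W1 + W2 + W3 = 1 (Eq. 1); the paper does not
# state the exact values used, so these are illustrative.
W1, W2, W3 = 0.299, 0.587, 0.114

def to_grayscale(rgb_image):
    """Weighted grayscale conversion of an RGB image array (Eq. 1)."""
    r, g, b = rgb_image[..., 0], rgb_image[..., 1], rgb_image[..., 2]
    return W1 * r + W2 * g + W3 * b

def steering_angle(theta, kx, vx):
    """Front-wheel steering angle from Eq. (4): delta(t) = theta(t) + arctan(kx(t) / vx(t))."""
    return theta + np.arctan(kx / vx)

# One-hot motion labels used for each image frame (Section 3.1).
LABELS = {
    "forward": np.array([1, 0, 0, 0]),
    "left":    np.array([0, 1, 0, 0]),
    "right":   np.array([0, 0, 1, 0]),
    "stop":    np.array([0, 0, 0, 1]),
}
```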
The features used by the robot are the image, the motion linear velocity, and the wheel angle; the motion linear velocity and wheel angle values were collected in the dataset. The advantage of working with these features is that the robot performs well and operates rapidly without undue complexity while yielding good results [45]. The motion and the wheel angle of the robot can be classified as forward, left, or right with the criteria of 0