Mechatronics, Electrical Power, and Vehicular Technology 04 (2013) 99-108
e-ISSN: 2088-6985, p-ISSN: 2087-3379
Accreditation Number: 432/Akred-LIPI/P2MI-LIPI/04/2012
www.mevjournal.com
© 2013 RCEPM - LIPI. All rights reserved.
doi: 10.14203/j.mev.2013.v4.99-108

OBJECT RECOGNITION SYSTEM IN REMOTE CONTROLLED WEAPON STATION USING SIFT AND SURF METHODS

Midriem Mirdanies a, Ary Setijadi Prihatmanto b, Estiko Rijanto a
a Research Center for Electrical Power and Mechatronics, Indonesian Institute of Sciences (LIPI), Komp LIPI Bandung, Jl. Sangkuriang, Gd. 20, Lt. 2, Bandung 40135, Indonesia
b School of Electrical Engineering and Informatics, Bandung Institute of Technology (ITB), Jl. Ganesha No 10, Bandung 40132, Indonesia

Received 8 April 2013; received in revised form 12 November 2013; accepted 13 November 2013
Published online 24 December 2013

Abstract
An object recognition system using computer vision, implemented on a Remote Controlled Weapon Station (RCWS), is discussed. The system makes it easier to identify and shoot targeted objects automatically. An algorithm was created to recognize multiple objects in real time using two methods, Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), combined with K-Nearest Neighbors (KNN) and Random Sample Consensus (RANSAC) for verification. The algorithm is designed to make object detection more robust and to minimize the processing time required. The objects registered in the system are an armored personnel carrier, a tank, a bus, a sedan, a big foot, and a police jeep. In addition, an object that has not been registered in the system can be selected with the mouse as a target. Kinect™ is used to capture RGB images and to find the x, y, and z coordinates of the object. The programming language used is C with the Visual Studio 2010 IDE and the OpenCV libraries.
The object recognition program is divided into three parts: 1) reading images from Kinect™ and displaying simulation results, 2) the object recognition process, and 3) transfer of the object data to the ballistic computer. Communication between the programs is performed using shared memory. The detected object data is sent to the ballistic computer via Local Area Network (LAN) using Winsock for ballistic calculation, and the motor control system then points the weapon model at the desired object. The experimental results show that the SIFT method is more suitable because it is more accurate and faster than SURF; the average processing time is 430.2 ms to detect one object, 618.4 ms for two objects, 682.4 ms for three objects, and 756.2 ms for four objects. The object recognition program is able to recognize multiple objects, and the data of the identified objects can be processed by the ballistic computer in real time.

Keywords: RCWS, object recognition, shared memory, SIFT, SURF, opencv, C language, kinect™.

I. INTRODUCTION
The defense system of a country requires reliable weapon systems [1], one of which is the Remote Controlled Weapon Station (RCWS). RCWS is commonly used on modern combat vehicles such as tanks and Armoured Personnel Carriers (APC). An RCWS is a weapon system that can be operated remotely from the vehicle cabin, so the gunner is safely protected. This paper describes the results of computer vision research implemented on an RCWS. The system makes it easier to recognize and shoot targeted objects automatically, including multiple objects. Kinect™ is used to capture RGB images and the x, y, and z coordinates of the object in meters [2]; the sensor is shown in Figure 1. Kinect™ consists of an infrared depth sensor, an RGB camera, an infrared laser projector, a microphone array, and a tilt motor. The research was performed at the Mechatronics Lab. - Research Center for Electrical Power and Mechatronics (Indonesian Institute of Sciences), and at the School of Electrical Engineering and Informatics Lab. - Bandung Institute of Technology (ITB). An RCWS model was tested by moving a weapon model in the laboratory. The weapon model can be seen in Figure 2. Kinect™ is placed at the front, separate from the weapon model.

* Corresponding Author. Phone: +62-22-2503055. E-mail: midriem.mirdanies@lipi.go.id

Methods that can be used for object tracking and recognition have been surveyed by Yilmaz [3]. Object recognition can be based on color [4], shape, or other properties. The methods used in this research are Scale Invariant Feature Transform (SIFT) [5], [6] and Speeded Up Robust Features (SURF) [7], combined with K-Nearest Neighbors (KNN) and Random Sample Consensus (RANSAC) for verification. The programming language used is C with the Visual Studio 2010 IDE and the OpenCV libraries [8], [9], [10]. Experiments were done to compare the accuracy and processing speed of the two methods. An algorithm was developed to detect multiple objects simultaneously while making detection more robust and minimizing the time required for object recognition. The recognized objects are sent to the ballistic computer through a Local Area Network (LAN) using Winsock [11] for further processing.

The object recognition program is divided into three sections: capturing images from Kinect™ and displaying simulation results, the object recognition process, and object data transfer to the ballistic computer. Communication between the programs is performed using shared memory, which allows data in memory to be accessed by multiple programs [12].

The concept used in the SIFT and SURF methods is to find interest points, also called keypoints or feature points.
The idea is to find unique features in the images and to perform the analysis on those features. The advantage of the SIFT and SURF methods over some other methods is scale invariance, i.e. the detected object may have a different scale from the reference object. The methods also tolerate object rotation and can detect objects that are only partially visible.

There are four major stages in the SIFT method to generate the set of image features: scale-space extrema detection, keypoint localization, orientation assignment, and keypoint descriptor computation [13]. These stages yield features that serve as the descriptor of an object for further processing. SIFT approximates the Laplacian filter at different scales by differences of Gaussian-filtered images. The stages of SURF are almost the same as those of SIFT, but SURF uses a Hessian determinant filter instead of the Laplacian filter. The main difference between SURF and SIFT lies in speed and accuracy: SURF is typically faster, whereas SIFT is more accurate in finding matching features [14].

Figure 1. Kinect™ sensor used in this research (labeled parts: color sensor, IR emitter, IR depth sensor, tilt motor, microphone array)
Figure 2. Weapon model
Figure 3. RCWS design

II. REMOTE CONTROLLED WEAPON STATION DESIGN
Figure 3 shows the RCWS design developed in this research. There are three main parts in the RCWS system: computer vision, ballistic computer, and motor control. The area enclosed by the dotted lines in Figure 3 is the scope of this research. Two computers are used: the first is for computer vision, and the other is for the ballistic computer and motor control. Communication between the two computers is done via LAN. The image from Kinect™ is processed by the first computer to recognize objects. The coordinates of the detected objects are sent to the second computer for ballistic calculation.
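To illustrate how an object position relates to a pointing direction, the following minimal Python sketch converts a target position into line-of-sight azimuth and elevation angles. It is not the authors' ballistic calculation: it ignores projectile, temperature, and wind corrections as well as the offset between Kinect™ and the weapon model. The function name `pointing_angles` and the axis convention (x to the right, y up, z forward along the sensor's optical axis) are assumptions made for this example.

```python
import math


def pointing_angles(x, y, z):
    """Return (azimuth, elevation) in degrees for a target at (x, y, z) meters.

    Assumed axes (hypothetical, not stated in the paper):
    x to the right, y up, z forward along the optical axis.
    """
    if x == 0 and y == 0 and z == 0:
        raise ValueError("target coincides with the sensor origin")
    # Azimuth: rotation in the horizontal x-z plane, positive to the right.
    azimuth = math.degrees(math.atan2(x, z))
    # Elevation: angle above the horizontal plane.
    elevation = math.degrees(math.atan2(y, math.hypot(x, z)))
    return azimuth, elevation


if __name__ == "__main__":
    az, el = pointing_angles(1.0, 0.0, 1.0)
    print(f"azimuth={az:.1f} deg, elevation={el:.1f} deg")
```

A real system would add the ballistic corrections to these geometric angles before commanding the motors.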
The ballistic calculation uses several parameters, i.e. the object coordinates, the projectile, and other data from the temperature sensor, wind velocity, and wind direction, to obtain the correction of the shooting angle (azimuth and elevation). The motor control system then commands the motor driver to move the weapon model so that it points at the selected object.

III. COMPUTER VISION DESIGN
Two computer programs have been created. The first stores object data in a database (training data) and is shown in Figure 4. The second performs the real-time object recognition process and is shown in Figure 6 and Figure 7. Each database file has the ".yml" extension and is saved separately for each object and method. This keeps the system efficient, because only the required files are used and data does not build up in a single file. To store an object in the database, the user selects the desired object with the mouse, which makes the process user friendly. The objects used are toy models of an APC, tank, bus, sedan, big foot, and police jeep. Figure 5 shows the objects, and the database files are listed in Table 1. Each object was trained from several sides. Each file contains the object name, images, keypoints, and object descriptors. Because all of this data is stored in the database, keypoints and descriptors do not have to be computed again during object recognition, which speeds up program execution.

Figure 4. Object data storage process in the database
Figure 5. Objects stored in the database

Figure 6 shows the real-time object identification flowchart. In the initial phase, all databases are loaded into memory to shorten the execution time. Experiments show that object identification is faster this way than when the database is accessed directly.
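The idea of one database file per object and method, loaded into memory once at start-up, can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' C code: the class `ObjectDatabase`, the helper `database_filename`, and the `read_file` callback are hypothetical names, and the actual parsing of the ".yml" files is left to the caller. Only the file naming pattern (method, underscore, object key, ".yml") follows Table 1.

```python
METHODS = ("SIFT", "SURF")


def database_filename(method, object_key):
    """Build a per-object, per-method file name such as 'SIFT_bus.yml'."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return f"{method}_{object_key}.yml"


class ObjectDatabase:
    """In-memory cache so that each database file is read only once."""

    def __init__(self, read_file):
        # read_file is a caller-supplied function (hypothetical) that parses
        # one database file and returns its contents.
        self._read_file = read_file
        self._cache = {}

    def load_all(self, method, object_keys):
        """Load the files of the chosen method that are not cached yet."""
        for key in object_keys:
            name = database_filename(method, key)
            if name not in self._cache:
                self._cache[name] = self._read_file(name)

    def get(self, method, object_key):
        """Return cached data; raises KeyError if it was never loaded."""
        return self._cache[database_filename(method, object_key)]


if __name__ == "__main__":
    db = ObjectDatabase(lambda name: {"source": name})
    db.load_all("SIFT", ["bus", "tank", "panser"])
    print(db.get("SIFT", "tank"))
```

Because the recognition loop only calls `get`, no file access happens while frames are being processed, which is the effect the text describes.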
The next step is to compare all the data in the database that has been loaded into memory with the image data from Kinect™ to find a matching object. This process is based on an algorithm created by Robert Laganière [14], and the experimental results show that it makes object identification more robust. This study made several modifications to the algorithm. Data checks were added: a keypoint check before the matching process, a symMatches check before RANSACTest, an outMatches check before the keypoint-to-point conversion, and a point check before findFundamentalMat. These checks are needed because experiments showed that the real-time process raises an error if the input data does not exist or is insufficient. The algorithm was also modified so that the method used (SIFT or SURF) can be chosen by changing the parameter setJenisFeature.

Table 1. Object database file

                                                    The amount of data
No  File name              Object       Method  Object name  Images  Keypoints  Descriptors
1   SIFT_bus.yml           Bus          SIFT    1            36      36         36
2   SURF_bus.yml           Bus          SURF    1            36      36         36
3   SIFT_mobil_sedan.yml   Sedan        SIFT    1            36      36         36
4   SURF_mobil_sedan.yml   Sedan        SURF    1            36      36         36
5   SIFT_tank.yml          Tank         SIFT    1            38      38         38
6   SURF_tank.yml          Tank         SURF    1            38      38         38
7   SIFT_panser.yml        APC          SIFT    1            38      38         38
8   SURF_panser.yml        APC          SURF    1            38      38         38
9   SIFT_big_foot.yml      Big foot     SIFT    1            48      48         48
10  SURF_big_foot.yml      Big foot     SURF    1            48      48         48
11  SIFT_jeep_polisi.yml   Police jeep  SIFT    1            49      49         49
12  SURF_jeep_polisi.yml   Police jeep  SURF    1            49      49         49

Figure 6. Real time object recognition flowchart

In addition, the keypoints and descriptors of each compared image are not computed continuously but only as needed: for the image from Kinect™, they are computed only at the beginning, when the image is about to be compared with the database, and when a detected object is marked in the image.
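The matching steps and data checks described above can be sketched in a few lines. The following Python example is a simplified illustration, not the authors' C implementation and not the code from [14]: it uses brute-force nearest-neighbour search on small lists of descriptor tuples, a ratio test, and a symmetry test, and it returns an empty result instead of failing when the input is missing or too small. The function names, the ratio of 0.65, and the minimum of 8 matches before a RANSAC-based fundamental matrix estimate are assumptions chosen for this example; the RANSAC step itself is omitted.

```python
import math


def knn2(query, train):
    """For each query descriptor, return its two nearest train descriptors
    as a list of (distance, train_index) pairs."""
    result = []
    for q in query:
        dists = sorted((math.dist(q, t), j) for j, t in enumerate(train))
        result.append(dists[:2])
    return result


def ratio_test(knn, ratio=0.65):
    """Keep a match only if the best distance is clearly smaller than the
    second best one. Returns {query_index: train_index}."""
    kept = {}
    for i, neighbours in enumerate(knn):
        if len(neighbours) < 2:
            continue  # data check: not enough neighbours to compare
        (d1, j1), (d2, _) = neighbours
        if d1 < ratio * d2:
            kept[i] = j1
    return kept


def symmetric_matches(desc_a, desc_b, ratio=0.65):
    """Return (index_a, index_b) pairs that pass the ratio test in both
    directions; empty or too small inputs yield an empty list."""
    if len(desc_a) < 2 or len(desc_b) < 2:
        return []  # data check before matching
    a_to_b = ratio_test(knn2(desc_a, desc_b), ratio)
    b_to_a = ratio_test(knn2(desc_b, desc_a), ratio)
    return [(i, j) for i, j in sorted(a_to_b.items()) if b_to_a.get(j) == i]


def enough_for_ransac(matches, minimum=8):
    """Data check before the geometric verification step (assumed minimum)."""
    return len(matches) >= minimum


if __name__ == "__main__":
    a = [(0.0, 0.0), (10.0, 10.0), (5.0, 0.0)]
    b = [(10.1, 10.0), (0.0, 0.1), (50.0, 50.0)]
    pairs = symmetric_matches(a, b)
    print(pairs, enough_for_ransac(pairs))
```

Returning an empty list at each check lets the caller skip the frame, which mirrors the purpose of the added checks: the real-time loop keeps running when a frame yields too little data.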
Keypoints and descriptors of the database images do not have to be computed again either, because they are already stored in the database. This speeds up program execution. The block for finding objects and sending object data to the ballistic computer in Figure 6 consists of the matching process, homography calculation, object coordinate determination, and sending to the ballistic computer. The details of this block can be seen in Figure 7.

Figure 7. Find objects and send object data to ballistic computer

The real-time object recognition program is divided into three parts that interact with each other using shared memory, as shown in Figure 8. Three sets of data are shared between the programs: the captured image from Kinect™, the identified object data (name and center of the object), and the complete object data (name and x, y, z coordinates in meters). The file mapping handles used for these data are “Global\\MatMappingObject”, “Global\\ObjekMappingObject”, and “Global\\DtLngkpMappingObject”, respectively. Figure 9 shows the flowchart of the delivery of the detected object data to the ballistic computer, Figure 10 shows the object recognition flowchart, and Figure 11 shows the flowchart of real-time image capture from Kinect™ and display of the detected objects on the screen for simulation.

Figure 8. Distribution of object identification programs with shared memory mechanism
Figure 9. Flowchart of the delivery of detected object data to the ballistic computer

IV. RESULT AND DISCUSSION
The computer used in the experiments is a laptop with the following specifications: Intel® Core™ i5-2450M CPU @ 2.50 GHz, 8 GB RAM, Microsoft Windows 7 64 bit, and Nvidia GeForce 410M. The laptop can be seen in Figure 12.

A. Object Recognition Algorithm
Figure 13 shows an experiment result in which an APC is detected.
The name and the x, y, and z coordinates of the detected object are displayed on the screen and in the command prompt. The program execution time is displayed in the command prompt.

Figure 10. Object recognition flowchart
Figure 11. Flowchart of the image capturing process from Kinect™ and simulation results
Figure 12. Laptop used in experiments
Figure 13. Experiment to detect APC

Experiments were performed in the lab to detect a tank and an APC, ten times for each object from several positions. The speed of the two methods was tested by measuring the processing time of every iteration, from the keypoint calculation on the Kinect™ image to the completion of the comparison between that data and the database. Table 2 lists the experiment results concerning accuracy and processing time of the two methods. Table 2 shows that the SIFT method provides better accuracy and faster processing. The SIFT method detected the object 13 times in 20 experiments (65%), with an average processing time of 1519.2 ms. The SURF method detected the object 8 times (40%), with an average processing time of 2667.4 ms. The experiments also indicate that the fewer feature points (uniqueness) an image has and the poorer the lighting, the less likely the object is to be detected. It was also found that the SIFT method is faster than the SURF method because the number of keypoints in the SIFT method is smaller than in the SURF method. Figure 14 compares the number of keypoints of the two methods. Over the twenty experiments, the average number of SIFT keypoints is 274.1, while that of SURF is 1008.9.

Table 2. Experiment results concerning accuracy and processing time using SIFT and SURF methods

                 Detection     Process time (ms)
No  Object name  SIFT  SURF    SIFT    SURF
1   Panser       √     √       1,511   2,329
2   Panser       √     √       1,489   2,505
3   Panser       √     -       1,526   -
4   Panser       √     √       1,519   2,634
5   Panser       √     √       1,508   2,554
6   Panser       -     -       -       -
7   Panser       -     -       -       -
8   Panser       √     -       1,482   -
9   Panser       √     -       1,521   -
10  Panser       √     -       1,583   -
11  Tank         √     √       1,511   2,876
12  Tank         -     -       -       -
13  Tank         -     -       -       -
14  Tank         √     √       1,531   2,825
15  Tank         √     √       1,541   2,829
16  Tank         √     √       1,525   2,787
17  Tank         -     -       -       -
18  Tank         -     -       -       -
19  Tank         -     -       -       -
20  Tank         √     -       1,502   -

Figure 14. The number of keypoints using SIFT and SURF methods (x-axis: experiments; y-axis: the number of keypoints; series: SIFT, SURF)

B. Shared Memory
Figure 15 shows the result of the communication experiment between the three programs using shared memory. The three programs can communicate with each other through shared memory, as shown by the agreement between the data displayed in the command prompt of each program and the data displayed on the screen. The processing time of the SIFT method after the program was divided into three parts is listed in Table 3, for comparison with the processing time before the division (the SIFT column of Table 2). The object recognition time was measured from the moment the image from Kinect™ was loaded from shared memory until the whole comparison between the Kinect™ data and the database was complete. The experiment results show that the object recognition process became faster than before, with an average processing time of 430.2 ms. The object recognition time for two to four objects at a time can be seen in Table 4. Table 4 indicates that the average time required to detect two objects is 618.4 ms, three objects 682.4 ms, and four objects 756.2 ms.

Figure 15. Communication experiment using shared memory mechanism
Figure 16. Data transmission experiments via LAN; (a) computer vision; (b) ballistic computer and motor control

Figure 16 shows the flow of data sent from computer vision to the ballistic computer and servo motor control system. It can be seen that the data from the computer vision program was received correctly by the ballistic computer, namely Panser: -0.396863; -0.0530291 = 0.985. Experiments on the performance of the RCWS in moving the weapon model towards a selected object can be seen in Figure 17. The results show that the weapon model can move towards a selected object, as proven by the laser pointer direction, which leads to the object.

Figure 17. RCWS experiments in the case of moving weapon model

Table 3. Experiments to see object recognition process time after program was divided into three parts

No  Object  Detected (using SIFT method)  Processing time (ms)
1   Panser  √                             418
2   Panser  √                             407
3   Panser  √                             430
4   Panser  √                             405
5   Panser  √                             409
6   Panser  -                             -
7   Panser  -                             -
8   Panser  √                             387
9   Panser  √                             409
10  Panser  √                             468
11  Tank    √                             453
12  Tank    -                             -
13  Tank    -                             -
14  Tank    √                             461
15  Tank    √                             481
16  Tank    √                             461
17  Tank    -                             -
18  Tank    -                             -
19  Tank    -                             -
20  Tank    √                             404

Table 4. Experiments to see object recognition process time to detect two to four objects at once

No  Object number  Processing time (ms)
1   2              637
2   2              619
3   2              634
4   2              605
5   2              597
6   3              677
7   3              688
8   3              683
9   3              701
10  3              663
11  4              748
12  4              755
13  4              759
14  4              765
15  4              754

V. CONCLUSIONS
Object recognition using the SIFT and SURF methods has been successfully implemented on an RCWS. The object recognition algorithm can detect multiple objects simultaneously. The experiment results show that the SIFT method gives better performance than the SURF method. Processing time can be minimized by optimizing the algorithm and by dividing the program into three parts, each of which runs independently and communicates with the others using shared memory. The average processing time is 430.2 ms to detect one object, 618.4 ms for two objects, 682.4 ms for three objects, and 756.2 ms for four objects. The data of the identified object can be processed by the ballistic computer in real time, and the weapon model can move towards the desired object. Further research may increase the amount of training data for each object to obtain more robust object recognition.

ACKNOWLEDGEMENT
The authors would like to thank the Research Center for Electrical Power and Mechatronics - Indonesian Institute of Sciences, the School of Electrical Engineering and Informatics - Bandung Institute of Technology, and the Ministry of Research and Technology for the opportunity and financial support, and also Iwan Muhammad Erwin, Adi Yasri Bahri, Riyo Wardoyo, and all those who helped conduct this research.

REFERENCES
[1] M. Mirdanies, et al., Kajian Kebijakan Alutsista Pertahanan dan Keamanan Republik Indonesia. Jakarta: LIPI Press, 2013.
[2] N. V. Bagwe, "A study of the Microsoft’s Kinect camera," Indian Institute of Technology, Bombay, 2012.
[3] A. Yilmaz, et al., "Object Tracking: A Survey," ACM Computing Surveys, vol. 38, no. 4, 2006.
[4] A. Djajadi, et al., "A Model Vision of Sorting System Application using Robotic Manipulator," Telkomnika, vol. 8, no. 2, pp. 137-148, 2010.
[5] D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," in International Conference on Computer Vision, Corfu, 1999.
[6] G. Bradski and A. Kaehler, Learning OpenCV, M. Loukides, Ed. Sebastopol: O’Reilly Media, Inc., 2008.
[7] H. Bay, et al., "SURF: Speeded Up Robust Features," Computer Vision - ECCV, vol. 3951, pp. 404-417, 2006.
[8] I. Corp., "The OpenCV Reference Manual (Release 2.4.2)," 2012.
[9] I. Corp., "The OpenCV Tutorials (Release 2.4.2)," 2012.
[10] I. Corp., "The Open CV User Guide," 2012.
[11] Microsoft. Using Winsock (Windows) [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/ms740632%28v=vs.85%29.aspx
[12] Microsoft. Creating Named Shared Memory [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551%28v=vs.85%29.aspx
[13] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[14] R. Laganiere, OpenCV 2 Computer Vision Application Programming Cookbook. Birmingham: Packt Publishing, 2011.