Mathematical Problems of Computer Science 44, 42--50, 2015.

3D Scanner from Two Web Cameras

Aram V. Gevorgyan1 and Hakob G. Sarukhanyan2

1Russian-Armenian University, 2Institute for Informatics and Automation Problems of NAS RA

e-mail: aramgv@gmail.com, hakop@ipia.sci.am

Abstract

In this paper we present an approach to building a 3D scanner from two ordinary web cameras. The approach is based on stereovision, and its advantage over other 3D scanners is that it needs no laser, structured light or other expensive sensors. We place the object in front of the cameras at a distance of about 40-50 cm and rotate it in small angular steps. For every rotation we compute the disparity map from the two camera images and then obtain the depth map in the form of a point cloud. We then filter the object out of the scene and merge the object point clouds from the different views into a full 3D model.

Keywords: Stereovision, Depth map, Registration, ICP, 3D model.

1. Introduction

The acquisition of 3D models of real objects is a highly relevant problem with many applications in cinema, video games, educational software and elsewhere. A device that can automatically acquire 3D information is called a 3D scanner. There are different types of 3D scanners: contact scanners, which require physical touch to obtain 3D information, and non-contact scanners, which usually use a laser or structured lighting. Most of these devices are expensive. In this article we present a system for obtaining a 3D model from two web cameras using stereovision. The advantage of this approach over the others is that two ordinary web cameras suffice, with no additional equipment.

Stereovision is a technique that recovers the 3D information (X, Y, Z coordinates) of a scene using two or more cameras. It usually consists of the following steps: camera calibration, rectification, stereo correspondence (obtaining the disparity map) and triangulation (obtaining the depth map from the disparity map). These steps are discussed in detail in Section 2. An image together with its depth (Z coordinate) is usually called 2.5D. From the two images of the left and right cameras we obtain a point cloud of the object from a single viewpoint, but this is not enough for a 3D model of the object. To build the 3D model we need to capture the object from different viewpoints and then merge the resulting point clouds into one common model; this process is known as registration and is also covered in Section 2.

2. Related Work

2.1 Calibration

Camera calibration is the process of relating the ideal model of the camera to the actual physical device and of determining the position and orientation of the camera with respect to a world reference system. The internal camera parameters include the focal length, image format and principal point. The external parameters describe the camera's position and orientation in the world: the rotation and translation vectors. If both the internal and the external parameters are known, the camera is said to be calibrated. Calibration is needed for rectification and depth calculation. It is usually done with a chessboard pattern and is described in more detail in [1], [2], [3].
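To make this step concrete, here is a minimal sketch of how a chessboard-based stereo calibration might look with OpenCV, the library we use in Section 3. The file-name pattern, the number of image pairs, the square size and the flags value are illustrative assumptions for a modern OpenCV, not a description of our exact code.

// Minimal sketch: stereo calibration from chessboard image pairs with OpenCV.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    const cv::Size boardSize(8, 6);    // inner corners per row and column
    const float squareSize = 0.025f;   // chessboard square side in meters (assumed)

    // Reference grid of 3D corner positions on the board plane (Z = 0).
    std::vector<cv::Point3f> grid;
    for (int i = 0; i < boardSize.height; ++i)
        for (int j = 0; j < boardSize.width; ++j)
            grid.emplace_back(j * squareSize, i * squareSize, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> leftPoints, rightPoints;
    cv::Size imageSize;

    for (int k = 0; k < 15; ++k) {     // 15 stereo pairs (illustrative)
        cv::Mat left  = cv::imread(cv::format("left%02d.png", k), cv::IMREAD_GRAYSCALE);
        cv::Mat right = cv::imread(cv::format("right%02d.png", k), cv::IMREAD_GRAYSCALE);
        if (left.empty() || right.empty()) continue;
        imageSize = left.size();

        std::vector<cv::Point2f> cl, cr;
        if (cv::findChessboardCorners(left, boardSize, cl) &&
            cv::findChessboardCorners(right, boardSize, cr)) {
            objectPoints.push_back(grid);
            leftPoints.push_back(cl);
            rightPoints.push_back(cr);
        }
    }

    // Internal parameters (camera matrices M, distortions D) and external
    // parameters (rotation R, translation T of the right camera w.r.t. the left).
    cv::Mat M1, D1, M2, D2, R, T, E, F;
    double rms = cv::stereoCalibrate(objectPoints, leftPoints, rightPoints,
                                     M1, D1, M2, D2, imageSize, R, T, E, F,
                                     0); // flags = 0: estimate intrinsics as well
    std::cout << "stereo calibration RMS reprojection error: " << rms << std::endl;
    return 0;
}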
2.2 Rectification

If the two cameras are aligned to be coplanar, the two images share the same image plane, so corresponding points have the same row coordinates and the stereo correspondence problem is reduced from 2D to 1D. In the real world, however, it is impossible to place two cameras perfectly coplanar, so rectification is needed. Rectification is the process of reprojecting the image planes onto a common plane parallel to the line between the optical centers, which is called the baseline. Rectification is done using epipolar geometry [4]. Epipolar geometry is the intrinsic projective geometry used in stereovision; it is independent of the scene structure and depends only on the cameras' internal parameters and relative pose. Its main concepts are the following: the epipole is the point of intersection of the baseline with the image plane; an epipolar plane is the plane defined by a 3D point (or, equivalently, an image point) and the optical centers; an epipolar line is the intersection of an epipolar plane with the image plane.

Fig. 1. The epipolar constraint.

Epipolar constraint: the correspondent of a point must be searched for along its corresponding epipolar line in the other image.
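Continuing the sketch above, rectification with OpenCV might look as follows. The helper function is hypothetical; it assumes the calibration outputs M1, D1, M2, D2, R, T from the previous step, and the matrix Q it returns maps disparities to 3D coordinates.

// Minimal sketch: rectifying a calibrated stereo pair with OpenCV.
#include <opencv2/opencv.hpp>

void rectifyPair(const cv::Mat& M1, const cv::Mat& D1,
                 const cv::Mat& M2, const cv::Mat& D2,
                 const cv::Mat& R,  const cv::Mat& T,
                 const cv::Mat& left, const cv::Mat& right,
                 cv::Mat& leftRect, cv::Mat& rightRect, cv::Mat& Q) {
    const cv::Size imageSize = left.size();
    cv::Mat R1, R2, P1, P2;

    // Compute the rotations and projections that bring both image planes
    // onto a common plane parallel to the baseline; Q reprojects disparity to 3D.
    cv::stereoRectify(M1, D1, M2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(M1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(M2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);

    // After remapping, corresponding points lie on the same image row.
    cv::remap(left,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
    cv::remap(right, rightRect, map2x, map2y, cv::INTER_LINEAR);
}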
2.3 Stereo Correspondence

Stereo correspondence is the problem of finding matching pixels in the left and right images that correspond to the same points on the 3D surface. If the images are rectified, all epipolar lines are parallel and the corresponding points lie on the same row in both images. Suppose (x, y) are the coordinates of a point in the left image and (x', y') in the right image, where y = y' because the images are rectified. After finding the corresponding points we can compute the disparity map, where the disparity of each pixel is x - x'. The disparity measures the displacement of a point between the two images. The correspondence problem is complicated by occlusion: some pixels are visible in only one of the images.

There are plenty of stereo matching algorithms. They can usually be divided into two classes: local methods and global methods. In local methods, the disparity computed at a given point depends only on the intensity values within a finite window. In contrast, global methods compute the disparity for all reference image pixels at once by minimizing an energy function consisting of a data term and a smoothness term. Local methods are fast, but their accuracy may be unacceptable; global methods are accurate but time-consuming. The most commonly used stereo correspondence algorithms are Block Matching, Semi-Global Matching [5] and Graph Cuts. Block Matching is a local method; Graph Cuts is a global method that gives very good results but works slowly; Semi-Global Matching gives almost the same results as the global methods but works much faster. A good survey of stereo correspondence algorithms is given in [6].

2.4 Triangulation

If we know the geometric arrangement of the cameras, we can calculate the depth map from the disparity map by triangulation. Let us consider the simple case: two cameras with parallel optical axes, with the baseline perpendicular to the optical axes and parallel to the x-axis, the distance between the cameras equal to d, and both cameras having the same focal length f.

Fig. 2. A simplified stereo imaging system.

Let (x, y, z) be the coordinates of a point in the three-dimensional world, where z is the depth, and let this point have coordinates (x_l, y_l) and (x_r, y_r) in the image planes of the left and right cameras, respectively. Then from similar triangles we have

    z / f = d / (x_l - x_r),  hence  z = d f / (x_l - x_r),

where x_l - x_r is the disparity. Thus, the depth at various scene points may be recovered from the disparities of the corresponding image points. We can also compute

    x = d (x_l + x_r) / (2 (x_l - x_r)),    y = d (y_l + y_r) / (2 (x_l - x_r)).

As we can see, depth is inversely proportional to disparity: when the disparity is near 0, small disparity differences produce large depth differences, while when the disparity is large, small disparity differences barely change the depth. Consequently, depth is computed more accurately for objects near the cameras. Having the depth map, we can generate the point cloud; a point cloud is a set of data points in some coordinate system. More about computing depth can be found in [7], and about measuring object distance with web cameras in [8].
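A minimal sketch of these two steps with OpenCV follows, assuming rectified input images and the Q matrix from cv::stereoRectify above; the block-matching parameters are illustrative and must be tuned per setup.

// Minimal sketch: disparity by block matching, then depth via the Q matrix.
#include <opencv2/opencv.hpp>

// leftRect/rightRect: rectified 8-bit grayscale images; Q: from cv::stereoRectify.
cv::Mat disparityToPoints(const cv::Mat& leftRect, const cv::Mat& rightRect,
                          const cv::Mat& Q) {
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(/*numDisparities=*/96,
                                                    /*blockSize=*/15);
    cv::Mat disp16, disp;
    bm->compute(leftRect, rightRect, disp16);  // fixed-point result, scaled by 16
    disp16.convertTo(disp, CV_32F, 1.0 / 16.0);

    // Each pixel becomes an (X, Y, Z) point; Z is inversely proportional to
    // disparity, so nearby points (large disparity) get the most accurate depth.
    cv::Mat points3d;
    cv::reprojectImageTo3D(disp, points3d, Q, /*handleMissingValues=*/true);
    return points3d;                           // CV_32FC3 image of 3D points
}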
2.5 Registration

Registration is the process of merging point clouds from different views into one common model. In other words, registration transforms multiple 3D point clouds into the same coordinate system so as to align the overlapping parts of these sets. The most popular and commonly used registration algorithm is the Iterative Closest Point (ICP) algorithm. There are many variants of it, but each of them affects one of the algorithm's six stages:

1. Selecting subsets of the two point clouds
2. Matching corresponding pairs
3. Weighting the corresponding pairs appropriately
4. Rejecting certain pairs, based either on looking at each pair individually or on considering the entire set of pairs
5. Computing an error metric based on the point pairs
6. Minimizing the error metric

More about ICP and its variants can be found in [9], [10], [11].
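As an illustration of these stages, one pairwise alignment step might look as follows with the Point Cloud Library used in Section 3; the function name, the correspondence-distance threshold and the convergence settings are illustrative assumptions.

// Minimal sketch: aligning one point cloud to a global model with PCL's ICP.
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/icp.h>

using Cloud = pcl::PointCloud<pcl::PointXYZ>;

// Transforms `view` into the coordinate system of `model` and appends it.
bool mergeIntoModel(const Cloud::Ptr& view, const Cloud::Ptr& model) {
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(view);
    icp.setInputTarget(model);

    // Pair rejection and convergence settings (stages 4 and 6 above).
    icp.setMaxCorrespondenceDistance(0.01);  // meters; rejects distant pairs
    icp.setMaximumIterations(50);
    icp.setTransformationEpsilon(1e-8);

    Cloud aligned;
    icp.align(aligned);                      // iterates match/reject/minimize
    if (!icp.hasConverged()) return false;

    *model += aligned;                       // grow the global model
    return true;
}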
3. Realization

For our implementation we use OpenCV [12], [13], the largest image processing library, and the Point Cloud Library [14], a library for processing 2D/3D images and point clouds. For data capture we use two Logitech HD Pro Webcam C920 web cameras. We calibrate the cameras with an 8x6 chessboard pattern printed on A2 paper and tightly attached to glass. It is very important that the calibration pattern lie on a flat, non-stretching surface: we also tried A4 paper, but it did not give sufficient results. Illumination is also very important for calibration.

For computing the disparity map we choose Stereo Block Matching (SBM), a local method that offers a workable balance of speed and accuracy for our purposes. We place an object at about 40-50 cm from the cameras and rotate it in steps of about 1-3 degrees. For every rotation we capture the left and right camera images and compute their disparity map. From every disparity map, knowing the camera parameters, we compute by triangulation the depth map in the form of a point cloud. We then segment the object from the point cloud by its depth values, so that the point cloud contains only the object, with no background.

Fig. 3. Point clouds of the object from different positions.

Then the registration process starts. First we take the initial point cloud as the global model. At each subsequent step we take the next consecutive point cloud and merge it into the global model, transforming it to the global model's coordinate system and adding it. Merging two point clouds consists of the following steps:

1. Finding correspondences between the point clouds
2. Rejecting bad correspondences
3. Finding the transformation matrix based on the correspondences and transforming the point cloud
4. If the convergence criterion is reached, stopping; otherwise going to step 1

Fig. 4. Constructed 3D model.

4. Conclusion and Future Work

As mentioned above, our approach is based on stereovision, and its advantage over the others is that it is cheaper and more accessible. There are other 3D scanners based on stereovision [15], [16], [17], but they need a laser or structured light; the benefit of our approach is that two ordinary web cameras are enough. Our constructed 3D model is not ideal and its quality can be enhanced. For future work we plan to use a color variant of the ICP algorithm, which suits objects that lack distinctive geometric features. Due to many factors, holes appear in the disparity map, so we also plan to try hole-filling algorithms for disparity maps, which may improve the registration results.

References

[1] P. Hillman, "White paper: Camera calibration and stereo vision", Lochrin Terrace, Edinburgh EH3 9QL, Tech. Rep., 2005.
[2] Z. Zhang, "A flexible new technique for camera calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1330-1334, 2000.
[3] Z. Zhang, "Camera calibration with one-dimensional objects", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 892-899, 2004.
[4] Epipolar geometry tutorial. [Online]. Available: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT10/node3.html
[5] H. Hirschmuller, "Accurate and efficient stereo processing by semi-global matching and mutual information", IEEE Computer Vision and Pattern Recognition, vol. 2, pp. 807-814, 2005.
[6] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, vol. 47, pp. 7-42, 2002.
[7] P. J. Bagga, "Real time depth computation using stereo imaging", Journal of Electrical and Electronic Engineering, vol. 1, pp. 51-54, 2013.
[8] M. A. Mahammed, A. I. Melhum and F. A. Kochery, "Object distance measurement by stereo vision", International Journal of Science and Applied Information Technology, vol. 2, pp. 05-08, March 2013.
[9] P. J. Besl and H. D. McKay, "A method for registration of 3-D shapes", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 239-256, 1992.
[10] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm", 3-D Digital Imaging and Modeling, pp. 145-152, 2001.
[11] A. Johnson and S. Kang, "Registration and integration of textured 3D data", in Proc. 3DIM '97, Ottawa, pp. 234-241, 1997.
[12] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, 1st ed., O'Reilly Media, Inc., Sebastopol, 2008.
[13] The OpenCV website. [Online]. Available: http://opencv.org/
[14] The Point Cloud Library website. [Online]. Available: http://pointclouds.org/
[15] Z. Lv and Z. Zhang, "Build 3D scanner system based on binocular stereo vision", Journal of Computers, vol. 7, no. 2, February 2012.
[16] Tzung-Han Lin, "Resolution adjustable 3D scanner based on using stereo cameras", Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific, 2013.
[17] M. Pashaei and S. M. Mousavi, "Implementation of a low cost structured light scanner", Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., vol. XL-5/W2, pp. 477-482, 2013.

Submitted 04.09.2015, accepted 20.11.2015