Automatic Anthropometric System Development Using Machine Learning

Long The Nguyen
Irkutsk National Technical University, Lermontov 83, 664074, Irkutsk, Russia
thelongit88@gmail.com

Huong Thu Nguyen
Irkutsk National Technical University, Lermontov 83, 664074, Irkutsk, Russia
thuhuongyb@gmail.com

Abstract
A contactless automatic anthropometric system is proposed for the reconstruction of a 3D-model of the human body using a conventional smartphone. Our approach involves three main steps. The first step is the extraction of 12 anthropometric features. Then we determine the most important features. Finally, we employ these features to build the 3D-model of the human body and classify the bodies according to gender and the commonly used sizes.

Keywords: Random Forest, feature extraction, anthropometric features, data classification, 3D-model, anthropometry, image processing, artificial intelligence.

1. Introduction
The development of an automatic anthropometric system is a challenging problem with various potential applications in medical monitoring, fitness and the clothing industry. In this paper, we take up this challenge using state-of-the-art methods of artificial intelligence and image processing involving feature analysis. Feature selection and dimension reduction are two methods commonly used to reduce the feature space. They are important components of classification in various fields, where one of the challenges is a very large number of features. Feature analysis and classification are challenging research topics in computer science. In this article, we present a new approach to anthropometric feature extraction and classification. We select the most valuable features to model the human body.

2.
Related work
State-of-the-art image processing algorithms and automatic extraction of human body features are widely used in many fields, such as non-contact measurement of body size (Lin, 2008) and the construction of a 3D-model of the human body (Wang, 2010; Lin, 2012; Han, 2015). At the moment, most classification algorithms can handle only a limited amount of data. Quinlan (1985) and Shepherd (1983) proposed an approach in which the Decision Tree algorithm classifies objects with a hierarchical tree structure on the basis of a series of rules. The decision tree classification method is very efficient and easy to understand. However, engineers must take care when building classification models with decision trees, because the efficiency of classification based on a decision tree (a series of rules) largely depends on the training data.

Support Vector Machine (SVM) is also widely used in the field of identification and classification; readers may refer to Cortes (1985) and Wang (2005). Aixin Sun (2002) and Linli Xu (2005) published SVM methods that perform well on text classification tasks as well as in many other applications. SVM is a binary classifier: it operates only on data with at most two classes. To classify data into more than two classes, SVM must therefore be applied several times, which increases the running time.

Research shows that there are many ways to improve classification algorithms, such as hybrid algorithms, kernel-based methods, and feature extraction, which is one of the main ways to boost performance. Integral transforms play an important role in image and signal processing; for more details, see (Sidorov, 2014).

BRAIN.
Broad Research in Artificial Intelligence and Neuroscience Volume 7, Issue 3, August 2016, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print)

3. Anthropometric System
Our purpose is to develop an automatic measurement and modeling system based on 2D images (front and side images). The system uses image processing methods and machine learning algorithms. It has three main parts: human body feature extraction, the training and testing processes, and the classification of new data.
The novelty of our approach:
- Classification of anthropometric features based on machine learning algorithms.
- Development of a non-contact anthropometric program for smartphones running the Android operating system.
- Construction of a 3D-model of the human body based on the results of anthropometric feature extraction.
Our system can also be integrated into different environments, such as online shopping websites, where it helps users find their clothes sizes, and medical applications. The flowchart of our anthropometric system is shown in figure 1.

Figure 1. Flowchart of anthropometric system

We propose an efficient, simple and robust human body feature extraction based on the front and side images of a human body.
Description of anthropometric data - men/women: the system is tested on an experimental dataset that describes the anthropometric features of men and includes 12 sizes of the human body, which are presented in figure 2.

L. The Nguyen, H. Thu Nguyen - Automatic Anthropometric System Development using Machine Learning

Figure 2. Human body sizes for men/women.

Two main methods are used in the system: the Graph cuts method and the Iterative Closest Point (ICP) algorithm. In addition, we use the following image processing techniques: the Canny edge detection operator (Liyuan Li, 2006) and morphology are used to find the body silhouette, and histogram equalization is used to adjust image intensities and enhance contrast.
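As an illustration of the contrast enhancement step, the following minimal sketch applies histogram equalization to a tiny grayscale image stored as nested lists. It is not the code of the Android program described in this paper: the function name `equalize_histogram`, the 8-bit intensity range and the sample image are assumptions made for the example.

```python
def equalize_histogram(image, levels=256):
    """Spread the intensities of a grayscale image over the full range."""
    # Count how often each intensity occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # A constant image has no contrast to stretch.
        return [row[:] for row in image]

    # Map each intensity through the normalized cumulative distribution.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[value] for value in row] for row in image]


# Hypothetical low-contrast 2 x 2 image with intensities between 100 and 102.
image = [[100, 100], [101, 102]]
equalized = equalize_histogram(image)
```

In this example the darkest input intensity is mapped to 0 and the brightest to 255, so the narrow input range is stretched over the whole 8-bit scale while the ordering of the intensities is preserved.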
We propose to use supervised Graph cuts image segmentation to improve the quality of the segmentation of human body parts. The Graph cuts method finds the optimal solution to a binary labeling problem. However, when each pixel can be assigned one of many labels, finding the solution can be computationally expensive. For the following type of energy, a series of graph cuts can be used to find a convenient local minimum:

$$E(f) = \sum_{p \in P} D_p(f_p) + \sum_{(p,q) \in N} V_{p,q}(f_p, f_q), \tag{1}$$

where $N \subset P \times P$ is a set of pairs of neighboring pixels, $D_p(f_p)$ is a function derived from the observed data that measures the cost of assigning the label $f_p$ to the pixel $p$, and $V_{p,q}(f_p, f_q)$ measures the cost of assigning the labels $f_p$, $f_q$ to the adjacent pixels $p$, $q$ and is used to impose spatial smoothness. Energy functions of the form (1) can be justified on Bayesian grounds using the well-known Markov Random Fields (MRF) formulation (S. Geman, 1984), (S. Li, 1995).

Figure 3 describes the steps of the Graph cuts algorithm for the segmentation of human body parts. The result consists of 5 main sections that include the hands, the legs, the center of the body (chest, waist, hips), and the head. The displayed image is taken from the human image database that we collected (Нгуен, 2016).

Figure 3. (3.a) The flowchart of the Graph cuts method; (3.b) the result of Graph cuts image segmentation.

The proposed method was tested on ten human subjects, and the defined feature points were correctly extracted by Iterative Closest Point (ICP) for 2D curves. Among these feature points, there are 15 points with geometrical properties that clearly indicate the concavity and convexity of the curves corresponding to the definitions of the landmarks related to garment measurements.
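To make the energy of equation (1) concrete, the sketch below evaluates it for a labeling of three pixels in a row and finds the minimizer by exhaustive search. This is only an illustration of what is being minimized, not a graph cuts implementation: the Potts smoothness term (a constant penalty when neighboring labels differ), the cost values and all function names are assumptions made for the example.

```python
from itertools import product


def labeling_energy(labels, data_cost, neighbors, smooth_weight):
    """E(f) = sum_p D_p(f_p) + sum_{(p,q) in N} V_pq(f_p, f_q)."""
    # Data term: cost of giving pixel p the label labels[p].
    data_term = sum(data_cost[p][label] for p, label in enumerate(labels))
    # Smoothness term (Potts model, assumed here): pay smooth_weight
    # whenever two neighboring pixels carry different labels.
    smooth_term = sum(smooth_weight for p, q in neighbors
                      if labels[p] != labels[q])
    return data_term + smooth_term


def brute_force_minimum(data_cost, neighbors, smooth_weight, n_labels):
    """Try every labeling; feasible only for a handful of pixels."""
    candidates = product(range(n_labels), repeat=len(data_cost))
    return min(candidates,
               key=lambda f: labeling_energy(f, data_cost, neighbors,
                                             smooth_weight))


# Hypothetical costs: data_cost[p][label] for 3 pixels and 2 labels.
data_cost = [[0, 5], [4, 1], [5, 0]]
neighbors = [(0, 1), (1, 2)]  # pixels 0-1 and 1-2 are adjacent
best = brute_force_minimum(data_cost, neighbors, smooth_weight=2, n_labels=2)
```

Exhaustive search grows exponentially with the number of pixels, which is why graph cuts are used on real images.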
The key concept of the standard ICP algorithm can be summarized in two steps:
- Compute correspondences between the two scans.
- Compute a transformation which minimizes the distance between corresponding points.
A maximum matching threshold dmax has to be added. In most implementations of ICP, the choice of dmax represents a tradeoff between convergence and accuracy: a low value results in bad convergence, while a large value causes incorrect correspondences to pull the final alignment away from the correct value. Figure 4 describes the steps of the algorithm which determines the feature points closest to the object boundary. The result of the algorithm is illustrated by screenshots from the program (Нгуен, 2016).

Figure 4. Flowchart and results of ICP algorithm

Random Forest (Breiman, 2001; Breiman, 2002) is a classification method developed by Leo Breiman at the University of California, Berkeley. Random Forest combines “bagging” (short for “bootstrap aggregating”) with Ho's “random subspace method” to construct a collection of decision trees with controlled variations. The basic idea of the Random Forest algorithm is outlined below.
- At each tree split, a random sample of m features is drawn, and only those m features are considered for splitting. Typically $m = \sqrt{p}$ or $m = \log_2 p$, where p is the number of features.
- For each tree grown on a bootstrap sample, the error rate for observations left out of the bootstrap sample is monitored. This is called the “out-of-bag” error rate.
Random Forest is a powerful classification method for the following reasons. First, errors are reduced because the forest aggregates the results of many trained learners. Second, the random choice at every stage of Random Forest reduces the correlation between the learners whose results are aggregated.
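The two ingredients outlined above, the number of features tried at each split and the out-of-bag observations of a bootstrap sample, can be sketched as follows. The sketch assumes p = 12 features and 45 training records as in our experiments; rounding m down to an integer, the random seed and the function names are assumptions of the example, and no trees are actually grown.

```python
import math
import random


def features_per_split(p):
    """Common choices of m, rounded down (the rounding is an assumption)."""
    return {
        "sqrt": max(1, math.isqrt(p)),
        "log2": max(1, int(math.log2(p))),
    }


def bootstrap_sample(n_records, rng):
    """Draw n_records indices with replacement and list the left-out ones."""
    in_bag = [rng.randrange(n_records) for _ in range(n_records)]
    out_of_bag = sorted(set(range(n_records)) - set(in_bag))
    return in_bag, out_of_bag


m_choices = features_per_split(12)
rng = random.Random(0)  # fixed seed so the example is reproducible
in_bag, out_of_bag = bootstrap_sample(45, rng)
```

The records listed in `out_of_bag` were not seen by the tree grown on `in_bag`, so they can serve as test data for that tree when the out-of-bag error is estimated.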
We also found that the total error of the forest depends on the errors of the individual trees as well as on the correlation between the trees.
The article uses the wrapper model (Christopher Tong, 2000) with an objective function for the evaluation; the Random Forest algorithm is shown in figure 5.

Figure 5. Flowchart of data classification

We propose to use the method of (Dong Thi Ngoc Lan, 2012) to evaluate and find good sets of features from the original set of features as follows:
- Step 1: Create m subsets of features from the n original features. Each subset has 2(n/m) features: n/m equal features and n/m random features.
- Step 2: Use Random Forest to calculate an estimate for each subset of features, obtaining a set of values $f_j$ ($j = 1, \dots, m$).
- Step 3: The weight of each feature i is calculated by the formula
$$w_i = \sum_{j=1}^{m} k_{ij} f_j,$$
where $k_{ij} = 0$ if feature i is not selected in subset j and $k_{ij} = 1$ if feature i is selected in subset j.
- Step 4: Build a new set that includes the p best features.
- Step 5: Return to step 1 until one of these two conditions is met: the number of features is smaller than the permitted threshold, or the specified number of loops has been reached.
Algorithm 1: Proposed algorithm to select “Important features”

In this paper, we focus on presenting the classification results for anthropometric features (men/women) based on the Random Forest algorithm. The optimal features are selected from the 12 original features using the proposed algorithm, which is an improvement of the Random Forest algorithm. The dataset includes 50 records with 12 features each; a two-dimensional 45 x 12 table serves as training data.
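Steps 3 and 4 of Algorithm 1 can be illustrated with the short sketch below. It assumes that the weight of feature i is the sum of the Random Forest scores of the subsets that contain it; the subset contents, the scores and the function names are hypothetical, and the scores would in practice come from Step 2.

```python
def feature_weights(subsets, scores, n_features):
    """Weight of feature i = sum of the scores of the subsets containing i."""
    weights = [0.0] * n_features
    for subset, score in zip(subsets, scores):
        for i in set(subset):  # k_ij = 1 for features in subset j
            weights[i] += score
    return weights


def top_features(weights, p):
    """Indices of the p features with the largest weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i],
                    reverse=True)
    return ranked[:p]


# Hypothetical example: 4 features, 3 subsets with made-up scores.
subsets = [[0, 1], [1, 2], [2, 3]]
scores = [0.5, 0.25, 0.125]
weights = feature_weights(subsets, scores, n_features=4)
best = top_features(weights, p=2)
```

Feature 1 appears in the two best-scoring subsets and therefore receives the largest weight in this example.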
Records in the dataset are divided into classes designated by XS (extra small), S (small), M (medium), L (large) and XL (extra large), based on standards (Beretta Clothing Chart), (Nguyen The Long, 2015); a 5 x 12 dataset serves as verification data. We set up the same parameters for both process 1 and process 2: m=3 subsets of features from the n=12 original features.
Process 1: The basic Random Forest algorithm is run on the anthropometric dataset for men/women 5 times. Each run performs cross-validation with the number of trees set to 100, 200, 300, 400 and 500, respectively. The results are shown in table 1.

Table 1. Average run time, average value, standard deviation of the basic Random Forest algorithm - number of trees: 100, 200, 300, 400, 500

Number of trees   Average run time   Average value   Standard deviation   Minimum value   Maximum value
100               0.2589             0.0348          0.0250               0.0139          0.0606
200               0.3851             0.0276          0.0200               0.0152          0.0352
300               0.9660             0.0217          0.0125               0.0121          0.0336
400               1.6625             0.0183          0.0150               0.0076          0.0270
500               3.2027             0.0166          0.0115               0.0102          0.0254

Process 2: We employ Algorithm 1 above to select the optimal features from the 12 original features of the human database. The 12 original features are divided into m subsets by a sampling function. Each subset includes n/m random features, where n is the number of features and m is the split parameter. We then obtain a new file named “Important Features” which includes 4 features; these are the optimal features among the 12. We repeat process 1 with “Important Features” to classify with the RF algorithm. Table 2 contains the results of the RF classification with “Important Features”.

Table 2. Average run time, average value, standard deviation of the proposed algorithm to select “Important features” - number of trees: 100, 200, 300, 400, 500
Number of trees   Average run time   Average value   Standard deviation   Minimum value   Maximum value
100               0.04236            0.0116          0.00833              0.00463         0.0202
200               0.18836            0.0092          0.00667              0.00516         0.0117
300               0.5026             0.02178         0.00416              0.0040          0.0221
400               0.6564             0.00726         0.0050               0.0071          0.009
500               1.4270             0.00553         0.00383              0.00513         0.0085

4. Application and Results of the Classification for the Reconstruction of 3D-models
In the training process we defined that labels 0, 1, 2, 3, 4, 5 are matched with the model's sizes, i.e. 0: XS - extra small, 1: S - small, 3: M - medium, 4: L - large, 5: XL - extra large. Thus, when the testing process returns the label of each record, we compare the labels to find the model that best fits the object in the image. 3D-models were built with the support of the libraries Min3D (Min3D library) and MakeHuman (Make human library), and the theory is based on the method of analysis of integrated dynamic models. The program was written in the Java programming language. The database has 100 models for the body sizes XS, S, M, L, XL. Each body size has 20 models that are based on the various parameters of each body. Our goal is to use the 4 optimal features (basic anthropometric features: height, chest, waist, hip), the "Important Features" selected by the proposed algorithm. We created a formula to find the best-fitting model in the library:

$$\text{Model} = \min_{1 \le i \le N} \left[ (B - B_i)^2 + (C - C_i)^2 + (W - W_i)^2 + (H - H_i)^2 \right],$$

where B is the chest, W the waist circumference, C the hip circumference and H the height; the values with index i belong to the i-th of the N models in the library.

Figure 6. The result of building a 3D model based on RF and SVM classification with “Important features”.

From the chart in figure 6, we found that "Important Features" gave the best 3D model, which fits the object in the image. The model agrees with the true size to about 90%.
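The search for the best-fitting library model can be sketched as a nearest-neighbor lookup over the four "Important Features". The sketch assumes that the formula selects the model with the smallest sum of squared differences; the measurement values (in centimeters), the three-model library and the function names are hypothetical and serve only as an illustration.

```python
FEATURES = ("B", "W", "C", "H")  # chest, waist, hip, height


def squared_distance(subject, model):
    """Sum of squared differences over the four measurements."""
    return sum((subject[k] - model[k]) ** 2 for k in FEATURES)


def best_model_index(subject, library):
    """Index of the library model closest to the measured subject."""
    return min(range(len(library)),
               key=lambda i: squared_distance(subject, library[i]))


# Hypothetical measurements in centimeters.
subject = {"B": 90, "W": 75, "C": 95, "H": 170}
library = [
    {"B": 88, "W": 74, "C": 96, "H": 168},
    {"B": 90, "W": 76, "C": 95, "H": 171},
    {"B": 100, "W": 85, "C": 104, "H": 180},
]
chosen = best_model_index(subject, library)
```

Restricting the search to the 20 models of the size class returned by the classifier would reduce the number of comparisons; the sketch scans the whole library for simplicity.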
Applying the RF classification algorithm increases the accuracy of the results and reduces the computing time of the program. There are many methods for data classification. One of them is the support vector machine (SVM). The SVM method was introduced by Vladimir N. Vapnik (1995). Support Vector Machines (SVM) are a set of supervised learning algorithms with two main tasks: classification and regression analysis. In this article we apply SVM to the classification of human body size with 5 classes in order to compare the performance of SVM and the Random Forest algorithm. This type of SVM training involves minimizing the error function

$$\frac{1}{2} w^T w + c \sum_{i=1}^{N} \xi_i \tag{2}$$

subject to the constraints

$$y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N, \tag{3}$$

where c is the capacity constant, w is the vector of coefficients, b is a constant, and $\xi_i$ are parameters for handling non-separable data (inputs). The index i labels the N training cases.
The comparison of the results (average time and error of the algorithm) obtained using SVM and Random Forest is shown in table 3. The error of the SVM algorithm is calculated based on equation (2), and the error of the Random Forest algorithm is calculated as the OOB (out-of-bag) error.

Table 3. Performance of SVM and Random Forest for data classification with an optimal set of data

                     SVM     RF-100   RF-200   RF-300   RF-400   RF-500
Average time         0.600   0.2589   0.3851   0.9660   1.6625   3.2027
Error of algorithm   0.22    0.025    0.0200   0.0125   0.0150   0.0115

In both cases, using the SVM classifier and Random Forest with 100, 200, 300, 400 and 500 trees on the datasets before and after optimization, we observe the following: with 300 or more trees, the running time of Random Forest exceeds that of SVM, because more trees are generated and more cases are considered. Although labeling takes longer as the number of trees increases, Random Forest provides higher accuracy than SVM.
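For a fixed pair (w, b), the smallest slack values that satisfy constraints (3) are $\xi_i = \max(0, 1 - y_i(w^T \phi(x_i) + b))$, so the error function (2) can be evaluated directly. The sketch below does this for a two-class toy problem; it assumes a linear kernel, i.e. $\phi(x) = x$, and the weights, samples and function name are hypothetical. It evaluates the objective only and does not train an SVM.

```python
def svm_objective(w, b, c, samples):
    """Value of (1/2) w.w + c * sum(xi) with the smallest feasible slacks."""
    slacks = []
    for x, y in samples:
        # Decision value w.x + b, assuming phi(x) = x (linear kernel).
        decision = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Smallest xi_i with y_i * decision >= 1 - xi_i and xi_i >= 0.
        slacks.append(max(0.0, 1.0 - y * decision))
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + c * sum(slacks), slacks


# Hypothetical 2D samples with labels +1 / -1.
samples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([-1.0, 0.0], -1)]
value, slacks = svm_objective(w=[1.0, 0.0], b=0.0, c=2.0, samples=samples)
```

Only the second sample lies inside the margin and receives a nonzero slack. A larger capacity constant c penalizes such margin violations more strongly.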
Based on anthropometric features and machine learning algorithms, we have built an Android app for smartphones. The app automatically extracts the 12 anthropometric features. The user must stand in front of the smartphone camera and take 2 pictures, then input their height (in centimeters) for calibration. The application automatically extracts the human body parameters needed to reconstruct an adequate 3D model. The results of the Android application are demonstrated in figure 7 and figure 8.

Figure 7. 3D model of a woman's body.
Figure 8. 3D model of a man's body.

5. Conclusion
In this article, we constructed a tool for the reconstruction of an accurate 3D-model of the human body based on non-contact measurements using a conventional smartphone camera. In order to improve the efficiency of optimal feature selection and classification, we suggested using the Random Forest algorithm. The article details the steps of the proposed algorithm and reports experiments that support our approach. We ran experiments using two datasets, which are based on international standards of men's and women's body sizes. We evaluated and compared the results obtained from the original Random Forest program and from the proposed method. The experimental data show that the proposed method allows the Random Forest algorithm to work faster and more stably and to produce more accurate results.

6. Acknowledgements
The authors are thankful to Dr. Denis Sidorov for his kind interest in this work and supervision. We give thanks to Aleksei Zhukov for valuable discussions of Random Forest.

References
Quinlan, J.R. (1985). Decision trees and multi-valued attributes.
In J.E. Hayes and D. Michie (Eds.), Machine intelligence 11. Oxford University Press (in press).
Shepherd, B.A. (1983). An appraisal of a decision-tree approach to image classification. Proceedings of the Eighth International Joint Conference on Artificial Intelligence. Karlsruhe, West Germany: Morgan Kaufmann.
Cortes, C. & Vapnik, V. (1985). Support-Vector Networks. Machine Learning, Vol. 20, No. 3, 273-297.
Lipo, W. (2005). Support Vector Machines: theory and applications. Volume 177. Springer Science and Business Media.
Sun, A., Lim, E.-P., & Ng, W.-K. (2002). Web classification using support vector machine. Proceedings of the 4th International Workshop on Web Information and Data Management, McLean, Virginia, USA, 2002 (ACM Press).
Xu, L. & Schuurmans, D. (2005). Unsupervised and Semi-Supervised MultiClass Support Vector Machines. AAAI, 904-910.
Lin, Y.-L. & Wang, M.-J.J. (2008). Automatic Feature Extraction from Front and Side Images. In: International conference on Industrial Engineering and Engineering Management, IEEE, Singapore, 1949-1953.
Lin, Y.-L. & Wang, M.-J.J. (2010). Constructing 3D Human Model from 2D Images. In: International conference on Industrial Engineering and Engineering Management, IEEE, Xiamen, 1902-1906.
Lin, Y.-L. & Wang, M.-J.J. (2012). Constructing 3D Human Model from Front and Side Images. Expert Systems with Applications, Vol. 39, No. 5, 5012-5018.
Han, E. (2015). 3D Body-Scanning to Help Online Shoppers and the Perfect Clothes fit. The Sydney Morning Herald. National Newspaper (Australia).
Molina, L.C., Belanche, L., & Nebot, A. (2002). Feature Selection Algorithms: A Survey and Experimental Evaluation. ICDM 2002, 306-313.
Breiman, L. (2002). Manual On Setting Up, Using, And Understanding Random Forests V3.1. Retrieved from http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf
Breiman, L. (2001). Random Forests. Machine Learning, Vol. 45.
Beretta Clothing Chart.
Retrieved from http://www.ableammo.com/catalog/Beretta_Clothing_Chart.php
Svetnik, V., Liaw, A., & Tong, C. (2000). Variable Selection in Random Forest with Application to Quantitative Structure Activity Relationship. Biometrics Research, Merck Co., Inc. P.O. Box 2000 RY33-300, Rahway, NJ 07065, USA.
Geman, S. & Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 6, 721-734.
Li, S. (1995). Markov Random Field Modeling in Computer Vision. Springer-Verlag.
Min3D library. Retrieved from https://code.google.com/p/min3d
Make human library. Retrieved from http://www.makehuman.org
Nguyen The Long, Nguyen Thu Huong, & Zhukov, A. (2015). Studies of Anthropometrical Features using Machine Learning Approach. Supplementary Proceedings of the 4th International Conference on Analysis of Images, Social Networks and Texts (AIST). CEUR Workshop Proceedings, 96-105.
Dong Thi Ngoc Lan (2012). Nghien cuu, xay dung phuong phap trich chon thuoc tinh nham lam tang hieu qua phan lop doi voi du lieu da chieu [Research and development of a feature selection method to improve classification efficiency for multidimensional data], luan van thac sy CNTT [Master's thesis in information technology], Dai hoc Cong nghe - Dai hoc Quoc gia Ha Noi, 2012, 36-43.
Support Vector Machine (SVM). Retrieved from http://www.statistica.ru/branches-maths/metod-opornykh-vektorov-supported-vector-machine-svm/
Liyuan Li, Ruijiang Luo, Weimin Huang, & How-Lung Eng (2006). Context-Controlled Adaptive Background Subtraction. Proceedings of the Ninth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), New York, USA, 31-38.
Сидоров Д.Н., Нгуен Т.Л., Нгуен Т.Х. (2016). Программа бесконтакной антропометрии для смартфонов на операционной системе Андроид, № 2015661864 [Sidorov D.N., Nguyen T.L., Nguyen T.H. Contactless anthropometry program for smartphones running the Android operating system, No. 2015661864].
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, 679-698.
Sidorov, D. (2014).
Integral Dynamical Models: Singularities, Signals and Control, World Scientific Series on Nonlinear Science Series A, volume 87, World Scientific.