
 

Automatic Anthropometric System Development Using Machine Learning  
 

Long The Nguyen 

Irkutsk National Technical University, Lermontov 83, 664074, Irkutsk, Russia 
thelongit88@gmail.com  

 

Huong Thu Nguyen 

Irkutsk National Technical University, Lermontov 83, 664074, Irkutsk, Russia 
thuhuongyb@gmail.com 

 

Abstract 

A contactless automatic anthropometric system is proposed for the reconstruction of a 3D model of the human body using a conventional smartphone. Our approach involves three main steps. The first step is the extraction of 12 anthropological features. Then we determine the most important features. Finally, we employ these features to build the 3D model of the human body and classify the bodies according to gender and the commonly used sizes.

Keywords: Random Forest, feature extraction, anthropometric features, data classification, 

3D-model, anthropometry, image processing, artificial intelligence. 

 

1. Introduction 

The development of an automatic anthropometric system is a challenging problem with various potential applications in medical monitoring, fitness and the clothing industry. In this paper, we take on this challenge using state-of-the-art methods of artificial intelligence and image processing involving feature analysis. Feature selection and dimensionality reduction are two methods commonly used to reduce the feature space; they are important components of classification in various fields. One of the challenges in classification is the very large number of features. Feature analysis and classification are challenging research topics in computer science. In this article, we present a new approach to anthropometric feature extraction and classification, and select the most valuable features to model the human body.

 

2. Related work 

State-of-the-art image processing algorithms and automatic extraction of human body features are widely used in many fields, such as non-contact measurement of body size (Lin & Wang, 2008) and the construction of a 3D model of the human body (Lin & Wang, 2010; Lin & Wang, 2012; Han, 2015). At the moment, most classification algorithms can handle only a limited amount of data. In Quinlan (1985) and Shepherd (1983), the authors proposed approaches using the Decision Tree algorithm, a hierarchical tree structure used to classify objects on the basis of a series of rules. The decision tree classification method is very efficient and easy to understand.

However, engineers must be careful when applying Decision Trees to build classification models: the efficiency of classification based on a decision tree (a series of rules) largely depends on the training data. The Support Vector Machine (SVM) is also widely used in the field of identification and classification; here readers may refer to Cortes and Vapnik (1995) and Wang (2005).

In Sun, Lim, and Ng (2002) and Xu and Schuurmans (2005), the authors published SVM methods that perform well for text classification tasks, as well as many other applications. SVM is a binary classifier which operates only when the data are presented with a maximum of two classes. This means that to classify data into more than two classes, SVM must be applied several times, which increases the computation time. Research shows that there are many ways to improve classification algorithms, such as the use of hybrid algorithms, kernel-based methods, and also feature extraction, which is one of the main ways to boost performance. Integral transforms play an important role in image and signal processing; for more details, see Sidorov (2014).



BRAIN. Broad Research in Artificial Intelligence and Neuroscience 

Volume 7, Issue 3, August 2016, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 

 


3. Anthropometric System

Our purpose is to develop an automatic measurement and modeling system based on 2D images (front and side images). This system uses image processing methods and machine learning algorithms. Our system has 3 main parts: human body feature extraction, the training and testing processes, and the classification of new data. The novelty of our approach:

- Classification of anthropometric features based on machine learning algorithms.

- Development of a non-contact anthropometric program for smartphones running the Android operating system.

- Construction of a 3D model of the human body based on the results of anthropometric feature extraction.

Our system can also be integrated into different environments, such as online shopping websites to help users find their clothing sizes, and medical applications. The flowchart of our anthropometric system is described in figure 1.

 

 

 
 

Figure 1. Flowchart of anthropometric system 

 

We propose an efficient, simple and robust human body feature extraction method based on the front and side images of a human body. Description of the anthropometric data (men/women): a dataset collected experimentally is used to test the system; it describes the anthropometric features of men and includes 12 sizes of the human body, which are presented in figure 2.



 

 


 

 
 

Figure 2. Human body sizes for men/women. 

 

Two main methods are used in the system: the Graph cuts method and the Iterative Closest Point (ICP) algorithm. In addition, we use the following image processing techniques: the Canny edge detection operator (Canny, 1986) and morphology are used to find the body silhouette, and histogram equalization is used for adjusting image intensities to enhance contrast.
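The histogram equalization step can be sketched as follows; this is a minimal NumPy version for 8-bit grayscale images, given as an illustrative assumption since the text does not specify the implementation used:

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image (2D uint8 array).

    Intensities are remapped through the normalized cumulative histogram,
    spreading the output levels more evenly and enhancing contrast."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]               # first non-zero CDF value
    # Standard equalization mapping, scaled back to the 0..255 range.
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast image (intensities 100..109) is stretched to 0..255.
img = np.tile(np.arange(100, 110, dtype=np.uint8), (10, 1))
out = equalize_histogram(img)
print(out.min(), out.max())
```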

We propose to use a supervised Graph cuts image segmentation method to improve the quality of the segmentation of the human body parts. The Graph cuts method finds the optimal solution to a binary problem. However, when each pixel can be assigned one of many labels, finding the solution can be computationally expensive. For the following type of energy, a series of graph cuts can be used to find a good local minimum:

 

E(f) = Σ_{p∈P} D_p(f_p) + Σ_{(p,q)∈N} V_{p,q}(f_p, f_q),    (1)

where N ⊂ P × P is the set of neighboring pixel pairs, D_p(f_p) is a function derived from the observed data that measures the cost of assigning the label f_p to the pixel p, and V_{p,q}(f_p, f_q) measures the cost of assigning the labels f_p, f_q to the adjacent pixels p, q and is used to impose spatial smoothness. Energy functions of the form (1) can be justified on Bayesian grounds using the well-known Markov Random Field (MRF) formulation (Geman & Geman, 1984; Li, 1995).
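To make the energy (1) concrete, the following sketch evaluates it for a candidate binary labeling on a 4-connected pixel grid. The data term and the Potts smoothness term below are illustrative assumptions; the text does not specify the exact D_p and V_{p,q} used:

```python
# Evaluate E(f) = sum_p D_p(f_p) + sum_{(p,q) in N} V_pq(f_p, f_q)
# on a 4-connected pixel grid with binary labels.

def data_cost(intensity, label):
    # Hypothetical data term: bright pixels are cheap to label as
    # foreground (1), dark pixels as background (0).
    return intensity / 255.0 if label == 0 else 1.0 - intensity / 255.0

def smoothness_cost(label_p, label_q, weight=0.5):
    # Potts model: a fixed penalty when neighboring labels differ.
    return 0.0 if label_p == label_q else weight

def energy(image, labeling):
    rows, cols = len(image), len(image[0])
    e = sum(data_cost(image[r][c], labeling[r][c])
            for r in range(rows) for c in range(cols))
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:   # right neighbor
                e += smoothness_cost(labeling[r][c], labeling[r][c + 1])
            if r + 1 < rows:   # bottom neighbor
                e += smoothness_cost(labeling[r][c], labeling[r + 1][c])
    return e

# A bright left half labeled foreground and a dark right half labeled
# background has lower energy than labeling everything background.
image = [[200, 200, 10, 10],
         [200, 200, 10, 10]]
labels = [[1, 1, 0, 0],
          [1, 1, 0, 0]]
print(energy(image, labels))
```

Graph cuts searches over labelings to minimize exactly this kind of sum of data and smoothness terms.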

Figure 3 describes the steps of the Graph cuts algorithm implemented for the segmentation of human body parts. The results obtained are 5 main sections, which include the hands, the legs, the center of the body (chest, waist, hips), and the head. The displayed result image is taken from the human image database that we collected (Nguyen, 2016).

  




 
 

Figure 3. (3.a) The flowchart of the Graph cuts method; (3.b) the result of Graph cuts image segmentation.

 

The proposed method was tested on ten human subjects, and the defined feature points were correctly extracted by the Iterative Closest Point (ICP) algorithm for 2D curves. Among these feature points, there are 15 points with geometrical properties that clearly indicate the concavity and convexity of the curves, corresponding to the definitions of the landmarks related to garment measurements.

 

The key concept of the standard ICP algorithm can be summarized in two steps:

- Compute correspondences between the two scans.

- Compute a transformation which minimizes the distance between corresponding points.

It is common to add a maximum matching threshold d_max. In most implementations of ICP, the choice of d_max represents a tradeoff between convergence and accuracy: a low value results in poor convergence, while a large value causes incorrect correspondences to pull the final alignment away from the correct value. Figure 4 describes the steps of the algorithm which determines the feature points closest to the object boundary. The result of the algorithm is illustrated by images cut from the program (Nguyen, 2016).
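The two ICP steps above can be sketched for 2D point sets as follows; the nearest-neighbor matching with a d_max gate and the SVD (Kabsch) solution for the rigid transform are standard choices, and the example data are hypothetical:

```python
import numpy as np

def icp_2d(source, target, d_max=1.0, iters=10):
    """Minimal 2D ICP aligning `source` (N x 2) to `target` (M x 2).

    Each iteration (a) matches every source point to its nearest target
    point, discarding matches farther than d_max, and (b) finds the rigid
    rotation/translation minimizing the squared distances between the
    matched pairs (the SVD/Kabsch solution)."""
    src = source.copy()
    for _ in range(iters):
        # (a) nearest-neighbor correspondences with the d_max gate
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)
        keep = d2[np.arange(len(src)), nn] <= d_max ** 2
        if not keep.any():
            break
        p, q = src[keep], target[nn[keep]]
        # (b) best rigid transform between the matched pairs
        pc, qc = p - p.mean(0), q - q.mean(0)
        u, _, vt = np.linalg.svd(pc.T @ qc)
        r = vt.T @ u.T
        if np.linalg.det(r) < 0:          # guard against reflections
            vt[-1] *= -1
            r = vt.T @ u.T
        t = q.mean(0) - r @ p.mean(0)
        src = src @ r.T + t
    return src

# Hypothetical test scan: a 5 x 5 grid rotated by 0.05 rad and shifted.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
target = np.stack([gx.ravel(), gy.ravel()], axis=1)
theta = 0.05
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
source = target @ rot.T + np.array([0.1, -0.05])
aligned = icp_2d(source, target)
print(np.abs(aligned - target).max())
```

Because the perturbation is smaller than half the grid spacing, all initial correspondences are correct and the alignment is recovered to machine precision.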

 



 

 

      
 

 
Figure 4. Flowchart and results of ICP algorithm 

 

 

Random Forest (Breiman, 2001; Breiman, 2002) is a classification method developed by Leo Breiman at the University of California, Berkeley. Random Forest combines "bagging" (short for "bootstrap aggregating") with Ho's "random subspace method" to construct a collection of decision trees with controlled variation.

Let us briefly outline the basic idea of a Random Forest algorithm below. 

- At each tree split, a random sample of m features is drawn, and only those m features are considered for splitting. Typically m = √p or m = log₂ p, where p is the number of features.

- For each tree grown on a bootstrap sample, the error rate for observations left out of the bootstrap sample is monitored. This is called the "out-of-bag" error rate.
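The bagging and out-of-bag mechanics outlined above can be sketched as follows; decision stumps stand in for full trees, and the toy dataset and the m = √p choice are illustrative assumptions:

```python
import math
import random

random.seed(42)

def stump_fit(x, y, feat_ids):
    """Exhaustively fit a one-feature threshold classifier ("stump")
    using only the features in `feat_ids` (the random subspace)."""
    best = None  # (error, feature, threshold, flip)
    for f in feat_ids:
        for t in sorted({row[f] for row in x}):
            mism = sum((row[f] > t) != bool(yi) for row, yi in zip(x, y))
            for err, flip in ((mism, False), (len(y) - mism, True)):
                if best is None or err < best[0]:
                    best = (err, f, t, flip)
    return best[1:]

def stump_predict(model, row):
    f, t, flip = model
    return int((row[f] > t) != flip)      # flip swaps the two leaves

def fit_forest(x, y, n_trees=25):
    p = len(x[0])
    m = max(1, round(math.sqrt(p)))       # features tried per split
    trees, oob_votes = [], [[] for _ in x]
    for _ in range(n_trees):
        bag = [random.randrange(len(x)) for _ in x]   # bootstrap sample
        feats = random.sample(range(p), m)            # random subspace
        tree = stump_fit([x[i] for i in bag], [y[i] for i in bag], feats)
        trees.append(tree)
        for i in set(range(len(x))) - set(bag):       # out-of-bag rows
            oob_votes[i].append(stump_predict(tree, x[i]))
    # Out-of-bag error: majority vote of the trees that never saw row i.
    wrong = [int(round(sum(v) / len(v)) != y[i])
             for i, v in enumerate(oob_votes) if v]
    return trees, sum(wrong) / len(wrong)

def forest_predict(trees, row):
    votes = sum(stump_predict(t, row) for t in trees)
    return int(2 * votes >= len(trees))

# Toy separable dataset: every feature is a copy of the class label.
x = [[c] * 4 for c in [0] * 10 + [1] * 10]
y = [0] * 10 + [1] * 10
trees, oob_error = fit_forest(x, y)
print(oob_error)
```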

Random Forest is a powerful classification method for the following reasons. First, errors are minimized because the forest aggregates the results of many trained learners. Second, the random choices made at every stage reduce the correlation between the learners whose results are combined. In addition, the total error of the forest depends on the individual errors of its trees as well as on the correlation between the trees.

The article uses the wrapper model (Svetnik, Liaw, & Tong, 2000) with the Random Forest algorithm as the objective function for the evaluation; the scheme is shown in figure 5.




 
 

Figure 5. Flowchart of data classification 

 

We propose to use the method of (Dong Thi Ngoc Lan, 2012) to evaluate and find good sets of features from the original set of features, as follows:

- Step 1: Create m subsets of features from the n original features. Each subset has 2(n/m) features: n/m fixed features and n/m random features.

- Step 2: Use Random Forest to compute quality estimates of the subsets of features, obtaining a set of values f(j), j = 1, ..., m.

- Step 3: The weight of each feature i is calculated by the formula

w_i = Σ_{j=1}^{m} k_{ij} · f(j),

where k_{ij} = 0 if feature i is not selected in subset j, and k_{ij} = 1 if feature i is selected in subset j.

- Step 4: Develop a new set that includes the p best features.

- Step 5: Repeat from step 1 until one of these two conditions is met: the number of features falls below the permitted threshold, or the predefined number of loops is reached.

 

Algorithm 1: Proposed algorithm to select “Important features” 
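Algorithm 1 can be sketched as follows; the evaluation function standing in for the Random Forest estimate f(j), and the parameter values, are illustrative assumptions:

```python
import random

random.seed(7)

def select_important_features(features, evaluate, m=3, p_keep=4, max_loops=5):
    """Sketch of Algorithm 1.  `evaluate(subset)` stands in for the Random
    Forest estimate f(j) of a subset's quality (e.g. its OOB accuracy);
    its exact form here is an assumption.  Each loop builds m subsets of
    2*(n/m) features (n/m fixed + n/m random), scores them, accumulates
    the weights w_i = sum_j k_ij * f(j), and keeps the p_keep best."""
    current = list(features)
    for _ in range(max_loops):
        n = len(current)
        if n <= p_keep:                 # stop: size below the threshold
            break
        size = max(1, n // m)
        weights = {f: 0.0 for f in current}
        for j in range(m):
            fixed = current[j * size:(j + 1) * size]        # n/m fixed
            rest = [f for f in current if f not in fixed]
            rnd = random.sample(rest, min(size, len(rest))) # n/m random
            score = evaluate(fixed + rnd)                   # f(j)
            for f in fixed + rnd:                           # rows with k_ij = 1
                weights[f] += score
        current = sorted(current, key=lambda f: -weights[f])[:p_keep]
    return current

# Hypothetical scorer: pretend features 0-3 (height, chest, waist, hip)
# are the informative ones and rate a subset by their share in it.
informative = {0, 1, 2, 3}
score_subset = lambda s: len(informative & set(s)) / len(s)

best = select_important_features(range(12), score_subset, m=3, p_keep=4)
print(sorted(best))
```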

 

In this paper, we focus on presenting the result of classification for anthropometric features (men/women) based on the Random Forest algorithm. The optimal features are selected from the 12 original features using the proposed algorithm, which improves on the basic Random Forest algorithm. The dataset includes 50 records, each with 12 features: a two-dimensional 45 × 12 table is used as training data, and a 5 × 12 table as verification data. Records in the dataset are divided into classes designated XS (extra small), S (small), M (medium), L (large) and XL (extra large), based on standards (Beretta Clothing Chart; Nguyen The Long, 2015). We set the parameters for both Process 1 and Process 2: m = 3 subsets of features from the n = 12 original features.

Process 1: The basic Random Forest algorithm is run on the anthropometric dataset for men/women 5 times. Each run performs cross-verification with 100, 200, 300, 400 and 500 trees, respectively. The results are shown in table 1.

 

 

 

 



 

 

Table 1. Average run time, average value and standard deviation of the basic Random Forest algorithm (number of trees: 100, 200, 300, 400, 500)

Number of trees   Average run time   Average value   Standard deviation   Minimum value   Maximum value
100               0.2589             0.0348          0.0250               0.0139          0.0606
200               0.3851             0.0276          0.0200               0.0152          0.0352
300               0.9660             0.0217          0.0125               0.0121          0.0336
400               1.6625             0.0183          0.0150               0.0076          0.0270
500               3.2027             0.0166          0.0115               0.0102          0.0254

 

Process 2: We employ the proposed algorithm (Algorithm 1 above) to select the optimal features from the 12 original features on the human database. The 12 original features are divided into m subsets using a sampling function; each subset includes n/m random features, where n is the number of features and m is the splitting parameter. The result is a new file named "Important Features" which includes 4 features, the optimal ones among the 12. We then repeat Process 1 with the "Important Features" to classify with the RF algorithm. Table 2 contains the results of the RF classification with the "Important Features".

 

Table 2. Average run time, average value and standard deviation of the proposed algorithm (number of trees: 100, 200, 300, 400, 500) to select "Important features"

Number of trees   Average run time   Average value   Standard deviation   Minimum value   Maximum value
100               0.04236            0.0116          0.00833              0.00463         0.0202
200               0.18836            0.0092          0.00667              0.00516         0.0117
300               0.5026             0.02178         0.00416              0.0040          0.0221
400               0.6564             0.00726         0.0050               0.0071          0.009
500               1.4270             0.00553         0.00383              0.00513         0.0085

 

4. Application and Results of the Classification for the Reconstruction of 3D Models

In the training process we defined labels matched with the model sizes, i.e. 0: XS (extra small), 1: S (small), 2: M (medium), 3: L (large), 4: XL (extra large). Thus, when the testing process returns the label of each record, we compare them to find the model that best fits the object in the image. The 3D models were built with the support of the Min3D library (Min3D library) and MakeHuman (Make human library), and the theory is based on the method of analysis of integrated dynamic models. The program was written in Java. The database has 100 models corresponding to the body sizes XS, S, M, L, XL; each body size has 20 models based on the various parameters of each body. Our goal is to use the 4 optimal features (the basic anthropometric features: height, chest, waist, hip), the "Important Features" selected by the proposed algorithm. We created a formula to calculate and find the model in the library that fits best:

Model = min_{i=1,...,N} [ (B − B_i)² + (C − C_i)² + (W − W_i)² + (H − H_i)² ],

where B is the chest circumference, W the waist circumference, C the hip circumference, H the height, and i runs over the N models in the library.
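The model-matching formula above can be sketched as follows; the library entries and the subject's measurements are hypothetical values for illustration:

```python
def find_best_model(person, models):
    """Return the library model minimizing the sum of squared differences
    over the four "Important Features": chest B, waist W, hip C, height H.
    `person` and each model are (B, W, C, H) tuples in centimeters."""
    def distance2(m):
        return sum((a - b) ** 2 for a, b in zip(person, m))
    return min(models, key=distance2)

# Hypothetical library entries (B, W, C, H); the values are made up.
library = [
    (86, 70, 92, 165),    # roughly an S-size model
    (96, 80, 100, 175),   # roughly an M-size model
    (106, 92, 110, 182),  # roughly an L-size model
]
subject = (95, 79, 101, 174)
print(find_best_model(subject, library))
```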




 

               
 
Figure 6. The result of building a 3D model based on RF and SVM classification with “Important features”. 

 

From the chart in figure 6, we found that the "Important Features" gave the best 3D model, which fits the object in the image; the pattern is close to 90% of the true size. Applying the RF classification algorithm increases the accuracy of the results and reduces the computing time of the program.

There are many methods for data classification. One of them is the support vector machine (SVM), introduced by Vladimir N. Vapnik (1995). Support Vector Machines are a family of supervised learning algorithms with two main tasks: classification and regression analysis. In this article we use the SVM method on the classification problem for the size of the human body with 5 classes, to compare the performance of the SVM method and the Random Forest algorithm. This type of SVM training includes minimizing the error function:

(1/2) wᵀw + c Σ_{i=1}^{N} ξ_i    (2)

subject to the constraints: y_i (wᵀ φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., N    (3)

 

Where c is the capacity constant, w is the vector of coefficients, b is a constant, and ξi 
represents parameters for handling non-separable data (inputs). The index i labels the N training 

cases. The comparison of results (average time and error of algorithm) obtained using SVM and Random Forest is shown in table 3. The error of the SVM algorithm is calculated based on equation (2), and the error of the Random Forest algorithm is calculated by the OOB (out-of-bag) error.
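For a linear kernel φ(x) = x, the error function (2) can be evaluated directly: the smallest slack satisfying constraint (3) for a given (w, b) is ξ_i = max(0, 1 − y_i(wᵀx_i + b)). The sketch below uses hypothetical data and parameters:

```python
def svm_objective(w, b, xs, ys, c):
    """Evaluate the soft-margin objective (2) for a linear kernel.
    The slack xi_i = max(0, 1 - y_i * (w.x_i + b)) is the smallest value
    satisfying constraint (3) for the given (w, b)."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(xs, ys)]
    return 0.5 * dot(w, w) + c * sum(slacks), slacks

# Tiny linearly separable example with labels +1 / -1.
xs = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
ys = [1, 1, -1, -1]
obj, slacks = svm_objective((0.25, 0.25), 0.0, xs, ys, c=1.0)
print(obj, slacks)    # all slacks are zero: every margin constraint holds
```

SVM training searches over (w, b) to minimize this objective; for 5 size classes, several binary SVMs must be combined, as noted above.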

 

Table 3. Performance of SVM and Random Forest for data classification with an optimal set of data

 SVM RF-100 RF-200 RF-300 RF-400 RF-500 

Average time 0.600 0.2589 0.3851 0.9660 1.6625 3.2027 

Error of algorithm 0.22 0.025 0.0200 0.0125 0.0150 0.0115 

 

Comparing the SVM classifier and Random Forest with 100, 200, 300, 400 and 500 trees on the datasets before and after optimization, we observe the following: the running time of Random Forest is greater than that of SVM, because more trees are generated and more cases are considered. In particular, as the number of trees increases, labeling takes longer, but Random Forest provides higher accuracy than SVM. Based on anthropometric features and machine learning algorithms, we have built an Android app for smartphones. This app can automatically extract



 

 

anthropometric features (12 features). The user stands in front of the smartphone camera, takes two pictures, and then inputs their height (in centimeters) for calibration. The application automatically extracts the human parameters to enable adequate 3D model reconstruction. The results of the Android application are demonstrated in figure 7 and figure 8.

 

Figure 7. 3D model of a woman's body.

 

 

Figure 8. 3D model of a man's body.




5. Conclusion

In this article, we constructed a tool for the reconstruction of an accurate 3D model of the human body based on non-contact measurements using a conventional smartphone camera. In order to improve the efficiency of optimal feature selection and classification, we suggested using the Random Forest algorithm. The article details the steps of the proposed algorithm, and we performed experiments to demonstrate the correctness of our approach. We ran experiments using two datasets, based on international standards of men's and women's body sizes. The results obtained from the original Random Forest program and from the proposed method were evaluated, analyzed and compared. The experimental data show that the proposed method makes the Random Forest algorithm work faster and more stably, with more accurate results.

 

6. Acknowledgements

The authors are thankful to Dr. Denis Sidorov for his kind interest in this work and his supervision. We thank Aleksei Zhukov for valuable discussions of Random Forest.

 

References  

Quinlan, J.R. (1985). Decision trees and multi-valued attributes. In J.E. Hayes and D. Michie (Eds.),  

Machine intelligence 11. Oxford University Press (in press). 

Shepherd, B.A. (1983). An appraisal of a decision-tree approach to image classification. 

Proceedings of the Eighth International Joint Conference on Artificial Intelligence. Karlsruhe, 

West Germany: Morgan Kaufmann. 

Cortes, C. & Vapnik, V. (1995). Support-Vector Networks. Machine Learning, Volume 20, No. 3, 273-297.

Wang, L. (2005). Support Vector Machines: Theory and Applications. Volume 177. Springer Science and Business Media.

Sun, A., Lim, E.-P., & Ng, W.-K. (2002). Web classification using support vector machine. Proceedings of the 4th International Workshop on Web Information and Data Management, McLean, Virginia, USA, 2002 (ACM Press).

Xu, L. & Schuurmans, D. (2005). Unsupervised and Semi-Supervised MultiClass Support Vector 

Machines. AAAI, 904-910. 

Lin, Y.-L. & Wang, M.-J.J. (2008). Automatic Feature Extraction from Front and Side Images. In: International Conference on Industrial Engineering and Engineering Management, IEEE, Singapore, 1949-1953.

Lin, Y.-L. & Wang, M.-J.J. (2010). Constructing 3D Human Model from 2D Images. In: International Conference on Industrial Engineering and Engineering Management, IEEE, Xiamen, 1902-1906.

Lin, Y.-L. & Wang, M.-J.J.(2012). Constructing 3D Human Model from Front and Side Images. 

Expert Systems with Applications, Vol. 39, No. 5, 5012-5018. 

Han, E. (2015). 3D Body-Scanning to Help Online Shoppers and the Perfect Clothes fit. The Sydney 

Morning Herald. National Newspaper (Australia). 

Molina, L.C., Belanche, L., & Nebot, A. (2002). Feature Selection Algorithms: A Survey and 

Experimental Evaluation. ICDM 2002: 306-313  

Breiman, L. (2002), Manual On Setting Up, Using, And Understanding Random Forests V3.1. 

Retrieved from http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf 

Breiman, L. (2001). Random Forests, Machine Learning Journal Paper, vol. 45.  

Beretta Clothing Chart. Retrieved from  

 http://www.ableammo.com/catalog/Beretta_Clothing_Chart.php 

Svetnik, V., Liaw, A., & Tong, C. (2000). Variable Selection in Random Forest with Application to 

Quantitative Structure Activity Relationship, Biometrics Research, Merck Co., Inc. P.O. Box 

2000 RY33-300, Rahway, NJ 07065, USA 



 

 

Geman, S. & Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian 

Restoration of Images, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, 721-

734. 

Li, S. (1995), Markov Random Field Modeling in Computer Vision. Springer- Verlag. 

Min3D library. Retrieved from https://code.google.com/p/min3d 

Make human library. Retrieved from http://www.makehuman.org 

Nguyen The Long, Nguyen Thu Huong, & Zhukov, A. (2015). Studies of Anthropometrical Features 

using Machine Learning Approach. Supplementary Proceedings of the 4th International 

Conference on Analysis of Images, Social Networks and Texts (AIST). CEUR Workshop 

Proceedings, 96-105. 

Dong Thi Ngoc Lan (2012). Research on building a feature selection method to increase classification performance for high-dimensional data (in Vietnamese). Master's thesis in Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 36-43.

Support Vector Machine (SVM). Retrieved from http://www.statistica.ru/branches-maths/metod-

opornykh-vektorov-supported-vector-machine-svm/ 

Liyuan Li, Ruijiang Luo, Weimin Huang, & How-Lung Eng (2006). Context-Controlled Adaptive 

Background Subtraction. Proceeding Ninth IEEE International Workshop on Performance 

Evaluation of Tracking and Surveillance (PETS), New York, USA, 31-38. 

Sidorov, D.N., Nguyen, T.L., & Nguyen, T.H. (2016). Non-contact anthropometry software for smartphones running the Android operating system (in Russian), Certificate No. 2015661864.

Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern 

Analysis and Machine Intelligence, Vol. 8, No. 6, 679-698. 

Sidorov, D. (2014). Integral Dynamical Models: Singularities, Signals and Control, World Scientific 

Series on Nonlinear Science Series A, volume 87, World Scientific.