ACTA IMEKO
ISSN: 2221-870X
April 2017, Volume 6, Number 1, 33-42

Metrological characterization of 3D biometric face recognition systems in actual operating conditions

Giovanni Betta1, Domenico Capriglione3, Mariella Corvino1, Alberto Lavatelli2, Consolatina Liguori3, Paolo Sommella3, Emanuele Zappa2

1 DIEI, University of Cassino and of Southern Lazio, Cassino (FR), Italy
2 Department of Mechanical Engineering, Politecnico di Milano, Via La Masa 1, Milano, Italy
3 DIIn, University of Salerno, Via Giovanni Paolo II 132, Fisciano (SA), Italy

ABSTRACT
Nowadays, face recognition systems are becoming widespread in many fields of application, from automatic user login for financial activities and access to restricted areas, to surveillance for improving security in airports and railway stations, to name a few. In such scenarios, architectures based on stereo vision and 3D reconstruction of the face are assuming a predominant role because they can generally assure better reliability than solutions based on a single camera (which use a single image instead of a pair of images). To realize such systems, different architectures can be considered by varying the positioning of the camera pair with respect to the face of the subject to be identified, as well as the kind and resolution of the cameras. These parameters can affect the correct decision rate of the system in classifying the input face, especially in the presence of image uncertainty. In this paper, several 3D architectures differing in camera specifications and in the geometrical positioning of the camera pair (with respect to the input face) are realized and compared. The detection of facial features in the images is performed with a popular method based on the Active Appearance Model (AAM) algorithm. The 3D position of the facial features is then obtained by means of stereo triangulation. The performance of the realized systems has been compared in terms of sensitivity to the quantities of influence and related uncertainty, and in terms of typical indexes for the analysis of classification systems. The main results of this comparison show that the best performance is reached by reducing the distance between the cameras and the subject to be identified and by minimizing the horizontal angle between the plane containing the camera pair axis and the face to be identified.

Section: RESEARCH PAPER

Keywords: face recognition; measurement uncertainty; 3D features; stereo vision; image classification

Citation: Giovanni Betta, Domenico Capriglione, Mariella Corvino, Alberto Lavatelli, Consolatina Liguori, Paolo Sommella, Emanuele Zappa, Metrological characterization of 3D biometric face recognition systems in actual operating conditions, Acta IMEKO, vol. 6, no. 1, article 6, April 2017, identifier: IMEKO-ACTA-06 (2017)-01-06

Section Editor: Paul Regtien, The Netherlands

Received May 31, 2016; in final form February 2, 2017; published April 2017

Copyright: © 2017 IMEKO. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The research project described in this paper has been partially funded by a PRIN grant from the Italian Ministry of Education, University and Research (MIUR).

Corresponding author: Giovanni Betta, email: betta@unicas.it

1. INTRODUCTION

Facial Recognition (FR) systems are well-known computer vision applications whose aim is to identify (or verify) a person upon the measurement of some selected facial features [1]. FR systems exploit digital images and photogrammetry to provide contact-less identification [2], [3]. These systems have spread widely due to an increasing demand in the fields of security assurance and automatic identity verification [4]-[6]. Indeed, applications showing good performance and good maturity [7] can be found when operating under constrained conditions, i.e. when the person to be identified lies in a controlled environment and faces the camera(s) with a specified orientation. However, the extension of FR systems to generic unconstrained environments (variable lighting, variable orientation of the person toward the camera, moving person, ...) is still problematic [8], [9], since person identification occurs with high uncertainty and low repeatability, which is the main problem in the forensic context.

In order to tackle this critical issue, research activity has focused on the analysis of human perception, with the aim of describing mathematically the processes that lead to the identification of a person [10] and then of implementing FR algorithms able to operate in uncontrolled environments [11]. The main principle of these algorithms is to focus on the analysis of a set of biometric features rather than on the analysis of the full 3D shape of a face. This is done by processing the image(s) of a person with suitable feature matching and extraction algorithms prior to stereo triangulation.

Most of the biometric FR systems proposed in the literature are based on the analysis of the appearance of frontal and non-frontal faces acquired at different instants, without addressing metrological issues [2], [7], [12]. Therefore, such systems cannot deal with pose variation or with the natural variation of facial expression. Conversely, the authors proposed a system that fuses point-cloud 3D reconstruction and pure image feature analysis by using a stereoscopic vision rig [13], [14]. The stereoscopic setup synchronously acquires the images of a person's face from several calibrated cameras, so that it is possible to triangulate the feature points and express the biometric characteristics in 3D space using pairs of cameras grabbing the face from different orientations. Using different pairs of synchronized stereo cameras, it is possible to evaluate the effect of pose variability without the interference of facial expression variation, since images are acquired in synchrony from stereo pairs having different orientations towards the face.

In any case, the process of image formation is complex and stochastic, so that the generic image of the face is affected by an uncertainty contribution that propagates throughout all the processing stages. In addition, the final classification result is uncertain, to the point that it may be impossible to find a perfect match between images of the same person acquired at different moments [11]. Hence, there is a clear risk in accepting generic classification results without a proper analysis of the uncertainty [15], [16].
In this context, the scientific literature proposes a good variety of approaches to limit or reduce the risk of an incorrect classification through the formulation of a priori uncertainty models [2], [8], [17]-[20]. In this way, given a set of uncertainty sources, it is possible to solve the problem in a Bayesian framework, so that the most probable match can be computed. However, these approaches have a limit, since they can deal only with the assumed uncertainty sources, whereas the actual image can be affected by other sources of uncertainty not included in the analysis model. This situation leads to a general overestimation or underestimation of uncertainty. Moreover, uncertainty propagation is not performed in the way advised by the ISO-GUM standard.

In addition, the evaluation of the performance of FR systems is still an open problem. Typically, this process relies on the estimation of the recognition reliability from a given database of biometric images. Recognition reliability is then judged with the help of synthetic indexes that express the probability of having a false positive or a false negative [16], [21], [22]. Even if perfectly run, this procedure is able to estimate the performance only for a given set of faces, so it is hard to extend such a result to the generic case. By contrast, the authors proposed in previous papers a metrological approach to the problem of characterizing FR systems [14], [23]. This has been done by analyzing the physical relations between the uncertainty in the final classification result and the uncertainty of the input influence quantities [24]. The experimental activity demonstrated that the performance of these kinds of systems depends on several aspects, from image acquisition to the classification procedure through the biometric algorithm. The quantification of these uncertainty contributions led the authors to formulate an original classification method able to improve the classification performance with respect to traditional score-based approaches [25], [26].

Even if the authors were able to propose an accurate metrology-based method to evaluate the performance of FR systems, there are still open questions in the design phase: what is the optimal layout of new FR systems? Does the face recognition algorithm interfere with the selection of the optimal vision rig? In the literature, different approaches have been presented [27]-[29] to optimize the generic vision system, but little is known on how to choose an architecture for a specific problem.

In this scenario, the aim of this work is to compare, from a metrological point of view, different architectures of a 3D system for face recognition. In particular, starting from the approaches proposed in [23]-[26] for evaluating the confidence level of the classifier output, this paper reports a detailed study of the influence on the metrological performance of different camera configurations (in terms of position with respect to the subject), different kinds of cameras (in terms of sensor and resolution), and different poses (in terms of subject face rotation with respect to the camera pair). The comparison is based upon the analysis of uncertainty in both feature extraction and the classification process.
This study aims to provide useful indications for system designers, whose task in practical applications is to select the best compromise between system performance (in terms of correct classification rate), geometrical constraints (i.e. vision rig arrangement) and hardware constraints (i.e. cost of optics or acquisition hardware).

In the following, after a brief recall of the developed biometric system, the processing for face recognition is described and the metrological performance of the different measurement configurations is compared.

2. THE BIOMETRIC SYSTEM

2.1. System for image acquisition

As mentioned before, the image database was created with a peculiar multiple stereovision rig. The setup is described schematically in Figure 1 and consists of an aluminum frame carrying a stereovision matrix composed of 3 stereo pairs having 3 different nominal orientations toward the face: 0°, 5° and 10°. The cameras of the first row of the matrix have an attitude of -22.5° towards the observer, while the second-row cameras have an attitude of +22.5° towards the observer. This choice leads to a relative angle (incidence) of 45° between the cameras of each pair, which is a good compromise between accuracy in depth measurement (which increases with increasing incidence [30]) and the risk of view occlusion (which, conversely, is more probable with higher incidence of the stereo pair [31]). In addition, a seventh camera has been positioned at mid-height (cam 7 in Figure 1b), between the two rows, with zero orientation and zero attitude, to collect passport-like photos that can be used for the validation of 2D recognition techniques.

The configuration of the vision rig gives a field of view of approximately 300-400 mm, which means face registration can be performed with the subject standing about 1000 mm away from the cameras. Stereoscopic 3D point reconstruction is obtained by coupling the view of each top camera with the corresponding bottom camera. The cameras have the characteristics listed in Table 1 and are all equipped with 25 mm fixed focal length optics. Camera calibration is performed pair-by-pair with the well-known Zhang method [32] by means of the Camera Calibration Toolbox by J.-Y. Bouguet. Image acquisition is performed with 3 IEEE 1394 PCI adapters (for cameras 1 to 6) and a Gigabit Ethernet adapter (for camera 7). The synchronization of the acquisition of the 7 cameras is provided by a common rising-edge trigger source. Exposure time and aperture are regulated in order to obtain similar luminance with similar depth of focus. With the help of a LabVIEW Virtual Instrument, the images of all cameras are simultaneously recorded in bitmap format. The control system acquires sequences of images with a user-defined interval (default 5 s), in order to obtain multiple images of the same person in about the same position but with the small variations of facial expression that naturally happen between captures.

Figure 1. Design scheme of the vision rig used for FR purposes: a) upper view, b) front view, c) lateral view.

Table 1. Characteristics of the vision equipment.

Camera ID | Model | Sensor
1, 2 | AVT Pike F-145B | 1388 x 1038 px color CCD, progressive
3, 4, 5, 6 | AVT Marlin F-131B | 1280 x 1024 px CMOS sensor
7 | IDS GigE UI-5490SE-M | 3840 x 2748 px CMOS sensor

2.2. Biometric algorithm

The biometric algorithm considered is composed of two main steps:
a) segmentation of the biometric features in the images acquired by the upper and lower cameras with the help of an Active Appearance Model (AAM) algorithm [33];
b) triangulation of the 2D points of the biometric features to retrieve a 3D mask representation of the biometric features (a sketch of this step is given below).
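To make the pair-by-pair calibration of Section 2.1 and the triangulation step b) concrete, the following is a minimal sketch assuming OpenCV and NumPy. It is not the authors' implementation: the checkerboard geometry, image lists and landmark arrays are hypothetical placeholders, and lens distortion is neglected for brevity.

```python
# Minimal sketch (not the authors' implementation) of pair-by-pair stereo
# calibration with Zhang's method and triangulation of matched landmarks.
# Checkerboard geometry, image lists and landmark arrays are hypothetical.
import cv2
import numpy as np

def calibrate_pair(imgs_top, imgs_bottom, board=(9, 6), square=25.0):
    """Zhang calibration of one stereo pair from grayscale checkerboard views."""
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, pts_t, pts_b = [], [], []
    for img_t, img_b in zip(imgs_top, imgs_bottom):
        ok_t, c_t = cv2.findChessboardCorners(img_t, board)
        ok_b, c_b = cv2.findChessboardCorners(img_b, board)
        if ok_t and ok_b:                 # keep views seen by both cameras
            obj_pts.append(objp); pts_t.append(c_t); pts_b.append(c_b)
    size = imgs_top[0].shape[::-1]
    # Intrinsics of each camera, then the relative pose (R, T) of the pair
    _, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, pts_t, size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, pts_b, size, None, None)
    _, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, pts_t, pts_b, K1, d1, K2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])  # top camera
    P2 = K2 @ np.hstack([R, T])                         # bottom camera
    return P1, P2

def triangulate(P1, P2, pts_top, pts_bottom):
    """3D positions of matched 2D landmarks (n x 2 arrays), pair frame.
    Lens distortion is neglected here for brevity."""
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(pts_top, float).T,
                                np.asarray(pts_bottom, float).T)
    return (X_h[:3] / X_h[3]).T           # homogeneous -> Euclidean (n x 3)
```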
The main phase, a), is the segmentation of 2D features out of a generic image of a human face. The scientific literature proposes several ways to perform this task, and good reviews can be found in [1] and [2]. The authors chose the AAM algorithm due to its effectiveness in matching both the shape and the appearance of a given pattern [34]. The shape itself is defined as a 2D point set describing the geometry of a target body [35]. The AAM algorithm is able to deal with shape modification since it uses a dynamic model of shapes in the form of a Principal Component Analysis (PCA), so that any given shape $x_S$ can be expressed in the form:

$x_S = \bar{x} + [\phi] \cdot b$ ,  (1)

where $\bar{x}$ represents the average shape, $[\phi]$ represents the matrix of principal components $[\phi_1 | \phi_2 | \dots | \phi_n]$ that express deformation modes, and $b$ is a vector of real numbers that sets the deformable shape parameters of the model. The appearance is defined as the texture (a map of grey levels or colors) of a portion of the target. The AAM algorithm deals with appearance variation through appearance PCA models that arrange all the pixel intensity variations of the images around the mean shape. The PCA formulation gives, for the general appearance vector $g_a$:

$g_a = \bar{g} + [\phi_g] \cdot b_g$ ,  (2)

where $\bar{g}$ represents the average appearance, $[\phi_g]$ represents the matrix of principal components $[\phi_{g,1} | \phi_{g,2} | \dots | \phi_{g,n}]$ which express luminance modes, and $b_g$ is a vector of real numbers that sets the variable luminance parameters of the model.

Shape and appearance models are based on the analysis of training images. So, once the model formulation is set, it is necessary to submit a sample of images where facial features have been manually annotated in order to define the principal components $[\phi]$ and $[\phi_g]$. The annotation process consists in tracing univocal landmarks that describe the most important facial traits on each image. In this work, the biometric description involves 58 landmarks that define the shape of jaw, mouth, nose, eyes, and eyebrows. The choice of the 58 points to be matched agrees with methodologies documented in other studies [36], [37], in order to make the data comparable.

Once the AAM is trained, it is able to find biometric landmarks in the generic image of a face. Given the high difference in attitude, it has been necessary to train one model for the upper camera and one model for the lower camera. The software implementation is provided by an Application Programming Interface dedicated to AAM [38], [39]. Subsequently, for the generic face it is possible to match a set of points in the stereo pair and then to triangulate them with the epipolar constraint [28]. Eventually, a 3D mask of the face is recorded. Figure 2 schematizes the main phases of the construction of the 3D mask.
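Equation (1) maps directly onto a standard PCA implementation. The following sketch, with hypothetical array names (training shapes stored as rows of flattened landmark coordinates), shows how the mean shape and the principal components could be obtained and how a shape is reconstructed from the parameters b; the appearance model of (2) is built in the same way from texture vectors.

```python
# Sketch of the PCA shape model of (1): 58 landmarks are flattened to
# 116-element vectors; 'training_shapes' (n_samples x 116) is hypothetical.
import numpy as np

def build_shape_model(training_shapes, n_modes=10):
    x_mean = training_shapes.mean(axis=0)            # average shape x_bar
    centered = training_shapes - x_mean
    # Principal components = eigenvectors of the covariance matrix,
    # obtained here via SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    phi = Vt[:n_modes].T                             # 116 x n_modes
    return x_mean, phi

def reconstruct_shape(x_mean, phi, b):
    """Equation (1): x_S = x_mean + phi . b"""
    return x_mean + phi @ b

def shape_parameters(x_mean, phi, x):
    """Project a shape onto the model to recover its parameters b."""
    return phi.T @ (x - x_mean)
```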
2.3. The identity database

With the image acquisition system described above ready to be tested, the first step consists in the acquisition of a suitable database of facial images. The database was built by acquiring images of 117 volunteers, using the vision rig described in Section 2.1. For each of the 117 individuals, three series of images have been recorded:
- one set with the subject's face oriented towards the point in the middle between cam 1 and cam 2 (0° neck rotation);
- one set with 10° neck rotation to the right-hand side;
- one set with 20° neck rotation to the right-hand side.

Neck rotation is controlled by asking the person to look toward a point fixed at the wanted orientation with respect to the 0° neck rotation. In this sense, the values of neck rotation used in this work are to be considered approximate values, with the only aim of obtaining indications on the variation of the recognition accuracy with the neck rotation. Each set contains 9 pictures of the same subject in the same position, acquired at a rate of one image every 5 s. In this way it is possible to record natural minor changes in head position and facial expression and let the AAM model cover them. On the whole, 162 images have been acquired for every person. The position of the person with respect to the vision rig, as well as the relative distance between cameras and face, has not been controlled. This was chosen on purpose, since the aim of this work is to evaluate the performance of the system in actual operating conditions: modern identity verification systems, in fact, do not require constraining the person to a rigid frame, and most of them work in completely unconstrained environments [3].

As previously underlined, AAM models should be trained on a set of images that correctly samples the actual variability of face appearance. Therefore, as the number of facial images used for the training increases, the AAM model is able to cover a broader range of shape and texture variations. However, when the number of training images grows, the handling of such a large amount of information becomes hard, the weight of random effects over deterministic phenomena becomes higher, and the final reliability of face recognition becomes lower [13]. A good compromise has been found by building a model using 200 images from 50 people belonging to the database of 117 individuals. Note that images from all the camera pairs and with the 3 different head rotations (0°, 10° and 20°) were used to create the model. It is not convenient to train the AAM model with the same images that will be used for personal identification (i.e. to test the accuracy of the results). For this reason, the images used for AAM model training were not included in the database used for the evaluation of the performance of the FR system.

3. PROCESSING FOR FACE RECOGNITION

3.1. Evaluation of scores

In order to perform identity verification, the 3D mask of a person is compared with each of the 3D masks included in an identity database. Each comparison outputs a value named score, computed as the weighted mean of the squared differences between the input mask and a mask stored in the identity database. For the i-th mask stored in the database, the i-th score $S_i$ is given by:

$S_i = \frac{1}{n} \sum_{k=1}^{n} W_k \cdot (V_{k,i} - V_{k,ref})^2$ ,  (3)

where $V_{k,i}$ are the coordinates of the k-th point for the i-th individual, $V_{k,ref}$ are the corresponding coordinates of the mask to be recognized, and $W_k$ is the weight of the k-th point of the mask. The score $S_i$ thus represents the weighted mean of the squared discrepancies between the 3D coordinates of the mask to be recognized and the corresponding coordinates of each mask in the database.
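A minimal sketch of the score of (3) follows, assuming the masks are stored as (n x 3) NumPy arrays of corresponding 3D landmarks; the array names are hypothetical.

```python
# Sketch of the score of (3): weighted mean of the squared point-to-point
# distances between the input 3D mask and the i-th stored mask.
# 'mask_ref' and 'mask_i' are hypothetical (n x 3) arrays of 3D landmarks,
# 'weights' the per-landmark weights W_k.
import numpy as np

def score(mask_ref, mask_i, weights):
    # Squared Euclidean distance between corresponding landmarks
    sq_dist = np.sum((mask_i - mask_ref) ** 2, axis=1)
    return np.mean(weights * sq_dist)     # S_i as in (3)
```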
Prior to the evaluation of the point-to-point distances $(V_{k,i} - V_{k,ref})$, a roto-translation is computed in order to bring the coordinate frame of one mask onto the coordinate frame of the other mask with a rigid motion. The roto-translation allows the compensation of differences in position and orientation of the subject with respect to the stereoscopic system during acquisition (a sketch of a possible implementation is given at the end of this subsection).

The weights $W_k$ are generated from the analysis of the variance of the identification parameters calculated when identifying the same person under different neck rotation conditions. The procedure for weight estimation is the following: images of the same individual are submitted to FR; for the k-th landmark, the variance $Var_k$ of the 3D coordinates is estimated ($Var_k$ is an estimate of the identification uncertainty); the weight for the k-th landmark is then defined as the inverse of $Var_k$ [13].

Figure 2. Schematic of the biometric algorithm: stereo acquisition, AAM segmentation of the features detected on the stereoscopic images, triangulation, and the resulting 3D mask of biometric features in 3D space.
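The rigid alignment mentioned above can be illustrated with the Kabsch algorithm; note that the paper describes an iterative procedure, so the closed-form sketch below is only one plausible implementation, with hypothetical array names.

```python
# Sketch of a rigid alignment (roto-translation) between two 3D masks,
# using the Kabsch algorithm; the paper uses an iterative procedure, so
# this is one possible implementation, not the authors' exact method.
import numpy as np

def rigid_align(mask_src, mask_dst):
    """Return mask_src roto-translated onto mask_dst (both n x 3)."""
    c_src, c_dst = mask_src.mean(axis=0), mask_dst.mean(axis=0)
    H = (mask_src - c_src).T @ (mask_dst - c_dst)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                              # proper rotation
    t = c_dst - R @ c_src                           # translation
    return (R @ mask_src.T).T + t
```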
3.2. Sources of score variability and bias

As previously described, the evaluation of $S_i$ by means of (3) results from a suitable sequence of operations performed on the input images, namely:
i. 2D image acquisition;
ii. 2D image processing by means of the AAM algorithm;
iii. triangulation;
iv. roto-translation.

Each of these operations can introduce uncertainty and/or bias in the $S_i$ estimate [24], [40].

As for step i, the acquisition conditions, in terms of luminance, lens defocus, motion blur in the image (due, for example, to vibration of the system during the image acquisition or to subject movement) and face pose angle, can modify the images with respect to the ones contained in the identity database, thus changing the positions of the landmarks on the 2D image pair and, consequently, on the resulting 3D mask. As an example, Figure 3 shows such effects for the cases of luminance, motion blur and lens defocus: the interpolation of the landmarks changes significantly when motion blur or lens defocus is present, whereas it is weakly influenced by luminance. As a consequence, it is expected that $S_i$ will be affected by such quantities of influence, in particular by motion blur and lens defocus. As a further example, Figure 4 reports the scores evaluated at different levels of lens defocus when subject #1 of the identity database and the pair cam 1-2 are considered. For the sake of clarity, only the scores evaluated for the first seven subjects (of the identity database) are reported. As can be seen, as the level of lens defocus increases, the value of the score $S_1$ increases, whereas the distances between the classes (subjects of the identity database) decrease, thus worsening the system performance in terms of correct decision rate. Similar trends have been observed when other input subjects or other camera pairs (3-4 and 5-6) are considered.

Figure 3. Effects of the acquisition conditions (the red arrows point to the areas mainly affected by the quantity of influence): a) luminance, b) motion blur, c) lens defocus.

Figure 4. Effects of the lens defocus on the scores.

The image uncertainty then propagates to the next step (step ii). With reference to the AAM algorithm, the uncertainty in the building of the Shape Model and the Appearance Model [20] determines the accuracy of the feature localization. Since the weights $W_k$ of (3) are defined in the training phase on the basis of a number of image pairs for each subject, an intrinsic score variability should be considered to take this aspect into account. As an example of this intrinsic variability, Figure 5 reports, for a subject of the database, the nine images used for the training of the AAM, together with the corresponding scores; the first image on the left in Figure 5 is used as the reference.

Figure 5. Definition of Wk: a) images acquired by Camera 1 for subject #1; b) scores evolution (the first image of the subject is used as reference).

As for step iii, the triangulation is realized by means of the method proposed by Zhang [27], and a residual error is mainly due to the non-ideality of the calibration phase. As for step iv, the roto-translation is performed by means of an iterative procedure, and the residual error is mainly linked to the noise on the 3D features of the masks.

3.3. Uncertainty modelling for the Score

The effects of triangulation (step iii) and roto-translation (step iv) are modelled here as residual systematic effects that make the Score greater than zero even for the correct class, whilst the measurement uncertainty introduces a variability on the Score of the correct class that strictly depends on the image acquisition conditions (step i) and on the accuracy of the AAM (step ii). Therefore, in the following, all the systematic effects are directly included in the measured Score, whilst only the measurement uncertainty (due to the acquisition process and to the accuracy of the AAM) is considered.

As for the uncertainty component related to step i, a statistical approach [26] is adopted to estimate the uncertainty model corresponding to each quantity of influence. In particular, for the quantities of influence luminosity, lens defocus and motion blur, new pairs of artificial images are generated in order to obtain images characterized by the desired values of the quantities of influence. The new image pairs were generated by applying suitable digital filtering to the reference images (i.e. the ones contained in the identity database, hereinafter denoted as database A). For each pair of cameras, several values of variation (with respect to the reference image contained in the identity database) have been considered in order to generate the new images (see Table 2). A sketch of such filtering is given below.
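The generation of database B by digital filtering can be sketched as follows, assuming OpenCV; the function names are hypothetical, and the Gaussian standard deviations play the role of the σFilter values of Table 2 (the actual filters used by the authors may differ in detail).

```python
# Sketch of how new image pairs with controlled quantities of influence can
# be generated by digital filtering of the reference images (database A).
# Function names are hypothetical; parameter ranges follow Table 2.
import numpy as np
import cv2

def change_luminosity(img, delta):
    """Shift the 8-bit grey levels by 'delta' (e.g. -30 ... +30)."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def lens_defocus(img, sigma):
    """Isotropic Gaussian blur, sigma in pixels (e.g. 0 ... 20)."""
    if sigma == 0:
        return img
    return cv2.GaussianBlur(img, (0, 0), sigma)

def motion_blur(img, sigma, length=None):
    """Directional (here horizontal) Gaussian smearing, sigma in pixels.
    Modelling the blur as a 1D Gaussian kernel is an assumption."""
    if sigma == 0:
        return img
    length = length or int(6 * sigma + 1)
    kernel = cv2.getGaussianKernel(length, sigma).T  # 1 x length row kernel
    return cv2.filter2D(img, -1, kernel)
```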
The variations were applied to luminosity, lens defocus and motion blur. As for the image luminosity, the values considered refer to an 8-bit grey scale, whereas for lens defocus and motion blur the values considered refer to the standard deviation (in pixels) of the Gaussian filter employed. For the pose angle, instead, the images acquired at the different neck rotations (see Section 2.3) were considered for each subject, obtaining the image types summarized in Table 2. In this way, new databases of images designed for the uncertainty estimation (one for each pair of cameras, 1-2, 3-4, and 5-6, hereinafter denoted as database B) are obtained.

Table 2. Values of the variations of the quantities of influence with respect to the reference images of the identity database.

Quantity of influence | Values | Units
Luminosity | [-30, -10, -3, 0, +3, +10, +30] | Grey levels
Lens defocus (σFilter) | [0, 3, 5, 7, 10, 15, 20] | Pixels
Motion blur (σFilter) | [0, 3, 9, 15, 17, 19, 23, 27, 30, 50, 70] | Pixels
Pose angle | [0, 10, 20] | Degrees

Then, for each considered condition, the procedure adopted for the uncertainty estimation for each pair of cameras is the following:
- for each subject of these new databases, the 3D features are evaluated by applying the AAM algorithm and the triangulation;
- the Score with respect to the data recorded in the identity database for the subject itself is estimated;
- a statistical analysis is made on the Scores obtained for all the subjects, and the uncertainty is estimated according to the ISO-GUM, by using the simple model described in the following equation:

$(u_S)_l = \sqrt{\frac{(\mu_S^2)_l}{3} + (\sigma_S^2)_l}$ ,  (4)

where $(u_S)_l$ is the contribution due to the l-th quantity of influence on the Score uncertainty, and $(\mu_S)_l$ and $(\sigma_S)_l$ are respectively the mean and the sample standard deviation of the measured Scores related to database B. Then, for each quantity of influence, the identification of the relationship between $(u_S)_l$ and the value of l can be based on a large experimental analysis and on the fitting of simple models to the experimental data [26].

As for the uncertainty related to step ii, $(u_S)_{AAM}$, we pose:

$(u_S)_{AAM} = \sqrt{\frac{(\mu_S^2)_{AAM}}{3} + (\sigma_S^2)_{AAM}}$ ,  (5)

where $(\mu_S)_{AAM}$ and $(\sigma_S)_{AAM}$ are respectively the mean and the sample standard deviation of all the measured Scores obtained for each subject when the nine images (related to database A) are considered.

As for the overall uncertainty on the scores, $u_S$, all the quantities of influence are considered uncorrelated with one another; the combined uncertainty on the score is then evaluated as:

$u_S = \sqrt{\sum_{l=1}^{M} (u_S)_l^2 + (u_S)_{AAM}^2}$ ,  (6)

where M is the number of the considered quantities of influence. Applying these models, the uncertainty of the Scores is evaluated. The hypothesis of uncorrelated quantities has also been verified through experiments.
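Equations (4)-(6) amount to treating the mean deviation of the scores as a uniformly distributed bias and combining the contributions in quadrature; a minimal NumPy sketch, with hypothetical container names, follows.

```python
# Sketch of the uncertainty model of (4)-(6): each influence quantity
# contributes u = sqrt(mu^2/3 + sigma^2), estimated from the scores of
# database B; contributions are then combined in quadrature per (6).
# 'scores_per_quantity' is a hypothetical dict {name: array of scores}.
import numpy as np

def component_uncertainty(scores):
    """Equations (4)/(5): the mean is treated as a uniform-distributed bias."""
    mu, sigma = np.mean(scores), np.std(scores, ddof=1)
    return np.sqrt(mu**2 / 3.0 + sigma**2)

def combined_uncertainty(scores_per_quantity, scores_aam):
    """Equation (6): quadrature sum of uncorrelated contributions."""
    u_sq = sum(component_uncertainty(s)**2
               for s in scores_per_quantity.values())
    return np.sqrt(u_sq + component_uncertainty(scores_aam)**2)
```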
3.4. Classification procedure

Starting from the Scores and the related uncertainty $u_S$, estimated as described in the previous sections, the classification procedure provides the subject identification and the corresponding confidence level (CL). The output of such a procedure is a classification list in which all the possible identified subjects are included with their CL. To this aim, at first the probability $P_j$ that the input subject belongs to each j-th class is evaluated; then, on the basis of the obtained probabilities, the classification list is created with a selection of the probable classes; finally, the confidence level of each class in the list is evaluated. In more detail, the decision algorithm (see Figure 6) is composed of three steps [23], [25]:

- STEP A: for each subject j present in the database, the measured Score $S_j$ is used together with the corresponding uncertainty $u_S$ in order to evaluate the probability $P_j$ that the input subject is subject j. In particular, $P_j$ represents the probability that the Score of the j-th subject is equal to zero given the measured value $S_j$, considering the score as a random variable (see Figure 6) whose standard deviation is equal to $u_S$ [25];
- STEP B: using all the estimated probabilities, the classification list is created, including only the subjects whose probability is greater than a threshold TH;
- STEP C: the confidence level $CL_j$ of the subjects in the classification list is evaluated as $CL_j = P_j / K$, where $K$ is the sum of all the $P_j$ greater than TH.

Figure 6. Main steps of the classification procedure.

As for the selection of TH, it should be chosen as the best trade-off between the sensitivity and the selectivity required by the specific application. Indeed, it is expected that the higher TH, the higher the selectivity and the lower the sensitivity of the system. Therefore, as an example, the value of TH could be selected on the basis of the requirements of the application in terms of True Acceptance Rate (TAR, evaluated as True Positives / all positives) and False Acceptance Rate (FAR, evaluated as False Positives / all negatives) [26].
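A compact sketch of the three-step decision procedure follows. The mapping from ($S_j$, $u_S$) to $P_j$ is written here as a two-sided Gaussian tail probability, which is one plausible reading of [25] rather than the authors' exact formulation; all names are hypothetical.

```python
# Sketch of STEPs A-C. P_j is computed as the two-sided Gaussian tail
# probability of observing the measured score S_j if the true score were
# zero -- an assumption, not necessarily the exact formulation of [25].
import numpy as np
from scipy.stats import norm

def classify(scores, u_s, th=0.5):
    """scores: array of S_j for all database subjects; u_s: score uncertainty.
    Returns {subject_index: confidence_level} for subjects above TH."""
    # STEP A: probability that subject j is the input subject
    p = 2.0 * (1.0 - norm.cdf(np.asarray(scores) / u_s))
    # STEP B: classification list = subjects with P_j > TH
    candidates = {j: p[j] for j in range(len(p)) if p[j] > th}
    if not candidates:
        return {}                        # empty list: missed classification
    # STEP C: confidence level CL_j = P_j / K, with K = sum of retained P_j
    k = sum(candidates.values())
    return {j: pj / k for j, pj in candidates.items()}
```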
4. PERFORMANCE COMPARISON

In this section, the comparison of the three considered system architectures (i.e. Cam1-2, Cam3-4, Cam5-6) is reported. In particular, at first the comparison of all the uncertainty components involved in (4) and (5) is made. Then, the recognition performance of the systems is presented by comparing suitable figures of merit, typically adopted for the analysis of classification systems [40].

4.1. Score uncertainty estimation

In this section, the uncertainty components related to the acquired 2D images (luminosity, lens defocus, motion blur and pose angle) and to the accuracy of the AAM are reported for each pair of cameras. In particular, Figure 7 shows the values of $(u_S)_l$ calculated by means of (4), and Table 3 reports the values of $(u_S)_{AAM}$ estimated by means of (5).

Figure 7. $(u_S)_l$ versus the quantities of influence: a) luminosity, b) lens defocus, c) motion blur, d) pose angle.

Table 3. Values of (uS)AAM for the pairs of cameras.

 | Cam1_2 | Cam3_4 | Cam5_6
(µS)AAM | 0.140 | 0.139 | 0.17
(σS)AAM | 0.006 | 0.005 | 0.02
(uS)AAM | 0.080 | 0.080 | 0.10

Looking at Figure 7 and Table 3, some considerations can be drawn:
- For each pair of cameras, the image luminosity is the quantity of influence corresponding to the lowest uncertainty value ((uS)LUM), whereas the pose angle is the quantity responsible for the highest uncertainty values ((uS)ANGLE).
- Except for the pose angle, for each quantity of influence and whatever its deviation from the corresponding value on the reference image, the lowest value of uncertainty is achieved with the pair Cam1_2, whereas the pair Cam5_6 always shows the highest value of uncertainty. These results are mainly due to the smaller distance (from the subject to be identified) and to the orientation of the pair Cam1_2 with respect to the other pairs (see Figure 1).
- As for the pose angle, the pair Cam1_2 shows the worst performance. This result is due to the different orientations of the camera pairs with respect to the subject to be identified. In particular, the pair Cam1_2 is the best aligned with the person to be identified, and thus it turns out to be the most sensitive to the pose angle.
- The uncertainty due to the AAM algorithm, (uS)AAM, is comparable with the uncertainty due to the luminosity of the acquired images, (uS)LUM, while it is lower than the uncertainty due to the other quantities of influence.

4.2. Classification performance

In order to evaluate and compare the overall classification performance of the three system architectures, the following figures of merit have been considered:
- Correct classification (CC): the right class is identified with the highest CL value;
- Abstention (AB): the right class is in the classification list, but there are some other classes with the same CL, so the system does not provide any decision;
- Missed classification (MC): the classification list is empty;
- Wrong classification (WC): either the correct class is not in the classification list, or it is present without the highest value of CL.

These performance indexes have been evaluated by considering the following values of TH: 0.3, 0.5, 0.7. In this way, it is possible to compare the three system architectures for different degrees of selectivity and sensitivity. The recognition procedure has been run by considering the images of database B. Table 4 compares the considered figures of merit for different values of TH and camera pairs.

Table 4. Performance comparison of the pairs of cameras for different TH.

TH | Figure of merit | Cam1_2 | Cam3_4 | Cam5_6
0.3 | CC [%] | 84.7 | 82.3 | 70.9
0.3 | AB [%] | 3.3 | 3.6 | 3.4
0.3 | MC [%] | 1.0 | 2.3 | 2.3
0.3 | WC [%] | 11.0 | 11.8 | 23.3
0.5 | CC [%] | 88.0 | 85.7 | 87.5
0.5 | AB [%] | 1.1 | 1.1 | 1.6
0.5 | MC [%] | 7.6 | 10.0 | 6.2
0.5 | WC [%] | 3.3 | 3.1 | 4.6
0.7 | CC [%] | 77.7 | 76.8 | 84.2
0.7 | AB [%] | 0.2 | 0.5 | 0.7
0.7 | MC [%] | 21.8 | 22.3 | 14.4
0.7 | WC [%] | 0.3 | 0.3 | 0.7

As expected, whatever the pair of cameras, the value of TH affects the values of the figures of merit. In particular:
- the lower TH, the lower the MC percentage and the higher the AB and WC percentages;
- the best performance in terms of CC is observed for the intermediate value of TH considered.

In addition, for a given value of TH:
- Even if the pairs Cam1-2 and Cam3-4 are based on different hardware (see Table 1), they show similar performance in terms of all the considered figures of merit. Therefore, it seems that the type of sensor weakly influences the performance of the system.
- The pair Cam5-6 shows the best performance in terms of CC and MC only when TH = 0.7, whereas it appears to be the worst solution for TH = 0.3 and TH = 0.5. Since the pairs Cam3-4 and Cam5-6 share the same hardware (see Table 1), the different performance can be imputed to their distances from the input subject (in the case of Cam5-6 such a distance is the largest).

Consequently, depending on the performance constraints required by the application or on the trade-off to be satisfied, a designer/user can select the most suitable solution in terms of both the arrangement of the camera pair and the value of TH.
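For completeness, the figures of merit of Section 4.2 can be evaluated over a labelled test set as in the following sketch, which consumes the output of the classify() sketch above; the results container is hypothetical, and the handling of ties follows the definitions of CC, AB, MC and WC given earlier.

```python
# Sketch of the evaluation of CC, AB, MC and WC over a test set.
# 'results' is a hypothetical list of (classification_dict, true_subject)
# pairs, with classification_dict as returned by the classify() sketch.
def figures_of_merit(results):
    cc = ab = mc = wc = 0
    for cls, true_id in results:
        if not cls:
            mc += 1                       # empty list: missed classification
            continue
        best_cl = max(cls.values())
        # Subjects tied at the highest CL (exact float ties; a tolerance
        # could be used in practice)
        top = [j for j, cl in cls.items() if cl == best_cl]
        if true_id in top and len(top) == 1:
            cc += 1                       # correct classification
        elif true_id in top:
            ab += 1                       # tie at the top CL: abstention
        else:
            wc += 1                       # wrong classification
    n = len(results)
    return {k: 100.0 * v / n for k, v in
            zip(("CC", "AB", "MC", "WC"), (cc, ab, mc, wc))}
```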
5. CONCLUSIONS

The paper has compared the metrological performance of different architectures for face recognition based on 3D features. The study has been conducted by considering a popular algorithm, the AAM, for facial feature detection, followed by stereo triangulation for the 3D position measurement, and by considering the classification procedure proposed by the authors in previous papers. The work took into account the main causes of uncertainty generally affecting the performance of face recognition systems, with a direct impact on the reliability of the final decision in the classification stage in terms of Correct Classification, Abstention, Missed Classification and Wrong Classification percentages. In particular, the following main results can be drawn:
- for the considered sensor resolutions, the kind of hardware does not influence the metrological and classification performance of the system;
- the distance between the subject (to be recognized) and the pair of cameras generally affects the metrological performance of the system; in particular, the pair Cam1_2 shows slightly better performance in terms of uncertainty on the score due to the variation of image luminosity, lens defocus and motion blur (see Figures 7a-7c);
- the horizontal angle between the plane containing the camera pair axis and the face to be identified generally affects the metrological performance of the system in terms of uncertainty on the score (see Figure 7d); this phenomenon is more evident for the pair Cam1_2 because it is the pair best aligned with the person to be identified, and thus the most sensitive to the pose;
- as for the classification performance, the best performance is shown by the pairs Cam1_2 and Cam3_4 if low values of TH are considered (i.e. the lowest selectivity and the highest sensitivity), whereas the pair Cam5_6 shows the best performance if high values of TH are considered (i.e. the highest selectivity and the lowest sensitivity). These results confirm that the distance between the cameras and the subject affects the performance, and that a designer could prefer the pair Cam5_6 if the specific application constraints require a high value of TH.

The quantification of this kind of information can be very useful for system designers and/or users, who, in practical applications, often have to select the best trade-off between camera arrangement and system performance. Future developments will investigate the possibility of exploiting more camera pairs at a time, with the aim of fusing the extracted information in order to improve the classification performance of the system.

REFERENCES

[1] S. Li, A. Jain, Handbook of Face Recognition, Springer, London, 2011.
[2] D. Yadav, A. Bhattacharya, "Face identification methodologies in videos", Global Conference on Communication Technologies (GCCT 2015), 2015.
[3] A. Danelakis, T. Theoharis, I. Pratikakis, "A survey on facial expression recognition in 3D video sequences," Multimedia Tools and Applications, vol. 74, no. 15, pp. 5577-5615, 2015.
[4] D. White, P. Jonathon Phillips, C. Hahn, M. Hill, A. O'Toole, "Perceptual expertise in forensic facial image comparison," Proceedings of the Royal Society B: Biological Sciences, vol. 282, no. 1814, 2015.
[5] Z. Sufyanu, F. Mohamad, A. Ben-Musa, "A proposed integrated human recognition for security reassurance," American Journal of Applied Sciences, vol. 12, no. 2, pp. 155-165, 2015.
[6] Y. Park, G. Joung, Y. Song, N.-S. Yun, J. Kim, "A study on the safety management system of a passenger ship using biometrics," Journal of Nanoelectronics and Optoelectronics, vol. 11, no. 2, pp. 194-197, 2016.
[7] M. Hassaballah, S. Aly, "Face recognition: Challenges, achievements and future directions," IET Computer Vision, vol. 9, no. 4, pp. 614-626, 2015.
[8] M. Haghighat, M. Abdel-Mottaleb, W. Alhalabi, "Fully automatic face normalization and single sample face recognition in unconstrained environments," Expert Systems with Applications, vol. 47, pp. 23-34, 2016.
[9] H. Yang, I. Patras, "Mirror, mirror on the wall, tell me, is the error small?," 2015.
[10] R. Larsen, K. Hilger, K. Skoglund, S. Darkner, R. Paulsen, M. Stegmann, B. Lading, H. Thodberg, H. Eiriksson, "Some issues of biological shape modelling with applications," Lecture Notes in Computer Science, vol. 2749, pp. 509-519, 2003.
[11] M. Vidhyalakshmi, E. Poovammal, "A survey on face detection and person re-identification," Advances in Intelligent Systems and Computing, vol. 410, pp. 283-292, 2016.
[12] C. Wöhler, 3D Computer Vision: Efficient Methods and Applications, Springer, London, 2012.
[13] E. Zappa, R. Testa, M. Barbesta, M. Gasparetto, "Uncertainty of 3D facial features measurements and its effects on personal identification," Measurement, vol. 49, no. 1, pp. 296-307, 2014.
[14] G. Betta, D. Capriglione, M. Corvino, C. Liguori, A. Paolillo, "Face based recognition algorithms: A first step toward a metrological characterization," IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 5, pp. 1008-1016, 2013.
[15] D. White, J. Dunn, A. Schmid, R. Kemp, "Error rates in users of automatic face recognition software," PLoS ONE, vol. 10, no. 10, 2015.
[16] Bundesamt für Sicherheit in der Informationstechnik, "Study: An investigation into the performance of facial recognition systems relative to their planned use in photo identification documents - BioP I", public final report. Online: http://www.bsi.bund.de
[17] M. De-la-Torre, E. Granger, R. Sabourin, D. Gorodnichy, "An adaptive ensemble-based system for face recognition in person re-identification," Machine Vision and Applications, vol. 26, no. 6, pp. 741-773, 2015.
[18] A. Punnappurath, A. Rajagopalan, S. Taheri, R. Chellappa, G. Seetharaman, "Face recognition across non-uniform motion blur, illumination, and pose," IEEE Transactions on Image Processing, vol. 24, no. 7, pp. 2067-2082, 2015.
[19] Y.-T. Chou, S.-M. Huang, J.-F. Yang, "Class-specific kernel linear regression classification for face recognition under low-resolution and illumination variation conditions," EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, pp. 1-9, 2016.
[20] A. Fathi, P. Alirezazadeh, F. Abdali-Mohammadi, "A new Global-Gabor-Zernike feature descriptor and its application to face recognition," Journal of Visual Communication and Image Representation, vol. 38, pp. 65-72, 2016.
[21] S. A. Rizvi, P. Phillips, H. Moon, "Verification protocol and statistical performance analysis for face recognition algorithms," 1998.
[22] P. Wang, Q. Ji, J. Wayman Jr., "Modeling and predicting face recognition system performance based on analysis of similarity scores," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 665-670, 2007.
[23] G. Betta, D. Capriglione, M. Gasparetto, E. Zappa, C. Liguori, A. Paolillo, "Managing the uncertainty for face classification with 3D features," Proc. of the 2014 IEEE International Instrumentation and Measurement Technology Conference (I2MTC 2014), Montevideo, Uruguay, 12-15 May 2014, pp. 412-417.
[24] G. Betta, D. Capriglione, M. Corvino, M. Gasparetto, E. Zappa, C. Liguori, A. Paolillo, "Metrological performance comparison of biometric system architectures for 3D face recognition," XXI IMEKO World Congress "Measurement in Research and Industry", Prague, Czech Republic, September 2015.
[25] G. Betta, D. Capriglione, M. Corvino, C. Liguori, A. Paolillo, "A proposal for the management of the measurement uncertainty in classification and recognition problems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 2, pp. 392-402, 2015.
[26] G. Betta, D. Capriglione, M. Gasparetto, E. Zappa, C. Liguori, A. Paolillo, "Face recognition based on 3D features: Management of the measurement uncertainty for improving the classification," Measurement, vol. 70, pp. 169-178, 2015.
[27] S. Zhang, Handbook of 3D Machine Vision: Optical Metrology and Imaging, CRC Press, 2013.
[28] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.
[29] H. Wechsler, Reliable Face Recognition Methods: System Design, Implementation and Evaluation, Springer US, 2009.
[30] D. Szalóki, K. Csorba, G. Tevesz, "Optimizing camera placement in motion tracking systems," 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2014), 2014.
[31] X. Chen, J. Davis, "An occlusion metric for selecting robust camera configurations," Machine Vision and Applications, vol. 19, no. 4, pp. 217-222, 2008.
[32] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000.
[33] T. Cootes, G. Edwards, C. Taylor, "Active appearance models," Lecture Notes in Computer Science, vol. 1407, pp. 484-498, 1998.
[34] G. Edwards, T. Cootes, C. Taylor, "Advances in active appearance models," Proc. of the 7th IEEE International Conference on Computer Vision (ICCV'99), Kerkyra, Greece, 1999.
[35] G. Edwards, C. Taylor, T. Cootes, "Interpreting face images using active appearance models," 3rd IEEE International Conference on Automatic Face and Gesture Recognition (FG 1998), Nara, Japan.
[36] E. Zappa, P. Mazzoleni, Y. Hai, "Stereoscopy based 3D face recognition system," 10th International Conference on Computational Science (ICCS 2010), Amsterdam, Netherlands.
[37] E. Zappa, P. Mazzoleni, "Reliability of personal identification based on optical 3D measurement of a few facial landmarks," 10th International Conference on Computational Science (ICCS 2010), Amsterdam, Netherlands.
[38] M. Stegmann, "The AAM-API: An open source Active Appearance Model implementation," Medical Image Computing and Computer-Assisted Intervention (MICCAI 2003), 6th International Conference Proceedings, Montreal, Canada, 2003.
[39] M. Stegmann, B. Ersbøll, R. Larsen, "FAME - A flexible appearance modeling environment," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1319-1331, 2003.
[40] F. Tortorella, "An optimal reject rule for binary classifiers," Lecture Notes in Computer Science, vol. 1876, pp. 611-620, 2000.