INT J COMPUT COMMUN, ISSN 1841-9836 8(3):416-424, June, 2013.

Erosion based Method for Quantification of Facial Palsy

M. Găianu, G. Cristescu, D. M. Onchiş

Mihail Găianu
West University of Timişoara
România, 300223 Timişoara, Vasile Pârvan No. 4
E-mail: mgaianu@info.uvt.ro

Gabriela Cristescu
Aurel Vlaicu University of Arad
România, 310130 Arad, Bd. Revoluţiei, Nr. 77
E-mail: gcristescu@inext.ro

Darian M. Onchiş
University of Vienna
Austria, Nordbergstrasse 15, A-1090, Vienna
E-mail: darian.onchis@univie.ac.at

Abstract: This paper presents a novel 3D face recognition method developed by means of the erosion of a contractible topological space. The procedure is applied to the quantification of facial palsy using pre-marked points. The recognition is done in two steps: first, the pre-marked mimic points on the human face are detected; second, the mutual distances between the points are calculated. A procedure for deducing the mobility or immobility of the points and the amplitude of the mimic points' movement follows.

Keywords: distance, deletable point, interest point of mimic, Univalue Segment Assimilating Nucleus.

1 Introduction and main results

Face recognition is a challenging task that has been extensively investigated in recent years by many authors (e.g. [1, 2]). While most previous face recognition work is based on 2D images, the development of 3D scanning techniques has moved the domain towards 3D face recognition. In a controlled environment, current 2D face recognition techniques can achieve acceptable accuracy, but they are inadequate for more challenging applications, especially in the presence of variations in lighting conditions and pose. The 3D approach, which includes the complete geometric information, therefore has the potential to alleviate the impact of lighting and pose and may improve recognition performance.
Detecting and tracking markers on the 3D human face is a challenging task. A standard procedure in this field is to use the level set algorithm [1]. An important aspect is to measure the geodesic distances between markers applied to the 3D human face, in order to emphasize the interest points of mimic and to evaluate the improvement of face mobility. The purpose of this paper is to elaborate a method for detecting the interest points of face mimic, followed by computing the mutual distances between them, together with a method for deducing the movement and its amplitude. Nowadays physicians usually measure these distances with a classical ruler, an operation affected by considerable error. A computer-aided procedure for point detection and distance computation benefits from the possibility of increasing the resolution, which diminishes the error.

Copyright © 2006-2013 by CCC Publications

The patient is placed in a two-mirror system, developed by Frey, Jenny, Giovanoli and Stussi in [2]. In the first step, the interest points of mimic are marked with a water-resistant pen on the patient's face (see Figure 1), obtaining a mask. The mimic interest points are:

• Central points: Central Nose (CN), Philtrum (PH=CM);

• Left side points: Left Ala of the Nose (LAN), Left Brow (LBP), Left Lower Eyelid (LLE), Left Mouth Corner (LMC), Left Midlateral Point of the Lower Lip (LML), Left Midlateral Point of the Upper Lip (LMU), Left Tragus (LTR), Left Upper Eyelid (LUE);

• Right side points: Right Ala of the Nose (RAN), Right Brow (RBP), Right Lower Eyelid (RLE), Right Mouth Corner (RMC), Right Midlateral Point of the Lower Lip (RML), Right Midlateral Point of the Upper Lip (RMU), Right Tragus (RTR), Right Upper Eyelid (RUE).

In a computer image each interest point of the mimic, i.e.
each marked point of the mask, is represented by an area consisting of a set of pixels colored by the marker, called a Univalue Segment Assimilating Nucleus (usually denoted USAN) [14]. Marking is necessary because otherwise the points cannot be identified at the same place, especially on the side views of the face in the two mirror images.

Figure 1: Pattern of the markers

The phases of the computer-aided diagnosis process presented in this paper are:

• Operation 1: Marking the interest points of mimic on the patient's face;

• Operation 2: Acquiring the image of the patient's face from the video camera;

• Operation 3: Selecting a restricted area including one interest point of mimic;

• Operation 4: Detecting the contour of the USAN (Univalue Segment Assimilating Nucleus);

• Operation 5: Determining a representing pixel of the USAN by erosion and detecting a marker with Algorithm 1;

• Operation 6: Preserving the coordinates of the representing pixel;

• Operation 7: Decision: are all interest points detected?
  – No: repeat Operations 3 to 6;
  – Yes: go to Operation 8;

• Operation 8: Computing the mutual distances between the pixels representing the interest points of mimic and introducing them into the database;

• Operation 9: Repeating Operations 2-7 and computing the maximum variation of each distance;

• Operation 10: Deciding on the mobility by comparing the variations of the distances;

• Operation 11: Presenting the result to the physician.

The paper is structured as follows. In the second section the algorithm used during the proposed procedure of computational diagnosis aid is described. In the third section, after the detection (see Figure 2), we calculate the mutual distances between the markers on the face and use them to describe the face mobility. Some numerical results are included.
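The operation sequence above can be sketched as a short processing loop. All coordinates and helper bodies below are illustrative stand-ins (the real Operations 3-7 involve the erosion-based marker detection of Section 2), so this is a sketch of the data flow, not the authors' implementation:

```python
import math

def detect_markers(frame_index):
    # Operations 3-7 stand-in: fixed markers except LAN, which we move to
    # simulate a facial movement (purely illustrative coordinates).
    lan = (80, 140) if frame_index == 0 else (84, 143)
    return {"CN": (100, 120), "LAN": lan, "RAN": (120, 140)}

def mutual_distances(points):
    # Operation 8: Euclidean distance between every pair of markers.
    names = sorted(points)
    return {(a, b): math.dist(points[a], points[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

def max_variation(n_frames):
    # Operation 9: maximum variation of each mutual distance over all frames.
    per_pair = {}
    for k in range(n_frames):
        for pair, d in mutual_distances(detect_markers(k)).items():
            per_pair.setdefault(pair, []).append(d)
    return {pair: max(ds) - min(ds) for pair, ds in per_pair.items()}

variation = max_variation(2)
print(variation[("CN", "RAN")])   # 0.0 -- this distance never changed
```

Operation 10's mobility decision then reduces to inspecting which variations exceed a noise threshold.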
2 Active contours models and interest points detection

There are two main approaches to active contour models, based on the mathematical implementation: snakes and level sets. Snakes explicitly move predefined snake points based on an energy minimization scheme, while level set approaches move contours implicitly, as a particular level of a function. As image segmentation methods, there are two kinds of active contour models according to the force evolving the contours: edge-based and region-based. Edge-based active contours use an edge detector, frequently based on the image gradient, to find the boundaries of sub-regions and to attract the contours to the detected boundaries.

The first active contour model was proposed by Kass, Witkin and Terzopoulos in [5] and named snakes due to the appearance of the contour evolution. This method originates in various erasing techniques, which started with the algorithms of Rosenfeld [8]. The same class of chain techniques is used in linear programming, for example in fast solving of a class of transportation problems [4]. The snake method consists of selecting a connected set of points, called snake points, which defines the contour. The snake points are initially placed at some distance from the boundary of the object. Then each point moves towards the optimum coordinates, according to some specific rule. The snake points eventually stop on the boundary of the object.

Classic snakes provide an accurate location of the edges only if the initial contour is given sufficiently close to the edges, because they use only the local information along the contour. Estimating a proper position of the initial contour without prior knowledge is a difficult problem. Also, classic snakes cannot detect more than one boundary simultaneously, because the snake maintains the same topology during the evolution. That is, snakes cannot split into multiple boundaries or merge from multiple initial contours.
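A minimal greedy variant of the snake idea can be illustrated as follows. For clarity only the external (edge-attraction) energy is kept, and the edge map is an analytic stand-in for a gradient-magnitude image; the internal continuity and curvature terms of the full Kass-Witkin-Terzopoulos model are deliberately omitted:

```python
import math

# Edge-attraction map peaked on a circle of radius 15 centered at (30, 30);
# a smooth stand-in for the gradient-magnitude image a real snake would use.
def edge_strength(y, x):
    r = math.hypot(y - 30.0, x - 30.0)
    return math.exp(-(r - 15.0) ** 2 / 20.0)

def snake_step(points):
    # One greedy iteration: each snake point moves to the 8-neighbor (or
    # stays put) with the strongest edge response.
    moved = []
    for y, x in points:
        candidates = [(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        moved.append(max(candidates, key=lambda p: edge_strength(*p)))
    return moved

snake = [(30, 5), (5, 30), (30, 55), (55, 30)]   # initial points, off the edge
for _ in range(25):
    snake = snake_step(snake)
print(snake)   # each point has stopped on the circle of radius 15
```

In the full model, the internal terms penalizing stretching and bending are what keep the contour smooth and connected during this evolution.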
For these reasons many authors use the level set method, first proposed by Osher and Sethian [7], [13]; a formulation implementing active contours in this framework is given in [11], [7]. Level sets represent a contour implicitly via a two-dimensional Lipschitz-continuous function ϕ(x, y) : Ω → R defined on the image plane Ω. This function results from solving a differential equation, which is digitized into a finite differences equation describing the frontier of the domain by level sets.

The model we use in this paper takes into account the pixel-by-pixel description of the image, originating in the fundamental image analysis literature [8]. Each image is represented as an n × m matrix, where n is the number of pixels in the horizontal direction and m the number of pixels in the vertical direction of the screen. So, the image is a subset of a grid Z2(h) = {(ih, jh) | i, j ∈ Z} over a rectangle from R2, with sides equal to the dimensions of the computer screen. The number h is given by the selected resolution, representing the length of the side of a pixel (without loss of generality, a pixel is considered to be a square). So, the structure in which we work is that of a totally bounded metric space embedded into the linear space R2. We also consider it as a topological space, with the topology induced by the metric. The number h is taken as the measurement unit for computing and decision making, and it is finally converted into the desired unit according to the requirements of the user.

A subset of an image, which is the object of our computation procedure, is, from a mathematical point of view, the digitization of a contractible topological space, i.e. a topological space that reduces to a point by means of a homotopy. The following concepts and properties are needed to describe the procedures. Let X ⊆ R2 be a contractible topological space.
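The implicit representation can be illustrated with a signed-distance function on a pixel grid (h = 1 here); the circle, the grid size and the constant-speed motion are illustrative assumptions:

```python
import numpy as np

h = 1.0                                    # pixel side, the measurement unit
yy, xx = np.mgrid[0:64, 0:64] * h
phi = np.hypot(yy - 32, xx - 32) - 20.0    # signed distance; zero set = circle of radius 20

def zero_level_pixels(phi):
    # A pixel lies on the implicit contour when it is inside (phi <= 0)
    # and at least one of its 4-neighbors is outside.
    inside = phi <= 0
    return inside & ~(np.roll(inside, 1, 0) & np.roll(inside, -1, 0) &
                      np.roll(inside, 1, 1) & np.roll(inside, -1, 1))

# For a signed-distance function, moving the front inward with unit speed
# amounts to phi -> phi + dt; five steps shrink the circle from r=20 to r=15.
for _ in range(5):
    phi += 1.0

radii = np.hypot(yy - 32, xx - 32)[zero_level_pixels(phi)]
print(round(float(radii.max()), 3))   # 15.0
```

Note that the contour is never stored explicitly: it is recovered from sign changes of ϕ, which is what lets level sets split and merge freely.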
We consider, without loss of generality, the digitization of the space X by means of the map f : R2 → Z2, f(x, y) = ([x + 1/2], [y + 1/2]), (x, y) ∈ R2, where [·] denotes the integer part.

If x(ih, jh) ∈ Z2(h) is a grid point, then its 4-neighborhood is the set V4(x) = {(ih, (j − 1)h), (ih, (j + 1)h), ((i − 1)h, jh), ((i + 1)h, jh)} and its 8-neighborhood is the set V8(x) = V4(x) ∪ {((i − 1)h, (j − 1)h), ((i + 1)h, (j − 1)h), ((i − 1)h, (j + 1)h), ((i + 1)h, (j + 1)h)}. The concepts of connectivity used in this paper are 4-connectivity (defined using V4(x), by horizontal and vertical lines) and 8-connectivity (defined using V8(x), by horizontal, vertical, 45° and 135° lines), by means of arcs, as defined in [8].

Proposition 1. The digital image f(X) of a contractible topological space X ⊆ R2 is an 8-connected set.

Proof. If X is contractible then it is path connected. Denote S = f(X) ⊂ Z2(h) and consider the 8-connectivity in S. Suppose that S has an isolated point a = (kh, lh), k ∈ N, l ∈ N, and consider the pixel with center a, i.e. the square D = [kh − h/2, kh + h/2) × [lh − h/2, lh + h/2). There is at least one point of X situated in D, say b; therefore f(b) = a. Since a is an isolated point of S, the grid points ((k − 1)h, (l − 1)h), ((k − 1)h, lh), ((k − 1)h, (l + 1)h), (kh, (l − 1)h), (kh, (l + 1)h), ((k + 1)h, (l − 1)h), ((k + 1)h, lh), ((k + 1)h, (l + 1)h), i.e. the 8-neighbors of a, are not in S = f(X), so none of the corresponding squares of side h contains points of X. (If S reduces to the single point a, it is trivially 8-connected.) Otherwise, a path in X from b to any point of X mapped outside the 8-neighborhood of a would have to cross one of these squares; hence X is not path connected, which contradicts the hypothesis that X is contractible. ⋄

Let S ⊂ Z2 be a finite subset. After removing a point x = (i, j) ∈ S from S, the number of connected components of S or of Z2 \ S may increase or decrease. Let us suppose that S is 4-connected (the case of 8-connectivity is similar). A.
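Proposition 1 can be checked numerically. The sketch below (h = 1, with a densely sampled disc standing in for the contractible set X) digitizes the set with f and counts 8-connected components by depth-first search:

```python
import math

def digitize(points):
    # f(x, y) = ([x + 1/2], [y + 1/2]) with h = 1: each point of X goes to
    # the nearest grid point.
    return {(math.floor(x + 0.5), math.floor(y + 0.5)) for x, y in points}

def v8(p):
    # The 8-neighborhood of a grid point (the point itself excluded).
    i, j = p
    return {(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)} - {p}

def count_8_components(S):
    # Depth-first traversal over 8-neighborhoods.
    S, comps = set(S), 0
    while S:
        comps += 1
        stack = [S.pop()]
        while stack:
            for q in v8(stack.pop()) & S:
                S.remove(q)
                stack.append(q)
    return comps

# A densely sampled disc of radius 4 centered at (5, 5): contractible, so
# Proposition 1 predicts a single 8-connected digital image.
disc = [(0.1 * i, 0.1 * j) for i in range(101) for j in range(101)
        if (0.1 * i - 5) ** 2 + (0.1 * j - 5) ** 2 <= 16]
print(count_8_components(digitize(disc)))        # 1
print(count_8_components({(0, 0), (5, 5)}))      # 2 (two isolated points)
```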
Rosenfeld proved in [8] that if S ⊂ Z2 is a finite subset, then S \ {x} does not have fewer 4-connected components than S if and only if V4(x) ∩ S ≠ ∅. Also, Z2 \ (S \ {x}) does not have more 8-connected components than Z2 \ S if and only if V8(x) ∩ (Z2 \ S) ≠ ∅. If x ∈ S is such that V4(x) ∩ S is contained in a 4-connected component of V8(x) ∩ S, then S \ {x} does not have more 4-connected components than S. Moreover, if S \ {x} does not have more 4-connected components than S, then V4(x) ∩ S is included in a 4-connected component of V8(x) ∩ S. As a consequence, the following concept is introduced in [8].

Definition 2. An element x ∈ S which satisfies the conditions: 1) V4(x) ∩ S ≠ ∅, 2) V8(x) ∩ (Z2 \ S) ≠ ∅, and 3) V4(x) ∩ S is included in a 4-connected component of V8(x) ∩ S, is called a 4-deletable element.

As a consequence of the above properties, Rosenfeld proved in [8]:

Theorem 3. Let S ⊂ Z2 be a finite set with the property that Z2 \ S has only one 8-connected component. Then S \ {x} has the same number of 4-connected components as S and Z2 \ (S \ {x}) has a unique 8-connected component if and only if x is 4-deletable.

In [8] the deletable elements of a set are classified into two classes: extremities and corners. A point x ∈ S is called an extremity if card(V4(x) ∩ S) = 1. A point x ∈ S is said to be a corner if card(V4(x) ∩ S) = 2. Every simple 4-connected set S with more than one element has at least two deletable elements. In the case of 8-connectivity, a point x ∈ S is said to be an extremity if V8(x) ∩ S has either exactly one element, or two 4-neighboring elements, or three 4-neighboring elements out of which only one is in V4(x). A point x ∈ S is said to be a corner if V8(x) ∩ S has exactly three, four or five 4-neighboring elements, out of which only two are in V4(x).

The diagnosis procedure consists of a succession of computer programs, as follows: i.
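Definition 2 translates directly into code. The sketch below (h = 1, integer grid) tests the three conditions on a small L-shaped set; the example set is our own illustration:

```python
def v4(p):
    # The 4-neighborhood of a grid point.
    i, j = p
    return {(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)}

def v8(p):
    # The 8-neighborhood of a grid point (the point itself excluded).
    i, j = p
    return {(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)} - {p}

def four_components(points):
    # Partition a small point set into its 4-connected components.
    points, comps = set(points), []
    while points:
        comp, stack = set(), [points.pop()]
        while stack:
            p = stack.pop()
            comp.add(p)
            for q in v4(p) & points:
                points.remove(q)
                stack.append(q)
        comps.append(comp)
    return comps

def is_4_deletable(x, S):
    # Definition 2: 1) V4(x) ∩ S nonempty, 2) V8(x) meets the complement
    # of S, 3) V4(x) ∩ S lies in one 4-connected component of V8(x) ∩ S.
    n4, n8 = v4(x) & S, v8(x) & S
    if not n4 or not (v8(x) - S):
        return False
    return any(n4 <= comp for comp in four_components(n8))

S = {(0, 0), (1, 0), (2, 0), (2, 1)}   # an L-shaped 4-connected set
print(is_4_deletable((0, 0), S))       # True  -- an extremity
print(is_4_deletable((1, 0), S))       # False -- removing it would split S
```

Removing (1, 0) would indeed leave {(0, 0)} and {(2, 0), (2, 1)} as two 4-connected components, which is exactly what condition 3 detects.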
Program for frame acquisition. This program divides the succession of pictures taken of the patient's face into frames, called images. The film is in AVI (Audio Video Interleaved) format and is divided into frames in PNG (Portable Network Graphics) format, in order to allow processing. The resolution of the film is 720 × 576 pixels, at 25 fps (frames per second). The frames are coded either on 8 bits (PNG-8) or on 24 bits (PNG-24), including transparency on 256 levels. The result is a collection of 4150 images, indexed by i ∈ {1, 2, ..., 4150}.

ii. Program for the detection of the mimic interest centers on the patient's face and the computation of the mutual distances between them. The pixel-by-pixel erosion algorithm below presents the manner of processing each image in order to detect the coordinates of the central pixel representing each marker on the patient's face, as well as the manner of validation and storage of each marker. The search for these points is optimized: the algorithm crops the margins, which do not offer relevant information, thereby reducing the number of color tests. Each marker center is the center of a USAN area, which is approximately elliptical (circular) in shape. The USAN area is detected by pixel-by-pixel color tests, which detect the contour, and each USAN area is erased following its contour. The erosion means changing the color from the artificial USAN area color into the face color. After determining the coordinates of the centers of all USAN areas in an image, the mutual distances between these pixels are computed and stored in a database. The database contains all these distances taken from all 4150 images in our collection.

iii. Program for computing the difference between an image and the initial one. Initially the patient's face is in a relaxed position, which gives the so-called initial image. In order to take the other images of the collection, the patient is asked to perform various movements of the face.
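A simplified sketch of the marker-detection step in program ii: the color test is reduced to a binary marker/face label, the USAN area is traversed by flood fill rather than by explicit contour following, and the erosion repaints visited pixels in the face color. The blob positions and sizes are illustrative:

```python
import numpy as np

def detect_and_erase_usans(img, marker=1, face=0):
    # Scan the image pixel by pixel; on hitting a marker-colored pixel,
    # flood-fill its USAN area, record the area's center, and erode the
    # area by repainting it in the face color (img is modified in place).
    centres = []
    H, W = img.shape
    for y in range(H):
        for x in range(W):
            if img[y, x] != marker:
                continue
            area, stack = [], [(y, x)]
            img[y, x] = face                     # erosion starts here
            while stack:
                cy, cx = stack.pop()
                area.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < H and 0 <= nx < W and img[ny, nx] == marker:
                        img[ny, nx] = face       # erode as we go
                        stack.append((ny, nx))
            ys, xs = zip(*area)
            centres.append((round(sum(ys) / len(ys)), round(sum(xs) / len(xs))))
    return centres

# Two circular "USAN" blobs on a face-colored background.
img = np.zeros((40, 40), dtype=int)
yy, xx = np.mgrid[0:40, 0:40]
img[(yy - 10) ** 2 + (xx - 10) ** 2 <= 9] = 1
img[(yy - 28) ** 2 + (xx - 30) ** 2 <= 9] = 1
centres = detect_and_erase_usans(img)
print(centres)   # [(10, 10), (28, 30)]
```

After the call the image contains no marker-colored pixels, which is what guarantees each USAN is counted exactly once.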
The differences between the corresponding distances between the mimic interest centers indicate the existence of a displacement on the face from one image to another. The maximum positive difference and the minimum negative difference are selected from the database, together with the images in which they appear. These images are presented to the physician, together with the numerical results, converted into the required units.

iv. Decision on the movement variability. The decision is taken by analyzing the mutual distances and comparing them from one image to the next. If the distance from one center, for example CN, to another one, for example RAN, does not change, while the distances from LAN to LTR, CN and RAN all vary, then it follows that LAN is the moving point, i.e. the patient has moved his or her face. The decision is also stored for further investigation and comparison, in order to assess the evolution of the patient's face mobility at various stages of the treatment. Algorithm 1 detects the markers LTR, RTR, CN and the others; each marker is searched for in an area of 40 × 60 pixels.

3 Numerical results

As a result of Operation 6 one obtains a collection of distinct points on the face of the patient. Usually physicians measure the distance between each pair of these points with a ruler, i.e. using the Euclidean distance in two dimensions; this is the reason we use this distance in our experiment. Increased accuracy might be obtained with another kind of metric, for example the Earth Mover's Distance, introduced in the computer vision community in 1997 by Rubner, Guibas and Tomasi [10], [9] to overcome inconsistencies with perceptual similarity observed in (weighted) Lp-norms and quadratic forms.

Figure 2: Detection of the points on the patient's face

Table 1: Mutual distances between the interest points of face mimic from Figure 2

  Interest point pair    Mutual distance
  RTR↔RBP                55.569776
  RTR↔RUE                49.497475
  RTR↔RLE                42.438190
  RTR↔RAN                49.091751
  LTR↔LBP                53.150729
  LTR↔LUE                46.400431
  LTR↔LLE                40.718546
  LTR↔LAN                46.097722
  CN↔RUE                 29.000000
  CN↔RLE                 31.016125
  CN↔RAN                 40.853396
  CN↔LUE                 23.000000
  CN↔LLE                 25.495098
  CN↔LAN                 34.014703
  RBP↔LBP                61.008196
  RBP↔RUE                14.764823
  RUE↔LBP                55.317267
  RUE↔RLE                11.000000
  RLE↔RAN                30.413813
  RLE↔LUE                53.150729
  LBP↔LUE                12.165525

The innovation of the paper consists in a more elaborate face model which allows the quantification of facial palsy. Based on this model, a new scoring system for mimic function recovery is developed. More accurate results, relevant for plastic surgery as well as for face models in computer vision, are obtained. Another advantage of the method described in this paper is the real-time data acquisition and processing. A fast, high-resolution, real-time 3D video system is used to acquire data from patients, who are filmed and analyzed before and after any kind of treatment in a standardized pre-treatment and follow-up scheme. If a patient is repeatedly analyzed in the manner described above, a database is constructed which exactly documents and visualizes the recovery course of the facial palsy. The system is able to recognize shape, texture, movements and illumination conditions of the face, and it acquires and processes the patient's data automatically. This simplifies the measuring process and, at the same time, enhances the clinical methodology in terms of diagnostic and prognostic accuracy and of convenience for patients.
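The distance bookkeeping of Operation 8 and the mobility decision of step iv can be sketched as follows; the marker coordinates and the tolerance `tol` are illustrative assumptions, not values from the paper:

```python
import math
from itertools import combinations

def mutual_distances(markers):
    # Operation 8: Euclidean ("ruler") distance between every pair of markers.
    return {(a, b): math.dist(markers[a], markers[b])
            for a, b in combinations(sorted(markers), 2)}

def moving_points(frame0, frame1, tol=0.5):
    # Step iv: a marker is declared mobile when *all* of its distances to
    # the other markers changed, while fixed markers keep at least one
    # unchanged mutual distance (tol absorbs detection noise).
    d0, d1 = mutual_distances(frame0), mutual_distances(frame1)
    changed = {pair for pair in d0 if abs(d1[pair] - d0[pair]) > tol}
    return {p for p in frame0
            if all(pair in changed for pair in d0 if p in pair)}

relaxed = {"CN": (0, 0), "LAN": (10, 5), "RAN": (-10, 5)}
smiling = {"CN": (0, 0), "LAN": (14, 7), "RAN": (-10, 5)}
print(moving_points(relaxed, smiling))   # {'LAN'}
```

Here CN↔RAN is unchanged, so CN and RAN are classified as fixed, and only LAN, whose distances to both other markers vary, is reported as mobile.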
For example, the mutual distances between the mimic interest points on the face from Figure 2 are listed in Table 1. They are presented to the physician, to be used in the medical diagnosis process. In this case the color to detect, i.e. the markers' color, is green, and the function to maximize is a three-dimensional vector-valued one, obtained from a normalized RGB description in which the first coordinate refers to green. The order relation on R3 is the lexicographic one.

An important consequence of the technique described in this paper is the automatic evaluation of the degree of the patient's facial palsy. The system is able to directly perform 3D face recognition, thus supporting the medical staff in exactly planning the patient's medical procedures. The models allow determining the position and the movements of the facial features with high accuracy, avoiding model bias when fitting to the particularities of individual faces and diseases. The procedure described in this paper sets up a state-of-the-art standardized 3D facial palsy grading system, which is used to monitor and document the post-treatment course of facial palsy patients and to compare surgical outcomes from different centers around the world. In fact, we propose software that may lead to a standardized grading system and to the comparison of 3D human face data.

The detection and tracking techniques may be extended from image data to video data. Since video is more complex, even more efficient retrieval techniques need to be developed, using a multi-projection system as in [6] and applying the powerful perception-oriented Earth Mover's Distance to video data.

4 Conclusions and Future Works

Detection in large image or other multimedia databases highly depends on the underlying similarity model. In future work, we plan to extend our detection and tracking techniques from image data to video data.
Since video is more complex, even more efficient retrieval techniques need to be developed, using a multi-projection system as in [6]. We plan to employ generative face models (i.e., models that can produce synthetic face images), which can be used for the visualization of the prognosed appearance of the patient's mimic and facial movements. Predictions of facial movements can be based on the statistical analysis of empirical data, such as the past progression (e.g., preoperative/postoperative appearance) of the facial movements of the subject or of the class of subjects sharing similar pathologies. The ability to generate synthetic images of a face can further be exploited for biofeedback training of the mimic musculature. Another important aspect of future work will be the ability to quantify and assess mimic function by spontaneous facial movements as well.

Bibliography

[1] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino, 2D and 3D face recognition: A survey, Pattern Recognition Letters, 28(14):1885-1906, 2007.

[2] M. Frey, A. Jenny, P. Giovanoli, and E. Stussi, Development of a new documentation system for facial movements as a basis for the international registry for neuromuscular reconstruction in the face, Plast Reconstr Surg, 93(7):1334-1349, 1994.

[3] M. Găianu, Contributions to the research of some problems of image recognition and retrieval (in Romanian), Thesis, West University of Timişoara, 2011.

[4] P. L. Hammer and S. Rudeanu, Boolean methods in operations research and related areas, Springer-Verlag, 1968.

[5] M. Kass, A. Witkin, and D. Terzopoulos, Snakes: Active contour models, International Journal of Computer Vision, 1(4):321-331, 1988.

[6] D. Onchis and H. Feichtinger, Constructive reconstruction from irregular sampling in multi-window spline-type spaces, General Proceedings of the 7th ISAAC Congress, London, 2010.

[7] S. Osher and J. A. Sethian,
Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations, Journal of Computational Physics, 79(1):12-49, 1988.

[8] A. Rosenfeld, Connectivity in Digital Pictures, J. ACM, 17(1):146-160, 1970.

[9] Y. Rubner, L. Guibas, and C. Tomasi, The Earth Mover's Distance, Multi-Dimensional Scaling, and color-based image retrieval, In Proc. of the ARPA Image Understanding Workshop, 661-668, 1997.

[10] Y. Rubner and C. Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer Academic Publishers, 2001.

[11] J. A. Sethian, A fast marching level set method for monotonically advancing fronts, Proceedings of the National Academy of Sciences of the United States of America, 93(4):1591-1595, 1996.

[12] J. A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, Cambridge University Press, 1998.

[13] J. A. Sethian, Level Set Methods and Fast Marching Methods, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 2nd ed., 1999.

[14] S. M. Smith and J. M. Brady, SUSAN - a new approach to low level image processing, Int. Journal of Computer Vision, 23(1):45-78, 1997.