ap-4-10.dvi


Acta Polytechnica Vol. 50 No. 4/2010

Analysis of Feature Point Distributions for
Fast Image Mosaicking Algorithms

A. Behrens, H. Röllinger

Abstract

In many algorithms the registration of image pairs is done by feature point matching. After the feature detection is
performed, all extracted interest points are usually used for the registration processwithout further feature point distribution
analysis. However, in the case of small and sparse sets of feature points of fixed size, suitable for real-time image mosaicking
algorithms, a uniform spatial feature distribution across the image becomes relevant. Thus, in this paper we discuss and
analyze algorithms which provide different spatial point distributions from a given set of SURF features. The evaluations
show that a more uniform spatial distribution of the point matches results in lower image registration errors, and is thus
more beneficial for fast image mosaicking algorithms.

Keywords: feature detection, point distribution, image registration, image mosaicking.

1 Introduction
Alignmentandstitchingof images into seamlessphoto-
mosaics is most widely used in computer vision [1].
Besides the approach to minimize pixel-to-pixel dis-
similarities directly, a common way to combine im-
age pairs into one image composition is performed us-
ing only sparse sets of interest points. Here, distinc-
tive feature points are first extracted and described
from the images, and then matched using a similar-
ity measurement. Many different feature-based algo-
rithms, like SIFT [2], SURF [3], GLOH [4], MOPs [5]
or [6] have been proposed for extracting distinctive
image features. Their robustness, repeatability and
invariance to different illumination changes and im-
age transformations have also been widely evaluated
by [4, 7, 8, 9].
On the one hand, a high level of distinctiveness of

the features, expressed by their descriptors and the
filter response values of the detector, leads to robust
matching and precise image alignment. On the other
hand, these characteristics alone do not always result
in a low registration error, since the spatial feature
distribution in the images is also relevant. Images
which showonly a high contrast in local image regions
(see Fig. 1) usually create only clusters of strong fea-
tures and provide an overall non-uniform spatial fea-
ture distribution. Since the estimation of a global im-
age transformationbetweenagiven imagepair is based
on matched feature points, image regions with many
interest points will create only small registration er-
rors, whereas the error in regionswith feature clusters
becomes larger. Thus, only the combinationofdistinc-
tive and also spatially well uniformed distributed fea-
ture points leads to good global image transformation
with low registration errors. While in many natural

image scenes feature points can usually be extracted
across the whole image, spatial distribution is often
a relevant consideration for medical images, e.g. en-
doscopic images of the internal urinary bladder wall.
These images often show only sparsely located struc-
tures with high contrast, e.g. vasculature or lesions
(see Fig. 1), and thus impede robust image mosaick-
ing.

Fig. 1: Left: Landscape with a high contrast region in
the lower right corner. Right: A noisy and low contrast
endoscopic bladder image showing a lesion

During cystoscopy, where a rigid endoscope is in-
troduced into the bladder, an inspection and analysis
of bladder cancer, the 4th most common disease with
about 52810 new cases among males in the United
States estimated in 2009 [10], is performed by a physi-
cian based on endoscopic images captured by a video
endoscope system. Since sufficient illumination of the
bladderwall is onlyprovided if the endoscope is guided
close to thebladderwall, the field of view is very small.
This results indifficulties in orientationandnavigation
for the physician. To provide navigation assistance
during cystoscopy, a feature-based mosaicking algo-
rithm composing local or global panoramic overview
images from single video images has been proposed

12


Acta Polytechnica Vol. 50 No. 4/2010

by [11, 12, 13, 14]. For this application, a fixed num-
ber of feature points is required for real-time process-
ing [14]. It is therefore necessary to generate an ade-
quate sample set of the extracted features list for ro-
bust image mosaicking.
While some registration algorithms have been

developed that create feature distributions based
on local non-maximum suppression [5] or space-
partitioning [15], these different techniques have not
been evaluated. Thus, in this paper we analyze these
different feature distribution algorithms and evaluate
their average image registration errors.
The paper is organized as follows: In section 2 the

mosaicking algorithm is described. The distribution
algorithms for selecting a feature sample set are de-
scribed in section 3. Section 4 presents the evaluated
data, and the conclusions and futurework are given in
section 5.

2 Mosaicking algorithm

In the following sections only a brief overview is given
of the mosaicking algorithm. A more detailed de-
scription can be found in [12, 14]. During the mo-
saicking process, image pairs of a captured video se-
quence are sequentially stitched and blended to com-
pose a successively growing panoramic overview im-
age. When endoscopic images are used, the lens dis-
tortions of the fish-eye optics are compensated in a
preprocessing step. Then, distinctive feature points
are extracted and described by their local neighbor-
hood region. Based on these sparse feature sets, point
correspondences arematched between two subsequent
images. Thus, a global image-to-image transforma-
tion, a so-called homography, is estimated. The two
images are registered by applying the transformation.
Finally, the overlap regions are blended by an interpo-
lation algorithm. Iteratively, all images of the video
sequence are then sequentially processed to build the
final panoramic overview image.

2.1 Feature detection

Feature extraction is performed by the Speeded Up
RobustFeatures (SURF) detector [3]. Based ona very
basic approximationof theHessianmatrix, the feature
strength is calculated by

det(H)=
∥∥∥∥∥
[

Lxx Lxy

Lxy Lyy

]∥∥∥∥∥ ≈
DxxDyy − (0.9 · Dxy)2,

(1)

where Lxx represents the convolution of the Gaussian

second order derivative
∂2

∂x2
G with the input image.

Instead of iteratively reducing the image size, integral
images and scalable box filters Dxx, Dyy, Dxy are used
to speed up the blob filter response. The distinctive

feature points are localized in the space domain and
over scales by non-maximum suppression. Feature de-
scriptors are then extracted ina square regioncentered
around the point of interest. Split up into smaller
4 × 4 square sub-regions, the Haar wavelet responses
in horizontal hx and vertical direction hy are calcu-
lated and composed to a four-dimensional descriptor

vector �d4D =
(∑

hx,
∑

hy,
∑

|hx|,
∑

|hy|
)
. The

concatenation of the sixteen sub-regions then results
in a descriptor vector �d of length 64.

2.2 Matching and registration

Thepoint correspondencesbetween two imagesarede-
termined by matching the SURF feature descriptors.
Based on the least similarity measurement

Δ�dij =min
j

∣∣∣∣�di − �dj∣∣∣∣2, (2)
a feature descriptor �di of the first image is compared
to all feature descriptors �dj of the second imagewithin
the 64 dimensional feature space, and vice versa. The
minimum squared error Δ�dij then leads to the best
point match �di ↔ �dj. To increase the distinctive-
ness and robustness of the point correspondences, only
matches with the ratio

Δ�dij

Δ�d2ndij
< τ (3)

between the best and the second best match are con-
sidered.
After single points of an image pair are matched,

the homography H is estimated. The applied affine
transformationmodel provides six degrees of freedom,
parameterized by a translation vector �tT = (tx, ty),
rotation angle α, scales sx and sy, and skew a. In
homogeneous coordinates, the homography matrix H
can be written as

H=

⎡
⎢⎢⎣

a b c

d e f

0 0 1

⎤
⎥⎥⎦ =

[
A �t
�0T 1

]
(4)

with

A=

[
1 a

0 1

][
sx 0

0 sy

][
cos(α) −sin(α)
sin(α) cos(α)

]
. (5)

To ensure robust homography estimation, unavoid-
able false point correspondences are rejected by the
RANSAC (RANdom SAmple Consensus) model fit-
ting algorithm[16]. Basedon iterativelyandrandomly
selected point correspondences �p, a homography Ĥ is
estimated and the number of inliers is determined by
the threshold operation∣∣∣∣�pi − Ĥ · �pj∣∣∣∣2 < d. (6)

13


Acta Polytechnica Vol. 50 No. 4/2010

If point correspondences �pi ↔ �pj satisfy eq. 6 with es-
timation Ĥ and the given pixel distance d, they are
marked as inliers. Finally, the estimated homography
Ĥ with the highest number of inliers is selected as the
final image transform and is used for registration by
warping the second image into the reference system of
the first image.

3 Feature selection
Since the computational complexity of the SURF de-
tector increases linearlywith number of featurepoints,
the limitation to a fixed number is often desired for
real-time mosaicking algorithms [14]. Also calculat-
ing of the feature descriptors is computationally more
intensive than locating the feature and the filter re-
sponse value itself (eq. 1). Thus, feature extraction
is separated into two steps. First, the SURF feature
detector is applied to the whole image to detect the
locations and the strengths of potential interest points
based on the filter response values. Then a sample
set with a fixed number of the points is chosen from
all the point candidates. To assure constant compu-
tational complexity, only these features are then char-
acterized by the 64-dimensional SURF descriptor and
passed to the matching process. Since the usage of
fewer feature points leads to less robust homography
estimations and registration results, the selection of an
adequate and representative sample set is very impor-
tant.

3.1 Top N selection

A straightforward approach is to select the first N-th
strongest feature points from the whole set of interest
points. For each feature point, the location and the
response value of the blob filter is calculated accord-
ing to eq. 1, and is used for comparison. After the
extracted interest points have been sorted in descend-
ing order of their filter response values, the first N-th
feature points are selected. Since this algorithmgener-
ates only a sample set based on the filter response, no
additional information about the spatial distribution
of the feature points is exploited.

3.2 Adaptive non-maximal
suppression

To select a fixed number of interest points from im-
ages which are local maxima and whose response val-
ues are also significantly greater than those of all of
their neighborswithin radius r, Brown et al. [5] devel-
oped an adaptive non-maximal suppression (ANMS)
strategy. Conceptually, this can be performed by ini-
tializing the suppression radius r =0andthen increas-
ing it until the desired number of N feature points is
obtained. In practice, this operation can be done by
first sorting all local maxima by their response values,

and then creating a second list sorted by decreasing
suppression radius. The first entry in the list repre-
sents the global maximum, which is not suppressed
at any radius. As the radius decreases from infinity,
feature points are added to the list. To increase the
robustness, a second constraint is defined, which re-
quires that a neighbor feature has sufficiently greater
strength. Thus, the minimum suppression radius ri is
determined by

ri =min
j

|�xi − �xj|,

with f(�xi) < c · f(�xj), �xj ∈ I,
(7)

where �xi describes the position (x, y) of the feature
point, f(�xi) its strength, and I the set of all fea-
ture point positions. Theparameter c =0.9 represents
the robustness factor, which adjusts how significantly
greater the strength of a neighbor feature must be for
suppression to take place. For each feature point its
radius is then determined by eq. 7, and the feature list
is sorted by the radius in descending order. Then, the
first N entries of the list are selected. This sample set
now provides features which are spatially distributed
across the whole image (cf. Fig. 4 (right column)).

3.3 k-d tree partitioning

Another selection method that considers the spatial
data distribution is based on space-partitioning of
high-dimensional data sets. Cheng et al. [15] devel-
oped an algorithm using a 2-dimensional k-d tree.
Here, the feature points are separated into M rect-
angular image regions and cells, respectively. In a re-
cursivemanner, eachpartition cuts the regionwith the
current highest variance in two, using themediandata
value as the dividing line. The position of the divid-
ing line is stored as the node’s data. For each non-
terminal node of the binary tree, the feature points of
the related region are then handed over to the left and
right child node, respectively, until the given number
of M cells is obtained. Finally, each leaf node con-
tains a list of the features which are located within
the related image partition. An example is given in
Fig. 2. From each cell �N/M�, the strongest features
are selected as the output sample set, cf. Fig. 4 (mid-
dle column). While Cheng [15] uses a balanced tree,
which limits the number of leaf nodes to be a power of
two,we have extended the algorithm to handle any in-
teger number of points. Since only one feature is taken
from each cell, the partitions are divided until the de-
sired number of features N is obtained. Thus, a direct
comparisonbetween k-d tree partitioning,ANMS, and
top N selection becomes more feasible.

4 Evaluation and results
The three feature distribution algorithms (top N,
ANMS, and k-d tree) are evaluated using three data

14


Acta Polytechnica Vol. 50 No. 4/2010

0  

5
 

4

3

2

1

 1 2 3 4 5

x = 2.6 

y = 2.275 y = 2 

x = 1.1 x = 1 x = 4.1 x = 4.15 

(0.8 2.1) (1.4 0.2) (0.5 4.2) (1.5 3.9) (3.7 1.4) (4.5 0.9) (4.8 3.2)(3.5 3.5)

Fig. 2: Above: 2-dimensional k-d tree partitions with fea-
ture points. Below: k-d tree

tree-clinic

houses

bladder

Fig. 3: Data sets used for evaluation

sets (Fig. 3) which show varying contrast across the
whole image. Each image pair represents one scene
from two slightly different point of views, but still pro-
viding a large overlap region. Since the images are not
synthesized, no ground truthdata of the correct image
composition is given. Thus, for each image pair con-
trol points �p′i ↔ �p′j are manually set and matched.
Based on these points, a homography is also estimated
according to eq. 4, and the registration error

e =
1
2

(∣∣∣∣�p′i −Ĥ · �p′j∣∣∣∣2 + ∣∣∣∣�p′j −Ĥ−1 · �p′i∣∣∣∣2
)
(8)

is calculated. These error values are used as references
in later evaluations.
To evaluate different feature distributions, SURF

features are first extracted and listed for each data
set. Each algorithm of section 3 is then applied to the
point lists to generate a sample set of only N feature
points, resulting in different point distributions. Rep-
resentative examples for each data set are shown in
Fig. 4. The figure shows, that the selection of the top
N strongest features leads to single point clusters in
all three data sets. In example, many SURF features
show a high filter response value in the tree region of
the tree-clinic sequence, since the dark branches have
a high contrast against the bright background. By
contrast, the k-d tree and also the ANMS algorithm
provide amore spatiallyuniformdistributed set of fea-
ture points across the whole image and the relevant
overlap regions, respectively. As shown in Fig. 4(c),
more feature points are also located in the vasculature
in the image center in the case of k-d tree partition-
ing and ANMS. Since this region is also present in
the overlap region of the bladder sequence (cf. Fig. 3),
consideration of these features during the homography
estimation should provide a small registration error.
After the different feature lists have been gener-

ated, they are passed to the matching process, and
a global image-to-image transformation for each sam-
ple set is estimated, as described in section 2.2. For
a quantitative evaluation, the homographies are then
applied to the ground truth control points �p′i ↔ �p′j
and the final registration errors are calculated using
eq. 8. Fig. 5 shows the reproduction error character-
istics for each distribution method over the selected
number of feature points. Since the RANSAC algo-
rithm provides robust point matches and rejects out-
liers on the basis of a randomprocess,mean error val-
ues are calculated. Thus, Fig. 5 shows the averaged
registration errors and the number of inliers over the
number of feature points for eachdata set. The results
of the tree-clinic and houses sequence are averagedby
15 measurements, and the graphs of the bladder se-
quence are based on 20 measurements.
As can be seen from the characteristics, the mean

registration errors calculated from feature points dis-
tributed by k-d tree partitioning and ANMS are usu-
ally smaller than with the top N method. Especially
for a small number of points, the error difference in-
creases. At higher numbers, the different algorithms
provide almost the same registration errors. This is
obvious, since a larger point set usually leads to a spa-
tial distributionwith a higher variance. The results of
the tree-clinic sequence also show that robust match-
ing of the first N strongest features is not feasible until
at least aminimum number of N =210 feature points

15


Acta Polytechnica Vol. 50 No. 4/2010

tree-clinic sequence with 200 features points

houses sequence with 50 features points

bladder sequence with 80 features points

Fig. 4: For each data set top N selection (left), k-d tree partitioning (middle), and ANMS (right) is performed

is obtained, but still resulting in high error. By con-
trast, the variance of the mean errors of the k-d tree
andANMS sample sets ismuch smaller. Although the
registration errors of k-d tree partitioning and ANMS
are almost equal in all image sequences, the number of
inliersused for robusthomographyestimation ishigher
in the case of ANMS. Against this, the complexity of
ANMS with up to O((N − 1)!) is much higher than
O(logN)) of the k-d tree. Thus, the integration of
the k-d tree partitioning in fast and real-time image
mosaicking algorithms like [14] should be preferred.

5 Conclusions
An evaluation of the ANMS and k-d tree algorithms,
which provide a spatially well uniformed feature dis-
tribution, has shown that the mean error of an im-
agepair registrationcanbe greatly reduced, compared
to a sample set based on strongest feature selection.
This difference becomes significant in images show-
ing regions of different varying contrast. For appli-

cations that require a fixed or limited number of inter-
est points, e.g. real-time image mosaicking algorithms
for medical computer assistance systems, the usage of
feature distribution algorithms is also beneficial. In
future work, further evaluationswill be made onmore
medical imagedata sets and timemeasurements of the
distribution algorithms integrated in a real-time image
mosaicking algorithm for fluorescence endoscopy. The
generation of feature point sets based on an adaptive
algorithm considering the feature strengths and their
distribution according to the current image informa-
tion will also be analyzed.

Acknowledgement

We would like to thank Prof. Dr.-Ing. Til Aach, Insti-
tute of Imaging andComputerVision, RWTHAachen
University, Germany, for supervising this project.
Additionally we would like to thank Olympus &

Winter Ibe GmbH for providing the endoscopic image
sequences.

16


Acta Polytechnica Vol. 50 No. 4/2010

tree-clinic

houses

bladder

Fig. 5: Mean registration errors (black) and numbers of inliers (red) of each distribution method (Top N, ANMS, k-d Tree)
over the number of feature points. The registration errors using the ground truth homography are highlighted in green

17


Acta Polytechnica Vol. 50 No. 4/2010

References

[1] Szeliski, R.: Image Alignment and Stitching: A
Tutorial, Microsoft Research, 2006, Tech. Rep.
MSR-TR-2004-92.

[2] Lowe, D.: Distinctive image features from scale-
invariant keypoints, In International Journal of
ComputerVision, 2004, vol.60, no. 2, pp. 91–110.

[3] Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.:
SURF: Speeded Up Robust Features, In Com-
puter Vision and Image Understanding (CVIU),
2008, vol. 110, no. 3, pp. 346–359.

[4] Mikolajczyk,K., Schmid,C.: Aperformanceeval-
uation of local descriptors, In Pattern Analysis
andMachine Intelligence, IEEETransactions on,
2005, vol. 27, no. 10, pp. 1615–1630.

[5] Brown, M., Szeliski, R., Winder, S.: Multi-Image
Matching using Multi-Scale Oriented Patches,
In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR’05), 2005, vol. 1,
pp. 510–517.

[6] Shi, J., Tomasi, C.: Good Features to Track, In
IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR’94), 1994, pp. 93–600.

[7] Dorko, G.: Selection of Discriminative Regions
and Local Descriptors for Generic Object Class
Recognition, PhD thesis, 2006.

[8] Schmid, C., Mohr, R., Bauckhage, C.: Evalua-
tion of Interest Point Detectors, In International
Journal of Computer Vision, 2000, vol.37, no. 2,
pp. 151–172.

[9] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zis-
serman,A.,Matas, J., Schaffalitzky,F.,Kadir,T.,
VanGool, L.: AComparisonofAffineRegionDe-
tectors, In Int. J. Comput. Vision, 2005, vol. 65,
no. 1–2, pp. 43–72.

[10] AmericanCancer Society: Cancer Facts and Fig-
ures 2009, 2009.

[11] Miranda-Luna, R., Daul, C., Blondel, W.,
Hernandez-Mier,Y.,Wolf,D., Guillemin, F.: Mo-
saicking of Bladder Endoscopic Image Sequences:
Distortion Calibration and Registration Algo-
rithm, In Biomedical Engineering, IEEE Trans-
actions on, 2008, vol. 55, no. 2, pp. 541–553.

[12] Behrens, A.: Creating Panoramic Images for
Bladder Fluorescence Endoscopy, In Acta Poly-
technica Journal of Advanced Engineering, 2008,
vol. 48, no. 3, pp. 50–54.

[13] Behrens,A., Stehle,T.,Gross, S.,Aach,T.: Local
and Global Panoramic Imaging for Fluorescence
Bladder Endoscopy, In Engineering in Medicine
and Biology Society, EMBC 2009. 31th Annu.
Int. Conf. of the IEEE, 2009, pp. 6990–6993.

[14] Behrens, A., Bommes, M., Stehle, T., Gross, S.,
Leonhardt, S., Aach, T.: A Multi-Threaded
Mosaicking Algorithm for Fast Image Composi-
tion of Fluorescence Bladder Images, In Proceed-
ings SPIE Medical Imaging 2010: Visualization,
Image-Guided Procedures, and Modeling, 2010,
vol. 7625.

[15] Cheng, Z., Devarajan, D., Radke, R. J.: Deter-
mining vision graphs for distributed camera net-
worksusing feature digests, InEURASIPJournal
on Adcances in Signal Processing, 2007, Article
ID 57034.

[16] Fischler, M. A., Bolles, R. C.: Random Sample
Consensus: A Paradigm for Model Fitting with
Applications to Image Analysis and Automated
Cartography,Communications of theACM, 1981,
vol. 24, no. 6, pp. 381–395.

About the authors

Alexander BEHRENS was born in Bückeburg,
Germany in 1980. He received his Dipl.-Ing. degree
in electrical engineering from theLeibnizUniversity of
Hannover, Hannover, Germany, in 2006. After receiv-
ing the degree, he worked as a Research Scientist at
the university’s Institut für Informationsverarbeitung.
Since 2007, he hasbeen aPh.D. candidate at the Insti-
tute of Imaging andComputerVision, RWTHAachen
University, Aachen, Germany. His research interests
are inmedical imageprocessing, signalprocessing,pat-
tern recognition, and computer vision.

HendrikRÖLLINGERwasborn in1983 inAachen,
Germany. He has been studying Computer Engineer-
ing at RWTH Aachen University since 2003, and will
graduate with a Dipl.-Ing. degree in 2010.

Alexander Behrens
Hendrik Röllinger
E-mail: alexander.behrens@lfb.rwth-aachen.de,
hendrik.roellinger@lfb.rwth-aachen.de
Institute of Imaging & Computer Vision
RWTH Aachen University
52056, Aachen, Germany

18