Creating Panoramic Images for Bladder Fluorescence Endoscopy

A. Behrens

The medical diagnostic analysis and therapy of urinary bladder cancer based on endoscopes are state of the art in urological medicine. Due to the limited field of view of endoscopes, the physician can examine only a small part of the whole operating field at once. This constraint makes visual control and navigation difficult, especially in hollow organs. A panoramic image, covering a larger field of view, can overcome this difficulty. Directly motivated by a physician, we developed an image mosaicing algorithm for endoscopic bladder fluorescence video sequences. In this paper, we present an approach which is capable of stitching single endoscopic video images into a combined panoramic image. Based on SIFT features we estimate a 2-D homography for each image pair, using an affine model and an iterative model-fitting algorithm. We then apply the stitching process and perform a mutual linear interpolation. Our panoramic image results show a correct stitching and lead to a better overview and understanding of the operation field.

Keywords: Image mosaicing, stitching, panoramic, bladder, endoscopy, cystoscopy, fluorescence, photodynamic diagnosis, PDD.

1 Introduction

Cancer of the urinary bladder is the fourth most common malignancy among males and one of the top eight cancers among women in industrial countries. According to the American Cancer Society, 68,810 new cases and 14,100 deaths are estimated for 2008 in the United States [1]. Bladder cancer occurs most commonly in individuals over 60 years of age. The major risk factors are cigarette smoking and exposure to aromatic amines, which are used in the chemical and dye industry.

Bladder cancer can be diagnosed during a cystoscopy, in which an endoscope is introduced through the urethra into the bladder, which is filled with isotonic saline solution. Malignant tissues of the bladder wall can then be removed with endoscopic tools, e.g. a resectoscope cutting loop. Under white light illumination, small and flat tumors whose structures do not differ strongly from the surrounding tissue are difficult to recognize and can thus be overlooked during therapy. To reduce this risk, the visualization of tumor tissue can be improved by a photodynamic diagnosis (PDD) system. This technology uses imaging with fluorescent light, which is activated by the marker substance 5-aminolaevulinic acid (5-ALA) accumulated in malignant tissue. Thus, the contrast between tumor and benign tissue is enhanced, permitting easier differentiation, as illustrated in Fig. 1.

Fig. 1: Papillary tumors in different illuminations (white light vs. PDD)

A common disadvantage during an endoscopy is the limited field of view of the endoscope. The physician can examine only a small part of the whole operating field at once. This causes difficulties in navigation, especially in hollow organs. A panoramic image, in contrast, provides an overview of the whole region of interest and links images taken from different angles of view. This additional information facilitates visual control and navigation, especially during a cystoscopy, and can be documented in medical evidence protocols.

We have therefore developed an image mosaicing algorithm which stitches single images of a PDD bladder video sequence and finally provides an expanded panoramic image of the urinary bladder.

This paper is organized as follows: In section 2 we discuss the panorama algorithm in detail. Further optimizations are given in section 3. Section 4 describes the results and perspectives of the algorithm. Finally, section 5 summarizes the proposed approach of our image mosaicing algorithm for endoscopic PDD images.

2 Image mosaicing algorithm

The image mosaicing algorithm processes single endoscopic PDD images provided by a video sequence. First, in a preprocessing step we separate the relevant image information of the input images. Then the SIFT features [4] of two images are detected and matched. To refine the feature point correspondences we adapt and apply the RANSAC algorithm [7] to reject outliers. Subsequently we stitch the two images together and interpolate the overlapping region using a linear cross blending method. We then apply our algorithm iteratively to the next input images. Finally, a complete panoramic image of the bladder is built. A minimal code sketch of this processing loop is given at the end of section 2.1.

2.1 Image acquisition

During a cystoscopy the endoscopic images, showing the internal urinary bladder wall, are captured by a PDD video cystoscopy system. In this process the bladder wall is illuminated by a PDD light source. An external camera is attached at the tail end of the rigid cystoscope, as shown in Fig. 2, and captures video images with a resolution of 720×576 pixels at a frame rate of 25 frames per second. The video frames are then transmitted to the computer video system for processing.

Fig. 2: Principle setup of a PDD video cystoscopy system, showing a cystoscope introduced into the urinary bladder with its limited field of view (FOV). The fluorescence PDD light source provides the illumination, and a camera at the tail of the rigid cystoscope transmits the captured image data to a computer video system.
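As an illustration of this pipeline, the following Python sketch outlines the top-level loop. It is not the paper's original (MATLAB) implementation; all function and variable names are our own placeholders, and the four hypothetical helper functions are sketched in the corresponding subsections below.

```python
import cv2

# Hypothetical top-level mosaicing loop; preprocess, detect_and_match,
# estimate_affine_ransac, and stitch_and_blend are sketched in
# sections 2.2-2.6 below.
def build_panorama(video_path):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("could not read first video frame")
    panorama = preprocess(frame)                           # section 2.2
    while True:
        ok, frame = cap.read()
        if not ok:
            break                                          # sequence finished
        image = preprocess(frame)
        pts_pan, pts_img = detect_and_match(panorama, image)  # sections 2.3-2.4
        A = estimate_affine_ransac(pts_img, pts_pan)          # section 2.5
        panorama = stitch_and_blend(panorama, image, A)       # section 2.6
    return panorama
```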
2.2 Image preprocessing

In a preprocessing step we subsample the images by a factor of four to reduce the computational time and the resolution of the final panoramic image. Then we separate the relevant image information within the elliptical shape from the surrounding dark image region of the input images (see Fig. 3), using Otsu's thresholding method [2]. To this end, we transform the RGB input image into a gray value image and calculate a binary mask, which represents the two classes, the elliptical region and the surrounding region. Otsu's algorithm is a thresholding method that separates two classes of pixels such that their between-class variance $\sigma_b^2$ is maximal. The optimal threshold $t^\ast$ is then determined by

$\sigma_b^2(t^\ast) = \max_{0 \le t < T} \sigma_b^2(t)$   (1)

with

$\sigma_b^2(t) = \omega_1(t)\,\omega_2(t)\,\bigl(\mu_1(t) - \mu_2(t)\bigr)^2$   (2)

and the two class probabilities $\omega_1$, $\omega_2$ and their mean levels $\mu_1$, $\mu_2$. After a further eroding operation we extract the elliptical region using the binary mask, as shown in Fig. 3.

Fig. 3: Left: Original input image, right: Binary mask (transparent overlay)
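A possible implementation of this preprocessing step, using OpenCV's Otsu thresholding and a morphological erosion, might look as follows. This is a sketch under the assumption of an 8-bit BGR input frame; the function name preprocess matches the skeleton in section 2.1 and is our own choice, not the paper's.

```python
import cv2

def preprocess(frame, subsample=4):
    """Subsample the frame and mask out the dark region outside the
    elliptical endoscope view using Otsu's threshold (section 2.2)."""
    # Subsample by the given factor to reduce computational time.
    small = frame[::subsample, ::subsample]
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold maximizing between-class variance.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Erode the mask to stay safely inside the elliptical region.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    mask = cv2.erode(mask, kernel)
    # Zero out everything outside the mask.
    return cv2.bitwise_and(small, small, mask=mask)
```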
2.3 Feature detection

Common stitching algorithms are based on registration methods, which can be categorized into pixel-based alignment, feature-based methods, and global registration [3]. In this project we have chosen a feature-based method, since the PDD bladder images generally show a high-contrast vascularization structure. Furthermore, it allows a fast stitching process, since only single feature points have to be matched instead of large pixel blocks.

Since these image structures can also vary in scale during the video sequence, e.g. caused by zooming, we use distinctive scale-invariant keypoints, calculated by the Scale Invariant Feature Transform (SIFT) [4]. SIFT features are located by detecting the local extrema of a difference-of-Gaussian (DoG) function. This closely approximates the scale-normalized Laplacian of Gaussian (LoG), which was introduced by Lindeberg [5] for effective scale-invariant feature detection. Mikolajczyk [6] showed that this extrema detection generally provides the most stable image features, compared to a range of other possible image functions, such as the gradient, Hessian, or Harris corner function.

The relationship between DoG and LoG can be understood from the heat diffusion equation:

$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$ ,   (3)

$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$ ,   (4)

$\underbrace{G(x, y, k\sigma) - G(x, y, \sigma)}_{\mathrm{DoG}} \approx (k - 1)\,\underbrace{\sigma^2 \nabla^2 G}_{\mathrm{LoG}}$ .   (5)

Eq. (5) shows that the DoG function with a constant scale factor $k$ approximates the scale-normalized LoG $\sigma^2 \nabla^2 G$ multiplied by a constant factor $(k - 1)$, which does not influence the extrema detection. After the features are localized by a maximum operation over space and scale, a further refinement step is applied. Low-contrast points and keypoints along edges are rejected, since they are unstable to small amounts of noise. Thus, the ratio of the eigenvalues of the 2×2 Hessian matrix

$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$ ,   (6)

based on the second derivatives of the DoG image $D(x, y, \sigma)$, is calculated and compared to a threshold $r$. Keypoints with a large ratio, i.e. having a large principal curvature across the edge but a small one in the perpendicular direction, are thus rejected. A result of the feature detection is shown in Fig. 4.

Fig. 4: SIFT features located in image one (left) and image two (right)

2.4 Feature matching

Subsequent to the feature localization described in the preceding section, the feature points of two images are matched. A robust matching algorithm requires distinctive keypoints, so we calculate for each feature point a rotation-invariant SIFT descriptor [4]. The SIFT descriptor takes into account the local neighborhood of the Gaussian-smoothed image $L(x, y)$ by calculating the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$, accumulated in a histogram, as illustrated in Fig. 5, by

$m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2}$ ,   (7)

$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$ .   (8)

Fig. 5: 2×2 keypoint descriptor array representing the orientation histogram of an 8×8 set of samples (principle sketch)

The use of 8 orientations and a 4×4 array of histograms leads to a distinctive 128-element feature vector. After each feature point is described by its local descriptor, we apply the matching process using the minimum Euclidean distance measurement

$\hat{d}_l = \arg\min_{d_i} \,\lVert d_i - d_l \rVert^2$ .   (9)

In this process one local descriptor $d_l$ of the first image is compared to all feature descriptors $d_i$ of the second image within the 128-dimensional feature space, and vice versa. The minimum squared error then leads to the best corresponding point $\hat{d}_l$. Afterwards, the next local descriptor $d_{l+1}$ is selected and matched. This procedure is repeated until all feature descriptors are processed. Fig. 6 shows the resulting point correspondences of the two sequential images.

Fig. 6: Matched point correspondences of image one and image two
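Using OpenCV's SIFT implementation, the detection and minimum-distance matching of sections 2.3 and 2.4 can be sketched as below. The brute-force matcher with crossCheck=True mirrors the mutual ("and vice versa") comparison described above; the function name detect_and_match is again our own placeholder.

```python
import cv2
import numpy as np

def detect_and_match(img1, img2):
    """Detect SIFT keypoints in both images and match their 128-element
    descriptors by minimum Euclidean distance (sections 2.3-2.4)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    # Brute-force L2 matching; crossCheck keeps only mutual best matches.
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = bf.match(des1, des2)
    # Return the matched point coordinates as two N x 2 arrays.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2
```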
2.5 Homography estimation

In the next step we determine an image-to-image transform, a so-called 2-D homography, based on the final point correspondences. Since the images of the video sequence arise from a non-rigid camera movement, we choose an affine model for the homography. An affine transform provides six degrees of freedom, parametrized by a translation vector $t = (t_x, t_y)^T$, a rotation angle $\varphi$, scales $s_x$ and $s_y$, and a shear parameter $a$. In homogeneous coordinates the homography matrix $M$ can be written as

$M = \begin{pmatrix} A & t \\ 0^T & 1 \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{pmatrix}$   (10)

with

$A = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ .   (11)

Although the matching process shows a good result (see Fig. 6), some matching errors can occur, due to the noise and blur of the PDD images. Since matching errors have a high impact on the image transformation, we have to reject these outliers to perform a robust homography estimation. Therefore we employ the RANSAC (RANdom SAmple Consensus) model fitting algorithm [7], which rejects outliers of the matching results during an iterative process. With the fitting model adapted to our 2-D affine model, RANSAC carries out the following steps:

1. Select a set of $p = 4$ point matches randomly (four point correspondences are used to determine the affine model).
2. Verify that the points are not collinear. If they are collinear, go back to step 1.
3. Calculate the affine 2-D homography between image two and image one.
4. Apply the transform to each feature point of image two and the inverse transform to image one, respectively. Count the number of points (so-called inliers) which lie within a spatial error tolerance $\epsilon_{\max}$ of the related reference points.
5. If the number of inliers is greater than the best previous score, save the homography matrix.
6. If the maximum number $N_{\max}$ of trials is reached, terminate the algorithm. Otherwise go back to step 1.

The point correspondences of Fig. 6 rejected by the iterative RANSAC algorithm are labeled black in Fig. 7. Eventually the affine homography matrix between image two and image one is determined by the greatest number of inliers.

Fig. 7: Outlier points (labeled black) rejected by the RANSAC algorithm

2.6 Stitching and blending

Now we apply the estimated homography to image two and combine images one and two. If a direct composition method is used, visual artifacts like edges and signal discontinuities occur in the combined image, as shown in Fig. 8. This effect results from the inhomogeneous illumination of the bladder wall during the cystoscopy: the illumination intensity decreases from the middle of the image towards the borders. To overcome this problem, we apply a cross blending interpolation, suggested by Wald et al. [8], during the composition process. The method performs an interpolation in the overlapping region based on a linear mutual weight distribution, as illustrated in Fig. 9.

Fig. 8: Direct image compositions without the blending algorithm

Fig. 9: Cross blending interpolation with a mutual weight distribution in the overlapping region of image I and II

Due to these weight functions, pixels with low illumination at the image borders have less impact on the interpolation than pixels in the image center. This approach reduces the visual artifacts significantly, as can be seen in Fig. 10. Finally, the two images are stitched and a combined image is built. We then apply the whole image mosaicing algorithm iteratively to the subsequent images, until all frames of the video sequence are processed.

Fig. 10: Image compositions with a linear cross blending interpolation applied
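The two remaining steps can be sketched with OpenCV as follows. Note that cv2.estimateAffine2D uses OpenCV's built-in RANSAC variant rather than the adapted RANSAC loop listed above, and the blending weights here are a simple distance-to-border ramp across the overlap, an assumption standing in for the exact weight functions of Wald et al. [8]; both function names are again our own placeholders.

```python
import cv2
import numpy as np

def estimate_affine_ransac(pts2, pts1, tol=3.0):
    """Robustly fit an affine transform mapping image-two points onto
    image-one points while rejecting outliers (section 2.5)."""
    A, inliers = cv2.estimateAffine2D(pts2, pts1, method=cv2.RANSAC,
                                      ransacReprojThreshold=tol)
    return A  # 2x3 matrix, i.e. [A | t] of Eq. (10) without the last row

def stitch_and_blend(img1, img2, A):
    """Warp image two into the frame of image one and cross-blend the
    overlap with mutual weights (section 2.6, simplified)."""
    h, w = img1.shape[:2]
    warped = cv2.warpAffine(img2, A, (w, h))
    # Masks of valid (non-black) pixels in both images.
    m1 = (img1.sum(axis=2) > 0).astype(np.uint8)
    m2 = (warped.sum(axis=2) > 0).astype(np.uint8)
    # Distance-to-border weights: pixels near the image center weigh more,
    # so dimly lit border pixels contribute less (assumed weight model).
    w1 = cv2.distanceTransform(m1, cv2.DIST_L2, 3)
    w2 = cv2.distanceTransform(m2, cv2.DIST_L2, 3)
    total = w1 + w2
    total[total == 0] = 1.0  # avoid division by zero outside both images
    alpha = (w1 / total)[..., None]
    blended = alpha * img1.astype(np.float32) \
        + (1 - alpha) * warped.astype(np.float32)
    return blended.astype(np.uint8)
```

In a full implementation the output canvas of stitch_and_blend would grow with each new frame so that image two can extend the panorama; here the canvas size is fixed to that of image one for brevity.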
3 Optimization

We first implemented the image mosaicing algorithm as an offline MATLAB program. To reduce the computational time we apply some further improvements to the algorithm. Since the camera movement along the video sequence is smooth and slow, we dynamically increase the processing frame step size, thereby decreasing the overlapping region of the images. As long as a sufficient number of reliable matching points is still found, the expansion of the panoramic image proceeds faster. Otherwise, not enough matching points are found, due to low contrast of the image structures or too small an image overlap, and the matching process fails. In that case we successively decrease the processing frame step size for the subsequent images, until the matching succeeds.
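A minimal sketch of this adaptive step-size control could look as follows; the threshold and the doubling/halving policy are assumptions for illustration, not values from the paper.

```python
# Hypothetical adaptive frame-step control for the mosaicing loop
# (section 3). Threshold and update policy are illustrative only.
MIN_MATCHES = 20  # assumed minimum number of reliable matches

def next_step(step, num_matches, max_step=16):
    """Grow the frame step while matching succeeds, shrink it on failure."""
    if num_matches >= MIN_MATCHES:
        return min(step * 2, max_step)  # larger steps: fewer frames to process
    return max(step // 2, 1)            # smaller steps: more image overlap
```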
4 Results and perspectives

After all images of the video sequence have been processed by our image mosaicing algorithm, the final panoramic image is built. Fig. 11 shows a panoramic image computed from an original endoscopic PDD video sequence of 580 frames.

Fig. 11: Panoramic image built from an original endoscopic PDD video sequence of 580 frames

The panorama shows a section of the left part of the urinary bladder, stitched from the single video frames. Fig. 11 shows that the papillary tumors are located on the upper left bladder wall relative to the left ureteral orifice. The spatial relation between adjacent images can now be accessed directly in the panoramic image, whereas in the video sequence this information is only given implicitly by the camera movement over time. In addition, the panoramic image can be documented in medical evidence records to supplement the textual descriptions of the tumor positions in the bladder. This leads to a better and more intuitive understanding, and can be used for follow-up cystoscopic examinations.

Since the computation for each image pair takes several seconds, real-time image mosaicing is not yet supported. A software implementation in C++ should improve the performance and will be addressed in future work. Further clinical tests and evaluations also have to be performed. Nevertheless, physicians' first comments have indicated that even offline results can provide a high clinical benefit.

5 Summary

In this paper we have developed an image mosaicing algorithm for bladder video sequences in fluorescence endoscopy. First, we extracted the relevant image information and applied a feature-based algorithm to obtain robust and distinctive image feature points for each image pair. Based on our affine parameter model we used an iterative optimization algorithm to estimate the best image-to-image transform according to a mean squared error measurement. Then we described how visual artifacts caused by inhomogeneous illumination can be compensated during the stitching process by a mutual linear interpolation function. The results of our iterative image mosaicing algorithm were discussed and illustrated by a panoramic image of an original bladder endoscopic video sequence. Finally, we described some optimization steps and perspectives.

Acknowledgments

I would like to thank Dr. med. Andreas Auge, Waldklinikum Gera, Germany, and Olympus Winter & Ibe GmbH for technical support, providing the video sequences and giving preliminary feedback for this project. I would also like to thank Prof. Dr.-Ing. Til Aach, Institute of Imaging and Computer Vision, RWTH Aachen University, Germany, who supervised this project.

References

[1] American Cancer Society: Cancer Facts and Figures 2008 [Online]. Available: http://www.cancer.org/downoads/STT/2008CAFFfinalsecured.pdf
[2] Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern., Vol. 9 (1979), p. 62–66.
[3] Szeliski, R.: Image Alignment and Stitching: A Tutorial. Technical Report MSR-TR-2004-92, Microsoft Research, 2006.
[4] Lowe, D.: Distinctive Image Features from Scale-Invariant Keypoints. Int. Journal of Computer Vision, Vol. 60 (2004), No. 2, p. 91–110.
[5] Lindeberg, T.: Scale-Space Theory: A Basic Tool for Analysing Structures at Different Scales. Journal of Applied Statistics, Vol. 21 (1994), No. 2, p. 224–270.
[6] Mikolajczyk, K.: Detection of Local Features Invariant to Affine Transformations. Ph.D. thesis, Institut National Polytechnique de Grenoble, France, 2002.
[7] Fischler, M. A., Bolles, R. C.: Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, Vol. 24 (1981), No. 6, p. 381–395.
[8] Wald, D., Reeff, M., Székely, G., Cattin, P., Paulus, D.: Fliessende Überblendung von Endoskopiebildern für die Erstellung eines Mosaiks. Bildverarbeitung für die Medizin, Mar. 2005, p. 287–291.

Alexander Behrens
e-mail: Alexander.Behrens@lfb.rwth-aachen.de
Institute of Imaging & Computer Vision
RWTH Aachen University
D-52056 Aachen, Germany