AP_06_4.vp


1 Introduction
Endoscopy is a minimally invasive diagnostic procedure

that is used to give the physician a realistic impression of
almost any part of the gastrointestinal tract or other organs
inside the human body. During an endoscopic examination a
flexible tube, which contains among other things a video cam-
era and a lamp, is introduced into the human body. It has a
steerable tip that enables the physician to change perspective
and to ease navigation, e. g. through the colon. Another
system for imaging the gastointestinal tract is a so-called
„Camera in a Pill“ or „PillCam“ [1]. This is swallowed by the
patient and travels through the gastrointestinal tract in a
natural way. The images that are recorded by the camera are
sent via a radio frequency transmitter to a data recorder,
which is worn on a belt around the patient’s waist.

Due to the frontal illumination, there are often light re-
flections that are disturbing for the physician. In the case of
classic endoscopy, the physician may change the perspective
by turning the tip of the endoscope to avoid these reflections.
In the case of a camera-in-pill examination, however, there is
no way to force the pill to move to a better position. There-
fore, these reflections need to be removed in a different
manner. The algorithm we are going to present may also be
applied to classic endoscopy for the physician’s convenience.

2 Methods
The algorithm that we apply is used in image communica-

tions to conceal image data corrupted by transmission errors
[2] or in the restoration of x-ray images with defective detec-
tor elements [3]. It models the defective image as a pointwise
product of the undisturbed image and a known binary defect
map that contains ones where the image is correct and zeros
in the case of a defective pixel. This pointwise product in the
spatial domain corresponds to a convolution in the spectral
domain. To restore the image, a spectral deconvolution is
performed.

In case of endoscopy, we first perform a segmentation by
thresholding in the YUV color space and then we treat the
pixels that belong to light reflections as defective pixels,
setting them to zero, and then we apply the defect interpola-
tion algorithm.

2.1 Segmentation of reflections
Before the spectral deconvolution algorithm is applicable,

a map has to be generated that contains ones where the image
is undisturbed and zeros where specular reflexions occur.

These reflections are very bright, so the affected pixels
show high luminance in the YUV color space. Thus, the im-
age is transferred from RGB into the YUV color space by
applying the linear transform

Y
U
V

�

�

�
�
�

�

�

�
�
�

� 	 	

0 299 0 587 0114
0147 0 289 0 436

0 61

. . .
. . .

. 5 0 515 0100	 	

�

�

�
�
�

�

�

�
�
�


�

�

�
�
�

�

�

�
�
�. .

R
G
B

(1)

where the Y component contains luminance and U and V
contain chrominance.

Fig. 1 shows a typical histogram of the luminance channel
of an endoscopic image. On the right hand side, there is a
small peak that corresponds to pixels that contain specular
reflections. We now segment these reflections by thresholding
with a value that is just lower than the left edge of this hill.
Therefore, we low pass filter the histogram and find the posi-
tion beginning from the right hand side where the derivative
changes from a significant positive value to a value near zero.

Since there are often dark rings around the affected pixels
in the reconstructed image, and these are also very disturb-
ing, we enlarge the segmented areas by an erosion operation.
Note that we here use erosion instead of dilation to en-
large the segments, since we have a black segment on a white
background.

32 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 4/2006

Removal of Specular Reflections in
Endoscopic Images
T. Stehle

During an endoscopic examination, pictures from the inside of the human body are displayed on a computer monitor. Disturbing light
reflections are often visible in these images. In this paper, we present an approach for removing these reflections and replacing them by an
estimate obtained using a spectral deconvolution algorithm.

Keywords: endoscopy, light reflection removal, spectral deconvolution, defect interpolation.

Fig. 1: Histogram of the luminance channel of an endoscopic
image containing specular reflections. The x axis
shows the luminance value, the y axis shows the number of
occurrences


2.2 Spectral deconvolution
For ease of notation, we describe the univariate case. The

results can be easily generalized to the bivariate case.
The observed image g(n) is modeled as

g n f n w n( ) ( ) ( )� 


G k
N

F k W k

N
F l W k l

l

N

( ) ( ) ( )

( ) ( )

� �

� 
 	
�

	
�

1

1

0

1

where 0  �n k N, , f(n) is the undisturbed image, and w(n)
a binary window, with w(n) � 0 if the corresponding pixel n
in g(n) belongs to a reflection and w(n) � 1 otherwise. G(k),
F(k) and W(k) are the DFT spectra of g(n), f(n) and w(n),
respectively. Both f(n) and g(n) are real valued, hence,
G(k) � G*(N	k) and F(k) � F*(N	k). Our goal is to estimate
F k f n( ) ( )DFT� ��� for 0  �n k N, . Let us select a spectral

line pair G(s) and G(N	s) of G(k). If F(k) consisted only of two
lines at s and N	s, i.e.
F k F k F s k s F N s k N s( ) �( ) �( ) ( ) �( ) ( )� � 	 � 	 	 �� � (3)

convolution with the window spectrum W(k) would yield for
the observed pair

� �G s
N

F s W F s W s

G s
N

F s W

( ) �( ) ( ) � ( ) ( )

( ) � ( ) ( )

*

* * *

� 
 � 


� 


1
0 2

1
0� �� 
�( ) ( ) ,*F s W s2

(4)

where �( )F s and �( ) � ( )*F N s F s	 � are the estimated coeffi-
cients. From (4), �( )F s can be found to

�( )
( ) ( ) ( ) ( )

( ) ( )

*
F s N

G s W G s W s

W W s
�

	
0 2

0 22 2
. (5)

The estimated spectrum �( )F k would then be given by (3).
Generally, F(k) consists of more than two spectral lines. The
error after deconvolution (5) at s and N	s in the spectral do-
main is given by

G k G k
N

F k W k

G k
N

F s W k s F

( )

*

( ) ( ) �( ) ( )

( ) �( ) ( ) �

1 1

1

� 	 
 �

� 	 
 	 �� �( ) ( )s W k s
 �
(6)

where G(1)(k) is the spectrum of the window internal differ-
ence image

� �g n g n f n w n w n f n f n( )( ) ( ) �( ) ( ) ( ) ( ) �( )1 � 	 
 � 	 . (7)
Clearly, the spectral error G k( )( )1 0� at the selected line

pair k � s and k � N	s.
Given s and N	s, the estimates �( )F s and �( )F N s	 are

optimal in the minimum mean square error (MMSE) sense
if the energy Eg of g

(1)(n) is minimized. Using Parseval’s
theorem, Eg can directly be evaluated in the spectral domain
according to

� �E g n
N

G k

N
E

g
n

N

k

N

G

� �

� 


�

	
�

	
� �( ) ( )( ) ( )

.

1 2

0

1
1 2

0

1
1

1
(8)

This provides MMSE estimates for the sought frequency
coefficients.

If the selected line pair �( )F s and �( )F N s	 is dominant in
the sense specified below, its convolution with W(k) tends to
“hide” other, less dominant spectral coefficients of F(k). This
influence is removed by (6), so that another line pair can be
selected from G(1)(k), estimated and subtracted. This leads to
the following iteration for spectral deconvolution:
� Initialization: � ( )( )F k0 0� , G k G k( )( ) ( )0 � , i � 1.

� i-th iteration step: Select a pair of spectral coefficients
G si i( ) ( )( )	1 , G N si i( ) ( )( )	 	1 out of G ki( )( )	1 .

� Estimate �( )( )F s i , �( )( )F N s i	 according to (5) such that
G s G N si i i i( ) ( ) ( ) ( )( ) ( )� 	 � 0, i.e.

� �G s
N

F s W F s W s

G

i i i i i

i

( ) ( ) ( ) * ( ) ( )

*(

( ) �( ) ( ) � ( ) ( )	

	
� �1
1

0 2

� �1 1 0 2) ( ) * ( ) ( ) * ( )( ) � ( ) ( ) �( ) ( )s
N

F s W F s W si i i i� �

(9)

�

� �

�( )

( ) ( ) ( ) ( )

(

( )

( ) ( ) ( ) ( ) * ( )

F s

N
G s W G s W s

W

i

i i i i i

�

		 	1 10 2

0 22
2

) ( )
.

( )	 W s i

(10)

Since we seek to minimize the error (8), the line pair we
should select in the ith iteration is that which maximizes the
energy reduction � �� E iF s�( )( ) , which can be calculated to

� �� E i

i i i i

F s

N
F s W k s F s W k s

�( )

�( ) ( ) � ( ) ( )

( )

( ) ( ) * ( ) ( )

�

	 � �
1

2

2

0

1

k

N

�

	
�
(11)

which can be rewritten to

�E
i

i i i i

N
F s

F s W k s F s W k s

�

� � �

1
2

2

� ( )

�( ) ( ) � ( ) (

* ( )

( ) ( ) * ( ) ( ) * ( )

( )

* ( )

) ( )

�( )

� ( ) (

W k s

N
F s

F s W k

i

k

N

i

i

	
�

��
�

��

�

	
�

	
�
0

1

2
1

s F s W k s W k si i i i

k

N
( ) ( ) ( ) * ( )) �( ) ( ) ( ) .

2

0

1

� 	 �
�

��
�

���

	
�

(12)

For a binary window, we have w n w n2( ) ( )� . Inserting this
into Parseval’s theorem, we obtain

W k s W k s

W k N

i

k

N
i

k

N

k

N

( ) ( )

( )

( ) ( )� � 	

� �

�

	
�

	
�

	
� �

�

2

0

1 2

0

1

2

0

1
w n N W

n

N
( ) ( )

�

	
� � 

0

1
0

(13)

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 33

Acta Polytechnica Vol. 46  No. 4/2006

D
F

T
�

�
�

�


and

W k s W k s

N w n j p
s
N

n

i i

k

N

i

( ) ( )

( ) exp

( ) * ( )

( )

� 	

� 	
�
�
�

�

�

	
�
0

1

2
2

�
�

� 

�

	
�
n

N
iN W s

0

1
2( ).( )

(14)

Inserting (13) and (14) into (12) yields

� �� E i i i i i iF s G s F s G s� 
 � 
	 	� ( ) ( ) �( ) ( )* ( ) ( ) ( ) ( ) *( ) ( )1 1 . (15)

Since we know how to estimate �( )( )F s i optimally (10),
� ( )* ( )F s i can be eliminated, thus expressing the error energy
reduction depending on the available error spectrum line
G ki s

i( ) ( )( )	1 as

� E
i

i i i

N

W W s

G s W G s

�
	

( ) ( )

( ) ( ) Re (

( )

( ) ( ) ( )

0 2

2 0 2

2 2

1 2 1� �( ) ( )) ( ) .i iW s
2

2
��
�

�
�
�

�

�
�

�

 
!

(16)

Selecting the best line pair in each iteration hence implies
finding s such that �E according to (16) is maximum. Because
of the symmetry of the DFT spectra of real valued signals, it
suffices to search over only half the coefficients of G ki( )( )	1 ,
i.e. from k � 0 to k � N/2, to find s(i).

�( )
( )

( )( ) ( ) ( )F s
N

W
G si i i� 	

0
1 . (17)

Similarly, the calculation of the error energy reduction �E
according to (16) modifies to

� E
i iN

W
G s� 	

2
0

1 2

( )
( )( ) ( ) . (18)

Clearly, when selecting only a single line, the error en-
ergy reduction depends only on the modulus spectrum
| ( )|( ) ( )G si i	1 , apart from a constant factor. To save computa-
tional expense in the selection step, a simplified approach is
to always select s(i) such that| ( )|( ) ( )G si i	1 is maximum, regard-
less of whether a single line is tested, that is, s i( ) � 0 or
s Ni( ) � 2, or a line pair. Practically, the outcome of the itera-
tion remains almost unchanged.

When the estimated spectrum � ( )* ( )F s i contains as many
lines as there are samples of f(n) inside the window w(n), the
remaining error energy EG vanishes. The backtransformed
estimate � ( )( )f ni is then identical to g(n) inside the observation
window, and contains extrapolated information outside. In
practice, the iteration is stopped when �E falls below a pre-
specified level �, or when a maximum number of iterations is
reached.

To achieve high spectral resolution for the interpolated
signal, it is often reasonable to apply zero-padding to g(n) and
w(n) before transforming and starting the iteration. An illus-
tration of the entire recursion is given in Fig. 2.

3 Results
The algorithm as described above was applied to several

endoscopic images. In our experiments, we used a 9×9 circu-
lar structure element for erosion of the binary mask image,
and we independently applied 100 iterations of the deconvo-
lution algorithm to each channel of the YUV color space.
Fig. 3 and Fig. 4 show two examples with the original image
and a processed version of the image. The specular reflections
are removed and the holes are filled with texture that does not
disturb the overall impression of the image.

However, the results are not perfect. In Fig. 3, several ves-
sels are affected by specular reflections in the lower right
corner, and in the processed image some of these vessels ap-
pear interrupted. In Fig. 4, block artifacts were added to the
image in areas where a large number of specular reflections
occurred. In some cases, the specular reflections itselves were
surrounded by green, pink and yellow circles. These circles
were correctly identified not being specular reflections and
therefore they were not removed completely. This is not se-
vere if the physician knows that they are caused by reflections.
In the processed image these colors are visible, too, but the
reflections have been removed, so the physician might think
it is the real color of the tissue.

The processed images have not yet been presented to a
physician because of the drawbacks described above. How-
ever, the results achieved so far are already very promising.

4 Discussion
We have presented an approach for removing specular re-

flections from endoscopic images by segmentation of the af-
fected pixels and interpolation via a spectral deconvolution
algorithm.

The algorithm, however, needs further improvement. We
are going to modify the segmentation of the specular reflec-
tions such that the typical elliptical shape is incorporated into

34 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 4/2006

Fig. 2: Illustration of the recursive spectral deconvolution algo-
rithm. From the final spectral estimate � ( )( )F ki , the interpo-
lated signal estimate is obtained by an inverse DFT.


the segmentation process. A rejection criterion will be inte-
grated so that the quantitiy of false positive segmentation re-
sults is reduced. In addition, we will address the color
disturbances in the processed images.

5 Acknowledgments
I would like to thank Prof. Dr. T. Aach, Institute of Im-

aging and Computer Vision, RWTH Aachen University,
Germany, who supervised this project. I would also like to

thank Prof. Dr. med. C. Trautwein, University Hospital
Aachen, Germany, who kindly provided the endoscopic im-
ages used in this paper.

References
[1] Iddan, G., Meron, G., Glukhovsky, A. Swain P.: Wireless

Capsule Endoscopy, Nature, 2000, p. 405–417.
[2] Kaup, A., Meisinger, K., Aach, T.: Frequency Selec-

tive Signal Extrapolation with Applications to Error

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 35

Acta Polytechnica Vol. 46  No. 4/2006

Fig. 3: Endoscopic image of the colon with specular reflections (left) and processed image (right). The overall impression of the pro-
cessed image is better, but some artifacs have been added to the image. The vessels in the lower right corner appear blurred, and
some of them interrupted

Fig. 4: Endoscopic image of the esophagus with specular reflections (left) and the processed image (right). In areas with a large number
of specular reflections block artifacts have been added to the image.


Concealment in Image Communications. International
Journal of Electronic Communication (AE), Vol. 59 (2005),
p. 147–156.

[3] Aach, T., Metzler, V.: Defect Interpolation in Digital Ra-
diography – How Object-Oriented Transform Coding
helps. SPIE Medical Imaging, Proceedings of SPIE 4322,
2001, p. 824–835.

Thomas Stehle
e-mail: Thomas.Stehle@lfb.rwth-aachen.de

Institute of Imaging & Computer Vision
RWTH Aachen University
D-52056 Aachen, Germany

36 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 46  No. 4/2006