AP02_02.vp


1 Introduction
Video image compression plays an important role in

transmission and storage of digital video data. The applica-
tions include multimedia transmission, teleconferencing,
videophones, high-definition television (HDTV), CD-ROM
storage, etc. A large body of work in image/video processing
has involved motion estimation [1] and [6]. Applications
of motion estimation exist in image sequence filtering and
restorations, video coding, target tracking, robot navigation,
monitoring and surveillance, biomedical problems, and the
human-computer interface.

The most effective technique for motion estimation makes
use of block matching algorithms (BMA). The full search al-
gorithm (FS) is the most obvious candidate for a search tech-
nique for finding the best possible weight in the search area.
Kago et al [7] use a three-step motion vector search (TSS) to
compute displacements up to 6 pel/frame. This method,
for W � 6 pel/frame, searches 25 positions to locate the best
match. The three-step search (TSS) algorithm is one of the
best fast search algorithms, and provides a good estimation.

To reduce the computational complexity, hierarchical
and multi-resolution fast block matching is used. One family
of fast block motion estimation algorithms relies on the idea
of predicting the approximate large-scale motion vectors
in the coarse-resolution video and refining the prediction
motion vectors to find the final values. These are called hier-
archical [5], [2] or multi-resolution methods [3] and [4].
Hierarchical methods use the same image size but different
block sizes at each level. Multi-resolution methods use differ-
ent image resolutions with a smaller image size at a coarser
level. The wavelet transform has recently emerged as a prom-
ising technique for image processing applications, due to its
flexibility in representing non-stationary image signals, and
its ability in adapting to human visual characteristics. Zhang
and Zafar [8] applied wavelet theory to real-time video com-
pression, and proposed multi-resolution motion estimation
(MRME). This scheme exploits the cross correlation among
all layers of the wavelet pyramid structure in order to re-

duce the computational complexity of the motion estimation
process.

We present a novel multi-resolution variable block size
algorithm (MRVBS) based on wavelet decomposition. The
approach presented in this paper provides an accurate
motion estimate even in the presence of noise. We utilize
a wavelet component of the seven sub-bands from two layers
of a wavelet pyramid in the lowest resolution. In each sub-
-band we perform the block matching estimation within
a nine-block only. The simulation results are analyzed to
assess the proposed algorithm with and without influence of
noise. Noise in a sequence not only degrades the visual qual-
ity, but also hinders the subsequent analysis and processing
(e.g., compression, estimation and coding). The problem of
removing noise from image sequences has attracted a number
of researchers. However, the noise cannot be completely re-
moved from the image sequences.

This paper is organized as follows. In section 2, the pro-
posed algorithm based on wavelet decomposition is brief-
ly described. Section 3 presents simulation results of the
new algorithm without influence of noise. Section 4 presents
simulation results under the influence of a single noise and
mixed noise. A few concluding remarks are given in section 5.

2 The proposed algorithm
Fig. 1 sets out the structure of the algorithm. The MRVBS

algorithm is summarized as follows:
Step 1: To begin the motion vectors estimation process,

the original image frame is decomposed into two
layers using the two-dimensional discrete wave-
let transform (DWT2). The motion vectors for
the lowest low-pass band are estimated by central
search. The search is performed at the center and
its eight neighboring blocks with a block size of
4 × 4.

Step 2: These motion vectors are then used as a new center
for other three-bands in the same layer (layer
number 2). For these three-bands, the search is

30 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 42  No. 2/2002

A Low-complexity Wavelet Based
Algorithm for Inter-frame Image
Prediction
S. Usama, B. Šimák

In this paper, a novel multi-resolution variable block size algorithm (MRVBS) is introduced. It is based on: (1) Using the wavelet
components of the seven sub-bands from two layers of wavelet pyramid in the lowest resolution; (2) Performing a block matching estimation
within a nine-block only in each sub-band of the lower layer; (3) Scaling the estimated motion vectors and using them as a new search center
for the finest resolution. The motivation for using the multi-resolution approach is the inherent structure of the wavelet representation. A
multi-resolution scheme significantly reduces the searching time, and provides a smooth motion vector field. The approach presented in this
paper providing an accurate motion estimate even in the presence of single and mixed noise. As a part of this framework, a comparison of the
Full search (FS) algorithm, the three-step search (TSS) algorithm and the new algorithm (MRVBS) is presented. For a small addition in
computational complexity over a simple TSS algorithm, the new algorithm achieves good results in the presence of noise.

Keywords: video compression, motion estimation, wavelet transform, multi-resolution.


also performed by the same method (central
search) using the same block size. We used a block
size of p q� � �� �2 2j j for the jth layer in the wave-
let pyramid, where p and q are the sizes of the
block required at the highest resolution ( p � 16
and q � 16).

Step 3: The current motion vectors, estimated from layer
2, are scaled and used as a new center for the three
highest frequency bands in layer 1. In layer 1, the
search is performed using a block size of 8 × 8.

Step 4: The estimated motion vectors are then scaled and
used as a center for the final central search process.
Also, the final central search is performed at the
center and its eight neighboring blocks. This pro-
cess is performed for the original frame using
block size of 16 × 16.

The computational cost for MRVBS, without the wave-
let complexity, is (36*p2+27*p1+9*p0), where p2, p1, and
p0 are the block sizes used in layer 2, layer 1, and layer 0,
respectively.

3 Simulation results without influence
of noise
Experimental results using the proposed algorithm are re-

ported in this section. The algorithm is applied to three
famous video sequences in the QCIF format: Carphone, Fore-
man, and Miss America. These video sequences have a three-
-kinds of motion. The experimental results are evaluated
using the luminance component of each sequence. The
results are based on the peak-signal-to-noise ratio (PSNR)
function, and use the mean absolute difference (MAD) in

performing block matching. The error terms are not used in
the frame reconstruction. Only forward prediction is imple-
mented in the experiments. No threshold value is used in the
search process.

A performance comparison of MRVBS, TSS, and FS in
terms of PSNR between the estimated frames and the original
frames is carried out for these video sequences. The compari-
son is made among the first 30-frame of each sequence.

The PSNR comparisons show that the MRVBS usually
provides a performance similar to the TSS and FS algorithms,
especially in the case of slow motion with a stationary back-
ground.

As an example, Fig. 2 shows the performance comparison
for the Carphone sequence (this was the worst result).

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 31

Acta Polytechnica Vol. 42  No. 2/2002

Original frame,

x(t)

Original

frame, x(t-1)

Central

search

process

CDWT2
Central

search

process

CDWT2

CDWT2

Central

search

process
CDWT2

Central

search

process

Layer 0

Initial estimates

Detail Detail

Layer 1

Low-pass Low-pass

Initial estimates

Detail Detail Layer 2

Low-pass Initial estimates Low-pass

Fig. 1: Motion Estimation Using Wavelet Decomposition

20

22

24

26

28

30

32

34

36

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Frame No.

MRVBS TSS FS

P
S

N
R

Fig. 2: PSNR comparisons of MRVBS, TSS, and FS algorithms for
the “Carphone” sequence without influence of noise


4 Simulation results under influence
of noise
To demonstrate the performance of our algorithm, the

average PSNR (across all input frames) is plotted against
input noise density and signal-to-noise ratio (SNR). The aver-
age PSNR, PSNRavg, is given as

PSNR PSNRavg �

�

�
1

1
F

i

i

F

(1)

where PSNRi is the measured PSNR for frame i, and F is
the total number of frames. We shall compare the MRVBS
algorithm against the TSS and FS algorithms. In addition,
the PSNR comparison among the three algorithms will be
introduced.

4.1 Simulation results under the influence of
Gaussian noise

An additive Gaussian noise with a different signal-to-noise
ratio (SNR) degraded the three video sequences. We applied
the new motion estimation algorithm (MRVBS) to these se-
quences. Fig. 3 shows the performance comparison for the
Miss America sequence with a Signal-to-noise ratio (SNR) of
10 dBs. FS and TSS are performed at layer 0 using a block size
of 16 × 16. The PSNR comparison shows that the MRVBS
usually performs better than the TSS and FS algorithms.
Under normal operating, e.g., input SNR between 30 to
50 dBs, the performance of MRVBS is similar to the per-
formance of the TSS and FS algorithms. For extremely
noisy sequences, e.g., for SNR of 10 dBs, the performance
of MRVBS is as much as 2 dBs better than the other two
algorithms. In Fig. 4, the average PSNR, PSNRavg (across
all input frames) is plotted against input noise level for the
“Carphone” sequence. These results indicate that the motion
estimation techniques used approximately at low level
of noise have the same performance. The performance of
MRVBS is as much as 2 dBs better than the performance of
FS and TSS under a high level of Gaussian noise. In addition,
for the “Foreman” sequence the performance comparison is
similar to the results of the “Carphone” sequence. For the
“Miss America” sequence, the performance of MRVBS is as

much as 3 dBs better than the other performance of FS and
TSS under a high level of Gaussian noise.

4.2 Simulation results under the influence of
salt & pepper (impulse) noise

Additive salt & pepper noise with different noise density
degraded the three video sequences. We applied the new mo-
tion estimation algorithm (MRVBS) to these sequences.

Fig. 5 shows the performance comparison for the “Fore-
man” sequence with noise density of 40 %. FS and TSS are
performed at layer 0 using a block size of 16 × 16.

As an example, the average PSNR, PSNRavg (across all
input frames) is plotted against the input noise level for
the “Miss America” sequence in Fig. 6.

The performance of MRVBS is as much as 8 dBs better
than the other performance of FS and TSS under a high level
of salt & pepper noise. MRVBS performs well in the case of
salt & pepper noise, better than the presence of Gaussian
noise.

32 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 42  No. 2/2002

32

33

34

35

36

37

38

39

40

41

1 3 5 7 9

1
1

1
3

1
5

1
7

1
9

2
1

2
3

2
5

2
7

2
9

Frame No.

MRVBS FS TSS

P
S

N
R

[d
B

s
]

Fig. 3: PSNR comparisons of the MRVBS, TSS, and FS algo-
rithms for the “Miss America” sequence with Gaussian
noise (SNR � 10 dBs)

27

29

31

33

35

10 15 30 40 50

MRVBS FS TSS

SRN [dBs]

P
S

N
R

a
v

g

Fig. 4: PSNR (average) vs. source SNR for the “Carphone” se-
quence

10

12

14

16

18

20

22

24

26

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Frame No.

MRVBS FS TSS

P
S

N
R

[d
B

s
]

Fig. 5: PSNR comparisons of the MRVBS, TSS, and FS algorithms
For the “Foreman” sequence with salt & pepper noise
(40%)


4.3 Simulation results under the influence of
mixed noise

We will now assess the performance of the proposed
algorithm with respect to mixed Gaussian noise and im-
pulse noise. The restoration result for the “Miss America”
videosequence is shown in Fig. 7 for mixed Gaussian (Vari-
ance � 200) and impulse noise (20 %). From these tests, we
conclude that our algorithm works extremely well for video
sequences corrupted with single or mixed noise. In addi-
tion, for mixed Gaussian (Variance � 200) and impulse noise
(20 %), Fig. 8 shows reconstructed frame number 30 of the
“Carphone” sequence from frame number 29 and motion
vectors estimated using the current algorithms.

5 Conclusion

We introduced the new multi-resolution variable block size
(MRVBS) algorithm. This algorithm is based on a central
search in each layer of the wavelet pyramid. Three QCIF
format video sequences corrupted by single and mixed

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 33

Acta Polytechnica Vol. 42  No. 2/2002

16

20

24

28

32

36

10 20 30 40

Noise Density (%)

MRVBS FS TSS

P
S

N
R

a
v

g
[d

B
s

]

Fig. 6: PSNR (average) vs. source for the “Miss America”
sequence

27

29

31

33

35

37

39

1 3 5 7 9

1
1

1
3

1
5

1
7

1
9

2
1

2
3

2
5

2
7

2
9

Frame No .

TSS MRVBS FS

P
S

N
R

[d
B

s
]

Fig. 7: PSNR comparisons of the MRVBS, TSS, and FS algorithms
for the “Miss America” sequence under the influence of
mixed noise

a) original frame no. 30

b) reconstructed using MRVBS

d) reconstructed using TSS

Fig. 8: “Foreman” sequence: Reconstructed frame no. 30 using
the MRVBS, TSS, and FS under the mixed noise

c) FSreconstructed using


noise used for performance evaluation. The results show that
MRVBS is usually better than the FS and TSS algorithms,
especially with slow motion video sequences. The PSNR com-
parisons show that, the best performance is in the case of the
Miss America sequence (slow motion with stationary back-
ground). Experimentally, the proposed algorithm has been
shown to significantly outperform the motion estimation for
these three types of video sequences for several distinct noise
types, including impulsive, Gaussian, and mixed impulsive
Gaussian noise.

From the experimental results, under the influence of
mixed noise, the maximum improvement within the first
30-frame was about 6 dBs with the Miss America sequence.
We observe that the maximum improvement in case of pan-
ning and object translation in the Foreman sequence is
2.5 dBs in comparison with the FS algorithm. This supports
our claim that MRVBS can be effectively used with noisy
sequences to get a better estimation.

The simulation confirms that the proposed algorithm
performs better than the FS and TSS algorithms with the
three types of motion used. These gains can be observed
in terms both of the perceptual quality and of the PSNR
of the restored images. It should also be noted that since
the MRVBS algorithm can contain a regular data flow
through the entire search procedure, it is suitable for hard-
ware implementation.

References
[1] Sezan, M. I., Langendijk, R. L. (ed.): Motion Analysis and

Image Sequence Processing. Kluwer Academic Publishers,
1993.

[2] Dufaux, F., Kunt, M.: Multi-grid Block Matching Motion
Estimation with an Adaptive Local Mesh Refinement. In Proc.
SPIE Visual Commun. and Image Processing ’92,
Vol. 1818, p. 97–109.

[3] Li, J., Lin, X., Wu, Y.: Multiresolution Tree Architecture with
its Application in Video Sequence Coding: A New Result. In

Proc. SPIE Visual Communications and Image Proces-
sing ’93, Vol. 2094, p. 730–741.

[4] Uz, K. M., Vetterli, M., Legall, D.: Interpolative Multi-
resolution Coding of Advanced Television with Compatible
Sub-channels. IEEE Trans. Circuits Syst. Video Technol.,
March 1991, Vol. 1, p. 86–99.

[5] Bierling, M.: Displacement Estimation by Hierarchical Block
Matching. In Proc. SPIE Visual Communications and
Image Processing ’88, Vol. 1001, p. 942–951.

[6] Netravali, A. N., Robbins, J. D.: Motion Compensated Tele-
vision Coding: Part I. The Bell System Technical Journal,
March 1979, Vol. 58, p. 631–670.

[7] Kago, T., Iinuma, K., Hirano, A., Iijima, Y., Ishiguro, T.:
Motion compensated interframe coding for video conferencing.
In Proc. Nat.: Telecommun. Conf., New Orleans, LA,
Nov. 29–Dec. 3, 1981, p. G5.3.1–5.3.5.

[8] Zhang, Y. Q., Zafar, S.: Motion-compensated Wavelet Trans-
form Coding for Color Video Compression. IEEE Trans.
Circuits Sys. Video Technol., September 1992, Vol. 2,
p. 285–296.

Doc. Ing. Boris Šimák, CSc.
phone: +420 2 2435 2203
e-mail: simak@feld.cvut.cz

Dept. of Telecommunications Engineering
Czech Technical University in Prague
Faculty of Electrical Engineering
Technická 2
166 27 Prague 6, Czech Republic

Ing. Sayed Usama, Ph.D.
phone: +420 2 2435 2088
e-mail: usama@feld.cvut.cz

Deparment of Electrical Engineering
Faculty of Engineering
Assiut University
Assiut, Egypt

34 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 42  No. 2/2002