سليمان مرتضى وعلي خليل


Al-Khwarizmi 
Engineering   

Journal 
Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1-8 (2012) 

 
Speech Compression Using Multecirculerletet Transform 

 
Sulaiman Murtadha               Ali. K. Ibrahim 

Department of Electrical Engineering/University of Baghdad                           

                                                     
(Received 20 March 2011; accepted 14 May 2012) 

 
Abstract 
 

Compressing the speech reduces the data storage requirements, leading to reducing the time of transmitting the 
digitized speech over long-haul links like internet. To obtain best performance in speech compression, wavelet 
transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry.The MCT 
bases functions are derived from GHM bases function using 2D linear convolution .The fast computation algorithm 
methods introduced here added desirable features to the current transform. We further assess the performance of the 
MCT in speech compression application. This paper discusses the effect of using DWT and MCT (one and two 
dimension) on speech compression. DWT and MCT performances in terms of compression ratio (CR), mean square 
error (MSE) and peak signal to noise ratio (PSNR) are assessed. Computer simulation results indicate that the two 
dimensions MCT offer a better compression ratio, MSE and PSNR than DWT. 
 
Keywords: Sound, Speech Compression, MCT, DWT. 
 
 
1. Introduction 
 

Data compression is the process of converting 
an input data stream (the source stream or the 
original raw data) to a data stream with a smaller 
size (the output, the bit stream, or the compressed 
stream). A stream is either a file or a buffer in 
memory. Data compression is popular for two 
reasons: (1) People like to accumulate data and 
hate to throw any data away. No matter how big a 
storage device one has, sooner or later it is going 
to overflow. Data compression seems useful 
because it delays this inevitability. (2) People hate 
to wait a long time for data transfers. When sitting 
at the computer, waiting for a web page to come 
in or for a file to download, anything longer than a 
few seconds is a long time to wait [1].  Speech 
coding is a lossy type of coding, which means that 
the output signal does not exactly sound like the 
input. The input and the output signal could be 
distinguished to be different. Coding of audio 
however, is a different kind of problem than 
speech coding. Audiocodingtries 
tocodetheaudioin a perceptually lossless way. 

This means that even though the input and 
output signals are not mathematically equivalent, 

the sound at the output is the same as the input. 
This type of coding is used in applications for 
audio storage, broadcasting, and Internet 
streaming [2]. Speech compression plays an 
important role in teleconferencing, satellite 
communications and multimedia applications. 
However, it is more important to ensure that 
compression algorithm retains the intelligibility of 
the speech. The success of the compression 
scheme is based on the simplicity of the 
technology and efficiency of the algorithm used in 
the system. Parametric coding techniques are 
commonly used methods for speech compression 
and its application [4]. This paper deals with 
speech compression of isolated spoken words. A 
flexible compression scheme that is based on 
MCT decomposition is used in this work. This 
paper is organized as follows. Section 2 gives a 
discussion of the Discrete Wavelet Transforms. 
Section 3 explains the data base used for the 
experiment.Section 4 is devoted for computer 
simulation and results and section 5 concludes the 
paper. 
 
 
Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

2 

 
2. Discrete Wavelet Transform 
 
The signal is passed through a series of high 

pass filters to analyze the high frequencies, and it 
is passed through a series of low pass filters to 
analyze the low frequencies. The procedure starts 
with passing this signal (sequence) through a half 
band digital lowpass filter with impulse response 
h[n]. Filtering a signal corresponds to the 
mathematical operation of convolution of the 
signal with the impulse response of the filter.  
The convolution operation in discrete time is 
defined as follows [4]: 

]2[].[][ nkHnxkL
n

−= ∑                           …(1) 
]2[].[][ nkFnxkH

n

−= ∑                         …(2) 

][kL  and ][kH  are the outputs of the lowpass 
and highpass filters.                                                                                 
 
 
3. Multicircularlet Transform 
 

One famous multifilter bank is the GHM filter 
proposed by Geronimo, Hardian, and Massopust 
[5]. The GHM basis offers a combination of 
orthogonality, symmetry, and compact support, 
which cannot be achieved by any scalar wavelet 
basis [6]. The GHM two scaling and wavelet 
functions satisfy the following two-scale dilation 
equations: 

∑ 







−
−

=








k
k kt

kt
H

t
t

)2(
)2(

2
)(
)(

2

1

2

1

φ
φ

φ
φ

            …(3) 

∑ 







−
−

=








k
k kt

kt
G

t
t

)2(
)2(

2
)(
)(

2

1

2

1

φ
φ

ψ
ψ

           …(4) 

where Hk for GHM system is four scaling matrices 
H0, H1, H2, and H3.Also, Gk for GHM system are 
four wavelet matrices: G0, G1, G2, and G3. 
The individual coefficients values of these 
matrices are generated using the following 
procedure: 

1. Apply the 2-D linear Convolution between the 
G’s & H’s . This can be achieved as follows: 
a) compute    iii HHA ⊗=1  
b) compute   iii GGB ⊗=1             
where i= 0,1,2,3 
2. Now compute the 2-D Convolution between the 
resultant of step 1 & the G’s  & H’s. This can be 
done through the following way: 

a) Compute iii HAA ⊗= 12  
b) Compute iii GBB ⊗= 12                
 where i= 0,1,2,3 

2. The linear convolution process will be repeated 
several times. It was found that the optimal results 
were at the third step. This selection is based upon 
the evaluation of the application of the resultant 
coefficients in compression task. 
The proposed matrix coefficients A’s and B’s 
wear obtained by performing the following 
computations: 
a) Compute iii HAA ⊗= 2  
b) Compute iii GBB ⊗= 2                   
where i= 0,1,2,3 

The new multifilter Bases functions are denoted 
by A’s and B’s which are stated as 
     ,    ,     ,          &      ,     ,     ,      
The results of the third convolution (basis 
functions) are: 
a) The  A’s 2x2 matrices are:   =  1.4561    1.4131−  1.0265 −0.9857  ,    =   1.6896  1.5814  1.4376  1.5450    =   0.0977 −0.09450 0   ,         = 1.0 − 005 ∗    0.6250  00 0  
b) The B’s 2x2 matrices are:   =     0.0460 0.0343−0.0459 −0.0 342  ,          =  2.7696 −2.4406−2.4142 2.7225     =  1.6610 −1.60661.6581 −1.6038   ,              =  −0.0302 −0.03020.0052 0.0052   

The results show that at the third convolution 
the number of zeros in the resultant matrix 
increased which gives matrices similar in 
construction to Laurent matrix. Due to the good 
characteristics of the transformed 1-D signal 
(speech) by the basis functions obtained from the 
third convolution, it will be adopted as the 
transform named “Multicircularlet transform”. 
The AB multifilter bank coefficients are 2 by 2 
matrices, and during the convolution step they 
must multiply vectors (instead of scalars). This 
means that multifilter banks needs 2 input rows. 
This transformation is called preprocessing. The 


Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

3 

 
most obvious way to get two input rows from a 
given signal is to repeat the signal. Two rows go 
into the multifilter bank. This procedure is called 
“Repeated Row” which introduces oversampling 
of the data by a factor of 2 [8]. For a given scalar 
input signal {Xk} of length N (N is assumed to be 
a  power of 2 and so is of even length), repeated 
row preprocessing of this signal is by repeating 
the input stream with the same stream multiplied 
by a constant α. So the input length-2 vector are 
formed from the original as:  
 

                                                                    …(5) 
Where k=0,1,2,…,N-1 
Here   is constant, and from the preprocessing 
scheme of the GHM multifilter bank ,   is equal 
to   (1 √2 ). It is found that this factor is very 
suitable for preprocessing the application of the 
proposed transform. 
After the multifilter bank reconstruction 
(synthesis) step a postfiltering is applied.  
If the preprocessing step, represented by a matrix 
multiplication,   =                                                             …(6) 
Where P is 2NxN, X is Nx1, and    is 2Nx1, then 
in detail 























=























=













































MMMO

L

L

L

L

1
1,0

0
1,0

1
0,0

0
0,0

1

1

0

0

3

2

1

0

 
00
010
00
001

v
v
v
v

X
X
X

X

X
X
X
X

α

α

α

α
   …(7) 

For computing discrete multicircularlet transform, 
the transformation matrix can be written as 
follows: 



































=

1032

3210

3210

3210

1032

3210

3210

000000
000000

000000
0000

000000

000000
000000

BBBB
BBBB

BBBB
BBBB

AAAA

AAAA
AAAA

W

L

L

MMMMLMMMMMM

L

LMM

L

MMMMLMMMMMM

L

L

   
                                                                  …(8)                                                                           

Where     ,    are the impulse responses of the     multifilter bank. 
 
 
4. Proposed Compression Algorithem 
 
4.1. Implementing Wavelet and          

Multicirculertlet Transform 
 

The following steps will be followed in the 
implementation of this algorithm: 
1. Record and read the input speech in sampled and 

digitized form. In this step the input speech will 
be shown as a one dimension array stored in 
specified vector. 

2. Apply DWT, save the coefficients and write 
them as an audio file format called WAV format. 

3. Apply MCT; 1D & 2D MCT for an input speech 
are used. At first 1D MCT is applied. For this 
purpose, each speech after sampling will be 
resized to the power of two. Then by using a 
different program, 2D MCT will be applied. In 
2D MCT program, the speech samples are 
divided into different levels, each level contains 2  samples. 

Write output Files as a stream in the desired format. 
 

4.2. Applying Multicirculerletlet Transform 

MCT contains the following steps:                                                             
a) Use Hamming window and different levels by 

changing Ns. 
b) It is possible to apply 1D or 2D MCT. 

b.1) For applying 1D MCT: 
b.1.i) Compute the length of the speech, the 
number of total    samples contained. 
b.1.ii) If the length is to power of two, apply 
MCTs directly, save the coefficients and write 
them as a WAV file.  
b.1.iii) If the length is not to power of two, 
refer to the original file, append silence or near 
zero samples to get a power of two speech 
samples. Compute the MCT for the entire 
speech, save the coefficients and write them as 
a WAV file. 
b.2) For applying 2D MCT: 
b.2.i) It doesn’t matter that the length of the 
speech is to the power of two or not. Starting N 
= 2 (each frame equals 4 samples) and 
increasing N (Maximum N = 6, depending on 
the original speech bit rate) achieve the desired 
compressed speech, good compression and 
good quality. 
b.2.ii) Framing is used to cut the long-time 
speech to the short-time speech signal in order 













kX
kX

α


Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

4 

 
to compute 2D MCT. Each frame contains 2  
samples, so the number of columns is equal to 2  and the number of rows is the integer of 
dividing the total number of speech samples 
into 2 . 
b.2.iii) Compute the MCT for each row, then 
column wise on the result, finally cascade the 
rows, save the coefficients and write them as a 
WAV file. 
 
 
5. Experiment Results 
 
After applying the transforms to speeches, the 

compressed and uncompressed speeches will be 
written as a stream file in the form of WAV for 
playing them on any software like windows media 
player or use them on the internet as stream media 
or even more complicated systems like telephone 
switches. Also, it is possible to write the results in 
another form using AU extension. But WAV 
extension is preferred more than AU.  

The test material contains seven speech 
samples stored as files, the format of these files 

are wave format. Each file has different size and 
different number of samples. The type of the 
digital speech is pulse code modulation (PCM) 
and the tested speech samples are in the bit rate of 
64, 256 and 3072 Kbps. The properties of the 
tested speech samples are presented in Tables (1, 
2, 3, 4, 5, 6, and 7) and Fig. (1). 
                                
Table 1, 
Properties of Tested Speech Samples. 

File Name  File Size (BYTE) 
Total 
Samples 

Be 2251016 562743 

Ru 4155904 1038965 

R01 5266700 1316664 

R02 5931500 1482864 

R03 7886092 1971512 

HE 4815572 1203882 

 
Table 2, 
Performance Measures of Speech Signals. 
 

Table 3, 
Compression Performance Results Using Haar DWT. 

File Name Input Size Output Size CR% CF Compression Gain 

Be 2251016 281416  12.502 7.999 91.303 

Ru 4155904 479528 11.538 8.667 94.785 

R01 5266700 588376 11.172 8.951 95.188 

R02 5931500 641476 10.815 9.247 96.598 

R03 7886092 785800 9.964 10.036 100.155 

HE 4815572 551986 11.463 8.724 96.072 

Average  11.242  8.937 95.120 

Signal 
                            PSNR                           MSE

Wavelet 2D MCT Wavelet  2D MCT 

Be 49.8350 52.9273 0.6084 0.3314

Ru 49.6102 52.1699 0.6671 0.3945

R01 51.9419 55.1144 0.3841 0.2003

R02 51.8094 54.3706 0.4113 0.2377

R03 49.8011 52.6932 0.6354 0.3498

HE 50.3298  53.1462 0.5676
0.3151


Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

5 

 
Table 4, 
Compression Performance Using 1D MCT. 

z Input Size Output Size CR%  CF Compression Gain 

Be 2251016 652788 29.000 3.448 53.761 

Ru 4155904 1127974 27.141 3.684 56.637 

R01 5266700 1299708 24.678 4.052 60.769 

R02 5931500 1492908 25.169 3.973 59.913 

R03 7886092 1971556 25.000 4.000 60.205 

HE 4815572 1273926 26.454 3.780 57.750 

Average 26.240 3.823 58.241 

 
Table 5, 
Compression Performance Using 2D MCT N = 1. 

File Name  Input Size  Output Size CR% CF Compression Gain 

Be 2251016 584788 25.979 3.849 58.538 

Ru 4155904 1027974 24.735 4.043 60.668 

R01 5266700 1259708 23.918 4.181 62.127 

R02 5931500 1372908 23.146 4.320 63.552 

R03 7886092 1771556 22.464 4.452 64.851 

HE 4815572 1173926 24.378 4.102 61.301 

Average 24.103 4.158 61.887 

 
Table 6, 
Compression Performance Using 2D MCT N = 2. 

File Name  Input Size Output Size CR% CF Compression Gain 

Be 2251016 273416 12.146 8.233 91.555 

Ru 4155904 459528 11.057 9.044 95.635 

R01 5266700 528376 10.032 9.968 99.860 

R02 5931500 561476 9.466 10.564 102.383 

R03 7886092 685800 8.696 11.499 106.066 

HE 4815572 501986 10.424 9.593 98.196 

Average 10.304 9.817 99.197 

 
Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

6 

 
Table 7, 
Compression Performance Using 2D MCT N = 3. 

File Name Input Size Output Size CR% CF Compression Gain 

Be 2251016 133416 5.927 16.872 122.717 

Ru 4155904 279528 6.726 14.868 117.224 

R01 5266700 318376 6.045 16.542 121.860 

R02 5931500 331476 5.588 17.894 125.271 

R03 7886092 385800 4.892 20.441 131.050 

HE 4815572 301986 6.271 15.946 120.266 

Average  5.908 17.094 123.284 

 
(a)  Mean Square Error (MSE)  

 
(b) Peak Signal to Noise Ratio (PSNR) 

 
Fig. 1. Performance Measures of Speech Signals. 

 
It is shown that 1D has the maximum 
compression ratio percentage. The best 

compression ratio is in the case of 2D MCT with 
maximum number of levels, N equal 3. 

 
Wavelet 

MCT

0

0.2

0.4

0.6

0.8

M
ea

n
Sq

u
ar

e 
Er

ro
r 

(d
B

)

Speech Samples

Be Ru               R01              R02             R03               
HE

Wavelet

MCT

46

48

50

52

54

56

P
ea

k 
Si

gn
al

 t
o

 N
o

is
e 

R
at

io
 

(d
B

)

Speech Samples
Be Ru               R01              R02             R03               
HE


Sulaiman Murtadha                                   Al-Khwarizmi Engineering Journal, Vol. 8, No.4, PP 1- 8 (2012) 

7 

 
6. Conclusions 
 

The Haar wavelet transform is the simplest and 
the fastest one to implement. However, because of 
its discontinuity, it is not optimal to simulate a 
continuous signal. Based on our experiments. 
Haar wavelet obtained the worst compression 
result, which proves the above statement. Db4 
found the first continuous orthogonal compactly 
supported wavelet. Note that this family of 
wavelets is not symmetric. The advantage of 
symmetry is that the corresponding wavelet 
transform can be implemented using a mirror 
boundary condition, which reduces boundary 
artifacts. 

The MCT is a symmetrical transform and easy 
to implement. The MCT based compression 
software designed reaches a signal of noise ratio 
of 55.114db and CR equal to 20.4. The proposed 
method could be classified in the field of 
symmetrical compression. This case occurs when 
the compression and decompression use basically 
the same algorithm but work in opposite 
directions. 

 
7. References 
 
[1] D. Salomon, 2007 “Data Compression, the 

Complete Reference”, Fourth Edition, 
Contributions by, Springer-Verlag London. 

[2]  “Law bit-rate speech coders for multimedia 
communication,” R.V. Cox and P. Krmn, 
LEEE 

[3] Communications Magazine, pages 34-40, 
1996 http://www.bell-labs.com 

[4] Shijo M Joseph, Firoz Sha A and Babu Anto 
P “Speech Compression: A Comparative 
Study Between Discreet Wavelet and 
Wavelet Packet Decomposition” 
International Journal of Computer and 
Network Security Vol 2. No7 2010. 

[5] G. Strange and T. Nguyen, 1996 “Wavelets 
and Filter banks”, Wellesley, Wellesley-
Cambridge. 

[6] Geronimo, J., Hardian,D.& Massopust, P. “ 
Fractal Function and Wavelet Expansion 
Based on Several Functions”, J. Approx. 
Theory, Vol. 78, PP. 373-401, 1994. 

[7] Burrus, C.S., Goperath, R.A., and Gue, H., 
“Introduction to wavelet and wavelet 
transforms”, A primer Upper Saddle, 
NJ(U.S.A.), Prantice Hall, Inc., 1998. 

[8] Strela, V. & Walden, A.T. “ Orthogonal and 
biorthogonal multiwavelets for signal 
denoising and image compression”, Proc. 
SPIE, 3391: 96-107, 1998. 

[9] Shi Zhong and Vladimir Cherkassky, “Image 
Denoising using Wavelet Thresholding and 
Model Selection”, Dept. of ECE, Univ. of 
Texas at Austin, 2001. 

 
http://www.bell-labs.com


 )2012( 1- 8، صفحة 4، العدد8مجلة الخوارزمي الھندسیة المجلد              سلیمان مرتضى                                                         

8 

 
  ضغط الكالم باستخدام تحویل الملتي سیكلرلیت
  

  علي خلیل ابراھیم        سلیمان مرتضى عباس      
  جامعة بغداد/ كلیة الھندسة/ قسم الھندسة الكھربائیة

  
  الخالصة

 ،ومن اجل الحصول على اكفأ اداء. زمن رسال الكالم الرقمي خالل ضغط االنترنیتضغط الكالم یقلل متطلبات خزن المعلومات مما یؤدي الى اختصار 
خوارزمیة الملتي سیكلر لیت  المشتقة من دالة . استعملت خوارزمیة  الویفلیت والتي تحتاج الى مرشحات ذات خصائص مرغوبة مثل التناظر والتعامد

)GHM (یناقش ھذا . الخوارزمیة  خوارزمیة الجدیدة في مجال ضغط الكالم وذلك للسرعة العالیة لھذهلذه اتم استعمال ھ. وتستعمل دالة الكونفلیوشن الخطیة
 ومعدل مربع الخطأ، ) CR( البحث تأثیر استخدام ھذه خوارزمیة  وخوارزمیة  الویفلیت في ھذا التطبیق وكفأءة االداء حسبت من حیث  نسبھ الضغط

)MSE(، ونسبھ االشارة الى الضوضاء )PSNR .( برامج الحاسبة التمثیلیة لھذا التي تؤثر ان خوارزمیة الملتي سیركلر لیت ذات البعدین اعطت افضل
  . النتائج