Al-Qadisiyah Journal for Engineering Sciences, Vol. 8, No. 4, 2015
 

 

IMPROVE THE PERFORMANCE OF VOICE-EXCITED LPC 

VOCODER IN SPEECH CODING APPLICATION  

Assistant Lecturer:  Awwab Qasim Jumaah Althahab 

Assistant Lecturer:  Ahmed Hussein Shatti Alisawi 

Electrical Engineering Department, College of Engineering, University of Babylon

Email: eng.awwab.qasim@uobabylon.edu.iq                     Email: Eng_Ahmed.Shatti2014@yahoo.com 

Received 3 August 2015        Accepted 21 October 2015 

 
ABSTRACT 

Speech coding, one of the fundamental problems in digital speech processing, has been studied for
many years. It represents a speech signal with as few binary digits as possible so that the signal
can be transmitted over a channel or stored in a memory device. Because channel bandwidth is
limited, speech compression is needed so that more coded speech signals can share the same channel
bandwidth. Linear Predictive Coding (LPC), which is based on the linear prediction (LP) model for
representing and analyzing human speech, is one of the most common speech coding techniques. It
compresses digital speech signals to a low bit rate and has become the dominant technique for
determining fundamental speech parameters such as pitch, formants, spectra and vocal tract area
functions. However, inaccurate estimation of these parameters in LPC causes poor voice quality and
performance. The aim of this paper is to build a system that detects the speech parameters
precisely in order to encode speech with better quality at a low bit rate. To this end, a modified
version of the voice-excited LPC vocoder is proposed, based on the Discrete Cosine Transform (DCT)
and quantization of the residual error, which retains a low bit rate and hence conserves bandwidth.
The proposed improvement is evaluated through computer simulation in Matlab 11 using two objective
measures of speech signal quality: the segmental power signal to noise ratio (SEGPSNR) and the mean
square error (MSE).

 
Keywords: Voice-Excited LPC vocoder, Levinson-Durbin recursion, IIR filters, Discrete Cosine 

Transform (DCT), Mean square error (MSE). 

 
Improving the Performance of the Voice-Excited Linear Predictive Vocoder in Speech Coding Applications

Assistant Lecturer: Awwab Qasim Jumaah Althahab        Assistant Lecturer: Ahmed Hussein Shatti Alisawi

Electrical Engineering Department, College of Engineering, University of Babylon

ABSTRACT (translated from the Arabic)

One of the main issues in digital speech processing is speech coding, which has been studied for
years. Speech coding can be defined simply as converting speech signals into the smallest possible
number of binary digits, which are then transmitted over channels or stored in memory devices.
Because the bandwidth of the transmission channels is limited, speech compression is needed to free
more of the bandwidth, so that more coded speech can be sent over the same channel bandwidth.
Linear predictive coding (LPC), a method for representing and analyzing human speech, is one of the
most common speech coding techniques. It is used to compress digital speech signals, which results
in a low coding bit rate. This method has become the dominant technique for computing fundamental
speech characteristics such as pitch, formants, spectrum and vocal tract functions. However,
weakness in estimating the speech characteristics makes LPC a lossy form of coding and degrades
voice performance and quality. The aim of this work is to build a system with precise detection of
the speech characteristics in order to encode speech of better quality at a low coding bit rate.
This can be done by proposing an improved version of voice-excited linear predictive coding (VELPC)
based on the DCT and quantization of the residual error while keeping the coding bit rate low,
thereby conserving bandwidth. The segmental power signal to noise ratio (SEGPSNR) and the mean
square error (MSE) were implemented as objective measures of speech signal quality for the proposed
improvement through computer simulation using Matlab 11.

Keywords: Voice-excited linear predictive vocoder, infinite impulse response filters, Discrete
Cosine Transform (DCT), mean square error (MSE).

 
1. INTRODUCTION 

The fundamental purpose of signal compression is to reduce the number of bits required to
represent a signal (speech, audio, image or video) while keeping an acceptable signal quality, so
that the communication system can meet its target of low bit rate transmission or message
encryption (Marcelo and Valdemar, 2005). Speech coding transforms the speech signal into a more
compact form, which can then be transmitted over a considerably smaller bandwidth or stored in
considerably less memory. The motivation is that unlimited bandwidth is never available, so speech
signals must be coded and compressed. In digital cellular technology, for example, many users need
to share the same frequency band, and speech compression allows more users to share the available
system. Another example is digital voice storage: for a fixed amount of available memory,
compression makes it possible to store longer messages.

One of the most powerful speech analysis techniques is linear predictive analysis, which is the
basis of the LPC vocoder (Daniele et al., 2014). Speech features such as pitch, formants, spectra
and the vocal tract transfer function can all be estimated with this technique. The more accurately
these parameters are extracted, the better the speech signal is reconstructed and the more
intelligible the speech. The weakest link in most LPC vocoders is the estimation and representation
of the excitation function, especially the pitch period of the excitation signal (Marcelo and
Valdemar, 2005). In this vein, (Yugandhar and Satyapriya, 2013) implemented three coding techniques
(LPC, waveform and sub-band coding) and compared performance measures such as compression ratio and
audible speech quality. (Minal and Sonal, 2014) proposed a model-based design that uses the linear
prediction coefficients of the encoded speech data and reported it to be a promising method for
speech compression. (Jingyun et al., 2014) presented a linear prediction model based on the first
order norm, together with a linear programming method for calculating the model parameters, and
analyzed the performance of the first order norm.

A great challenge in the digital coding of signals is the development of methods for assessing the
quality of the reconstructed signals. In this paper, to estimate the speech parameters more
accurately, a modification of the voice-excited LPC vocoder based on the Discrete Cosine Transform
(DCT) is proposed and used to code two male wideband speech signals. Measures of signal quality
fall into two general groups: subjective and objective. The speech coder developed here is
evaluated with objective measures, which are based on a direct mathematical comparison between the
original and processed signals. The objective analysis consists of computing the segmental power
signal to noise ratio (SEGPSNR) and the mean square error (MSE) between the original and the coded
speech signals. The effect of quantizing the residual error on the bit rate is also studied.

The remainder of the paper is organized as follows. Section 2 describes the speech production
model and how speech can be represented as the output of a linear time-varying system. Section 3
presents the proposed modification for improving the estimation of the speech parameters. Section 4
evaluates the performance of the proposed approach through computer simulation in MATLAB 11.
Section 5 closes the paper with some concluding remarks.

 
2. SPEECH PRODUCTION 

Figure (1) depicts a simple speech production model. Speech is produced when air is pushed from
the lungs through the vocal tract and out through the mouth. In this description, the lungs can be
thought of as the source of the sound and the vocal tract as a filter that shapes the various
sounds that make up speech (Lawrence and Ronald, 2009). A speech signal is a sequence of sounds
that can be classified as voiced or unvoiced, and the fundamental difference between the two lies
in how they are produced. Voiced sounds are produced when air from the lungs makes the vocal cords
vibrate; the rate of vibration determines the pitch of the sound. Unvoiced sounds do not rely on
vibration of the vocal cords. Instead, the vocal cords remain open and constrictions of the vocal
tract force the air out to produce the sound (Lawrence and Ronald, 2009). The vocal tract is
modeled as a linear all-pole filter, i.e. an infinite impulse response (IIR) filter. Speech can
therefore be modeled as the output of a linear time-varying system (an IIR filter) excited by
either a quasi-periodic impulse train, for voiced sounds, or white noise, for unvoiced sounds.
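The source-filter model described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the two-pole filter coefficients and the pitch period of 80 samples are hypothetical values chosen only to show an impulse train or white noise exciting an all-pole filter.

```python
import random


def impulse_train(num_samples, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(excitation, a, gain=1.0):
    """All-pole (IIR) filter: s[n] = sum_k a[k-1] * s[n-k] + gain * u[n]."""
    s = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s


# Hypothetical stable two-pole resonator (poles at radius 0.9, angle pi/4).
a = [1.2728, -0.81]
voiced = all_pole_filter(impulse_train(320, 80), a)
unvoiced = all_pole_filter(white_noise(320), a, gain=0.1)
```

A real vocal tract model would use more poles (eight are used later in this paper) and coefficients estimated from speech rather than fixed by hand.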

During encoding, the LPC analyzer estimates a set of parameters that model the vocal tract (the
all-pole filter). The predictor parameters are determined by minimizing the residual error energy,
which is the sum of the squared differences between the actual speech signal and the linearly
predicted one over frames of finite duration, normally 20 ms long (John et al., 1999). Only the
predictor coefficients and the residual error are sent, rather than the original speech signal. The
decoder uses the received error and predictor parameters to build a synthesized version of the
original speech signal. The transfer function of the time-varying digital filter is given by
(Lawrence and Ronald, 2009)

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{S(z)}{U(z)} \tag{1}$$

where $G$, $p$ and $a_k$ are the gain, order and parameters of the IIR filter. Only the first $p$
coefficients are transmitted to the LPC synthesizer. The most common methods for determining the
coefficients are the covariance and the auto-correlation methods. The auto-correlation method is
used in our implementation because, unlike the covariance method, it guarantees that the roots of
the denominator polynomial in (1) lie inside the unit circle, so the system $H(z)$ is always stable
(Awwab, 2013; Carlo et al., 2009). The Levinson-Durbin algorithm is used in our simulation to
compute the required parameters for the auto-correlation method.
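As an illustration of the auto-correlation method, the following Python sketch implements the textbook Levinson-Durbin recursion. It is a generic version written for this explanation, not the authors' Matlab code; the function names and the sign convention (predictor $\tilde{s}[n] = \sum_k a_k s[n-k]$) are assumptions chosen to match equation (1).

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor coefficients.

    Returns (a, reflection, err): a[j-1] multiplies s[n-j] in the predictor,
    reflection holds the reflection coefficients of each recursion step,
    and err is the final prediction error energy.
    """
    a = [0.0] * order
    reflection = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        new_a = a[:]
        new_a[i - 1] = k
        for j in range(i - 1):
            new_a[j] = a[j] - k * a[i - 2 - j]
        a = new_a
        err *= 1.0 - k * k
        reflection.append(k)
    return a, reflection, err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second coefficient comes out as zero, as expected for a first-order process. Reflection coefficients with magnitude below one correspond to a stable synthesis filter, which is the stability property referred to above.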

 
 

3. THE PROPOSED MODIFICATION ON VELPC MODEL 

The classic LPC approach to analyzing human speech gives poor sound quality, and the voice
excitation is its weakest part (Yi and Philipos, 2008). Voice-excited linear predictive coding
(VELPC) is one approach to obtaining better sound quality; a system of this type has been studied
by (Thomas and Abeer, 2011). Figure (2) shows a block diagram of VELPC with an excitation detector.
The modification proposed for the model in Figure (2), and implemented in our simulation, is to add
a pre-emphasis filter. The filter flattens the spectrum as much as possible by boosting the high
frequencies, so that the predictor coefficients corresponding to the higher frequencies can be
estimated more accurately. This treatment forms part of the reconstruction of the speech signal.
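A pre-emphasis filter of the kind described above is commonly a first-order high-pass difference. The sketch below assumes that form and a coefficient of 0.95; neither the filter order nor the coefficient is stated in the paper, so both are illustrative assumptions.

```python
def pre_emphasis(signal, coeff=0.95):
    """First-order high-frequency boost: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]


def de_emphasis(signal, coeff=0.95):
    """Inverse filter used at reconstruction: x[n] = y[n] + coeff * x[n-1]."""
    out = []
    prev = 0.0
    for y in signal:
        prev = y + coeff * prev
        out.append(prev)
    return out


# A constant (purely low-frequency) input is strongly attenuated after the first sample.
flat = pre_emphasis([1.0, 1.0, 1.0])
restored = de_emphasis(flat)
```

Applying `de_emphasis` after synthesis undoes the boost, which is one way the treatment could enter the reconstruction stage.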

The input speech signal is divided into frames of finite duration, and each frame is filtered by
the estimated transfer function of the LPC analyzer. The output of the analyzer, called the
residual (error signal), is sent to the receiver together with the predictor coefficients, and very
good speech quality can then be achieved. The price paid is a high bit rate. One way to reduce the
bit rate to 16 kbits/sec is to apply the Discrete Cosine Transform (DCT) to the residual error. The
rationale is that only the low frequencies of the residual signal are needed to maintain a good
reconstruction of the excitation. The DCT concentrates most of the signal energy in the first few
coefficients, so sending only those coefficients achieves a high compression ratio. Our simulations
also show that the DCT coefficients can be quantized using 4, 6 or 8 bits instead of the 16 bits of
the original representation. The quantization process is based on the partial reflection
coefficients (PRC), which are intermediate values computed during the well-known Levinson-Durbin
recursion. Finally, the receiver simply performs an inverse DCT and uses the resulting signal as
the excitation.

From equation (1) and the speech production model, the current speech sample $s[n]$ is
approximated as a linear combination of past samples:

$$s[n] = \sum_{k=1}^{p} a_k s[n-k] + G\,u[n] \tag{2}$$

 
where $u[n]$ is the excitation (voiced or unvoiced) and $n$ is the sample index. A linear predictor
with prediction coefficients $\alpha_k$ is defined as a system whose output is

$$\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k s[n-k] \tag{3}$$

 
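The prediction error (excitation) is the difference between the observed and predicted signals,
and it is assumed to be an independent and identically distributed (i.i.d.) process (John, 1975):

$$e[n] = s[n] - \tilde{s}[n] \tag{4}$$

Substituting (3) into (4) yields

$$e[n] = s[n] - \sum_{k=1}^{p} \alpha_k s[n-k] \tag{5}$$

Now suppose that the prediction error filter is represented as

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \tag{6}$$

Then, taking the z-transform of (5),

$$E(z) = S(z) - S(z)\sum_{k=1}^{p} \alpha_k z^{-k} = S(z)\Big[1 - \sum_{k=1}^{p} \alpha_k z^{-k}\Big] = S(z)A(z) \tag{7}$$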


 

If the predictor coefficients $\alpha_k$ converge exactly to $a_k$, the error becomes

$$e[n] = G\,u[n] \tag{8}$$

or, in the z-domain,

$$E(z) = G\,U(z) \tag{9}$$

But from equation (7), $E(z) = S(z)A(z)$. Substituting (7) into (9) and rearranging the terms
yields

$$A(z) = \frac{G\,U(z)}{S(z)} \tag{10}$$

From (1) and (10), we can conclude that

$$H(z) = \frac{G}{A(z)} \tag{11}$$
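The inverse relationship between the analysis and synthesis filters can be checked numerically. The following Python sketch filters a short sequence with the prediction error filter of equation (5) and then rebuilds it with the corresponding all-pole filter. The sample values and the two predictor coefficients are made up for illustration and are not taken from the paper.

```python
def analysis_filter(s, alpha):
    """Prediction error: e[n] = s[n] - sum_k alpha[k-1] * s[n-k]."""
    e = []
    for n in range(len(s)):
        pred = sum(alpha[k - 1] * s[n - k]
                   for k in range(1, len(alpha) + 1) if n - k >= 0)
        e.append(s[n] - pred)
    return e


def synthesis_filter(e, alpha):
    """All-pole reconstruction: s[n] = e[n] + sum_k alpha[k-1] * s[n-k]."""
    s = []
    for n in range(len(e)):
        pred = sum(alpha[k - 1] * s[n - k]
                   for k in range(1, len(alpha) + 1) if n - k >= 0)
        s.append(e[n] + pred)
    return s


speech = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4, -0.8, -0.6]   # made-up samples
alpha = [1.2, -0.5]                                      # hypothetical coefficients
residual = analysis_filter(speech, alpha)
rebuilt = synthesis_filter(residual, alpha)
```

When the residual is passed unchanged, `rebuilt` matches `speech` up to rounding. Any loss in a coder of this type therefore comes from how the residual and the coefficients are compressed, not from the filtering itself.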

 
Equation (11) shows that the prediction error filter $A(z)$, also called the analysis filter, is
the inverse of the system $H(z)$, the synthesis filter, up to the gain $G$. The optimization
problem is to estimate the prediction coefficients from a set of observed real samples such that
the prediction error is minimized (Stephen and Lieven, 2004). The resulting values are then taken
as the parameters of the system function $H(z)$, which is used to synthesize the speech segments.
To minimize the error, let the set of past values $[x_1, x_2, \ldots, x_p]$ of the speech signal
$s[n]$ be given, with $x_k = s[n-k]$, where $p$ is the order of the prediction error filter.
Equation (3) can then be written as

$$\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k s[n-k] = \alpha_1 s[n-1] + \alpha_2 s[n-2] + \cdots + \alpha_p s[n-p] \tag{12}$$

$$\tilde{s} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_p x_p = \sum_{k=1}^{p} \alpha_k x_k \tag{13}$$

 
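According to (13), with $s$ the current sample and $x_k$ the $k$th past sample, the prediction
error is now written as

$$e = s - \tilde{s} = s - \sum_{k=1}^{p} \alpha_k x_k \tag{14}$$

To find the predictor coefficients, the first-order derivative of the mean squared error
$E\{e^2\}$ is taken with respect to each predictor coefficient $\alpha_j$ and the result is set to
zero, as in (15):

$$E\Big\{2\Big[s - \sum_{k=1}^{p} \alpha_k x_k\Big](-x_j)\Big\} = 0 \tag{15}$$

Rearranging (15) yields

$$E\{s\,x_j\} = E\Big\{x_j \sum_{k=1}^{p} \alpha_k x_k\Big\} \tag{16}$$

Setting $j = 1, 2, \ldots, p$ and defining the covariances $r_{jk} = E[x_j x_k]$ and
$r_j = E[s\,x_j]$, (16) can be written as

$$\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix} =
\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} \tag{17}$$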
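As a worked illustration, consider the auto-correlation method adopted in Section 2 and assume each frame is treated as stationary. The expectation of the product of two past samples then depends only on their lag. Writing $R(i)$ for the autocorrelation of $s[n]$ at lag $i$ (a symbol introduced here only for this illustration), the system in (17) takes the symmetric Toeplitz form

$$\begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix} =
\begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}$$

Every diagonal of the matrix is constant, so the system is fully described by the $p+1$ values $R(0), \ldots, R(p)$. This structure is what allows a recursive solver to avoid a general matrix inversion.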

 
 

Equation (17) can be solved using the Levinson-Durbin algorithm (George, 1980). In our simulations
of the proposed scheme, objective measures of speech quality provide a mathematical criterion for
analyzing the performance and comparing the original with the reconstructed speech signals. The
segmental power SNR is calculated by first measuring the SNR of each frame and then averaging over
the whole utterance; it is defined in equation (18). The mean square error (MSE) is defined in
equation (19).

$$\mathrm{SEGPSNR} = \frac{10}{M}\sum_{m=0}^{M-1}\log_{10}\left[\frac{\sum_{n=0}^{N-1} s^2(n,m)}{\sum_{n=0}^{N-1} e_r^2(n,m)}\right] \tag{18}$$

and

$$\mathrm{MSE} = \frac{1}{NM}\sum_{n=0}^{NM-1} |e_r(n)|^2 \tag{19}$$

where $N$ is the frame length, $M$ is the number of frames, $s(n,m)$ is the original speech at the
$n$th point of the $m$th frame, and $e_r(n,m)$ is the residual error at the $n$th point of the
$m$th frame (Colin and Rainer, 2011).
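The two objective measures can be sketched in Python as follows. This is an illustrative version rather than the authors' implementation: it assumes non-overlapping frames, takes the error as the difference between original and reconstructed samples, and skips frames with zero signal or zero error energy so that the logarithm is always defined.

```python
import math


def segmental_snr_db(original, error, frame_len):
    """Average over frames of 10*log10(signal energy / error energy)."""
    values = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        sig = sum(x * x for x in original[start:start + frame_len])
        err = sum(x * x for x in error[start:start + frame_len])
        if sig > 0.0 and err > 0.0:   # skip silent or perfectly coded frames
            values.append(10.0 * math.log10(sig / err))
    return sum(values) / len(values) if values else float("nan")


def mean_square_error(original, reconstructed):
    """Mean of the squared sample differences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


# Made-up four-sample example with two frames of two samples each.
original = [1.0, -1.0, 1.0, -1.0]
reconstructed = [0.9, -0.9, 1.1, -1.1]
error = [a - b for a, b in zip(original, reconstructed)]
seg = segmental_snr_db(original, error, frame_len=2)
mse = mean_square_error(original, reconstructed)
```

In this toy example every frame has a signal-to-error energy ratio of 100, so the segmental value is 20 dB and the MSE is 0.01.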

 
4. PERFORMANCE ANALYSIS AND SIMULATION RESULTS 

The speech signals to be coded are wideband signals. We use a modified version of voice-excited
linear predictive coding (MVELPC) to code two male speakers uttering the same sentence, "Welcome to
The University of Babylon". The parameter values used in our simulations are as follows. The
bandwidth of the speech signals is 8 kHz, so the sampling frequency is 16 kHz. The maximum
end-to-end delay is 100 ms. Each frame advances by 20 ms, i.e. 320 samples, which gives 50 frames
per second. For perfect reconstruction, consecutive frames overlap by 10 ms, so the actual frame
length is 30 ms, i.e. 480 samples. The bit rate of the original speech is about 250 kbps, whereas
the bit rate of the synthesized speech is calculated as in Table (1).
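The framing arithmetic above can be sketched in Python. The function name and the decision to drop an incomplete final frame are assumptions made for this example; the paper does not say how the end of the signal is handled.

```python
def split_frames(signal, fs=16000, hop_ms=20, overlap_ms=10):
    """Cut a signal into overlapping frames.

    Each frame starts hop_ms after the previous one and is hop_ms + overlap_ms
    long, i.e. a 320-sample step and a 480-sample frame at 16 kHz.
    """
    hop = fs * hop_ms // 1000
    length = fs * (hop_ms + overlap_ms) // 1000
    frames = []
    start = 0
    while start + length <= len(signal):
        frames.append(signal[start:start + length])
        start += hop
    return frames


one_second = [0.0] * 16000
frames = split_frames(one_second)
```

With exactly one second of input, 49 complete frames fit, because the fiftieth would run past the end of the signal; on a longer recording the rate approaches 50 frames per second.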

The bit rates are obtained by inserting the number of bits used in the quantization process into
Table (1) and multiplying the resulting number of bits per frame by 50, the number of frames per
second. Figure (3) shows the original speech signal and the signals reconstructed by the MVELPC
vocoder with different quantized representations of the residual error (different numbers of
quantization bits). As the figure shows, the signal reconstructed with 4 bits has a lower quality
than the original, since the two waveforms differ clearly in shape. The similarity increases as the
number of quantization bits is raised, although the reconstructed signal still does not sound
exactly like the original speech.
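The effect of the number of quantization bits can be reproduced on a toy signal. The Python sketch below keeps the first 30 DCT coefficients of a 480-sample frame and quantizes them with 4, 6 and 8 bits. It swaps in a plain uniform quantizer scaled by the peak coefficient, which is an assumption for illustration only and not the reflection-coefficient-based scheme used in the paper; the test frame is synthetic.

```python
import math


def dct(x, num_coeffs=None):
    """Orthonormal DCT-II; returns only the first num_coeffs coefficients."""
    n = len(x)
    count = n if num_coeffs is None else num_coeffs
    out = []
    for k in range(count):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(coeffs, n):
    """Inverse of dct(); coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(values, bits):
    """Uniform signed quantizer; returns integer codes and the scaling peak."""
    peak = max(abs(v) for v in values) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [round(v / peak * levels) for v in values], peak


def dequantize(codes, peak, bits):
    """Map integer codes back to coefficient values."""
    levels = 2 ** (bits - 1) - 1
    return [c * peak / levels for c in codes]


# Synthetic low-frequency "residual" frame of 480 samples (not data from the paper).
N = 480
residual = [math.cos(math.pi * (n + 0.5) * 5 / N)
            + 0.3 * math.cos(math.pi * (n + 0.5) * 12 / N) for n in range(N)]
coeffs = dct(residual, num_coeffs=30)   # keep the first 30 coefficients only

mse_by_bits = {}
for bits in (4, 6, 8):
    codes, peak = quantize(coeffs, bits)
    rebuilt = idct(dequantize(codes, peak, bits), N)
    mse_by_bits[bits] = sum((a - b) ** 2 for a, b in zip(residual, rebuilt)) / N
```

On this synthetic frame the reconstruction error falls as the number of bits rises from 4 to 8, the same trend reported for the speech files. A real coder would also have to send the scaling peak as side information.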

Segmental PSNR and MSE values are given in Tables (2) and (3) for the wave files (ahmed.wav) and
(ali.wav), respectively. The values show that the signals reconstructed by the classic LPC are very
noisy, with very low SEGPSNR and high MSE; at a SEGPSNR of about 1.2 dB, the noise power is almost
as large as the signal power. The speech reconstructed by the MVELPC vocoder sounds far better, and
its SEGPSNR is acceptable. As the number of bits used to quantize the residual error rises from 4
to 8, the SEGPSNR increases and the MSE decreases, while the bit rate remains low and does not
exceed 16 kbps. The VELPC vocoder was also implemented; as Tables (2) and (3) show, its results are
superior to those of both the classic LPC and the MVELPC, but it demands a very high bit rate. This
explains why the proposed system is attractive compared with other speech coding techniques, such
as VELPC, waveform and sub-band coders, that require a very high bit rate for transmission.


 

Frame number 20 of the wave file (ahmed.wav) is selected and weighted with a 240-point Hamming
window. The frequency responses of the eight-pole LPC vocoder and of the eight-pole MVELPC vocoder,
which includes the pre-emphasis filter, are estimated and plotted in Figure (4). The same figure
also shows the power spectral density (PSD) of the original speech signal, computed with a
512-point Fast Fourier Transform (FFT). As Figure (4) shows, the spectrum of the eight-pole MVELPC
vocoder follows the peaks and troughs of the original speech well; the first five peaks in
particular fit the original spectrum. The spectrum of the eight-pole LPC vocoder, in contrast, fits
the peaks and troughs of the original speech poorly. Figure (5) shows the original speech signal
together with the reconstructed one for frame number 20 of (ahmed.wav). The reconstructed signal is
very close to the original and matches it for the most part.
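The quantities plotted in Figure (4) can be computed along the following lines. This Python sketch evaluates the magnitude response of an all-pole model $G/A(z)$ on a frequency grid and builds a symmetric Hamming window. The function names and the single example coefficient are assumptions for illustration; an eight-pole model would simply pass eight coefficients.

```python
import cmath
import math


def hamming(num_points):
    """Symmetric Hamming window of num_points samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_points - 1))
            for n in range(num_points)]


def lpc_response_db(alpha, gain, freqs_hz, fs=16000):
    """Magnitude of G / A(e^{jw}) in dB, with A(z) = 1 - sum_k alpha[k-1] z^{-k}."""
    out = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs
        a_of_z = 1.0 - sum(ak * cmath.exp(-1j * w * k)
                           for k, ak in enumerate(alpha, start=1))
        out.append(20.0 * math.log10(gain / abs(a_of_z)))
    return out


window = hamming(240)
# Hypothetical one-pole model evaluated at 0 Hz and at the 8 kHz band edge.
response = lpc_response_db([0.5], 1.0, [0.0, 8000.0])
```

For this one-pole example the response is about 6.02 dB at 0 Hz and about -3.52 dB at 8 kHz, i.e. a low-pass shape, as expected for a positive first coefficient.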

 
5. CONCLUSION 

In this paper, we presented a modified vocoder model based on voice-excited LPC and used it to
compress two male speech signals while maintaining a low bit rate. The speech produced by the
MVELPC is intelligible, since the coder preserves most of the perceptually relevant spectral
characteristics of the speech signal. It also achieves a higher SEGPSNR and a lower MSE than the
classic LPC. The tradeoff between speech quality on the one hand and bandwidth, bit rate and
complexity on the other has been analyzed and is clearly visible in the results. Better quality can
be achieved by increasing the number of bits used to quantize the DCT coefficients, at the cost of
a higher bit rate and hence a larger bandwidth, as shown in Tables (2) and (3). The classic LPC
results, in contrast, are much poorer and unintelligible. Because the MVELPC vocoder gives good
results within the required limits, particularly on bit rate, the model merits further study and
improvement.

 
REFERENCES 

 
[1] Awwab Q. Althahab, 2013 "Performance Analysis of Adaptive Blind Equalization Algorithms for Noisy FIR and IIR Channels" M.S. Thesis, University of Colorado.

[2] Carlo Magi, Jouni Pohjalainen, Tom Backstrom and Paavo Alku, 2009 "Stabilized Weighted Linear Prediction" Speech Processing, Vol. 51, No. 5, pp. 401-411.

[3] Colin Breithaupt and Rainer Martin, 2011 "Analysis of the Decision-Directed SNR Estimator for Speech Enhancement with Respect to Low-SNR and Transient Conditions" IEEE Transactions on Audio, Speech and Language Processing, Vol. 19, No. 2, pp. 277-289.

[4] Daniele Giacobello, et al., 2014 "Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling" IEEE Transactions on Audio, Speech and Language Processing, Vol. 22, No. 5, pp. 912-922.

[5] George Cybenko, 1980 "The Numerical Stability of the Levinson-Durbin Algorithm for Toeplitz Systems of Equations" SIAM J. Sci. and Stat. Comput., Vol. 1, Issue 3, pp. 303-319.

[6] Jingyun Xu, Xiaoqun Zhao, Qiao Wang and Digang Wang, 2014 "Linear Prediction Analysis of Speech Signal Based on $\ell_1$ Norm" Journal of Computational Information Systems, Vol. 10, No. 17, pp. 7553-7560.

[7] John Makhoul, 1975 "Linear Prediction: A Tutorial Review" Proc. IEEE, Vol. 63, No. 4, pp. 561-580.

[8] John R. Deller, John G. Proakis and John H. L. Hansen, 1999 "Discrete-Time Processing of Speech Signals" Wiley-IEEE Press.

[9] Lawrence Rabiner and Ronald W. Schafer, 2009 "Theory and Application of Digital Speech Processing" Prentice-Hall Inc.

[10] Marcelo S. Alencar and Valdemar C. da Rocha Jr, 2005 "Communication Systems" Springer.

[11] Minal Mulye and Sonal K. Jagtap, 2014 "Speech Compression using Analysis by Synthesis" IJECCE, Vol. 5, Issue 4, pp. 275-280.

[12] Stephen Boyd and Lieven Vandenberghe, 2004 "Convex Optimization" Cambridge University Press.

[13] Thomas Drugman and Abeer Alwan, 2011 "Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics" Proc. Interspeech, pp. 1973-1976.

[14] Yi Hu and Philipos C. Loizou, 2008 "Evaluation of Objective Quality Measures for Speech Enhancement" IEEE Trans. on Speech, Audio and Language Processing, Vol. 16, No. 1, pp. 229-238.

[15] Yugandhar Dasari and K. Satyapriya, 2013 "Performance Analysis of Speech Coding Techniques" IJAREEIE, Vol. 2, Issue 11, pp. 5725-5732.

 
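Table (1): Bit rate calculation for the proposed model including DCT.

Parameters                             Number of bits for each frame
-------------------------------------  -------------------------------------------------------------
Predictor coefficients = 8             8 * 8 = 64 bits
Number of DCT coefficients = 30        30 * (number of bits in quantizing process of residual error)
Gain of the predictor                  5 bits
Total number of bits for each frame    64 + 30 * (number of bits in quantizing process) + 5 bits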
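The calculation in Table (1) can be written out as a short Python sketch. The function and parameter names are chosen for this example; the numeric defaults (8 predictor coefficients of 8 bits each, 30 DCT coefficients, 5 gain bits, 50 frames per second) are the values stated in the paper.

```python
def bits_per_frame(bits_per_dct_coeff, num_lpc=8, bits_per_lpc=8,
                   num_dct=30, gain_bits=5):
    """Total bits per frame: predictor coefficients + DCT coefficients + gain."""
    return num_lpc * bits_per_lpc + num_dct * bits_per_dct_coeff + gain_bits


def bit_rate(bits_per_dct_coeff, frames_per_second=50):
    """Bit rate in bits/sec for a given DCT quantization depth."""
    return bits_per_frame(bits_per_dct_coeff) * frames_per_second


for b in (4, 6, 8):
    print(b, bits_per_frame(b), bit_rate(b))
# 4 189 9450
# 6 249 12450
# 8 309 15450
```

The resulting rates agree with the bit rate column reported for the modified vocoder in Tables (2) and (3).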

 
 

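Table (2): Simulation results for speech file (ahmed.wav).

Type of vocoder                                   No. of predictor  No. of bits in quantization  SEGPSNR  MSE     Bit rate based on
                                                  coefficients      process of residual error    in dB            Table (1) (bits/sec)
------------------------------------------------  ----------------  ---------------------------  -------  ------  --------------------
Modified version of VELPC, 30 DCT coefficients    8                 4                            2.9403   0.0141  9450
Modified version of VELPC, 30 DCT coefficients    8                 6                            4.1106   0.0084  12450
Modified version of VELPC, 30 DCT coefficients    8                 8                            5.0021   0.0077  15450
VELPC                                             8                 ---                          10.8479  0.0026  195450
Classic LPC                                       8                 ---                          1.2687   0.1159  3800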

 
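Table (3): Simulation results for speech file (ali.wav).

Type of vocoder                                   No. of predictor  No. of bits in quantization  SEGPSNR  MSE     Bit rate based on
                                                  coefficients      process of residual error    in dB            Table (1) (bits/sec)
------------------------------------------------  ----------------  ---------------------------  -------  ------  --------------------
Modified version of VELPC, 30 DCT coefficients    8                 4                            2.2512   0.0164  9450
Modified version of VELPC, 30 DCT coefficients    8                 6                            4.1547   0.0081  12450
Modified version of VELPC, 30 DCT coefficients    8                 8                            5.0952   0.0074  15450
VELPC                                             8                 ---                          12.6146  0.0014  195450
Classic LPC                                       8                 ---                          1.2266   0.1564  3800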

 
Figure (1): Simple speech production system
[Block diagram: a periodic impulse train (voiced) or random white noise (unvoiced) excites the filter H(z), whose output is the speech signal.]


Figure (2): A voice-excited LPC system.
[Block diagram; blocks: LPC Analyzer, Excitation Detector, Coder, Channel, Decoder, LPC Synthesizer.]

Figure (3): Waveform of the sentence "Welcome to the University of Babylon", the original and reconstructed speech signals.
[Four panels of amplitude versus sample number: the first original speech sound for (ahmed.wav); the MVELPC reconstructed speech with 4 bits, 6 bits and 8 bits quantizing process.]


Figure (4): PSD of input speech, frequency response of LPC and MVELPC vocoder.
[Panels for frame 20: input signal, Hamming-windowed input signal and error signal, amplitude versus samples [n]; 512-point FFT of the input signal with the frequency responses of the order-8 LPC and MVELPC vocoders, amplitude [dB] versus frequency [Hz] from 0 to 8000 Hz.]

Figure (5): Original and reconstructed speech signal (frame No. 20 of ahmed.wav).
[Amplitude versus time [s]: reconstructed signal and input signal.]