Al-Qadisiyah Journal For Engineering Sciences, Vol. 8, No. 4, 2015

IMPROVE THE PERFORMANCE OF VOICE-EXCITED LPC VOCODER IN SPEECH CODING APPLICATION

Assistant Lecturer: Awwab Qasim Jumaah Althahab
Assistant Lecturer: Ahmed Hussein Shatti Alisawi
Electrical Engineering Department, College of Engineering, University of Babylon
Email: eng.awwab.qasim@uobabylon.edu.iq
Email: Eng_Ahmed.Shatti2014@yahoo.com

Received 3 August 2015 Accepted 21 October 2015

ABSTRACT

Speech coding, which has been studied for many years, is one of the fundamental problems in digital speech processing. It represents a speech signal with as few binary digits as possible, so that the signal can be transmitted over a channel or stored in a memory device. Because channel bandwidth is limited, speech compression is needed to free bandwidth, so that more coded speech signals can be sent over the same channel. Linear Predictive Coding (LPC), which is based on the linear prediction (LP) model for representing and analyzing human speech, is one of the most common speech coding techniques. It is used to compress digital speech signals, resulting in a low bit rate, and it has become the dominant technique for determining the fundamental speech parameters such as pitch, formants, spectra and vocal tract area functions. However, the weakness of LPC in estimating these parameters causes poor voice quality and performance. The aim of this paper is to build a system that detects the speech parameters precisely in order to encode better speech quality at a low bit rate. This is done by proposing a modified version of the voice-excited LPC vocoder based on the Discrete Cosine Transform (DCT) and quantization of the residual error, which retains a low bit rate and hence conserves bandwidth.
Segmental power signal to noise ratio (SEGPSNR) and mean square error (MSE) are used as objective measures of speech signal quality to evaluate the proposed improvement through computer simulation using Matlab 11.

Keywords: Voice-Excited LPC vocoder, Levinson-Durbin recursion, IIR filters, Discrete Cosine Transform (DCT), Mean square error (MSE).

IMPROVING THE PERFORMANCE OF THE VOICE-EXCITED LINEAR PREDICTIVE VOCODER IN SPEECH CODING APPLICATIONS

Assistant Lecturer: Awwab Qasim Jumaah Althahab
Assistant Lecturer: Ahmed Hussein Shatti Alisawi
Electrical Engineering Department, College of Engineering, University of Babylon

ABSTRACT (translated from the Arabic)

Speech coding, which has been studied for years, is one of the main issues in the field of digital speech processing. It can be defined simply as converting speech signals into the smallest possible number of binary digits, which are then transmitted over channels or stored in memory devices. Because the bandwidth of the transmission channels is limited, speech compression is needed to free more bandwidth, so that more coded speech can be sent over the same channel bandwidth. Linear predictive coding (LPC), which is a method for representing and analyzing human speech, is one of the most common speech coding techniques and is used to compress digital speech signals, resulting in a low bit rate. This method has become the dominant technique for computing the fundamental speech characteristics such as pitch, formants, spectra and vocal tract functions. However, the weakness in computing the speech characteristics makes LPC a lossy type of coding and causes poor performance and voice quality. The aim of this work is to build a system with precise detection of the speech characteristics in order to encode better speech quality at a low bit rate. This can be done by proposing an improved version of the voice-excited linear predictive coder (VELPC) based on the DCT and quantization of the residual error while keeping a low bit rate, and hence conserving bandwidth. The segmental power signal to noise ratio (SEGPSNR) and the mean square error (MSE) were implemented as objective measures of speech signal quality for the proposed improvement through computer simulation using Matlab 11.

Keywords: voice-excited linear predictive vocoder, infinite impulse response filters, (DCT) transform, mean square error (MSE).

1. INTRODUCTION

The fundamental purpose of signal compression techniques is to reduce the number of bits required to represent a signal (speech, audio, image or video) while keeping an acceptable signal quality, in order to reach the communication system target of low bit rate transmission or message encryption (Marcelo and Valdemar, 2005). Speech coding is the process of transforming a speech signal into a more compact form, which can then be transmitted with less bandwidth or stored in considerably less memory. The motivation is that access to an unlimited amount of bandwidth is not possible, so speech signals need to be coded and compressed. For example, in digital cellular technology many users need to share the same frequency bandwidth, and speech compression makes it possible for more users to share the available system. Another example is digital voice storage: for a fixed amount of available memory, compression makes it possible to store longer messages.

One of the most powerful speech analysis techniques is linear predictive analysis, as used in the LPC vocoder (Daniele et al., 2014). Speech features such as pitch, formants, spectra and the vocal tract transfer function can all be estimated using this technique. Well-estimated parameters give a good reconstruction of the speech signal and hence more intelligible speech. The weakest link in most LPC vocoders is the estimation and representation of the excitation function, especially the pitch period of the excitation signal (Marcelo and Valdemar, 2005).
In this vein, Yugandhar and Satyapriya (2013) presented and implemented three coding techniques (LPC, waveform and sub-band coding) and checked performance measures such as compression ratio and audible speech quality. Minal and Sonal (2014) proposed a model-based design that uses the linear prediction coefficients of the encoded speech data and found it to be a promising method for speech compression. Jingyun et al. (2014) presented a linear prediction model based on the first order norm; they proposed a linear programming method to calculate the parameters of the model and analyzed the performance of the first order norm.

A great challenge in digital coding of signals is the development of methods for assessing the quality of reconstructed signals. In this paper, for more accurate estimation of the speech parameters, a modification of the voice-excited LPC vocoder based on the Discrete Cosine Transform (DCT) is proposed for coding two male wideband speech signals. The measures used for assessing signal quality can be classified into two general groups: subjective and objective quality measures. In this paper, the speech coder is evaluated using objective measures, which are based on a direct mathematical comparison between the original and processed signals. The objective analysis consists of computing the segmental power signal to noise ratio (SEGPSNR) and the mean square error (MSE) between the original and the coded speech signals. Furthermore, the effect of quantizing the residual error on the bit rate is studied.

The remainder of the paper is organized as follows. Section 2 describes the speech production model and how speech can be represented as the output of a linear time-varying system.
Section 3 presents and discusses the proposed modification for finding the optimum estimate of the speech parameters. Section 4 evaluates the performance of the proposed approach using computer simulation in MATLAB 11. The paper ends with some concluding remarks.

2. SPEECH PRODUCTION

Figure (1) depicts the simple speech production model. Speech is produced when air is pushed from the lungs through the vocal tract and out through the mouth. In this description, the lungs can be thought of as the source of the sound and the vocal tract as a filter that produces the various types of sounds that make up speech (Lawrence and Ronald, 2009).

Speech signals consist of sequences of sounds, which can be classified as voiced or unvoiced. The fundamental difference between the two types comes from the way they are produced. Voiced sounds are produced when air from the lungs makes the vocal cords vibrate, and the rate at which the vocal cords vibrate dictates the pitch of the sound. Unvoiced sounds do not rely on the vibration of the vocal cords. They are created by constrictions of the vocal tract, which is modeled as a linear all-pole filter (infinite impulse response filter): the vocal cords remain open, and the constrictions of the vocal tract force air out to produce the unvoiced sounds (Lawrence and Ronald, 2009).

Speech can therefore be modeled as the output of a linear time-varying system (IIR filter) excited by either a quasi-periodic impulse train or white noise to generate the various components of speech. During the production of a given speech signal, the encoding process of the LPC analyzer is used to estimate a set of accurate parameters for modeling the vocal tract (all-pole filter).
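The source-filter view described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the two filter coefficients, the gains, the pitch period of 80 samples and the function names are hypothetical values chosen for the example, not parameters taken from the paper.

```python
import random

def synthesize(a, gain, excitation):
    """All-pole synthesis: s[n] = sum_k a[k-1] * s[n-k] + gain * u[n]."""
    p = len(a)
    s = []
    for n, u in enumerate(excitation):
        # Samples before the start of the frame are treated as zero.
        past = sum(a[k] * s[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        s.append(past + gain * u)
    return s

def impulse_train(length, period):
    """Quasi-periodic excitation for voiced sounds (one pulse per pitch period)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Random excitation for unvoiced sounds."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

# Hypothetical stable 2-pole filter (pole radius 0.9), chosen only for illustration.
a = [1.2, -0.81]
voiced = synthesize(a, 1.0, impulse_train(320, 80))   # 320 samples = 20 ms at 16 kHz
unvoiced = synthesize(a, 0.1, white_noise(320))
```

The same filter produces a pitched, decaying oscillation when driven by the impulse train and a noise-like signal when driven by white noise, which is the distinction between voiced and unvoiced sounds drawn in this section.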
The predictor parameters are determined by minimizing the residual error, which is the sum of the squared differences between the actual speech signal and the linearly predicted one over frames of finite duration, normally 20 ms long (John et al., 1999). Only the predictor coefficients and the residual error are sent, instead of the original speech signal. The decoding process uses the received error and predictor parameters to build a synthesized version of the original speech signal. The transfer function of the time-varying digital filter is given by (Lawrence and Ronald, 2009)

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{S(z)}{U(z)} \tag{1}$$

where $G$, $p$ and $a_k$ are the gain, order and parameters of the IIR filter. Only the first $p$ coefficients are transmitted to the LPC synthesizer. The most common methods used to determine the coefficients are the covariance and the auto-correlation methods. Our implementation uses the auto-correlation method, because it is superior to the covariance method in the sense that the roots of the denominator polynomial of equation (1) are always guaranteed to lie inside the unit circle; hence the system $H(z)$ is guaranteed to be stable (Awwab, 2013; Carlo et al., 2009). The Levinson-Durbin algorithm is used in our simulation to compute the required parameters for the auto-correlation method.

3. THE PROPOSED MODIFICATION ON VELPC MODEL

The classic approach to analyzing human speech based on LPC gives poor sound quality, and the voice excitation is the weakest part of the method (Yi and Philipos, 2008). Voice-excited linear predictive coding (VELPC) is therefore one approach to obtaining better sound quality. A system of this type has been studied by Thomas and Abeer (2011). Figure (2) shows a block diagram of VELPC with an excitation detector.
The proposed modification to the model shown in Figure (2), which is implemented in simulation, is to use a pre-emphasis filter. It makes the spectrum as flat as possible by boosting the high frequencies, which gives a better estimate of the predictor parameters; in particular, the predictor coefficients corresponding to higher frequencies can be better estimated. This treatment belongs to the reconstruction part of the speech signal.

The input speech signal, divided into frames of finite duration, is filtered by the estimated transfer function of the linear predictive coding analyzer. The output of the analyzer is the residual (error signal), which is sent with the predictor coefficients to the receiver. Consequently, a very good speech quality can be achieved. However, the price paid by this system is a high bit rate. One solution that reduces the bit rate to 16 kbits/sec is to apply the Discrete Cosine Transform (DCT) to the residual error. The DCT is used because only the low frequencies of the residual signal are needed to maintain a good reconstruction of the excitation. The DCT concentrates most of the signal energy in the first few coefficients, and only these are sent, which achieves a high compression rate. Our simulations also show that these DCT coefficients can be quantized using 4, 6 and 8 bits instead of the 16 bits of the original representation. The quantization process is based on the partial reflection coefficients (PRC), which are obtained during the calculation of the well-known Levinson-Durbin recursion. Finally, the receiver simply performs an inverse DCT and uses the resulting signal to excite the voice.

From equation (1) and the speech production model, the current speech sample $s[n]$ is approximated as a linear combination of past samples:

$$s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + G\, u[n] \tag{2}$$

where $u[n]$ is the voiced or unvoiced excitation and $n$ is the sample index.
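The pre-emphasis and residual coding steps described in this section can be sketched as follows. This is a minimal sketch, not the authors' implementation: the pre-emphasis coefficient `mu = 0.95`, the orthonormal DCT-II scaling and the uniform mid-tread quantizer are assumptions introduced for illustration (the paper bases its quantization on reflection coefficients, which is not reproduced here), and all function names are hypothetical.

```python
import math

def preemphasis(x, mu=0.95):
    """First-order high-frequency boost y[n] = x[n] - mu * x[n-1] (mu is an assumed value)."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]

def _scale(k, n):
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct_lowpass(x, keep):
    """First `keep` coefficients of the orthonormal DCT-II of x."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(keep)]

def idct_from_lowpass(coeffs, n):
    """Inverse DCT assuming every coefficient beyond len(coeffs) is zero."""
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def quantize(coeffs, bits):
    """Uniform mid-tread quantizer (assumed scheme, bits >= 2): signed codes and step size."""
    peak = max(abs(c) for c in coeffs) or 1.0
    step = peak / (2 ** (bits - 1) - 1)
    return [round(c / step) for c in coeffs], step

def dequantize(codes, step):
    return [code * step for code in codes]

# Hypothetical smooth residual frame of 480 samples (30 ms at 16 kHz).
N = 480
residual = [math.cos(math.pi * (i + 0.5) * 3 / N) for i in range(N)]
coeffs = dct_lowpass(residual, keep=30)        # 30 DCT coefficients per frame
codes, step = quantize(coeffs, bits=8)         # 8 bits per coefficient
decoded = idct_from_lowpass(dequantize(codes, step), N)
max_error = max(abs(a - b) for a, b in zip(residual, decoded))
```

The example residual is deliberately low-frequency, so the 30 retained coefficients capture it almost completely; a real residual frame loses whatever energy lies above the retained coefficients, which is the quality cost of the reduced bit rate.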
A linear predictor with prediction coefficients $\alpha_k$ is defined as a system whose output is

$$\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k\, s[n-k] \tag{3}$$

The prediction error (excitation) is the difference between the observed and predicted signals, and it is assumed to be an independent and identically distributed (i.i.d.) process (John, 1975):

$$e[n] = s[n] - \tilde{s}[n] \tag{4}$$

Substituting (3) into (4) yields

$$e[n] = s[n] - \sum_{k=1}^{p} \alpha_k\, s[n-k] \tag{5}$$

Now suppose that the prediction error filter is represented as

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \tag{6}$$

Then

$$E(z) = S(z) - S(z)\sum_{k=1}^{p} \alpha_k z^{-k} = S(z)\Big[1 - \sum_{k=1}^{p} \alpha_k z^{-k}\Big] = S(z)\,A(z) \tag{7}$$

If the predictor coefficients $\alpha_k$ converge exactly to $a_k$, the error becomes

$$e[n] = G\, u[n] \tag{8}$$

or

$$E(z) = G\, U(z) \tag{9}$$

From equation (7), $E(z) = S(z)A(z)$. Substituting (7) into (9) and rearranging the terms yields

$$\frac{S(z)}{U(z)} = \frac{G}{A(z)} \tag{10}$$

From (1) and (10), we conclude that

$$H(z) = \frac{G}{A(z)} \tag{11}$$

Equation (11) shows that the prediction error filter $A(z)$ (also called the analysis filter) is the inverse filter of the system $H(z)$, the synthesis filter.

The optimization problem is to find an estimate of the prediction coefficients from a set of observed real samples such that the prediction error is minimized (Stephen and Lieven, 2004). The resulting values are then taken as the parameters of the system function $H(z)$, which is used for synthesizing speech segments. To minimize the error, let a set of past values $s[n-k]$, $k = 1, \dots, p$, of the speech signal $s[n]$ be given, where $p$ is the order of the prediction error filter. Equation (3) can then be written as

$$\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k\, s[n-k] = \alpha_1 s[n-1] + \alpha_2 s[n-2] + \dots + \alpha_p s[n-p] \tag{12}$$

or, using subscripts for the sample index,

$$\tilde{s}_n = \sum_{k=1}^{p} \alpha_k\, s_{n-k} \tag{13}$$

According to (13), the prediction error is now written as

$$e_n = s_n - \tilde{s}_n = s_n - \sum_{k=1}^{p} \alpha_k\, s_{n-k} \tag{14}$$

To find the predictor coefficients, the first-order derivative of the mean squared error with respect to each predictor coefficient $\alpha_i$ is taken and set to zero:

$$\frac{\partial E\{e_n^2\}}{\partial \alpha_i} = E\Big\{2\Big[s_n - \sum_{k=1}^{p} \alpha_k\, s_{n-k}\Big](-s_{n-i})\Big\} = 0 \tag{15}$$

Rearranging (15) yields

$$E\{s_n s_{n-i}\} = E\Big\{\sum_{k=1}^{p} \alpha_k\, s_{n-k}\, s_{n-i}\Big\} \tag{16}$$

Setting $i = 1, 2, \dots, p$ and defining the covariances $r[i] = E\{s_n s_{n-i}\}$ and $R[i,k] = E\{s_{n-k} s_{n-i}\}$, (16) can be written as

$$\mathbf{r} = \mathbf{R}\,\boldsymbol{\alpha} \tag{17}$$

where $\mathbf{r}$ is the vector of the $r[i]$, $\mathbf{R}$ is the $p \times p$ matrix of the $R[i,k]$, and $\boldsymbol{\alpha}$ is the vector of predictor coefficients.
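Under the auto-correlation method adopted in Section 2, the matrix in the normal equations depends only on the lag $|i-k|$ and is therefore Toeplitz, so the system can be solved recursively. The following is a minimal sketch of the Levinson-Durbin recursion and of the residual in (5); the function names and the small numerical example are hypothetical and are not taken from the paper's simulation.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve sum_k alpha_k r[|i-k|] = r[i], i = 1..order.

    Returns the predictor coefficients, the reflection coefficients
    produced at each step, and the final prediction error power.
    """
    alpha = [0.0] * order
    reflection = []
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(alpha[j] * r[i - j] for j in range(i))
        k = acc / err
        reflection.append(k)
        updated = alpha[:]
        updated[i] = k
        for j in range(i):
            updated[j] = alpha[j] - k * alpha[i - 1 - j]
        alpha = updated
        err *= (1.0 - k * k)
    return alpha, reflection, err

def prediction_residual(frame, alpha):
    """e[n] = s[n] - sum_k alpha_k s[n-k], with samples before the frame taken as zero."""
    return [s - sum(a * frame[n - 1 - k] for k, a in enumerate(alpha) if n - 1 - k >= 0)
            for n, s in enumerate(frame)]

# Hypothetical autocorrelation values for a second-order predictor.
alpha, reflection, err = levinson_durbin([1.0, 0.5, 0.1], 2)
# alpha is approximately [0.6, -0.2]
```

Reflection coefficients with magnitude below one correspond to a synthesis filter whose poles lie inside the unit circle, which is the stability property that motivates the auto-correlation method.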
Equation (17) can be solved using the Levinson-Durbin algorithm (George, 1980).

In our simulations of the proposed scheme, an objective measure of speech quality uses a mathematical criterion to analyze the performance and compare the original with the reconstructed speech signals. The segmental power SNR is calculated by first measuring the SNR of each frame and then taking the average over the speech; it is defined in equation (18). The mean square error (MSE) is defined by equation (19).

$$\mathrm{SEGPSNR} = \frac{1}{M}\sum_{m=1}^{M} 10\log_{10}\frac{\sum_{n=1}^{N} s_m^2(n)}{\sum_{n=1}^{N} e_m^2(n)} \tag{18}$$

and

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} \left|e_m(n)\right|^2 \tag{19}$$

where $N$ is the frame length, $M$ is the number of frames, $s_m(n)$ is the $n$-th point of the original speech in the $m$-th frame, and $e_m(n)$ is the $n$-th point of the residual error in the $m$-th frame (Colin and Rainer, 2011).

4. PERFORMANCE ANALYSIS AND SIMULATION RESULTS

The speech signals to be coded are wideband signals. We use a modified version of voice-excited linear predictive coding (MVELPC) to code two male speakers uttering the same sentence, "Welcome to The University of Babylon". The parameter values used in our simulations are as follows: the bandwidth of the speech signals is 8 kHz, the sampling frequency is 16 kHz with a maximum end-to-end delay of 100 ms, and the length of each frame is 20 ms, which gives 320 samples per frame. For perfect reconstruction the overlap length has to be 10 ms; hence the actual frame length is 30 ms, which contains 480 samples, and there are 50 frames per second. The bit rate of the original speech is about 250 kbps (16 bits per sample at 16 kHz gives 256 kbps), whereas the bit rate of the synthesized speech is calculated in Table (1). The bit rates are obtained by taking into account the number of bits used in the quantization process and multiplying the resulting number of bits per frame by 50 (the number of frames per second).

Figure (3) shows the original speech signal and the speech signals reconstructed by the MVELPC vocoder with different quantized representations of the residual error (different numbers of bits in the quantization process). As can be seen from the figure, the reconstructed signal has a lower quality than the original signal when 4 bits are used, since there is a clear difference in shape between them. The similarity increases when the number of quantizing bits is raised, but the result still does not sound exactly like the original speech signal.

Segmental PSNR and MSE are given in Tables (2) and (3) for the wave files (ahmed.wav) and (ali.wav), respectively. The values show that the signals reconstructed by classic LPC are very noisy, with very low SEGPSNR and high MSE; that is, the noise power is almost as high as the signal power. The speech signal reconstructed by the MVELPC vocoder sounds far better and its SEGPSNR is good enough. The SEGPSNR increases and the MSE decreases as the number of bits used to quantize the residual error rises from 4 to 8, while the bit rate stays low, not exceeding 16 kbps. The VELPC vocoder was also implemented, and its results, shown in Tables (2) and (3), are superior to both classic LPC and MVELPC, but it demands a very high bit rate. This explains why the proposed system is desirable compared with other speech coding techniques such as VELPC, waveform and subband coders, which require a very high bit rate for transmission.

Frame number 20 is selected from the wave file (ahmed.wav), together with a 240-point Hamming window. The frequency responses of the eight-pole LPC vocoder and of the MVELPC vocoder, which is based on a pre-emphasis filter, are estimated and plotted in Figure (4). In the same figure, the power spectral density (PSD) of the original speech signal is determined with a 512-point Fast Fourier Transform (512-FFT) and plotted.
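The two objective measures used in this section can be computed along the following lines. This is a sketch under assumptions: the error is taken as the sample-wise difference between the original and reconstructed signals, the MSE is averaged over the whole signal, incomplete trailing frames are ignored, and a small `eps` guards against silent or perfectly reconstructed frames. The function names and the test signal are hypothetical.

```python
import math

def segmental_snr_db(original, reconstructed, frame_len, eps=1e-12):
    """Average over frames of 10*log10(signal power / error power)."""
    n_frames = len(original) // frame_len
    total = 0.0
    for m in range(n_frames):
        start, stop = m * frame_len, (m + 1) * frame_len
        signal_power = sum(s * s for s in original[start:stop])
        error_power = sum((s - y) ** 2
                          for s, y in zip(original[start:stop], reconstructed[start:stop]))
        total += 10.0 * math.log10((signal_power + eps) / (error_power + eps))
    return total / n_frames

def mean_square_error(original, reconstructed):
    """Mean of the squared sample-wise error over the whole signal."""
    return sum((s - y) ** 2 for s, y in zip(original, reconstructed)) / len(original)

# Hypothetical signals: two 320-sample frames and a reconstruction with a 10 % amplitude error.
original = [math.sin(0.05 * n) for n in range(640)]
reconstructed = [0.9 * s for s in original]
seg_snr = segmental_snr_db(original, reconstructed, frame_len=320)   # close to 20 dB
mse = mean_square_error(original, reconstructed)
```

Averaging the per-frame ratios in dB, rather than computing a single ratio over the whole utterance, keeps loud frames from masking the error in quiet frames.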
As can be seen from Figure (4), the spectrum of the eight-pole MVELPC vocoder follows the peaks and troughs well: the first five peaks fit the original speech closely. The spectrum of the eight-pole LPC vocoder, in contrast, fits the original speech poorly at the peaks and troughs. Figure (5) shows the original speech signal together with the reconstructed one for frame number 20 of the wave file (ahmed.wav). The reconstructed signal is very close to the original signal and almost matches it.

5. CONCLUSION

In this paper, we presented a modified vocoder model based on voice-excited LPC that compresses two male speech signals while maintaining a low bit rate. The results achieved with the MVELPC are intelligible and desirable, since the coder largely preserves the perceptually relevant spectral characteristics of the speech signal. It also gives a higher SEGPSNR and a lower MSE than classic LPC. The tradeoffs between speech quality on one side and bandwidth, bit rate and complexity on the other have been analyzed and are clearly visible here. Better quality can be achieved by increasing the number of bits used to quantize the DCT coefficients, which raises the bit rate and therefore requires a larger bandwidth, as shown in Tables (2) and (3). The classic LPC results, on the other hand, are much poorer and unintelligible. Because the MVELPC vocoder gives good results within all the required limitations, particularly the bit rate, the model merits further study and improvement.

REFERENCES

[1] Awwab Q. Althahab, 2013 "Performance Analysis of Adaptive Blind Equalization Algorithms for Noisy FIR and IIR Channels" M.S. Thesis, University of Colorado.
[2] Carlo Magi, Jouni Pohjalainen, Tom Backstrom and Paavo Alku, 2009 "Stabilized Weighted Linear Prediction" Speech Processing, Vol. 51, No. 5, pp. 401-411.
[3] Colin Breithaupt and Rainer Martin, 2011 "Analysis of the Decision-Directed SNR Estimator for Speech Enhancement with Respect to Low-SNR and Transient Conditions" IEEE Transactions on Audio, Speech and Language Processing, Vol. 19, No. 2, pp. 277-289.
[4] Daniele Giacobello, et al., 2014 "Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling" IEEE Transactions on Audio, Speech and Language Processing, Vol. 22, No. 5, pp. 912-922.
[5] George Cybenko, 1980 "The Numerical Stability of the Levinson-Durbin Algorithm for Toeplitz Systems of Equations" SIAM J. Sci. and Stat. Comput., Vol. 1, Issue 3, pp. 303-319.
[6] Jingyun Xu, Xiaoqun Zhao, Qiao Wang and Digang Wang, 2014 "Linear Prediction Analysis of Speech Signal Based on Norm" Journal of Computational Information Systems, Vol. 10, No. 17, pp. 7553-7560.
[7] John Makhoul, 1975 "Linear Prediction: A Tutorial Review" Proc. IEEE, Vol. 63, No. 4, pp. 561-580.
[8] John R. Deller, John G. Proakis and John H. L. Hansen, 1999 "Discrete-Time Processing of Speech Signals" Wiley-IEEE Press.
[9] Lawrence Rabiner and Ronald W. Schafer, 2009 "Theory and Application of Digital Speech Processing" Prentice-Hall Inc.
[10] Marcelo S. Alencar and Valdemar C. da Rocha Jr, 2005 "Communication Systems" Springer.
[11] Minal Mulye and Sonal K. Jagtap, 2014 "Speech Compression using Analysis by Synthesis" IJECCE, Vol. 5, Issue 4, pp. 275-280.
[12] Stephen Boyd and Lieven Vandenberghe, 2004 "Convex Optimization" Cambridge University Press.
[13] Thomas Drugman and Abeer Alwan, 2011 "Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics" Proc. Interspeech, pp. 1973-1976.
[14] Yi Hu and Philipos C. Loizou, 2008 "Evaluation of Objective Quality Measures for Speech Enhancement" IEEE Trans. on Speech, Audio and Language Processing, Vol. 16, No. 1, pp. 229-238.
[15] Yugandhar Dasari and K. Satyapriya, 2013 "Performance Analysis of Speech Coding Techniques" IJAREEIE, Vol. 2, Issue 11, pp. 5725-5732.

Table (1): Bit rate calculation for the proposed model including DCT.

| Parameters | Number of bits for each frame |
|---|---|
| Predictor coefficients = 8 | 8*8 = 64 bits |
| Number of DCT coefficients = 30 | 30*(number of bits in quantizing process of residual error) |
| Gain of the predictor | 5 bits |
| Total number of bits for each frame | ? |

Table (2): Simulation results for speech file (ahmed.wav).

| Type of vocoder | No. of predictor coefficients | No. of bits in quantization process of residual error | SEGPSNR (dB) | MSE | Bit rate based on Table (1) (bits/sec) |
|---|---|---|---|---|---|
| Modified version of VELPC with 30 DCT coefficients | 8 | 4 | 2.9403 | 0.0141 | 9450 |
| Modified version of VELPC with 30 DCT coefficients | 8 | 6 | 4.1106 | 0.0084 | 12450 |
| Modified version of VELPC with 30 DCT coefficients | 8 | 8 | 5.0021 | 0.0077 | 15450 |
| VELPC | 8 | --- | 10.8479 | 0.0026 | 195450 |
| Classic LPC | 8 | --- | 1.2687 | 0.1159 | 3800 |

Table (3): Simulation results for speech file (ali.wav).

| Type of vocoder | No. of predictor coefficients | No. of bits in quantization process of residual error | SEGPSNR (dB) | MSE | Bit rate based on Table (1) (bits/sec) |
|---|---|---|---|---|---|
| Modified version of VELPC with 30 DCT coefficients | 8 | 4 | 2.2512 | 0.0164 | 9450 |
| Modified version of VELPC with 30 DCT coefficients | 8 | 6 | 4.1547 | 0.0081 | 12450 |
| Modified version of VELPC with 30 DCT coefficients | 8 | 8 | 5.0952 | 0.0074 | 15450 |
| VELPC | 8 | --- | 12.6146 | 0.0014 | 195450 |
| Classic LPC | 8 | --- | 1.2266 | 0.1564 | 3800 |

Figure (1): Simple speech production system. [Block diagram: a periodic impulse train (voiced) or random white noise (unvoiced) excites the filter to produce the speech signal.]
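The MVELPC bit rates listed in Tables (2) and (3) follow from the per-frame budget in Table (1) multiplied by the 50 frames per second stated in Section 4. A short sketch of that arithmetic is given below; the function and parameter names are hypothetical, while the numbers are those of Table (1).

```python
def bits_per_frame(quant_bits, n_lpc=8, bits_per_lpc=8, n_dct=30, gain_bits=5):
    """Per-frame budget from Table (1): predictor coefficients + DCT coefficients + gain."""
    return n_lpc * bits_per_lpc + n_dct * quant_bits + gain_bits

def bit_rate(quant_bits, frames_per_second=50):
    """Bits per second for a given number of quantization bits per DCT coefficient."""
    return bits_per_frame(quant_bits) * frames_per_second

rates = {bits: bit_rate(bits) for bits in (4, 6, 8)}
# rates == {4: 9450, 6: 12450, 8: 15450}
```

For example, with 4 quantization bits the frame carries 64 + 120 + 5 = 189 bits, and 189 * 50 = 9450 bits/sec, matching the first row of Tables (2) and (3).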
Figure (2): A voice-excited LPC system. [Block diagram: LPC Analyzer, Excitation Detector, Coder, Channel, Decoder, LPC Synthesizer.]

Figure (3): Waveform of the sentence "Welcome to the University of Babylon", the original and reconstructed speech signals. [Panels: the first original speech sound for (ahmed.wav); the MVELPC reconstructed speech with 4 bits, 6 bits and 8 bits quantizing process. Axes: sample number, amplitude.]

Figure (4): PSD of input speech, frequency response of LPC and MVELPC vocoder. [Panels: input signal, input signal*hamming and error signal (frame 20), with axes samples [n] and amplitude; FFT of input signal (512 FFT points) with the frequency responses of the MVELPC vocoder (order 8) and the LPC vocoder (order 8) (frame 20), with axes frequency [Hz] and amplitude [dB].]

Figure (5): Original and reconstructed speech signal (frame No. 20 of ahmed.wav). [Axes: time [s], amplitude.]
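The LPC frequency responses that Figure (4) compares against the FFT of the input frame are evaluations of the synthesis filter on the unit circle. A minimal sketch of that evaluation is given below; the two-coefficient predictor and the function name are hypothetical and stand in for the eight-pole filters estimated in the paper.

```python
import cmath
import math

def lpc_envelope_db(alpha, gain, n_points, fs):
    """Magnitude in dB of H = gain / (1 - sum_k alpha_k e^{-jwk}) from 0 Hz up to fs/2."""
    freqs, mags = [], []
    for i in range(n_points):
        f = i * (fs / 2.0) / n_points
        w = 2.0 * math.pi * f / fs
        denom = 1.0 - sum(a * cmath.exp(-1j * w * (k + 1)) for k, a in enumerate(alpha))
        freqs.append(f)
        mags.append(20.0 * math.log10(gain / abs(denom)))
    return freqs, mags

# Hypothetical 2-pole predictor (pole radius 0.9), sampled at 16 kHz as in Section 4.
freqs, mags = lpc_envelope_db([1.2, -0.81], gain=1.0, n_points=256, fs=16000)
peak_frequency = freqs[mags.index(max(mags))]   # location of the single resonance
```

Each pair of complex poles contributes one resonance to the envelope, so an eight-pole filter can follow at most four sharp spectral peaks; plotting `mags` over `freqs` on top of the frame's FFT magnitude gives the kind of comparison shown in Figure (4).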