Tarik Zeyad / Al-Khwarizmi Engineering Journal, vol. 1, no. 1, pp. 52-60 (2005)

Speech Signal Compression Using Wavelet and Linear Predictive Coding

Dr. Tarik Zeyad, Ahlam Hanoon
Electrical Engineering Dept. / College of Engineering / University of Baghdad

Abstract
A new algorithm is proposed to compress speech signals using the wavelet transform and linear predictive coding. Compression is based on selecting a small number of approximation coefficients produced by wavelet decomposition (Haar and db4) at a suitably chosen level, while the detail coefficients are discarded. The approximation coefficients are then windowed by a rectangular window and fed to a linear predictor. The Levinson-Durbin algorithm is used to compute the LP coefficients, the reflection coefficients and the prediction error. The compressed files contain the LP coefficients and the previous sample, and they are very small compared with the original signals. The compression ratio is calculated as the size of the compressed signal relative to the size of the uncompressed signal. The proposed algorithms were implemented with the Matlab package.

Keywords: Wavelet, Speech Coding, Linear Predictive Coding, Levinson-Durbin.

1. Introduction
Compression can be achieved by reducing redundancy. It is the process of reducing the amount of signal space that must be allocated to a given message set or data sample set. This signal space may be a physical volume, such as a data storage medium, or an interval of time, such as the time required to transmit a given message set [1][2].
Data compression, bit-rate reduction and data reduction are terms that mean essentially the same thing: the same information is carried using a smaller quantity or rate of data [3][4]. Compression is the process of converting an input data stream (the source stream, or the original raw data) into another data stream (the output, or compressed data) that has a smaller size.

2. The Proposed Speech Signal Compression Algorithm
The proposed compression system is divided into four parts:

2.1 Preprocessing Part:
• Acquisition of the received signal in IF mode. In this work the signals were recorded with a microphone in different places and under different conditions, so that a number of test signals were obtained.
• Decomposition of the received speech signal with the Haar and db4 wavelet decomposition algorithms.
• Segmentation of the decomposed frames into 256 samples per segment, so that time and frequency variations are viewed over short intervals. This makes the processing more accurate.
• Windowing: each frame is multiplied by a finite-duration window. A rectangular window with a duration of one pitch period is used; it produces an output spectrum very close to that of the vocal-tract impulse response. The rectangular window is

$$ \omega_r(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

where $\omega_r(n)$ is the weight applied to sample $n$ of a frame of $N$ samples. The data are then ready for the feature extraction part.

2.2 The Feature Extraction Part:
Feature extraction helps to remove noise as well as signal components that are redundant for compression. The discrete wavelet transform (DWT) is widely used in digital signal processing. To parameterize the speech signal, it is first decomposed in dyadic form.
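The preprocessing and dyadic decomposition steps can be illustrated with a short Python sketch. It is not the authors' Matlab implementation and rests on several assumptions. An orthonormal Haar filter pair is written out by hand so that no wavelet library is needed; db4 would need its own filter taps and boundary handling, which are omitted here. The order follows the compression flow chart: segmentation, a 3-level DWT, and then rectangular windowing of the approximation. A short final segment is zero-padded to 256 samples. All function names are hypothetical.

```python
import math

FRAME_LEN = 256  # samples per segment (Section 2.1)
LEVELS = 3       # DWT level used in the compression flow chart


def frame_signal(x, frame_len=FRAME_LEN):
    """Split x into non-overlapping frames; the last frame is zero-padded (an assumption)."""
    frames = []
    for start in range(0, len(x), frame_len):
        frame = [float(v) for v in x[start:start + frame_len]]
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames


def rectangular_window(n_samples):
    """Equation (1): weight 1 for 0 <= n <= N-1; samples outside the frame are never used."""
    return [1.0] * n_samples


def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return approx, detail


def approximation_coefficients(frame, levels=LEVELS):
    """Dyadic decomposition that keeps only the level-`levels` approximation.

    The detail coefficients D1..D3 are computed and then discarded (Section 2.2).
    """
    approx = list(frame)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


def preprocess(x):
    """Frame, decompose and window a signal; returns one windowed A3 list per frame."""
    out = []
    for frame in frame_signal(x):
        a3 = approximation_coefficients(frame)
        w = rectangular_window(len(a3))
        out.append([a * wn for a, wn in zip(a3, w)])
    return out
```

With 256-sample frames and three levels, each frame yields 256 / 2^3 = 32 approximation coefficients. For example, a 600-sample input gives three lists of 32 values, with the third frame zero-padded.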
The decomposition is applied only to the low-frequency branch, that is, the approximation coefficients, because this is the more intelligible part and carries most of the information. The detail coefficients are discarded because they mainly contain noise. All subsequent operations can be performed on the corresponding wavelet coefficients alone, so the approximation coefficients are fed to the next stage (the linear predictor stage).

2.3 Linear Prediction Based Coefficients Calculation:
Linear prediction (LP) is a parametric modeling technique that models the spectrum as an autoregressive process, and such parametric models are widely used in compression systems. The strength of linear prediction applied to speech lies not only in its predictive function but also in the fact that it gives a very good model of the vocal tract. This model is useful, both in practice and in theory, for representing speech for low-bit-rate transmission or storage.

2.3.1 Autocorrelation method for calculating the LPC
The autocorrelation method was used for several reasons:
1) It requires a small amount of storage.
2) The number of multiplications needed for the computation is small.
3) Each frame spans several pitch periods, which ensures reliable results.
4) The number of poles depends on the sampling rate; it is known that two poles (one complex-conjugate pair) are needed for every 1 kHz of sampling rate.

The prediction error is

$$ e(n) = S(n) - \sum_{i=1}^{p} a_i\, S(n-i) \tag{2} $$

where $S(n)$ is the value of sample $n$ and $\sum_{i=1}^{p} a_i S(n-i)$ is the predicted value of sample $n$. The prediction parameters $a_i$ are determined by minimizing the mean square error (MSE) $E[e^2(n)]$. Minimizing the MSE gives a system of $p$ linear equations in $p$ unknowns, where $p$ is the LP order:

$$ \begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a(1) \\ a(2) \\ \vdots \\ a(p) \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix} \tag{3} $$

where $R(n)$ is the autocorrelation of the windowed frame at lag $n$ ($R(0)$ is the zero-lag value, i.e. the frame energy). This matrix is a Toeplitz matrix: all elements along a given diagonal are equal. Because it is also symmetric, the system can be solved efficiently without a general matrix inversion.

2.3.2 Levinson-Durbin Procedure
The Levinson-Durbin (L-D) algorithm is a recursive algorithm that solves $a = R^{-1} r$, where $R$ is the matrix in equation (3) and $r = [R(1), \ldots, R(p)]^T$. It is computationally very efficient. The recursion generates the predictor coefficients $\{a_i\}$ together with the reflection coefficients $\{k_i\}$. The reflection coefficients can be used to rebuild the set of filter coefficients $\{a_i\}$, and they guarantee a stable filter if their magnitudes are strictly less than one.

[Fig. (1): Rectangular window of 256-sample length (horizontal axis: sample index n).]

The Levinson-Durbin algorithm is summarized by [1]:

$$ E^{(0)} = R(0) \tag{4} $$
$$ k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}}, \qquad 1 \le i \le p \tag{5} $$
$$ a_i^{(i)} = k_i \tag{6} $$
$$ a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \tag{7} $$
$$ E^{(i)} = \left(1 - k_i^2\right) E^{(i-1)} \tag{8} $$
$$ a_j = a_j^{(p)}, \qquad 1 \le j \le p \tag{9} $$

Equations (5)-(8) are evaluated recursively for $i = 1, 2, \ldots, p$.

Fig. (2) shows the flow chart of the compression of the speech signal, while Fig. (3) illustrates the flow chart of the decompression.

[Fig. (2): Flow chart of speech-signal compression. The input speech S(n) is sampled at 8 or 22 kHz by an A/D converter with 8- or 16-bit accuracy, segmented into 256-sample frames with no overlap, and decomposed by a 3-level DWT, [C(I,:), L(I,:)] = wavedec(SEG(I,:), N, W). The approximation A3(I,:) = appcoef(C(I,:), L(I,:), W) is kept and the details D3, D2, D1 are ignored. The approximation is windowed with a rectangular window (no overlap) and analysed by LPC via the L-D method to give the LP coefficients a(1), ..., a(p+1) for each frame, and the final prediction error is tested against E < 3×10^-5.]
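The autocorrelation method of equations (2) and (3) and the Levinson-Durbin recursion (4)-(9) can be sketched in plain Python, together with the matching decoder step from the decompression flow chart (approximation coefficients = predicted signal + prediction error). This is a minimal illustration, not the authors' code. The function names are hypothetical, lists are 1-indexed (index 0 unused) to mirror the equations, and the prediction error assumes zero initial conditions at the start of each frame.

```python
def autocorrelation(s, p):
    """R(k) = sum_n s(n) s(n+k) for k = 0..p over one windowed frame."""
    n = len(s)
    return [sum(s[m] * s[m + k] for m in range(n - k)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the Toeplitz system (3) with recursion (4)-(9).

    a[1..p] are the predictor coefficients and k[1..p] the reflection
    coefficients (index 0 unused). Returns (a, k, final prediction error E(p)).
    """
    a = [0.0] * (p + 1)
    k = [0.0] * (p + 1)
    err = r[0]                                            # (4) E(0) = R(0)
    for i in range(1, p + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop the recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err                                  # (5)
        prev = a[:]
        a[i] = k[i]                                       # (6)
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]           # (7)
        err = (1.0 - k[i] ** 2) * err                     # (8)
    return a, k, err                                      # (9) a_j = a_j^(p)


def prediction_error(s, a):
    """Equation (2), with samples before the frame start taken as zero."""
    p = len(a) - 1
    return [s[n] - sum(a[i] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]


def reconstruct(e, a):
    """Decoder side: s(n) = predicted value + e(n), i.e. A3 = S_hat + E."""
    p = len(a) - 1
    s = []
    for n in range(len(e)):
        predicted = sum(a[i] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        s.append(predicted + e[n])
    return s
```

Given a windowed frame `s` and an order `p`, the encoder runs `levinson_durbin(autocorrelation(s, p), p)` and then `prediction_error(s, a)`. Passing that error to `reconstruct` returns `s` again up to floating-point rounding. The autocorrelation matrix of a nonzero finite frame is positive definite, so every |k_i| stays below one, which is the stability condition stated in Section 2.3.2.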
[Fig. (3): Flow chart of speech-signal decompression. For each frame, the LP coefficients, the previous sample and the prediction error are read from the storage device. The approximation coefficients are rebuilt as the prediction signal (from the stored coefficients) plus the prediction error, A3(I,:) = Ŝ(I,:) + E(I,:), and the signal is reconstructed level by level from the coefficients until the last frame, giving the output speech signal.]

3. Evaluation Tests of the Algorithm

3.1 Tested Speech Samples:
The test material contains five speech samples stored in five WAVE-format files. The files differ in size and contain normal Arabic sentences uttered by different speakers. The digital speech is pulse code modulation (PCM) coded with 8 or 16 bits per sample. The properties of the tested wave data are presented in Table (1).

Table (1): Properties of tested wave data
Signal              S1          S2          S3          S4          S5
File name           Allah       aa1         aa2         DALEEL      PROG
File type           Wave file   Wave file   Wave file   Wave file   Wave file
File size (bytes)   190640      123392      160000      260664      296192
Media length (s)    23          15          20          14          13
File format         PCM         PCM         PCM         PCM         PCM
Sampling rate       8 kHz       8 kHz       8 kHz       22 kHz      22 kHz
Sample format       8-bit mono  8-bit mono  8-bit mono  16-bit mono 16-bit mono

4. Performance Measure
The compression algorithm has been tested on five sound files. Table (2) shows the performance measures of the speech signals. To evaluate the performance of a compression technique, criteria for measuring the distortion in the reconstructed sound files must be defined and applied.
These include the signal-to-noise ratio (SNR), the peak signal-to-noise ratio (PSNR) and the normalized root mean square error (NRMSE). They are calculated with the following formulas [1]:

1- Signal-to-noise ratio

$$ \mathrm{SNR} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_e^2}\right) \tag{10} $$

where $\sigma_x^2$ is the mean square of the speech signal and $\sigma_e^2$ is the mean square difference between the original and reconstructed signals.

2- Peak signal-to-noise ratio

$$ \mathrm{PSNR} = 10 \log_{10}\left(\frac{N X^2}{\lVert x - r \rVert^2}\right) \tag{11} $$

where $N$ is the length of the reconstructed signal, $X^2 = \max_n |x(n)|^2$ is the maximum absolute squared value of the signal $x$, and $\lVert x - r \rVert^2$ is the energy of the difference between the original and reconstructed signals.

3- Normalized root mean square error

$$ \mathrm{NRMSE} = \sqrt{\frac{\sum_n \left(x(n) - r(n)\right)^2}{\sum_n \left(x(n) - \mu_x(n)\right)^2}} \tag{12} $$

where $x(n)$ is the speech signal, $r(n)$ is the reconstructed signal and $\mu_x(n)$ is the mean of the speech signal.

Table (2): Performance measures of speech signals
Signal  Wavelet  SNR (dB)  PSNR (dB)  NRMSE
S1      Haar     5.4328    5.9214     1.2902
        db4      6.8434    5.6851     1.3258
S2      Haar     4.2883    13.2997    1.2686
        db4      4.7211    13.1877    1.2844
S3      Haar     3.0827    10.5618    1.1904
        db4      3.3462    10.4571    1.2048
S4      Haar     7.2869    15.0258    1.3448
        db4      11.7732   14.7718    1.3841
S5      Haar     8.2198    16.6317    1.3580
        db4      12.9519   16.3998    1.3948

5. Compression Performance
This section gives the compression performance of the proposed method in Tables (3) and (4) [2]. The first column of each table is the file name. The next two columns give the size of the file before compression and the size of the output file, which holds the LPC coefficients and previous samples of each speech signal. The compression ratio (CR) is computed with equation (13), the compression factor (CF) with equation (14), the expression (EX) with equation (15) and, finally, the compression gain with equation (16).
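The distortion measures (10)-(12) and the compression measures (13)-(16), which are listed with Table (3), can be computed in a few lines. The Python sketch below is illustrative only and is not the authors' evaluation code. The function names are hypothetical, `x` and `r` are assumed to be equal-length sample lists with `r` different from `x` (otherwise the error energy is zero), the "reference size" in equation (16) is taken to be the input file size, and CR and EX are returned in percent to match the table columns.

```python
import math


def snr_db(x, r):
    """Equation (10): 10 log10 of signal mean square over error mean square."""
    n = len(x)
    sig = sum(v * v for v in x) / n
    err = sum((a - b) ** 2 for a, b in zip(x, r)) / n
    return 10.0 * math.log10(sig / err)


def psnr_db(x, r):
    """Equation (11): 10 log10(N * max|x|^2 / ||x - r||^2)."""
    peak = max(abs(v) for v in x) ** 2
    err = sum((a - b) ** 2 for a, b in zip(x, r))
    return 10.0 * math.log10(len(r) * peak / err)


def nrmse(x, r):
    """Equation (12): error energy normalised by the energy of x about its mean, square-rooted."""
    mu = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, r))
    den = sum((a - mu) ** 2 for a in x)
    return math.sqrt(num / den)


def compression_measures(input_size, output_size):
    """Equations (13)-(16); CR and EX are returned in percent."""
    cr = output_size / input_size                        # (13), < 1
    cf = input_size / output_size                        # (14), > 1
    ex = 100.0 * (1.0 - cr)                              # (15)
    gain = 100.0 * math.log10(input_size / output_size)  # (16), reference = input size
    return 100.0 * cr, cf, ex, gain
```

For the S1 row of Table (3), `compression_measures(190640, 15624)` gives roughly 8.196 %, 12.20, 91.80 and 108.64. These values agree with the tabulated ones up to the rounding used in the table.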
The compression performance results are shown in Table (3) for the Haar wavelet transform and in Table (4) for the db4 wavelet transform. The measures are defined as:

$$ \mathrm{CR} = \frac{\text{size of output}}{\text{size of input}} < 1 \tag{13} $$
$$ \mathrm{CF} = \frac{\text{size of input}}{\text{size of output}} > 1 \tag{14} $$
$$ \mathrm{EX} = 100\,(1 - \mathrm{CR}) \tag{15} $$
$$ \text{Compression gain} = 100 \log_{10}\left(\frac{\text{reference size}}{\text{compressed size}}\right) \tag{16} $$

CR is listed in percent in the tables.

Table (3): Compression performance results using the Haar wavelet transform (sizes in bytes)
File     Input size  Output size  CR (%)  CF      EX      Compression gain
S1       190640      15624        8.19    12.20   91.81   108.6
S2       123392      10080        8.16    12.24   91.83   108.7
S3       160000      13104        8.19    12.21   91.81   108.6
S4       260664      61936        23.76   4.208   76.24   62.413
S5       296192      56448        19.05   5.247   80.94   71.99
Average                           13.47   9.223   86.522  92.04

Table (4): Compression performance results using the db4 wavelet transform (sizes in bytes)
File     Input size  Output size  CR (%)  CF      EX      Compression gain
S1       190640      18480        9.69    10.316  90.306  101.35
S2       123392      11928        9.66    10.344  90.33   101.47
S3       160000      15456        9.66    10.350  90.34   101.5
S4       260664      73696        28.27   3.537   71.72   54.86
S5       296192      67032        22.36   4.418   77.36   64.529
Average                           15.982  7.793   84.086  84.74

6. Conclusions
A significant advantage of using wavelets for speech coding is that the compression ratio can be varied. This work shows that wavelet decomposition in conjunction with other techniques such as LPC is a promising compression technique that makes use of the elegant theory of wavelets. Several conclusions can be drawn:
1- The two proposed techniques work better with clean source material; noisy sound waveforms produce poor results and need additional processing.
2- Choosing the right decomposition level in the DWT is important for several reasons. For processing speech signals no advantage is gained beyond level 3, and processing at a lower scale usually leads to a better compression ratio.
3- The linear predictive technique causes a time delay and some loss of quality. These are negligible in terms of cost, however, when compared with the advantages of storage-space saving, smaller bandwidth requirement, lower power consumption and smaller product size.
4- The proposed method can be classified as symmetrical compression, in which compression and decompression use basically the same algorithm but work in opposite directions. WT and LPC are used together to obtain a higher compression ratio, and on the basis of these ratios the algorithm gives promising results. Each of them can also be used individually to compress speech signals. With the WT alone, the compression ratio can be varied by changing the decomposition level, whereas it is constant when LPC is used.

7. References
[1] Rabiner L.R. and Schafer R.W., (1990), "Digital Processing of Speech Signals", Prentice Hall.
[2] Jabber M., (1999), "Speech Compression and Recognition Using Wavelet Transform", M.Sc. Thesis, University of Baghdad, College of Engineering, Electrical Engineering Department.
[3] Markel J.D. and Gray A.H., (1976), "Linear Prediction of Speech", Springer-Verlag, New York.
[4] Makhoul J., (1975), "Linear Prediction: A Tutorial Review", Proceedings of the IEEE, Vol. 63, No. 4, pp. 561-580, April.
[5] Blelloch G.E., (2001), "Introduction to Data Compression", Carnegie Mellon University, Web Site, October.

Compression of Audio Files Using the Wavelet Transform and Linear Predictive Coding
Dr. Tarik Zeyad, Ahlam Hanoon
University of Baghdad / College of Engineering / Department of Electrical Engineering

Abstract:
In this research, the wavelet transform and a filter with linear prediction coefficients were used to compress the size of files containing audio recordings. Wavelet transforms of the Haar and db4 types were used to perform the first compression stage, and the compression ratio of the files was calculated using all the methods, either individually or combined. The filter coefficients were calculated using the Levinson-Durbin algorithm, and the compression ratio of the files depended on the error of the filters. The sizes of the compressed files were compared with the original files, and the error ratio resulting from the compression process was also calculated. All the files that were compressed were files of values