Proposed Speech Analysis Method Using the Multiwavelet Transform

Baydaa Jaffer Al-Khafaji
Dept. of Computer Science / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received: 7 April 2014, Accepted: 14 April 2014

Abstract
Speech was the first means of communication used by humans, ages before the invention of writing. This paper proposes a method for speech analysis that extracts features using the multiwavelet transform (with repeated row preprocessing). The proposed system depends on the Euclidean differences between the coefficients of the multiwavelet transform to determine the best features for speech recognition. Each sample value in the reference file is computed by taking the average value of four samples of the same data (four speakers for the same phoneme). The input data are compared with every frame value in the reference file using the Euclidean distance, and the frame with the minimum distance is said to be the "Best Match". A simulation program written in Visual Basic was used to determine the final results.

Keywords: Multiwavelet Transform, Euclidean distance, Hamming window, Repeated Row Preprocessing

Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 27 (1) 2014

Introduction
A speech communication system consists of the speech mechanism and the hearing system. Human speech is produced by complex interactions between the diaphragm, lungs, throat, mouth and nasal cavity [1].
Three processes control speech. Phonation is the process of converting air pressure into sound via the vocal folds, or vocal cords as they are commonly called. Resonation is the process by which certain frequencies are emphasized by resonances in the vocal tract. Articulation is the process of changing the vocal tract resonances to produce distinguishable sounds.

In its most simplistic form, speech can be viewed as a stochastic process involving two principal dimensions: time and frequency. The complexity of the speech recognition task lies in the fact that a given utterance can be represented by an effectively infinite number of time-frequency patterns. A human speech signal is produced by moving the vocal tract articulators towards target positions that characterize a particular sound [2]. Several viable methods are currently used for speech recognition, including template matching, acoustic-phonetic recognition and stochastic processing.

The human auditory system is based on a time-frequency analysis of sounds. The information received by the human ears can be described most conveniently as non-linear auditory responses to frequency selectivity and perceived loudness [3].

The multiwavelet transform is widely used to extract features from all types of signals, especially speech signals. The wavelet and multiwavelet transforms are directly applicable only to one-dimensional signals. For 2-D signals, the two main categories of methods for applying them are separable and non-separable algorithms. Separable methods simply work on each dimension in series; the typical approach is to process each of the rows in order and then process each column of the result. Non-separable methods work in both signal dimensions at the same time.
While non-separable methods can offer benefits over separable methods, such as savings in computation, they are generally more difficult to implement [4].

Preprocessing for multiwavelets
As mentioned before, multiwavelet filter banks require a vector-valued input signal. There are a number of ways to produce such a signal from a 2-D data signal; perhaps the most obvious method is to use adjacent rows and columns of the data [4]. However, this approach does not work well for general multiwavelets and leads to reconstruction artifacts in the lowpass data after coefficient quantization. This problem can be avoided by constructing "constrained" multiwavelets, which possess certain key properties. Unfortunately, the extra constraints are somewhat restrictive; data compression tests show that constrained multiwavelets do not perform as well as some other multifilters.

Another approach is to first split each row or column into two half-length signals and then use these two half signals as the channel inputs to the multifilter. A naive approach is to simply take the odd samples for one signal and the even samples for the second signal. This approach does not work well because it destroys the assumed characteristics of the input signal. It is generally presumed that the data will be locally well approximated by low-order polynomials, usually constant, linear or quadratic. The high-pass filter is designed to give a uniform zero output when the input has this form. Taking alternating data points as the filter input alters the character of the input signal, so the filter output is no longer forced to zero, which reduces compression performance. There is a way around this problem: one may first prefilter the two half-length signals before passing them into the multifilter.
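The naive odd/even split described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the helper name `split_even_odd` is hypothetical, and no prefilter is applied.

```python
def split_even_odd(row):
    """Pair even-indexed and odd-indexed samples into length-2 vectors.

    Hypothetical helper: turns one scalar row into the two half-length
    channel signals that a multifilter bank expects as vector input.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    even = row[0::2]  # samples 0, 2, 4, ...
    odd = row[1::2]   # samples 1, 3, 5, ...
    return list(zip(even, odd))


# A constant row gives two identical channels; a linear ramp gives channels
# that are offset from each other by one sample step.
constant_vectors = split_even_odd([4, 4, 4, 4])
ramp_vectors = split_even_odd([0, 1, 2, 3, 4, 5])
```

Each channel here is half the length of the row, so this split does not oversample the data. The price is that the two channels carry different sample phases, which is the change in signal character that the prefilter is meant to correct.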
Over-sampled scheme: repeated row preprocessing
In the multiwavelet setting, the GHM multiscaling and multiwavelet function coefficients are 2×2 matrices, and during the transformation step they must multiply vectors (instead of scalars). This means that the multifilter bank needs two input rows. The most obvious way to get two input rows from a given signal is to repeat the signal, so that two rows go into the multifilter bank. This procedure is called "Repeated Row", and it introduces oversampling of the data by a factor of 2.

For a given scalar input signal $\{X_k\}$ of length $N$ ($N$ is assumed to be a power of 2, and so is even), repeated row preprocessing pairs the input stream with the same stream multiplied by a constant $\alpha$. The length-2 input vector is formed from the original as

\[
\begin{bmatrix} X_k \\ \alpha X_k \end{bmatrix}, \qquad k = 0, 1, 2, \ldots, N-1 \tag{1}
\]

Here $\alpha$ is a constant. It is typically chosen so that if $X_k = C$ (a constant) for all $k$, then the output from the high-pass multifilter is zero. This can always be done if the system has approximation order higher than zero. For the GHM case, $\alpha = 1/\sqrt{2}$ is selected since

\[
[G_0 + G_1 + G_2 + G_3] \begin{bmatrix} C \\ C/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2}
\]

The output from the low-pass multifilter is simply a scaled version of the input,

\[
[H_0 + H_1 + H_2 + H_3] \begin{bmatrix} C \\ C/\sqrt{2} \end{bmatrix} = \sqrt{2} \begin{bmatrix} C \\ C/\sqrt{2} \end{bmatrix} \tag{3}
\]

Speech recognition via multiwavelet
There is an extensive amount of information in the speech signal, only some of which is relevant to the selection of the correct machine response. A critical task is to extract from all that information those parts that convey the message. However, it is extremely difficult to do this. An objective of speech analysis for speech recognition is therefore to extract useful information from the signal, either directly in the time domain or by related techniques in the frequency domain. In one way or another, most proposed schemes are based on, or can be related to, short-time frequency analysis of the speech waveform. In this work, feature extraction is done by the multiwavelet transform.

Proposed speech recognition system using multiwavelet transform

1- Preprocessing the data
The first thing to be done before the data are processed in the system is data acquisition. The data to be processed are stored on the computer as "WAVE PCM sound files 8-bit mono", as shown in figure (1).

2- Data framing
Framing may simply consist of placing the data into fixed-size blocks. The speech samples are taken in frames small enough not to cover too many changes in the speech signal but large enough to reveal meaningful information about the signal. Each block is of 256 bits, as shown in figure (2).

3- Choosing data windowing
Two main types of window function have been used in speech recognition. The first is the Hamming window, which is used here:

\[
W_n = \begin{cases} 0.54 - 0.46 \cos\left( \dfrac{2 \pi n}{N-1} \right) & \text{if } 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases} \tag{4}
\]

The second is the rectangular window:

\[
W_n = \begin{cases} 1 & \text{if } 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases} \tag{5}
\]

4- Feature extraction by multiwavelet transform
Using the over-sampled scheme of processing, the DMWT matrix has double the dimension of the input, which should be a square matrix (of 4*1 minimum).

5- Creating the reference
Each sample value in the reference file is computed by taking the average value of four samples of the same data (four speakers for the same phoneme).

6- Finding the best match
The distance of the input data to every frame value in the reference file is found using the Euclidean distance measurement function $E = (x - \mathrm{ref})^2$. The frame with the minimum distance is said to be the "Best Match". The table of results is shown in figure (3).
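The framing, windowing, repeated-row, reference and best-match steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Visual Basic program used in the paper. All function names are hypothetical, the frame size of 256 is read as samples per frame, the distance is summed over all values of a feature vector, and the multiwavelet transform itself is left out, so the feature vectors stand in for its coefficients.

```python
import math

FRAME_SIZE = 256  # assumption: the block size in the text is read as samples per frame


def frame_signal(samples, frame_size=FRAME_SIZE):
    """Cut the samples into consecutive fixed-size frames; a short tail is dropped."""
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


def hamming(n_points):
    """Hamming window of equation (4): 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


def repeated_row(frame, alpha=1 / math.sqrt(2)):
    """Repeated-row preprocessing of equation (1): X_k -> (X_k, alpha * X_k)."""
    return [(x, alpha * x) for x in frame]


def average_reference(speaker_features):
    """Element-wise mean of equally long feature vectors (e.g. four speakers)."""
    n = len(speaker_features)
    return [sum(values) / n for values in zip(*speaker_features)]


def squared_distance(x, ref):
    """Sum of (x - ref)^2 over all values (the summation is an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, ref))


def best_match(features, references):
    """Return (label, distance) of the reference closest to the input features."""
    label = min(references,
                key=lambda name: squared_distance(features, references[name]))
    return label, squared_distance(features, references[label])


# Toy usage with made-up numbers: reference "a" is the mean of four vectors.
references = {
    "a": average_reference([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]),
    "b": [20.0, 20.0],
}
label, distance = best_match([4.0, 6.0], references)
```

In a full system, the output of `repeated_row` for each windowed frame would be passed to the multiwavelet transform, and the resulting coefficients would be the feature vectors given to `average_reference` and `best_match`.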
A flow diagram for the proposed speech analysis method via multiwavelet is shown in figure (4).

Conclusions
This paper presents a new technique for speech analysis that uses the multiwavelet transform to extract features (with repeated row preprocessing). The proposed system uses the Euclidean distance between the input data and every frame value in the reference file to determine the best result, where each reference value is the average of four samples of the same data (four speakers for the same phoneme). This method can be applied to a wider set of phonemes (more than 7 speakers). It has also been seen that the bandwidth of the Hamming window is about twice the bandwidth of the rectangular window of the same length. This method could be developed into a preprocessing stage for a speech-to-text recognition system.

References
1. Steven, E. and Golowich (1998) A Support Vector/HMM approach to phoneme recognition, October, 14, Bell Labs, Lucent Technology.
2. Mohamed Debyeche (2000) Phoneme recognition system based on HMM with distributed VQ codebook, IEEE Trans. on Comm., 28.
3. Jenkins, W.K. (1986) Recent advances in residue number techniques for recursive digital filtering, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-27, 19-30, February.
4. Waled, A. Mahmoud (1986) Quantization Techniques for the classification and recognition of speech signals, Ph.D. thesis, University of Wales.

Figure No. (1): The opening of a wave file interface in the proposed system
Figure No. (2): The data to be processed is stored on the computer as "WAVE PCM sound files 8-bit mono"

| Utterance | Letter ا | Letter ب | Letter ت | Letter ث | Letter ج | Letter ح | Letter خ | Letter د | Letter ذ | Letter ر | Letter ز | Letter س | Letter ش |
| ا | 22 | 204 | 188 | 190 | 216 | 85 | 117 | 99 | 125 | 222 | 111 | 156 | 203 |
| ب | 77 | 17 | 122 | 79 | 212 | 56 | 43 | 122 | 55 | 205 | 136 | 98 | 69 |
| ت | 67 | 189 | 7 | 33 | 260 | 34 | 123 | 87 | 46 | 89 | 76 | 123 | 54 |
| ث | 45 | 132 | 76 | 20 | 34 | 142 | 84 | 29 | 66 | 87 | 45 | 200 | 80 |
| ج | 132 | 88 | 34 | 98 | 13 | 66 | 55 | 112 | 65 | 99 | 122 | 50 | 32 |

Figure No. (3): Table of results; the frame with the minimum distance is said to be the best match

Figure No. (4): A flow diagram for the proposed speech analysis method using multiwavelet
Input speech signal (WAVE) -> Put data into a fixed size -> Windowing (1- Hamming, 2- rectangular) -> Feature extraction by multiwavelet -> Euclidean distance -> Output is the best match

Arabic abstract (translated)

A Proposed Method for Speech Analysis Using the Multiwavelet Transform

Baydaa Jaffer Sadiq Al-Khafaji
Dept. of Computer Science / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received: 7 April 2014, Accepted: 14 April 2014

Abstract
Speech was the first means of communication between humans, even before the invention of writing. This paper describes a new method for analyzing the important features of speech using the multiwavelet transform, in one of its variants, repeated row preprocessing. The proposed system depends on the Euclidean method of computing distance and on the coefficients of the multiwavelet transform to compute the distance difference. Each value in the reference file is computed by taking the average value of four samples of the same data (four speakers for the same sound). The best results are obtained by finding the minimum Euclidean distance. The Visual Basic (6) programming language was used.

Keywords: Multiwavelet transform, Euclidean distance measurement, Hamming window, Repeated row preprocessing