Al-Qadisiyah Journal For Engineering Sciences, Vol. 9, No. 2, 2016, p. 135

DESIGN AND IMPLEMENTATION OF WIRELESS VOICE CONTROLLED MOBILE ROBOT

Dr. Ali Ahmed Abed
College of Engineering, University of Basrah
aaad_bah@yahoo.com

Dr. Abbas A. Jasim
College of Engineering, University of Basrah
abbas.a.jasim@ieee.org

Received 26 August 2015
Accepted 21 January 2016

ABSTRACT
This paper presents a speech recognizer used to control the motion of an intelligent automated mobile robot. The aim is to interact with the mobile robot using natural and direct communication techniques. The voice is processed to obtain proper and safe movement of the mobile robot while achieving a high recognition rate. Features are extracted from the speech signal using Mel Frequency Cepstral Coefficients (MFCC). For feature matching, an efficient Dynamic Time Warping (DTW)-based speech recognition system is presented that is applicable to isolated words of the Arabic language. The tested words are compared to a trained database using this DTW algorithm. On the other side, the mobile robot is designed with two servo motors as driving actuators. These actuators are controlled by an L298 motor driver circuit. The control algorithm is programmed and downloaded into a PIC18F45K22 microcontroller, which is interfaced to a USB port of a 10" notebook computer. The robot proved capable of understanding the five Arabic speech commands that steer it forward, backward, right, left, or stop.

Keywords: Arabic speech recognizer, Mel Frequency Cepstral Coefficients, dynamic time warping, pattern recognition, mobile robot.

DESIGN AND IMPLEMENTATION OF A WIRELESS VOICE-CONTROLLED MOBILE ROBOT

Dr. Ali Ahmed Abed
College of Engineering, University of Basrah

Dr. Abbas A. Jasim
College of Engineering, University of Basrah

ABSTRACT (translated from the Arabic)
This research presents a method for building a speech recognizer used to control the motion of an automated, intelligent mobile robot that can interact with, and directly understand, natural spoken language.
The paper presents the detailed steps needed to process the speech signal so as to ensure a high recognition rate, leading to safe and natural motion of the robot. The algorithms used for speech processing are the MFCC algorithm and a DTW-based algorithm used to recognize isolated Arabic words. Recognition relies on comparing the test words with previously trained words stored in a database. On the other side, a mobile robot with two servo motors, driven by an L298 driver, was built to realize the voice control algorithm. The robot was interfaced to the speech recognizer through a PIC18F45K22 microcontroller acting as an interface circuit designed entirely for this purpose. Finally, the complete system was tested by steering the robot with five Arabic words: forward (امام), backward (خلف), right (يمين), left (يسار), and stop (قف), by which the robot can be steered along the desired path.

Keywords: Arabic speech recognizer, Mel frequency coefficients, dynamic time warping, pattern recognition, mobile robot.

NOMENCLATURE
ANN         Artificial Neural Network
Dist(x,y)   Euclidean distance between two points
DTW         Dynamic Time Warping
F           Tone frequency in Hz
FFT         Fast Fourier Transform
F mel       Mel frequency in Hz
GD          Global Distance
HMM         Hidden Markov Models
K           Number of frames
L           Number of samples in each frame
LD          Local Distance
LPC         Linear Predictive Coding
M           Number of samples that separate adjacent frames
MFCC        Mel Frequency Cepstral Coefficients
RR          Recognition Rate
VAD         Voice Activity Detection
X           Sequence feature vector in n-dimensional space
Y           Another sequence feature vector in n-dimensional space

1. INTRODUCTION
Arabic is the fifth most widely used language worldwide, with at least 200 million speakers (Khalid, 2013). Little speech recognition research deals with Arabic compared to English or Japanese.
The Arabic language has monosyllabic and polysyllabic words with two categories of phonemes, pharyngeal and emphatic, which are found in all Semitic languages (Al-Zabibi, 1990; Alkhouli, 1990). Automatic speech recognition, which has received considerable attention for many decades, allows a computer to recognize spoken words captured by a microphone. Speech recognizers are used in many applications, such as interacting with deaf people, healthcare, home automation, and robotics. There are many approaches to speech recognition, such as Dynamic Time Warping (DTW), Artificial Neural Networks (ANN), and Hidden Markov Models (HMM). In this work, an efficient DTW-based speech recognition system for isolated Arabic words is used as the feature matching algorithm, and Mel Frequency Cepstral Coefficients (MFCC) are used for feature extraction because of their robustness and effectiveness compared to other well-known methods such as Linear Predictive Coding (LPC) (Lindasalwa, 2010). A mobile robot is then designed, as explained in the subsequent sections, and controlled by the designed speech recognizer to obtain a complete speech-controlled system suitable for different applications. The robot is commanded by voice via a special interface that serves as the master control circuit for the robot's servo motors.

In a voice control system, a difficulty may arise in the control circuit in the form of a recognition error, where the spoken command is interpreted as the opposite command. For example, "Left" may be interpreted as "Right", especially in languages with very high acoustic similarity such as Polish. This problem is not significant in Arabic for the direction words, because they differ completely in pronunciation.
Unlike other languages, Arabic is characterized by tremendous dialectal variety, diacritized text material, and morphological complexity, all of which pose challenges to building a highly accurate Arabic recognizer.

In the work of (Jean-Marc, 2007), recognition of the speaker's voice depends on distance and azimuth; the work achieved a distance of 2 m and an azimuth range of 10° to 90°. In 2010, two voice recognition algorithms, MFCC and DTW, were built, evaluated, and compared with other techniques to demonstrate their effectiveness (Lindasalwa, 2010). In 2011, (Ahmed, 2011) proposed a technique called the multiridgelet transform, combined with a neural network, to control the motion of a wheelchair for handicapped people. The work presented by (Rachna, 2011) built a microcontroller-based mobile robot controlled by speech and studied factors such as noise and distance for its speech recognition system. (Khalid, 2013) suggested DTW, MFCC, and voice activity detection (VAD) for isolated words of the Arabic language, but with insufficient recognition rates.

In our work, we built a speech recognizer using MFCC, DTW, and VAD for five Arabic words. High recognition rates are achieved without dependence on azimuth, and the distance is limited only by the wireless transmission range. This speech recognition system is used to control the motion planning of an autonomous mobile robot, designed from scratch, to obtain a voice-controlled robotic system.

The rest of the paper is organized as follows. Section 2 explains our speech recognition system and all its stages. Section 3 describes the complete design of the mobile robot and the components used. Section 4 provides the software structure of the overall system. Section 5 gives the obtained results and verification, with discussion. Section 6 summarizes the main conclusions.

2.
THE VOICE RECOGNITION SYSTEM
The presented Arabic speech recognition system consists of the following stages:

A. Preprocessing
This stage enhances the characteristics of the recorded speech signal by removing noise, yielding a high-quality recording. The high-frequency content of the input signal is emphasized by a first-order FIR filter (implemented in software) to flatten the signal spectrum. This stage should also compensate for differences between microphone types and speaking loudness. It is hidden inside the first stage, "Read the voice input", of Figure 1.

B. Voice Activity Detection (VAD)
Another problem that affects the performance of the speech recognizer is detecting the start and end points of the voice signal (Khalid, 2013). The speech signal is segmented into frames of 10 ms width. Short-term power and zero-crossing rate are then used to detect the speech and non-speech regions. Short-term power is higher in speech regions, while the zero-crossing rate is higher in non-speech regions. Hence, these two measures together give a good indication of the presence of speech.

C. Feature Extraction
• Framing: The speech signal is segmented into K frames of L samples each. The adjacent frames are separated by M samples (M