Proceedings of Engineering and Technology Innovation, vol. 3, 2016, pp. 25-27

A Wireless Sensor Network-Speech Recognition Scheme Using Deployments of Multiple Kinect Microphone Array-Sensors

Ing-Jr Ding * and Shih-Kai Lin
Department of Electrical Engineering, National Formosa University, Yunlin, Taiwan.
Received 22 February 2016; received in revised form 28 March 2016; accepted 13 April 2016

Abstract

Speech recognition has been used successfully in many applications in recent years. With the development of the Microsoft Kinect sensor, speech recognition can be extended to a ubiquitous environment in which a wireless sensor network of Kinect sensors is deployed. This study develops a wireless sensor network (WSN)-speech recognition scheme using deployments of multiple Kinect microphone-array sensors. The presented Kinect-WSN speech recognition captures the acoustic data produced by the speaker and then performs the corresponding voice command control on a target device. Different strategies for deploying multiple Kinect microphone-array sensors to construct a ubiquitous Kinect-WSN speech recognition environment are investigated. Several acoustic sensing data fusion methods are also explored to improve Kinect-WSN speech recognition performance. The efficiency and effectiveness of the presented method are evaluated in a 5 m × 5 m laboratory environment in which any of four test speakers may issue a voice command from anywhere in the room. The developed Kinect microphone array sensor-deployed WSN speech recognition can be applied to a variety of control applications.

Keywords: speech recognition, Kinect microphone array-sensor, wireless sensor network

1. Introduction

Speech recognition has become a mature technique for human-computer interaction (HCI) in recent years.
With the development of Internet of Things (IoT) technology, the smart home scenario, in which most of the equipment and devices in a household are connected and communicate with each other via wireless networks, is becoming a practical integrated application. Conventional speech recognition, viewed as a category of voice control interaction, acquires the voice command of a specific user with a microphone located very close to that user [1, 2]. Such voice-control speech recognition is widely seen on smartphone platforms and on the central multimedia control panel in cars. In the IoT setting, by contrast, the goal is to control everything connected to the internet in a home, office, or other indoor environment. This calls for a new strategy for speech recognition development, wireless sensor network (WSN)-speech recognition, which is expected to attract much attention and poses a new and challenging technical issue.

This study explores the use of the Kinect sensor [3, 4] for sensing and acquiring the voice commands of a user in an office environment. As is well known, in addition to gesture recognition [5-7], the Kinect device can also be used for speech recognition thanks to its embedded microphone array composed of four microphones [8]. Speech recognition in this work is performed in an acoustic sensing area covered by multiple properly deployed Kinect microphone arrays.
In the presented scheme of WSN-speech recognition by multiple Kinect microphone array-sensors, the technical issues considered are (1) the deployment method of the Kinect microphone arrays, (2) the establishment of a client-server wireless sensor network of Kinect sensors, (3) acoustic sensing data fusion methods, and (4) a possible real-life application of the presented WSN-speech recognition. These are detailed in the following section.

* Corresponding author. Email: ingjr@nfu.edu.tw
Copyright © TAETI

2. Speech Recognition via Wireless Sensor Network by Using Kinect Microphone Array-Sensors

The framework of WSN-speech recognition by Kinect microphone array-sensors explored in this study mainly comprises Kinect sensor deployment with two Kinect microphone arrays, establishment of a client-server WSN using the TCP/IP protocol, and acoustic sensing data fusion using a simple and computationally fast strategy. The framework is further applied to voice sensing and remote control of the multimedia player on a smartphone, as depicted in Fig. 1.

Fig. 1 WSN-speech recognition by deployed Kinect microphone array-sensors and its control application

Fig. 2 Sensed data from a Kinect microphone array-sensor for data fusion calculations (four-channel voice data captured simultaneously by a single Kinect sensor)

As depicted in Fig. 1, a voice command from a speaking user is sensed by two Kinect microphone array-sensors deployed in an office space. The data sensed by each of the two Kinect sensors are sent to the server via the TCP/IP protocol, and the server performs data fusion calculations to determine the recognition result of the sensed voice command.
After data fusion, the recognized voice command is sent to the smartphone via the Bluetooth protocol to carry out a series of control functions on the multimedia player application. In this work, the two Kinect microphone array-sensors are positioned so as to capture nearly all possible voice data in the space. Fig. 2 shows the sensed data in the form of four channels, all of which come from a single Kinect microphone array-sensor. The two Kinect microphone array-sensors deployed in this work therefore yield two sets of four-channel sensed voice data. The content of the voice command is then determined from these data by data fusion calculations.

The data fusion strategy employed in this study is a voice energy-based method. As shown in Fig. 2, the voice data of each channel of the Kinect sensor have different energy values. The voice energy depends on the amplitude of the data over a certain time duration. A microphone in the Kinect microphone array has a large voice energy when the voice source (i.e., the speaking user) is located very near the microphone. Conversely, when the microphone is far from the voice source, the voice energy estimated for this microphone is significantly smaller. Following this line of thought, the primary principle of the data fusion method in this study is that a microphone whose sensed data have larger energy has more influence on the recognition decision for voice commands. The simplest method based on this principle takes as the fusion result the recognition outcome of the microphone whose sensed data have the largest energy.

For the control application, the Bluetooth (BT) protocol is employed in this work to transmit the fused recognition command.
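The simplest form of the energy-based fusion rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy is assumed to be the sum of squared amplitudes over the analysis window, each channel is assumed to already have its own recognition result, and the names `voice_energy`, `fuse_by_max_energy`, and `ChannelId` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical channel identifier: (Kinect index, microphone index within the array)
ChannelId = Tuple[int, int]


def voice_energy(samples: Sequence[float]) -> float:
    """Energy of one channel over the analysis window.

    Assumed definition: sum of squared sample amplitudes.
    """
    return sum(s * s for s in samples)


def fuse_by_max_energy(
    channels: Dict[ChannelId, Sequence[float]],
    results: Dict[ChannelId, str],
) -> Tuple[ChannelId, str]:
    """Return the channel with the largest energy and its recognition result."""
    if not channels:
        raise ValueError("at least one channel is required")
    best = max(channels, key=lambda cid: voice_energy(channels[cid]))
    return best, results[best]


if __name__ == "__main__":
    # Made-up samples for three of the eight channels (2 Kinects x 4 microphones)
    channels = {
        (0, 0): [0.1, -0.1],
        (1, 2): [0.5, -0.6],
        (1, 3): [0.2, 0.2],
    }
    # Made-up per-channel recognition results
    results = {(0, 0): "stop", (1, 2): "play", (1, 3): "next"}
    print(fuse_by_max_energy(channels, results))
```

In this sketch the channel nearest the speaker, here assumed to be microphone 2 of the second Kinect, dominates the decision. A weighted vote across all eight channels would be a natural extension of the same principle, although the text above describes only the maximum-energy selection.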
The BT connection is first established in an initialization process to form a peer-to-peer connection between the server (the command provider) and the smartphone end device (the command receiver). To speed up command transmission via BT, a command table is devised that contains a series of labels, each in text form and representing a corresponding voice command. The multimedia player application on the smartphone is thus operated by voice commands from the speaking user that are sensed remotely by the presented method.

3. Conclusions

In this paper, a wireless sensor network-speech recognition approach is presented that deploys Kinect microphone array-sensors. Compared with conventional speech recognition, the presented framework, which considers sensor deployment, sensing data fusion, the establishment of a wireless communication scheme, and a possible practical application, provides an acoustic sensing approach to command control in Internet of Things applications. In addition, because the presented approach uses Kinect sensors, it avoids the impression of 'surveillance' and may therefore be more acceptable to users.

Acknowledgement

This research is partially supported by the Ministry of Science and Technology (MOST) in Taiwan under Grant MOST 104-2815-C-150-017-E.

References

[1] I. J. Ding, C. T. Yen and D. C. Ou, “A method to integrate GMM, SVM and DTW for speaker recognition,” International Journal of Engineering and Technology Innovation, vol. 4, no. 1, pp. 38-47, 2014.
[2] I. J. Ding and Y. M. Hsu, “An HMM-like dynamic time warping scheme for automatic speech recognition,” Mathematical Problems in Engineering, vol. 2014, Article ID 898729, 8 pages, 2014.
[3] I. Tashev, “Kinect development kit: a toolkit for gesture- and speech-based human-machine interaction,” IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 129-131, 2013.
[4] Z. Zhang, “Microsoft kinect sensor and its effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.
[5] I. J. Ding and C. W. Chang, “An eigenspace-based method with a user adaptation scheme for human gesture recognition by using Kinect 3D data,” Applied Mathematical Modelling, vol. 39, no. 19, pp. 5769-5777, 2015.
[6] I. J. Ding and C. W. Chang, “Feature design scheme for Kinect-based DTW human gesture recognition,” Multimedia Tools and Applications, pp. 1-16, July, 2015.
[7] K. Qian, J. Niu and H. Yang, “Developing a gesture based remote human-robot interaction system using Kinect,” International Journal of Smart Home, vol. 7, no. 4, pp. 203-208, 2013.
[8] K. Kumatani, T. Arakawa, K. Yamamoto, J. McDonough, B. Raj, R. Singh and I. Tashev, “Microphone array processing for distant speech recognition: towards real-world deployment,” Proc. Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012.