CHEMICAL ENGINEERING TRANSACTIONS, VOL. 46, 2015
A publication of The Italian Association of Chemical Engineering. Online at www.aidic.it/cet
Guest Editors: Peiyu Ren, Yancang Li, Huiping Song
Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216

An Improved Algorithm for Noise Reduction in Hearing Aids

Yueqin Feng*, Guichen Tang, Ruiyu Liang, Qingyun Wang
School of Communication Engineering, Nanjing Institute of Technology, Jiangsu, Nanjing, 211167, China
fengyueqin@njit.edu.cn

Noise reduction is a vital function of hearing aids. To improve its performance, an improved algorithm is proposed. With the basic perceptual filter, noise that was previously masked can become audible after filtering. To solve this problem, a weighted perceptual filter algorithm that simulates the auditory masking mechanism is designed. Starting from the basic perceptual filter, the a priori SNR is first estimated by the decision-directed method and then revised by the optimal smoothing factor method. The optimal value is obtained under the minimum mean square error criterion, and the gain of the weighted perceptual filter is then computed. Experimental results show that the proposed method effectively eliminates residual noise, reduces interference, and improves speech comprehension for hearing-impaired patients in noisy environments.

1. Introduction

Improving the performance of hearing aids in noisy environments is an important research direction, and many noise reduction methods have been proposed over decades of research. For single-channel noise reduction, the problem to be solved is how to estimate the desired speech signal from a signal corrupted by noise. The earliest method is the spectral subtraction algorithm proposed by Boll (1979), which is computationally simple but introduces musical noise.
Afterwards, researchers proposed many improved algorithms, such as the adaptive gain averaging spectral subtraction algorithm of Gustafsson, Nordholm and Claesson (2001) and the Minimum Mean Square Error (MMSE) spectral subtraction algorithm of Sim et al. (1998). These algorithms brought some improvement. Another common single-channel noise reduction method is the Wiener filter (Wiener, 1949), which takes the minimum mean square error between the estimated signal and the true signal as its optimality criterion. Lim and Oppenheim (1979) then proposed an iterative Wiener filter method, and Ephraim and Malah (1984) proposed an optimal spectral amplitude estimator based on MMSE. Because the loudness perceived by the human ear is more closely related to the logarithm of the spectral amplitude than to the amplitude itself, the logarithmic spectrum is the more appropriate domain. Ephraim and Malah (1985) therefore proposed an MMSE estimator based on the logarithmic spectrum. Other improved forms of MMSE include the estimator based on the Laplacian distribution proposed by Chen and Loizou (2007) and the optimally modified log-spectral amplitude estimator proposed by Cohen (Cohen and Berdugo, 2001). Because subjective sound perception results from the joint effect of human psychology and physiology, it has been studied extensively, and speech enhancement based on auditory masking models has become an active research area. Virag (1999) combined the auditory masking of the human auditory system with spectral subtraction to reduce noise in speech. Loizou and Alam proposed a noise reduction method based on a perceptual filter, in which the noisy speech is filtered by a specially designed perceptual filter (Udrea, Vizireanu and Ciochina, 2008).
However, because noise types and application scenarios vary widely, no single method can suppress all noise. In addition, although the perceptual filter performs better than traditional noise reduction methods, it still leaves residual audible noise. The root cause is that perceptual filtering also attenuates speech components, which lowers the original masking threshold; noise components that were masked, and therefore left unprocessed, then exceed the lowered threshold and become audible. Therefore, a weighted perceptual filter algorithm based on the auditory masking mechanism is proposed. The algorithm designs the weighting factor of the perceptual filter according to the MMSE principle, so that the estimate of the a priori SNR is corrected and the optimal gain function is obtained. Three noise reduction algorithms are compared on four noise types at different SNRs. Subjective and objective measures show that the proposed algorithm reduces noise more effectively than the other two while maintaining good speech quality.

Please cite this article as: Feng Y.Q., Tang G.C., Liang R.Y., Wang Q.Y., 2015, An improved algorithm for noise reduction in hearing aids, Chemical Engineering Transactions, 46, 139-144. DOI: 10.3303/CET1546024

2. The principle of auditory masking

A speech enhancement algorithm based on auditory masking does not need to eliminate all the noise. It only needs to ensure that the residual noise is not perceived by the listener, which reduces speech distortion and improves listening comfort.
Let $y(n)$ be the noisy speech signal, $x(n)$ the clean speech signal and $d(n)$ the additive noise signal, and suppose $x(n)$ and $d(n)$ are uncorrelated. The noisy speech signal is then

$$y(n) = x(n) + d(n) \tag{1}$$

After framing and Fourier transform, we get

$$Y(m,k) = X(m,k) + D(m,k) \tag{2}$$

where $Y(m,k)$ is the amplitude spectrum of the noisy speech, $X(m,k)$ is the amplitude spectrum of the clean speech, $D(m,k)$ is the amplitude spectrum of the noise, $m$ is the frame index and $k$ is the discrete frequency index. Let $\hat{X}(m,k) = G(m,k)\,Y(m,k)$ be the speech signal enhanced by the perceptual filter, where $G(m,k)$ is the gain function of the perceptual filter. The error spectrum between the enhanced speech and the clean speech is then

$$\varepsilon(m,k) = \hat{X}(m,k) - X(m,k) = [G(m,k) - 1]\,X(m,k) + G(m,k)\,D(m,k) = \varepsilon_X(m,k) + \varepsilon_D(m,k) \tag{3}$$

Formula (3) shows that the speech distortion $\varepsilon_X$ and the residual noise $\varepsilon_D$ vary in opposite directions with the gain function $G(m,k)$: raising the gain reduces speech distortion but passes more noise, and lowering it does the reverse, so the two cannot be reduced simultaneously. An ideal gain function therefore has to strike a reasonable compromise between speech distortion and residual noise. The purpose of the perceptual filter is not to eliminate the residual noise completely, but to use the auditory masking of the human ear to keep the residual noise below the masking threshold $T(m,k)$, so that it cannot be heard, while keeping the speech distortion at a minimum:

$$\begin{cases} \min\ \varepsilon_X^2(m,k) \\ \text{subject to: } \varepsilon_D^2(m,k) \le T(m,k) \end{cases} \;\Rightarrow\; \begin{cases} \min\ [1 - G(m,k)]^2 X^2(m,k) \\ \text{subject to: } G^2(m,k)\,D^2(m,k) \le T(m,k) \end{cases} \tag{4}$$

Introduce a Lagrange multiplier $\mu(m,k)$ with $\mu(m,k) \ge 0$.
Let the Lagrangian cost function be

$$L(m,k) = \varepsilon_X^2(m,k) + \mu(m,k)\left[\varepsilon_D^2(m,k) - T(m,k)\right] = [1 - G(m,k)]^2 X^2(m,k) + \mu(m,k)\left[G^2(m,k)\,D^2(m,k) - T(m,k)\right] \tag{5}$$

Setting $\partial L / \partial G = 0$ gives

$$G(m,k) = \frac{X^2(m,k)}{X^2(m,k) + \mu(m,k)\,D^2(m,k)} = \frac{\xi(m,k)}{\xi(m,k) + \mu(m,k)} \tag{6}$$

where $\xi(m,k) = X^2(m,k) / D^2(m,k)$ denotes the SNR at the $k$-th frequency bin of the $m$-th frame. Substituting formula (6) into the constraint $G^2(m,k)\,D^2(m,k) \le T(m,k)$ with $\mu(m,k) \ge 0$ gives

$$\mu(m,k) \ge \max\left(\sqrt{\frac{D^2(m,k)}{T(m,k)}} - 1,\ 0\right)\xi(m,k) \tag{7}$$

As long as $\mu(m,k)$ satisfies formula (7), the residual noise is masked. Substituting formula (6) into $\varepsilon_X^2(m,k)$ gives

$$\varepsilon_X^2(m,k) = [1 - G(m,k)]^2 X^2(m,k) = \left[1 - \frac{\xi(m,k)}{\xi(m,k) + \mu(m,k)}\right]^2 X^2(m,k) = \left[\frac{\mu(m,k)}{\xi(m,k) + \mu(m,k)}\right]^2 X^2(m,k) \tag{8}$$

Formula (8) shows that $\varepsilon_X^2(m,k)$ increases with $\mu(m,k)$, that is, the speech distortion increases. Therefore, to minimize the speech distortion, $\mu(m,k)$ takes its minimum value, and the gain function is obtained as

$$G(m,k) = \frac{1}{1 + \max\left(\sqrt{\dfrac{D^2(m,k)}{T(m,k)}} - 1,\ 0\right)} = \min\left(\sqrt{\frac{T(m,k)}{D^2(m,k)}},\ 1\right) \tag{9}$$

The gain function in formula (9) minimizes the speech distortion while keeping the residual noise below the masking threshold.

3. Improvement strategy

When the noise power spectrum $D^2(m,k)$ at the $k$-th frequency bin of the $m$-th frame is higher than the corresponding masking threshold $T(m,k)$, formula (9) gives $G(m,k) < 1$, so this bin of the noisy speech is attenuated. For bins with $D^2(m,k) \le T(m,k)$, $G(m,k) = 1$ and the bin is output directly without any processing. In other words, only the noise above the masking threshold is filtered, while the noise below the masking threshold remains.
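The behaviour just described, attenuation only where the noise power exceeds the masking threshold, can be illustrated with a minimal NumPy sketch of the gain in formula (9). It assumes the constraint $G^2 D^2 \le T$ is met with equality whenever it is active, which gives $G = \min(\sqrt{T/D^2}, 1)$. The function name `perceptual_gain`, the guard value `eps` and the toy numbers are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def perceptual_gain(noise_psd, mask_thr, eps=1e-12):
    """Per-bin gain G = min(sqrt(T / D^2), 1), as in formula (9).

    noise_psd : estimated noise power spectrum D^2(m, k)
    mask_thr  : masking threshold T(m, k), same shape as noise_psd
    eps       : small guard against division by zero (an assumption)
    """
    noise_psd = np.asarray(noise_psd, dtype=float)
    mask_thr = np.asarray(mask_thr, dtype=float)
    ratio = mask_thr / np.maximum(noise_psd, eps)
    return np.minimum(np.sqrt(ratio), 1.0)

# Toy example with three frequency bins and a flat masking threshold.
D2 = np.array([4.0, 1.0, 0.25])   # noise power per bin
T = np.array([1.0, 1.0, 1.0])     # masking threshold per bin
G = perceptual_gain(D2, T)
residual = G**2 * D2              # residual noise power after filtering
print(G, residual)
```

In this toy example only the first bin, where the noise power exceeds the threshold, is attenuated. The other two bins pass with unit gain, so any noise they contain is left in the output unchanged.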
This gives rise to the MAN phenomenon. Given the real-time and low-power requirements of digital hearing aids, a Wiener filter based on the a priori SNR is usually used for suppression, because the Wiener filter is computationally simple and its residual noise resembles white noise, which is perceptually acceptable. Here, a weighting factor $W$ is introduced and defined as

$$W(m,k) = \begin{cases} \dfrac{\xi(m,k)}{1 + \xi(m,k)}, & T_{abs}(m,k) < D^2(m,k) \le T(m,k) \\ 1, & \text{otherwise} \end{cases} \tag{10}$$

where $T_{abs}(m,k)$ is the absolute hearing threshold and $\xi(m,k)$ is the a priori SNR. When the noise power spectrum satisfies $T_{abs}(m,k) < D^2(m,k) \le T(m,k)$, the noisy speech is suppressed with the Wiener filter gain, so the MAN phenomenon is reduced and the speech quality is improved. A decision-directed method is usually used to estimate the a priori SNR $\xi(m,k)$:

$$\hat{\xi}(m,k) = \alpha\,\frac{\hat{X}^2(m-1,k)}{E\left[D^2(m-1,k)\right]} + (1 - \alpha)\max\left(\gamma(m,k) - 1,\ 0\right), \quad 0 < \alpha < 1 \tag{11}$$

where $\hat{X}^2(m-1,k)$ denotes the estimated clean speech spectrum of the $(m-1)$-th frame, $E\left[D^2(m-1,k)\right]$ denotes the estimated noise spectrum of the $(m-1)$-th frame, $\alpha$ is the smoothing factor and $\gamma(m,k)$ is the a posteriori SNR. The value of the smoothing factor $\alpha$ is very important for estimating the a priori SNR. It is usually a constant close to 1. Cappe (1994) showed that the residual musical noise is lower when $\alpha$ is closer to 1, at the cost of increased transient distortion in the output signal. To balance both aspects, $\alpha$ is usually set to a constant between 0.95 and 0.99. However, a fixed $\alpha$ cannot handle abrupt changes of the a posteriori SNR. To solve this problem, an optimal smoothing factor method is used, in which $\alpha$ is no longer a fixed value but a smoothing coefficient that varies with time.
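Before moving to the time-varying case, the fixed-$\alpha$ estimate of formula (11) and the weighting factor of formula (10) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names, the default `alpha=0.98`, the guard `eps`, and the choice of a strict lower and non-strict upper inequality in the band test are all hypothetical, and the toy values are made up for the example.

```python
import numpy as np

def decision_directed_snr(prev_speech_psd, prev_noise_psd, post_snr,
                          alpha=0.98, eps=1e-12):
    """A priori SNR estimate of formula (11) with a fixed smoothing factor.

    prev_speech_psd : estimated clean speech spectrum of frame m-1
    prev_noise_psd  : estimated noise spectrum of frame m-1
    post_snr        : a posteriori SNR gamma(m, k) of the current frame
    """
    prev_speech_psd = np.asarray(prev_speech_psd, dtype=float)
    prev_noise_psd = np.asarray(prev_noise_psd, dtype=float)
    post_snr = np.asarray(post_snr, dtype=float)
    prev_term = prev_speech_psd / np.maximum(prev_noise_psd, eps)
    inst_term = np.maximum(post_snr - 1.0, 0.0)
    return alpha * prev_term + (1.0 - alpha) * inst_term

def weighting_factor(xi, noise_psd, mask_thr, abs_thr):
    """Weighting factor of formula (10).

    Applies the Wiener gain xi / (1 + xi) where the noise power lies
    between the absolute hearing threshold and the masking threshold,
    and 1 elsewhere. Inequality strictness is an assumption.
    """
    xi = np.asarray(xi, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    in_band = (noise_psd > abs_thr) & (noise_psd <= mask_thr)
    return np.where(in_band, xi / (1.0 + xi), 1.0)

# Toy example with two frequency bins.
xi_hat = decision_directed_snr(prev_speech_psd=[2.0, 0.5],
                               prev_noise_psd=[1.0, 1.0],
                               post_snr=[5.0, 0.5])
W = weighting_factor(xi_hat,
                     noise_psd=np.array([0.5, 2.0]),
                     mask_thr=np.array([1.0, 1.0]),
                     abs_thr=np.array([0.1, 0.1]))
print(xi_hat, W)
```

In the toy example the first bin has noise between the two thresholds and receives the Wiener gain, while the second bin has noise above the masking threshold and keeps $W = 1$, leaving its attenuation to the perceptual filter gain.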
The original a priori SNR smoothing formula (11) is rewritten with a time-varying smoothing factor:

$$\hat{\xi}(m,k) = \alpha(m,k)\,\xi(m-1,k) + \left[1 - \alpha(m,k)\right]\max\left(\gamma(m,k) - 1,\ 0\right) \tag{12}$$

where $\xi(m-1,k) = \hat{X}^2(m-1,k) / E\left[D^2(m-1,k)\right]$. To reduce the error between the estimated value and the true value, an MMSE criterion is used to update $\alpha(m,k)$. The mean square error (MSE) is defined as

$$J = E\left\{\left[\hat{\xi}(m,k) - \xi(m,k)\right]^2 \,\middle|\, \xi(m-1,k)\right\} \tag{13}$$

Substituting formula (12) into formula (13) and simplifying, we obtain

$$J = \alpha^2(m,k)\left[\xi(m-1,k) - \xi(m,k)\right]^2 + \left[1 - \alpha(m,k)\right]^2\left[\xi(m,k) + 1\right]^2 \tag{14}$$

Taking the partial derivative of $J$ and setting $\partial J / \partial \alpha(m,k) = 0$, the optimal value of $\alpha(m,k)$ is obtained as

$$\alpha_{opt}(m,k) = \frac{1}{1 + \left[\dfrac{\xi(m,k) - \xi(m-1,k)}{\xi(m,k) + 1}\right]^2} \tag{15}$$

In practice, because $\xi(m,k)$ is unknown, it is replaced in the above formula by $\max\left(\gamma(m,k) - 1,\ 0\right)$. When $\alpha(m,k) = 1$, formula (12) ignores the new observation and the estimate stops updating. To prevent this deadlock, the optimal smoothing factor $\alpha_{opt}(m,k)$ is constrained below a maximum value $\alpha_{max}$, for example $\alpha_{max} = 0.998$. Likewise, to ensure the smoothing performance at low a posteriori SNR, a lower limit $\alpha_{min}$ is set for the smoothing factor, for example 0.3. The final gain function of the weighted perceptual filter is

$$G_{WPF}(m,k) = G_{PF}(m,k)\,W(m,k) \tag{16}$$

4. Experimental simulation and analysis

To measure noise reduction performance, tests are carried out in different noisy environments. Four noise types from the NOISEX-92 noise database are used in this experiment: White noise, Pink noise, Speech Babble noise and HF channel noise.
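Noisy test signals at a prescribed input SNR are commonly produced by scaling the noise before adding it to the clean speech. The following is a minimal NumPy sketch of that step, not the authors' script: the function name `mix_at_snr`, the use of a global (whole-signal) power ratio, the synthetic sine standing in for speech, and the assumption that the noise is at least as long as the speech are all illustrative choices.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals snr_db,
    then add it to `speech`. Returns the mixture and the scale used.

    Assumes `noise` has at least as many samples as `speech`.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power is p_speech / 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise, scale

# Toy signals: a 440 Hz tone standing in for speech, white Gaussian noise.
fs = 16000
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
noise = rng.standard_normal(fs)

noisy, scale = mix_at_snr(speech, noise, snr_db=5.0)
achieved_db = 10 * np.log10(np.mean(speech ** 2) / np.mean((scale * noise) ** 2))
print(round(achieved_db, 6))
```

Calling the function once per target value (for example -5, 0, 5, 10 and 15 dB) yields one mixture per input SNR from the same speech and noise recordings.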
The noisy speech signals at different SNRs (-5 dB, 0 dB, 5 dB, 10 dB, 15 dB) are obtained by superposition. The MOS method is used for subjective assessment, and segSNR (segmental SNR) and WSS (weighted spectral slope distance) are used for objective assessment, to analyze the performance of three methods. Both the clean speech and the noise are resampled to 16 kHz. A Hamming window is used, the frame length is 512 samples and the overlap between frames is 50%. The Wiener Filter (WF) method and the Perceptual Filter (PF) method are used for comparison with the proposed weighted perceptual filter (WPF).

4.1 Subjective assessment

The MOS score is a widely used subjective assessment method. Ten listeners with normal hearing were invited to listen and grade according to the MOS grading standard. The MOS results of the three noise reduction methods under White noise, Pink noise, Speech Babble noise and HF channel noise are shown in Fig. 1. Fig. 1 shows that WF has the lowest MOS of the three methods, because residual musical noise remains after WF noise reduction. Almost no musical noise is heard after PF noise reduction, so the MOS of PF is higher than that of WF. Because WPF reduces the MAN phenomenon that remains with PF, the speech quality is improved further. The proposed WPF method is more effective in noise reduction, and the resulting speech sounds better.

[Figure 1: MOS comparisons. Panels: (a) White, (b) Pink, (c) Speech Babble, (d) HF channel. x-axis: input SNR (dB), -5 to 15; y-axis: MOS, 0 to 5. Curves: WF, PF, WPF.]

4.2 Objective assessment

(1) segSNR

The segmental SNR of the three noise reduction methods under White noise, Pink noise, Speech Babble noise and HF channel noise is shown in Fig. 2.

[Figure 2: Segmental SNR comparisons. Panels: (a) White, (b) Pink, (c) Speech Babble, (d) HF channel. x-axis: input SNR (dB), -5 to 15; y-axis: segSNR (dB), -5 to 15. Curves: WF, PF, WPF.]

(2) WSS

[Figure 3: WSS comparisons. Panels: (a) White, (b) Pink, (c) Speech Babble, (d) HF channel. x-axis: input SNR (dB), -10 to 20; y-axis: WSS, 20 to 120. Curves: WF, PF, WPF.]

The WSS of the three methods under White noise, Pink noise, Speech Babble noise and HF channel noise is shown in Fig. 3. In the objective assessments, the WPF noise reduction method is superior to the WF and PF methods: the speech after WPF noise reduction is closer to the clean speech, the distortion is lower, and the speech quality is better. WPF thus performs better in background noise elimination, residual noise suppression and speech distortion reduction.

5. Conclusions

A weighted perceptual filter algorithm that imitates auditory masking is proposed. Compared with traditional speech noise reduction algorithms, a real-time weighting factor is introduced to improve noise reduction performance, speech quality and naturalness, and the speech recognition rate of hearing-impaired persons. The algorithm provides a useful reference for the subsequent design and application of digital hearing aids.

Acknowledgments

The work was supported by the National Natural Science Foundation of China under Grant No. 61301219, No. 61375028 and No. 61301295, and by the Natural Science Foundation of Jiangsu Province under Grant No. BK20130241. The authors would like to thank the reviewers for their valuable suggestions and comments.

References

Boll S., 1979, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), 113-120. DOI: 10.1109/TASSP.1979.1163209
Cappe O., 1994, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Transactions on Speech and Audio Processing, 2(2), 345-349. DOI: 10.1109/89.279283
Chen B., Loizou P.C., 2007, A Laplacian-based MMSE estimator for speech enhancement, Speech Communication, 49(2), 134-143. DOI: 10.1016/j.specom.2006.12.005
Cohen I., Berdugo B., 2001, Speech enhancement for non-stationary noise environments, Signal Processing, 81(11), 2403-2418. DOI: 10.1016/S0165-1684(01)00128-1
Ephraim Y., Malah D., 1984, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, 32(6), 1109-1121. DOI: 10.1109/TASSP.1984.1164453
Ephraim Y., Malah D., 1985, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, 33(2), 443-445. DOI: 10.1109/TASSP.1985.1164550
Gustafsson H., Nordholm S.E., Claesson I., 2001, Spectral subtraction using reduced delay convolution and adaptive averaging, IEEE Transactions on Speech and Audio Processing, 9(8), 799-807. DOI: 10.1109/89.966083
Lim J.S., Oppenheim A.V., 1979, Enhancement and bandwidth compression of noisy speech, Proceedings of the IEEE, 67(12), 1586-1604. DOI: 10.1109/PROC.1979.11540
Sim B.L., Tong Y.C., Chang J.S., et al., 1998, A parametric formulation of the generalized spectral subtraction method, IEEE Transactions on Speech and Audio Processing, 6(4), 328-337. DOI: 10.1109/89.701361
Udrea R.M., Vizireanu N.D., Ciochina S., 2008, An improved spectral subtraction method for speech enhancement using a perceptual weighting filter, Digital Signal Processing, 18(4), 581-587. DOI: 10.1016/j.dsp.2007.08.002
Virag N., 1999, Single channel speech enhancement based on masking properties of the human auditory system, IEEE Transactions on Speech and Audio Processing, 7(2), 126-137. DOI: 10.1109/89.748118
Wiener N., 1949, Extrapolation, interpolation, and smoothing of stationary time series, MIT Press, Cambridge, MA.