

 CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 46, 2015 

A publication of 

 
The Italian Association 
of Chemical Engineering  
Online at www.aidic.it/cet 

Guest Editors: Peiyu Ren, Yancang Li, Huiping Song 
Copyright © 2015, AIDIC Servizi S.r.l.,  
ISBN 978-88-95608-37-2; ISSN 2283-9216                                                                                
 

An Improved Algorithm for Noise Reduction in Hearing Aids 
Yueqin Feng*, Guichen Tang, Ruiyu Liang, Qingyun Wang 

School of Communication Engineering, Nanjing Institute of Technology, Jiangsu, Nanjing, 211167, China 
fengyueqin@njit.edu.cn 

Noise reduction is a vital function of hearing aids, and this paper proposes an improved noise reduction 
algorithm. With the basic perceptual filter, noise that was masked before filtering can become audible 
afterwards. To solve this problem, a weighted perceptual filter algorithm that simulates the auditory masking 
mechanism is designed. Building on the basic perceptual filter, the prior SNR is first estimated by the 
decision-directed method and then refined with an optimal smoothing factor. The optimal value is obtained 
under the minimum mean square error criterion, and the gain of the weighted perceptual filter is then computed. 
Experimental results show that the proposed method effectively suppresses residual noise, reduces interference, 
and improves speech comprehension for hearing-impaired patients in noisy environments. 

1. Introduction 

Improving the performance of hearing aids in noisy environments is an important research direction, and many 
noise reduction methods have been proposed over decades of research. For single-channel noise reduction, the 
problem to be solved is how to estimate the desired speech signal from a signal corrupted by noise. The earliest 
method is the spectral subtraction algorithm proposed by Boll (1979), which is computationally simple but 
introduces musical noise. Many improved algorithms followed, such as the adaptive gain averaging spectral 
subtraction algorithm of Gustafsson, Nordholm and Claesson (2001) and the Minimum Mean Square Error (MMSE) 
spectral subtraction algorithm of Sim et al. (1998). These algorithms have achieved a certain effect. Another 
common single-channel noise reduction method is the Wiener filter (Wiener, 1949), which takes the minimum 
mean square error between the estimated signal and the real signal as the optimality criterion. Lim and 
Oppenheim (1979) then proposed an iterative Wiener filter method. Ephraim and Malah (1984) proposed an optimal 
spectral amplitude estimation method based on MMSE. Because the loudness perceived by the human ear is 
approximately proportional to the logarithm of the spectral amplitude, it is more appropriate to work with the 
logarithmic spectrum. Ephraim and Malah (1985) therefore proposed an MMSE estimation method based on the 
logarithmic spectrum. In addition, there are other improved forms of MMSE, such as the MMSE estimator based 
on the Laplacian distribution proposed by Chen and Loizou (2007) and the optimally modified log-spectral 
amplitude estimator proposed by Cohen and Berdugo (2001). 
Because subjective sound perception results from the joint effect of human psychology and physiology, it has 
been studied extensively, and speech enhancement based on auditory masking models is now an active research 
area. Virag (1999) combined the masking properties of the human auditory system with spectral subtraction to 
reduce noise in speech. Loizou and Alam proposed a noise reduction method in which the noisy speech signal is 
filtered by a specially designed perceptual filter (Udrea, Vizireanu and Ciochina, 2008). However, because 
noise types and application scenarios vary widely, no single method is able to suppress all noise. In 
addition, although the perceptual filter method performs better than the traditional noise reduction methods, 
it still leaves residual audible noise. The underlying cause is that perceptual noise reduction also attenuates 
the speech component, which lowers the original masking threshold, so noise components that were masked, and 
therefore left unprocessed, exceed the new masking threshold and become audible. 
Therefore, a perceptual filter algorithm based on the mechanism of auditory masking is proposed. The 
algorithm designs the weighting factor of the perceptual filter according to the MMSE principle, so that the 
estimate of the prior SNR is refined and the optimum gain function is obtained. For four noise types at different SNRs, 

                               
DOI: 10.3303/CET1546024

 
Please cite this article as: Feng Y.Q., Tang G.C., Liang R.Y., Wang Q.Y., 2015, An improved algorithm for noise reduction in hearing aids, 
Chemical Engineering Transactions, 46, 139-144  DOI:10.3303/CET1546024  



three noise reduction algorithms are compared. Both the subjective and the objective measures show that the 
proposed algorithm achieves strong noise reduction while preserving good speech quality. 

2. The principle of auditory masking 

A speech enhancement algorithm based on auditory masking does not need to eliminate all the noise. It only 
needs to ensure that the residual noise is not perceived by the listener, which reduces the distortion of the 
speech and improves listening comfort. 
Let $y(n)$ be the speech signal with noise, $x(n)$ the pure speech signal and $d(n)$ the additive noise 
signal, and suppose $x(n)$ and $d(n)$ are uncorrelated. The speech signal with noise is then: 

$$y(n) = x(n) + d(n) \tag{1}$$

After framing and the Fourier transform, we get 

$$Y(m,k) = X(m,k) + D(m,k) \tag{2}$$

where $Y(m,k)$ is the amplitude spectrum of the speech with noise, $X(m,k)$ is the amplitude spectrum of the 
pure speech, $D(m,k)$ is the amplitude spectrum of the noise, $m$ is the frame index and $k$ is the discrete 
frequency index. Let $\hat{X}(m,k) = G(m,k)\,Y(m,k)$ be the speech signal enhanced by the perceptual filter, 
where $G(m,k)$ is the gain function of the perceptual filter. The error spectrum between the enhanced speech 
and the pure speech is then: 

 

$$
\begin{aligned}
\varepsilon(m,k) &= \hat{X}(m,k) - X(m,k) \\
&= [G(m,k) - 1]\,X(m,k) + G(m,k)\,D(m,k) \\
&= \varepsilon_X(m,k) + \varepsilon_D(m,k)
\end{aligned}
\tag{3}
$$

From formula (3), the speech distortion $\varepsilon_X(m,k)$ and the noise distortion $\varepsilon_D(m,k)$ 
vary in opposite directions with the gain function $G(m,k)$, so it is impossible to reduce both 
simultaneously. Therefore, an ideal gain function must make a reasonable compromise between the distortion of 
the speech signal and the distortion of the noise signal. The purpose of the perceptual filter is not to 
eliminate the residual noise completely, but to use the auditory masking of the human ear to keep the residual 
noise below the masking threshold $T$, where it cannot be heard, while minimizing the distortion of the speech 
signal. That is: 

  
$$
\begin{cases}
\min\ \varepsilon_X^2(m,k) \\
\text{subject to: } \varepsilon_D^2(m,k) \le T(m,k)
\end{cases}
\;\Longleftrightarrow\;
\begin{cases}
\min\ [G(m,k) - 1]^2 X^2(m,k) \\
\text{subject to: } G^2(m,k)\,D^2(m,k) \le T(m,k)
\end{cases}
\tag{4}
$$

Introduce a Lagrange multiplier $\mu(m,k)$ with $\mu(m,k) \ge 0$, and let the Lagrange cost function be: 

 

$$
\begin{aligned}
L &= \varepsilon_X^2(m,k) + \mu(m,k)\left[\varepsilon_D^2(m,k) - T(m,k)\right] \\
&= [G(m,k) - 1]^2 X^2(m,k) + \mu(m,k)\left[G^2(m,k)\,D^2(m,k) - T(m,k)\right]
\end{aligned}
\tag{5}
$$

Let $\partial L / \partial G = 0$; then we get: 

$$G(m,k) = \frac{X^2(m,k)}{X^2(m,k) + \mu(m,k)\,D^2(m,k)} = \frac{\xi(m,k)}{\xi(m,k) + \mu(m,k)} \tag{6}$$

where $\xi(m,k) = X^2(m,k) / D^2(m,k)$ denotes the SNR at frequency point $k$ of the $m$-th frame. 
Substituting the above formula into the constraint $G^2(m,k)\,D^2(m,k) \le T(m,k)$, together with 
$\mu(m,k) \ge 0$, gives: 

$$\mu(m,k) \ge \max\left(\sqrt{\frac{D^2(m,k)}{T(m,k)}} - 1,\ 0\right)\xi(m,k) \tag{7}$$

As long as $\mu(m,k)$ satisfies formula (7), the residual noise can be masked. Substituting formula (6) into 
$\varepsilon_X^2(m,k)$ gives: 

$$\varepsilon_X^2(m,k) = [G(m,k) - 1]^2 X^2(m,k) = \left[\frac{\xi(m,k)}{\xi(m,k) + \mu(m,k)} - 1\right]^2 X^2(m,k) \tag{8}$$

It can be inferred from formula (8) that as $\mu(m,k)$ increases, $\varepsilon_X^2(m,k)$ increases, namely 
the speech distortion increases. Therefore, to reduce the distortion of the speech, $\mu(m,k)$ is set to its 
minimum value, and the gain function is obtained: 

$$G(m,k) = \frac{1}{1 + \max\left(\sqrt{\dfrac{D^2(m,k)}{T(m,k)}} - 1,\ 0\right)} = \min\left(\sqrt{\frac{T(m,k)}{D^2(m,k)}},\ 1\right) \tag{9}$$

The gain function in formula (9) is obtained by keeping the speech distortion minimum while keeping the 
residual noise below the masking threshold. 
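
As a rough illustration, formula (9) can be sketched for a single time-frequency bin as follows. The function name `pf_gain` and its argument names are illustrative assumptions rather than identifiers from the paper, and the sketch assumes that positive estimates of $D^2(m,k)$ and $T(m,k)$ are already available.

```python
import math


def pf_gain(noise_power: float, mask_threshold: float) -> float:
    """Perceptual-filter gain of formula (9) for one time-frequency bin.

    noise_power    -- estimated noise power spectrum D^2(m, k), assumed > 0
    mask_threshold -- masking threshold T(m, k), assumed > 0
    """
    if noise_power <= 0.0 or mask_threshold <= 0.0:
        raise ValueError("noise_power and mask_threshold must be positive")
    # Smallest multiplier allowed by formula (7), divided by xi(m, k).
    mu_over_xi = max(math.sqrt(noise_power / mask_threshold) - 1.0, 0.0)
    # G = xi / (xi + mu) = 1 / (1 + mu / xi) = min(sqrt(T / D^2), 1).
    return 1.0 / (1.0 + mu_over_xi)
```

When the noise power is four times the masking threshold, the gain is 0.5, so the residual noise power $G^2 D^2$ lands exactly on the threshold; when the noise is already below the threshold, the gain is 1 and the bin passes through unchanged.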

3. Improvement strategy 

When the noise power spectrum $D^2(m,k)$ at the $k$-th discrete frequency point of the $m$-th frame is higher 
than the corresponding masking threshold $T(m,k)$, we obtain $G(m,k) < 1$, namely the $k$-th discrete 
frequency point of the $m$-th frame of the speech with noise is suppressed. At frequency points where 
$D^2(m,k) \le T(m,k)$, $G(m,k) = 1$ and the signal is output directly without any reduction. In other words, 
only the noise above the masking threshold is filtered, and the noise below the masking threshold remains. 
Because filtering also attenuates the speech that provided the masking, this remaining noise can become 
audible, and the MAN phenomenon is generated. Considering the real-time and low-power requirements of digital 
hearing aids, a Wiener filter based on the prior SNR is used to suppress it, because the Wiener filter is 
computationally simple and its residual noise resembles white noise, which is more comfortable to listen to. 

Here, a weighting factor W  is introduced and defined as: 

$$
W(m,k) =
\begin{cases}
\dfrac{\xi(m,k)}{1 + \xi(m,k)}, & T_{abs}(m,k) < D^2(m,k) \le T(m,k) \\
1, & \text{otherwise}
\end{cases}
\tag{10}
$$

 
where $T_{abs}(m,k)$ is the absolute hearing threshold and $\xi(m,k)$ is the prior SNR. When the noise power 
spectrum satisfies $T_{abs}(m,k) < D^2(m,k) \le T(m,k)$, the speech with noise is suppressed with the Wiener 
filter, so the MAN phenomenon is reduced and the speech quality is improved. 
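
A minimal sketch of the weighting factor in formula (10) is given below. The name `wiener_weight` and the argument names are hypothetical, and treating the lower bound as strict and the upper bound as inclusive is an assumption made for this sketch.

```python
def wiener_weight(prior_snr: float, noise_power: float,
                  mask_threshold: float, abs_threshold: float) -> float:
    """Weighting factor W(m, k) of formula (10) for one time-frequency bin.

    prior_snr      -- prior SNR xi(m, k), assumed >= 0
    noise_power    -- noise power spectrum D^2(m, k)
    mask_threshold -- masking threshold T(m, k)
    abs_threshold  -- absolute hearing threshold T_abs(m, k)
    """
    # Noise that is masked but still above the absolute hearing threshold
    # is the part that can turn audible, so it receives the Wiener gain.
    if abs_threshold < noise_power <= mask_threshold:
        return prior_snr / (1.0 + prior_snr)
    # Everywhere else the perceptual-filter gain is left untouched.
    return 1.0
```

With a prior SNR of 3, a bin whose noise power lies between the two thresholds is scaled by 0.75, while bins above the masking threshold or below the absolute hearing threshold keep a weight of 1.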

Usually a decision-directed method is used to estimate the prior SNR $\xi(m,k)$: 

$$\hat{\xi}(m,k) = \alpha\,\frac{\hat{X}^2(m-1,k)}{E\left[D^2(m-1,k)\right]} + (1 - \alpha)\max(\gamma(m,k) - 1,\ 0), \quad 0 < \alpha < 1 \tag{11}$$

where $\hat{X}^2(m-1,k)$ denotes the estimated pure speech spectrum of the $(m-1)$-th frame, 
$E\left[D^2(m-1,k)\right]$ denotes the estimated noise spectrum of the $(m-1)$-th frame, $\alpha$ is the 
smoothing factor and $\gamma(m,k)$ is the posterior SNR. 
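
The decision-directed update of formula (11) can be sketched for one frequency point as below. The function and argument names are illustrative, and the default `alpha=0.98` is only an assumed value chosen close to 1.

```python
def decision_directed_snr(prev_clean_power: float, prev_noise_power: float,
                          post_snr: float, alpha: float = 0.98) -> float:
    """Prior SNR estimate of formula (11) for one frequency point.

    prev_clean_power -- estimated pure speech power of frame m - 1
    prev_noise_power -- estimated noise power of frame m - 1, assumed > 0
    post_snr         -- posterior SNR gamma(m, k) of the current frame
    alpha            -- fixed smoothing factor (assumed default)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if prev_noise_power <= 0.0:
        raise ValueError("prev_noise_power must be positive")
    # Contribution of the previous frame's enhanced speech.
    previous_term = prev_clean_power / prev_noise_power
    # Instantaneous estimate from the current posterior SNR, floored at 0.
    current_term = max(post_snr - 1.0, 0.0)
    return alpha * previous_term + (1.0 - alpha) * current_term
```

With `alpha` near 1 the estimate follows the previous frame closely, so a sudden jump in `post_snr` only shows up in the estimate after several frames.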

The value of the smoothing factor $\alpha$ is very important for estimating the prior SNR. Usually, it is a 
constant close to 1. Cappe (1994) reported that the residual musical noise is lower when $\alpha$ is closer to 
1, but that this increases the transient distortion of the final signal. To balance both aspects, $\alpha$ is 
usually set to a constant between 0.95 and 0.99. However, a fixed $\alpha$ cannot handle abrupt changes of the 
posterior SNR. To solve this problem, an optimal smoothing factor method is used, in which $\alpha$ is no 
longer a fixed value but a smoothing coefficient that varies with time. 

Rewrite the original prior SNR smoothing formula (11) with a time-varying smoothing factor: 

$$\hat{\xi}(m,k) = \alpha(m,k)\,\xi(m-1,k) + [1 - \alpha(m,k)]\max(\gamma(m,k) - 1,\ 0) \tag{12}$$

where $\xi(m-1,k) = \hat{X}^2(m-1,k) / E\left[D^2(m-1,k)\right]$. 

To reduce the error between the estimated value and the real value, an MMSE criterion is used to update 
$\alpha(m,k)$. The mean square error (MSE) is defined as: 

$$J = E\left\{\left[\hat{\xi}(m,k) - \xi(m,k)\right]^2 \,\middle|\, \xi(m-1,k)\right\} \tag{13}$$



Substituting formula (12) into formula (13) and simplifying, we obtain: 

$$J = \alpha^2(m,k)\left[\xi(m-1,k) - \xi(m,k)\right]^2 + [1 - \alpha(m,k)]^2\left[\xi(m,k) + 1\right]^2 \tag{14}$$

Computing the partial derivative of $J$ and letting $\partial J / \partial \alpha(m,k) = 0$, the optimal 
value of $\alpha(m,k)$ is obtained as: 

$$\alpha_{opt}(m,k) = \frac{1}{1 + \left[\dfrac{\xi(m,k) - \xi(m-1,k)}{\xi(m,k) + 1}\right]^2} \tag{15}$$

In practical applications, because $\xi(m,k)$ is unknown, it is replaced in the above formula by 
$\max(\gamma(m,k) - 1,\ 0)$. To prevent the deadlock that occurs when $\alpha(m,k) = 1$, the optimal smoothing 
factor $\alpha_{opt}(m,k)$ is constrained below a maximum value $\alpha_{max}$, for example 
$\alpha_{max} = 0.998$. Likewise, to ensure the smoothing performance under low posterior SNR, a lower limit 
$\alpha_{min}$ of the smoothing factor is also set, for example 0.3. 
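
The time-varying smoothing factor of formula (15), with the substitution and limits described above, might be sketched as follows. The names `optimal_alpha` and `mse_cost` are hypothetical; `mse_cost` simply evaluates formula (14) so that the minimiser can be checked numerically.

```python
def optimal_alpha(post_snr: float, prev_prior_snr: float,
                  alpha_min: float = 0.3, alpha_max: float = 0.998) -> float:
    """Clamped optimal smoothing factor of formula (15) for one bin.

    post_snr       -- posterior SNR gamma(m, k) of the current frame
    prev_prior_snr -- prior SNR xi(m - 1, k) of the previous frame
    """
    # The true xi(m, k) is unknown, so max(gamma - 1, 0) stands in for it.
    xi_now = max(post_snr - 1.0, 0.0)
    ratio = (xi_now - prev_prior_snr) / (xi_now + 1.0)
    alpha = 1.0 / (1.0 + ratio * ratio)
    # Upper limit avoids the alpha = 1 deadlock; lower limit keeps smoothing.
    return min(max(alpha, alpha_min), alpha_max)


def mse_cost(alpha: float, xi_now: float, prev_prior_snr: float) -> float:
    """Mean square error J of formula (14) for a given smoothing factor."""
    return (alpha ** 2 * (prev_prior_snr - xi_now) ** 2
            + (1.0 - alpha) ** 2 * (xi_now + 1.0) ** 2)
```

For example, with a posterior SNR of 4 and a previous prior SNR of 1, the factor comes out at 0.8, and `mse_cost` is larger at both 0.7 and 0.9. A large mismatch between the two frames pushes the factor down to the lower limit, which lets the estimate follow an abrupt change quickly.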

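The final gain function of the weighted perceptual filter can be written as: 

$$G_{WPF}(m,k) = G_{PF}(m,k)\,W(m,k) \tag{16}$$

where $G_{PF}(m,k)$ is the perceptual filter gain of formula (9) and $W(m,k)$ is the weighting factor of 
formula (10). 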

4. Experimental simulation and analysis 

To measure noise reduction performance, tests are carried out in different noisy environments. Four noise 
types from the NOISEX-92 noise database are used in this experiment: White noise, Pink noise, Speech Babble 
noise and HF channel noise. Speech signals with noise at different SNRs (-5 dB, 0 dB, 5 dB, 10 dB and 15 dB) 
are obtained by superposition. The subjective MOS method and the objective measures segSNR (segmental SNR) 
and WSS (Weighted-Slope Spectral distance) are used to analyze the performance of three methods. Both the 
pure speech and the noise are resampled to 16 kHz; a Hamming window is used, the frame length is 512 samples, 
and the overlap between frames is 50%. The Wiener Filter (WF) method and the Perceptual Filter (PF) method are 
used for comparison with the proposed weighted perceptual filter (WPF) method. 
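
One plausible way to prepare such test signals is sketched below. The paper does not state how the SNR of the mixture is defined, so computing it from the average power over the whole signal is an assumption, and the helper names `scale_noise_to_snr`, `mix_at_snr` and `hamming_frames` are illustrative.

```python
import math


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db."""
    n = min(len(speech), len(noise))
    p_speech = sum(s * s for s in speech[:n]) / n
    p_noise = sum(d * d for d in noise[:n]) / n
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    factor = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [factor * d for d in noise[:n]]


def mix_at_snr(speech, noise, snr_db):
    """Superpose speech and scaled noise: y(n) = x(n) + d(n)."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + d for s, d in zip(speech, scaled)]


def hamming_frames(signal, frame_len=512, overlap=0.5):
    """Split a signal into Hamming-windowed frames with the given overlap."""
    hop = int(frame_len * (1.0 - overlap))
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    return [[w * x for w, x in zip(window, signal[start:start + frame_len])]
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

With a frame length of 512 and 50% overlap the hop is 256 samples, which is 16 ms at a 16 kHz sampling rate; trailing samples that do not fill a complete frame are dropped in this sketch.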

4.1 Subjective assessment 

The MOS score is a widely used subjective assessment method. Ten listeners with normal hearing were invited to 
listen and grade according to the MOS grading standard. The MOS results of the three noise reduction methods 
under White noise, Pink noise, Speech Babble noise and HF channel noise are shown in Fig. 1. 
It can be inferred from Fig. 1 that the MOS score of WF is the lowest of the three methods, because residual 
musical noise remains after WF noise reduction. Almost no musical noise can be heard after PF noise 
reduction, so the MOS score of PF is higher than that of WF. Because WPF reduces the MAN phenomenon of the PF 
method, the speech quality is improved further. The proposed WPF method is more effective in noise reduction, 
and the resulting speech sounds better. 

[Figure 1 plots: MOS (0 to 5) versus SNRin (dB, -5 to 15) for WF, PF and WPF; panels (a) White, (b) Pink, 
(c) SpeechBabble, (d) HFchannel] 

Figure 1: MOS comparisons 



4.2 Objective assessment 

(1) segSNR 
The segmental SNR of the three noise reduction methods under White noise, Pink noise, SpeechBabble noise and 
HFchannel noise is shown in Fig. 2: 

[Figure 2 plots: segSNR (dB, -5 to 15) versus SNRin (dB, -5 to 15) for WF, PF and WPF; panels (a) White, 
(b) Pink, (c) SpeechBabble, (d) HFchannel] 

Figure 2: Segmental SNR comparisons 
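
The paper does not give its segSNR formula. The sketch below follows one commonly used definition, the mean of per-frame SNRs with each frame limited to a fixed range; the limits of -10 dB and 35 dB, the frame settings and the name `segmental_snr` are all assumptions.

```python
import math


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR (dB) between pure speech and enhanced speech."""
    n = min(len(clean), len(enhanced))
    values = []
    for start in range(0, n - frame_len + 1, hop):
        idx = range(start, start + frame_len)
        signal_energy = sum(clean[i] ** 2 for i in idx)
        error_energy = sum((clean[i] - enhanced[i]) ** 2 for i in idx)
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if error_energy == 0.0:
            snr_db = hi_db  # perfect reconstruction saturates at the cap
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Limit each frame so that outliers do not dominate the mean.
        values.append(min(max(snr_db, lo_db), hi_db))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)
```

An enhanced signal that equals the pure speech scaled by 1.1 has an error of one tenth of the signal amplitude in every frame, so this measure gives about 20 dB.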

(2) WSS 

[Figure 3 plots: WSS (20 to 120) versus SNRin (dB, -10 to 20) for WF, PF and WPF; panels (a) White, (b) Pink, 
(c) SpeechBabble, (d) HFchannel] 

Figure 3: WSS comparisons 

The WSS of the three methods under White noise, Pink noise, SpeechBabble noise and HFchannel noise is shown 
in Fig. 3. 
According to the objective assessments, the WPF noise reduction method is superior to the WF and PF methods: 
the speech after WPF noise reduction is closer to the pure speech, the distortion is lower, and the speech 
quality is better. WPF achieves better performance in background noise elimination, residual noise 
suppression and speech distortion reduction. 

5. Conclusions 

A weighted perceptual filter algorithm that imitates auditory masking is proposed. Compared with traditional 
speech noise reduction algorithms, a real-time weighting factor is introduced to improve noise reduction 
performance, speech quality and naturalness, and the speech recognition rate of hearing-impaired persons. The 
algorithm provides a useful reference and guidance for the subsequent design and application of digital 
hearing aids. 

Acknowledgments 

The work was supported by the National Natural Science Foundation of China under Grant No. 61301219, No. 
61375028 and No. 61301295, the Natural Science Foundation of Jiangsu Province under Grant No. 
BK20130241. The authors would like to thank the reviewers for their valuable suggestions and comments. 

References 

Boll S., 1979. Suppression of acoustic noise in speech using spectral subtraction [J]. Acoustics, Speech and 
Signal Processing, IEEE Transactions on, 27(2): 113-120. DOI: 10.1109/TASSP.1979.1163209 

Cappe O., 1994. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor 
[J]. IEEE Transactions on Speech and Audio Processing, 2(2): 345-349. DOI: 10.1109/89.279283 

Chen B., Loizou P.C., 2007. A Laplacian-based MMSE estimator for speech enhancement [J]. Speech 
communication, 49(2): 134-143. DOI: 10.1016/j.specom.2006.12.005 

Cohen I., Berdugo B., 2001. Speech enhancement for non-stationary noise environments [J]. Signal processing, 
81(11): 2403-2418. DOI: 10.1016/S0165-1684(01)00128-1 

Ephraim Y., Malah D., 1984. Speech enhancement using a minimum mean-square error short-time spectral 
amplitude estimator [J]. Acoustics, Speech and Signal Processing, IEEE Transactions on, 32(6): 1109-1121. 
DOI: 10.1109/TASSP.1984.1164453 

Ephraim Y., Malah D., 1985. Speech enhancement using a minimum mean-square error log-spectral amplitude 
estimator [J]. Acoustics, Speech and Signal Processing, IEEE Transactions on, 33(2): 443-445. DOI: 
10.1109/TASSP.1985.1164550 

Gustafsson H., Nordholm S.E., Claesson I., 2001. Spectral subtraction using reduced delay convolution and 
adaptive averaging [J]. Speech and Audio Processing, IEEE Transactions on, 9(8): 799-807. DOI: 
10.1109/89.966083 

Lim J.S., Oppenheim A.V., 1979. Enhancement and bandwidth compression of noisy speech [J]. Proceedings of 
the IEEE, 67(12): 1586-1604. DOI: 10.1109/PROC.1979.11540 

Sim B.L., Tong Y.C., Chang J.S., et al., 1998. A parametric formulation of the generalized spectral subtraction 
method [J]. Speech and Audio Processing, IEEE Transactions on, 6(4): 328-337. DOI: 10.1109/89.701361 

Udrea R.M., Vizireanu N.D., Ciochina S., 2008. An improved spectral subtraction method for speech 
enhancement using a perceptual weighting filter [J]. Digital Signal Processing, 18(4): 581-587. DOI: 
10.1016/j.dsp.2007.08.002 

Virag N., 1999. Single channel speech enhancement based on masking properties of the human auditory 
system [J]. Speech and Audio Processing, IEEE Transactions on, 7(2): 126-137. DOI: 10.1109/89.748118 

Wiener N., 1949. Extrapolation, interpolation, and smoothing of stationary time series [M]. Cambridge, MA: MIT 
Press. 
