Temporal modulation normalization for robust speech feature extraction and recognition |
| |
Authors: | Xugang Lu, Shigeki Matsuda, Masashi Unoki, Satoshi Nakamura |
| |
Affiliation: | (1) National Institute of Information and Communications Technology, Tokyo 184-8795, Japan; (2) Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan |
| |
Abstract: | Speech signals are produced by articulatory movements and carry a modulation structure constrained by regular phonetic
sequences. This modulation structure encodes most of the information relevant to speech intelligibility and can be used to
discriminate speech from noise. In this study, we propose a noise reduction algorithm based on this property of speech modulation.
The algorithm involves two steps: temporal modulation contrast normalization and modulation-event-preserving smoothing. The
purpose of this processing is to bring the modulation contrast of clean and noisy speech to the same level and to smooth out
the modulation artifacts caused by noise interference. Because the proposed method can be used independently for noise reduction,
it can also be combined with traditional noise reduction methods to further reduce the effect of noise. We tested the proposed
method as a front-end for robust speech recognition on the AURORA-2J data corpus. Two advanced noise reduction methods, the
ETSI advanced front-end (AFE) and particle filtering (PF) with minimum mean square error (MMSE) estimation, were used for
comparison and in combination. Experimental results showed that, as an independent front-end processor, the proposed method
outperforms the advanced methods, and that combining front-ends consistently improves performance over using each method
independently. |
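The two steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: mean and variance normalization stands in for the contrast normalization, a sliding median stands in for the event-preserving smoothing, the steps are applied in the order the abstract lists them, and the function names `normalize_contrast`, `median_smooth`, and `process_band` are hypothetical. It is not the authors' implementation.

```python
from statistics import mean, median, pstdev


def normalize_contrast(envelope, target_std=1.0, eps=1e-8):
    """Assumed stand-in for contrast normalization: shift a sub-band
    temporal envelope to zero mean and scale it to a fixed spread."""
    mu = mean(envelope)
    sigma = pstdev(envelope)
    return [target_std * (x - mu) / (sigma + eps) for x in envelope]


def median_smooth(envelope, width=3):
    """Assumed stand-in for event-preserving smoothing: a sliding median
    removes isolated spikes while keeping step-like onsets and offsets.
    Windows are truncated at the sequence boundaries."""
    half = width // 2
    n = len(envelope)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(median(envelope[lo:hi]))
    return smoothed


def process_band(envelope, width=3):
    """Apply normalization, then smoothing, to one sub-band envelope."""
    return median_smooth(normalize_contrast(envelope), width)


# Toy envelopes (illustrative values only, not data from the paper):
# the "noisy" one has the same shape with a raised floor and reduced contrast.
clean = [0, 0, 1, 1, 1, 0, 0, 0]
noisy = [0.5 + 0.3 * x for x in clean]
clean_out = process_band(clean)
noisy_out = process_band(noisy)
```

In this toy case the two outputs agree up to rounding, because the stand-in normalization removes any offset and gain difference between the envelopes. Real noise is not a pure offset and gain change, which is where the smoothing step is meant to help.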
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|