Temporal modulation normalization for robust speech feature extraction and recognition期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Temporal modulation normalization for robust speech feature extraction and recognition

Authors:	Xugang Lu Shigeki Matsuda Masashi Unoki Satoshi Nakamura

Affiliation:	(1) National Institute of Information and Communications Technology, Tokyo 184-8795, Japan;(2) Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan

Abstract:	Speech signals are produced by the articulatory movements with a certain modulation structure constrained by the regular phonetic sequences. This modulation structure encodes most of the speech intelligibility information that can be used to discriminate the speech from noise. In this study, we proposed a noise reduction algorithm based on this speech modulation property. Two steps are involved in the proposed algorithm: one is the temporal modulation contrast normalization, another is the modulation events preserved smoothing. The purpose for these processing is to normalize the modulation contrast of the clean and noisy speech to be in the same level, and to smooth out the modulation artifacts caused by noise interferences. Since our proposed method can be used independently for noise reduction, it can be combined with the traditional noise reduction methods to further reduce the noise effect. We tested our proposed method as a front-end for robust speech recognition on the AURORA-2J data corpus. Two advanced noise reduction methods, ETSI advanced front-end (AFE) method, and particle filtering (PF) with minimum mean square error (MMSE) estimation method, are used for comparison and combinations. Experimental results showed that, as an independent front-end processor, our proposed method outperforms the advanced methods, and as combined front-ends, further improved the performance consistently than using each method independently.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏