Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Authors:	Peng Li, Yong Guan, Shijin Wang, Bo Xu, Wenju Liu

Affiliation:	(a) Digital Content Technology Research Centre, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (b) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Abstract:	Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance; the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals; and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
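
The mask-inference step can be illustrated with a minimal sketch. It assumes the usual max approximation behind factorial-max VQ: the log spectrum of a two-talker mixture is close to the element-wise maximum of the two talkers' log spectra. The function name `maxvq_infer_frame`, the toy codebooks, and the exhaustive search over codeword pairs are illustrative assumptions; the abstract does not specify the features, codebook sizes, or search strategy used in the paper.

```python
from itertools import product


def maxvq_infer_frame(mix, codebook_a, codebook_b):
    """Pick one codeword per speaker for a single log-spectral frame.

    Hypothetical helper: returns (index_a, index_b, mask), where mask[k]
    is 1 if speaker A's chosen codeword dominates frequency bin k.
    """
    best = None
    # Exhaustive search over all codeword pairs (fine for toy codebooks).
    for (i, ca), (j, cb) in product(enumerate(codebook_a), enumerate(codebook_b)):
        # Max approximation: predicted mixture is the element-wise maximum.
        pred = [max(x, y) for x, y in zip(ca, cb)]
        err = sum((m - p) ** 2 for m, p in zip(mix, pred))
        if best is None or err < best[0]:
            best = (err, i, j)
    _, i, j = best
    # Binary mask: bins where speaker A's codeword is at least as large.
    mask = [1 if x >= y else 0 for x, y in zip(codebook_a[i], codebook_b[j])]
    return i, j, mask


# Toy data (made up for illustration): two codewords per speaker, three bins.
codebook_a = [[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]]
codebook_b = [[1.0, 0.0, 6.0], [2.0, 2.0, 2.0]]
mix = [5.1, 0.9, 5.8]
print(maxvq_infer_frame(mix, codebook_a, codebook_b))
```

In this toy case the first codeword of each speaker explains the frame best, and the mask assigns the first two bins to speaker A. A resynthesis stage would then keep only the masked bins of the mixture for the target speaker.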

Keywords:	Monaural speech separation; Computational auditory scene analysis (CASA); Factorial-max vector quantization (MAXVQ); Automatic speech recognition (ASR)
This article is indexed in ScienceDirect and other databases.
