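A Robust Front-end for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model Information
Cite this article: Guan Yong, Li Peng, Liu Wenju, Xu Bo. A Robust Front-end for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model Information[J]. Acta Automatica Sinica, 2009, 35(4): 410-416.
Authors: Guan Yong, Li Peng, Liu Wenju, Xu Bo
Affiliation: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: National Basic Research Program of China (973 Program), National Natural Science Foundation of China, National High Technology Research and Development Program of China (863 Program)
Abstract: Conventional noise-robust algorithms cannot solve the robustness problem of automatic speech recognition (ASR) systems when the background contains human speech. This paper proposes a mixed-speech separation system based on computational auditory scene analysis (CASA) and speaker model information. Within the CASA framework, the system uses speaker model information and factorial-max vector quantization (MAXVQ) to estimate real-valued masks, effectively separating the target speaker's speech from two-speaker mixtures and thereby providing a robust recognition front-end for ASR systems. Evaluation on the Speech Separation Challenge (SSC) dataset shows that, compared with the baseline system, the proposed system improves speech recognition accuracy by 15.68%. The related experimental results also confirm the effectiveness of the proposed multi-speaker recognition and real-valued mask estimation.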

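Keywords: computational auditory scene analysis    speech separation    robust speech recognition    factorial-max vector quantization    speaker recognition
Received: 2007-12-18
Revised: 2008-3-12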

A Robust Front-end for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model
GUAN Yong, LI Peng, LIU Wen-Ju, XU Bo. A Robust Front-end for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model[J]. Acta Automatica Sinica, 2009, 35(4): 410-416.
Authors: GUAN Yong, LI Peng, LIU Wen-Ju, XU Bo
Affiliation:1.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190;2.Nokia Research Center, Beijing 100176;3.Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Abstract: Conventional noise-robust speech recognition systems do not work well when human speech is present in the background. In this paper, a speech segregation system based on computational auditory scene analysis (CASA) and speaker models is proposed to solve this problem. By using speaker models and factorial-max vector quantization (MAXVQ) to estimate real-valued masks within the CASA framework, a robust front-end for speech recognition is constructed. Evaluations on the Speech Separation Challenge (SSC) data showed that the proposed system achieved a 15.68% improvement in recognition accuracy over the baseline system. The evaluation results also support the validity of the multi-speaker recognition and real-valued mask estimation modules.
Keywords:Computational auditory scene analysis (CASA)  speech segregation  robust speech recognition  factorial-max vector quantization (MAXVQ)  speaker recognition
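
The abstract names MAXVQ-based real-valued mask estimation but gives no formulas, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes the usual log-max approximation (the log-spectrum of a two-speaker mixture is roughly the element-wise maximum of the two speakers' log-spectra) and assumes each speaker is represented by a small codebook of log-spectral vectors. The function name `maxvq_soft_mask`, the `sigma` parameter, and the way the soft mask is formed (weighting each codeword pair by how well it explains the mixture frame) are all hypothetical choices made for illustration.

```python
import math


def maxvq_soft_mask(y, codebook_a, codebook_b, sigma=1.0):
    """Estimate a real-valued mask for speaker A on one mixture frame.

    y           -- log-spectrum of the mixture frame (list of floats)
    codebook_a  -- log-spectral codewords for speaker A (assumed given)
    codebook_b  -- log-spectral codewords for speaker B (assumed given)
    sigma       -- hypothetical smoothing width for the pair weights
    Returns (mask, best_pair); mask[k] lies in [0, 1].
    """
    scored = []  # (squared error, index in A, index in B)
    best_pair, best_err = None, math.inf
    for i, a in enumerate(codebook_a):
        for j, b in enumerate(codebook_b):
            # Log-max model: the mixture is approximated by max(a, b).
            err = sum((max(ak, bk) - yk) ** 2 for ak, bk, yk in zip(a, b, y))
            scored.append((err, i, j))
            if err < best_err:
                best_pair, best_err = (i, j), err

    # Weight each pair by its fit; subtracting best_err avoids underflow.
    weights = [math.exp(-(err - best_err) / (2.0 * sigma ** 2))
               for err, _, _ in scored]
    total = sum(weights)

    # Soft mask: weighted fraction of pairs in which A dominates each bin.
    mask = [0.0] * len(y)
    for w, (_, i, j) in zip(weights, scored):
        for k, (ak, bk) in enumerate(zip(codebook_a[i], codebook_b[j])):
            if ak > bk:
                mask[k] += w / total
    return mask, best_pair


# Toy data (made up for illustration only).
codebook_a = [[10.0, 0.0, 0.0], [0.0, 0.0, 10.0]]
codebook_b = [[0.0, 10.0, 0.0], [0.0, 5.0, 5.0]]
mixture = [10.0, 10.0, 0.0]
mask, pair = maxvq_soft_mask(mixture, codebook_a, codebook_b)
print(pair, [round(m, 3) for m in mask])
```

Taking only the best-scoring pair would give a binary mask; averaging over all pairs is one simple way to obtain values between 0 and 1. The exhaustive pair search is quadratic in codebook size, so a practical system would need a faster search.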