A computational auditory scene analysis system for speech segregation and robust speech recognition

Authors:	Yang Shao, Soundararajan Srinivasan, Zhaozhang Jin, DeLiang Wang

Affiliation:	(a) Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA; (b) Biomedical Engineering Department, The Ohio State University, Columbus, OH 43210, USA; (c) Center for Cognitive Science, The Ohio State University, Columbus, OH 43210, USA

Abstract:	A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time–frequency (T–F) mask which retains the mixture in a local T–F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T–F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
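
The ideal binary mask defined in the abstract can be illustrated with a minimal Python sketch. It assumes the target and interference energies of every T–F unit are known in advance, which holds only for premixed signals; the system described here has to estimate the mask from the mixture instead. The function names `ideal_binary_mask` and `apply_mask` and the toy energy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for T-F units where the target is strictly stronger, else 0."""
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    return (target_energy > interference_energy).astype(int)

def apply_mask(mixture_energy, mask):
    """Retain the mixture in units where the mask is 1 and zero it elsewhere."""
    return np.asarray(mixture_energy, dtype=float) * mask

# Toy data (made up): 2 frequency channels x 3 time frames of energies.
target = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.0, 2.0]])
interference = np.array([[1.0, 2.0, 0.5],
                         [0.1, 5.0, 1.0]])

mask = ideal_binary_mask(target, interference)
masked_mixture = apply_mask(target + interference, mask)
print(mask)
```

A unit where the two energies tie is discarded in this sketch, following the "if and only if the target is stronger" wording of the abstract.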

Keywords:	Speech segregation; Computational auditory scene analysis; Binary time–frequency mask; Robust speech recognition; Uncertainty decoding
This article is indexed in ScienceDirect and other databases.
