A Robust Bimodal Speech Section Detection

Authors:	Kazumasa Murai, Satoshi Nakamura

Affiliation:	1. ATR Spoken Language Telecommunications Research Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan

Abstract:	This paper discusses robust speech section detection using audio and video modalities. Most current speech recognition systems require speech section detection before any further analysis, and the accuracy of the detected speech sections is said to affect speech recognition accuracy. Because the audio modality is intrinsically disturbed by audio noise, we have been researching speech section detection in the video modality, which detects deformations in images of the speech organs. The video modality is robust to audio noise, but the sections it detects are longer than the audio speech sections: deformations in the related organs start before the speech, to prepare for the articulation of the first phoneme, and the settling-down motion lasts longer than the speech. We have verified that the inaccurate sections caused by this excess length degrade the speech recognition rate by producing insertion errors. To reduce insertion errors and enhance the robustness of speech detection, we propose a method that takes advantage of both modalities. In our experiment, the proposed method reduced the insertion error rate and increased the recognition rate in noisy environments.
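
The abstract does not say how the two modalities are combined, so the following is only a minimal sketch of one plausible fusion rule, not the authors' method. It assumes each detector has already produced a list of (start, end) sections in seconds; the name `fuse_sections` and the rule itself are hypothetical. The video section acts as a noise-robust gate, and any overlapping audio section trims its excess length at both ends.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start_sec, end_sec), an assumed representation


def fuse_sections(audio: List[Section], video: List[Section]) -> List[Section]:
    """Illustrative fusion rule (an assumption, not taken from the paper).

    - A video section with overlapping audio sections is trimmed to the span
      covered by that audio, which removes the preparatory and settling motion.
    - A video section with no overlapping audio is kept as-is, on the
      assumption that noise made the audio detector miss the speech.
    - Audio sections that overlap no video section are dropped as likely
      noise-triggered false alarms.
    """
    fused: List[Section] = []
    for v_start, v_end in video:
        # Collect audio sections that overlap this video section.
        overlaps = [(a_start, a_end) for a_start, a_end in audio
                    if a_start < v_end and a_end > v_start]
        if overlaps:
            # Trim to the audio span, clipped to the video section's bounds.
            start = max(v_start, min(a_start for a_start, _ in overlaps))
            end = min(v_end, max(a_end for _, a_end in overlaps))
            fused.append((start, end))
        else:
            fused.append((v_start, v_end))
    return fused
```

For example, with audio sections `[(1.2, 2.0), (5.0, 5.3)]` and video sections `[(1.0, 2.4), (7.0, 8.0)]`, the first video section is trimmed to `(1.2, 2.0)`, the second is kept unchanged, and the isolated audio section at 5.0 s is discarded.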

This paper is indexed in SpringerLink and other databases.
