首页 | 本学科首页   官方微博 | 高级检索  
     


A Robust Bimodal Speech Section Detection
Authors:Kazumasa Murai  Satoshi Nakamura
Affiliation:1. ATR Spoken Language Telecommunications Research Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan
Abstract:This paper discusses robust speech section detection by audio and video modalities. Most of today's speech recognition systems require speech section detection prior to any further analysis, and the accuracy of detected speech section s is said to affect the speech recognition accuracy. Because audio modalities are intrinsically disturbed by audio noise, we have been researching video modality speech section detection by detecting deformations in speech organ images. Video modalities are robust to audio noise, but their detection sections are longer than audio speech sections because deformations in related organs start before the speech to prepare for the articulation of the first phoneme, and also because the settling down motion lasts longer than the speech. We have verified that inaccurate detected sections caused by this excess length degrade the speech recognition rate, leading to speech recognition errors by insertions. To reduce insertion errors, and enhance the robustness of speech detection, we propose a method that takes advantage of the two types of modalities. According to our experiment, the proposed method is confirmed to reduce the insertion error rate as well as increase the recognition rate in noisy environment.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号