
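Audio-video continuous speech recognition and phone segmentation based on dynamic Bayesian networks
Cite this article: Lü Guo-yun, JIANG Dong-mei, JIANG Xiao-yue, ZHAO Rong-chun, HOU Yun-shu, SUN A-li, H. Sahli, W. Verhelst. Audio-video continuous speech recognition and phone segmentation based on dynamic Bayesian networks[J]. Journal of Computer Applications (计算机应用), 2007, 27(7): 1670-1673.
Authors: Lü Guo-yun  JIANG Dong-mei  JIANG Xiao-yue  ZHAO Rong-chun  HOU Yun-shu  SUN A-li  H. Sahli  W. Verhelst
Affiliations: 1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
2. Department of Electronics and Information Processing, Free University of Brussels, B-1050 Brussels, Belgium
Funding: science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium; Northwestern Polytechnical University research and reform projects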
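Abstract: Two single-stream, monophone dynamic Bayesian network (DBN) models are constructed for continuous speech recognition from audio features and from video features. By explicitly describing the relationship between each word and its constituent phones, the models also produce a time segmentation of the phones. Experimental results show that, for recognition from audio features, the recognition rate of the DBN model at low signal-to-noise ratios (0 to 15 dB) is on average 12.79% higher than that of the HMM; on clean speech, the phone time segmentation obtained with the DBN model is very close to that of a triphone HMM. For speech recognition from video features, the recognition rate of the DBN model is 2.47% higher than that of the HMM. Finally, the asynchrony between the phone time segmentations of the audio and video data is analyzed, laying the groundwork for audio-video continuous speech recognition with multi-stream DBN models and for determining the asynchrony between audio and video.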

Keywords: dynamic Bayesian network  audio-video  speech recognition  phone segmentation
Article ID: 1001-9081(2007)07-1670-04
Received: 2007-01-24
Revised: 2007-01-24

Audio-video continuous speech recognition and phone segmentation based on dynamic Bayesian network
Lü Guo-yun, JIANG Dong-mei, JIANG Xiao-yue, ZHAO Rong-chun, HOU Yun-shu, SUN A-li, H. Sahli, W. Verhelst. Audio-video continuous speech recognition and phone segmentation based on dynamic Bayesian network[J]. Journal of Computer Applications, 2007, 27(7): 1670-1673.
Authors:Lü Guo-yun  JIANG Dong-mei  JIANG Xiao-yue  ZHAO Rong-chun  HOU Yun-shu  SUN A-li  H Sahli  W Verhelst
Abstract: Two single-stream dynamic Bayesian network (DBN) models, built as improvements on the whole-word-state (WWS) DBN model, are proposed for audio and video continuous speech recognition and phone segmentation. Unlike the WWS DBN model, in which a word is composed directly of several states, these models treat a word as composed of its corresponding phones. As a consequence, both word-level and phone-level recognition results can be obtained together with their time boundaries. Recognition and segmentation experiments show that, for recognition from the audio stream, the DBN model improves on the hidden Markov model (HMM) by 12.79% over the signal-to-noise ratio range from 0 dB to 15 dB. Segmentation results on audio show that the DBN model and the HMM perform similarly. For recognition from the video stream, an improvement of 2.47% over the HMM is achieved. Finally, the asynchrony between the audio and video streams is analyzed, which provides a foundation for future use of multi-stream DBN models in audio-video continuous speech recognition and in determining the asynchrony between the audio and video streams.
Keywords:Dynamic Bayesian Networks(DBN)  audio-video  speech recognition  phone segmentation
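
The abstract describes obtaining phone time boundaries from the decoded output and then comparing audio and video segmentations. The following is a minimal sketch of that post-processing step only, not the authors' DBN decoder. It assumes the decoder has already produced one phone label per frame; the function names, the 10 ms frame shift, and the requirement that both streams decode the same phone sequence are all illustrative assumptions that do not come from the paper.

```python
from itertools import groupby

# Assumed frame shift in seconds; the abstract does not state one.
FRAME_SHIFT_S = 0.010


def segments_from_frames(frame_labels, frame_shift=FRAME_SHIFT_S):
    """Collapse a per-frame phone label sequence into
    (label, start_time, end_time) segments."""
    segments = []
    start = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start * frame_shift, (start + length) * frame_shift))
        start += length
    return segments


def boundary_offsets(audio_segments, video_segments):
    """Return (phone, video_start - audio_start) for each phone.

    Hypothetical helper: it assumes both streams were decoded to the
    same phone sequence and raises ValueError otherwise.
    """
    if [s[0] for s in audio_segments] != [s[0] for s in video_segments]:
        raise ValueError("phone sequences differ; offsets are undefined")
    return [(a[0], v[1] - a[1]) for a, v in zip(audio_segments, video_segments)]
```

Because runs of identical labels are merged, two adjacent occurrences of the same phone would be reported as one segment; a decoder that emits explicit phone-transition markers would avoid that ambiguity. Video features are usually sampled at a different rate from audio features, so each stream would be passed its own `frame_shift`.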
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
