Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features
Citation: CHEN Yan-xiang, LIU Ming. Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features [J]. Acta Electronica Sinica, 2010, 38(12): 2920-2924.
Authors: CHEN Yan-xiang, LIU Ming
Affiliation: 1. School of Computer Science and Information, Hefei University of Technology, Hefei, Anhui 230009, China; 2. Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Illinois 61801, USA
Funding: National Natural Science Foundation of China; Anhui Provincial Science Fund for Outstanding Young Scholars
Abstract: Human speech perception is multimodal, shaped simultaneously by auditory and visual cues. This paper centers on the fusion of speech and its visual features. Drawing on the articulatory mechanism that underlies the deeper causes of audio-visual asynchrony, the apparent asynchrony observed between audio and video is represented by the asynchronous coupling of multiple articulatory feature streams. A joint speech and lip-motion model based on a Dynamic Bayesian Network is proposed, and multi-level audio-visual fusion is applied to improve the robustness of a speaker recognition system. Experiments on an audio-visual bimodal corpus show that multi-level fusion achieves better performance at all acoustic signal-to-noise ratios (SNRs) from 0 to 30 dB.
Keywords: articulatory features; audio-visual; speaker recognition; dynamic Bayesian network
Received: 2009-08-06
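The paper performs its fusion at multiple levels inside a Dynamic Bayesian Network; the full model is in the paper itself. As a rough, simplified sketch of the general idea behind SNR-aware fusion of the two modalities (the function name, the linear weighting scheme, and the 0–30 dB mapping are illustrative assumptions, not the paper's method):

```python
import numpy as np

def fuse_scores(audio_score, visual_score, snr_db, snr_lo=0.0, snr_hi=30.0):
    """Score-level fusion with an SNR-dependent weight: trust the audio
    stream at high SNR and lean on the visual stream as noise increases.
    The linear interpolation between snr_lo and snr_hi is a hypothetical
    weighting rule used only for illustration."""
    w = float(np.clip((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0, 1.0))
    return w * audio_score + (1.0 - w) * visual_score
```

At 30 dB the fused score equals the audio score alone; at 0 dB (or below) it falls back entirely to the visual stream, mirroring the intuition that visual speech features remain informative when the acoustic channel degrades.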
This article is indexed by Wanfang Data and other databases.