Speech-driven Articulator Motion Synthesis with Deep Neural Networks

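Funding:	Supported by the Chengdu Science and Technology Project (Science and Technology Benefiting the People Technology R&D Project) (2015-HM01-00050-SF), the 2015 Research Project of the Sichuan Animation Research Center (DM201504), and the 2015 Graduate Innovation Experiment and Practice Project of Southwest Jiaotong University (YC201504109)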

Received:	2015-10-31

Citation:	TANG Zhi, HOU Jin. Speech-driven Articulator Motion Synthesis with Deep Neural Networks. Acta Automatica Sinica, 2016, 42(6): 923-930. doi: 10.16383/j.aas.2016.c150726

Authors:	TANG Zhi, HOU Jin

Affiliation:	School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756

Abstract:	This paper implements a deep neural network (DNN) approach to speech-driven articulator motion synthesis and applies it to speech-driven talking avatar animation. A DNN learns the mapping from acoustic features to articulator positions; the system takes speech as input, estimates the articulator movement trajectories, and renders them on a three-dimensional avatar. First, an artificial neural network (ANN) and a DNN are compared under a series of parameter settings to obtain the best-performing network. Second, the length of the acoustic context window is varied, with the number of hidden-layer units tuned for each setting, to find the best context length. Finally, the optimal network structure is selected, and the articulator motion trajectories output by the DNN drive the articulator motion synthesis that animates the avatar. Experiments show that the method synthesizes talking avatar animation efficiently and realistically.

Keywords:	Deep neural networks (DNN); speech-driven; motion synthesis; talking avatar
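
The context-window input and the acoustic-to-articulatory mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimension (13), context length (5 frames per side), hidden-layer sizes, output dimension (12 articulator coordinates), tanh activation, and edge padding are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours on either side.

    frames: list of T feature vectors (lists of floats), all the same length D.
    Returns T vectors of length (2 * context + 1) * D. Positions that fall
    outside the utterance are filled by repeating the first or last frame.
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            neighbour = min(max(t + offset, 0), last)  # clamp to valid range
            window.extend(frames[neighbour])
        stacked.append(window)
    return stacked


def dense(x, weights, bias, activation=None):
    """One fully connected layer; each row of `weights` feeds one output unit."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def mlp_forward(x, layers):
    """tanh hidden layers followed by a linear output layer."""
    for weights, bias in layers[:-1]:
        x = dense(x, weights, bias, math.tanh)
    weights, bias = layers[-1]
    return dense(x, weights, bias)


random.seed(0)
# Hypothetical sizes: 20 frames of 13 acoustic features each.
acoustic = [[random.gauss(0, 1) for _ in range(13)] for _ in range(20)]
inputs = stack_context(acoustic, context=5)

# Hypothetical network: two hidden layers of 32 units, 12 output coordinates.
sizes = [len(inputs[0]), 32, 32, 12]
layers = [
    ([[random.gauss(0, 0.1) for _ in range(m)] for _ in range(n)], [0.0] * n)
    for m, n in zip(sizes[:-1], sizes[1:])
]
# One estimated articulator-position vector per input frame (untrained weights).
trajectory = [mlp_forward(frame, layers) for frame in inputs]
```

Varying `context` changes the input width of the network, which is why the abstract retunes the number of hidden-layer units for each context length.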
