A Review of Text-to-Visual Speech Synthesis (文本-视觉语音合成综述)

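Cite this article:	Wang Zhiming, Tao Jianhua. A Review of Text-to-Visual Speech Synthesis [J]. Journal of Computer Research and Development (计算机研究与发展), 2006, 43(1): 145-152.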

Authors:	Wang Zhiming (王志明), Tao Jianhua (陶建华)

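Affiliations:	1. Department of Computer Science and Technology, Beijing University of Science and Technology, Beijing 100083; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080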

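Funding:	University research and teaching-reform project of Beijing University of Science and Technology; State Key Laboratory fund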

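Abstract:	Visual information is very important for understanding the content of speech. Not only people with hearing impairments but also people with normal hearing lip-read to some degree during conversation, especially in noisy environments where speech quality is degraded. Just as a text-to-speech system lets a computer talk like a person, a text-to-visual speech synthesis system lets a computer simulate the bimodal nature of human speech and makes the computer interface friendlier. This paper reviews the development of text-to-visual speech synthesis. Methods for text-driven visual speech synthesis fall into two classes: parameter control methods and data-driven methods. The paper describes in detail several key problems in the parameter control class and several different implementations in the data-driven class, and compares the advantages and disadvantages of the two classes and the settings to which each is suited.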
Keywords:	text-to-visual speech synthesis (TTVS); viseme; co-articulation; face model; facial animation
Received:	2004-09-02
Revised:	2005-05-27
A Review of Text-to-Visual Speech Synthesis

Wang Zhiming, Tao Jianhua. A Review of Text-to-Visual Speech Synthesis [J]. Journal of Computer Research and Development, 2006, 43(1): 145-152.

Authors:	Wang Zhiming, Tao Jianhua

Affiliation:	1. Department of Computer Science and Technology, Beijing University of Science and Technology, Beijing 100083; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080

Abstract:	Visual information is important for understanding speech. Not only hearing-impaired people but also people with normal hearing make use of the visual information that accompanies speech, especially when the acoustic signal is degraded in noisy environments. Just as text-to-speech (TTS) synthesis lets a computer speak like a human, text-to-visual speech (TTVS) synthesis, by means of computer face animation, brings the bimodality of speech into the human-computer interface and makes it friendlier. This paper reviews the state of the art in text-to-visual speech synthesis research. Approaches to visual speech synthesis fall into two classes: parameter control and data-driven. For the parameter control approach, three key problems are discussed: face model construction, definition of the animation control parameters, and the dynamic properties of those parameters. For the data-driven approach, three main methods are introduced: video slice concatenation, key frame morphing, and face component combination. Finally, the advantages and disadvantages of each approach are discussed.

Keywords:	text-to-visual speech (TTVS); viseme; co-articulation; face model; facial animation
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
