Visual Speech Synthesis by Morphing Visemes

Authors:	Tony Ezzat, Tomaso Poggio

Affiliation:	Center for Biological and Computational Learning, Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA

Abstract:	We present MikeTalk, a text-to-audiovisual speech synthesizer that converts input text into an audiovisual speech stream. MikeTalk is built using visemes: a small set of images spanning a wide range of mouth shapes. The visemes are acquired from a visual corpus, recorded from a human subject, that is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between any two viseme images can be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is used to determine which viseme transitions to use and the rate at which each morph should proceed. In this manner, we synchronize the visual speech stream with the audio speech stream and thus give the impression of a photorealistic talking face.
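The abstract does not say how the morph or the frame timing is computed, so the following is only a simplified illustration under stated assumptions, not MikeTalk's implementation. It assumes a dense optical flow field from one viseme image to another is already available, approximates the warp with nearest-neighbour backward sampling (looking the flow up at the output pixel, which is reasonable only for smooth flow), cross-dissolves the two warped images, and converts phoneme durations into per-frame morph parameters. The names `morph`, `schedule`, and the `PHONEME_TO_VISEME` table are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]               # grayscale image, indexed img[y][x]
Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (dx, dy)

# Hypothetical phoneme-to-viseme table, for illustration only.
PHONEME_TO_VISEME: Dict[str, str] = {
    "m": "closed", "b": "closed", "p": "closed",
    "aa": "open", "uw": "rounded", "sil": "rest",
}


def sample(img: Image, x: float, y: float) -> float:
    """Nearest-neighbour lookup, clamped to the image border."""
    h, w = len(img), len(img[0])
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    return img[yi][xi]


def morph(src: Image, dst: Image, flow: Flow, alpha: float) -> Image:
    """Intermediate frame between src (alpha = 0) and dst (alpha = 1).

    Assumes flow[y][x] carries pixel (x, y) of src to (x + dx, y + dy) in dst.
    """
    h, w = len(src), len(src[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            # Pull from src a fraction alpha back along the flow ...
            a = sample(src, x - alpha * dx, y - alpha * dy)
            # ... and from dst the remaining fraction (1 - alpha) forward.
            b = sample(dst, x + (1 - alpha) * dx, y + (1 - alpha) * dy)
            # Cross-dissolve the two warped images.
            row.append((1 - alpha) * a + alpha * b)
        out.append(row)
    return out


def schedule(phonemes: List[Tuple[str, float]],
             fps: int = 30) -> List[Tuple[str, str, float]]:
    """Map (phoneme, duration_s) pairs to per-frame (from_viseme, to_viseme, alpha).

    Assumption: the transition out of phoneme i spans that phoneme's duration,
    so longer phonemes yield slower morphs.
    """
    visemes = [PHONEME_TO_VISEME[p] for p, _ in phonemes]
    frames = []
    for i in range(len(phonemes) - 1):
        n = max(1, round(phonemes[i][1] * fps))  # frames for this transition
        for k in range(n):
            frames.append((visemes[i], visemes[i + 1], k / n))
    return frames
```

For example, with a one-row image whose bright pixel moves two pixels to the right, `morph([[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]], [[(2.0, 0.0)] * 3], 0.5)` places the peak in the middle pixel. Concatenating the frames that `schedule` describes yields the visual utterance; this sketch does not emit the final viseme, which would need to be held or appended separately.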

Keywords:	computer vision; machine learning; facial modelling; facial animation; morphing; optical flow; speech synthesis; lip synchronization
This record is indexed in SpringerLink and other databases.