首页 | 本学科首页   官方微博 | 高级检索  
     


Humanoid Audio–Visual Avatar With Emotive Text-to-Speech Synthesis
Abstract: Emotive audio–visual avatars are virtual computer agents which have the potential of improving the quality of human-machine interaction and human-human communication significantly. However, the understanding of human communication has not yet advanced to the point where it is possible to make realistic avatars that demonstrate interactions with natural- sounding emotive speech and realistic-looking emotional facial expressions. In this paper, We propose the various technical approaches of a novel multimodal framework leading to a text-driven emotive audio–visual avatar. Our primary work is focused on emotive speech synthesis, realistic emotional facial expression animation, and the co-articulation between speech gestures (i.e., lip movements) and facial expressions. A general framework of emotive text-to-speech (TTS) synthesis using a diphone synthesizer is designed and integrated into a generic 3-D avatar face model. Under the guidance of this framework, we therefore developed a realistic 3-D avatar prototype. A rule-based emotive TTS synthesis system module based on the Festival-MBROLA architecture has been designed to demonstrate the effectiveness of the framework design. Subjective listening experiments were carried out to evaluate the expressiveness of the synthetic talking avatar.
Keywords:
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号