Similar literature
20 similar documents found
1.
In this work we elaborate on a novel image-based system for creating video-realistic eye animations to arbitrary spoken output. These animations are useful to give a face to multimedia applications such as virtual operators in dialog systems. Our eye animation system consists of two parts: an eye control unit and a rendering engine, which synthesizes eye animations by combining 3D and image-based models. The designed eye control unit is based on eye movement physiology and the statistical analysis of recorded human subjects. As already analyzed in previous publications, eye movements vary while listening and talking. We focus on the latter and are the first to design a new model which fully automatically couples eye blinks and movements with phonetic and prosodic information extracted from spoken language. We extended the already known simple gaze model by refining mutual gaze to better model human eye movements. Furthermore, we improved the eye movement models by considering head tilts, torsion, and eyelid movements. Mainly due to our integrated blink and gaze model and to the control of eye movements based on spoken language, subjective tests indicate that participants are not able to distinguish between real eye motions and our animations, which has not been achieved before.

2.
This paper proposes a deep bidirectional long short-term memory (DBLSTM) approach to modeling the long contextual, nonlinear mapping between audio and visual streams for a video-realistic talking head. In the training stage, an audio-visual stereo database is first recorded of a subject talking to a camera. The audio streams are converted into acoustic features, i.e. Mel-Frequency Cepstrum Coefficients (MFCCs), and their textual labels are also extracted. The visual streams, in particular the lower face region, are compactly represented by active appearance model (AAM) parameters, by which the shape and texture variations can be jointly modeled. Given pairs of audio and visual parameter sequences, a DBLSTM model is trained to learn the sequence mapping from audio to visual space. For any unseen speech audio, whether it is originally recorded or synthesized by text-to-speech (TTS), the trained DBLSTM model can predict a convincing AAM parameter trajectory for the lower face animation. To further improve the realism of the proposed talking head, the trajectory tiling method is adopted to use the DBLSTM-predicted AAM trajectory as a guide to select a smooth real sample image sequence from the recorded database. We then stitch the selected lower face image sequence back to a background face video of the same subject, resulting in a video-realistic talking head. Experimental results show that the proposed DBLSTM approach outperforms the existing HMM-based approach in both objective and subjective evaluations.

3.
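Implementation and application of realistic virtual faces   Total citations: 2, self-citations: 0, citations by others: 2
We implemented an interactive tool for face modeling and animation. With it, a user can easily construct a three-dimensional head model from frontal and profile photographs of a person, animate the expressions of that specific face on the model, and synchronize mouth shapes with speech. Building on these techniques, we implemented a virtual face animation component that can be used in Windows applications to give users a more novel and friendly interface.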

4.
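This paper describes a method that combines the MPEG-4 standard with a facial muscle model to give a normalized, quantitative description of human facial expressions. By normalizing the motion parameters of the facial muscles, the method achieves a quantitative description of the expressions of a face model and provides a concise route to producing facial animation and building a library of facial animation expressions.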
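
The normalization idea in the abstract above can be illustrated with the animation units that MPEG-4 uses for facial animation parameters, which express motion as a fraction of a distance measured on the neutral face. The following is a minimal sketch, assuming the common convention of dividing a feature distance by 1024. The function names and the sample measurements are hypothetical, and the sketch does not reproduce the muscle-parameter normalization of the paper itself.

```python
def fapu(feature_distance: float) -> float:
    """Return one animation unit: a neutral-face feature distance divided by 1024."""
    return feature_distance / 1024.0


def fap_to_displacement(fap_value: float, feature_distance: float) -> float:
    """Convert a normalized parameter value into a displacement on one face."""
    return fap_value * fapu(feature_distance)


def displacement_to_fap(displacement: float, feature_distance: float) -> float:
    """Convert a measured displacement back into a face-independent value."""
    return displacement / fapu(feature_distance)


# Hypothetical mouth-nose separations of two face models, in model units.
small_face_mns = 20.48
large_face_mns = 40.96

# The same normalized value yields proportionally scaled motion on each face.
d_small = fap_to_displacement(100, small_face_mns)
d_large = fap_to_displacement(100, large_face_mns)
```

Because the value is stored relative to the face's own proportions, one expression description can in principle be replayed on faces of different sizes.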

5.
In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences, under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar with similar shape and appearance as the original talker and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three different production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition we created two incongruent conditions where normal speech audio was paired with animated Lombard speech or whispering. When compared to the congruent normal speech condition, Lombard animation yields a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations, and found that some degree of incongruence was generally accepted.

6.
Animating expressive faces across languages   Total citations: 2, self-citations: 0, citations by others: 2
This paper describes a morphing-based audio driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and synthesized expressions. A novel scheme to implement a language independent system for audio-driven facial animation given a speech recognition system for just one language, in our case, English, is presented. The method presented here can also be used for text to audio-visual speech synthesis. Visemes in new expressions are synthesized to be able to generate animations with different facial expressions. An animation sequence using optical flow between visemes is constructed, given an incoming audio stream and still pictures of a face representing different visemes. The presented techniques give improved lip synchronization and naturalness to the animated video.

7.
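A texture-feature-oriented method for realistic 3D facial animation   Total citations: 2, self-citations: 0, citations by others: 2
Texture change is an important component of facial expression. Traditional facial animation methods usually apply only a simple stretching transformation to the texture image and ignore changes in subtle facial texture features such as wrinkles and dimples. This paper proposes a realistic 3D facial animation method oriented to changes in texture features. It introduces the concept of the Partial Expression Ratio Image (PERI) and a method for obtaining it, and on that basis presents an MPEG-4-oriented PERI parameterization and a multi-direction PERI method for 3D facial animation. The former, by integrating with the MPEG-4 Facial Animation Parameters (FAP), gives a parametric representation of subtle expression features in facial animation. The latter uses a multi-direction PERI texture adjustment so that the 3D face model shows good subtle expression features from different viewing angles. The proposed method overcomes the shortcoming of traditional facial animation, which controls only the deformation of the face surface and ignores texture change, and achieves realistic 3D facial animation with subtle expression features driven by texture change. Experiments show that the proposed method effectively captures details of texture change and improves the realism of facial animation.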
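
The abstract above does not define the PERI precisely, so the following is only a sketch of the general expression-ratio-image idea it builds on: the per-pixel ratio between an expressive and a neutral image of one face is multiplied into the neutral texture of another face. The region mask that makes the ratio "partial", the function names, and the tiny grayscale images are assumptions for illustration, and the images are assumed to be already aligned pixel by pixel.

```python
def ratio_image(neutral, expressive, eps=1e-6):
    """Per-pixel ratio expressive / neutral for two aligned grayscale images."""
    return [
        [e / max(n, eps) for n, e in zip(n_row, e_row)]
        for n_row, e_row in zip(neutral, expressive)
    ]


def apply_ratio(target, ratio, mask=None):
    """Multiply the target texture by the ratio, optionally only inside a mask."""
    result = []
    for y, row in enumerate(target):
        new_row = []
        for x, value in enumerate(row):
            inside = mask is None or mask[y][x]
            r = ratio[y][x] if inside else 1.0  # leave pixels outside the region unchanged
            new_row.append(min(255.0, value * r))  # clamp to the 8-bit intensity range
        result.append(new_row)
    return result


# Hypothetical 2x2 images: a wrinkle darkens the top-left pixel of the source face.
source_neutral = [[100.0, 100.0], [200.0, 200.0]]
source_expressive = [[50.0, 100.0], [200.0, 220.0]]
target_neutral = [[80.0, 80.0], [120.0, 120.0]]
wrinkle_region = [[1, 0], [0, 0]]  # assumed local region of interest

ratio = ratio_image(source_neutral, source_expressive)
full_transfer = apply_ratio(target_neutral, ratio)
partial_transfer = apply_ratio(target_neutral, ratio, wrinkle_region)
```

Restricting the ratio to a local region keeps the rest of the target texture untouched, which is one plausible reading of how a partial ratio image could capture wrinkles or dimples without altering the whole face.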

8.
Humans are known to use a wide range of non-verbal behaviour while speaking. Generating naturalistic embodied speech for an artificial agent is therefore an application where techniques that draw directly on recorded human motions can be helpful. We present a system that uses corpus-based selection strategies to specify the head and eyebrow motion of an animated talking head. We first describe how a domain-specific corpus of facial displays was recorded and annotated, and outline the regularities that were found in the data. We then present two different methods of selecting motions for the talking head based on the corpus data: one that chooses the majority option in all cases, and one that makes a weighted choice among all of the options. We compare these methods to each other in two ways: through cross-validation against the corpus, and by asking human judges to rate the output. The results of the two evaluation studies differ: the cross-validation study favoured the majority strategy, while the human judges preferred schedules generated using weighted choice. The judges in the second study also showed a preference for the original corpus data over the output of either of the generation strategies.
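
The two selection strategies described above can be sketched as follows. This is a minimal illustration, assuming that the corpus has already been reduced to counts of how often each facial display occurred in a given context. The display labels, the counts, and the function names are hypothetical.

```python
import random
from collections import Counter


def majority_choice(counts: Counter) -> str:
    """Always pick the display that occurred most often in the corpus."""
    return counts.most_common(1)[0][0]


def weighted_choice(counts: Counter, rng: random.Random) -> str:
    """Pick a display at random, with probability proportional to its corpus count."""
    options = list(counts)
    weights = [counts[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


# Hypothetical corpus counts for one context.
corpus_counts = Counter({"nod": 6, "brow_raise": 3, "none": 1})
rng = random.Random(0)  # seeded so that a run is repeatable

majority_schedule = [majority_choice(corpus_counts) for _ in range(5)]
weighted_schedule = [weighted_choice(corpus_counts, rng) for _ in range(5)]
```

The majority strategy always produces the same display for a context, which tends to score well against held-out corpus data, while the weighted strategy reproduces the variety found in the corpus.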

9.
We have developed an easy-to-use and cost-effective system to construct textured 3D animated face models from videos with minimal user interaction. This is a particularly challenging task for faces due to a lack of prominent textures. We develop a robust system by following a model-based approach: we make full use of generic knowledge of faces in head motion determination, head tracking, model fitting, and multiple-view bundle adjustment. Our system first takes, with an ordinary video camera, images of a face of a person sitting in front of the camera turning their head from one side to the other. After five manual clicks on two images to indicate the position of the eye corners, nose tip and mouth corners, the system automatically generates a realistic looking 3D human head model that can be animated immediately (different poses, facial expressions and talking). A user, with a PC and a video camera, can use our system to generate his/her face model in a few minutes. The face model can then be imported in his/her favorite game, and the user sees themselves and their friends take part in the game they are playing. We have demonstrated the system on a laptop computer live at many events, and constructed face models for hundreds of people. It works robustly under various environment settings.

10.
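To address the problem that existing speech-driven talking-face video generation methods ignore the speaker's head motion, a speech-driven talking-face video generation method based on keypoint representation is proposed. Facial contour keypoints and lip keypoints represent the speaker's head motion and lip motion, respectively. A parallel multi-branch network converts the input speech into facial keypoints, and the final talking-face video is generated from the continuous sequences of lip and head keypoints together with a template image. Quantitative and qualitative experiments show that the method synthesizes clear, natural talking-face videos with head motion and achieves better performance metrics.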

11.
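This paper proposes a new 3D facial animation model that is based on the 3D morphable face model and compatible with MPEG-4. Correspondence between prototype 3D faces is established by uniform mesh resampling, and the 3D facial animation rules defined in MPEG-4 are applied to drive the 3D model to generate realistic facial animation automatically. Given a single face image, the model automatically reconstructs a realistic 3D face and generates facial animation driven by FAP parameters.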

12.
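Synthesizing 3D faces from frontal and profile photographs   Total citations: 5, self-citations: 1, citations by others: 5
We implemented an interactive tool for face modeling and animation. A user can construct a 3D head model from frontal and profile photographs of a person and, based on this model, produce specific expressions and simple animations. The paper describes in detail the techniques used in the system: geometric representation of the face, adaptation of a generic face to a specific face, elastic meshes, muscle models, full-view texture maps, and expression extraction.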

13.
14.
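Synthesis of facial behavior for virtual humans   Total citations: 17, self-citations: 2, citations by others: 17
Virtual humans are an important part of virtual reality environments. Research on virtual human behavior should consider not only group behavior at the macro level; the behavior of the individual is also very important. Individual behavior comprises natural behavior and conscious behavior. Natural behavior mainly relates to movements of the face, head, and limbs, whereas conscious behavior includes the expressions, vocalization, and corresponding lip movements and gestures associated with language and mental activity. This paper studies facial image synthesis for virtual humans in relation to conscious behavior, discusses a parametric synthesis method for standard face images, and gives the specific face image and ...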

15.
Realistic talking heads have important uses in interactive multimedia applications. This paper presents a novel framework to synthesize realistic facial animations driven by motion capture data using Laplacian deformation. We first capture the facial expression from a performer, then decompose the motion data into two components: the rigid movement of the head and the change of the facial expression. By making use of the local-detail preserving property of the Laplacian coordinate, we clone the captured facial expression onto a neutral 3D facial model using Laplacian deformation. We choose some expression “independent points” in the facial model as the fixed points when solving the Laplacian deformation equations. Experimental results show that our approach can synthesize realistic facial expressions in real time while preserving the facial details. We compare our method with the state-of-the-art facial expression synthesis methods to verify the advantages of our method. Our approach can be applied in real-time multimedia systems.
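
The local-detail preserving property mentioned above comes from describing each vertex relative to its neighbors instead of by its absolute position. The following minimal sketch computes Laplacian coordinates with uniform weights, an assumed weighting, since the abstract does not state which weights the authors use. The tiny mesh and the function name are hypothetical, and the sketch stops short of solving the deformation equations.

```python
def laplacian_coordinates(vertices, neighbors):
    """Return, for each vertex, its offset from the centroid of its neighbors."""
    coords = []
    for i, vertex in enumerate(vertices):
        ring = neighbors[i]
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        coords.append(tuple(vertex[k] - centroid[k] for k in range(3)))
    return coords


# Hypothetical mesh: an apex (index 0) above four base vertices.
vertices = [(0, 0, 1), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
neighbors = [[1, 2, 3, 4], [0, 3, 4], [0, 3, 4], [0, 1, 2], [0, 1, 2]]

delta = laplacian_coordinates(vertices, neighbors)

# Translating the whole mesh, as a rigid head movement would, leaves the
# Laplacian coordinates unchanged up to floating-point rounding.
moved = [(x + 5, y - 2, z + 3) for x, y, z in vertices]
delta_moved = laplacian_coordinates(moved, neighbors)
```

Since these coordinates encode only local shape, a deformation that keeps them close to their original values while moving a few constrained points tends to retain fine surface detail.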

16.
This paper presents a novel approach for the generation of realistic speech synchronized 3D facial animation that copes with anticipatory and perseveratory coarticulation. The methodology is based on the measurement of 3D trajectories of fiducial points marked on the face of a real speaker during the speech production of CVCV nonsense words. The trajectories are measured from standard video sequences using stereo vision photogrammetric techniques. The first stationary point of each trajectory associated with a phonetic segment is selected as its articulatory target. By clustering according to geometric similarity all articulatory targets of the same segment in different phonetic contexts, a set of phonetic context-dependent visemes accounting for coarticulation is identified. These visemes are then used to drive a set of geometric transformation/deformation models that reproduce the rotation and translation of the temporomandibular joint on the 3D virtual face, as well as the behavior of the lips, such as protrusion, and opening width and height of the natural articulation. This approach is being used to generate 3D speech synchronized animation from both natural and synthetic speech generated by a text-to-speech synthesizer.
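
Selecting the first stationary point of a trajectory as the articulatory target can be sketched as below. The abstract does not say how stationarity is detected, so the per-frame displacement threshold used here is an assumption, as are the function name and the sample trajectory.

```python
import math


def first_stationary_index(trajectory, threshold):
    """Return the index of the first sample reached by a step shorter than threshold.

    Returns None when the point never slows down enough.
    """
    for i in range(1, len(trajectory)):
        step = math.dist(trajectory[i - 1], trajectory[i])
        if step < threshold:
            return i
    return None


# Hypothetical 3D trajectory of one lip marker during a phonetic segment:
# it moves quickly, settles near x = 2.0, then moves away again.
trajectory = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.8, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.02, 0.0, 0.0),
    (1.5, 0.0, 0.0),
]

index = first_stationary_index(trajectory, threshold=0.05)
target = trajectory[index] if index is not None else None
```

Targets collected this way for the same segment in different phonetic contexts could then be grouped by geometric similarity, which is the clustering step the abstract describes.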

17.
Image-based animation of facial expressions   Total citations: 1, self-citations: 0, citations by others: 1
We present a novel technique for creating realistic facial animations given a small number of real images and a few parameters for the in-between images. This scheme can also be used for reconstructing facial movies where the parameters can be automatically extracted from the images. The in-between images are produced without ever generating a three-dimensional model of the face. Since facial motion due to expressions is not well defined mathematically, our approach is based on utilizing image patterns in facial motion. These patterns were revealed by an empirical study which analyzed and compared image motion patterns in facial expressions. The major contribution of this work is showing how parameterized “ideal” motion templates can generate facial movies for different people and different expressions, where the parameters are extracted automatically from the image sequence. To test the quality of the algorithm, image sequences (one of which was taken from a TV news broadcast) were reconstructed, yielding movies hardly distinguishable from the originals. Published online: 2 October 2002. Correspondence to: A. Tal. Work has been supported in part by the Israeli Ministry of Industry and Trade, The MOST Consortium.

18.
Emotive audio–visual avatars are virtual computer agents which have the potential of improving the quality of human-machine interaction and human-human communication significantly. However, the understanding of human communication has not yet advanced to the point where it is possible to make realistic avatars that demonstrate interactions with natural-sounding emotive speech and realistic-looking emotional facial expressions. In this paper, we propose the various technical approaches of a novel multimodal framework leading to a text-driven emotive audio–visual avatar. Our primary work is focused on emotive speech synthesis, realistic emotional facial expression animation, and the co-articulation between speech gestures (i.e., lip movements) and facial expressions. A general framework of emotive text-to-speech (TTS) synthesis using a diphone synthesizer is designed and integrated into a generic 3-D avatar face model. Under the guidance of this framework, we developed a realistic 3-D avatar prototype. A rule-based emotive TTS synthesis system module based on the Festival-MBROLA architecture has been designed to demonstrate the effectiveness of the framework design. Subjective listening experiments were carried out to evaluate the expressiveness of the synthetic talking avatar.

19.
This paper presents a new technique of unified probabilistic models for face recognition from only one single example image per person. The unified models, trained on an obtained training set with multiple samples per person, are used to recognize facial images from another disjoint database with a single sample per person. Variations between facial images are modeled as two unified probabilistic models: within-class variations and between-class variations. Gaussian Mixture Models are used to approximate the distributions of the two variations, and a classifier combination method is exploited to improve the performance. Extensive experimental results on the ORL face database and the authors' database (the ICT-JDL database), comprising 1,750 facial images of 350 individuals in total, demonstrate that the proposed technique, compared with the traditional eigenface method and some well-known traditional algorithms, is a significantly more effective and robust approach for face recognition.
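
The idea of modeling within-class and between-class variations with Gaussian mixtures can be sketched in one dimension. In this illustration the difference between two face images is reduced to a single scalar, and a pair is judged to show the same person when the within-class mixture gives that difference a higher likelihood. The mixture parameters, the scalar feature, and the function names are all assumptions. The paper's actual features, training procedure, and classifier combination are not reproduced.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given (weight, mean, variance) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


# Hypothetical mixtures: same-person differences cluster near zero,
# different-person differences lie further out on either side.
WITHIN_CLASS = [(0.7, 0.0, 0.5), (0.3, 0.0, 2.0)]
BETWEEN_CLASS = [(0.5, -4.0, 2.0), (0.5, 4.0, 2.0)]


def same_person(difference):
    """Compare the two likelihoods, assuming equal prior probabilities."""
    return mixture_pdf(difference, WITHIN_CLASS) > mixture_pdf(difference, BETWEEN_CLASS)


small_difference_match = same_person(0.3)
large_difference_match = same_person(4.2)
```

Because both mixtures describe variation between images and not any particular identity, they can be trained on one set of people and applied to a gallery that holds a single image per person, which matches the setting the abstract describes.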

20.
Expressive facial animations are essential to enhance the realism and the credibility of virtual characters. Parameter‐based animation methods offer a precise control over facial configurations while performance‐based animation benefits from the naturalness of captured human motion. In this paper, we propose an animation system that gathers the advantages of both approaches. By analyzing a database of facial motion, we create the human appearance space. The appearance space provides a coherent and continuous parameterization of human facial movements, while encapsulating the coherence of real facial deformations. We present a method to optimally construct an analogous appearance face for a synthetic character. The link between both appearance spaces makes it possible to retarget facial animation on a synthetic face from a video source. Moreover, the topological characteristics of the appearance space allow us to detect the principal variation patterns of a face and automatically reorganize them on a low‐dimensional control space. The control space acts as an interactive user‐interface to manipulate the facial expressions of any synthetic face. This interface makes it simple and intuitive to generate still facial configurations for keyframe animation, as well as complete temporal sequences of facial movements. The resulting animations combine the flexibility of a parameter‐based system and the realism of real human motion. Copyright © 2010 John Wiley & Sons, Ltd.

