Similar Documents
 17 similar documents found (search time: 156 ms)
1.
Synchronizing speech with lip animation is one of the difficult problems in facial animation. Taking the syllable as the recognition unit, this method first builds strict initial/final (shengmu/yunmu) acoustic models and uses the HTK toolkit to recognize the syllable sequence and its timing from a speech file; a basic viseme library and a syllable-to-viseme mapping table then yield the viseme sequence corresponding to the syllable sequence; finally, the visemes are interpolated and played back according to the timing information, producing speech-driven lip animation. Experiments show that the approach greatly reduces the number of acoustic models, recognizes the syllable sequence and its timing accurately, and effectively synchronizes speech with lip motion.
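As a rough illustration of the last two steps of this pipeline (syllable-to-viseme lookup followed by time-based interpolation of viseme keyframes), the Python sketch below uses a hypothetical mapping table, two-parameter viseme shapes, and made-up timings; none of these values come from the paper.

```python
import numpy as np

# Hypothetical syllable-to-viseme table and viseme "keyframes"
# (each viseme is just a small parameter vector, e.g. mouth width/height).
SYLLABLE_TO_VISEME = {"ni": "i", "hao": "a", "ma": "a"}
VISEME_SHAPES = {
    "rest": np.array([0.0, 0.0]),
    "a":    np.array([1.0, 0.8]),   # wide open
    "i":    np.array([0.9, 0.2]),   # spread, nearly closed
}

def lip_track(timed_syllables, fps=25):
    """Convert [(syllable, start_s, end_s), ...] into per-frame lip parameters
    by linearly interpolating between successive viseme keyframes."""
    # Keyframes: (time, shape); start and end at the rest pose.
    keys = [(0.0, VISEME_SHAPES["rest"])]
    for syl, start, end in timed_syllables:
        shape = VISEME_SHAPES[SYLLABLE_TO_VISEME.get(syl, "rest")]
        keys.append(((start + end) / 2.0, shape))      # peak at syllable center
    keys.append((timed_syllables[-1][2] + 0.2, VISEME_SHAPES["rest"]))

    duration = keys[-1][0]
    times = np.array([t for t, _ in keys])
    shapes = np.stack([s for _, s in keys])
    frames = []
    for t in np.arange(0.0, duration, 1.0 / fps):
        frames.append(np.array([np.interp(t, times, shapes[:, d])
                                for d in range(shapes.shape[1])]))
    return np.stack(frames)

if __name__ == "__main__":
    # Timings as they might come out of forced alignment (hypothetical values).
    track = lip_track([("ni", 0.05, 0.30), ("hao", 0.30, 0.70)])
    print(track.shape)   # (frames, 2) lip parameters ready to drive a mouth model
```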

2.
A machine-learning-based method for speech-driven facial animation   Cited: 19 (self: 0, others: 19)
Synchronizing speech with lip motion and facial expression is one of the difficult problems in facial animation. This work combines clustering and machine learning to learn the synchrony between the speech signal and lip/facial motion, and applies it in an MPEG-4-based speech-driven facial animation system. On top of a large audio-visual synchronous database, unsupervised clustering discovers basic patterns that effectively characterize facial motion, and a neural network is trained to map prosody-bearing speech features directly to these basic facial-motion patterns. This avoids the poor robustness of speech recognition, and the learned result can directly drive a face mesh. Finally, quantitative and qualitative evaluation methods for the speech-driven facial animation system are given. Experimental results show that machine-learning-based speech-driven facial animation not only effectively solves the audio-visual synchronization problem and improves the realism of the animation, but also, because the MPEG-4-based learning result is independent of the face model, can drive a variety of face models, including real video, 2D cartoon characters, and 3D virtual faces.
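The sketch below illustrates, under loose assumptions, the two learning stages this abstract describes: k-means clustering of facial-motion frames into a handful of basic patterns, followed by a small neural network that regresses from audio features to weights over those patterns. The array shapes, cluster count, and network size are arbitrary stand-ins, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-ins for a synchronized audio-visual corpus: one row per frame.
audio_feats = rng.normal(size=(2000, 13))   # e.g. MFCC-like prosodic features
face_motion = rng.normal(size=(2000, 30))   # e.g. MPEG-4 FAP-like motion vectors

# 1) Unsupervised clustering: discover K basic facial-motion patterns.
K = 8
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(face_motion)
patterns = kmeans.cluster_centers_                      # (K, 30)

# Soft targets: how strongly each frame expresses each pattern.
dists = np.linalg.norm(face_motion[:, None, :] - patterns[None], axis=2)
weights = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)

# 2) Supervised mapping: audio features -> pattern weights.
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(audio_feats, weights)

# Driving a face mesh: predicted weights blend the basic patterns.
pred_weights = net.predict(audio_feats[:5])
pred_motion = pred_weights @ patterns                   # (5, 30) motion vectors
print(pred_motion.shape)
```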

3.
To improve the realism of Chinese lip-synced talking-face videos, this paper proposes a text- and audio-driven facial animation generation technique based on an improved Wav2Lip model. First, a Chinese lip-sync dataset is constructed and used to pre-train the lip-sync discriminator so that it judges Chinese lip synchronization more accurately. Then, text features are introduced into the Wav2Lip model to improve the temporal alignment between lips and audio and thereby the realism of the generated video. The model combines the extracted text, audio, and speaker facial information and, under the supervision of the pre-trained lip-sync discriminator and a video quality discriminator, generates highly realistic lip-synced talking-face videos. Comparative experiments against the ATVGnet and Wav2Lip models show that the proposed model improves the synchronization between lip shape and audio as well as the overall realism of the generated videos, offering a practical solution for current facial animation generation needs.

4.
杜鹏  房宁  赵群飞 《计算机工程》2012,38(13):260-262,265
To synchronize the animation stream with the speech stream, a speech-synchronized facial animation system is designed and implemented. All Chinese phonemes are grouped into 16 Chinese visemes, and the corresponding keyframes are synthesized from an input face image. The input text is analyzed to obtain the Chinese viseme sequence and the animation keyframe sequence; the keyframe sequence is aligned with the speech stream, transition frames are inserted between keyframes, and the speech and animation streams are played together to achieve speech-synchronized facial animation. Experimental results show that the system produces speech-synchronized facial animation consistent with human visual and auditory perception.

5.
张毅  刘娇  罗元 《计算机教育》2012,(18):31-34
To address the poor robustness of speech-based human-computer interaction systems in noisy environments, this paper proposes a novel lip-shape-based interaction method in which different lip shapes are defined for interaction. The lips are first detected in real time with the AdaBoost algorithm, lip features are then extracted with the discrete cosine transform (DCT), and finally a support vector machine (SVM) classifies the lip shape.
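A hedged sketch of this detect-extract-classify pipeline is given below: a Haar-cascade (AdaBoost-trained) mouth detector, a low-frequency 2D DCT block as the lip feature, and an SVM classifier. The cascade filename, feature dimensions, and training data are placeholders rather than the paper's actual setup.

```python
import cv2
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

# AdaBoost-trained Haar cascade for the mouth (path is a placeholder; OpenCV
# ships similar cascades, but the paper's own detector is not available here).
mouth_cascade = cv2.CascadeClassifier("haarcascade_mcs_mouth.xml")

def dct_lip_features(gray_frame, n_coeffs=8):
    """Detect the mouth, resize it, and keep the top-left block of 2D DCT
    coefficients as a compact lip-shape descriptor."""
    mouths = mouth_cascade.detectMultiScale(gray_frame, 1.3, 5)
    if len(mouths) == 0:
        return None
    x, y, w, h = mouths[0]
    lip = cv2.resize(gray_frame[y:y + h, x:x + w], (32, 32)).astype(float)
    coeffs = dct(dct(lip, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:n_coeffs, :n_coeffs].ravel()        # low-frequency block

# Hypothetical training data: feature vectors and lip-shape labels.
X = np.random.rand(40, 64)           # stand-in for extracted DCT features
y = np.repeat(np.arange(4), 10)      # four defined lip shapes
clf = SVC(kernel="rbf").fit(X, y)

# At run time, each detected frame would be classified into one lip shape:
# label = clf.predict([dct_lip_features(frame)])
```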

6.
Speech-driven mouth animation is a key part of facial expression animation. Building on a study of the synchronization between speech and mouth animation, a realistic and natural speech-driven mouth animation method is proposed. The input speech is first segmented into large chunks; SAPI then recognizes the Chinese word sequence, which is converted into a syllable sequence; finally, the syllable sequence is converted into a viseme sequence carrying mouth-shape timing information. In the animation module, this viseme sequence drives the mouth animation of a 3D face model, yielding realistic and natural audio-visual synchronization.

7.
A novel lip-shape-based human-machine interaction method for an intelligent wheelchair is proposed, in which different lip shapes are defined to control the wheelchair's motion, allowing convenient and natural interaction even in noisy environments. The lips are first detected accurately and in real time in video frames with the AdaBoost algorithm, and lip features are extracted with the discrete cosine transform (DCT). Because the DCT feature vectors are high-dimensional, and given the advantages of support vector machines (SVM) for small-sample, nonlinear, high-dimensional pattern recognition, an SVM is finally used to classify the lip shape, and the recognition result is converted into commands controlling the wheelchair's motion. Experiments show that the method effectively recognizes different lip shapes and can control the intelligent wheelchair in real time.

8.
In facial animation generation, coping with the complexity of facial geometry is a highly challenging task. To address it, this work takes audio features extracted by stacked 1D convolutions and self-attention as input and generates facial animation from the audio signal with a Transformer, synthesizing facial motion step by step with a temporally autoregressive model. Experiments on the BIWI dataset show that the method reduces the lip-vertex error rate to a satisfactory 6.123% and exceeds MeshTalk's synchronization rate by 79.64%, indicating strong performance in lip synchronization and facial expression generation and high potential for facial animation tasks, and providing direction and reference for future research.
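To make the architecture family concrete, here is a toy PyTorch sketch: stacked 1D convolutions encode the audio features, a Transformer decoder attends to them, and facial motion is generated frame by frame in an autoregressive loop. Layer sizes, the vertex count, and the training details are illustrative assumptions only, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    """Toy audio-conditioned autoregressive facial-motion model."""
    def __init__(self, audio_dim=80, d_model=128, n_vertices=100):
        super().__init__()
        self.out_dim = n_vertices * 3
        # Stacked 1D convolutions over the audio feature sequence.
        self.audio_enc = nn.Sequential(
            nn.Conv1d(audio_dim, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.motion_in = nn.Linear(self.out_dim, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.motion_out = nn.Linear(d_model, self.out_dim)

    def forward(self, audio, past_motion):
        # audio: (B, T, audio_dim); past_motion: (B, T, n_vertices*3)
        memory = self.audio_enc(audio.transpose(1, 2)).transpose(1, 2)
        tgt = self.motion_in(past_motion)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.motion_out(out)        # next-frame vertex positions/offsets

    @torch.no_grad()
    def generate(self, audio):
        B, T, _ = audio.shape
        motion = torch.zeros(B, 1, self.out_dim)        # start from a neutral face
        for _ in range(T):                              # autoregressive roll-out
            nxt = self.forward(audio[:, :motion.size(1)], motion)[:, -1:]
            motion = torch.cat([motion, nxt], dim=1)
        return motion[:, 1:]

model = AudioToFace()
fake_audio = torch.randn(1, 20, 80)
print(model.generate(fake_audio).shape)    # (1, 20, 300)
```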

9.
To address the fact that existing audio-to-talking-face generation methods ignore the speaker's head motion, a keypoint-based audio-driven talking-face video generation method is proposed. Facial-contour keypoints and lip keypoints represent the speaker's head motion and lip motion respectively; a parallel multi-branch network converts the input speech into facial keypoints, and the continuous sequences of lip and head keypoints, together with a template image, are used to generate the final talking-face video. Quantitative and qualitative experiments show that the method synthesizes clear, natural talking-face videos with head motion and achieves favorable performance metrics.

10.
郑文思  李永宏  丁丽娟 《计算机应用》2012,32(Z1):137-138,143
Accurate extraction of the lip contour is the basis of lip-shape synthesis. A Matlab-based lip parameter extraction platform is implemented, providing file reading, keypoint marking, file playback and display, parameter extraction, and data storage. Given the limitations of edge detection, the platform marks keypoints manually and obtains satisfactory lip curves and lip Facial Animation Parameters (FAP). Experiments show that the method is simple and effective.

11.
Traditional muscle-based mouth modeling divides muscle motion into many fine, independent parts with numerous control parameters, and lacks both a decomposition of muscle motion and an analysis of the correlation between motion controls. This paper proposes a mouth sub-motion-unit model based on motion decomposition. An improved mouth mesh is first built on the standard Candide-3 model; then, based on anatomical structure and muscle-motion analysis, mouth motion is decomposed into three basic units: jaw rotation, lip contraction/relaxation, and lip protrusion/upturning. Finally, given the input text and inter-phoneme visual weighting functions, the mouth sub-motions are composed to produce speech-synchronized mouth animation. Experimental results show that the method can rapidly synthesize lip animation matched to Chinese speech.
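A minimal sketch of the final composition step, under the assumption that each sub-motion unit can be represented as a vertex displacement field over the mouth mesh and that the per-frame activations come from the viseme weighting functions; the mesh and displacement values below are random stand-ins for the improved Candide-3 mouth mesh.

```python
import numpy as np

rng = np.random.default_rng(3)
n_vertices = 40                                  # stand-in for the mouth sub-mesh

# Displacement field (n_vertices, 3) of each sub-motion unit at full activation.
units = {
    "jaw_rotation":   rng.normal(size=(n_vertices, 3)),
    "lip_stretch":    rng.normal(size=(n_vertices, 3)),
    "lip_protrusion": rng.normal(size=(n_vertices, 3)),
}

def mouth_pose(neutral_vertices, weights):
    """Blend sub-motion units: weights maps unit name -> activation in [0, 1]."""
    offset = sum(weights.get(name, 0.0) * disp for name, disp in units.items())
    return neutral_vertices + offset

neutral = rng.normal(size=(n_vertices, 3))
# Per-frame weights would come from the text-driven viseme weighting functions.
frame = mouth_pose(neutral, {"jaw_rotation": 0.6, "lip_stretch": 0.3})
print(frame.shape)
```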

12.
Animating expressive faces across languages   Cited: 2 (self: 0, others: 2)
This paper describes a morphing-based audio-driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and synthesized expressions. A novel scheme is presented for implementing a language-independent audio-driven facial animation system given a speech recognition system for just one language, in our case English. The method presented here can also be used for text-to-audio-visual speech synthesis. Visemes in new expressions are synthesized so that animations with different facial expressions can be generated. Given an incoming audio stream and still pictures of a face representing different visemes, an animation sequence is constructed using optical flow between visemes. The presented techniques give improved lip synchronization and naturalness to the animated video.
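The core morphing step mentioned above can be sketched roughly as follows: dense optical flow between two still viseme images is used to warp one image part of the way toward the other and cross-dissolve. OpenCV's Farnebäck flow and the file names are stand-ins; the paper's actual flow method and data are not reproduced here.

```python
import cv2
import numpy as np

def morph_visemes(img_a, img_b, n_frames=10):
    """Generate intermediate frames between two viseme images by warping
    along the dense optical flow and cross-dissolving."""
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)
        # Approximate backward warp of image A a fraction t along the A->B flow ...
        map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
        map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
        warped = cv2.remap(img_a, map_x, map_y, cv2.INTER_LINEAR)
        # ... and cross-dissolve with image B for a smooth transition.
        frames.append(cv2.addWeighted(warped, 1.0 - t, img_b, t, 0))
    return frames

# Usage (placeholder file names for two viseme stills of the same face):
# a, b = cv2.imread("viseme_aa.png"), cv2.imread("viseme_oo.png")
# for f in morph_visemes(a, b): ...
```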

13.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
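A compact sketch of the PIEES-style construction: expression-only motion frames (assumed here to be already time-warped and neutral-subtracted, as the paper describes) are reduced with PCA to an expression eigenspace, and a new expression signal is resynthesized by projecting a new low-dimensional trajectory back to marker space. The data and the random-walk trajectory are placeholders for the paper's corpus and texture-synthesis step.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Stand-in for motion-capture frames AFTER phoneme-based time warping and
# subtraction of the neutral-speech component (the paper's preprocessing):
# rows = frames, columns = stacked 3D marker offsets due to expression only.
expression_frames = rng.normal(size=(5000, 90))

# Build the phoneme-independent expression eigenspace (PIEES) with PCA.
pca = PCA(n_components=10)
coords = pca.fit_transform(expression_frames)       # low-dim expression trajectories
print("variance kept:", pca.explained_variance_ratio_.sum())

# Synthesizing a new expression signal amounts to producing a new trajectory
# in the eigenspace (the paper uses texture synthesis; here, a simple smoothed
# random walk as a placeholder) and projecting back to marker space.
steps = rng.normal(scale=0.1, size=(200, 10)).cumsum(axis=0)
new_expression = pca.inverse_transform(steps)        # (200, 90) marker offsets

# Final animation = synthesized neutral visual speech + new_expression (blended).
```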

14.
Creating speech-synchronized animation   Cited: 1 (self: 0, others: 1)
We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the necessary geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized. The system then automatically creates accurate real-time animated speech from the input text. It is capable of cheaply producing tremendous amounts of animated speech with very low resource requirements.
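The coarticulation idea of blending visemes with dominance functions can be sketched numerically as below (in the spirit of Cohen-Massaro-style models): each viseme contributes a target parameter value weighted by an exponential dominance curve centered on its time, and the animated value is the dominance-weighted average. Targets, timings, and rate constants are illustrative only, not the paper's values.

```python
import numpy as np

def dominance(t, center, magnitude=1.0, rate=8.0):
    """Exponentially decaying dominance of a viseme centered at `center`."""
    return magnitude * np.exp(-rate * np.abs(t - center))

def coarticulated_track(visemes, fps=25, tail=0.2):
    """visemes: list of (target_value, center_time). Returns the blended
    parameter value per frame as the dominance-weighted average of targets."""
    end = max(c for _, c in visemes) + tail
    times = np.arange(0.0, end, 1.0 / fps)
    track = []
    for t in times:
        w = np.array([dominance(t, c) for _, c in visemes])
        targets = np.array([v for v, _ in visemes])
        track.append((w * targets).sum() / w.sum())   # weighted blend of targets
    return times, np.array(track)

# Hypothetical lip-opening targets for three visemes at given times.
times, lip_open = coarticulated_track([(0.8, 0.10), (0.2, 0.35), (0.9, 0.60)])
print(lip_open.round(2))
```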

15.
A parametric model for human faces is described which is capable of expression and lip animation. With this model speech synchronized animation is reduced to varying parameters in accordance with a timed speech sequence.

16.
Research and implementation of a fast and robust lip modeling method   Cited: 1 (self: 2, others: 1)
Rapidly extracting the complete lip contour is one of the primary tasks in computer facial animation and speech animation; the goal of this paper is to imitate realistic lips and build a convincing lip model. Lip detection combines Red Exclusion with Cr-channel separation: a skin-color model first locates the face region and the lip detection region quickly and accurately, and Red Exclusion in RGB space, which considers only the green and blue channels within the candidate lip region, then segments the lips from the background image. Finally, the resulting lip information is combined with a deformable template method to build the lip model. The algorithm was tested on nearly one hundred face images with satisfactory mouth-extraction results. The method quickly detects the complete lip contour and builds a good lip model, providing lip material and a lip model for facial animation.
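A rough sketch of the Red Exclusion step: within a mouth region of interest, lips are separated from skin using only the green and blue channels (here via a thresholded log(G/B) map). The threshold rule and morphological clean-up are simplifying assumptions; the paper additionally uses a skin-color model and Cr-channel separation, which are not reproduced here.

```python
import cv2
import numpy as np

def red_exclusion_lip_mask(bgr_roi):
    """Segment lips inside a mouth region of interest using only the green and
    blue channels (Red Exclusion), then clean the mask morphologically."""
    b, g, r = cv2.split(bgr_roi.astype(np.float32) + 1.0)  # +1 avoids log(0)
    # Lips tend to have lower green relative to blue than surrounding skin,
    # so threshold the log(G/B) map (a common Red Exclusion formulation).
    gb = np.log(g / b)
    thresh = gb.mean() - 0.5 * gb.std()                    # assumed simple threshold
    mask = (gb < thresh).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

# Usage (placeholder): roi = frame[y0:y1, x0:x1]  # mouth region from a face detector
# lip_mask = red_exclusion_lip_mask(roi)          # feeds the deformable lip template
```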

17.
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third-grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
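The unit-selection search described above can be sketched as a small dynamic program: for each target syllable a set of candidate motion-capture units is available, and the chosen sequence minimizes the join (concatenation) cost between successive units; the target-cost term a full system would also use is omitted for brevity. The corpus, unit representation, and costs below are toy stand-ins for the paper's corpus and cost functions.

```python
import numpy as np

def select_units(target_syllables, corpus):
    """corpus: dict syllable -> list of candidate units (each a (T_i, D) array).
    Returns one unit per target syllable minimizing the total join cost (Viterbi)."""
    def join_cost(prev_unit, unit):
        # Discontinuity between the end of one unit and the start of the next.
        return float(np.linalg.norm(prev_unit[-1] - unit[0]))

    candidates = [corpus[s] for s in target_syllables]
    # best[i][k] = (cost, backpointer) of using candidate k for syllable i.
    best = [[(0.0, -1)] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row = []
        for k, unit in enumerate(candidates[i]):
            costs = [best[i - 1][j][0] + join_cost(candidates[i - 1][j], unit)
                     for j in range(len(candidates[i - 1]))]
            j = int(np.argmin(costs))
            row.append((costs[j], j))
        best.append(row)
    # Back-trace the optimal path.
    k = int(np.argmin([c for c, _ in best[-1]]))
    path = [k]
    for i in range(len(candidates) - 1, 0, -1):
        k = best[i][k][1]
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)]

# Toy corpus: two candidate units per syllable, each 5 frames of 3 parameters.
rng = np.random.default_rng(2)
corpus = {s: [rng.normal(size=(5, 3)) for _ in range(2)] for s in ["ba", "na", "na2"]}
units = select_units(["ba", "na", "na2"], corpus)
print(len(units), units[0].shape)
```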
