Similar Documents
20 similar documents found (search time: 46 ms)
1.
For each frame of a face video, a static facial expression recognition algorithm is proposed: facial expression motion parameters are extracted and the expression is then classified according to physiological knowledge of expressions. To compensate for insufficient knowledge, an algorithm combining static and dynamic expression recognition is further proposed, which uses a statistical framework based on multi-class expression Markov chains and particle filtering, together with physiological knowledge, to extract facial expression motion and recognize the expression simultaneously. Experiments demonstrate the effectiveness of the algorithms.

2.
This paper describes a method for driving a 3D face model from a single video. Building on the traditional muscle model, a mouth motion control model is established from mechanism-theory principles according to the motion characteristics of the mouth. Motion curves of facial feature points are obtained by tracking a video image sequence and are then used to drive the mesh vertices of the eyes, mouth and the rest of the face, producing realistic facial expressions. Simulation results show that the method yields lifelike facial expression animation.

3.
Objective: Compared with static facial expression image recognition, the frames of a video sequence vary considerably in expression intensity, and many frames show a neutral expression, yet existing models cannot assign a suitable weight to each frame. To make full use of the spatio-temporal information in video sequences and the differing contributions of individual frames to video expression recognition, this paper proposes a Transformer-based video sequence expression recognition method. Method: First, a video sequence is divided into short clips containing a fixed number of frames, and a deep residual network learns high-level facial expression features for each frame in a clip, producing a fixed-dimensional spatial feature for the clip. Then, a suitably designed long short-term memory network (LSTM) and a Transformer model learn high-level temporal features and attention features, respectively, from the clip's spatial feature sequence; the two are concatenated and fed into a fully connected layer, which outputs the clip's expression classification scores. Finally, max pooling over the expression scores of all clips of a video yields the video's final expression classification. Result: Experiments on the public BAUM-1s (Bahcesehir University multimodal) and RML (Ryerson Multimedia Lab) video emotion datasets show that the method achieves recognition accuracies of 60.72% and 75.44%, respectively, outperforming the compared methods. Conclusion: Trained end to end, the method effectively improves the performance of video sequence expression recognition.
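A minimal PyTorch sketch of the clip-level architecture this abstract describes (per-frame ResNet features, parallel LSTM and Transformer branches, concatenation, and max pooling over clip scores); the layer sizes, frame count and class count are our illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ClipExpressionNet(nn.Module):
    def __init__(self, num_classes=7, feat_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # per-frame 512-d spatial features
        self.backbone = backbone
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip):                 # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)
        temporal, _ = self.lstm(feats)       # temporal-feature branch
        attended = self.transformer(feats)   # attention-feature branch
        fused = torch.cat([temporal[:, -1], attended.mean(dim=1)], dim=-1)
        return self.fc(fused)                # clip-level class scores

# Video-level prediction: max-pool the clip scores, as the abstract describes.
# scores = torch.stack([model(c) for c in clips]).max(dim=0).values
```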

4.
The recognition of facial gestures and expressions in image sequences is an important and challenging problem. Most of the existing methods adopt the following paradigm. First, facial actions/features are retrieved from the images, then the facial expression is recognized based on the retrieved temporal parameters. In contrast to this mainstream approach, this paper introduces a new approach allowing the simultaneous retrieval of facial actions and expression using a particle filter adopting multi-class dynamics that are conditioned on the expression. For each frame in the video sequence, our approach is split into two consecutive stages. In the first stage, the 3D head pose is retrieved using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously retrieved using a stochastic framework based on second-order Markov chains. The proposed fast scheme is as robust as, or more robust than, existing ones in a number of respects. We describe extensive experiments and provide evaluations of performance to show the feasibility and robustness of the proposed approach.
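A rough NumPy sketch of a particle filter with class-conditioned dynamics in the spirit of the approach above; the class-switching probability, noise model and the likelihood stub are purely illustrative assumptions:

```python
import numpy as np

N, DIM, CLASSES = 500, 6, 7                # particles, action dims, expressions
rng = np.random.default_rng(0)
actions = rng.normal(0.0, 0.1, (N, DIM))   # facial action parameters
labels = rng.integers(0, CLASSES, N)       # per-particle expression class
weights = np.full(N, 1.0 / N)

def likelihood(a, frame):                  # appearance likelihood (stub)
    return np.exp(-np.sum(a ** 2))

def step(frame, switch_prob=0.05):
    global actions, labels, weights
    # Markov switching of the expression class (transition model assumed)
    switch = rng.random(N) < switch_prob
    labels[switch] = rng.integers(0, CLASSES, switch.sum())
    # class-conditioned dynamics: each expression has its own process noise
    sigma = 0.02 + 0.01 * labels[:, None]
    actions = actions + rng.normal(0.0, 1.0, (N, DIM)) * sigma
    weights = np.array([likelihood(a, frame) for a in actions])
    weights /= weights.sum()
    idx = rng.choice(N, N, p=weights)      # resample
    actions, labels = actions[idx], labels[idx]
    # simultaneous estimates: mean facial actions + modal expression class
    return actions.mean(axis=0), np.bincount(labels, minlength=CLASSES).argmax()
```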

5.
Graphical Models, 2014, 76(3): 172-179
We present a performance-based facial animation system capable of running on mobile devices at real-time frame rates. A key component of our system is a novel regression algorithm that accurately infers the facial motion parameters from 2D video frames of an ordinary web camera. Compared with the state-of-the-art facial shape regression algorithm [1], which takes a two-step procedure to track facial animations (i.e., first regressing the 3D positions of facial landmarks, and then computing the head poses and expression coefficients), we directly regress the head poses and expression coefficients. This one-step approach greatly reduces the dimension of the regression target and significantly improves the tracking performance while preserving the tracking accuracy. We further propose to collect the training images of the user under different lighting environments, and make use of the data to learn a user-specific regressor, which can robustly handle lighting changes that frequently occur when using mobile devices.

6.
Extracting 3D facial animation parameters from multiview video clips (total citations: 1; self-citations: 0; citations by others: 1)
We propose an accurate and inexpensive procedure that estimates 3D facial motion parameters from mirror-reflected multiview video clips. We place two planar mirrors near a subject's cheeks and use a single camera to simultaneously capture a marker's front and side view images. We also propose a novel closed-form linear algorithm to reconstruct 3D positions from real versus mirrored point correspondences in an uncalibrated environment. Our computer simulations reveal that exploiting mirrors' various reflective properties yields a more robust, accurate, and simpler 3D position estimation approach than general-purpose stereo vision methods that use a linear approach or maximum-likelihood optimization. Our experiments show a root mean square (RMS) error of less than 2 mm in 3D space with only 20-point correspondences. For semiautomatic 3D motion tracking, we use an adaptive Kalman predictor and filter to improve stability and infer the occluded markers' position. Our approach tracks more than 50 markers on a subject's face and lips from 30-frame-per-second video clips. We've applied the facial motion parameters estimated from the proposed method to our facial animation system.
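For illustration, a minimal constant-velocity Kalman predictor/filter of the kind mentioned above, which keeps predicting through occlusions and corrects only when the marker is visible; the state layout and the noise covariances Q and R are assumptions:

```python
import numpy as np

dt = 1.0 / 30.0                                 # 30 fps video
F = np.block([[np.eye(3), dt * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])   # state: [x y z vx vy vz]
H = np.hstack([np.eye(3), np.zeros((3, 3))])    # we only observe position
Q, R = 1e-4 * np.eye(6), 4e-6 * np.eye(3)       # process / measurement noise
x, P = np.zeros(6), np.eye(6)

def kalman_step(z=None):
    """Predict; update only when the marker is visible (z is not None)."""
    global x, P
    x, P = F @ x, F @ P @ F.T + Q               # predict (handles occlusion)
    if z is not None:                           # marker visible: correct
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(6) - K @ H) @ P
    return x[:3]                                # estimated 3D marker position
```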

7.
8.
To synthesize real-time and realistic facial animation, we present an effective algorithm which combines image- and geometry-based methods for facial animation simulation. Considering the numerous motion units in the expression coding system, we present a novel simplified motion unit based on the basic facial expressions, and construct the corresponding basic action for a head model. As image features are difficult to obtain using the performance-driven method, we develop an automatic image feature recognition method based on statistical learning, and an expression image semi-automatic labeling method with rotation-invariant face detection, which can improve the accuracy and efficiency of expression feature identification and training. After facial animation retargeting, each basic action weight needs to be computed and mapped automatically. We apply the blend shape method to construct and train the corresponding expression database according to each basic action, and adopt the least squares method to compute the corresponding control parameters for facial animation. Moreover, the diffuse and specular light distributions are pre-integrated with a physically based method, improving the plausibility and efficiency of facial rendering. Our work provides a simplification of the facial motion unit, an optimization of the statistical training and recognition processes for facial animation, solves for the expression parameters, and simulates the subsurface scattering effect in real time. Experimental results indicate that our method is effective and efficient, and suitable for computer animation and interactive applications.
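The final least-squares step can be sketched as follows (our illustration, not the authors' code): solve for blend-shape weights w minimizing ||Bw - (f - f0)||, where B stacks the basic-action displacement vectors; array shapes are assumed:

```python
import numpy as np

def solve_blendshape_weights(neutral, target, blendshapes):
    """neutral, target: (V, 3) meshes; blendshapes: (K, V, 3) per-action deltas."""
    B = blendshapes.reshape(len(blendshapes), -1).T    # (3V, K) basis matrix
    d = (target - neutral).ravel()                     # (3V,) observed offset
    w, *_ = np.linalg.lstsq(B, d, rcond=None)          # least-squares solve
    return np.clip(w, 0.0, 1.0)                        # keep weights plausible
```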

9.
To address occlusion in student expression recognition in complex classroom scenes, and to exploit the strengths of deep learning in intelligent teaching evaluation, this paper proposes a student expression recognition model for classroom teaching videos based on a deep attention network, together with an intelligent teaching evaluation algorithm. A classroom teaching video library, an expression library and a behavior library are constructed; cropping and occlusion strategies generate multiple face image streams, on which a multi-stream deep attention network is built, with a self-attention mechanism assigning different weights to the streams...

10.
We present a novel data-driven skinning model—rigidity-aware skinning (RAS) model, for simulating both active and passive 3D facial animation of different identities in real time. Our model builds upon a linear blend skinning (LBS) scheme, where the bone set and skinning weights are shared for diverse identities and learned from the data via a sparse and localized skinning decomposition algorithm. Our model characterizes the animated face into the active expression and the passive deformation: The former is represented by an LBS-based multi-linear model learned from the FaceWareHouse data set, and the latter is represented by a spatially varying as-rigid-as-possible deformation applied to the LBS-based multi-linear model, whose rigidity parameters are learned from the data by a novel rigidity estimation algorithm. Our RAS model is not only generic and expressive for faithfully modelling medium-scale facial deformation, but also compact and lightweight for generating vivid facial animation in real time. We validate the efficiency and effectiveness of our RAS model for real-time 3D facial animation and expression editing.
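For reference, a minimal sketch of the linear blend skinning (LBS) scheme the RAS model builds on: each deformed vertex is a skin-weight-blended sum of bone transforms applied to the rest pose. Array shapes and data are illustrative assumptions:

```python
import numpy as np

def lbs(rest_verts, bone_transforms, skin_weights):
    """rest_verts: (V, 3); bone_transforms: (B, 4, 4); skin_weights: (V, B)."""
    V = rest_verts.shape[0]
    homo = np.hstack([rest_verts, np.ones((V, 1))])             # (V, 4)
    # transform every vertex by every bone, then blend by skin weights
    per_bone = np.einsum('bij,vj->vbi', bone_transforms, homo)  # (V, B, 4)
    blended = np.einsum('vb,vbi->vi', skin_weights, per_bone)   # (V, 4)
    return blended[:, :3]                                       # deformed verts
```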

11.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.

12.
We present a novel performance-driven approach to animating cartoon faces starting from pure 2D drawings. A 3D approximate facial model automatically built from front and side view master frames of character drawings is introduced to enable the animated cartoon faces to be viewed from angles different from that in the input video. The expressive mappings are built by an artificial neural network (ANN) trained from examples of the real face in the video and the cartoon facial drawings in the facial expression graph for a specific character. The learned mapping model enables the resultant facial animation to properly convey the desired expressiveness, instead of merely reproducing the facial actions in the input video sequence. Furthermore, the lit sphere, capturing the lighting in the painting artwork of faces, is utilized to color the cartoon faces in terms of the 3D approximate facial model, reinforcing the hand-drawn appearance of the resulting facial animation. We conducted a series of comparative experiments to test the effectiveness of our method by recreating facial expressions from commercial animation. The comparison results clearly demonstrate the superiority of our method not only in generating high quality cartoon-style facial expressions, but also in speeding up the animation production of cartoon faces.

13.
Camera calibration based on structure and motion recovery for augmented video (total citations: 1; self-citations: 0; citations by others: 1)
This paper proposes an efficient and robust camera calibration algorithm for long sequences that can stably handle video with an unknown and varying focal length, making it suitable for augmented-video applications. From a long video sequence, the algorithm extracts key frames with sufficiently long baselines between them, based on matched feature points, to guarantee a stable solution. It first solves progressively over the key-frame sequence to accurately recover the 3D structure of the matched feature points, then uses the accurately recovered 3D points to solve the camera motion parameters for the entire sequence. The algorithm selects the three frames best suited for initialization and promptly upgrades the solution from projective to Euclidean space. Experimental results show that the recovered camera parameters and 3D points are highly accurate, demonstrating that the method is stable and efficient and can meet the demanding requirements of augmented video.

14.
Objective: Facial expression recognition is one of the core problems in computer vision. On the one hand, producing an expression corresponds to a continuous dynamic process of facial muscle movement; on the other hand, the peak expression frame within that process usually carries complete information for recognizing the expression. Most existing algorithms are based either on expression video sequences or on single peak expression images. This paper therefore proposes a deep neural network that fuses temporal and spatial features to analyze and understand the expression information in video sequences and improve recognition performance. Method: The network contains two feature extraction modules, which learn the static spatial features of an expression from a single peak expression image and the dynamic temporal features of the expression from the video sequence. First, a triplet-based deep metric fusion technique is proposed: by adopting different thresholds in the triplet loss function, multiple distinct expression feature representations are learned from a single peak expression image and combined into one robust and more discriminative spatial feature. Second, to exploit prior knowledge of key facial components and accurately extract the temporal motion of expressions, a convolutional neural network based on facial landmark trajectories is proposed, which learns the dynamic temporal features of an expression by analyzing the landmark trajectories in the video sequence. Finally, a fine-tuning fusion strategy is proposed to achieve the best fusion of the temporal and spatial features. Result: On three widely used video-based facial expression datasets, CK+ (the extended Cohn-Kanade dataset), MMI (the MMI facial expression database) and Oulu-CASIA (the Oulu-CASIA NIR&VIS facial expression database), the method achieves recognition accuracies of 98.46%, 82.96% and 87.12%, respectively, approaching or exceeding the best performance of comparable methods. Conclusion: The proposed spatio-temporal fusion network robustly analyzes and understands the spatial and temporal expression information in video sequences and effectively improves facial expression recognition performance.
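A small PyTorch sketch of the multi-threshold triplet idea described above (several embedding heads, each trained with a triplet loss at a different margin, concatenated into one spatial feature); the margins and dimensions are our assumptions:

```python
import torch
import torch.nn as nn

margins = [0.2, 0.5, 1.0]                      # assumed thresholds
heads = nn.ModuleList(nn.Linear(512, 128) for _ in margins)
losses = [nn.TripletMarginLoss(margin=m) for m in margins]

def spatial_feature_loss(anchor, positive, negative):
    """anchor/positive/negative: (B, 512) backbone features of peak frames."""
    total, feats = 0.0, []
    for head, loss in zip(heads, losses):
        a, p, n = head(anchor), head(positive), head(negative)
        total = total + loss(a, p, n)          # one triplet loss per margin
        feats.append(a)
    return total, torch.cat(feats, dim=-1)     # fused "spatial feature"
```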

15.
We present techniques for improving performance driven facial animation, emotion recognition, and facial key-point or landmark prediction using learned identity invariant representations. Established approaches to these problems can work well if sufficient examples and labels for a particular identity are available and factors of variation are highly controlled. However, labeled examples of facial expressions, emotions and key-points for new individuals are difficult and costly to obtain. In this paper we improve the ability of techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. We use a weakly-supervised approach in which identity labels are used to learn the different factors of variation linked to identity separately from factors related to expression. We show how probabilistic modeling of these sources of variation allows one to learn identity-invariant representations for expressions which can then be used to identity-normalize various procedures for facial expression analysis and animation control. We also show how to extend the widely used techniques of active appearance models and constrained local models through replacing the underlying point distribution models which are typically constructed using principal component analysis with identity–expression factorized representations. We present a wide variety of experiments in which we consistently improve performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking.

16.
In this paper, we present an automatic and efficient approach to the capture of dense facial motion parameters, which extends our previous work of 3D reconstruction from mirror-reflected multiview video. To narrow search space and rapidly generate 3D candidate position lists, we apply mirrored-epipolar bands. For automatic tracking, we utilize spatial proximity of facial surfaces and temporal coherence to find the best trajectories and rectify statuses of missing and false tracking. More than 300 markers on a subject’s face are tracked from video at a process speed of 9.2 frames per second (fps) on a regular PC. The estimated 3D facial motion trajectories have been applied to our facial animation system and can be used for facial motion analysis.

17.
This paper addresses the dynamic recognition of basic facial expressions in videos using feature subset selection. Feature selection has already been used by some static classifiers where the facial expression is recognized from one single image. Past work on dynamic facial expression recognition has emphasized the issues of feature extraction and classification; however, less attention has been given to the critical issue of feature selection in the dynamic scenario. The main contributions of the paper are as follows. First, we show that dynamic facial expression recognition can be cast into a classical classification problem. Second, we combine a facial dynamics extractor algorithm with a feature selection scheme for generic classifiers. We show that the paradigm of feature subset selection with a wrapper technique can improve the dynamic recognition of facial expressions. We provide evaluations of performance on real video sequences using five standard machine learning approaches: Support Vector Machines, K Nearest Neighbor, Naive Bayes, Bayesian Networks, and Classification Trees.
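As an illustration of the wrapper paradigm, a scikit-learn sketch in which SequentialFeatureSelector stands in for the paper's feature subset selection scheme (an assumption on our part), wrapped around one of the five evaluated classifier families (K nearest neighbors):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

def select_dynamic_features(X, y, n_features=20):
    """X: (n_sequences, n_dynamic_features) extracted facial dynamics; y: labels."""
    wrapper = SequentialFeatureSelector(
        KNeighborsClassifier(n_neighbors=5),   # the wrapped generic classifier
        n_features_to_select=n_features,
        cv=5)                                  # subsets scored by CV accuracy
    wrapper.fit(X, y)
    return wrapper.get_support(indices=True)   # indices of selected features
```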

18.
This paper presents a novel approach for action recognition, localization and video matching based on a hierarchical codebook model of local spatio-temporal video volumes. Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a target video dataset. The method is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm codes a video as a compact set of spatio-temporal volumes, while considering their spatio-temporal compositions in order to account for spatial and temporal contextual information. This hierarchy is achieved by first constructing a codebook of spatio-temporal video volumes. Then a large contextual volume containing many spatio-temporal volumes (ensemble of volumes) is considered. These ensembles are used to construct a probabilistic model of video volumes and their spatio-temporal compositions. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to those of other approaches, especially in the case of a single training example and cross-dataset action recognition.

19.
For effective interaction between humans and socially adept, intelligent service robots, a key capability required by this class of sociable robots is the successful interpretation of visual data. In addition to crucial techniques like human face detection and recognition, an important next step for enabling intelligence and empathy within social robots is that of emotion recognition. In this paper, an automated and interactive computer vision system is investigated for human facial expression recognition and tracking based on facial structure features and movement information. Twenty facial features are adopted since they are more informative and prominent for reducing ambiguity during classification. An unsupervised learning algorithm, distributed locally linear embedding (DLLE), is introduced to recover the inherent properties of scattered data lying on a manifold embedded in high-dimensional input facial images. The selected person-dependent facial expression images in a video are classified using the DLLE. In addition, facial expression motion energy is introduced to describe the tension of the facial muscles during expressions, enabling person-independent tracking and recognition. This method takes advantage of optical flow, which tracks the feature points' movement information. Finally, experimental results show that our approach is able to separate different expressions successfully.
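A minimal OpenCV sketch of the optical-flow-based motion-energy idea above (our illustration; summing the flow magnitude over the face region as the "energy" is an assumption):

```python
import cv2
import numpy as np

def motion_energy(prev_gray, next_gray):
    """prev_gray, next_gray: consecutive grayscale face frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2).sum()  # total flow magnitude
```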

20.
An affective interaction system based on facial expression features (total citations: 1; self-citations: 1; citations by others: 0)
徐红, 彭力. 《计算机应用研究》, 2012, 29(3): 1111-1115
This paper designs an affective interaction system (an affective virtual human) based on facial expression features, whose key techniques fall into three parts: emotion recognition, emotion computation, and emotion synthesis and output. In the emotion recognition part, static facial expression images are first preprocessed with a feature-block method, features are then extracted with two-dimensional principal component analysis (2DPCA), and finally a multi-level quantum neural network classifier recognizes seven expression classes. In the emotion computation part, a hidden Markov emotion model (HMM) is built and its parameters are estimated with an improved genetic algorithm. In the emotion synthesis and output stage, a 3D face mesh model is first built with an algorithm combining NURBS surfaces and patches, and keyframe techniques then produce continuous expression animation that conforms to the rules of human behavior. The design of the affective interaction system based on facial expression features is thus completed.
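For the recognition stage, a minimal NumPy sketch of 2DPCA, which builds the image covariance directly from the 2D image matrices instead of vectorizing them; the image size and component count are assumptions:

```python
import numpy as np

def fit_2dpca(images, n_components=10):
    """images: (N, H, W) grayscale expression images."""
    mean = images.mean(axis=0)
    # image covariance matrix G = (1/N) * sum (A - mean)^T (A - mean), (W, W)
    G = sum((A - mean).T @ (A - mean) for A in images) / len(images)
    _, eigvecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    return mean, eigvecs[:, ::-1][:, :n_components]

def project_2dpca(image, mean, components):
    return (image - mean) @ components         # (H, n_components) feature matrix
```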
