Similar Literature
20 similar documents retrieved.
1.
Human facial gestures often exhibit natural stochastic variations, such as how often the eyes blink, how often the eyebrows and the nose twitch, and how the head moves while speaking. These stochastic movements of facial features are key ingredients for generating convincing facial expressions. Although such small variations have been simulated using noise functions in many graphics applications, modulating noise functions to match the natural variations induced by the affective states and personality of characters is difficult and unintuitive. We present a technique for generating subtle expressive facial gestures (facial expressions and head motion) semi-automatically from motion capture data. Our approach is based on Markov random fields simulated at two levels. At the lower level, the coordinated movements of facial features are captured, parameterized, and transferred to synthetic faces using basis shapes. The upper level represents the independent stochastic behavior of facial features. Experimental results show that our system generates expressive facial gestures synchronized with input speech.
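A minimal sketch of the two-level idea described above: a lower level of coordinated basis-shape motion and an upper level of independent stochastic events such as blinks. All basis shapes, rates, and parameters below are invented for the sketch and are not the paper's data or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_basis = 300, 6
# Hypothetical basis shapes: each row is a flattened vertex-displacement field
# for one coordinated movement (e.g., brow raise, head nod); basis 0 is assumed
# to close the eyelids.
basis_shapes = rng.normal(size=(n_basis, n_vertices * 3))

def lower_level_step(weights, coupling=0.8, noise=0.05):
    """Lower level: first-order Markov update of basis-shape weights,
    modeling coordinated facial/head motion."""
    return coupling * weights + rng.normal(scale=noise, size=weights.shape)

def blink_event(rate_hz=0.3, fps=30):
    """Upper level: independent stochastic blink timing (Bernoulli per frame)."""
    return rng.random() < rate_hz / fps

weights = np.zeros(n_basis)
animation = []
for frame in range(90):                            # 3 seconds at 30 fps
    weights = lower_level_step(weights)
    deformation = weights @ basis_shapes           # coordinated deformation
    if blink_event():
        deformation = deformation + basis_shapes[0]  # superimpose the blink shape
    animation.append(deformation)
```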

2.
Automatic analysis of head gestures and facial expressions is a challenging research area with significant applications in human-computer interfaces. We develop a face and head gesture detector for video streams. The detector is based on the facial landmark paradigm, in which both the appearance and the configuration of landmarks are used. First, we detect and accurately track facial landmarks using adaptive templates, a Kalman predictor, and subspace regularization. The trajectories (time series) of facial landmark positions over the course of a head gesture or facial expression are then converted into various discriminative features. Features can be landmark coordinate time series, facial geometric features, or patches on expressive regions of the face. We comparatively evaluate two feature-sequence classifiers, Hidden Markov Models (HMM) and Hidden Conditional Random Fields (HCRF), as well as feature-subspace classifiers based on ICA (Independent Component Analysis) and NMF (Non-negative Matrix Factorization) applied to the spatiotemporal data. We achieve 87.3% correct gesture classification on a seven-gesture test database, and performance reaches 98.2% correct detection under a fusion scheme. Promising and competitive results are also achieved on classification of naturally occurring gesture clips from the LILiR TwoTalk Corpus.
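The Kalman-predictor step used for landmark tracking can be sketched as a standard constant-velocity filter; the transition and noise matrices below are generic assumptions, not the authors' settings.

```python
import numpy as np

# Constant-velocity Kalman filter for a single facial landmark (x, y).
dt = 1.0 / 30.0                                  # video frame interval
F = np.array([[1, 0, dt, 0],                     # state transition over (x, y, vx, vy)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],                      # only the position is measured
              [0, 1, 0, 0]])
Q = 1e-3 * np.eye(4)                             # assumed process noise
R = 2.0 * np.eye(2)                              # assumed measurement noise (pixels^2)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the measured landmark position."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(4), np.eye(4)
for z in [np.array([100.0, 120.0]), np.array([101.2, 119.5])]:
    x, P = kalman_step(x, P, z)                  # filtered landmark trajectory
```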

3.
Objective: Most current video emotion recognition methods rely only on facial expressions and ignore the emotional information carried by the physiological signals hidden in facial video. This paper proposes a bimodal video emotion recognition method based on facial expressions and the blood volume pulse (BVP) physiological signal. Method: The video is first preprocessed to obtain the facial video. Two spatiotemporal expression features, LBP-TOP and HOG-TOP, are then extracted from the facial video, and video color magnification is used to recover the BVP signal, from which physiological emotion features are extracted. The two kinds of features are fed into BP (back-propagation) classifiers to train separate classification models, and fuzzy-integral decision-level fusion finally produces the emotion recognition result. Results: On a laboratory-built facial video emotion database, the average recognition rates of the expression-only and physiological-signal-only modalities are 80% and 63.75%, respectively, while the fused result reaches 83.33%, higher than either single modality, demonstrating the effectiveness of the bimodal fusion. Conclusion: The proposed bimodal spatiotemporal feature fusion method makes fuller use of the emotional information in the video and effectively improves classification performance; comparative experiments with similar video emotion recognition algorithms confirm its advantage. In addition, the fuzzy-integral decision-level fusion effectively suppresses the interference of unreliable decision information, yielding better recognition accuracy.
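The decision-level fusion step can be illustrated with a fuzzy integral over the two modality confidences. The sketch below uses the Sugeno integral for two sources with assumed reliabilities (fuzzy densities); the abstract does not specify which fuzzy integral or which density values the authors use.

```python
def sugeno_integral_two(h, g):
    """Sugeno fuzzy integral for two sources.
    h: dict source -> confidence in a class; g: dict source -> fuzzy density."""
    s1, s2 = sorted(h, key=h.get, reverse=True)    # sort sources by confidence
    # Nested sets are {s1} and {s1, s2}; the measure of the whole set is taken as 1.
    return max(min(h[s1], g[s1]), min(h[s2], 1.0))

# Assumed per-class confidences from the two single-modality BP classifiers.
expr_scores = {"happy": 0.7, "sad": 0.2}      # facial-expression modality
bvp_scores  = {"happy": 0.5, "sad": 0.4}      # BVP physiological modality
densities   = {"expr": 0.6, "bvp": 0.35}      # assumed reliability of each modality

fused = {}
for c in expr_scores:
    h = {"expr": expr_scores[c], "bvp": bvp_scores[c]}
    fused[c] = sugeno_integral_two(h, densities)
label = max(fused, key=fused.get)              # decision-level fusion result
```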

4.
Person-centric visual understanding in complex scenes can improve the efficiency of intelligent social collaboration, accelerate the intelligent transformation of social governance, and show great vitality in serving economic activity and building smart cities, with major social and economic value. The technology mainly covers real-time person identification, individual behavior analysis and group interaction understanding, human-machine collaborative learning, facial-expression and speech emotion recognition, and knowledge-guided visual understanding. In complex scenes, and especially when the "person-behavior-scene" relationship must be represented and understood as a whole, these problems become even more challenging. Large-scale real-time person identification in complex scenes focuses on face detection, person feature understanding, and scene analysis, and forms the research foundation of the field. Individual behavior analysis and group interaction understanding focus on video person re-identification, video action recognition, video question answering, and video dialogue, and constitute the key behavioral components of visual understanding; within them, a machine learning paradigm that jointly exploits knowledge and priors has emerged, with visual question answering/dialogue and vision-language navigation as two key research directions. Emotion recognition and synthesis focus on facial expression recognition, speech emotion recognition and synthesis, and knowledge-guided visual analysis, and form the core technologies of affective interaction. Centered on these key technologies, this paper reviews the research hotspots and application scenarios of person visual understanding in complex scenes, summarizes related achievements and progress at home and abroad, and discusses the frontier techniques and development trends of the field.

5.
Chen  Jingying  Xu  Ruyi  Liu  Leyuan 《Multimedia Tools and Applications》2018,77(22):29871-29887

Facial expression recognition (FER) is important in vision-related applications. Deep neural networks demonstrate impressive performance for face recognition; however, this approach relies heavily on a great deal of manually labeled training data, which is not available for facial expressions in real-world applications. Hence, we propose a powerful facial feature called the deep peak–neutral difference (DPND) for FER. DPND is defined as the difference between the deep representations of the fully expressive (peak) and neutral facial expression frames. The difference tends to emphasize the facial parts that change in the transition from the neutral to the expressive face and to suppress the face-identity information retained in the deep neural network, which was trained on a large-scale face recognition dataset and fine-tuned for facial expression. Furthermore, unsupervised clustering and semi-supervised classification methods are presented to automatically select the neutral and peak frames from an expression sequence. The proposed facial expression feature achieved encouraging results on public databases, suggesting strong potential for recognizing facial expressions in real-world applications.
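The DPND feature itself is a simple difference of deep representations. The sketch below substitutes a random projection for the fine-tuned face-recognition CNN, so only the structure of the feature computation is shown; all sizes and data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 64 * 64))          # stand-in for a deep network's embedding

def deep_features(face_img):
    """Placeholder for the deep representation; in the paper this would be a
    face-recognition CNN fine-tuned for expression. Here: a fixed random projection."""
    return W @ face_img.reshape(-1)

def dpnd(peak_img, neutral_img):
    """Deep peak-neutral difference: emphasizes expression-related changes and
    cancels identity information shared by the two frames."""
    return deep_features(peak_img) - deep_features(neutral_img)

neutral = rng.random((64, 64))               # toy neutral frame
peak = neutral + 0.1 * rng.random((64, 64))  # toy expressive (peak) frame
feature = dpnd(peak, neutral)                # fed to an expression classifier
```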


6.
Automatic analysis of human facial expression is a challenging problem with many applications. Most existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Rather than offering another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing the facial muscle actions that produce expressions. Virtually all existing vision systems for facial muscle action detection deal only with frontal-view face images and cannot handle the temporal dynamics of facial actions. In this paper, we present a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences. We exploit particle filtering to track 15 facial points in an input face-profile sequence, and we introduce facial-action-dynamics recognition from continuous video input using temporal rules. The algorithm performs both automatic segmentation of an input video into the facial expressions pictured and recognition of the temporal segments (i.e., onset, apex, offset) of 27 AUs occurring alone or in combination in the input face-profile video. A recognition rate of 87% is achieved.
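The onset/apex/offset segmentation can be approximated with simple slope-based temporal rules applied to an AU intensity curve; the thresholds below are illustrative stand-ins for the paper's actual rule set.

```python
import numpy as np

def temporal_segments(intensity, rise=0.02, fall=-0.02):
    """Label each frame of an AU intensity curve as onset/apex/offset/neutral
    using slope thresholds -- a toy stand-in for rule-based temporal segmentation."""
    labels = []
    velocity = np.gradient(intensity)
    for v, a in zip(velocity, intensity):
        if a < 0.1:
            labels.append("neutral")      # activation too weak to count
        elif v > rise:
            labels.append("onset")        # intensity increasing
        elif v < fall:
            labels.append("offset")       # intensity decreasing
        else:
            labels.append("apex")         # near-constant peak activation
    return labels

t = np.linspace(0, 1, 60)
intensity = np.clip(np.sin(np.pi * t), 0, None)   # synthetic AU activation curve
print(temporal_segments(intensity)[:10])
```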

7.
8.
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches to solving the problem of machine understanding of human affective behavior and discuss important issues such as the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

9.
Facial Expression Recognition Based on D-S Evidence Theory
王嵘  马希荣 《计算机科学》2009,36(1):231-233
Building on affective computing theory, a facial expression recognition technique using D-S evidence-theoretic information fusion is proposed, and a system named IFFER is designed and implemented. The classifiers in the expression recognition module are trained on the JAFFE expression database. During recognition, chroma matching and luminance matching are first used to segment the eye and mouth regions of the face image; the trained eye SVM classifier and mouth SVM classifier then recognize each region separately, and the two results are fused with D-S evidence theory. Experimental results show that recognizing the two segmented regions greatly reduces the data dimensionality for both training and recognition, improving efficiency, and that the recognition rate after fusion is significantly higher than before fusion.
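The fusion step follows Dempster's rule of combination. A small sketch over singleton expression hypotheses plus the ignorance set THETA is shown below; the mass values standing in for the eye and mouth SVM outputs are assumed.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions over singleton classes,
    with 'THETA' denoting the full frame of discernment (ignorance)."""
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        if a == "THETA":
            key = b                       # THETA intersected with anything = that set
        elif b == "THETA" or a == b:
            key = a
        else:                             # disjoint singletons -> conflicting evidence
            conflict += pa * pb
            continue
        combined[key] = combined.get(key, 0.0) + pa * pb
    k = 1.0 - conflict                    # normalization constant
    return {h: v / k for h, v in combined.items()}

# Assumed mass functions derived from the eye-region and mouth-region SVM scores.
m_eye   = {"happy": 0.6, "sad": 0.2, "THETA": 0.2}
m_mouth = {"happy": 0.5, "sad": 0.3, "THETA": 0.2}
fused = dempster_combine(m_eye, m_mouth)
label = max(fused, key=fused.get)          # fused expression decision
```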

10.
An Affective Interaction System Based on Facial Expression Features
徐红  彭力 《计算机应用研究》2012,29(3):1111-1115
An affective interaction system (an affective virtual human) based on facial expression features is designed, with three key components: emotion recognition, emotion computation, and emotion synthesis and output. For emotion recognition, a feature-block method first preprocesses static facial expression images, two-dimensional principal component analysis (2DPCA) then extracts features, and a multi-level quantum neural network classifier finally performs seven-class expression classification. For emotion computation, a hidden Markov emotion model (HMM) is built and its parameters are estimated with an improved genetic algorithm. For emotion synthesis and output, a 3D face mesh model is first built with an algorithm combining NURBS surfaces and patches, and keyframe techniques then produce continuous expression animation consistent with human behavior. On this basis, the affective interaction system based on facial expression features is completed.
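The 2DPCA feature-extraction step works on 2-D image matrices directly rather than on vectorized images. A minimal sketch follows; the toy random images and component count are assumptions for illustration only.

```python
import numpy as np

def two_d_pca(images, n_components=5):
    """2DPCA: eigenvectors of the image covariance matrix built from 2-D images
    directly (no vectorization). Returns the column projection matrix X."""
    mean = images.mean(axis=0)
    G = np.zeros((images.shape[2], images.shape[2]))
    for A in images:
        D = A - mean
        G += D.T @ D                               # accumulate image covariance
    G /= len(images)
    eigvals, eigvecs = np.linalg.eigh(G)           # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_components]      # keep the top eigenvectors

rng = np.random.default_rng(0)
faces = rng.random((20, 48, 48))                   # toy stand-ins for face images
X = two_d_pca(faces, n_components=5)
features = faces[0] @ X                            # 48x5 feature matrix for one face
```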

11.
Considerable research has been done on using information from multiple modalities, such as hands, facial gestures, or speech, for better interaction between humans and computers, and many promising human-computer interfaces (HCI) have been developed in recent years. However, most current HCI systems have a few drawbacks. First, they are highly dependent on the performance of individual sensors. Second, the information fusion process from these sensors tends to ignore the semantic nature of the modalities, which may reinforce or clarify each other over time. Finally, they are not robust enough at representing the imprecise nature of human gestures, since individual gestures are highly ambiguous in themselves. In this paper, we propose an approach for the semantic fusion of different input modalities based on transferable belief models. We show that this approach allows a better representation of the ambiguity involved in recognizing gestures. Ambiguity is resolved by combining the beliefs of the individual sensors on the input information to form new extended concepts, based on a predefined domain-specific knowledge base represented by conceptual graphs. We apply this technique to a multimodal system consisting of a hand gesture recognition sensor and a brain-computer interface. It is shown that the technique can successfully combine individual gestures obtained from the two sensors to form meaningful concepts and resolve ambiguity. The advantage of this approach is that it remains robust even if one of the sensors is inefficient or has no input. Another important feature is its scalability: more input modalities, such as speech or facial gestures, can be easily integrated into the system at minimal cost to form a comprehensive HCI system.

12.
For face recognition scenarios in which each person has only a small number of training samples, a difference dictionary composed of unrelated (auxiliary) classes is used on top of representation-based classification (RBC). A difference dictionary is usually built from faces exhibiting pose and expression variation together with their reference faces, and good recognition requires the training samples themselves to be reference faces. To prevent non-reference faces in a small training set from degrading the difference dictionary, gray-level symmetric faces are used to convert non-reference training faces into approximate reference faces before the difference dictionary is trained. Experimental results show that the method performs well on the ORL, GT (Georgia Tech), and FERET face databases under small-sample conditions.
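The gray-level symmetric face construction can be sketched by mirroring each half of a face image to synthesize two approximately symmetric, reference-like faces; the implementation below is one plausible reading of the idea, not the paper's code.

```python
import numpy as np

def symmetric_faces(face):
    """Gray-level symmetric faces: mirror the left half onto the right half
    (and vice versa) to approximate a reference (near-frontal, neutral) face."""
    h, w = face.shape
    half = w // 2
    left, right = face[:, :half], face[:, w - half:]
    left_sym = np.hstack([left, left[:, ::-1]])      # left half + its mirror image
    right_sym = np.hstack([right[:, ::-1], right])   # mirror of right half + right half
    return left_sym, right_sym

face = np.random.default_rng(0).random((64, 64))     # stand-in training face
left_sym, right_sym = symmetric_faces(face)          # approximate reference faces
```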

13.
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
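The core MU representation, a facial deformation approximated as a linear combination of motion units weighted by MU parameters (MUPs), can be sketched as follows. The MU matrix and MUP values are placeholders; in the paper, the MUPs would come from the trained audio-to-MUP neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_mus = 500, 8
neutral_shape = rng.random(n_vertices * 3)               # flattened neutral face mesh
motion_units = rng.normal(size=(n_mus, n_vertices * 3))  # learned MUs (assumed values)

def deform(mu_params):
    """Facial deformation approximated as a linear combination of motion units
    (MUs) weighted by MU parameters (MUPs), added to the neutral shape."""
    return neutral_shape + mu_params @ motion_units

# Arbitrary MUP vector standing in for the output of the audio-to-MUP mapping.
mups = np.array([0.3, -0.1, 0.0, 0.5, 0.0, 0.0, 0.2, 0.0])
face_frame = deform(mups)                                 # one animated face frame
```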

14.
For effective interaction between humans and socially adept, intelligent service robots, a key capability required by this class of sociable robots is the successful interpretation of visual data. In addition to crucial techniques like human face detection and recognition, an important next step for enabling intelligence and empathy within social robots is emotion recognition. In this paper, an automated and interactive computer vision system is investigated for human facial expression recognition and tracking based on facial structure features and movement information. Twenty facial features are adopted since they are more informative and prominent, reducing ambiguity during classification. An unsupervised learning algorithm, distributed locally linear embedding (DLLE), is introduced to recover the inherent properties of scattered data lying on a manifold embedded in the high-dimensional input facial images. The selected person-dependent facial expression images in a video are classified using the DLLE. In addition, facial expression motion energy is introduced to describe the tension of the facial muscles during expressions for person-independent recognition. This method takes advantage of optical flow, which tracks the feature points' movement information. Finally, experimental results show that our approach is able to separate different expressions successfully.

15.
Psychologists have long explored the mechanisms with which humans recognize other humans' affective states from modalities such as voice and face display. This exploration has led to the identification of the main mechanisms, including the important role played in the recognition process by the modalities' dynamics. Constrained by human physiology, the temporal evolution of a modality appears to be well approximated by a sequence of temporal segments called onset, apex, and offset. Stemming from these findings, computer scientists have, over the past 15 years, proposed various methodologies to automate the recognition process. We note, however, two main limitations to date. The first is that much of the past research has focused on affect recognition from single modalities. The second is that even the few multimodal systems have not paid sufficient attention to the modalities' dynamics: the automatic determination of their temporal segments, their synchronization for the purpose of modality fusion, and their role in affect recognition are yet to be adequately explored. To address this issue, this paper focuses on affective face and body display, proposes a method to automatically detect their temporal segments or phases, explores whether the detection of the temporal phases can effectively support recognition of affective states, and recognizes affective states based on phase synchronization/alignment. The experimental results show the following: 1) affective face and body displays are simultaneous but not strictly synchronous; 2) explicit detection of the temporal phases can improve the accuracy of affect recognition; 3) recognition from fused face and body modalities performs better than that from the face or the body modality alone; and 4) synchronized feature-level fusion achieves better performance than decision-level fusion.

16.
17.
To reduce the impact of appearance, pose, eyeglasses, and inconsistently defined expressions on facial expression recognition, a collaborative expression recognition algorithm that is independent of facial appearance is proposed. First, an automatic face detection algorithm locates and aligns the face region in every video frame, and the peak-expression face is selected from the face video sequence. Then, the peak face and all faces within an expression class are used to generate intra-class difference face information, and a collaborative expression representation is obtained by computing the difference between the peak-expression face and the intra-class difference faces. Finally, a sparse-representation-based classifier assigns an expression label to each face. Simulation experiments on databases of European, American, and Asian faces show that the algorithm achieves good expression recognition accuracy and also handles faces with different appearances and subjects wearing glasses well.

18.
A novel method based on the fusion of texture and shape information is proposed for facial expression and Facial Action Unit (FAU) recognition from video sequences. For facial expression recognition, a subspace method based on Discriminant Non-negative Matrix Factorization (DNMF) is applied to the images, thus extracting the texture information. To extract the shape information, the system first extracts the deformed Candide facial grid that corresponds to the facial expression depicted in the video sequence. A Support Vector Machine (SVM) system designed on a Euclidean space, defined over a novel metric between grids, is used for the classification of the shape information. For FAU recognition, the texture extraction method (DNMF) is applied to the difference images of the video sequence, calculated taking into account the neutral and the expressive frames. An SVM system is used for FAU classification from the shape information; in this case, the shape information consists of the grid node coordinate displacements between the neutral and the expressive facial expression frames. The fusion of texture and shape information is performed using various approaches, among which are SVMs and Median Radial Basis Functions (MRBFs), in order to detect the facial expression and the set of present FAUs. The accuracy achieved using the Cohn–Kanade database is 92.3% when recognizing the seven basic facial expressions (anger, disgust, fear, happiness, sadness, surprise, and neutral), and 92.1% when recognizing the 17 FAUs that are responsible for facial expression development.

19.
Research Progress on Facial Expression Recognition in Human-Computer Interaction
With the rapid development of human-computer interaction and affective computing, facial expression recognition has become a research hotspot. To clarify the research directions and progress of facial expression recognition in human-computer interaction, this paper reviews the state of the art in terms of facial expression databases, expression feature extraction, expression classification methods, robust expression recognition, fine-grained expression recognition, mixed expression recognition, and non-basic expression recognition. Finally, it summarizes current hotspots and trends in facial expression recognition research, points out the limitations of existing work, and offers an outlook on future development.

20.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation.
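The PIEES construction, neutral subtraction followed by PCA over the residual expression signals, can be sketched as below, assuming the phoneme-based time-warping has already aligned the sequences; the marker data are synthetic placeholders.

```python
import numpy as np

def build_piees(expressive, neutral, n_components=10):
    """Phoneme-independent expression eigenspace (PIEES): subtract time-aligned
    neutral visual speech from the expressive sequence, then apply PCA to the
    residual expression signal. Time-warping is assumed to be done beforehand."""
    residual = expressive - neutral                 # (frames, marker_dims)
    mean = residual.mean(axis=0)
    centered = residual - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]                  # mean signal and eigenspace basis

rng = np.random.default_rng(0)
neutral_seq = rng.random((200, 90))                 # 200 frames, 30 3-D markers (toy data)
expressive_seq = neutral_seq + 0.05 * rng.random((200, 90))
mean, basis = build_piees(expressive_seq, neutral_seq)
coeffs = (expressive_seq[0] - neutral_seq[0] - mean) @ basis.T   # expression coordinates
```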
