20 similar documents found; search time: 31 ms
1.
Towards modeling embodied conversational agent character profiles using appraisal theory predictions in expression synthesis    Cited by: 2 (self-citations: 2, citations by others: 0)
Appraisal theories in psychology study facial expressions in order to deduce information regarding the underlying emotion elicitation processes. Scherer’s component process model provides predictions regarding particular face muscle deformations that are attributed as reactions to the cognitive appraisal stimuli in the study of emotion episodes. In the current work, MPEG-4 facial animation parameters are used to evaluate these theoretical predictions for intermediate and final expressions of a given emotion episode. We manipulate parameters such as the intensity and temporal evolution of synthesized facial expressions. In emotion episodes originating from identical stimuli, by varying the cognitive appraisals of the stimuli and mapping them to different expression intensities and timings, various behavioral patterns can be generated and thus different agent character profiles can be defined. The results of the synthesis process are then applied to Embodied Conversational Agents (ECAs), aiming to render their interaction with humans, or other ECAs, more affective.
2.
We propose a method for automatically copying facial motion from one 3D face model to another, while preserving the compliance of the motion to the MPEG-4 Face and Body Animation (FBA) standard. Despite the enormous progress in the field of Facial Animation, producing a new animatable face from scratch is still a tremendous task for an artist. Although many methods exist to animate a face automatically based on procedural methods, these methods still need to be initialized by defining facial regions or similar, and they lack flexibility because the artist can only obtain the facial motion that a particular algorithm offers. Therefore, a very common approach is interpolation between key facial expressions, usually called morph targets, containing either speech elements (visemes) or emotional expressions. Following the same approach, the MPEG-4 Facial Animation specification offers a method for interpolation of facial motion from key positions, called Facial Animation Tables, which are essentially morph targets corresponding to all possible motions specified in MPEG-4. The problem with this approach is that the artist needs to create a new set of morph targets for each new face model. In the case of MPEG-4 there are 86 morph targets, which is a lot of work to create manually. Our method solves this problem by cloning the morph targets, i.e. by automatically copying the motion of vertices, as well as geometry transforms, from the source face to the target face while maintaining the regional correspondences and the correct scale of motion. It requires the user only to identify a subset of the MPEG-4 Feature Points in the source and target faces. The scale of the movement is normalized with respect to MPEG-4 normalization units (FAPUs), meaning that the MPEG-4 FBA compliance of the copied motion is preserved. Our method is therefore suitable not only for cloning free facial expressions, but also MPEG-4-compatible facial motion, in particular the Facial Animation Tables.
We believe that Facial Motion Cloning offers dramatic time savings to artists producing morph targets for facial animation or MPEG-4 Facial Animation Tables.
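The key normalization step described above — expressing cloned vertex motion in FAPUs so it stays face-independent — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the use of a single FAPU value per displacement are assumptions for clarity.

```python
def scale_displacement(disp, source_fapu, target_fapu):
    """Rescale a per-vertex displacement (dx, dy, dz) cloned from a
    source face onto a target face, using the ratio of the two faces'
    MPEG-4 FAPUs.  A FAPU (Facial Animation Parameter Unit) is a
    face-specific distance (e.g. eye-nose separation), so motion
    expressed in FAPUs transfers between differently sized faces.

    Hypothetical helper for illustration only.
    """
    ratio = target_fapu / source_fapu
    return tuple(d * ratio for d in disp)

# A smile displacement measured on a source face whose FAPU is 60
# model units, applied to a smaller target face whose FAPU is 30:
print(scale_displacement((4.0, -2.0, 0.0), 60.0, 30.0))  # (2.0, -1.0, 0.0)
```

The same motion thus lands at the correct scale on the target face, which is what preserves FBA compliance of the copied animation.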
3.
For effective interaction between humans and socially adept, intelligent service robots, a key capability required by this class of sociable robots is the successful interpretation of visual data. In addition to crucial techniques like human face detection and recognition, an important next step for enabling intelligence and empathy within social robots is emotion recognition. In this paper, an automated and interactive computer vision system is investigated for human facial expression recognition and tracking based on facial structure features and movement information. Twenty facial features are adopted since they are more informative and prominent for reducing ambiguity during classification. An unsupervised learning algorithm, distributed locally linear embedding (DLLE), is introduced to recover the inherent properties of scattered data lying on a manifold embedded in high-dimensional input facial images. The selected person-dependent facial expression images in a video are classified using the DLLE. In addition, facial expression motion energy is introduced to describe the facial muscles’ tension during expressions, for person-independent tracking and recognition. This method takes advantage of optical flow, which tracks the feature points’ movement information. Finally, experimental results show that our approach is able to separate different expressions successfully.
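The "motion energy" idea in this abstract — summarizing muscle tension from the optical-flow displacements of tracked feature points — admits a very small sketch. This is a hedged reading of the concept, not the paper's definition; the function name and the choice of summing flow magnitudes are assumptions.

```python
def motion_energy(flow_field):
    """Rough facial-expression motion energy: the sum of optical-flow
    magnitudes over the tracked feature points.  Larger values indicate
    stronger facial muscle movement during the expression.

    flow_field: iterable of (dx, dy) per-feature-point displacements.
    """
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in flow_field)

# One feature point moved by (3, 4) pixels, another stayed still:
print(motion_energy([(3.0, 4.0), (0.0, 0.0)]))  # 5.0
```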
4.
Facial expression control is an important topic in biometric recognition research. This paper proposes a method for synthesizing 3D facial expressions based on the FAP and FAT standards in MPEG-4. First, key points are defined on the face and the face is segmented into regions; then a surface-oriented free-form deformation (SOFFD) method is used to generate facial expressions. When generating expressions for a specific face, a proportional blending of basis expressions is used. Experiments show that the method can effectively synthesize a variety of realistic facial expressions.
5.
Avatars are increasingly used to express our emotions in our online communications. Such avatars are used based on the assumption that avatar expressions are interpreted universally among all cultures. This paper investigated cross-cultural evaluations of avatar expressions designed by Japanese and Western designers. The goals of the study were: (1) to investigate cultural differences in avatar expression evaluation and apply findings from psychological studies of human facial expression recognition, and (2) to identify expressions and design features that cause cultural differences in avatar facial expression interpretation. The results of our study confirmed that (1) there are cultural differences in interpreting avatars’ facial expressions, and the psychological theory that suggests physical proximity affects facial expression recognition accuracy is also applicable to avatar facial expressions, (2) positive expressions have wider cultural variance in interpretation than negative ones, and (3) use of gestures and gesture marks may sometimes cause counter-effects in recognizing avatar facial expressions.
6.
George Caridakis, Amaryllis Raouzaiou, Elisabetta Bevacqua, Maurizio Mancini, Kostas Karpouzis, Lori Malatesta, Catherine Pelachaud. Language Resources and Evaluation, 2007, 41(3-4): 367-388
This work is about multimodal and expressive synthesis on virtual agents, based on the analysis of actions performed by human users. As input we consider the image sequence of the recorded human behavior. Computer vision and image processing techniques are incorporated in order to detect the cues needed for expressivity feature extraction. The multimodality of the approach lies in the fact that both facial and gestural aspects of the user’s behavior are analyzed and processed. The mimicry consists of perception, interpretation, planning, and animation of the expressions shown by the human, resulting not in an exact duplicate but rather in an expressive model of the user’s original behavior.
7.
8.
Variations in illumination degrade the performance of appearance-based face recognition. We present a novel algorithm for the normalization of color facial images using a single image and its co-registered 3D pointcloud (3D image). The algorithm borrows the physically based Phong lighting model from computer graphics, which is used for rendering computer images, and employs it in a reverse mode for the calculation of face albedo from real facial images. Our algorithm estimates the number of dominant light sources and their directions from the specularities in the facial image and the corresponding 3D points. The intensities of the light sources and the parameters of the Phong model are estimated by fitting the Phong model onto the facial skin data. Unlike existing approaches, our algorithm takes into account both Lambertian and specular reflections as well as attached and cast shadows. Moreover, our algorithm is invariant to facial pose and expression and can effectively handle the case of multiple extended light sources. The algorithm was tested on the challenging FRGC v2.0 data and satisfactory results were achieved. The mean fitting error was 6.3% of the maximum color value. Performing face recognition using the normalized images increased both identification and verification rates.
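The "reverse-mode" use of the Phong model described above can be illustrated with a single-light, grey-level sketch: compute the Phong intensity forward, then invert it for the diffuse albedo once the lighting terms are known. This is a minimal sketch under strong assumptions (one light source, known parameters, scalar intensity); the paper's algorithm additionally estimates the lights and handles shadows.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(albedo, normal, light_dir, view_dir,
                    ambient=0.1, k_spec=0.2, shininess=10.0):
    """Forward Phong model: intensity at a surface point lit by one
    light source.  All direction vectors are unit-length tuples; the
    parameter values here are illustrative, not estimated."""
    n_dot_l = max(_dot(normal, light_dir), 0.0)          # Lambertian term
    # Reflection of the light direction about the surface normal.
    r = tuple(2.0 * n_dot_l * n - l for n, l in zip(normal, light_dir))
    specular = k_spec * max(_dot(r, view_dir), 0.0) ** shininess
    return albedo * (ambient + n_dot_l) + specular

def estimate_albedo(intensity, normal, light_dir, view_dir,
                    ambient=0.1, k_spec=0.2, shininess=10.0):
    """The model run in reverse: given the observed intensity and the
    lighting, solve for the diffuse albedo at that point."""
    n_dot_l = max(_dot(normal, light_dir), 0.0)
    r = tuple(2.0 * n_dot_l * n - l for n, l in zip(normal, light_dir))
    specular = k_spec * max(_dot(r, view_dir), 0.0) ** shininess
    return (intensity - specular) / (ambient + n_dot_l)
```

A frontal point lit and viewed head-on round-trips correctly: `estimate_albedo(phong_intensity(a, n, n, n), n, n, n)` recovers `a` for `n = (0, 0, 1)`.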
9.
We introduce the WASABI ([W]ASABI [A]ffect [S]imulation for [A]gents with [B]elievable [I]nteractivity) Affect Simulation Architecture, in which a virtual human’s cognitive reasoning capabilities are combined with simulated embodiment to achieve the simulation of primary and secondary emotions. In modeling primary emotions we follow the idea of “Core Affect” in combination with a continuous progression of bodily feeling in three-dimensional emotion space (PAD space), which is subsequently categorized into discrete emotions. In humans, primary emotions are understood as ontogenetically earlier emotions, which directly influence facial expressions. Secondary emotions, in contrast, afford the ability to reason about current events in the light of experiences and expectations. By technically representing aspects of each secondary emotion’s connotative meaning in PAD space, we not only assure their mood-congruent elicitation, but also combine them with facial expressions that are concurrently driven by primary emotions. Results of an empirical study suggest that human players in a card game scenario judge our virtual human MAX significantly older when secondary emotions are simulated in addition to primary ones.
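The categorization step — mapping a continuous point in PAD (pleasure-arousal-dominance) space to a discrete emotion — can be sketched as a nearest-anchor lookup. The anchor coordinates below are hypothetical values for illustration; the actual WASABI emotion coordinates differ.

```python
import math

# Hypothetical PAD anchors for a few primary emotions (illustrative
# coordinates, not the ones used in the WASABI architecture).
PAD_ANCHORS = {
    "joy":     ( 0.8,  0.6,  0.4),
    "anger":   (-0.6,  0.7,  0.3),
    "sadness": (-0.6, -0.4, -0.3),
    "boredom": ( 0.0, -0.8, -0.2),
}

def categorize(pad):
    """Map a continuous PAD point to the nearest discrete emotion
    by Euclidean distance to the anchors."""
    return min(PAD_ANCHORS, key=lambda e: math.dist(PAD_ANCHORS[e], pad))

print(categorize((0.7, 0.5, 0.2)))  # joy
```

A continuous "bodily feeling" trajectory through PAD space then yields a stream of discrete emotion labels as it drifts between anchor neighborhoods.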
10.
Li Huang, Congyong Su. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2006, 10(12): 1193-1200
Given a person’s neutral face, we can predict his/her unseen expression by machine learning techniques for image processing. Different from prior expression cloning or image analogy approaches, we try to hallucinate the person’s plausible facial expression with the help of a large facial expression database. In the first step, regularization-network-based nonlinear manifold learning is used to obtain a smooth estimation of the unseen facial expression, which is better than the reconstruction results of PCA. In the second step, a Markov network is adopted to learn the low-level local facial features’ relationship between the residual neutral and the expressional face images’ patches in the training set; belief propagation is then employed to infer the expressional residual face image for that person. By integrating the two approaches, we obtain the final results. The experimental results show that the hallucinated facial expression is not only expressive but also close to the ground truth.
11.
12.
Gaobo Yang, Weiwei Chen, Qiya Zhou, Zhaoyang Zhang. Journal of Real-Time Image Processing, 2009, 4(4): 303-316
This paper presents a compressed-domain motion object extraction algorithm based on optical flow approximation for MPEG-2 video streams. The discrete cosine transform (DCT) coefficients of P and B frames are estimated to reconstruct the DC + 2AC image using their motion vectors and the DCT coefficients in I frames, which can be directly extracted from the MPEG-2 compressed domain. Initial optical flow is estimated with Black’s optical flow estimation framework, in which the DC image is substituted by the DC + 2AC image to provide more intensity information. A high confidence measure is exploited to generate a dense and accurate motion vector field by removing noisy and false motion vectors. Global motion estimation and iterative rejection are further utilized to separate foreground and background motion vectors. Region growing with automatic seed selection is performed to extract accurate object boundaries via a motion consistency model. The object boundary is further refined by partially decoding the boundary blocks to improve accuracy. Experimental results on several test sequences demonstrate that the proposed approach can achieve compressed-domain video object extraction for MPEG-2 video streams in CIF format with real-time performance.
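The "iterative rejection" step for separating foreground from background motion vectors can be sketched with a deliberately simplified global model: fit the global motion as the mean of the remaining vectors, discard vectors far from it, and repeat until stable. This is an illustrative reduction; compressed-domain schemes typically fit richer (e.g. affine or perspective) global motion models, and the function name and threshold are assumptions.

```python
def separate_foreground(motion_vectors, threshold=1.5, max_iter=10):
    """Iterative-rejection sketch.  motion_vectors: list of (dx, dy)
    block motion vectors (assumed non-empty).  Returns (background,
    foreground): vectors consistent with the fitted global motion,
    and the rejected outliers treated as moving objects."""
    inliers = list(motion_vectors)
    for _ in range(max_iter):
        # Fit a purely translational global motion as the mean vector.
        gx = sum(v[0] for v in inliers) / len(inliers)
        gy = sum(v[1] for v in inliers) / len(inliers)
        kept = [v for v in inliers
                if ((v[0] - gx) ** 2 + (v[1] - gy) ** 2) ** 0.5 <= threshold]
        if len(kept) == len(inliers):   # converged: nothing rejected
            break
        inliers = kept
    background = set(inliers)
    return inliers, [v for v in motion_vectors if v not in background]

# Eight background blocks panning right, two fast foreground blocks:
bg_vecs, fg_vecs = separate_foreground(
    [(1.0, 0.0)] * 8 + [(6.0, 5.0), (7.0, 5.0)])
# bg_vecs: the eight (1.0, 0.0) vectors; fg_vecs: the two outliers.
```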
13.
Sidra Batool Kazmi, Qurat-ul-Ain, M. Arfan Jaffar. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2012, 16(3): 369-379
A human face not only plays a role in the identification of an individual but also communicates useful information about a person’s emotional state at a particular time. No wonder automatic facial expression recognition has become an area of great interest within the computer science, psychology, medicine, and human–computer interaction research communities. Various feature extraction techniques, based on data ranging from statistical to geometrical, have been used for recognition of expressions from static images as well as real-time videos. In this paper, we present a method for automatic recognition of facial expressions from face images by providing discrete wavelet transform features to a bank of seven parallel support vector machines (SVMs). Each SVM is trained to recognize a particular facial expression, so that it is most sensitive to that expression. Multi-classification is achieved by combining multiple SVMs performing binary classification using the one-against-all approach. The outputs of all SVMs are combined using a maximum function. The classification efficiency is tested on static images from the publicly available Japanese Female Facial Expression database. The experiments using the proposed method demonstrate promising results.
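The combination rule described above — seven one-against-all binary classifiers merged with a maximum function — reduces to a one-liner once each SVM's decision value is available. The sketch below shows only that final step with hypothetical decision values; training the SVM bank and extracting the wavelet features are omitted.

```python
def classify(decision_values):
    """Combine a bank of one-against-all binary classifiers with a
    maximum function: each SVM emits a decision value for 'its'
    expression, and the expression whose SVM is most confident wins.

    decision_values: dict mapping expression name -> SVM decision value.
    """
    return max(decision_values, key=decision_values.get)

# Hypothetical decision values from seven expression-specific SVMs:
scores = {"anger": -0.4, "disgust": -1.1, "fear": -0.8,
          "happiness": 1.3, "neutral": 0.2, "sadness": -0.9,
          "surprise": 0.6}
print(classify(scores))  # happiness
```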
14.
The construction and animation of realistic 3D face models is an important research topic in computer graphics. One of its difficulties is how to simulate facial motion on a 3D face model in real time, producing realistic facial expressions and movements. This paper proposes a real-time 3D facial animation method that partitions the face model into several functional regions whose motions are relatively independent, and then simulates the motion of each region using a proposed hybrid technique combining weighted Dirichlet free-form deformation (DFFD) and rigid-body motion simulation. Mutual influences between the motions of different regions are simulated through shared control points. In this method, the motion of the face model is driven by moving control points. To simplify driving the model, two high-level driving methods are proposed: one based on MPEG-4 facial animation parameter (FAP) streams and one based on a muscle model. Both methods not only achieve high realism but also have good computational performance, and can simulate the expressions and movements of real faces in real time.
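The control-point-driven deformation idea can be sketched as a weighted blend: each mesh vertex moves by a weighted combination of its control points' displacements. This is a deliberate simplification; true DFFD derives the weights from Sibson (natural-neighbor) coordinates, which is omitted here, and all names and values are illustrative.

```python
def deform_vertex(vertex, weights, control_displacements):
    """Move one mesh vertex by a weight-blended combination of its
    control points' displacements (simplified stand-in for weighted
    DFFD; real DFFD computes the weights via Sibson coordinates)."""
    moved = list(vertex)
    for w, disp in zip(weights, control_displacements):
        for i, d in enumerate(disp):
            moved[i] += w * d
    return tuple(moved)

# A vertex influenced half by a mouth-corner control point moving up
# and half by a stationary control point:
print(deform_vertex((1.0, 2.0, 0.0), [0.5, 0.5],
                    [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]))  # (1.0, 2.5, 0.0)
```

Driving the model then amounts to translating each FAP (or muscle activation) into control-point displacements and re-evaluating this blend per frame.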
15.
张亚妮. 《计算机应用与软件》 (Computer Applications and Software), 2003, 20(9): 61-62, 82
The object-based coding format proposed in MPEG-4 treats the human face as a special object, laying the foundation for research on face modeling and animation. Through an analysis of the MPEG-4 facial animation standard, this paper presents design ideas for an MPEG-4-based facial animation system and the key problems that need to be solved.
16.
To generate natural and realistic facial expressions in real time, an image-warping method for facial expressions based on the MPEG-4 facial animation framework is proposed. The method first extracts 88 feature points from a face photograph using a face alignment tool. On this basis, a standard face mesh is calibrated and deformed to generate a triangular mesh for the specific face. Key facial feature points and their associated neighboring points are then moved according to the facial animation parameters (FAPs), while the topology of the triangular mesh is kept unchanged under the combined action of multiple FAPs. Finally, the facial texture of all deformed triangular regions is filled in via affine transformations, producing the facial expression image defined by the FAPs. The input of the method is a neutral face photograph and a set of FAPs; the output is the corresponding facial expression image. To synthesize subtle expression movements and a virtual talking head, algorithms for generating eye-gaze movements and inner-mouth texture details are also designed. A subjective evaluation based on the 5-point mean opinion score (MOS) shows that expression images generated by this warping method score 3.67 for naturalness. Experiments on virtual talking-head synthesis show that the method has good real-time performance, with an average processing speed of 66.67 fps on an ordinary PC, making it suitable for real-time video processing and facial animation generation.
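The per-triangle affine texture fill described above rests on a standard construction: express a pixel in barycentric coordinates of its source triangle, then rebuild it in the deformed triangle. The sketch below warps a single point under that assumption; the function names are illustrative, not from the paper.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (w1, w2, w3) of 2D point p in
    triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2

def warp_point(p, src_tri, dst_tri):
    """Affine warp of one point: read off its barycentric coordinates
    in the source triangle, rebuild it in the deformed destination
    triangle.  Doing this for every pixel of every triangle yields the
    warped facial texture."""
    w1, w2, w3 = barycentric(p, *src_tri)
    return tuple(w1 * d1 + w2 * d2 + w3 * d3
                 for d1, d2, d3 in zip(*dst_tri))

src = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
dst = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))   # triangle doubled in size
print(warp_point((0.25, 0.25), src, dst))    # (0.5, 0.5)
```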
17.
Visual speech parameter estimation plays an important role in visual speech research. From the facial animation parameters (FAPs) defined in MPEG-4, 24 parameters directly related to articulation are selected to describe visual speech. Combining statistical learning with rule-based methods, lip contours and facial feature points are tracked using the probability distribution of facial color together with prior knowledge of shape and edges, achieving fairly accurate tracking. After filtering out the high-frequency noise in the tracking of reference points, the dominant head pose is estimated from the four most prominent reference points on the face, thereby eliminating the influence of global motion. Finally, accurate visual speech parameters are computed from the motion of these facial feature points, and the method has been put into practical use.
18.
Mei Si, Stacy C. Marsella, David V. Pynadath. Autonomous Agents and Multi-Agent Systems, 2010, 20(1): 14-31
Cognitive appraisal theories, which link human emotional experience to interpretations of events happening in the environment, are leading approaches to modeling emotions. Cognitive appraisal theories have often been used both for simulating “real emotions” in virtual characters and for predicting the human user’s emotional experience to facilitate human–computer interaction. In this work, we investigate the computational modeling of appraisal in a multi-agent decision-theoretic framework using Partially Observable Markov Decision Process-based (POMDP) agents. Domain-independent approaches are developed for five key appraisal dimensions (motivational relevance, motivation congruence, accountability, control, and novelty). We also discuss how the modeling of theory of mind (recursive beliefs about self and others) is realized in the agents and is critical for simulating social emotions. Our model of appraisal is applied to three different scenarios to illustrate its usages. This work not only provides a solution for computationally modeling emotion in POMDP-based agents, but also illustrates the tight relationship between emotion and cognition: the appraisal dimensions are derived from the processes and information required for the agent’s decision-making and belief maintenance processes, which suggests a uniform cognitive structure for emotion and cognition.
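Two of the appraisal dimensions listed above have a natural decision-theoretic reading that can be sketched directly: an event's motivational relevance as the magnitude of the change in expected utility it causes, and its motivation congruence as the sign of that change. This is a hedged simplification for illustration; the paper's POMDP-based formulation of these (and the other three) dimensions is considerably richer.

```python
def appraise(expected_utility_before, expected_utility_after):
    """Simplified appraisal of an event from an agent's expected
    utilities before and after it: relevance = |change|,
    congruence = sign of change (+1 goal-congruent, -1 incongruent,
    0 irrelevant).  Illustrative reading, not the paper's model."""
    delta = expected_utility_after - expected_utility_before
    relevance = abs(delta)
    congruence = 1 if delta > 0 else (-1 if delta < 0 else 0)
    return relevance, congruence

# An event that drops the agent's expected utility from 10 to 6 is
# highly relevant and incongruent with its motivations:
print(appraise(10.0, 6.0))  # (4.0, -1)
```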
19.
论文提出了一种新的基于三维人脸形变模型,并兼容于MPEG-4的三维人脸动画模型。采用基于均匀网格重采样的方法建立原型三维人脸之间的对齐,应用MPEG-4中定义的三维人脸动画规则,驱动三维模型自动生成真实感人脸动画。给定一幅人脸图像,三维人脸动画模型可自动重建其真实感的三维人脸,并根据FAP参数驱动模型自动生成人脸动画。 相似文献
20.
Robert J. Moore, Nicolas Ducheneaut, Eric Nickell. Computer Supported Cooperative Work (CSCW), 2007, 16(3): 265-305
To date, the most popular and sophisticated types of virtual worlds can be found in the area of video gaming, especially in the genre of Massively Multiplayer Online Role Playing Games (MMORPGs). Game developers have made great strides in achieving game worlds that look and feel increasingly realistic. However, despite these achievements in the visual realism of virtual game worlds, the worlds are much less sophisticated when it comes to modeling face-to-face interaction. In face-to-face interaction, ordinary social activities are “accountable,” that is, people use a variety of kinds of observational information about what others are doing in order to make sense of others’ actions and to tightly coordinate their own actions with others. Such information includes: (1) the real-time unfolding of turns-at-talk; (2) the observability of embodied activities; and (3) the direction of eye gaze for the purpose of gesturing. But despite the fact that today’s games provide virtual bodies, or “avatars,” for players to control, these avatars display much less information about players’ current state than real bodies do. In this paper, we discuss the impact of the lack of each type of information on players’ ability to tightly coordinate their activities and offer guidelines for improving coordination and, ultimately, the players’ social experience.
“They come here to talk turkey with suits from around the world, and they consider it just as good as a face-to-face. They more or less ignore what is being said—a lot gets lost in translation, after all. They pay attention to the facial expressions and body language of the people they are talking to. And that’s how they know what’s going on inside a person’s head—by condensing fact from the vapor of nuance.” —Neal Stephenson, Snow Crash, 1992