Similar documents
20 similar documents found.
1.
2.
《Advanced Robotics》2013,27(8):827-852
The purpose of a robot is to execute tasks for people. People should be able to communicate with robots in a natural way. People naturally express themselves through body language using facial gestures and expressions. We have built a human-robot interface based on head gestures for use in robot applications. Our interface can track a person's facial features in real time (30 Hz video frame rate). No special illumination or facial makeup is needed to achieve robust tracking. We use dedicated vision hardware based on correlation image matching to implement the face tracking. Tracking using correlation matching suffers from the problems of changing shade and deformation or even disappearance of facial features. By using multiple Kalman filters, we are able to overcome these problems. Our system can accurately predict and robustly track the positions of facial features despite disturbances and rapid movements of the head (including both translational and rotational motion). Since we can reliably track faces in real time, we are also able to recognize motion gestures of the face. Our system can recognize a large set of gestures (15) ranging from yes, no and maybe to detecting winks, blinks and sleeping. We have used an approach that decomposes each gesture into a set of atomic actions, e.g., a nod for yes consists of an atomic up motion followed by a down motion. Our system can understand gestures by monitoring the transitions between atomic actions.
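As a rough illustration of the prediction/update idea behind the multiple-Kalman-filter tracking described above, the following Python sketch runs a constant-velocity Kalman filter on a single 2D facial feature; the noise values and the 30 Hz frame step are assumptions, and the paper's dedicated correlation-matching hardware and multi-filter setup are not reproduced.

```python
# Minimal constant-velocity Kalman filter for one 2D facial feature point.
# Noise levels below are assumed values, not the paper's tuning.
import numpy as np

dt = 1.0 / 30.0                      # 30 Hz video frame rate
F = np.array([[1, 0, dt, 0],         # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],          # only the (x, y) position is measured
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-2                 # process noise (assumed)
R = np.eye(2) * 2.0                  # measurement noise in pixels (assumed)

x = np.zeros(4)                      # initial state
P = np.eye(4) * 100.0                # initial uncertainty

def step(z, x, P):
    """One predict/update cycle; z is the measured (x, y), or None if the
    feature was lost (occluded or deformed), in which case we only predict."""
    x = F @ x                        # predict
    P = F @ P @ F.T + Q
    if z is not None:                # update with the correlation-match position
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = step(np.array([120.0, 80.0]), x, P)   # feed one measurement
x, P = step(None, x, P)                      # feature lost: coast on prediction
```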

3.
4.
Changes in eyebrow configuration, in conjunction with other facial expressions and head gestures, are used to signal essential grammatical information in signed languages. This paper proposes an automatic recognition system for non-manual grammatical markers in American Sign Language (ASL) based on a multi-scale, spatio-temporal analysis of head pose and facial expressions. The analysis takes account of gestural components of these markers, such as raised or lowered eyebrows and different types of periodic head movements. To advance the state of the art in non-manual grammatical marker recognition, we propose a novel multi-scale learning approach that exploits spatio-temporal low-level and high-level facial features. Low-level features are based on information about facial geometry and appearance, as well as head pose, and are obtained through accurate 3D deformable model-based face tracking. High-level features are based on the identification of gestural events, of varying duration, that constitute the components of linguistic non-manual markers. Specifically, we recognize events such as raised and lowered eyebrows, head nods, and head shakes. We also partition these events into temporal phases. We separate the anticipatory transitional movement (the onset) from the linguistically significant portion of the event, and we further separate the core of the event from the transitional movement that occurs as the articulators return to the neutral position towards the end of the event (the offset). This partitioning is essential for the temporally accurate localization of the grammatical markers, which could not be achieved at this level of precision with previous computer vision methods. In addition, we analyze and use the motion patterns of these non-manual events. Those patterns, together with the information about the type of event and its temporal phases, are defined as the high-level features. Using this multi-scale, spatio-temporal combination of low- and high-level features, we employ learning methods for accurate recognition of non-manual grammatical markers in ASL sentences.
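As an illustration of the onset/core/offset partitioning described above, the sketch below splits a one-dimensional eyebrow-height signal into three temporal phases with a simple velocity threshold; the thresholding rule and parameter values are assumptions for illustration, not the learned models used in the paper.

```python
# Partition a detected "raised eyebrows" event into onset, core and offset
# phases using a velocity threshold. Threshold value is an assumption.
import numpy as np

def partition_event(signal, vel_thresh=0.05):
    """signal: 1D array of eyebrow height over the event's frames.
    Returns (onset, core, offset) as (start, end) index ranges."""
    vel = np.gradient(signal)
    moving = np.abs(vel) > vel_thresh
    # onset: leading frames where the articulator is still moving
    onset_end = 0
    while onset_end < len(signal) and moving[onset_end]:
        onset_end += 1
    # offset: trailing frames where it returns to the neutral position
    offset_start = len(signal)
    while offset_start > onset_end and moving[offset_start - 1]:
        offset_start -= 1
    return (0, onset_end), (onset_end, offset_start), (offset_start, len(signal))

h = np.concatenate([np.linspace(0, 1, 10), np.ones(20), np.linspace(1, 0, 10)])
print(partition_event(h))   # three (start, end) ranges: onset, core, offset
```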

5.
Facial expression is a natural and powerful means of human communication. Recognizing spontaneous facial actions, however, is very challenging due to subtle facial deformation, frequent head movements, and ambiguous and uncertain facial motion measurements. Because of these challenges, current research in facial expression recognition is limited to posed expressions and often in frontal view. A spontaneous facial expression is characterized by rigid head movements and nonrigid facial muscular movements. More importantly, it is the coherent and consistent spatiotemporal interactions among rigid and nonrigid facial motions that produce a meaningful facial expression. Recognizing this fact, we introduce a unified probabilistic facial action model based on the Dynamic Bayesian network (DBN) to simultaneously and coherently represent rigid and nonrigid facial motions, their spatiotemporal dependencies, and their image measurements. Advanced machine learning methods are introduced to learn the model based on both training data and subjective prior knowledge. Given the model and the measurements of facial motions, facial action recognition is accomplished through probabilistic inference by systematically integrating visual measurements with the facial action model. Experiments show that compared to the state-of-the-art techniques, the proposed system yields significant improvements in recognizing both rigid and nonrigid facial motions, especially for spontaneous facial expressions.
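To make the idea of integrating noisy facial-motion measurements with a temporal probabilistic model concrete, here is a toy discrete forward filter over a single facial action unit; it is far simpler than the paper's DBN, and all probabilities are invented.

```python
# Toy forward filtering: fuse noisy per-frame detector outputs with a
# temporal model of one facial action unit. All probabilities are made up.
import numpy as np

states = ["AU_off", "AU_on"]
T = np.array([[0.9, 0.1],        # P(state_t | state_{t-1}): actions persist
              [0.2, 0.8]])
E = np.array([[0.8, 0.2],        # P(observation | state), columns = obs 0/1
              [0.3, 0.7]])
belief = np.array([0.5, 0.5])

observations = [1, 1, 0, 1]       # noisy per-frame detector outputs
for z in observations:
    belief = T.T @ belief         # temporal prediction
    belief = belief * E[:, z]     # fuse the image measurement
    belief /= belief.sum()
    print(dict(zip(states, belief.round(3))))
```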

6.
Automatic analysis of head gestures and facial expressions is a challenging research area with significant applications in human-computer interfaces. We develop a face and head gesture detector for video streams. The detector is based on the facial landmark paradigm, in which both the appearance and the configuration of landmarks are used. First, we detect and accurately track facial landmarks using adaptive templates, a Kalman predictor and subspace regularization. The trajectories (time series) of facial landmark positions over the course of the head gesture or facial expression are then converted into various discriminative features. Features can be landmark coordinate time series, facial geometric features or patches on expressive regions of the face. We compare two feature sequence classifiers, Hidden Markov Models (HMM) and Hidden Conditional Random Fields (HCRF), as well as feature subspace classifiers, namely ICA (Independent Component Analysis) and NMF (Non-negative Matrix Factorization), on the spatiotemporal data. We achieve 87.3% correct gesture classification on a seven-gesture test database, and the performance reaches 98.2% correct detection under a fusion scheme. Promising and competitive results are also achieved on classification of naturally occurring gesture clips from the LIlir TwoTalk Corpus.
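A minimal sketch of the HMM branch of such a classifier, assuming the third-party hmmlearn package: one Gaussian HMM is trained per gesture class on landmark-coordinate time series, and a new clip is assigned to the class with the highest log-likelihood. The state counts and the random stand-in data are placeholders, not the paper's setup.

```python
# Class-conditional HMM classification of landmark trajectories (sketch).
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(clips_per_class, n_states=4):
    """clips_per_class: dict gesture_name -> list of (T_i, n_features) arrays."""
    models = {}
    for name, clips in clips_per_class.items():
        X = np.vstack(clips)
        lengths = [len(c) for c in clips]
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=30)
        m.fit(X, lengths)
        models[name] = m
    return models

def classify(models, clip):
    return max(models, key=lambda name: models[name].score(clip))

# usage with random stand-in data (2 gestures, 20-frame clips of 8 features)
rng = np.random.default_rng(0)
data = {"nod":   [rng.normal(0, 1, (20, 8)) for _ in range(5)],
        "shake": [rng.normal(1, 1, (20, 8)) for _ in range(5)]}
models = train_models(data)
print(classify(models, rng.normal(0, 1, (20, 8))))
```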

7.
8.
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
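The two numerical ideas in this abstract, a deformation expressed as a linear combination of motion units (MUs) weighted by MU parameters (MUPs) and a learned audio-to-MUP mapping, can be sketched as follows; the MLP regressor, feature dimensions and random data below are stand-ins, not the paper's trained networks or MU basis.

```python
# Sketch: deformation = neutral + sum_i MUP_i * MU_i, plus a learned
# audio-to-MUP regression. Shapes and data are invented placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

n_vertices, n_mus = 300, 7
MU = np.random.randn(n_mus, n_vertices * 3)       # stand-in motion-unit basis
neutral = np.zeros(n_vertices * 3)

def deform(mups):
    """Facial deformation as a linear combination of MUs weighted by MUPs."""
    return neutral + mups @ MU

X_audio = np.random.randn(2000, 13)               # stand-in per-frame audio features
Y_mups = np.random.randn(2000, n_mus)             # stand-in tracked MUPs
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300).fit(X_audio, Y_mups)

frame_audio = np.random.randn(1, 13)
print(deform(net.predict(frame_audio)[0]).shape)  # (900,) vertex displacements
```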

9.
Because of the extensive use of different computing devices, human-computer interaction design is moving towards user-centric interfaces, which means incorporating the different modalities that humans use in everyday communication. Virtual humans who look and behave believably fit perfectly into the concept of designing interfaces in a more natural, effective and socially oriented way. In this paper we present a novel method for automatic speech-driven facial gesturing for virtual humans, capable of real-time performance. The facial gestures included are various nods and head movements, blinks, eyebrow gestures and gaze. The mapping from speech to facial gestures is based on prosodic information obtained from the speech signal and is realized using a hybrid approach combining Hidden Markov Models, rules and global statistics. Further, we test the method using an application prototype: a system for speech-driven facial gesturing suitable for virtual presenters. Subjective evaluation of the system confirmed that the synthesized facial movements are consistent and time-aligned with the underlying speech, and thus provide natural behavior of the whole face.
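A toy sketch of the rule-and-statistics side of such a hybrid mapping is shown below: per-frame prosodic features (pitch, energy) trigger facial gestures. The HMM component is omitted, and every threshold and rate is an assumed placeholder.

```python
# Rule-based prosody-to-gesture triggering (toy sketch; values are assumptions).
import random

BLINK_RATE = 0.3            # assumed average blinks per second (global statistic)

def gestures_for_frame(pitch_hz, energy, frame_dt=1.0 / 25.0):
    out = []
    if pitch_hz is not None and pitch_hz > 220:       # pitch peak -> eyebrow raise
        out.append("eyebrow_raise")
    if energy > 0.7:                                  # stressed syllable -> head nod
        out.append("head_nod")
    if pitch_hz is None and random.random() < BLINK_RATE * frame_dt:
        out.append("blink")                           # blinks mostly in pauses
    return out

print(gestures_for_frame(pitch_hz=240, energy=0.8))   # ['eyebrow_raise', 'head_nod']
```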

10.
In this paper, a facial feature point tracker motivated by applications such as human-computer interfaces and facial expression analysis systems is proposed. The tracker is based on a graphical model framework. Facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness under real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated on real video data under various conditions, including occluded facial gestures and head movements. It is also compared to two popular methods: one based on Kalman filtering exploiting temporal relations, and the other based on active appearance models (AAM). The improvements provided by the proposed approach are demonstrated through both visual displays and quantitative analysis.
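One way a Gabor feature-based occlusion test can be sketched is to compare the Gabor responses at a tracked point with responses stored when the point was last reliably visible; the OpenCV kernel parameters and the similarity threshold below are assumptions for illustration, not the paper's settings.

```python
# Gabor-jet occlusion check at a tracked facial point (illustrative sketch).
# Assumes the point is far enough from the image border for a 21x21 patch.
import cv2
import numpy as np

# bank of Gabor kernels at 4 orientations: (ksize, sigma, theta, lambd, gamma)
KERNELS = [cv2.getGaborKernel((21, 21), 4.0, t, 10.0, 0.5)
           for t in np.linspace(0, np.pi, 4, endpoint=False)]

def gabor_jet(gray, pt):
    x, y = int(pt[0]), int(pt[1])
    patch = gray[y - 10:y + 11, x - 10:x + 11].astype(np.float32)
    return np.array([float((patch * k).sum()) for k in KERNELS])

def is_occluded(gray, pt, reference_jet, thresh=0.6):
    jet = gabor_jet(gray, pt)
    sim = np.dot(jet, reference_jet) / (
        np.linalg.norm(jet) * np.linalg.norm(reference_jet) + 1e-8)
    return sim < thresh     # low similarity to the stored appearance -> occluded
```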

11.
Psychological research findings suggest that, when judging human communicative behavior, humans rely on the combined visual channels of face and body more than on any other channel. However, most existing systems for analyzing human nonverbal behavior are mono-modal and focus only on the face; research that aims to integrate gestures as a means of expression has only recently emerged. Accordingly, this paper presents an approach to automatic visual recognition of expressive face and upper-body gestures from video sequences, suitable for use in a vision-based affective multi-modal framework. Face and body movements are captured simultaneously using two separate cameras. For each video sequence, single expressive frames from both face and body are selected manually for analysis and emotion recognition. First, individual classifiers are trained on the individual modalities. Second, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In the experiments performed, emotion classification using the two modalities achieved better recognition accuracy, outperforming classification using the facial or bodily modality alone.
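The contrast between feature-level and decision-level fusion can be sketched as follows, using scikit-learn SVMs and random stand-in face/body features; the classifier choice and the product rule for combining probabilities are assumptions, not the paper's exact setup.

```python
# Feature-level fusion (concatenate features) vs. decision-level fusion
# (combine per-modality classifier probabilities). Data are random stand-ins.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
face_X, body_X = rng.normal(size=(200, 30)), rng.normal(size=(200, 20))
y = rng.integers(0, 6, 200)                     # six emotion classes

# feature-level fusion: one classifier on the concatenated feature vector
fused = SVC(probability=True).fit(np.hstack([face_X, body_X]), y)

# decision-level fusion: separate classifiers, probabilities combined (product rule)
face_clf = SVC(probability=True).fit(face_X, y)
body_clf = SVC(probability=True).fit(body_X, y)

def decision_fuse(face_feat, body_feat):
    p = face_clf.predict_proba([face_feat]) * body_clf.predict_proba([body_feat])
    return face_clf.classes_[np.argmax(p)]

print(fused.predict(np.hstack([face_X[0], body_X[0]])[None]),
      decision_fuse(face_X[0], body_X[0]))
```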

12.
The recognition of facial gestures and expressions in image sequences is an important and challenging problem. Most existing methods adopt the following paradigm: first, facial actions/features are retrieved from the images, and then the facial expression is recognized from the retrieved temporal parameters. In contrast to this mainstream approach, this paper introduces a new approach that allows the simultaneous retrieval of facial actions and expression using a particle filter with multi-class dynamics conditioned on the expression. For each frame of the video sequence, our approach is split into two consecutive stages. In the first stage, the 3D head pose is retrieved using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously retrieved using a stochastic framework based on second-order Markov chains. The proposed fast scheme is as robust as, or more robust than, existing ones in a number of respects. We describe extensive experiments and provide performance evaluations to show the feasibility and robustness of the proposed approach.
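A toy particle filter in the spirit of multi-class dynamics is sketched below: each particle carries a discrete expression label and a facial-action vector, the dynamics noise depends on the expression, and particles are reweighted by an assumed observation likelihood. All models and parameters are invented placeholders.

```python
# Particle filter with expression-conditioned dynamics (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
EXPRESSIONS = ["neutral", "smile", "surprise"]
DYN_NOISE = {"neutral": 0.01, "smile": 0.05, "surprise": 0.08}   # assumed
N, D = 500, 6                                  # particles, facial-action dims

labels = rng.integers(0, len(EXPRESSIONS), N)
actions = rng.normal(0, 0.1, (N, D))

def step(measurement):
    global labels, actions
    # propagate: occasional expression switch, then expression-conditioned dynamics
    switch = rng.random(N) < 0.02
    labels[switch] = rng.integers(0, len(EXPRESSIONS), switch.sum())
    noise = np.array([DYN_NOISE[EXPRESSIONS[l]] for l in labels])
    actions += rng.normal(0, 1, (N, D)) * noise[:, None]
    # weight by a Gaussian observation likelihood around the measured actions
    w = np.exp(-0.5 * np.sum((actions - measurement) ** 2, axis=1) / 0.05)
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                # resample
    labels, actions = labels[idx], actions[idx].copy()
    return EXPRESSIONS[np.bincount(labels).argmax()], actions.mean(axis=0)

print(step(np.full(D, 0.3)))
```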

13.
In this paper, we presented algorithms to assess the quality of facial images affected by factors such as blurriness, lighting conditions, head pose variations, and facial expressions. We developed face recognition prediction functions for images affected by blurriness, lighting conditions, and head pose variations based upon the eigenface technique. We also developed a classifier for images affected by facial expressions to assess their quality for recognition by the eigenface technique. Our experiments using different facial image databases showed that our algorithms are capable of assessing the quality of facial images. These algorithms could be used in a module for facial image quality assessment in a face recognition system. In the future, we will integrate the different measures of image quality to produce a single measure that indicates the overall quality of a face image.
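One common way an eigenface-style quality check can be sketched is to project a face onto a PCA subspace learned from good-quality faces and use the reconstruction error as an inverse quality score; the scikit-learn PCA, image size and stand-in data below are illustrative assumptions, not the paper's prediction functions.

```python
# PCA (eigenface-style) reconstruction error as an image-quality proxy.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_faces = rng.normal(size=(300, 32 * 32))       # stand-in aligned face images
pca = PCA(n_components=50).fit(train_faces)

def quality_score(face_vec):
    recon = pca.inverse_transform(pca.transform(face_vec[None]))[0]
    err = np.linalg.norm(face_vec - recon)
    return 1.0 / (1.0 + err)      # higher = closer to the "good face" subspace

print(quality_score(train_faces[0]), quality_score(rng.normal(size=32 * 32) * 5))
```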

14.
This work investigates a new and challenging problem: how to accurately recognize facial expressions captured by high-frame-rate 3D sensing as early as possible, whereas most work focuses on improving the recognition rate of 2D facial expression recognition. Recognizing subtle facial expressions in their early stage is unfortunately very sensitive to noise, which cannot be ignored because of the expressions' low intensity. To overcome this problem, two novel feature enhancement methods, an adaptive wavelet spectral subtraction method and SVM-based linear discriminant analysis, are proposed to refine the subtle features of facial expressions, with and without an estimated noise model respectively. Experiments on a custom-made dataset built using a high-speed 3D motion capture system corroborated that the two proposed methods outperform other feature refinement methods by enhancing the discriminability of subtle facial expression features and consequently enable correct recognition earlier.
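To show the flavor of suppressing noise in low-intensity expression features, the sketch below applies generic wavelet soft-thresholding with PyWavelets to a weak feature trajectory; it is not the paper's adaptive wavelet spectral subtraction or its SVM-based discriminant analysis.

```python
# Generic wavelet denoising of a weak (early-stage) expression feature signal.
import numpy as np
import pywt

def denoise(signal, wavelet="db4", level=3):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

t = np.linspace(0, 1, 256)
subtle = 0.05 * np.sin(2 * np.pi * 3 * t)                    # weak onset of an expression
noisy = subtle + np.random.normal(0, 0.02, t.size)
print(np.abs(denoise(noisy) - subtle).mean())                # smaller than the raw noise
```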

15.
黄建峰  林奕城 《软件学报》2000,11(9):1139-1150
We propose a new method for generating facial animation: a motion capture system captures the subtle movements of a real person's face, and the captured motion data are then used to drive a face model to produce the animation. First, 23 reflective markers were attached to the subject's face and captured with an Oxford Metrics VICON8 system. The resulting 3D motion data must be post-processed before use, so a method is proposed for removing head motion and estimating the pivot of head rotation. After this processing, the remaining motion data represent changes in facial expression and can be applied directly to the face model.

16.
The use of avatars with emotionally expressive faces is potentially highly beneficial to communication in collaborative virtual environments (CVEs), especially when used in a distance learning context. However, little is known about how, or indeed whether, emotions can effectively be transmitted through the medium of a CVE. Given this, an avatar head model with limited but human-like expressive abilities was built, designed to enrich CVE communication. Based on the facial action coding system (FACS), the head was designed to express, in a readily recognisable manner, the six universal emotions. An experiment was conducted to investigate the efficacy of the model. Results indicate that the approach of applying the FACS model to virtual face representations is not guaranteed to work for all expressions of a particular emotion category. However, given appropriate use of the model, emotions can effectively be visualised with a limited number of facial features. A set of exemplar facial expressions is presented.
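A minimal sketch of driving such an avatar from FACS is shown below: each of the six universal emotions maps to a set of action units, which in turn map to blendshape weights. The AU lists follow common FACS descriptions, while the blendshape names and weights are invented placeholders rather than the head model described in the paper.

```python
# Emotion -> FACS action units -> avatar blendshape weights (illustrative only).
EMOTION_AUS = {
    "happiness": [6, 12],            # cheek raiser, lip corner puller
    "sadness":   [1, 4, 15],
    "surprise":  [1, 2, 5, 26],
    "fear":      [1, 2, 4, 5, 20, 26],
    "anger":     [4, 5, 7, 23],
    "disgust":   [9, 15, 16],
}

# hypothetical blendshape names; a real head model would define its own rig
AU_TO_BLENDSHAPE = {1: "browInnerUp", 2: "browOuterUp", 4: "browDown",
                    5: "eyeWide", 6: "cheekSquint", 7: "eyeSquint",
                    9: "noseWrinkle", 12: "mouthSmile", 15: "mouthFrown",
                    16: "lowerLipDepress", 20: "mouthStretch",
                    23: "lipTighten", 26: "jawOpen"}

def expression_weights(emotion, intensity=1.0):
    """Return blendshape weights for the avatar head for a given emotion."""
    return {AU_TO_BLENDSHAPE[au]: intensity for au in EMOTION_AUS[emotion]}

print(expression_weights("surprise", 0.8))
```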

17.
黄建峰  林奕成  欧阳明 《软件学报》2000,11(9):1141-1150
We propose a new method for generating facial animation: a motion capture system captures the subtle movements of a real person's face, and the captured motion data are then used to drive a face model to produce the animation. First, 23 reflective markers were attached to the subject's face and captured with an Oxford Metrics VICON8 system. The resulting 3D motion data must be post-processed before use, so a method is proposed for removing head motion and estimating the pivot of head rotation. After this processing, the remaining motion data represent changes in facial expression and can be applied directly to the face model. The system is implemented with a 2.5D face model, which combines the advantages of 2D and 3D models: it is simple, yet looks lively and natural under small rotations. During facial animation, a special interpolation formula is used to compute the displacements of non-feature points, and the face is divided into several regions to constrain the movement of the 3D points on the model, making the animation more natural. On a Pentium III 500 MHz machine with an OpenGL-accelerated graphics card, the animation system achieves an update rate of more than 30 frames per second.
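The head-motion removal step can be sketched with a standard Kabsch/Procrustes alignment: estimate the rigid rotation and translation that best align a frame's markers to a neutral reference frame, factor that motion out, and keep the non-rigid residual as the expression. This is an illustration of the idea, not the authors' exact procedure or their pivot estimation.

```python
# Remove rigid head motion from marker data via Kabsch alignment (sketch).
import numpy as np

def remove_head_motion(frame, reference):
    """frame, reference: (23, 3) arrays of marker positions."""
    cf, cr = frame.mean(axis=0), reference.mean(axis=0)
    A, B = frame - cf, reference - cr
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U))            # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    aligned = (frame - cf) @ R.T + cr               # frame expressed in the reference pose
    return aligned - reference                      # residual = facial expression motion

rng = np.random.default_rng(1)
ref = rng.normal(size=(23, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
moved = ref @ Rz.T + np.array([5.0, 0.0, 0.0])       # pure head motion, no expression
print(np.abs(remove_head_motion(moved, ref)).max())  # ~0: head motion removed
```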

18.
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. This system is based on a pre-recorded facial motion capture database, where an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm is used to search for best-matched captured motion clips from the processed facial motion database by minimizing a cost function. Users optionally specify 'hard constraints' (motion-node constraints for expressing phoneme utterances) and 'soft constraints' (emotion modifiers) to guide this search process. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters that are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized facial motion and captured motion showed that this system is effective for producing realistic expressive speech animations.
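A constrained dynamic-programming clip search of this general shape can be sketched as follows: choose one captured clip per phoneme slot so that the summed matching cost plus concatenation cost is minimal, with hard constraints pinning selected slots to user-specified clips. The cost matrices below are random placeholders, not the paper's cost function.

```python
# Viterbi-style constrained search over motion clips (sketch).
import numpy as np

def search(match_cost, trans_cost, hard_constraints=None):
    """match_cost: (T, K) cost of using clip k at slot t.
    trans_cost: (K, K) cost of concatenating clip j after clip i.
    hard_constraints: dict slot -> required clip index."""
    T, K = match_cost.shape
    hard_constraints = hard_constraints or {}
    INF = 1e18
    cost = match_cost.copy()
    for t, k in hard_constraints.items():            # forbid everything but k at slot t
        cost[t, :] = INF
        cost[t, k] = match_cost[t, k]
    D = np.full((T, K), INF)
    back = np.zeros((T, K), dtype=int)
    D[0] = cost[0]
    for t in range(1, T):
        total = D[t - 1][:, None] + trans_cost + cost[t][None, :]
        back[t] = np.argmin(total, axis=0)
        D[t] = np.min(total, axis=0)
    path = [int(np.argmin(D[-1]))]
    for t in range(T - 1, 0, -1):                     # backtrack the optimal clips
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(search(rng.random((6, 4)), rng.random((4, 4)) * 0.2, hard_constraints={2: 3}))
```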

19.
Most studies use facial expressions to recognize a user's emotion; however, gestures such as nodding, shaking the head, or remaining still can also be indicators of the user's emotion. In our research, we use both facial expressions and gestures to detect and recognize a user's emotion. The pervasive Microsoft Kinect sensor captures video data, from which several features representing facial expressions and gestures are extracted. An in-house extensible markup language-based genetic programming engine (XGP) evolves the emotion recognition module of our system. To improve the computational performance of the recognition module, we implemented and compared several approaches for an automated voting system, including directed evolution, collaborative filtering via canonical voting, and a genetic algorithm. The experimental results indicate that XGP is feasible for evolving emotion classifiers. In addition, the results verify that collaborative filtering improves the generality of recognition. From a psychological viewpoint, the results show that different people may express their emotions differently, as emotion classifiers evolved for particular users might not apply successfully to other users.
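The voting idea can be illustrated with a trivial majority-vote ensemble over per-user classifiers; the stand-in classifiers below key on a couple of Kinect-style features and do not reproduce the paper's XGP-evolved programs or its canonical-voting collaborative filtering.

```python
# Majority voting over several stand-in emotion classifiers (toy sketch).
from collections import Counter

def majority_vote(classifiers, features):
    votes = [clf(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# hypothetical per-user "classifiers" keyed on Kinect-style features
clf_a = lambda f: "happy" if f["mouth_open"] > 0.5 else "neutral"
clf_b = lambda f: "happy" if f["head_nod"] else "neutral"
clf_c = lambda f: "neutral"

print(majority_vote([clf_a, clf_b, clf_c],
                    {"mouth_open": 0.7, "head_nod": True}))   # 'happy'
```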

20.
Understanding facial expressions in image sequences is an easy task for humans, and some of us can lipread by interpreting the motion of the mouth. Automatic lipreading by a computer, however, is a challenging task with limited success so far. The inverse problem of synthesizing realistic-looking lip movements is also highly non-trivial, and the technology to automatically generate an image series that imitates natural postures is far from perfect. We introduce a new framework for facial image representation, analysis and synthesis that focuses on the lower half of the face, specifically the mouth. It includes interpretation and classification of facial expressions and visual speech recognition, as well as a synthesis procedure for facial expressions that yields natural-looking mouth movements. Our image analysis and synthesis processes are based on a parametrization of the set of mouth-configuration images. These images are represented as points on a two-dimensional flat manifold, which enables us to efficiently define the pronunciation of each word and thereby analyze or synthesize the motion of the lips. We present examples of automatic lip motion synthesis and lipreading, and propose a generalization of our solution to the problem of lipreading different subjects.
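The parametrization idea can be sketched by embedding mouth images as points in a two-dimensional plane and treating a pronounced word as a path through that plane; the scikit-learn MDS embedding on raw pixel distances below is a generic stand-in for the paper's flat-manifold construction.

```python
# 2D embedding of mouth images; a word becomes a curve in the embedding.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
mouth_images = rng.normal(size=(80, 24 * 16))       # stand-in cropped mouth frames
coords = MDS(n_components=2, random_state=0).fit_transform(mouth_images)

def word_trajectory(frame_indices):
    """A word observed as a sequence of frames becomes a 2D curve."""
    return coords[np.array(frame_indices)]

def nearest_frame(point):
    """Synthesis direction: map a 2D point back to the closest captured mouth image."""
    return int(np.argmin(np.linalg.norm(coords - point, axis=1)))

traj = word_trajectory([3, 10, 25, 40])
print(traj.shape, nearest_frame(traj[0]))            # (4, 2) 3
```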
