Similar documents
20 similar documents found.
1.
2.
《Advanced Robotics》2013,27(8):827-852
The purpose of a robot is to execute tasks for people. People should be able to communicate with robots in a natural way. People naturally express themselves through body language using facial gestures and expressions. We have built a human-robot interface based on head gestures for use in robot applications. Our interface can track a person's facial features in real time (30 Hz video frame rate). No special illumination or facial makeup is needed to achieve robust tracking. We use dedicated vision hardware based on correlation image matching to implement the face tracking. Tracking using correlation matching suffers from the problems of changing shade and deformation or even disappearance of facial features. By using multiple Kalman filters, we are able to overcome these problems. Our system can accurately predict and robustly track the positions of facial features despite disturbances and rapid movements of the head (including both translational and rotational motion). Since we can reliably track faces in real time, we are also able to recognize motion gestures of the face. Our system can recognize a large set of gestures (15) ranging from yes, no and maybe to detecting winks, blinks and sleeping. We have used an approach that decomposes each gesture into a set of atomic actions, e.g., a nod for yes consists of an atomic up motion followed by a down motion. Our system can understand gestures by monitoring the transitions between atomic actions.
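As a rough illustration of the prediction/update idea behind the multiple-Kalman-filter tracking described above, the following Python sketch runs a constant-velocity Kalman filter on a single 2D facial feature; the noise values and the 30 Hz frame step are assumptions, and the paper's dedicated correlation-matching hardware and multi-filter setup are not reproduced.

```python
# Minimal constant-velocity Kalman filter for one 2D facial feature point.
# Noise levels below are assumed values, not the paper's tuning.
import numpy as np

dt = 1.0 / 30.0                      # 30 Hz video frame rate
F = np.array([[1, 0, dt, 0],         # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],          # only the (x, y) position is measured
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-2                 # process noise (assumed)
R = np.eye(2) * 2.0                  # measurement noise in pixels (assumed)

x = np.zeros(4)                      # initial state
P = np.eye(4) * 100.0                # initial uncertainty

def step(z, x, P):
    """One predict/update cycle; z is the measured (x, y), or None if the
    feature was lost (occluded or deformed), in which case we only predict."""
    x = F @ x                        # predict
    P = F @ P @ F.T + Q
    if z is not None:                # update with the correlation-match position
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = step(np.array([120.0, 80.0]), x, P)   # feed one measurement
x, P = step(None, x, P)                      # feature lost: coast on prediction
```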

3.
4.
Changes in eyebrow configuration, in conjunction with other facial expressions and head gestures, are used to signal essential grammatical information in signed languages. This paper proposes an automatic recognition system for non-manual grammatical markers in American Sign Language (ASL) based on a multi-scale, spatio-temporal analysis of head pose and facial expressions. The analysis takes account of gestural components of these markers, such as raised or lowered eyebrows and different types of periodic head movements. To advance the state of the art in non-manual grammatical marker recognition, we propose a novel multi-scale learning approach that exploits spatio-temporal low-level and high-level facial features. Low-level features are based on information about facial geometry and appearance, as well as head pose, and are obtained through accurate 3D deformable model-based face tracking. High-level features are based on the identification of gestural events, of varying duration, that constitute the components of linguistic non-manual markers. Specifically, we recognize events such as raised and lowered eyebrows, head nods, and head shakes. We also partition these events into temporal phases. We separate the anticipatory transitional movement (the onset) from the linguistically significant portion of the event, and we further separate the core of the event from the transitional movement that occurs as the articulators return to the neutral position towards the end of the event (the offset). This partitioning is essential for the temporally accurate localization of the grammatical markers, which could not be achieved at this level of precision with previous computer vision methods. In addition, we analyze and use the motion patterns of these non-manual events. Those patterns, together with the information about the type of event and its temporal phases, are defined as the high-level features. Using this multi-scale, spatio-temporal combination of low- and high-level features, we employ learning methods for accurate recognition of non-manual grammatical markers in ASL sentences.
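As an illustration of the onset/core/offset partitioning described above, the sketch below splits a one-dimensional eyebrow-height signal into three temporal phases with a simple velocity threshold; the thresholding rule and parameter values are assumptions for illustration, not the learned models used in the paper.

```python
# Partition a detected "raised eyebrows" event into onset, core and offset
# phases using a velocity threshold. Threshold value is an assumption.
import numpy as np

def partition_event(signal, vel_thresh=0.05):
    """signal: 1D array of eyebrow height over the event's frames.
    Returns (onset, core, offset) as (start, end) index ranges."""
    vel = np.gradient(signal)
    moving = np.abs(vel) > vel_thresh
    # onset: leading frames where the articulator is still moving
    onset_end = 0
    while onset_end < len(signal) and moving[onset_end]:
        onset_end += 1
    # offset: trailing frames where it returns to the neutral position
    offset_start = len(signal)
    while offset_start > onset_end and moving[offset_start - 1]:
        offset_start -= 1
    return (0, onset_end), (onset_end, offset_start), (offset_start, len(signal))

h = np.concatenate([np.linspace(0, 1, 10), np.ones(20), np.linspace(1, 0, 10)])
print(partition_event(h))   # three (start, end) ranges: onset, core, offset
```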

5.
Facial expression is a natural and powerful means of human communication. Recognizing spontaneous facial actions, however, is very challenging due to subtle facial deformation, frequent head movements, and ambiguous and uncertain facial motion measurements. Because of these challenges, current research in facial expression recognition is limited to posed expressions and often in frontal view. A spontaneous facial expression is characterized by rigid head movements and nonrigid facial muscular movements. More importantly, it is the coherent and consistent spatiotemporal interactions among rigid and nonrigid facial motions that produce a meaningful facial expression. Recognizing this fact, we introduce a unified probabilistic facial action model based on the Dynamic Bayesian network (DBN) to simultaneously and coherently represent rigid and nonrigid facial motions, their spatiotemporal dependencies, and their image measurements. Advanced machine learning methods are introduced to learn the model based on both training data and subjective prior knowledge. Given the model and the measurements of facial motions, facial action recognition is accomplished through probabilistic inference by systematically integrating visual measurements with the facial action model. Experiments show that compared to the state-of-the-art techniques, the proposed system yields significant improvements in recognizing both rigid and nonrigid facial motions, especially for spontaneous facial expressions.
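To make the idea of integrating noisy facial-motion measurements with a temporal probabilistic model concrete, here is a toy discrete forward filter over a single facial action unit; it is far simpler than the paper's DBN, and all probabilities are invented.

```python
# Toy forward filtering: fuse noisy per-frame detector outputs with a
# temporal model of one facial action unit. All probabilities are made up.
import numpy as np

states = ["AU_off", "AU_on"]
T = np.array([[0.9, 0.1],        # P(state_t | state_{t-1}): actions persist
              [0.2, 0.8]])
E = np.array([[0.8, 0.2],        # P(observation | state), columns = obs 0/1
              [0.3, 0.7]])
belief = np.array([0.5, 0.5])

observations = [1, 1, 0, 1]       # noisy per-frame detector outputs
for z in observations:
    belief = T.T @ belief         # temporal prediction
    belief = belief * E[:, z]     # fuse the image measurement
    belief /= belief.sum()
    print(dict(zip(states, belief.round(3))))
```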

6.
Automatic analysis of head gestures and facial expressions is a challenging research area with significant applications in human-computer interfaces. We develop a face and head gesture detector for video streams. The detector is based on the facial landmark paradigm, in which both the appearance and the configuration of landmarks are used. First, we detect and accurately track facial landmarks using adaptive templates, a Kalman predictor and subspace regularization. The trajectories (time series) of facial landmark positions over the course of the head gesture or facial expression are then converted into various discriminative features. Features can be landmark coordinate time series, facial geometric features or patches on expressive regions of the face. We compare two feature sequence classifiers, Hidden Markov Models (HMM) and Hidden Conditional Random Fields (HCRF), as well as feature subspace classifiers, namely ICA (Independent Component Analysis) and NMF (Non-negative Matrix Factorization), on the spatiotemporal data. We achieve 87.3% correct gesture classification on a seven-gesture test database, and the performance reaches 98.2% correct detection under a fusion scheme. Promising and competitive results are also achieved on classification of naturally occurring gesture clips from the LIlir TwoTalk Corpus.
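A minimal sketch of the HMM branch of such a classifier, assuming the third-party hmmlearn package: one Gaussian HMM is trained per gesture class on landmark-coordinate time series, and a new clip is assigned to the class with the highest log-likelihood. The state counts and the random stand-in data are placeholders, not the paper's setup.

```python
# Class-conditional HMM classification of landmark trajectories (sketch).
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(clips_per_class, n_states=4):
    """clips_per_class: dict gesture_name -> list of (T_i, n_features) arrays."""
    models = {}
    for name, clips in clips_per_class.items():
        X = np.vstack(clips)
        lengths = [len(c) for c in clips]
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=30)
        m.fit(X, lengths)
        models[name] = m
    return models

def classify(models, clip):
    return max(models, key=lambda name: models[name].score(clip))

# usage with random stand-in data (2 gestures, 20-frame clips of 8 features)
rng = np.random.default_rng(0)
data = {"nod":   [rng.normal(0, 1, (20, 8)) for _ in range(5)],
        "shake": [rng.normal(1, 1, (20, 8)) for _ in range(5)]}
models = train_models(data)
print(classify(models, rng.normal(0, 1, (20, 8))))
```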

7.
8.
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
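The two numerical ideas in this abstract, a deformation expressed as a linear combination of motion units (MUs) weighted by MU parameters (MUPs) and a learned audio-to-MUP mapping, can be sketched as follows; the MLP regressor, feature dimensions and random data below are stand-ins, not the paper's trained networks or MU basis.

```python
# Sketch: deformation = neutral + sum_i MUP_i * MU_i, plus a learned
# audio-to-MUP regression. Shapes and data are invented placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

n_vertices, n_mus = 300, 7
MU = np.random.randn(n_mus, n_vertices * 3)       # stand-in motion-unit basis
neutral = np.zeros(n_vertices * 3)

def deform(mups):
    """Facial deformation as a linear combination of MUs weighted by MUPs."""
    return neutral + mups @ MU

X_audio = np.random.randn(2000, 13)               # stand-in per-frame audio features
Y_mups = np.random.randn(2000, n_mus)             # stand-in tracked MUPs
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300).fit(X_audio, Y_mups)

frame_audio = np.random.randn(1, 13)
print(deform(net.predict(frame_audio)[0]).shape)  # (900,) vertex displacements
```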

9.
Because of the extensive use of different computing devices, human-computer interaction design is moving towards user-centric interfaces, which means incorporating the different modalities that humans use in everyday communication. Virtual humans who look and behave believably fit perfectly into the concept of designing interfaces in a more natural, effective and socially oriented way. In this paper we present a novel method for automatic speech-driven facial gesturing for virtual humans, capable of real-time performance. The facial gestures included are various nods and head movements, blinks, eyebrow gestures and gaze. The mapping from speech to facial gestures is based on prosodic information obtained from the speech signal and is realized using a hybrid approach combining Hidden Markov Models, rules and global statistics. Further, we test the method using an application prototype: a system for speech-driven facial gesturing suitable for virtual presenters. Subjective evaluation of the system confirmed that the synthesized facial movements are consistent and time-aligned with the underlying speech, and thus provide natural behavior of the whole face.
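A toy sketch of the rule-and-statistics side of such a hybrid mapping is shown below: per-frame prosodic features (pitch, energy) trigger facial gestures. The HMM component is omitted, and every threshold and rate is an assumed placeholder.

```python
# Rule-based prosody-to-gesture triggering (toy sketch; values are assumptions).
import random

BLINK_RATE = 0.3            # assumed average blinks per second (global statistic)

def gestures_for_frame(pitch_hz, energy, frame_dt=1.0 / 25.0):
    out = []
    if pitch_hz is not None and pitch_hz > 220:       # pitch peak -> eyebrow raise
        out.append("eyebrow_raise")
    if energy > 0.7:                                  # stressed syllable -> head nod
        out.append("head_nod")
    if pitch_hz is None and random.random() < BLINK_RATE * frame_dt:
        out.append("blink")                           # blinks mostly in pauses
    return out

print(gestures_for_frame(pitch_hz=240, energy=0.8))   # ['eyebrow_raise', 'head_nod']
```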

10.
In this paper, a facial feature point tracker motivated by applications such as human-computer interfaces and facial expression analysis systems is proposed. The tracker is based on a graphical model framework. Facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness under real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated on real video data under various conditions, including occluded facial gestures and head movements. It is also compared to two popular methods: one based on Kalman filtering exploiting temporal relations, and the other based on active appearance models (AAM). The improvements provided by the proposed approach are demonstrated through both visual displays and quantitative analysis.
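One way a Gabor feature-based occlusion test can be sketched is to compare the Gabor responses at a tracked point with responses stored when the point was last reliably visible; the OpenCV kernel parameters and the similarity threshold below are assumptions for illustration, not the paper's settings.

```python
# Gabor-jet occlusion check at a tracked facial point (illustrative sketch).
# Assumes the point is far enough from the image border for a 21x21 patch.
import cv2
import numpy as np

# bank of Gabor kernels at 4 orientations: (ksize, sigma, theta, lambd, gamma)
KERNELS = [cv2.getGaborKernel((21, 21), 4.0, t, 10.0, 0.5)
           for t in np.linspace(0, np.pi, 4, endpoint=False)]

def gabor_jet(gray, pt):
    x, y = int(pt[0]), int(pt[1])
    patch = gray[y - 10:y + 11, x - 10:x + 11].astype(np.float32)
    return np.array([float((patch * k).sum()) for k in KERNELS])

def is_occluded(gray, pt, reference_jet, thresh=0.6):
    jet = gabor_jet(gray, pt)
    sim = np.dot(jet, reference_jet) / (
        np.linalg.norm(jet) * np.linalg.norm(reference_jet) + 1e-8)
    return sim < thresh     # low similarity to the stored appearance -> occluded
```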

11.
Psychological research findings suggest that, when judging human communicative behavior, humans rely on the combined visual channels of face and body more than on any other channel. However, most existing systems for analyzing human nonverbal behavior are mono-modal and focus only on the face; research that aims to integrate gestures as a means of expression has only recently emerged. Accordingly, this paper presents an approach to automatic visual recognition of expressive face and upper-body gestures from video sequences, suitable for use in a vision-based affective multi-modal framework. Face and body movements are captured simultaneously using two separate cameras. For each video sequence, single expressive frames from both face and body are selected manually for analysis and emotion recognition. First, individual classifiers are trained on the individual modalities. Second, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In the experiments performed, emotion classification using the two modalities achieved better recognition accuracy, outperforming classification using the facial or bodily modality alone.
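The contrast between feature-level and decision-level fusion can be sketched as follows, using scikit-learn SVMs and random stand-in face/body features; the classifier choice and the product rule for combining probabilities are assumptions, not the paper's exact setup.

```python
# Feature-level fusion (concatenate features) vs. decision-level fusion
# (combine per-modality classifier probabilities). Data are random stand-ins.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
face_X, body_X = rng.normal(size=(200, 30)), rng.normal(size=(200, 20))
y = rng.integers(0, 6, 200)                     # six emotion classes

# feature-level fusion: one classifier on the concatenated feature vector
fused = SVC(probability=True).fit(np.hstack([face_X, body_X]), y)

# decision-level fusion: separate classifiers, probabilities combined (product rule)
face_clf = SVC(probability=True).fit(face_X, y)
body_clf = SVC(probability=True).fit(body_X, y)

def decision_fuse(face_feat, body_feat):
    p = face_clf.predict_proba([face_feat]) * body_clf.predict_proba([body_feat])
    return face_clf.classes_[np.argmax(p)]

print(fused.predict(np.hstack([face_X[0], body_X[0]])[None]),
      decision_fuse(face_X[0], body_X[0]))
```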

12.
The recognition of facial gestures and expressions in image sequences is an important and challenging problem. Most existing methods adopt the following paradigm: first, facial actions/features are retrieved from the images, and then the facial expression is recognized from the retrieved temporal parameters. In contrast to this mainstream approach, this paper introduces a new approach that allows the simultaneous retrieval of facial actions and expression using a particle filter with multi-class dynamics conditioned on the expression. For each frame of the video sequence, our approach is split into two consecutive stages. In the first stage, the 3D head pose is retrieved using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously retrieved using a stochastic framework based on second-order Markov chains. The proposed fast scheme is as robust as, or more robust than, existing ones in a number of respects. We describe extensive experiments and provide performance evaluations to show the feasibility and robustness of the proposed approach.
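A toy particle filter in the spirit of multi-class dynamics is sketched below: each particle carries a discrete expression label and a facial-action vector, the dynamics noise depends on the expression, and particles are reweighted by an assumed observation likelihood. All models and parameters are invented placeholders.

```python
# Particle filter with expression-conditioned dynamics (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
EXPRESSIONS = ["neutral", "smile", "surprise"]
DYN_NOISE = {"neutral": 0.01, "smile": 0.05, "surprise": 0.08}   # assumed
N, D = 500, 6                                  # particles, facial-action dims

labels = rng.integers(0, len(EXPRESSIONS), N)
actions = rng.normal(0, 0.1, (N, D))

def step(measurement):
    global labels, actions
    # propagate: occasional expression switch, then expression-conditioned dynamics
    switch = rng.random(N) < 0.02
    labels[switch] = rng.integers(0, len(EXPRESSIONS), switch.sum())
    noise = np.array([DYN_NOISE[EXPRESSIONS[l]] for l in labels])
    actions += rng.normal(0, 1, (N, D)) * noise[:, None]
    # weight by a Gaussian observation likelihood around the measured actions
    w = np.exp(-0.5 * np.sum((actions - measurement) ** 2, axis=1) / 0.05)
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                # resample
    labels, actions = labels[idx], actions[idx].copy()
    return EXPRESSIONS[np.bincount(labels).argmax()], actions.mean(axis=0)

print(step(np.full(D, 0.3)))
```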

13.
In this paper, we presented algorithms to assess the quality of facial images affected by factors such as blurriness, lighting conditions, head pose variations, and facial expressions. We developed face recognition prediction functions for images affected by blurriness, lighting conditions, and head pose variations based upon the eigenface technique. We also developed a classifier for images affected by facial expressions to assess their quality for recognition by the eigenface technique. Our experiments using different facial image databases showed that our algorithms are capable of assessing the quality of facial images. These algorithms could be used in a module for facial image quality assessment in a face recognition system. In the future, we will integrate the different measures of image quality to produce a single measure that indicates the overall quality of a face image.
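One common way an eigenface-style quality check can be sketched is to project a face onto a PCA subspace learned from good-quality faces and use the reconstruction error as an inverse quality score; the scikit-learn PCA, image size and stand-in data below are illustrative assumptions, not the paper's prediction functions.

```python
# PCA (eigenface-style) reconstruction error as an image-quality proxy.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_faces = rng.normal(size=(300, 32 * 32))       # stand-in aligned face images
pca = PCA(n_components=50).fit(train_faces)

def quality_score(face_vec):
    recon = pca.inverse_transform(pca.transform(face_vec[None]))[0]
    err = np.linalg.norm(face_vec - recon)
    return 1.0 / (1.0 + err)      # higher = closer to the "good face" subspace

print(quality_score(train_faces[0]), quality_score(rng.normal(size=32 * 32) * 5))
```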

14.
This work investigates a new and challenging problem: how to accurately recognize facial expressions captured by high-frame-rate 3D sensing as early as possible, whereas most work focuses on improving the recognition rate of 2D facial expression recognition. Recognizing subtle facial expressions in their early stage is unfortunately very sensitive to noise, which cannot be ignored because of the expressions' low intensity. To overcome this problem, two novel feature enhancement methods, an adaptive wavelet spectral subtraction method and SVM-based linear discriminant analysis, are proposed to refine the subtle features of facial expressions, with and without an estimated noise model respectively. Experiments on a custom-made dataset built using a high-speed 3D motion capture system corroborated that the two proposed methods outperform other feature refinement methods by enhancing the discriminability of subtle facial expression features and consequently enable correct recognition earlier.
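To show the flavor of suppressing noise in low-intensity expression features, the sketch below applies generic wavelet soft-thresholding with PyWavelets to a weak feature trajectory; it is not the paper's adaptive wavelet spectral subtraction or its SVM-based discriminant analysis.

```python
# Generic wavelet denoising of a weak (early-stage) expression feature signal.
import numpy as np
import pywt

def denoise(signal, wavelet="db4", level=3):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

t = np.linspace(0, 1, 256)
subtle = 0.05 * np.sin(2 * np.pi * 3 * t)                    # weak onset of an expression
noisy = subtle + np.random.normal(0, 0.02, t.size)
print(np.abs(denoise(noisy) - subtle).mean())                # smaller than the raw noise
```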

15.
黄建峰  林奕城 《软件学报》2000,11(9):1139-1150
We propose a new method for generating facial animation: a motion capture system captures the subtle movements of a real person's face, and the captured motion data are then used to drive a face model to produce the animation. First, 23 reflective markers were attached to the subject's face and captured with an Oxford Metrics VICON8 system. The resulting 3D motion data must be post-processed before use, so a method is proposed for removing head motion and estimating the pivot of head rotation. After this processing, the remaining motion data represent changes in facial expression and can be applied directly to the face model.

16.
The use of avatars with emotionally expressive faces is potentially highly beneficial to communication in collaborative virtual environments (CVEs), especially when used in a distance learning context. However, little is known about how, or indeed whether, emotions can effectively be transmitted through the medium of a CVE. Given this, an avatar head model with limited but human-like expressive abilities was built, designed to enrich CVE communication. Based on the facial action coding system (FACS), the head was designed to express, in a readily recognisable manner, the six universal emotions. An experiment was conducted to investigate the efficacy of the model. Results indicate that the approach of applying the FACS model to virtual face representations is not guaranteed to work for all expressions of a particular emotion category. However, given appropriate use of the model, emotions can effectively be visualised with a limited number of facial features. A set of exemplar facial expressions is presented.
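A minimal sketch of driving such an avatar from FACS is shown below: each of the six universal emotions maps to a set of action units, which in turn map to blendshape weights. The AU lists follow common FACS descriptions, while the blendshape names and weights are invented placeholders rather than the head model described in the paper.

```python
# Emotion -> FACS action units -> avatar blendshape weights (illustrative only).
EMOTION_AUS = {
    "happiness": [6, 12],            # cheek raiser, lip corner puller
    "sadness":   [1, 4, 15],
    "surprise":  [1, 2, 5, 26],
    "fear":      [1, 2, 4, 5, 20, 26],
    "anger":     [4, 5, 7, 23],
    "disgust":   [9, 15, 16],
}

# hypothetical blendshape names; a real head model would define its own rig
AU_TO_BLENDSHAPE = {1: "browInnerUp", 2: "browOuterUp", 4: "browDown",
                    5: "eyeWide", 6: "cheekSquint", 7: "eyeSquint",
                    9: "noseWrinkle", 12: "mouthSmile", 15: "mouthFrown",
                    16: "lowerLipDepress", 20: "mouthStretch",
                    23: "lipTighten", 26: "jawOpen"}

def expression_weights(emotion, intensity=1.0):
    """Return blendshape weights for the avatar head for a given emotion."""
    return {AU_TO_BLENDSHAPE[au]: intensity for au in EMOTION_AUS[emotion]}

print(expression_weights("surprise", 0.8))
```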

17.
黄建峰  林奕成  欧阳明 《软件学报》2000,11(9):1141-1150
We propose a new method for generating facial animation: a motion capture system captures the subtle movements of a real person's face, and the captured motion data are then used to drive a face model to produce the animation. First, 23 reflective markers were attached to the subject's face and captured with an Oxford Metrics VICON8 system. The resulting 3D motion data must be post-processed before use, so a method is proposed for removing head motion and estimating the pivot of head rotation. After this processing, the remaining motion data represent changes in facial expression and can be applied directly to the face model. The system is implemented with a 2.5D face model, which combines the advantages of 2D and 3D models: it is simple, yet looks lively and natural under small rotations. During facial animation, a special interpolation formula is used to compute the displacements of non-feature points, and the face is divided into several regions to constrain the movement of the 3D points on the model, making the animation more natural. On a Pentium III 500 MHz machine with an OpenGL-accelerated graphics card, the animation system achieves an update rate of more than 30 frames per second.
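The head-motion removal step can be sketched with a standard Kabsch/Procrustes alignment: estimate the rigid rotation and translation that best align a frame's markers to a neutral reference frame, factor that motion out, and keep the non-rigid residual as the expression. This is an illustration of the idea, not the authors' exact procedure or their pivot estimation.

```python
# Remove rigid head motion from marker data via Kabsch alignment (sketch).
import numpy as np

def remove_head_motion(frame, reference):
    """frame, reference: (23, 3) arrays of marker positions."""
    cf, cr = frame.mean(axis=0), reference.mean(axis=0)
    A, B = frame - cf, reference - cr
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U))            # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    aligned = (frame - cf) @ R.T + cr               # frame expressed in the reference pose
    return aligned - reference                      # residual = facial expression motion

rng = np.random.default_rng(1)
ref = rng.normal(size=(23, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
moved = ref @ Rz.T + np.array([5.0, 0.0, 0.0])       # pure head motion, no expression
print(np.abs(remove_head_motion(moved, ref)).max())  # ~0: head motion removed
```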

18.
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. This system is based on a pre-recorded facial motion capture database, where an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm is used to search for best-matched captured motion clips from the processed facial motion database by minimizing a cost function. Users optionally specify 'hard constraints' (motion-node constraints for expressing phoneme utterances) and 'soft constraints' (emotion modifiers) to guide this search process. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters that are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized facial motion and captured motion showed that this system is effective for producing realistic expressive speech animations.
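A constrained dynamic-programming clip search of this general shape can be sketched as follows: choose one captured clip per phoneme slot so that the summed matching cost plus concatenation cost is minimal, with hard constraints pinning selected slots to user-specified clips. The cost matrices below are random placeholders, not the paper's cost function.

```python
# Viterbi-style constrained search over motion clips (sketch).
import numpy as np

def search(match_cost, trans_cost, hard_constraints=None):
    """match_cost: (T, K) cost of using clip k at slot t.
    trans_cost: (K, K) cost of concatenating clip j after clip i.
    hard_constraints: dict slot -> required clip index."""
    T, K = match_cost.shape
    hard_constraints = hard_constraints or {}
    INF = 1e18
    cost = match_cost.copy()
    for t, k in hard_constraints.items():            # forbid everything but k at slot t
        cost[t, :] = INF
        cost[t, k] = match_cost[t, k]
    D = np.full((T, K), INF)
    back = np.zeros((T, K), dtype=int)
    D[0] = cost[0]
    for t in range(1, T):
        total = D[t - 1][:, None] + trans_cost + cost[t][None, :]
        back[t] = np.argmin(total, axis=0)
        D[t] = np.min(total, axis=0)
    path = [int(np.argmin(D[-1]))]
    for t in range(T - 1, 0, -1):                     # backtrack the optimal clips
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(search(rng.random((6, 4)), rng.random((4, 4)) * 0.2, hard_constraints={2: 3}))
```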

19.
Most studies use facial expressions to recognize a user's emotion; however, gestures such as nodding, shaking the head, or remaining still can also be indicators of the user's emotion. In our research, we use both facial expressions and gestures to detect and recognize a user's emotion. The pervasive Microsoft Kinect sensor captures video data, from which several features representing facial expressions and gestures are extracted. An in-house extensible markup language-based genetic programming engine (XGP) evolves the emotion recognition module of our system. To improve the computational performance of the recognition module, we implemented and compared several approaches for an automated voting system, including directed evolution, collaborative filtering via canonical voting, and a genetic algorithm. The experimental results indicate that XGP is feasible for evolving emotion classifiers. In addition, the results verify that collaborative filtering improves the generality of recognition. From a psychological viewpoint, the results show that different people may express their emotions differently, as emotion classifiers evolved for particular users might not apply successfully to other users.
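The voting idea can be illustrated with a trivial majority-vote ensemble over per-user classifiers; the stand-in classifiers below key on a couple of Kinect-style features and do not reproduce the paper's XGP-evolved programs or its canonical-voting collaborative filtering.

```python
# Majority voting over several stand-in emotion classifiers (toy sketch).
from collections import Counter

def majority_vote(classifiers, features):
    votes = [clf(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# hypothetical per-user "classifiers" keyed on Kinect-style features
clf_a = lambda f: "happy" if f["mouth_open"] > 0.5 else "neutral"
clf_b = lambda f: "happy" if f["head_nod"] else "neutral"
clf_c = lambda f: "neutral"

print(majority_vote([clf_a, clf_b, clf_c],
                    {"mouth_open": 0.7, "head_nod": True}))   # 'happy'
```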

20.
Understanding facial expressions in image sequences is an easy task for humans, and some of us can lipread by interpreting the motion of the mouth. Automatic lipreading by a computer, however, is a challenging task with limited success so far. The inverse problem of synthesizing realistic-looking lip movements is also highly non-trivial, and the technology to automatically generate an image series that imitates natural postures is far from perfect. We introduce a new framework for facial image representation, analysis and synthesis that focuses on the lower half of the face, specifically the mouth. It includes interpretation and classification of facial expressions and visual speech recognition, as well as a synthesis procedure for facial expressions that yields natural-looking mouth movements. Our image analysis and synthesis processes are based on a parametrization of the set of mouth-configuration images. These images are represented as points on a two-dimensional flat manifold, which enables us to efficiently define the pronunciation of each word and thereby analyze or synthesize the motion of the lips. We present examples of automatic lip motion synthesis and lipreading, and propose a generalization of our solution to the problem of lipreading different subjects.
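The parametrization idea can be sketched by embedding mouth images as points in a two-dimensional plane and treating a pronounced word as a path through that plane; the scikit-learn MDS embedding on raw pixel distances below is a generic stand-in for the paper's flat-manifold construction.

```python
# 2D embedding of mouth images; a word becomes a curve in the embedding.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
mouth_images = rng.normal(size=(80, 24 * 16))       # stand-in cropped mouth frames
coords = MDS(n_components=2, random_state=0).fit_transform(mouth_images)

def word_trajectory(frame_indices):
    """A word observed as a sequence of frames becomes a 2D curve."""
    return coords[np.array(frame_indices)]

def nearest_frame(point):
    """Synthesis direction: map a 2D point back to the closest captured mouth image."""
    return int(np.argmin(np.linalg.norm(coords - point, axis=1)))

traj = word_trajectory([3, 10, 25, 40])
print(traj.shape, nearest_frame(traj[0]))            # (4, 2) 3
```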
