Similar Documents
Found 20 similar documents (search time: 437 ms)
1.
Person-independent, emotion-specific facial feature tracking has been actively studied in the machine vision community for decades. Among existing methods, the Constrained Local Model (CLM) has shown significant results in person-independent feature tracking. In this paper, we propose an automatic, efficient, and robust method for emotion-specific facial feature detection and tracking from image sequences. A novel tracking system with a 17-point feature model on the frontal face region is also proposed to facilitate the tracking of basic human facial expressions. The proposed feature tracking system stores patch images and face shapes up to a certain number of key frames using a CLM-based tracker. Incremental patch and shape clustering algorithms are then applied to build an appearance model of similar patches and a structure model of similar shapes, respectively. The clusters in each model are built and updated incrementally and online, controlled by the amount of facial muscle movement. The overall performance of the proposed Robust Incremental Clustering-based Facial Feature Tracking (RICFFT) is evaluated on the FGnet database and the Extended Cohn-Kanade (CK+) database. RICFFT demonstrates mean tracking accuracy of 97.45% and 96.64% for the FGnet and CK+ databases, respectively. Compared with the classic CLM, RICFFT is also more robust, with average shape distortion errors of only 0.20% and 1.86% for the FGnet and CK+ (apex frame) databases.
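A minimal sketch of the incremental, online clustering idea described in this abstract, assuming Euclidean distance between flattened landmark shapes and a fixed distance threshold standing in for the facial-muscle-movement criterion (both are illustrative assumptions, not details from the paper):

```python
import numpy as np

class IncrementalShapeClusters:
    def __init__(self, threshold=8.0):
        self.threshold = threshold      # max distance to join an existing cluster
        self.centroids = []             # one running-mean shape per cluster
        self.counts = []                # observations per cluster

    def update(self, shape):
        """Assign a 17x2 landmark shape to the nearest cluster, or open a new one."""
        shape = np.asarray(shape, dtype=float).ravel()
        if self.centroids:
            dists = [np.linalg.norm(shape - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.threshold:
                # Online (running-mean) centroid update.
                self.counts[k] += 1
                self.centroids[k] += (shape - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(shape.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

# Usage: feed shapes from a CLM tracker frame by frame.
clusters = IncrementalShapeClusters(threshold=8.0)
rng = np.random.default_rng(0)
for _ in range(100):
    cluster_id = clusters.update(rng.normal(size=(17, 2)))
```

The same pattern applies to the patch clustering by replacing shape vectors with vectorized patch images.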

2.
This paper presents a hierarchical multi-state pose-dependent approach for facial feature detection and tracking under varying facial expression and face pose. For effective and efficient representation of feature points, a hybrid representation that integrates Gabor wavelets and gray-level profiles is proposed. To model the spatial relations among feature points, a hierarchical statistical face shape model is proposed to characterize both the global shape of the human face and the local structural details of each facial component. Furthermore, multi-state local shape models are introduced to deal with shape variations of some facial components under different facial expressions. During detection and tracking, both facial component states and feature point positions, constrained by the hierarchical face shape model, are dynamically estimated using a switching hypothesized measurements (SHM) model. Experimental results demonstrate that the proposed method accurately and robustly tracks facial features in real time under different facial expressions and face poses.

3.
To address changes in facial texture and shape features in cross-age face verification, a multi-task face verification algorithm combining the dual-coded average local binary pattern (DCALBP) with a deep learning algorithm is proposed. First, a multi-task convolutional neural ...

4.
Face Tracking and Pose Estimation Based on a Skin Color Model and an Elliptical Ring Template (cited 3 times: 0 self-citations, 3 by others)
This paper proposes a face tracking and pose estimation algorithm based on a skin color model combined with an elliptical ring template. During skin-color-based face tracking and feature localization, the algorithm first locates the facial skin region using the skin color model, with an adaptive learning module added during tracking so that the original skin color model adjusts itself to different illumination. It then exploits prior knowledge of face shape through the elliptical ring template to localize the face boundary precisely. Finally, the face pose is estimated from the extracted facial features and face boundary positions. Experiments show that the algorithm achieves satisfactory tracking under natural illumination and is robust to face rotation, scaling, and occlusion, as well as to multi-face backgrounds.
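A minimal sketch of the skin-color localization step, assuming illustrative HSV bounds and a plain ellipse fit in place of the paper's elliptical ring template and adaptive model update:

```python
import cv2
import numpy as np

def locate_face(bgr):
    """Return the ellipse fitted to the largest skin-colored region, or None."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Illustrative H/S/V bounds for skin; real systems tune or adapt these.
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:            # cv2.fitEllipse needs at least 5 points
        return None
    return cv2.fitEllipse(largest)  # ((cx, cy), (major, minor), angle)
```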

5.
Most of the research on sign language recognition concentrates on recognizing only manual signs (hand gestures and shapes), discarding a very important component: the non-manual signals (facial expressions and head/shoulder motion). We address the recognition of signs with both manual and non-manual components using a sequential belief-based fusion technique. The manual components, which carry information of primary importance, are utilized in the first stage. The second stage, which makes use of non-manual components, is only employed if there is hesitation in the decision of the first stage. We employ belief formalism both to model the hesitation and to determine the sign clusters within which the discrimination takes place in the second stage. We have implemented this technique in a sign tutor application. Our results on the eNTERFACE’06 ASL database show an improvement over the baseline system which uses parallel or feature fusion of manual and non-manual features: we achieve an accuracy of 81.6%.
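A minimal sketch of the sequential two-stage idea, with a top-two probability margin standing in for the paper's belief-based hesitation measure; the classifiers, synthetic features, and margin threshold are all illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def two_stage_predict(clf_manual, clf_nonmanual, x_manual, x_nonmanual, margin=0.2):
    probs = clf_manual.predict_proba([x_manual])[0]
    top2 = np.sort(probs)[-2:]
    if top2[1] - top2[0] >= margin:          # confident: accept stage-1 decision
        return clf_manual.classes_[np.argmax(probs)]
    # Hesitation: restrict stage 2 to the cluster of stage-1 candidate classes.
    candidates = clf_manual.classes_[np.argsort(probs)[-2:]]
    probs2 = clf_nonmanual.predict_proba([x_nonmanual])[0]
    mask = np.isin(clf_nonmanual.classes_, candidates)
    masked = np.where(mask, probs2, -np.inf)
    return clf_nonmanual.classes_[np.argmax(masked)]

# Usage with synthetic manual / non-manual feature vectors.
rng = np.random.default_rng(1)
Xm, Xn = rng.normal(size=(60, 8)), rng.normal(size=(60, 5))
y = rng.integers(0, 4, size=60)
clf_m = RandomForestClassifier(random_state=0).fit(Xm, y)
clf_n = RandomForestClassifier(random_state=0).fit(Xn, y)
print(two_stage_predict(clf_m, clf_n, Xm[0], Xn[0]))
```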

6.
Accurate localization and tracking of key facial features is a difficult problem in driver fatigue research. This paper proposes a driver fatigue detection method based on the active shape model (ASM) and a skin color model. First, the skin color model detects the face region, providing the initial position for the ASM; the eyes and mouth are then tracked with the ASM to obtain the eye and mouth regions; the Canny operator is applied to localize both regions precisely and obtain the fatigue detection parameters; finally, fatigue is detected using the PERCLOS criterion. Face detection based on the HSV color model is insensitive to pose and viewing angle but easily disturbed by the background, whereas the ASM tracks facial key points well but is difficult to initialize; combining the two achieves precise localization and tracking of the eyes and mouth. Experiments show an eye detection accuracy of 90.7%, a yawn detection accuracy of 83.3%, and a fatigue detection accuracy of 91.4%.
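A minimal sketch of the PERCLOS criterion used in the final step: the fraction of frames within a sliding window in which the eyes are closed. The window length and the 0.4 alarm threshold below are illustrative assumptions:

```python
from collections import deque

class PerclosMonitor:
    def __init__(self, window_frames=900, threshold=0.4):  # e.g. 30 s at 30 fps
        self.window = deque(maxlen=window_frames)
        self.threshold = threshold

    def update(self, eye_closed: bool) -> bool:
        """Record one frame's eye state; return True if fatigue is signalled."""
        self.window.append(1 if eye_closed else 0)
        perclos = sum(self.window) / len(self.window)
        return perclos > self.threshold
```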

7.
Face Alignment Based on a Statistical Model and Gabor Wavelets (cited 1 time: 0 self-citations, 1 by others)
余棉水, 黎绍发. 《计算机应用》 (Journal of Computer Applications), 2005, 25(8): 1771-1773
A Gabor-wavelet facial feature point tracking algorithm is combined with the statistical active appearance model (AAM) method for facial feature point localization to achieve automatic face alignment in video. Gabor wavelets first track the feature points, and the result serves as the initial shape for the AAM. The global shape and texture information of the AAM then act as constraints to correct local tracking errors of the Gabor wavelets. Experiments show that the method is effective.

8.
Kim Hyungjoon, Kim HyeonWoo, Hwang Eenjun. 《Multimedia Tools and Applications》, 2020, 79(23-24): 15945-15963

Detection of facial landmarks and accurate tracking of their shape are essential in real-time applications such as virtual makeup, where users can see the makeup’s effect by moving their face in diverse directions. Typical face tracking techniques detect facial landmarks and track them using a point tracker such as the Kanade-Lucas-Tomasi (KLT) point tracker. Typically, 5 or 64 points are used for tracking a face. Even though these points are enough to track the approximate locations of facial landmarks, they are not sufficient to track the exact shape of facial landmarks. In this paper, we propose a method that can track the exact shape of facial landmarks in real-time by combining a deep learning technique and a point tracker. We detect facial landmarks accurately using SegNet, which performs semantic segmentation based on deep learning. Edge points of detected landmarks are tracked using the KLT point tracker. In spite of its popularity, the KLT point tracker suffers from the point loss problem. We solve this problem by executing SegNet periodically to recalculate the shape of facial landmarks. That is, by combining the two techniques, we can avoid the computational overhead of SegNet and the point loss problem of the KLT point tracker, which leads to accurate real-time shape tracking. We performed several experiments to evaluate the performance of our method and report some of the results herein.
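A minimal sketch of the hybrid scheme this abstract describes: a KLT point tracker (via OpenCV) propagates landmark edge points between frames, and the segmentation step is re-run periodically to recover points lost by the tracker. Here detect_landmarks is a hypothetical placeholder for the SegNet inference, and the refresh interval is an assumption:

```python
import cv2
import numpy as np

def detect_landmarks(frame):
    """Hypothetical placeholder for the SegNet landmark-segmentation step."""
    raise NotImplementedError

def track_landmarks(video_path, refresh_every=30):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    points = detect_landmarks(frame)       # expected: Nx1x2 float32 edge points
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        n += 1
        if n % refresh_every == 0 or len(points) == 0:
            # Periodic re-segmentation recovers points lost by the KLT tracker.
            points = detect_landmarks(frame)
        else:
            points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
            points = points[status.ravel() == 1].reshape(-1, 1, 2)
        prev_gray = gray
        yield points
    cap.release()
```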


9.
10.
This paper presents an integrated approach for tracking hands, faces and specific facial features (eyes, nose, and mouth) in image sequences. For hand and face tracking, we employ a state-of-the-art blob tracker which is specifically trained to track skin-colored regions. We extend the skin color tracker by proposing an incremental probabilistic classifier, which maintains and continuously updates the belief about the class of each tracked blob (left hand, right hand, or face) and associates hand blobs with their corresponding faces. An additional contribution is a novel method for the detection and tracking of specific facial features within each detected facial blob, consisting of an appearance-based detector and a feature-based tracker. The proposed approach is intended to provide input for the analysis of hand gestures and facial expressions that humans use while engaged in various conversational states with robots operating autonomously in public places. It has been integrated into a system which runs in real time on a conventional personal computer located on a mobile robot. Experimental results confirm its effectiveness for the task at hand.
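A minimal sketch of maintaining and continuously updating a per-blob class belief (left hand, right hand, or face) with a recursive Bayesian update; the per-frame likelihoods would come from cues such as blob size, position, and motion. This illustrates the general idea rather than the paper's exact classifier:

```python
import numpy as np

CLASSES = ("left_hand", "right_hand", "face")

class BlobBelief:
    def __init__(self):
        self.belief = np.full(len(CLASSES), 1.0 / len(CLASSES))  # uniform prior

    def update(self, likelihoods):
        """likelihoods: P(observation | class) for each class, this frame."""
        posterior = self.belief * np.asarray(likelihoods, dtype=float)
        self.belief = posterior / posterior.sum()
        return CLASSES[int(np.argmax(self.belief))]

blob = BlobBelief()
print(blob.update([0.2, 0.2, 0.9]))   # evidence favouring "face"
```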

11.
Person-independent, emotion-specific facial feature tracking has been of interest in the machine vision community for decades. Among various methods, the constrained local model (CLM) has shown significant results in person-independent feature tracking. In this paper, we propose an automatic, efficient, and robust method for emotion-specific facial feature detection and tracking from image sequences. Considering a 17-point feature model on the frontal face region, the proposed tracking framework incorporates CLM with two incremental clustering algorithms to increase robustness and minimize tracking error during feature tracking. The patch clustering algorithm builds an appearance model of face frames by organizing previously encountered similar patches into clusters, while the shape clustering algorithm builds a structure model of face shapes by organizing previously encountered similar shapes into clusters, following Bayesian adaptive resonance theory (ART). Both models are used to explore similar features/shapes in successive images. The clusters in each model are built and updated incrementally and online, controlled by the amount of facial muscle movement. The overall performance of the proposed incremental clustering-based facial feature tracking (ICFFT) is evaluated on the FGnet database and the extended Cohn-Kanade (CK+) database. ICFFT demonstrates better results than the baseline CLM method, providing more robust tracking as well as improved localization accuracy for emotion-specific facial features.

12.
The human face plays a crucial role in interpersonal communication. If we can synthesize vividly expressive faces in cyberspace, interaction between computers and humans can become more natural and friendly. In this paper, we present a simple methodology for mimicking realistic faces by manipulating emotional states. Compared with traditional methods of facial expression synthesis, our approach offers three advantages simultaneously: (1) generating facial expressions under quantitative control of emotional states, (2) rendering shape and illumination changes on the face simultaneously, and (3) synthesizing an expressional face for any new person from only a neutral face image. We discuss the implementation and demonstrate the effects of our approach through a series of experiments, such as predicting unseen expressions for an unfamiliar person, simulating one person's facial expressions in someone else's style, and extracting pure emotional expressions from mixtures.

13.
Face localization, feature extraction, and modeling are the major issues in automatic facial expression recognition. In this paper, a method for facial expression recognition is proposed. A face is located by extracting the head contour points using the motion information. A rectangular bounding box is fitted for the face region using those extracted contour points. Among the facial features, eyes are the most prominent features used for determining the size of a face. Hence eyes are located and the visual features of a face are extracted based on the locations of eyes. The visual features are modeled using support vector machine (SVM) for facial expression recognition. The SVM finds an optimal hyperplane to distinguish different facial expressions with an accuracy of 98.5%.
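A minimal sketch of the modeling step: an SVM trained on eye-anchored visual feature vectors to separate expression classes. The data here is synthetic, and the kernel and parameters are assumptions, since the abstract does not specify them:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))          # stand-in visual feature vectors
y = rng.integers(0, 6, size=300)        # six expression classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```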

14.
As is widely recognized, sign language recognition is a very challenging visual recognition problem. In this paper, we propose a feature covariance matrix based serial particle filter for isolated sign language recognition. At the preprocessing stage, the fusion of the median and mode filters is employed to extract the foreground and thereby enhance hand detection. We propose to serially track the hands of the signer, as opposed to tracking both hands at the same time, to reduce the misdirection of target objects. Subsequently, the region around the tracked hands is extracted to generate the feature covariance matrix as a compact representation of the tracked hand gesture, thereby reducing the dimensionality of the features. In addition, the proposed feature covariance matrix is able to adapt to new signs due to its ability to integrate multiple correlated features in a natural way, without any retraining. The experimental results show that the hand trajectories obtained through the proposed serial hand tracking are closer to the ground truth. Sign gesture recognition based on the proposed methods yields an 87.33% recognition rate for American Sign Language. The proposed hand tracking and feature extraction methodology is an important milestone in the development of expert systems for sign language recognition, such as automated sign language translation systems.
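A minimal sketch of a region covariance descriptor in the spirit of this abstract: per-pixel feature vectors over the tracked hand region are summarized by their covariance matrix, a fixed-size representation regardless of region size. The particular feature set (x, y, intensity, gradient magnitudes) is an assumption:

```python
import numpy as np

def covariance_descriptor(patch):
    """patch: 2-D grayscale array for the region around the tracked hand."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(patch.astype(float))     # image gradients per axis
    feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel().astype(float),
                      np.abs(ix).ravel(), np.abs(iy).ravel()])
    return np.cov(feats)                          # 5x5 symmetric matrix

desc = covariance_descriptor(np.random.default_rng(0).integers(0, 255, (40, 40)))
print(desc.shape)                                 # (5, 5)
```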

15.
《Real》1996,2(2):67-79
Many researchers have studied techniques for the analysis and synthesis of human heads in motion with face deformations. These techniques can be used for low-rate image compression (model-based image coding), cinema technologies, video-phones, as well as for virtual reality applications. Such techniques require real-time performance and strong integration between the mechanisms of motion estimation and those of rendering and animation of the 3D synthetic head/face. In this paper, a complete and integrated system for tracking and synthesizing facial motions in real time on low-cost architectures is presented. Facial deformation curves represented as spatiotemporal B-splines are used for tracking in order to model the main facial features. In addition, the proposed system is capable of adapting a generic 3D wire-frame model of a head/face to the face being tracked; the simulations of the face deformations are thus produced using a realistic patterned face.

16.
Specific Face Synthesis Based on Modifying a Generic Face Model (cited 16 times: 2 self-citations, 14 by others)
Face synthesis in computer simulation has received increasing attention for its broad application prospects: it can be used in research on human speech perception models, virtual environments, communication technology, computer-aided instruction, medical research, film production, games and entertainment, and many other areas. Because individual faces differ greatly, individualizing a generic face model is a key technique in face synthesis. Individual facial differences are mainly reflected in two aspects: facial geometric features and texture features. Addressing these two aspects, starting from a known generic neutral face model and generic basic-expression face models, according to the specific face's different ...

17.
Optical flow provides a constraint on the motion of a deformable model. We derive and solve a dynamic system incorporating flow as a hard constraint, producing a model-based least-squares optical flow solution. Our solution also ensures the constraint remains satisfied when combined with edge information, which helps combat tracking error accumulation. Constraint enforcement can be relaxed using a Kalman filter, which permits controlled constraint violations based on the noise present in the optical flow information, and enables optical flow and edge information to be combined more robustly and efficiently. We apply this framework to the estimation of face shape and motion using a 3D deformable face model. This model uses a small number of parameters to describe a rich variety of face shapes and facial expressions. We present experiments in extracting the shape and motion of a face from image sequences which validate the accuracy of the method. They also demonstrate that our treatment of optical flow as a hard constraint, as well as our use of a Kalman filter to reconcile these constraints with the uncertainty in the optical flow, are vital for improving the performance of our system.
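A minimal sketch of the model-based least-squares step: with measured image velocities u stacked over model points and a model Jacobian L mapping parameter velocities to image velocities (u = L qdot), the parameter velocities are recovered by least squares. L and u below are synthetic stand-ins for the deformable face model and the optical flow measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_params = 50, 6
L = rng.normal(size=(2 * n_points, n_params))   # stacked 2-D point Jacobians
qdot_true = rng.normal(size=n_params)
u = L @ qdot_true + 0.01 * rng.normal(size=2 * n_points)  # noisy optical flow

qdot, *_ = np.linalg.lstsq(L, u, rcond=None)    # least-squares estimate
print(np.allclose(qdot, qdot_true, atol=0.05))
```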

18.
To synthesize images of a face under other poses and expressions from a single face image, a tensor-subspace method for multi-pose facial expression synthesis is proposed. First, a fourth-order texture feature tensor and a shape tensor are constructed from a set of face images with labeled feature points; second, tensor decomposition yields a core tensor and projection subspaces for each mode (identity, expression, pose, feature); finally, the core tensor and the expression and pose subspaces are used to construct new tensors for pose and expression synthesis, fully exploiting the intrinsic relations among the factors affecting the face. Experimental results show that the method can synthesize natural and plausible images of a face under other poses and expressions from a single image with known expression and pose.
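A minimal sketch of the decomposition step: mode-n unfolding of an (identity x expression x pose x feature) data tensor followed by an SVD of each unfolding yields the per-mode projection subspaces, and projecting onto them gives the core tensor (higher-order SVD). The tensor shapes are illustrative assumptions:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
T = rng.normal(size=(10, 7, 5, 120))     # identity x expression x pose x feature
factors = []
for mode in range(T.ndim):
    U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
    factors.append(U)                    # projection subspace for this mode

core = T
for mode, U in enumerate(factors):       # project each mode onto its subspace
    core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
print(core.shape)
```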

19.
Cemil, Ming C. 《Neurocomputing》, 2007, 70(16-18): 2891
Sign language (SL), which is a highly visual-spatial, linguistically complete, and natural language, is the main mode of communication among deaf people. This paper describes two different American Sign Language (ASL) word recognition systems developed using artificial neural networks (ANN) to translate ASL words into English. Feature vectors of signing words taken at five time instants were used in the first system, while histograms of feature vectors of signing words were used in the second. The systems use a sensory glove, Cyberglove™, and a Flock of Birds® 3-D motion tracker to extract the gesture features. The finger joint angle data obtained from strain gauges in the sensory glove define the hand shape, and the data from the tracker describe the trajectory of hand movement. In both systems, the data from these devices were processed by two neural networks: a velocity network and a word recognition network. The velocity network uses hand speed to determine the duration of words. Signs are defined by feature vectors such as hand shape, hand location, orientation, movement, bounding box, and distance. The second network was used as a classifier to convert ASL signs into words based on these features or their histograms. We trained and tested our ANN models with 60 ASL words for different numbers of samples, and compared the two methods. Our test results show that the recognition accuracies of the two systems are 92% and 95%, respectively.
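A minimal sketch of the velocity network's role in delimiting word durations by hand speed; here a simple speed threshold with hysteresis stands in for the trained network, and the threshold values and frame rate are assumptions:

```python
import numpy as np

def segment_words(positions, fps=30.0, start_thresh=0.5, stop_thresh=0.1):
    """positions: (T, 3) hand trajectory; returns (start, end) frame pairs."""
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    words, start = [], None
    for t, s in enumerate(speed):
        if start is None and s > start_thresh:
            start = t                      # movement begins: word onset
        elif start is not None and s < stop_thresh:
            words.append((start, t))       # movement ends: word offset
            start = None
    return words

# Usage: a pause, a steady movement, then another pause.
pause = np.zeros((50, 3))
move = np.cumsum(np.full((50, 3), 0.02), axis=0)
traj = np.concatenate([pause, move, np.tile(move[-1], (50, 1))])
print(segment_words(traj))
```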

20.
Changes in eyebrow configuration, in conjunction with other facial expressions and head gestures, are used to signal essential grammatical information in signed languages. This paper proposes an automatic recognition system for non-manual grammatical markers in American Sign Language (ASL) based on a multi-scale, spatio-temporal analysis of head pose and facial expressions. The analysis takes account of gestural components of these markers, such as raised or lowered eyebrows and different types of periodic head movements. To advance the state of the art in non-manual grammatical marker recognition, we propose a novel multi-scale learning approach that exploits spatio-temporally low-level and high-level facial features. Low-level features are based on information about facial geometry and appearance, as well as head pose, and are obtained through accurate 3D deformable model-based face tracking. High-level features are based on the identification of gestural events, of varying duration, that constitute the components of linguistic non-manual markers. Specifically, we recognize events such as raised and lowered eyebrows, head nods, and head shakes. We also partition these events into temporal phases. We separate the anticipatory transitional movement (the onset) from the linguistically significant portion of the event, and we further separate the core of the event from the transitional movement that occurs as the articulators return to the neutral position towards the end of the event (the offset). This partitioning is essential for the temporally accurate localization of the grammatical markers, which could not be achieved at this level of precision with previous computer vision methods. In addition, we analyze and use the motion patterns of these non-manual events. Those patterns, together with the information about the type of event and its temporal phases, are defined as the high-level features. Using this multi-scale, spatio-temporal combination of low- and high-level features, we employ learning methods for accurate recognition of non-manual grammatical markers in ASL sentences.
