Similar Documents
1.
While many works treat moving faces merely as collections of frames and apply still-image-based methods, recent developments indicate that excellent results can be obtained using texture-based spatiotemporal representations for describing and analyzing faces in videos. Inspired by psychophysical findings which state that facial movements can provide valuable information to face analysis, and by our recent success in using LBP (local binary patterns) to combine appearance and motion for dynamic texture analysis, this paper investigates the combination of facial appearance (the shape of the face) and motion (the way a person talks and moves his/her facial features) for face analysis in videos. We propose and study an approach for spatiotemporal face and gender recognition from videos using an extended set of volume LBP features and a boosting scheme. We experiment with several publicly available video face databases and consider different benchmark methods for comparison. Our extensive experimental analysis clearly demonstrates the promising performance of LBP-based spatiotemporal representations for describing and analyzing faces in videos.
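
A minimal sketch of the spatiotemporal LBP idea (LBP-TOP, a close relative of the volume LBP features above): histograms of uniform LBP codes are computed on the three orthogonal planes of a face volume and concatenated. This is an illustrative single-block version without the paper's extended feature set or boosting; scikit-image is assumed for the per-plane operator.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top_histogram(volume, P=8, R=1):
    """Concatenated LBP histograms on the three orthogonal planes (XY, XT, YT)
    of a T x H x W grayscale face volume -- the LBP-TOP idea, as a single-block
    sketch without boosting or block partitioning."""
    def plane_hist(frames):
        bins = P + 2                      # 'uniform' LBP yields P+2 codes
        h = np.zeros(bins)
        for img in frames:
            codes = local_binary_pattern(img, P, R, method="uniform")
            h += np.bincount(codes.astype(int).ravel(), minlength=bins)
        return h / h.sum()

    T, H, W = volume.shape
    return np.concatenate([
        plane_hist(volume[t] for t in range(T)),          # XY (appearance)
        plane_hist(volume[:, y, :] for y in range(H)),    # XT (horizontal motion)
        plane_hist(volume[:, :, x] for x in range(W)),    # YT (vertical motion)
    ])
```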

2.
Objective: Heart rate is one of the key indicators that directly reflect human health, and video-based non-contact heart rate detection has broad application prospects in the medical and health field. However, existing video-based methods are not suited to complex real-world scenarios, mainly because they do not account for subject motion interference or spatial-scale characteristics in the video, so the extracted blood volume pulse signal is inaccurate and detection accuracy is unsatisfactory. To overcome these shortcomings, we propose a non-contact heart rate detection method that is robust to facial motion interference. Method: The method consists of three steps. First, to address the problem of head motion interfering with face-region selection, discriminative response map fitting is used to detect the face region and the main facial landmarks in a reference image, and a tilt-correction idea is introduced into face tracking for the first time, producing a face video with motion interference suppressed. Second, exploiting differences in spatial scale, a color magnification method is applied for spatiotemporal processing of the motion-suppressed face video to extract a clean blood volume pulse signal. Finally, considering the small-sample problem, the heart rate is estimated with a frequency-domain analysis method based on iterative interpolation of Fourier coefficients. Results: Videos were collected under both a cooperative condition (still face) and a non-cooperative condition (moving face), and the heart rate detection results were analyzed quantitatively. The proposed method achieves accuracies of 97.84% and 97.30% in the two conditions, respectively; compared with classical and state-of-the-art methods, accuracy improves by more than 1% in the cooperative case and more than 7% in the non-cooperative case, demonstrating excellent performance. Conclusion: A heart rate detection method based on face video processing is proposed. By effectively handling the motion interference and scale characteristics of the face, a clean blood volume pulse signal is extracted, improving the accuracy and robustness of heart rate detection.
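
For orientation, a minimal remote-PPG baseline in the spirit of the pipeline above, assuming the per-frame mean green value of a tracked face ROI is already available; the paper's DRMF-based tracking, tilt correction, color magnification, and Fourier-coefficient iterative interpolation are replaced here by a plain band-pass filter and an FFT peak.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_bpm(green_means, fps=30.0):
    """Heart rate from the mean green value of a face ROI per frame:
    detrend -> band-pass -> dominant FFT frequency. A baseline sketch,
    not the paper's interpolation-based estimator."""
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    x = filtfilt(b, a, x)                      # keep the 42-240 bpm band
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spec)]       # dominant frequency in bpm
```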

3.
Video Co-segmentation Based on Submodular Maximization and RRWM
苏亮亮  唐俊  梁栋  王年 《自动化学报》2016,42(10):1532-1541
Co-segmentation of the common motion patterns in a pair of videos refers to simultaneously detecting the behavior patterns shared by two related videos, and is an active topic in computer vision research. This paper proposes a new co-segmentation method for paired videos. First, the dense trajectory method is used to detect the moving parts of each video, and feature representations are built for the motion trajectories. Second, submodular optimization is introduced to cluster the trajectories within each video. A graph matching method based on reweighted random walks (RRWM) is then used to match motion trajectories across the pair of videos; this method is highly robust to outliers, deformation, and noise. The matching results are used to compute a co-saliency measure for the trajectories. Finally, classifying all trajectories into common and anomalous motion trajectories is cast as a binary labeling problem on a Markov random field solved by graph cuts. Comparative experiments on standard motion video datasets verify the effectiveness of the proposed method.
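
As an illustration of the submodular clustering step only (the RRWM matching and MRF labeling are not shown), here is a greedy maximizer of a facility-location objective over a trajectory similarity matrix. The toy data and the choice of objective are assumptions, since the abstract does not specify the paper's submodular function.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k exemplar trajectories maximizing the submodular
    facility-location objective F(S) = sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    selected, best = [], np.zeros(n)
    for _ in range(k):
        # marginal gain of each candidate j given current coverage `best`
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected

# Toy similarity between 6 trajectory descriptors (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = np.clip(Xn @ Xn.T, 0.0, None)              # non-negative similarities
exemplars = greedy_facility_location(S, k=2)
clusters = np.argmax(S[:, exemplars], axis=1)  # assign to nearest exemplar
```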

4.
In this paper, we propose a novel motion-based video retrieval approach that finds desired videos in video databases through trajectory matching. The main component of our approach is the extraction of representative motion features from the video, which can be broken down into the following three steps. First, we extract the motion vectors from each frame and use Harris corner points to compensate for the effect of camera motion. Second, we find interesting motion flows across frames using a sliding-window mechanism and a clustering algorithm. Third, we merge the generated motion flows and select representative ones to capture the motion features of the video. Furthermore, we design a symbol-based trajectory matching method for effective video retrieval. The experimental results show that our algorithm extracts motion flows effectively and with high accuracy, and outperforms existing approaches for video retrieval.
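
A sketch of the first step under stated assumptions: Harris corners tracked with pyramidal Lucas-Kanade stand in for the per-frame motion vectors, and the median displacement serves as a simple global (camera) motion estimate to subtract; the paper's actual compensation scheme may differ.

```python
import cv2
import numpy as np

def object_motion_vectors(prev_gray, gray):
    """Track Harris corners with pyramidal Lucas-Kanade and subtract the
    median (camera) flow, keeping only object-induced motion."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                  qualityLevel=0.01, minDistance=7,
                                  useHarrisDetector=True)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = status.ravel() == 1
    flow = (nxt - pts).reshape(-1, 2)[good]
    camera = np.median(flow, axis=0)           # robust global-motion estimate
    return flow - camera, pts.reshape(-1, 2)[good]
```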

5.
With the progress of image generation research in computer vision, face reenactment has attracted wide attention. This technique aims to synthesize new talking-head images or videos from the identity of a source face image together with the mouth shape, expression, and pose information supplied by a driving signal. Face reenactment has a wide range of applications, such as virtual anchor generation, online teaching, game avatar customization, lip synchronization in dubbed videos, and video conference compression. Although the technique has developed over a relatively short period, a large body of research has emerged. However, there are currently almost no surveys, domestic or international, that focus specifically on face reenactment; overviews of the topic appear only as deepfake content within surveys on deepfake detection. In view of this, this paper reviews and summarizes developments in face reenactment. Starting from face reenactment models, it discusses the open problems, the taxonomy of models, and the representation of driving facial features; it lists and introduces the datasets commonly used for training face reenactment models and the metrics used for evaluation; it surveys, analyzes, and compares recent work; and finally it summarizes and looks ahead to evolutionary trends, current challenges, future research directions, potential harms, and countermeasures.

6.
Eigen-Motion Based Face Recognition under Expression Variation
Facial expression has long been a difficulty for face recognition. To improve the robustness of face recognition under expression variation, this paper proposes an eigen-motion based method. First, block matching is used to compute the motion vectors between an expressive face and the corresponding neutral face; principal component analysis (PCA) is then applied to these motion vectors to derive a low-dimensional subspace, called the eigen-motion space. At test time, the motion vector between the test face and a neutral face is projected into the eigen-motion space, and recognition is performed according to the residual of the motion vector in that space. Person-specific and common-model variants of the eigen-motion method are also introduced. Experimental results show that the new algorithm outperforms the eigenface method on expressive faces and attains a very high recognition rate.
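
A compact sketch of the eigen-motion subspace and its residual score, using scikit-learn's PCA; the motion fields here are random placeholders for block-matching vectors between expressive and neutral faces, and the dimensions are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: each row is a flattened block-matching motion field
# between an expressive face and its neutral face (16x16 blocks, dx and dy).
rng = np.random.default_rng(1)
train_motion = rng.normal(size=(200, 2 * 16 * 16))

pca = PCA(n_components=20).fit(train_motion)     # the eigen-motion space

def residual(v):
    """Reconstruction error of a motion field w.r.t. the eigen-motion space;
    a small residual means the motion looks like a plausible expression change."""
    recon = pca.inverse_transform(pca.transform(v[None, :]))
    return np.linalg.norm(v - recon)

score = residual(rng.normal(size=2 * 16 * 16))
```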

7.
Highlight extraction, as one of the key technologies in soccer video retrieval and summarization, has great academic and application value. Based on the principle that an observer's affective state fluctuates with the evolution of the game when watching a soccer match video, a novel highlight extraction approach based on an improved affection arousal model is proposed. Compared with existing work, our main contributions are as follows. A novel feature, shot intensity, is exploited to replace motion activity, which greatly improves the computational performance of the affection arousal model. Another helpful feature, a replay factor, is designed and successfully fused into the affection arousal model, making it reflect the variation of the true match process more accurately. In addition, the event temporal transition pattern (ETTP) in soccer video is combined with the affection arousal curve to detect highlight boundaries effectively. Experiments conducted on real-world soccer game videos demonstrate the efficiency and effectiveness of the proposed approach.
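
A toy rendering of the arousal-curve idea: per-shot features are fused into a single curve, smoothed, and prominent peaks are taken as candidate highlights. The features, fusion weights, and peak parameters are all assumptions for illustration; the paper's model and its ETTP boundary detection are not reproduced.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import gaussian_filter1d

# Hypothetical per-shot features standing in for the paper's shot intensity,
# audio energy, and replay factor.
rng = np.random.default_rng(3)
shot_intensity = rng.random(120)
audio_energy = rng.random(120)
replay_factor = (rng.random(120) > 0.9).astype(float)

# Fuse into an arousal curve (weights are illustrative), smooth it, and take
# prominent peaks as candidate highlight centers.
arousal = 0.5 * shot_intensity + 0.3 * audio_energy + 0.2 * replay_factor
curve = gaussian_filter1d(arousal, sigma=2)
peaks, _ = find_peaks(curve, prominence=0.1)
```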

8.
This paper describes a fully automatic content-based approach for browsing and retrieval of MPEG-2 compressed video. The first step of the approach is the detection of shot boundaries based on motion vectors available from the compressed video stream. The next step involves the construction of a scene tree from the shots obtained earlier. The scene tree is shown to capture some semantic information as well as to provide a construct for hierarchical browsing of compressed videos. Finally, we build a new model for video similarity based on global as well as local motion associated with each node in the scene tree. To this end, we propose new approaches to camera motion and object motion estimation. The experimental results demonstrate that the integration of the above techniques results in an efficient framework for browsing and searching large video databases.
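
Since compressed-domain motion vectors are not exposed by common Python APIs, the sketch below detects shot boundaries from the HSV-histogram distance between consecutive decoded frames, a pixel-domain stand-in for the paper's motion-vector criterion; the threshold is illustrative.

```python
import cv2

def shot_boundaries(path, thresh=0.5):
    """Detect cuts as large Bhattacharyya distances between consecutive
    frame histograms; a stand-in for compressed-domain shot detection."""
    cap, prev, cuts, i = cv2.VideoCapture(path), None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
        h = cv2.normalize(h, h).flatten()
        if prev is not None and \
           cv2.compareHist(prev, h, cv2.HISTCMP_BHATTACHARYYA) > thresh:
            cuts.append(i)
        prev, i = h, i + 1
    cap.release()
    return cuts
```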

9.
Face recognition (FR) is employed in several video surveillance applications to determine if facial regions captured over a network of cameras correspond to target individuals. To enroll target individuals, it is often costly or unfeasible to capture enough high-quality reference facial samples a priori to design representative facial models. Furthermore, changes in capture conditions and physiology contribute to a growing divergence between these models and the faces captured during operations. Adaptive biometrics seek to maintain a high level of performance by updating facial models over time using operational data. Adaptive multiple classifier systems (MCSs) have been successfully applied to video-to-video FR, where the face of each target individual is modeled using an ensemble of 2-class classifiers (trained using target vs. non-target samples). In this paper, a new adaptive MCS is proposed for partially-supervised learning of facial models over time based on facial trajectories. During operations, information from a face tracker and individual-specific ensembles is integrated for robust spatio-temporal recognition and for self-update of facial models. The tracker defines a facial trajectory for each individual that appears in a video, which leads to the recognition of a target individual if the positive predictions accumulated along a trajectory surpass a detection threshold for an ensemble. When the number of positive ensemble predictions surpasses a higher update threshold, all target face samples from the trajectory are combined with non-target samples (selected from the cohort and universal models) to update the corresponding facial model. A learn-and-combine strategy is employed to avoid knowledge corruption during self-update of ensembles. In addition, a memory management strategy based on Kullback–Leibler divergence is proposed to rank and select the most relevant target and non-target reference samples to be stored in memory as the ensembles evolve. For proof of concept, a particular realization of the proposed system was validated with videos from the Face in Action dataset. Initially, trajectories captured from enrollment videos are used for supervised learning of ensembles, and then videos from various operational sessions are presented to the system for FR and self-update with high-confidence trajectories. At the transaction level, the proposed approach outperforms baseline systems that do not adapt to new trajectories, and provides performance comparable to ideal systems that adapt to all relevant target trajectories through supervised learning. Subject-level analysis reveals the existence of individuals for which self-updating ensembles with unlabeled facial trajectories provides a considerable benefit. Trajectory-level analysis indicates that the proposed system allows for robust spatio-temporal video-to-video FR, and may therefore enhance security and situation analysis in video surveillance.
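
A schematic of the trajectory-level decision logic described above, with made-up thresholds: positive ensemble scores are accumulated along a tracked facial trajectory; crossing the lower bound triggers recognition, while the higher bound flags the trajectory for self-update (the learn-and-combine update itself is not shown).

```python
def evaluate_trajectory(scores, detect_thresh=5.0, update_thresh=8.0):
    """Accumulate positive ensemble predictions along a facial trajectory.
    Thresholds are illustrative; the paper's self-update is reduced to a flag."""
    acc = 0.0
    recognized = self_update = False
    for s in scores:                 # s: ensemble score for one tracked face
        acc += max(s, 0.0)
        recognized = recognized or acc >= detect_thresh
        self_update = self_update or acc >= update_thresh
    return recognized, self_update
```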

10.
Modeling and Animating Realistic Faces from Images
We present a new set of techniques for modeling and animating realistic faces from photographs and videos. Given a set of face photographs taken simultaneously, our modeling technique allows the interactive recovery of a textured 3D face model. By repeating this process for several facial expressions, we acquire a set of face models that can be linearly combined to express a wide range of expressions. Given a video sequence, this linear face model can be used to estimate the face position, orientation, and facial expression at each frame. We illustrate these techniques on several datasets and demonstrate robust estimations of detailed face geometry and motion.
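
A minimal worked example of the linear face model: with expression basis meshes stacked as columns of a matrix, the per-frame expression is recovered as least-squares blending weights. All names and sizes are hypothetical, and the simultaneous pose estimation described above is omitted.

```python
import numpy as np

# Hypothetical linear face model: columns of B are flattened expression
# meshes recovered from photographs; x is the face observed in one frame.
rng = np.random.default_rng(2)
B = rng.normal(size=(3 * 500, 4))        # 500 vertices, 4 expression bases
w_true = np.array([0.6, 0.3, 0.1, 0.0])
x = B @ w_true

# Estimate the per-frame blending weights by least squares (the pose is
# solved separately in the paper and omitted here).
w, *_ = np.linalg.lstsq(B, x, rcond=None)
```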

11.
This paper presents a spatio-temporal approach to recognizing six universal facial expressions from visual data and using them to compute levels of interest. The classification approach relies on a two-step strategy on top of projected facial motion vectors obtained from video sequences of facial expressions. First, a linear classification bank was applied to projected optical flow vectors, and the decisions made by the linear classifiers were coalesced to produce a characteristic signature for each universal facial expression. The signatures computed from the training data were then used to train discrete hidden Markov models (HMMs) to learn the underlying model for each facial expression. The performance of the proposed facial expression recognition was evaluated using five-fold cross-validation on the Cohn-Kanade facial expression database, consisting of 488 video sequences from 97 subjects, where the approach achieved an average recognition rate of 90.9%. Recognized facial expressions were mapped to levels of interest using the affect space and the intensity of motion around the apex frame. The computed level of interest was subjectively analyzed and was found to be consistent with "ground truth" information in most cases. To further illustrate the efficacy of the proposed approach, and to better understand the effects of a number of factors that are detrimental to facial expression recognition, additional experiments were conducted. The first empirical analysis was conducted on a database of 108 facial expressions collected from TV broadcasts and labeled by human coders for subsequent analysis. The second experiment (emotion elicitation) was conducted on facial expressions obtained from 21 subjects by showing them six different movie clips chosen to arouse spontaneous emotional reactions that would produce natural facial expressions.
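
To make the second stage concrete, here is a per-class Gaussian HMM classifier over variable-length signature sequences, using the hmmlearn package (its API is assumed); the linear classification bank that produces the signatures is not reproduced.

```python
import numpy as np
from hmmlearn import hmm

def train_expression_hmms(sequences_by_class, n_states=4):
    """One Gaussian HMM per universal expression, trained on variable-length
    signature sequences given as lists of (T_i, D) arrays."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                  # stack frames of all sequences
        lengths = [len(s) for s in seqs]     # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Label a test signature sequence by maximum HMM log-likelihood."""
    return max(models, key=lambda k: models[k].score(seq))
```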

12.
This paper addresses the problem of detecting objectionable videos, which has never been carefully studied before. Our method can be used to efficiently filter objectionable videos on the Internet. A tensor-based key-frame selection algorithm, a cube-based color model, and an objectionable-video estimation algorithm are presented. Key-frame selection is based on motion analysis using the three-dimensional structure tensor. The cube-based color model is then employed to detect skin color in each key frame. Finally, the estimation algorithm is applied to estimate the degree of objectionable content in a video. Experimental results on a variety of real-world videos downloaded from the Internet show that this method is promising.
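
A sketch of the skin-detection step on a key frame, using a simple axis-aligned box ("cube") in YCrCb space; the thresholds are common rule-of-thumb values from the literature, not the paper's learned model.

```python
import cv2
import numpy as np

def skin_ratio(frame_bgr):
    """Fraction of skin-colored pixels under a box model in YCrCb space;
    a naive video-level score could average this over selected key frames."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    lo = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bounds
    hi = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb upper bounds
    mask = cv2.inRange(ycrcb, lo, hi)
    return mask.mean() / 255.0
```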

13.
This paper tackles the problem of surveillance video content modelling. Given a set of surveillance videos, the aims of our work are twofold: firstly, a continuous video is segmented according to the activities captured in it; secondly, a model is constructed for the video content, based on which an unseen activity pattern can be recognised and any unusual activities can be detected. To segment a video based on activity, we propose a semantically meaningful video content representation method and two segmentation algorithms, one being offline, offering high accuracy in segmentation, and the other being online, enabling real-time performance. Our video content representation method is based on automatically detected visual events (i.e. ‘what is happening in the scene’). This is in contrast to most previous approaches, which represent video content at the signal level using image features such as colour, motion and texture. Our segmentation algorithms are based on detecting breakpoints on a high-dimensional video content trajectory, which differs from most previous approaches based on shot change detection and shot grouping. Having segmented continuous surveillance videos based on activity, the activity patterns contained in the video segments are grouped into activity classes and a composite video content model is constructed which is capable of generalising from a small training set to accommodate variations in unseen activity patterns. A run-time accumulative unusual activity measure is introduced to detect unusual behaviour, while usual activity patterns are recognised based on an online likelihood ratio test (LRT) method. This ensures robust and reliable activity recognition and unusual activity detection in the shortest possible time once sufficient visual evidence has become available. Comparative experiments have been carried out using over 10 h of challenging outdoor surveillance video footage to evaluate the proposed segmentation algorithms and modelling approach.
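
The run-time decision rule can be sketched as a sequential likelihood-ratio test: per-frame log-likelihoods under an activity model and a generic model are accumulated until the evidence crosses a bound. The bounds and both likelihood streams are placeholders; the paper's content models are not reproduced.

```python
def online_lrt(frames_ll_model, frames_ll_generic, accept=5.0, reject=-5.0):
    """Accumulate log p(x|activity) - log p(x|generic) per frame and decide
    as soon as the evidence crosses a bound (bounds are illustrative)."""
    llr = 0.0
    for ll_m, ll_g in zip(frames_ll_model, frames_ll_generic):
        llr += ll_m - ll_g
        if llr >= accept:
            return "recognized"      # usual activity pattern confirmed
        if llr <= reject:
            return "unusual"         # accumulated evidence against the model
    return "undecided"
```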

14.
Methods that feed webcam video of a driver's face into a network for detection mainly judge fatigue from mouth shape and other facial expressions, but many similar states, such as talking, are misdetected as fatigue. To address this problem, a detection framework based on temporal facial action information is proposed to assess the driver's state, improving detection accuracy and reducing the false detection rate. The framework detects the facial contour in the video and extracts multiple facial features to form facial action units; it then trains corresponding LSTM networks to obtain temporal facial action units, fuses multiple action units according to their correlations, and finally infers the driver's state. Results on the public YawDD dataset show that, compared with existing methods, the proposed method raises accuracy to 93.1% while substantially reducing the false detection rate for the fatigue state.
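
A minimal PyTorch sketch of the temporal component: an LSTM consumes per-frame facial-action-unit features and classifies the clip. The feature dimension, hidden size, and two-state output are assumptions; the paper's multi-unit fusion is not shown.

```python
import torch
import torch.nn as nn

class AUSequenceClassifier(nn.Module):
    """LSTM over per-frame facial-action-unit features; a sketch of the
    temporal modelling idea, not the paper's exact architecture."""
    def __init__(self, n_features=12, hidden=64, n_states=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_states)   # e.g. alert vs. fatigued

    def forward(self, x):            # x: (batch, frames, n_features)
        _, (h, _) = self.lstm(x)     # final hidden state summarizes the clip
        return self.head(h[-1])

logits = AUSequenceClassifier()(torch.randn(8, 30, 12))  # 8 clips, 30 frames
```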

15.
In talking-head video communication, users' attention is generally concentrated on the face region. Based on Marr's theory of vision, and under the constraint of preserving subjective video quality, this paper proposes a region-of-interest based preprocessing method for head-and-shoulders video that effectively improves communication efficiency. The method first applies a face detection algorithm to the video, then applies a bilateral filter to the background region outside the face, and finally encodes and transmits the processed video with H.264. Experimental results show that the proposed method effectively improves video coding efficiency, by up to 28.571%.
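
A sketch of the preprocessing step using OpenCV's stock Haar face detector and bilateral filter: the background is smoothed while detected face regions keep their original pixels, so a downstream H.264 encoder spends fewer bits outside the ROI. The detector and filter parameters are illustrative.

```python
import cv2

# Haar face detector shipped with OpenCV (stock model file).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def smooth_background(frame):
    """Bilateral-filter everything outside detected face boxes; the face
    regions are copied back unfiltered so they stay crisp."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        smoothed[y:y+h, x:x+w] = frame[y:y+h, x:x+w]
    return smoothed
```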

16.
In this paper, we address the analysis and recognition of facial expressions in continuous videos. More precisely, we study the performance of classifiers that exploit head-pose-independent temporal facial action parameters. These are provided by an appearance-based 3D face tracker that simultaneously estimates the 3D head pose and the facial actions. The use of such a tracker makes the recognition pose- and texture-independent. Two different schemes are studied. The first adopts a dynamic time warping technique for recognizing expressions, where training data are given by temporal signatures associated with different universal facial expressions. The second models the temporal signatures associated with facial actions as fixed-length feature vectors (observations), and uses machine learning algorithms to recognize the displayed expression. Experiments carried out on CMU video sequences and home-made video sequences quantified the performance of the different schemes. The results show that applying dimension reduction techniques to the extracted time series can improve classification performance, and that the best recognition rate can exceed 90%.
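
For the first scheme, a textbook dynamic-time-warping distance between two temporal signatures is enough to build a nearest-template classifier; this minimal O(nm) version makes no claim about the paper's exact matching variant.

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two temporal signatures,
    each an array of shape (T, D); classification picks the training
    signature with the smallest distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```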

17.
In this paper, we present an automatic and efficient approach to the capture of dense facial motion parameters, which extends our previous work on 3D reconstruction from mirror-reflected multiview video. To narrow the search space and rapidly generate 3D candidate position lists, we apply mirrored epipolar bands. For automatic tracking, we utilize the spatial proximity of facial surfaces and temporal coherence to find the best trajectories and to rectify missing and false tracks. More than 300 markers on a subject's face are tracked from video at a processing speed of 9.2 frames per second (fps) on a regular PC. The estimated 3D facial motion trajectories have been applied to our facial animation system and can be used for facial motion analysis.
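
A greedy nearest-neighbour sketch of the temporal-coherence idea: each marker is matched to the closest unclaimed 3D candidate in the next frame, and markers without a close candidate are flagged as missing. The distance bound and the greedy strategy are simplifications of the paper's trajectory selection.

```python
import numpy as np

def track_markers(prev_pts, candidates, max_jump=5.0):
    """Greedy nearest-neighbour assignment of (N, 3) marker positions to
    (M, 3) candidates in the next frame; NaN rows mark missing markers
    to be rectified later (max_jump is illustrative)."""
    tracked = np.full_like(prev_pts, np.nan, dtype=float)
    taken = set()
    for i, p in enumerate(prev_pts):
        d = np.linalg.norm(candidates - p, axis=1)
        d[list(taken)] = np.inf          # each candidate claimed once
        j = int(np.argmin(d))
        if d[j] <= max_jump:
            tracked[i] = candidates[j]
            taken.add(j)
    return tracked
```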
