期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Free viewpoint action recognition using motion history volumes 总被引：5，自引：0，他引：5

Daniel Weinland Remi Ronfard Edmond Boyer 《Computer Vision and Image Understanding》2006,104(2-3):249

Action recognition is an important and challenging topic in computer vision, with many important applications including video surveillance, automated cinematography and understanding of social interaction. Yet, most current work in gesture or action interpretation remains rooted in view-dependent representations. This paper introduces Motion History Volumes (MHV) as a free-viewpoint representation for human actions in the case of multiple calibrated, and background-subtracted, video cameras. We present algorithms for computing, aligning and comparing MHVs of different actions performed by different people in a variety of viewpoints. Alignment and comparisons are performed efficiently using Fourier transforms in cylindrical coordinates around the vertical axis. Results indicate that this representation can be used to learn and recognize basic human action classes, independently of gender, body size and viewpoint. 相似文献

2.

Efficient 3D reconstruction for face recognition

Dalong Jiang Author Vitae Yuxiao Hu Author Vitae Author Vitae Lei Zhang Author Vitae Hongjiang Zhang Author Vitae Author Vitae 《Pattern recognition》2005,38(6):787-798

Face recognition with variant pose, illumination and expression (PIE) is a challenging problem. In this paper, we propose an analysis-by-synthesis framework for face recognition with variant PIE. First, an efficient two-dimensional (2D)-to-three-dimensional (3D) integrated face reconstruction approach is introduced to reconstruct a personalized 3D face model from a single frontal face image with neutral expression and normal illumination. Then, realistic virtual faces with different PIE are synthesized based on the personalized 3D face to characterize the face subspace. Finally, face recognition is conducted based on these representative virtual faces. Compared with other related work, this framework has following advantages: (1) only one single frontal face is required for face recognition, which avoids the burdensome enrollment work; (2) the synthesized face samples provide the capability to conduct recognition under difficult conditions like complex PIE; and (3) compared with other 3D reconstruction approaches, our proposed 2D-to-3D integrated face reconstruction approach is fully automatic and more efficient. The extensive experimental results show that the synthesized virtual faces significantly improve the accuracy of face recognition with changing PIE. 相似文献

3.

Real-time 2D+3D facial action and expression recognition

Filareti Tsalakanidou^{Author Vitae} Sotiris Malassiotis Author Vitae 《Pattern recognition》2010,43(5):1763-1775

This paper presents a completely automated facial action and facial expression recognition system using 2D+3D images recorded in real-time by a structured light sensor. It is based on local feature tracking and rule-based classification of geometric, appearance and surface curvature measurements. Several experiments conducted under relatively non-controlled conditions demonstrate the accuracy and robustness of the approach. 相似文献

4.

Efficient encoding of video descriptor distribution for action recognition

Saremi Mehrin Yaghmaee Farzin 《Multimedia Tools and Applications》2020,79(9-10):6025-6043

相似文献

5.

Common-sense reasoning for human action recognition

Jesús Martínez del Rincón Maria J. Santofimia Jean-Christophe Nebel 《Pattern recognition letters》2013

This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning based algorithm – known as bag of words – gives a first estimate of action classification from video sequences, by performing an image feature analysis. Those results are afterward passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimation yielded by the machine learning algorithm. This second stage resorts to the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates by the machine learning techniques are significantly improved by the second stage in which common-sense knowledge and reasoning capabilities have been leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline. 相似文献

6.

Efficient action recognition via local position offset of 3D skeletal body joints

Guoliang Lu Yiqi Zhou Xueyong Li Mineichi Kudo 《Multimedia Tools and Applications》2016,75(6):3479-3494

To accurately recognize human actions in less computational time is one important aspect for practical usage. This paper presents an efficient framework for recognizing actions by a RGB-D camera. The novel action patterns in the framework are extracted via computing position offset of 3D skeletal body joints locally in the temporal extent of video. Action recognition is then performed by assembling these offset vectors using a bag-of-words framework and also by considering the spatial independence of body joints. We conducted extensive experiments on two benchmarking datasets: UCF dataset and MSRC-12 dataset, to demonstrate the effectiveness of the proposed framework. Experimental results suggest that the proposed framework 1) is very fast to extract action patterns and very simple in implementation; and 2) can achieve a comparable or a better performance in recognition accuracy compared with the state-of-the-art approaches. 相似文献

7.

Multi-modality learning for human action recognition

Ren Ziliang Zhang Qieshi Gao Xiangyang Hao Pengyi Cheng Jun 《Multimedia Tools and Applications》2021,80(11):16185-16203

Multimedia Tools and Applications - The multi-modality based human action recognition is an increasing topic. Multi-modality can provide more abundant and complementary information than single... 相似文献

8.

3D human action analysis and recognition through GLAC descriptor on 2D motion and static posture images

Bulbul Mohammad Farhad Islam Saiful Ali Hazrat 《Multimedia Tools and Applications》2019,78(15):21085-21111

In this paper, we present an approach for identification of actions within depth action videos. First, we process the video to get motion history images (MHIs) and static history images (SHIs) corresponding to an action video based on the use of 3D Motion Trail Model (3DMTM). We then characterize the action video by extracting the Gradient Local Auto-Correlations (GLAC) features from the SHIs and the MHIs. The two sets of features i.e., GLAC features from MHIs and GLAC features from SHIs are concatenated to obtain a representation vector for action. Finally, we perform the classification on all the action samples by using the l2-regularized Collaborative Representation Classifier (l2-CRC) to recognize different human actions in an effective way. We perform evaluation of the proposed method on three action datasets, MSR-Action3D, DHA and UTD-MHAD. Through experimental results, we observe that the proposed method performs superior to other approaches.

相似文献

9.

Cross-domain human action recognition

Bian W Tao D Rui Y 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2012,42(2):298-307

Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domain to assist recognition tasks in the target domain. The TTM is well characterized by two aspects: 1) it uses the bag-of-words model trained from the auxiliary domain to represent videos in the target domain; and 2) it assumes each human action is a mixture of a set of topics and uses the topics learned from the auxiliary domain to regularize the topic estimation in the target domain, wherein the regularization is the summation of Kullback-Leibler divergences between topic pairs of the two domains. The utilization of the auxiliary domain knowledge improves the generalization ability of the learned topic model. Experiments on Weizmann and KTH human action databases suggest the effectiveness of the proposed TTM for cross-domain human action recognition. 相似文献

10.

Robust relative attributes for human action recognition

Zhong Zhang Chunheng Wang Baihua Xiao Wen Zhou Shuang Liu 《Pattern Analysis & Applications》2015,18(1):157-171

相似文献

11.

基于特征融合的人体行为识别算法

下载免费PDF全文

周霞柳絮青王宪孙子文邓源《计算机工程与应用》2013,49(7):162-166

针对HOG特征在人体行为识别中仅仅表征人体局部梯度特征的不足,提出了一种扩展HOG（ExHOG）特征与CLBP特征相融合的人体行为识别方法。用背景差分法从视频中提取出完整的人体运动序列,并提取出扩展梯度方向直方图ExHOG及完备局部二值模式CLBP两种互补特征;利用K-L变换将这两种互补特征融合生成一个分类能力更强的行为特征;采用径向基函数神经网络RBFNN对行为特征进行识别分类。在KTH和Weizman行为公共数据库上进行了多组实验,结果表明提出的方法能够有效地识别人体运动类别。相似文献

12.

Slow feature analysis for human action recognition

Zhang Z Tao D 《IEEE transactions on pattern analysis and machine intelligence》2012,34(3):436-450

Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience suggest that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating the discriminative information with SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large amount of training cuboids which are obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves the recognition performance, (2) requires less intermediate processing steps but achieves comparable or even better performance, and (3) has good potential to recognize complex multiperson activities. 相似文献

13.

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Yirui Wu Lianglei Wei Yucong Duan 《Computational Intelligence》2019,35(3):535-554

相似文献

14.

多模态零样本人体动作识别

下载免费PDF全文

吕露露黄毅高君宇杨小汕徐常胜《中国图象图形学报》2021,26(7):1658-1667

目的在人体行为识别算法的研究领域,通过视频特征实现零样本识别的研究越来越多。但是,目前大部分研究是基于单模态数据展开的,关于多模态融合的研究还较少。为了研究多种模态数据对零样本人体动作识别的影响,本文提出了一种基于多模态融合的零样本人体动作识别（zero-shot human action recognition framework based on multimodel fusion, ZSAR-MF）框架。方法本文框架主要由传感器特征提取模块、分类模块和视频特征提取模块组成。具体来说,传感器特征提取模块使用卷积神经网络（convolutional neural network, CNN）提取心率和加速度特征;分类模块利用所有概念（传感器特征、动作和对象名称）的词向量生成动作类别分类器;视频特征提取模块将每个动作的属性、对象分数和传感器特征映射到属性—特征空间中,最后使用分类模块生成的分类器对每个动作的属性和传感器特征进行评估。结果本文实验在Stanford-ECM数据集上展开,对比结果表明本文ZSAR-MF模型比基于单模态数据的零样本识别模型在识别准确率上提高了4 %左右。结论本文所提出的基于多模态融合的零样本人体动作识别框架,有效地融合了传感器特征和视频特征,并显著提高了零样本人体动作识别的准确率。相似文献

15.

Multi-cue based 3D residual network for action recognition

Zong Ming Wang Ruili Chen Zhe Wang Maoli Wang Xun Potgieter Johan 《Neural computing & applications》2021,33(10):5167-5181

Neural Computing and Applications - Convolutional neural network (CNN) is a natural structure for video modelling that has been successfully applied in the field of action recognition. The existing... 相似文献

16.

面向行为识别的拉普拉斯特征映射算法的改进

金成彬崔荣一金小峰《计算机应用研究》2014,31(12)

提出了一种面向行为识别的拉普拉斯特征映射算法的改进方法.首先,将Kinect提供的关节点数据作为姿态特征,采用Levenstein距离改进流形学习算法中的拉普拉斯特征映射算法,并映射到二维空间得到待识别行为的嵌入空间;其次,结合待识别行为的嵌入空间和训练数据建立先验模型;最后,通过重新设计的粒子动态模型和观察模型,采用粒子滤波算法进行行为识别.实验结果表明,该方法可以对重复动作、遮挡,以及动作幅度和速度都有明显差异的行为进行较好的识别,总体识别率达到92.4％. 相似文献

17.

Exploring trace transform for robust human action recognition

Georgios Goudelis Konstantinos Karpouzis Stefanos Kollias 《Pattern recognition》2013,46(12):3238-3248

Machine based human action recognition has become very popular in the last decade. Automatic unattended surveillance systems, interactive video games, machine learning and robotics are only few of the areas that involve human action recognition. This paper examines the capability of a known transform, the so-called Trace, for human action recognition and proposes two new feature extraction methods based on the specific transform. The first method extracts Trace transforms from binarized silhouettes, representing different stages of a single action period. A final history template composed from the above transforms, represents the whole sequence containing much of the valuable spatio-temporal information contained in a human action. The second, involves Trace for the construction of a set of invariant features that represent the action sequence and can cope with variations usually appeared in video capturing. The specific method takes advantage of the natural specifications of the Trace transform, to produce noise robust features that are invariant to translation, rotation, scaling and are effective, simple and fast to create. Classification experiments performed on two well known and challenging action datasets (KTH and Weizmann) using Radial Basis Function (RBF) Kernel SVM provided very competitive results indicating the potentials of the proposed techniques. 相似文献

18.

Multiple scale-specific representations for improved human action recognition

Amir H. Shabani John S. Zelek David A. Clausi 《Pattern recognition letters》2013

Human action recognition in video is important in many computer vision applications such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. The existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence, a single action representation. This representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of multiple representations, we improved the human classification accuracy of realistic benchmark databases by about 5%

5 %

, compared with the state-of-the art methods. 相似文献

19.

Boosted multi-class semi-supervised learning for human action recognition

Tianzhu Zhang Si Liu Changsheng Xu Hanqing Lu 《Pattern recognition》2011,44(10-11):2334-2342

Human action recognition is a challenging task due to significant intra-class variations, occlusion, and background clutter. Most of the existing work use the action models based on statistic learning algorithms for classification. To achieve good performance on recognition, a large amount of the labeled samples are therefore required to train the sophisticated action models. However, collecting labeled samples is labor-intensive. To tackle this problem, we propose a boosted multi-class semi-supervised learning algorithm in which the co-EM algorithm is adopted to leverage the information from unlabeled data. Three key issues are addressed in this paper. Firstly, we formulate the action recognition in a multi-class semi-supervised learning problem to deal with the insufficient labeled data and high computational expense. Secondly, boosted co-EM is employed for the semi-supervised model construction. To overcome the high dimensional feature space, weighted multiple discriminant analysis (WMDA) is used to project the features into low dimensional subspaces in which the Gaussian mixture models (GMM) are trained and boosting scheme is used to integrate the subspace models. Thirdly, we present the upper bound of the training error in multi-class framework, which is able to guide the novel classifier construction. In theory, the proposed solution is proved to minimize this upper error bound. Experimental results have shown good performance on public datasets. 相似文献

20.

Learning multi-level features for sensor-based human action recognition

《Pervasive and Mobile Computing》2017

相似文献