Similar Literature
20 similar documents found (search time: 0 ms)
1.
In this paper, we utilize a line based pose representation to recognize human actions in videos. We represent the pose in each frame by employing a collection of line-pairs, so that limb and joint movements are better described and the geometrical relationships among the lines forming the human figure are captured. We contribute to the literature by proposing a new method that matches line-pairs of two poses to compute the similarity between them. Moreover, to encapsulate the global motion information of a pose sequence, we introduce line-flow histograms, which are extracted by matching line segments in consecutive frames. Experimental results on Weizmann and KTH datasets emphasize the power of our pose representation, and show the effectiveness of using pose ordering and line-flow histograms together in grasping the nature of an action and distinguishing one from the others.
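The line-pair idea above can be illustrated with a minimal sketch (not the authors' exact matching procedure): a pose is a set of 2-D line segments, a line-pair is described by the two segments' orientations plus the angle between them, and two descriptors are compared with a simple exponential similarity. The descriptor and similarity forms here are illustrative assumptions.

```python
import math

def line_angle(seg):
    # Orientation of a line segment ((x1, y1), (x2, y2)) in radians.
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def pair_descriptor(seg_a, seg_b):
    # Describe a line-pair by each segment's orientation and the angle between them.
    a, b = line_angle(seg_a), line_angle(seg_b)
    return (a, b, abs(a - b))

def pair_similarity(p, q):
    # Similarity of two line-pair descriptors: 1 when identical,
    # decaying exponentially with the total angular difference.
    d = sum(abs(x - y) for x, y in zip(p, q))
    return math.exp(-d)

# Two pairs with identical orientations (lengths differ, angles do not).
p = pair_descriptor(((0, 0), (1, 0)), ((0, 0), (0, 1)))
q = pair_descriptor(((0, 0), (2, 0)), ((0, 0), (0, 2)))
print(pair_similarity(p, q))  # -> 1.0
```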

2.
Although multiple methods have been proposed for human action recognition, existing multi-view approaches cannot effectively discover meaningful relationships among multiple action categories from different views. To handle this problem, this paper proposes a multi-view learning approach for multi-view action recognition. First, the proposed method leverages popular visual representation methods, bag-of-visual-words (BoVW) and Fisher vectors (FV), to represent individual videos in each view. Second, a sparse coding algorithm is utilized to transfer the low-level features of the various views into a discriminative, high-level semantic space. Third, we employ a multi-task learning (MTL) approach for joint action modeling and discovery of latent relationships among different action categories. Extensive experimental results on the M2I and IXMAS datasets demonstrate the effectiveness of the proposed approach. Moreover, the experiments further demonstrate that the discovered latent relationships can benefit multi-view model learning and augment action recognition performance.
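The BoVW representation step mentioned above can be sketched with a generic nearest-centroid quantizer (a minimal illustration, not this paper's specific pipeline; the toy codebook and descriptors are assumptions):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest codeword and count occurrences.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    # L1-normalize so videos with different numbers of descriptors are comparable.
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])          # two toy codewords
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 1.0]])  # three toy local descriptors
print(bovw_histogram(desc, codebook))  # one descriptor near word 0, two near word 1
```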

3.
Human actions can be considered as a sequence of body poses over time, usually represented by coordinates corresponding to human skeleton models. Recently, a variety of low-cost devices capable of markerless real-time pose estimation have been released. Nevertheless, limitations of the incorporated RGB-D sensors can produce inaccuracies, necessitating alternative representation and classification schemes to boost performance. In this context, we propose a method for action recognition where skeletal data are initially processed to obtain robust and invariant pose representations, and then vectors of dissimilarities to a set of prototype actions are computed. Recognition is performed in the dissimilarity space using sparse representation. A new publicly available dataset, created for evaluation purposes, is introduced in this paper. The proposed method was also evaluated on other public datasets, and the results are compared to those of similar methods.

4.
Much of the existing work on action recognition combines simple features with complex classifiers or models to represent an action. Parameters of such models usually do not have any physical meaning nor do they provide any qualitative insight relating the action to the actual motion of the body or its parts. In this paper, we propose a new representation of human actions called sequence of the most informative joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed to be the most informative for performing the current action based on highly interpretable measures such as the mean or variance of joint angle trajectories. We then represent the action as a sequence of these most informative joints. Experiments on multiple databases show that the SMIJ representation is discriminative for human action recognition and performs better than several state-of-the-art algorithms.
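The joint-selection idea behind SMIJ can be sketched directly: rank joints by the variance of their joint-angle trajectories (one of the measures the abstract names) and keep the top few. The data below is synthetic and the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def most_informative_joints(angle_traj, k=3):
    # angle_traj: (T, J) array of joint-angle values over T frames for J joints.
    # Rank joints by trajectory variance; higher variance = more informative.
    var = angle_traj.var(axis=0)
    return [int(j) for j in np.argsort(var)[::-1][:k]]

rng = np.random.default_rng(0)
traj = np.zeros((50, 5))
traj[:, 2] = np.sin(np.linspace(0, 6, 50))   # a strongly moving joint
traj[:, 4] = 0.1 * rng.standard_normal(50)   # a slightly jittering joint
print(most_informative_joints(traj, k=2))    # -> [2, 4]
```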

5.
6.
A human action can be described as a combination of the action semantics of different local body regions; accordingly, a human action recognition method based on local semantics is proposed. First, the method defines a set of local action semantics to describe the visual appearance of the motion of local body regions, and builds a model for each local semantic. Then, an action representation is constructed from the combination of the contribution values of these local action semantics. Finally, the local-semantics-based action representation is fed into a support vector machine to build an action model for action classification. Comparative experiments show that the proposed method recognizes human actions in realistic scenes well.

7.
Due to the exponential growth of video data stored and uploaded to Internet websites, especially YouTube, effective analysis of video actions has become very necessary. In this paper, we tackle the challenging problem of human action recognition in realistic video sequences. The proposed system combines the efficiency of the bag-of-visual-words strategy with the power of graphs for structural representation of features. It is built upon the commonly used Space–Time Interest Points (STIP) local features, followed by a graph-based video representation which models the spatio-temporal relations among these features. Experiments are conducted on two challenging datasets: Hollywood2 and UCF YouTube Action. The experimental results show the effectiveness of the proposed method.

8.
The conditional random fields (CRFs) model, as one of the most successful discriminative approaches, has recently received renewed attention for human action recognition. However, existing CRFs formulations typically have limited capability to capture higher-order dependencies among the given states and deeper intermediate representations within the target states, which are potentially useful and significant for modeling complex action recognition scenarios. In this paper, we present a novel double-layer CRFs (DL-CRFs) model for human action recognition in the graphical model framework. In the problem formulation, an augmented top layer is designed in the DL-CRFs model as a high-level, global variable, providing a global perception perspective to acquire higher-order dependencies between the target states. Meanwhile, we exploit additional intermediate variables to explicitly perceive the intermediate representations between the target states and observation features. We then propose to decompose the DL-CRFs model into two parts, a top and a bottom linear-chain CRFs model, to ease inference during both parameter learning and testing. Lastly, the DL-CRFs model parameters can be learned with a block-coordinate primal–dual Frank–Wolfe algorithm with a gap sampling scheme in a structured support vector machine framework. Experimental results and discussions on two public benchmark datasets demonstrate that the proposed approach performs better than other state-of-the-art methods on several evaluation criteria.

9.
10.
Constructing the bag-of-features model from space–time interest points (STIPs) has been successfully utilized for human action recognition. However, how to eliminate the large number of STIPs irrelevant to a specific action in realistic scenarios, and how to select discriminative codewords for an effective bag-of-features model, still need further investigation. In this paper, we propose to select more representative codewords based on our pruned interest points algorithm, so as to reduce computational cost and improve recognition performance. Taking human perception into account, an attention-based saliency map is employed to choose salient interest points that fall into salient regions, since visual saliency provides strong evidence for the location of acting subjects. After salient interest points are identified, each human action is represented with the bag-of-features model. To obtain more discriminative codewords, an unsupervised codeword selection algorithm is utilized. Finally, the support vector machine (SVM) method is employed to perform human action recognition. Comprehensive experimental results on the widely used and challenging Hollywood-2 Human Action (HOHA-2) and YouTube datasets demonstrate that our proposed method is computationally efficient while achieving improved performance in recognizing realistic human actions.

11.
3D skeleton sequences contain more effective and discriminative information than RGB video and are more suitable for human action recognition. Accurate extraction of human skeleton information is the key to high action recognition accuracy. Considering the correlation between joint points, in this work we first propose a skeleton feature extraction method based on complex networks. The relationship between human skeleton points in each frame is encoded as a network, and the changes of an action over time are described by a time-series network composed of skeleton points. Network topology attributes are used as feature vectors, and complex network coding and LSTM are combined to recognize human actions. The method was verified on the NTU RGB+D 60, MSR Action3D, and UTKinect-Action3D datasets, achieving good performance on each. This shows that extracting skeleton features based on complex networks can properly distinguish different actions. The method, which considers temporal information and the relationships between skeleton points at the same time, plays an important role in the accurate recognition of human actions.
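A minimal sketch of encoding one frame's skeleton as a network: connect joints whose distance falls below a threshold and use simple topology attributes (here, per-joint degree) as the frame's feature vector. The threshold rule and the choice of degree as the attribute are illustrative assumptions, not the paper's exact coding.

```python
import numpy as np

def skeleton_network_features(joints, radius=1.5):
    # joints: (J, D) array of joint coordinates for one frame.
    # Build an adjacency matrix by thresholding pairwise distances,
    # then return each joint's degree as a topology feature.
    d = np.linalg.norm(joints[:, None, :] - joints[None, :, :], axis=2)
    adj = (d < radius) & ~np.eye(len(joints), dtype=bool)
    return adj.sum(axis=1)

# Three joints in a chain plus one far-away (isolated) joint.
joints = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
print(skeleton_network_features(joints))  # degrees: [1 2 1 0]
```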

12.
Detecting and recognizing human actions in natural scenarios, both indoor and outdoor, is a significant technique in computer vision and intelligent systems, widely applied in video surveillance, pedestrian tracking, and human-computer interaction. Conventional approaches based on various features have achieved impressive performance, but fail to cope with partial occlusion and changes of posture. To address these limitations, we propose a novel human action recognition method. More specifically, to capture image spatial composition, we leverage a three-level spatial pyramid feature extraction scheme, where each pyramid level is encoded by local features. Thereafter, regions generated by a proposal algorithm are fed into a dual-aggregation net for deep representation extraction, and both local features and deep features are fused to describe each image. To describe human action categories, we design a metric, CXQDA, based on the cosine measure and Cross-view Quadratic Discriminant Analysis (XQDA) to calculate the similarity among different action categories. Experimental results demonstrate that our proposed method can effectively cope with object scale variations and partial occlusion, and achieves competitive performance.

13.
With the massive growth of video data, a single feature is no longer sufficient for recognizing complex actions in complex environments in the field of human action recognition. To address this, a method is proposed that fuses multiple action features via multiple-instance learning. Using the concept of a bag from traditional multiple-instance learning, the different feature representations of one sample are treated as instances within the same bag. All bags of one action class are taken as positive bags, and those of the other classes as negative bags, to learn a classification model. Good results were obtained in tests on commonly used databases.

14.
An improved sparse-representation method for handwritten digit recognition is proposed. First, sample character patches are trained into an over-complete dictionary; then, the coefficients are decomposed using an improved L1/2-regularization algorithm; finally, classification is performed by computing the residual between the reconstructed image and the original image. Experiments on the standard MNIST digit database show that the improved sparse-representation method achieves a higher recognition rate than other methods, above 98%, with good robustness to noise.
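The residual-based classification step can be sketched as follows. For brevity this uses plain per-class least-squares coding instead of the L1/2-regularized decomposition described above, so it is a structural illustration only: reconstruct the test sample from each class's dictionary and pick the class with the smallest residual.

```python
import numpy as np

def src_classify(y, class_dicts):
    # For each class, reconstruct the test sample y from that class's dictionary
    # and return the index of the class with the smallest reconstruction residual.
    residuals = []
    for D in class_dicts:
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(np.linalg.norm(y - D @ coef))
    return int(np.argmin(residuals))

# Toy dictionaries: class 0 spans the subspace containing y, class 1 does not.
D0 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
D1 = np.array([[0.0], [1.0], [0.0]])
y = np.array([0.7, 0.0, 0.3])
print(src_classify(y, [D0, D1]))  # -> 0
```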

15.
In action recognition, a proper frame sampling method can not only reduce redundant video information but also improve recognition accuracy. In this paper, an action density based non-isometric frame sampling method, namely NFS, is proposed to discard redundant video information and sample representative frames for neural networks, achieving high accuracy on human action recognition; action density is introduced to indicate the intensity of actions in videos. In particular, an action density determination mechanism, a focused-clips division mechanism, and a reinforcement learning based frame sampling (RLFS) mechanism are proposed in the NFS method. Evaluations with various neural networks and datasets show that the proposed NFS method is effective in frame sampling and helps achieve better action recognition accuracy than existing methods.
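The idea of non-isometric, density-driven sampling can be sketched with a toy inverse-CDF sampler: frames are drawn in proportion to a per-frame motion "density", so high-action regions contribute more samples. This is an illustrative stand-in, not the NFS/RLFS mechanisms themselves.

```python
import numpy as np

def density_sample(frame_motion, n):
    # Sample n frame indices non-uniformly: frames in high-motion ("dense")
    # regions are picked more often, via the cumulative density as an inverse CDF.
    density = np.asarray(frame_motion, dtype=float)
    density = density / density.sum()
    cdf = np.cumsum(density)
    targets = (np.arange(n) + 0.5) / n          # evenly spaced quantiles
    return [int(np.searchsorted(cdf, t)) for t in targets]

motion = [0.1, 0.1, 5.0, 5.0, 5.0, 0.1, 0.1]    # action concentrated in frames 2-4
print(density_sample(motion, 4))                # -> [2, 3, 3, 4]
```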

16.
Infrared human action recognition based on dense trajectory features
An infrared human action recognition (HAR) method using fused features based on dense trajectories (DT) is proposed. The main pipeline is: 1) obtain the DT of the input action video by dense sampling; 2) compute three descriptors over the DT: histograms of oriented gradients (HOG), histograms of optical flow (HOF), and motion boundary histograms (MBH); 3) build fused features from the HOG, HOF, and MBH of the DT using a bag-of-words model and a fusion strategy; 4) feed the fused features from step 3 into a k-nearest-neighbor (k-NN) classifier to complete HAR. Experiments on the IADB infrared action database achieve a correct recognition rate of 96.7%. The results show that the proposed feature fusion and recognition method can effectively recognize infrared human actions.

17.
In this paper, we learn explicit representations for dynamic shape manifolds of moving humans for the task of action recognition. We exploit locality preserving projections (LPP) for dimensionality reduction, leading to a low-dimensional embedding of human movements. Given a sequence of moving silhouettes associated to an action video, by LPP, we project them into a low-dimensional space to characterize the spatiotemporal property of the action, as well as to preserve much of the geometric structure. To match the embedded action trajectories, the median Hausdorff distance or normalized spatiotemporal correlation is used for similarity measures. Action classification is then achieved in a nearest-neighbor framework. To evaluate the proposed method, extensive experiments have been carried out on a recent dataset including ten actions performed by nine different subjects. The experimental results show that the proposed method is able to not only recognize human actions effectively, but also considerably tolerate some challenging conditions, e.g., partial occlusion, low-quality videos, changes in viewpoints, scales, and clothes; within-class variations caused by different subjects with different physical build; styles of motion; etc.
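The median Hausdorff similarity measure mentioned above can be sketched directly, with small point sets standing in for embedded action trajectories. Taking the median (rather than the max) of nearest-neighbor distances makes the measure more robust to outlier frames; the symmetrization shown here is one common variant and is an assumption.

```python
import numpy as np

def median_hausdorff(A, B):
    # A, B: (n, d) and (m, d) point sets (embedded trajectories).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    forward = np.median(d.min(axis=1))   # each point of A to its nearest in B
    backward = np.median(d.min(axis=0))  # each point of B to its nearest in A
    return max(forward, backward)

A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = A + np.array([0.0, 0.5])             # the same trajectory shifted by 0.5
print(median_hausdorff(A, B))            # -> 0.5
```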

18.
In this paper a new classification method called locality-sensitive kernel sparse representation classification (LS-KSRC) is proposed for face recognition. LS-KSRC integrates both sparsity and data locality in the kernel feature space rather than in the original feature space, and can learn more discriminating sparse representation coefficients for face recognition. The closed-form solution of the l1-norm minimization problem for LS-KSRC is also presented. LS-KSRC is compared with kernel sparse representation classification (KSRC), sparse representation classification (SRC), locality-constrained linear coding (LLC), support vector machines (SVM), the nearest neighbor (NN), and the nearest subspace (NS). Experimental results on three benchmark face databases, i.e., the ORL database, the Extended Yale B database, and the CMU PIE database, demonstrate the promising performance of the proposed method for face recognition, outperforming the other methods used.

19.
Gabor wavelet representation for 3-D object recognition
This paper presents a model-based object recognition approach that uses a Gabor wavelet representation. The key idea is to use magnitude, phase, and frequency measures of the Gabor wavelet representation in an innovative flexible matching approach that can provide robust recognition. The Gabor grid, a topology-preserving map, efficiently encodes both signal energy and structural information of an object in a sparse multiresolution representation. The Gabor grid subsamples the Gabor wavelet decomposition of an object model and is deformed to allow the indexed object model match with similar representation obtained using image data. Flexible matching between the model and the image minimizes a cost function based on local similarity and geometric distortion of the Gabor grid. Grid erosion and repairing is performed whenever a collapsed grid, due to object occlusion, is detected. The results on infrared imagery are presented, where objects undergo rotation, translation, scale, occlusion, and aspect variations under changing environmental conditions.

20.
Analysis of human behavior through visual information has been one of the active research areas in the computer vision community during the last decade. Vision-based human action recognition (HAR) is a crucial part of human behavior analysis and is in great demand in a wide range of applications. HAR was initially performed via images from a conventional camera; recently, however, depth sensors have been embedded as an additional informative resource alongside cameras. In this paper, we propose a novel approach that largely improves the performance of human action recognition using complex-network-based feature extraction from RGB-D information. The constructed complex network is employed for single-person action recognition from skeletal data consisting of the 3D positions of body joints; the indirect features help the model cope with the majority of challenges in action recognition. The meta-path concept from complex networks is also introduced to alleviate the challenges posed by unusual action structures, further boosting recognition performance. Extensive experimental results on two widely adopted benchmark datasets, the MSR-Action Pairs and MSR Daily Activity3D, indicate the efficiency and validity of the method.
