Similar Documents
10 similar documents found.
1.
This paper proposes a manifold-learning-based action recognition framework for recognizing human actions in depth image sequences. Human joint positions are estimated from the depth data captured by a Kinect device, and relative joint position differences are used as the human feature representation. In the training stage, Laplacian eigenmaps (LE) manifold learning reduces the dimensionality of the high-dimensional training set, yielding a motion model in a low-dimensional latent space. In the recognition stage, a nearest-neighbor interpolation method maps the test sequence into the low-dimensional manifold space, where matching is performed. During matching, a modified Hausdorff distance measures the agreement and similarity between the test sequence and the training motion set in the low-dimensional space. Experiments on data captured with a Kinect device achieved good results; the method was also tested on the MSR Action3D database, where, given sufficient training samples, it outperformed previous methods. The experimental results show that the proposed method is well suited to human action recognition from depth image sequences.
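The matching step above compares a test sequence and a training motion set with a modified Hausdorff distance in the low-dimensional space. A minimal sketch of that distance (the function name is ours; this is an illustration, not the authors' implementation) could look like:

```python
import numpy as np

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two point sets A and B,
    where rows are low-dimensional pose embeddings of frames."""
    # pairwise Euclidean distances between all rows of A and B
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # mean nearest-neighbour distance in each direction
    d_ab = D.min(axis=1).mean()
    d_ba = D.min(axis=0).mean()
    # the modified variant takes the max of the two directed means
    return max(d_ab, d_ba)
```

Averaging nearest-neighbour distances (rather than taking the max, as in the classical Hausdorff distance) makes the measure far less sensitive to single outlier frames.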

2.
Human actions are, inherently, structured patterns of body movements. We explore ensembles of hierarchical spatio-temporal trees, discovered directly from training data, to model these structures for action recognition and spatial localization. Discovery of frequent and discriminative tree structures is challenging due to the exponential search space, particularly if one allows partial matching. We address this by first building a concise action word vocabulary via discriminative clustering of the hierarchical space-time segments, a two-level video representation that captures both static and non-static relevant space-time segments of the video. Using this vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of discriminative tree patterns. Our experiments show that these tree patterns, alone or in combination with shorter patterns (action words and pairwise patterns), achieve promising performance on three challenging datasets: UCF Sports, HighFive and Hollywood3D. Moreover, we perform cross-dataset validation, using trees learned on HighFive to recognize the same actions in Hollywood3D, and using trees learned on UCF-Sports to recognize and localize similar actions in JHMDB. The results demonstrate the potential for cross-dataset generalization of the trees our approach discovers.

3.
Human Action Recognition Based on Manifold Learning    Cited by: 5 (self-citations: 2, other citations: 3)
Objective: A manifold-learning-based action recognition framework is proposed for recognizing human actions in depth image sequences. Method: Human joint positions are estimated from depth data captured by a Kinect device, and relative joint position differences serve as the human feature representation. In the training stage, LE (Laplacian eigenmaps) manifold learning reduces the dimensionality of the high-dimensional training set, yielding motion models in a low-dimensional latent space. In the recognition stage, a nearest-neighbor interpolation method maps the test sequence into the low-dimensional manifold space, where matching is performed. During matching, a modified Hausdorff distance measures the agreement and similarity between the test sequence and the training motion set in the low-dimensional space. Results: Experiments on data captured with a Kinect device achieved good results; the method was also tested on the MSR Action3D database, where, given sufficient training samples, it outperformed previous methods. Conclusion: The experimental results show that the proposed method is well suited to human action recognition from depth image sequences.

4.
Recognizing and tracking multiple activities are extremely challenging machine vision tasks, owing to the diverse motion types involved and the high-dimensional (HD) state space. To overcome these difficulties, a novel generative model called the composite motion model (CMM) is proposed. This model contains a set of independent, low-dimensional (LD), activity-specific manifold models that effectively constrain the state search space for 3D human motion recognition and tracking. Modeling activity-specific movements separately not only allows each manifold model to be optimized for its respective movement, but also improves the scalability of the models. For accurate tracking with the CMM, a particle filter (PF) method is employed, with particles distributed across all manifold models at each time step. In addition, an efficient activity-switching strategy is proposed to govern the particle distribution over all LD manifolds. To diffuse particles among manifold models and respond quickly to sudden changes in activity, a set of visually reasonable and kinematically realistic transition bridges is synthesized by exploiting the good properties of the LD latent space and the HD observation space, which makes inter-activity motions appear more natural and realistic. Finally, the pose hypothesis that best interprets the visual observation is selected and used to recognize the currently observed activity. Extensive experiments, via qualitative and quantitative analyses, verify the effectiveness and robustness of the proposed CMM in multi-activity 3D human motion recognition and tracking.
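The core of the tracking step above is a particle filter whose particles carry an activity label alongside their latent coordinates, so that manifolds explaining the observation well attract more particles. A toy sketch of one such weight-and-resample step (our own simplification of the paper's switching strategy; names are illustrative):

```python
import numpy as np

def pf_step(particles, labels, log_lik, rng):
    """One particle-filter step over several activity-specific manifolds.
    particles: (N, d) latent coordinates; labels: (N,) manifold index;
    log_lik: callable(particles, labels) -> (N,) observation log-likelihoods."""
    w = np.exp(log_lik(particles, labels))
    w /= w.sum()
    # multinomial resampling: particles on manifolds that explain the
    # current observation well are duplicated, others die out
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], labels[idx]
```

In the full method, resampled particles would additionally be diffused within (and bridged between) the low-dimensional manifolds before the next observation arrives.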

5.
Owing to promising applications such as video surveillance, video annotation, and interactive gaming, human action recognition from videos has attracted much research interest. Although various methods have been proposed, many challenges remain, such as illumination conditions, viewpoint, camera motion, and cluttered backgrounds. Extracting a discriminative representation is one of the main ways to handle these challenges. In this paper, we propose a novel action recognition method that simultaneously learns a middle-level representation and a classifier by jointly training a multinomial logistic regression (MLR) model and a discriminative dictionary. In the proposed method, sparse codes of the low-level representation, serving as latent variables of the MLR, capture the structure of the low-level features and are thus more discriminative. Meanwhile, the training of the dictionary and the MLR model is integrated into one objective function that takes category information into account. By optimizing this objective function, we learn a discriminative dictionary modulated by the MLR and an MLR model driven by sparse coding. The proposed method is evaluated on the YouTube action dataset and the HMDB51 dataset. Experimental results demonstrate that our method is comparable with mainstream methods.
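The sparse codes that serve as the MLR's latent variables are the solution of an L1-regularized reconstruction problem. A minimal sketch of that inner step via ISTA (iterative shrinkage-thresholding; this is a generic solver we supply for illustration, not the paper's joint training procedure):

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=200):
    """ISTA: approximately solve min_z 0.5*||x - D z||^2 + lam*||z||_1,
    i.e. compute one sparse code z of signal x over dictionary D."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ z - x)          # gradient of the quadratic term
        z = z - g / L                  # gradient step
        # soft-thresholding enforces sparsity
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return z
```

In the joint formulation described above, the dictionary `D` would itself be updated so that these codes are also good inputs for the MLR classifier.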

6.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of cortical neurons, and a substantial body of experimental results in neuroscience suggests that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating discriminative information into SFA learning and considering the spatial relationship of body parts. In particular, we consider four SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large number of training cuboids obtained by random sampling within motion boundaries. Afterward, to represent action sequences, the squared first-order temporal derivatives are accumulated over all transformed cuboids into one feature vector, termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large-scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves recognition performance, (2) requires fewer intermediate processing steps while achieving comparable or even better performance, and (3) has good potential for recognizing complex multi-person activities.
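In its linear form, SFA whitens the input and then finds the directions whose temporal derivative has the smallest variance. A compact sketch of that procedure (a textbook-style linear SFA, not the paper's discriminative variants):

```python
import numpy as np

def linear_sfa(X, n_out=1):
    """Linear Slow Feature Analysis. X is a (T, d) time series; returns
    a (d, n_out) weight matrix whose outputs vary most slowly in time."""
    Xc = X - X.mean(axis=0)
    # whiten the centered signal
    C = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    S = evecs / np.sqrt(evals)            # whitening transform
    Z = Xc @ S
    # covariance of the temporal derivatives of the whitened signal
    dZ = np.diff(Z, axis=0)
    Cd = np.cov(dZ, rowvar=False)
    # eigh sorts ascending: smallest derivative variance = slowest features
    _, dvecs = np.linalg.eigh(Cd)
    return S @ dvecs[:, :n_out]
```

The ASD feature described above would then accumulate the squared temporal derivatives of such slow-feature outputs over all cuboids of a sequence.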

7.
This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model, providing the basis for a view-invariant representation of human actions. Actions are modeled as a set of weighted dynamical systems associated with different model variables. We apply time-delay embeddings to the time series resulting from the evolution of model variables over time to reconstruct phase portraits of appropriate dimensions. These phase portraits characterize the underlying dynamical systems. We propose a distance to compare trajectories within the reconstructed phase portraits, and use these distances to train SVM models for action recognition. Additionally, we propose an efficient method to learn a set of weights reflecting the discriminative power of a given model variable in a given action class. Our approach performs well on noisy data, even when action sequences last only a few frames. Experiments with marker-based and markerless motion capture data show the effectiveness of the proposed method. To the best of our knowledge, this contribution is the first to apply time-delay embeddings to data obtained from multi-view video.
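The time-delay embedding used above lifts a scalar time series into a phase portrait by stacking delayed copies of itself (in the spirit of Takens' theorem). A minimal sketch (parameter names are ours):

```python
import numpy as np

def delay_embed(x, dim=3, tau=1):
    """Time-delay embedding of a scalar series x into R^dim:
    row t is (x[t], x[t + tau], ..., x[t + (dim-1)*tau])."""
    T = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + T] for i in range(dim)])
```

Each model variable's series yields one such point cloud, and the paper's trajectory distance would then compare these reconstructed phase portraits.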

8.
Recognizing human actions from a stream of unsegmented sensory observations is important for a number of applications such as surveillance and human-computer interaction. A wide range of graphical models have been proposed for these tasks, typically extensions of generative hidden Markov models (HMMs) or their discriminative counterpart, conditional random fields (CRFs). These extensions typically address one of three key limitations of the basic HMM/CRF formalism: unrealistic models of sub-event duration, no direct encoding of interactions among multiple agents, and no modeling of the inherent hierarchical organization of activities. In our work, we present a family of graphical models that generalize such extensions and simultaneously model event duration, multi-agent interactions, and hierarchical structure. We also present general algorithms for efficient learning and inference in such models based on local variational approximations. We demonstrate the effectiveness of our framework by developing graphical models for American Sign Language (ASL) recognition, and for gesture and action recognition in videos. Our methods achieve results comparable to the state of the art on the datasets we consider, while requiring far fewer training examples than low-level feature-based methods.

9.
In this paper we address the problem of modeling and analyzing human motion by focusing on 3D body skeletons. In particular, our intent is to represent skeletal motion in a geometric and efficient way, leading to an accurate action-recognition system. Here an action is represented by a dynamical system whose observability matrix is characterized as an element of a Grassmann manifold. To formulate our learning algorithm, we propose two distinct ideas: (1) in the first, we perform classification using a Truncated Wrapped Gaussian model, one for each class in its own tangent space; (2) in the second, we propose a novel learning algorithm that concatenates local coordinates in the tangent spaces associated with different classes into a vector representation and trains a linear SVM. We evaluate our approaches on three public 3D action datasets: MSR-action 3D, UT-kinect and UCF-kinect; these datasets pose different kinds of challenges and together provide an exhaustive evaluation. The results show that our approaches either match or exceed state-of-the-art performance, reaching 91.21% on MSR-action 3D, 97.91% on UCF-kinect, and 88.5% on UT-kinect. Finally, we evaluate the latency of our approach, i.e. the ability to recognize an action before its termination, and demonstrate improvements relative to other published approaches.
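Comparing points on a Grassmann manifold, as done above for observability-matrix subspaces, reduces to computing principal angles between column spaces. A short sketch of the geodesic distance (a standard construction we add for illustration; the function name is ours):

```python
import numpy as np

def grassmann_dist(A, B):
    """Geodesic distance on the Grassmann manifold between the column
    spaces of A and B, via principal angles."""
    Qa, _ = np.linalg.qr(A)            # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)            # orthonormal basis of span(B)
    # singular values of Qa^T Qb are the cosines of the principal angles
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return np.linalg.norm(theta)
```

Tangent-space coordinates around a class mean, as used by both learning ideas above, are obtained from the same principal-angle decomposition via the manifold's logarithm map.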

10.
冯文刚 (Feng Wengang). 《自动化学报》 (Acta Automatica Sinica), 2014, 40(4): 763-770
For hierarchical-scene image sequences, this paper proposes a data-driven scene recognition model based on the rapid serial visual presentation (RSVP) task. First, image patches are extracted at three scales using a pyramid model; next, a vocabulary dictionary containing both global and local features is constructed; then the visual words are trained with a generative model and a discriminative model, respectively; finally, a neural network derives the scene category from the image-patch labels. Experiments show that the algorithm achieves more accurate classification results.
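The first step above, extracting patches at three pyramid levels, can be sketched as a spatial grid that doubles in resolution per level (a generic illustration of pyramid patch extraction; the exact grid layout in the paper may differ):

```python
import numpy as np

def pyramid_patches(img, levels=3):
    """Split an image into a spatial pyramid of patches:
    level l yields a (2^l x 2^l) grid, so three levels give 1 + 4 + 16
    patches of decreasing size."""
    H, W = img.shape[:2]
    patches = []
    for l in range(levels):
        n = 2 ** l
        hs, ws = H // n, W // n
        for i in range(n):
            for j in range(n):
                patches.append(img[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws])
    return patches
```

Global features would then be computed on the coarse (level-0) patch and local features on the finer levels before building the vocabulary dictionary.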


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号