1.
Human action recognition in video is important in many computer vision applications such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. Existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence a single action representation. This representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and, consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of multiple representations, we improved human action classification accuracy on realistic benchmark databases by about 5% compared with state-of-the-art methods.
2.
3.
Content-based video retrieval is an increasingly popular research field, in large part due to the quickly growing catalogue of multimedia data to be found online. Even though a large portion of this data concerns humans, retrieval of human actions has received relatively little attention. Presented in this paper is a video retrieval system that can be used to perform a content-based query on a large database of videos very efficiently. Furthermore, it is shown that by using ABRS-SVM, a technique for incorporating relevance feedback (RF) on the search results, it is possible to quickly achieve useful results even when dealing with very complex human action queries, such as those found in Hollywood movies.
4.
Georgios Goudelis Konstantinos Karpouzis Stefanos Kollias 《Pattern recognition》2013,46(12):3238-3248
Machine-based human action recognition has become very popular in the last decade. Automatic unattended surveillance systems, interactive video games, machine learning and robotics are only a few of the areas that involve human action recognition. This paper examines the capability of a known transform, the so-called Trace transform, for human action recognition and proposes two new feature extraction methods based on it. The first method extracts Trace transforms from binarized silhouettes representing different stages of a single action period. A final history template composed from these transforms represents the whole sequence and contains much of the valuable spatio-temporal information in a human action. The second method uses the Trace transform to construct a set of invariant features that represent the action sequence and can cope with the variations that commonly appear in video capture. This method takes advantage of the natural properties of the Trace transform to produce noise-robust features that are invariant to translation, rotation and scaling and are effective, simple and fast to create. Classification experiments performed on two well-known and challenging action datasets (KTH and Weizmann) using a Radial Basis Function (RBF) kernel SVM provided very competitive results, indicating the potential of the proposed techniques.
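The RBF kernel named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rbf_kernel`, the `gamma` parameter value and the three-dimensional feature vectors are assumptions chosen for illustration, and the abstract does not say how the kernel was parameterized.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial Basis Function kernel: exp(-gamma * ||x - y||^2).

    x and y are equal-length sequences of floats (hypothetical feature
    vectors); gamma is an assumed width parameter, not a value from the paper.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors standing in for per-sequence descriptors.
features_a = [0.2, 0.5, 0.1]
features_b = [0.2, 0.5, 0.1]
features_c = [0.9, 0.1, 0.4]

print(rbf_kernel(features_a, features_b))  # identical vectors give the maximum similarity
print(rbf_kernel(features_a, features_c))  # more distant vectors give a smaller value
```

An SVM with this kernel compares a test sample against support vectors through such similarity values, so features that are invariant to translation, rotation and scaling keep the distances, and therefore the kernel values, stable under those changes.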
5.
6.
A survey on vision-based human action recognition (total citations: 10; self-citations: 0; citations by others: 10)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.
7.
8.
In this paper, we propose a hierarchical discriminative approach for human action recognition. It consists of feature extraction with mutual motion pattern analysis and discriminative action modeling in the hierarchical manifold space. A Hierarchical Gaussian Process Latent Variable Model (HGPLVM) is employed to learn the hierarchical manifold space in which motion patterns are extracted. A cascade CRF is also presented to estimate the motion patterns in the corresponding manifold subspace, and the trained SVM classifier predicts the action label for the current observation. Using motion capture data, we test our method and evaluate how individual body parts affect human action recognition. Results on our test set of synthetic images are also presented to demonstrate the method's robustness.
9.
This work describes a computational approach for a typical machine-vision application, that of human action recognition from video streams. We present a method that has the following advantages: (a) no human intervention in pre-processing stages, (b) a reduced feature set, (c) modularity of the recognition system and (d) control of the model’s complexity at levels acceptable for real-time operation. The representation of each video frame and the feature extraction procedure are formulated in the context of lattice theory. The recognition system consists of two components: an ensemble of neural network predictors, which correspond to the training video sequences, and one classifier, based on the PREMONN approach, capable of deciding at each time instant which known video source has potentially generated a new sequence of frames. An extensive experimental study on three well-known benchmarks validates the flexibility and robustness of the proposed approach.
10.
Discriminative human pose estimation is the problem of inferring the 3D articulated pose of a human directly from an image feature. This is a challenging problem due to the highly non-linear and multi-modal mapping from the image feature space to the pose space. To address this problem, we propose a model employing a mixture of Gaussian processes where each Gaussian process models a local region of the pose space. By employing the models in this way we are able to overcome the limitations of Gaussian processes applied to human pose estimation: their O(N^3) time complexity and their uni-modal predictive distribution. Our model is able to give a multi-modal predictive distribution where each mode is represented by a different Gaussian process prediction. A logistic regression model is used to give a prior over each expert prediction in a similar fashion to previous mixture-of-experts models. We show that this technique outperforms existing state-of-the-art regression techniques on human pose estimation data sets for ballet dancing, sign language and the HumanEva data set.
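The two limitations named in the abstract above can be made concrete with a short worked sketch. The notation is assumed for illustration and is not taken from the paper: K experts, gating weights w_k, expert means and covariances, and an equal split of the N training points across experts.

```latex
% Multi-modal predictive distribution: a gated sum of K Gaussian process
% predictions, with a softmax (multinomial logistic regression) gate.
\[
p(\mathbf{y} \mid \mathbf{x})
  = \sum_{k=1}^{K} \pi_k(\mathbf{x})\,
    \mathcal{N}\!\bigl(\mathbf{y};\ \boldsymbol{\mu}_k(\mathbf{x}),\ \boldsymbol{\Sigma}_k(\mathbf{x})\bigr),
\qquad
\pi_k(\mathbf{x})
  = \frac{\exp(\mathbf{w}_k^{\top}\mathbf{x})}{\sum_{j=1}^{K}\exp(\mathbf{w}_j^{\top}\mathbf{x})}.
\]

% Training cost, assuming each expert is trained on N/K points and the cost
% is dominated by inverting its kernel matrix:
\[
\underbrace{O(N^{3})}_{\text{single GP}}
\quad\longrightarrow\quad
K \cdot O\!\left(\left(\tfrac{N}{K}\right)^{3}\right)
  = O\!\left(\tfrac{N^{3}}{K^{2}}\right).
\]
```

Under these assumptions, each mode of the predictive distribution corresponds to one expert's Gaussian prediction, and splitting the data among K local experts reduces the cubic cost by a factor of roughly K^2.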
11.
Alexandros Andre Chaaraoui Pau Climent-Pérez Francisco Flórez-Revuelta 《Pattern recognition letters》2013
In this paper, a human action recognition method is presented in which pose representation is based on the contour points of the human silhouette and actions are learned by making use of sequences of multi-view key poses. Our contribution is twofold. Firstly, our approach achieves state-of-the-art success rates without compromising the speed of the recognition process and is therefore suitable for online recognition and real-time scenarios. Secondly, dissimilarities among different actors performing the same action are handled by taking into account variations in shape (shifting the test data to the known domain of key poses) and speed (considering inconsistent time scales in the classification). Experimental results on the publicly available Weizmann, MuHAVi and IXMAS datasets show high and stable success rates, achieving, to the best of our knowledge, the best rate so far on the MuHAVi Novel Actor test.
12.
《Expert systems with applications》2014,41(3):786-794
Interest in RGB-D devices is increasing due to their low price and their wide range of possible applications. These devices provide marker-less body pose estimation by means of skeletal data consisting of the 3D positions of body joints. These can be further used for pose, gesture or action recognition. In this work, an evolutionary algorithm is used to determine the optimal subset of skeleton joints, taking into account the topological structure of the skeleton, in order to improve the final success rate. The proposed method has been validated using a state-of-the-art RGB action recognition approach and applying it to the MSR-Action3D dataset. Results show that the proposed algorithm is able to significantly improve the initial recognition rate and to yield similar or better success rates than state-of-the-art methods.
13.
14.
Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved via images from a conventional camera; recently, however, depth sensors have made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution real-time depth cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation. A growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. This survey concludes by summarising the current state of work on this topic and pointing out promising future research directions. For both researchers and practitioners who are familiar with this topic and those who are new to this field, the review will aid in the selection and development of algorithms using depth data.
15.
《Graphical Models》2014,76(3):162-171
In this work, we investigate whether it is possible to distinguish conversational interactions from observing human motion alone, in particular subject-specific gestures in 3D. We adopt Kinect sensors to obtain 3D displacement and velocity measurements, followed by wavelet decomposition to extract low-level temporal features. These features are then generalized to form a visual vocabulary that can be further generalized to a set of topics from temporal distributions of the visual vocabulary. A subject-specific supervised learning approach based on Random Forests is used to classify the testing sequences into seven different conversational scenarios. The conversational scenarios considered in this work have rather subtle differences among them. Unlike typical action or event recognition, each interaction in our case contains many instances of primitive motions and actions, many of which are shared among different conversation scenarios. That is, the interactions we are concerned with are not micro or instant events, such as hugging and high-fives, but rather interactions over a period of time that consist of rather similar individual motions, micro actions and interactions. We believe this is among the first works devoted to subject-specific conversational interaction classification using 3D pose features and to show that this task is indeed possible.
16.
Conrad Sanderson Samy Bengio 《Pattern recognition》2006,39(2):288-302
We address the pose mismatch problem which can occur in face verification systems that have only a single (frontal) face image available for training. In the framework of a Bayesian classifier based on mixtures of Gaussians, the problem is tackled by extending each frontal face model with artificially synthesized models for non-frontal views. The synthesis methods are based on several implementations of maximum likelihood linear regression (MLLR), as well as standard multi-variate linear regression (LinReg). All synthesis techniques rely on prior information and learn how face models for the frontal view are related to face models for non-frontal views. The synthesis and extension approach is evaluated by applying it to two face verification systems: a holistic system (based on PCA-derived features) and a local feature system (based on DCT-derived features). Experiments on the FERET database suggest that for the holistic system, the LinReg-based technique is better suited than the MLLR-based techniques; for the local feature system, the results show that synthesis via a new MLLR implementation obtains better performance than synthesis based on traditional MLLR. The results further suggest that extending frontal models considerably reduces errors. It is also shown that the local feature system is less affected by view changes than the holistic system; this can be attributed to the parts-based representation of the face and, due to the classifier based on mixtures of Gaussians, the lack of constraints on spatial relations between the face parts, allowing for deformations and movements of face areas.
17.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex contexts. It is, however, a common practice in the literature to consider an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is the Dynamic Conditional Random Field, from a probabilistic perspective; the other is the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is shown to be highly effective and robust compared against bag-of-feature methods.
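The limitation of an orderless representation described in the abstract above can be seen in a minimal sketch. The visual-word labels and the helper name `bag_of_features` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import Counter


def bag_of_features(visual_words):
    """Return an orderless histogram of visual-word occurrences.

    visual_words is a sequence of hashable labels (hypothetical quantized
    local features); all ordering information is discarded.
    """
    return Counter(visual_words)


# Two hypothetical videos containing the same local features in a
# different temporal order.
sequence_a = ["arm_up", "arm_down", "step"]
sequence_b = ["step", "arm_down", "arm_up"]

# The histograms compare equal, so a bag-of-features model cannot tell
# these two sequences apart.
print(bag_of_features(sequence_a) == bag_of_features(sequence_b))
```

Because the two histograms are identical, any classifier built only on them treats the two sequences as the same action, which is the motivation the abstract gives for adding spatial and temporal structure constraints.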
18.
Most work on multi-biometric fusion is based on static fusion rules. One prominent limitation of static fusion is that it cannot respond to changes in the environment or in individual users. This paper proposes context-aware multi-biometric fusion, which can dynamically adapt the fusion rules to the real-time context. As a typical application, the context-aware fusion of gait and face for human identification in video is investigated. Two significant context factors that may affect the relationship between gait and face in the fusion are considered, i.e., view angle and subject-to-camera distance. Fusion methods adaptable to these two factors based on either prior knowledge or machine learning are proposed and tested. Experimental results show that the context-aware fusion methods perform significantly better than not only the individual biometric traits, but also widely adopted static fusion rules including SUM, PRODUCT, MIN, and MAX. Moreover, context-aware fusion based on machine learning shows superiority over that based on prior knowledge.
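The static fusion rules named in the abstract above (SUM, PRODUCT, MIN and MAX) can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the function name `fuse_scores` and the example score values are assumptions, and the scores are assumed to be already normalized to a common range.

```python
from math import prod


def fuse_scores(scores, rule="SUM"):
    """Combine per-modality match scores for one candidate identity.

    scores: list of floats, one per biometric trait (assumed normalized).
    rule:   one of the static rules "SUM", "PRODUCT", "MIN", "MAX".
    """
    rules = {
        "SUM": sum,
        "PRODUCT": prod,
        "MIN": min,
        "MAX": max,
    }
    if rule not in rules:
        raise ValueError(f"unknown fusion rule: {rule}")
    return rules[rule](scores)


# Hypothetical normalized match scores for a single candidate.
gait_score, face_score = 0.5, 0.25
for rule in ("SUM", "PRODUCT", "MIN", "MAX"):
    print(rule, fuse_scores([gait_score, face_score], rule))
```

Each rule applies the same combination regardless of view angle or subject-to-camera distance, which is the limitation that the context-aware fusion described in the abstract is meant to address.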