20 similar references found.
1.
Free viewpoint action recognition using motion history volumes  (cited 5 times: 0 self-citations, 5 by others)
Daniel Weinland, Remi Ronfard, Edmond Boyer. Computer Vision and Image Understanding, 2006, 104(2-3): 249
Action recognition is an important and challenging topic in computer vision, with many important applications including video surveillance, automated cinematography and understanding of social interaction. Yet, most current work in gesture or action interpretation remains rooted in view-dependent representations. This paper introduces Motion History Volumes (MHV) as a free-viewpoint representation for human actions in the case of multiple calibrated, and background-subtracted, video cameras. We present algorithms for computing, aligning and comparing MHVs of different actions performed by different people in a variety of viewpoints. Alignment and comparisons are performed efficiently using Fourier transforms in cylindrical coordinates around the vertical axis. Results indicate that this representation can be used to learn and recognize basic human action classes, independently of gender, body size and viewpoint.
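As a rough illustration of the representation (not the authors' implementation), the following NumPy sketch accumulates a motion history volume from binary voxel occupancy grids and derives a signature that is invariant to rotation about the vertical axis; it assumes the visual hulls have already been voxelized and that the resampling to cylindrical coordinates happens elsewhere.

import numpy as np

def motion_history_volume(occupancy_seq):
    # occupancy_seq: list of binary (X, Y, Z) arrays, 1 where the body occupies a voxel.
    # Later frames stamp larger values, so recency is encoded in voxel intensity.
    mhv = np.zeros_like(occupancy_seq[0], dtype=np.float32)
    for t, occ in enumerate(occupancy_seq, start=1):
        mhv[occ > 0] = t
    return mhv / len(occupancy_seq)              # normalize to [0, 1]

def rotation_invariant_signature(mhv_cylindrical):
    # mhv_cylindrical: the MHV resampled onto a (radius, angle, height) grid.
    # A rotation about the vertical axis is a circular shift along the angle
    # axis, so Fourier magnitudes over that axis are unchanged by it.
    return np.abs(np.fft.fft(mhv_cylindrical, axis=1))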
2.
3.
Masayuki Fukumoto, Takehito Ogata, Joo Kooi Tan, Hyoung Seop Kim, Seiji Ishikawa. Artificial Life and Robotics, 2008, 13(1): 326-330
In this paper, we describe a technique for representing and recognizing human motions using directional motion history images. A motion history image is a single image of a human motion produced by superposing binarized successive motion frames so that older frames receive smaller weights. It has the difficulty, however, that the latest motion overwrites older motions, resulting in an inexact motion representation and therefore incorrect recognition. To overcome this difficulty, we propose directional motion history images, which describe a motion with respect to four directions of movement, i.e. up, down, right and left, employing optical flow. The directional motion history images are thus a set of four motion history images defined on four optical flow images. Experimental results show that the proposed technique achieves better performance in the recognition of human motions than the existing motion history image.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
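A minimal sketch of the directional idea, assuming grayscale input frames and OpenCV's Farnebäck dense optical flow (the paper does not tie the method to a particular flow algorithm); the threshold and decay constant are illustrative choices, not the authors' parameters.

import cv2
import numpy as np

def directional_mhis(gray_frames, tau=20, flow_thresh=1.0):
    # One motion history image per movement direction (up/down/left/right).
    h, w = gray_frames[0].shape
    mhis = {d: np.zeros((h, w), np.float32) for d in ("up", "down", "left", "right")}
    prev = gray_frames[0]
    for cur in gray_frames[1:]:
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        fx, fy = flow[..., 0], flow[..., 1]
        masks = {
            "right": fx > flow_thresh, "left": fx < -flow_thresh,
            "down": fy > flow_thresh, "up": fy < -flow_thresh,   # image y grows downward
        }
        for d, moving in masks.items():
            mhi = mhis[d]
            mhi[:] = np.where(moving, tau, np.maximum(mhi - 1, 0))  # stamp or decay
        prev = cur
    return mhis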
4.
A survey on vision-based human action recognition  (cited 10 times: 0 self-citations, 10 by others)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.
5.
6.
7.
Georgios Goudelis, Konstantinos Karpouzis, Stefanos Kollias. Pattern Recognition, 2013, 46(12): 3238-3248
Machine-based human action recognition has become very popular in the last decade. Automatic unattended surveillance systems, interactive video games, machine learning and robotics are only a few of the areas that involve human action recognition. This paper examines the capability of a known transform, the so-called Trace transform, for human action recognition and proposes two new feature extraction methods based on this transform. The first method extracts Trace transforms from binarized silhouettes representing different stages of a single action period. A final history template, composed from the above transforms, represents the whole sequence and captures much of the valuable spatio-temporal information contained in a human action. The second involves the Trace transform in the construction of a set of invariant features that represent the action sequence and can cope with variations that usually appear in video capture. This method takes advantage of the natural properties of the Trace transform to produce noise-robust features that are invariant to translation, rotation and scaling, and are effective, simple and fast to create. Classification experiments performed on two well-known and challenging action datasets (KTH and Weizmann) using a Radial Basis Function (RBF) kernel SVM provided very competitive results, indicating the potential of the proposed techniques.
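To make the transform concrete, the sketch below traces every orientation across a binarized silhouette and applies a functional along each scanned line; with the functional chosen as a plain sum this reduces to the Radon transform, while other functionals (max, median, ...) yield further Trace features. The function and its parameters are illustrative, not the paper's exact construction.

import numpy as np
from scipy.ndimage import rotate

def trace_transform(silhouette, n_angles=180, functional=np.sum):
    # silhouette: 2-D binary array. Rotating the image and reading its columns
    # is equivalent to tracing lines of every orientation across it.
    rows = []
    for theta in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        rotated = rotate(silhouette.astype(float), theta, reshape=False, order=1)
        rows.append([functional(col) for col in rotated.T])   # one value per traced line
    return np.array(rows)                                     # shape: (n_angles, n_lines)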
8.
Nazim Ashraf, Yuping Shen, Xiaochun Cao, Hassan Foroosh. Computer Vision and Image Understanding, 2013, 117(6): 587-602
In this paper, we fully investigate the concept of fundamental ratios, demonstrate their application and significance in view-invariant action recognition, and explore the importance of different body parts in action recognition. A moving plane observed by a fixed camera induces a fundamental matrix F between two frames, where the ratios among the elements in the upper left 2 × 2 submatrix are herein referred to as the fundamental ratios. We show that fundamental ratios are invariant to camera internal parameters and orientation, and hence can be used to identify similar motions of line segments from varying viewpoints. By representing the human body as a set of points, we decompose a body posture into a set of line segments. The similarity between two actions is therefore measured by the motion of line segments and hence by their associated fundamental ratios. We further investigate to what extent a body part plays a role in recognition of different actions and propose a generic method of assigning weights to different body points. Experiments are performed on three categories of data: the controlled CMU MoCap dataset, the partially controlled IXMAS data, and the more challenging uncontrolled UCF-CIL dataset collected on the internet. Extensive experiments are reported on testing (i) view-invariance, (ii) robustness to noisy localization of body points, (iii) effect of assigning different weights to different body points, (iv) effect of partial occlusion on recognition accuracy, and (v) determining how soon our method recognizes an action correctly from the starting point of the query video.
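A small sketch of the basic quantity, assuming OpenCV and at least eight point correspondences between the two frames; the paper's matching, weighting and scoring pipeline is not reproduced here.

import cv2
import numpy as np

def fundamental_ratios(pts1, pts2):
    # pts1, pts2: corresponding body points in two frames, shape (N, 2), N >= 8.
    F, _ = cv2.findFundamentalMat(np.float32(pts1), np.float32(pts2), cv2.FM_8POINT)
    block = F[:2, :2].ravel()                 # upper-left 2 x 2 block of F
    return block / np.linalg.norm(block)      # scale-free, so only the ratios matter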
9.
10.
11.
In this paper, a novel two-phase framework is presented to deal with the face hallucination problem. In the first phase, an initial high-resolution (HR) face image is produced patch-wise. Each input low-resolution (LR) patch is represented as a linear combination of training patches, and the corresponding HR patch is estimated with the same combination coefficients. Realizing that training patches similar to the input may provide more appropriate textures in the reconstruction, we regularize the combination coefficients by a weighted ℓ2-norm minimization term which enlarges the coefficients for relevant patches. The HR face image is then initialized by integrating all the HR patches. In the second phase, three regularization models are introduced to produce the final HR face image. Different from most previous approaches, which consider global and local priors separately, the proposed algorithm incorporates the global reconstruction model, the local sparsity model and the pixel correlation model into a unified regularization framework. Initializing the regularization problem with the HR image obtained in the first phase, the final output HR image can be optimized through an iterative procedure. Experimental results show that the proposed algorithm achieves better performance in both reconstruction error and visual quality.
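A minimal sketch of the first-phase patch estimate, under the assumption that the penalty weight on each coefficient is the distance between the input LR patch and the corresponding training patch (the paper defines the weighting more carefully); names and the regularization value are illustrative.

import numpy as np

def hallucinate_patch(lr_patch, lr_dict, hr_dict, lam=1e-3):
    # lr_dict: (d_lr, K) matrix of LR training patches as columns;
    # hr_dict: (d_hr, K) matrix of the corresponding HR training patches.
    y = lr_patch.ravel()
    dists = np.linalg.norm(lr_dict - y[:, None], axis=0)   # distance to each training patch
    penalty = np.diag(dists ** 2)                          # far patches get larger penalties
    A = lr_dict.T @ lr_dict + lam * penalty
    w = np.linalg.solve(A, lr_dict.T @ y)                  # weighted l2-regularized coefficients
    return hr_dict @ w                                     # HR patch from the same coefficients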
12.
A key assumption of the traditional machine learning approach is that the test data are drawn from the same distribution as the training data. However, this assumption does not hold in many real-world scenarios. For example, in facial expression recognition, the appearance of an expression may vary significantly across people. As a result, previous work has shown that learning from adequate person-specific data can improve expression recognition performance over learning from generic data. However, person-specific data is typically very sparse in real-world applications due to the difficulties of data collection and labeling, and learning from sparse data may suffer from serious over-fitting. In this paper, we propose to learn a person-specific model through transfer learning. By transferring informative knowledge from other people, we can learn an accurate model for a new subject with only a small amount of person-specific data. We conduct extensive experiments to compare different person-specific models for facial expression and action unit (AU) recognition, and show that transfer learning significantly improves recognition performance with a small amount of training data.
13.
Learning a compact yet discriminative codebook is an important procedure for local feature-based action recognition. A common procedure involves two independent phases: reducing the dimensionality of local features and then performing clustering. Since the two phases are disconnected, dimensionality reduction does not necessarily capture the dimensions that are most helpful for codebook creation. Moreover, some dimensionality reduction techniques, such as principal component analysis, do not take class separability into account and thus may not help build an effective codebook. In this paper, we propose weighted adaptive metric learning (WAML), which integrates the two independent phases into a unified optimization framework. This framework makes it possible to select the dimensions that are crucial for building a discriminative codebook. The dimensionality reduction phase in WAML is optimized for class separability and adaptively adjusts the distance metric to improve the separability of the data. In addition, video word weighting is smoothly incorporated into WAML to accurately generate video words. Experimental results demonstrate that our approach builds a highly discriminative codebook and achieves results comparable to other state-of-the-art approaches.
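For context, this is the disconnected two-phase baseline the paper argues against, sketched with scikit-learn: a class-separability-aware projection (LDA standing in for the supervised reduction) followed by an independent k-means codebook. WAML couples the metric learning, the codebook creation and the word weighting in one optimization, which this snippet does not attempt to reproduce.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

def two_phase_codebook(descriptors, labels, n_words=200):
    # Phase 1: supervised dimensionality reduction; Phase 2: unrelated clustering.
    proj = LinearDiscriminantAnalysis().fit(descriptors, labels)
    reduced = proj.transform(descriptors)
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(reduced)
    return proj, codebook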
14.
Advanced Engineering Informatics, 2015, 29(4): 1072-1082
Text in images and video contains important information for visual content understanding, indexing and recognition. Extraction of this information involves preprocessing, localization and extraction of the text from a given image. In this paper, we propose a novel expiration code detection and recognition algorithm using Gabor features and collaborative representation based classification. The proposed system consists of four steps: expiration code localization, character isolation, Gabor feature extraction and character recognition. For expiration code detection, the Gabor energy (GE) and the maximum energy difference (MED) are extracted. The performance of the recognition algorithm is tested over three Gabor features: GE, magnitude response (MR) and imaginary response (IR). The Gabor features are classified with a collaborative representation based classifier (GCRC). To encompass all frequencies and orientations, downsampling and principal component analysis (PCA) are applied to reduce the dimensionality of the feature space. The effectiveness of the proposed localization algorithm is highlighted and compared with other existing methods. Extensive testing shows that the suggested detection scheme outperforms existing methods in terms of detection rate on a large image database. GCRC also shows very competitive results compared with Gabor-feature sparse representation based classification (GSRC), and the proposed system outperforms the nearest neighbor (NN) classifier and the collaborative representation based classification (CRC).
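Collaborative representation based classification has a simple closed form; the sketch below is a generic CRC classifier over feature vectors (the paper applies it to downsampled, PCA-reduced Gabor responses). Variable names and the regularization value are assumptions rather than the paper's settings.

import numpy as np

def crc_classify(train_feats, train_labels, query, lam=1e-2):
    # Code the query over all training samples with an l2 penalty, then assign
    # it to the class whose atoms reconstruct it with the smallest residual.
    X = np.asarray(train_feats, dtype=float).T            # columns are training samples
    y = np.asarray(query, dtype=float).ravel()
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    best, best_res = None, np.inf
    for c in set(train_labels):
        idx = [i for i, lab in enumerate(train_labels) if lab == c]
        res = np.linalg.norm(y - X[:, idx] @ alpha[idx]) / np.linalg.norm(alpha[idx])
        if res < best_res:
            best, best_res = c, res
    return best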
15.
16.
Catherine Achard, Xingtai Qu, Arash Mokhber, Maurice Milgram. Machine Vision and Applications, 2008, 19(1): 27-34
In this study, a new approach is presented for the recognition of everyday human actions with a fixed camera. The originality of the presented method consists in characterizing sequences by a temporal succession of semi-global features, which are extracted from "space-time micro-volumes". The advantage of this approach lies in the use of robust features (estimated over several frames) combined with the ability to manage actions of variable duration and to easily segment the sequences with algorithms specific to time-varying data. Each action is characterized by a temporal sequence that constitutes the input of a Hidden Markov Model system for recognition. Results presented on 1,614 sequences performed by several persons validate the proposed approach.
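A generic version of the recognition stage, assuming the hmmlearn package: one Gaussian HMM is trained per action on its sequences of per-segment feature vectors, and a new sequence is labeled by the highest-scoring model. This is the standard HMM recipe rather than the authors' code, and the number of states is arbitrary.

import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_action_hmms(sequences_by_action, n_states=5):
    # sequences_by_action: {action name: list of (T_i, d) feature sequences}
    models = {}
    for action, seqs in sequences_by_action.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        models[action] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def recognize(models, seq):
    # seq: (T, d) feature sequence; returns the action whose HMM scores it highest.
    return max(models, key=lambda action: models[action].score(seq))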
17.
18.
Lishan Qiao, Songcan Chen, Xiaoyang Tan. Pattern Recognition, 2010, 43(1): 331-341
Dimensionality reduction methods (DRs) have commonly been used as a principled way to understand high-dimensional data such as face images. In this paper, we propose a new unsupervised DR method called sparsity preserving projections (SPP). Unlike many existing techniques such as locality preserving projections (LPP) and neighborhood preserving embedding (NPE), where local neighborhood information is preserved during the DR procedure, SPP aims to preserve the sparse reconstructive relationship of the data, which is achieved by minimizing an L1 regularization-related objective function. The obtained projections are invariant to rotations, rescalings and translations of the data and, more importantly, they contain natural discriminating information even if no class labels are provided. Moreover, SPP chooses its neighborhood automatically and hence can be used more conveniently in practice than LPP and NPE. The feasibility and effectiveness of the proposed method are verified on three popular face databases (Yale, AR and Extended Yale B) with promising results.
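A compact sketch of the construction described above, using scikit-learn's Lasso for the per-sample sparse reconstruction and a generalized eigenproblem for the projection; the sparsity level and the small ridge term added for numerical stability are assumptions, not the paper's settings.

import numpy as np
from sklearn.linear_model import Lasso

def spp(X, n_dims=30, alpha=0.05):
    # X: (n_samples, n_features). Each sample is sparsely reconstructed from the
    # others, and the projection preserves those reconstruction weights.
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        lasso = Lasso(alpha=alpha, max_iter=5000).fit(X[idx].T, X[i])
        S[i, idx] = lasso.coef_                               # sparse reconstruction weights
    Sb = S + S.T - S.T @ S
    A = X.T @ Sb @ X
    B = X.T @ X + 1e-6 * np.eye(X.shape[1])                   # ridge for invertibility
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_dims]].real                       # columns are projection directions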
19.
This paper presents a method for unsupervised learning and recognition of human actions in video. Lacking any supervision, there is nothing except the inherent biases of a given representation to guide the grouping of video clips into semantically meaningful partitions. Thus, in the first part of this paper, we compare two contemporary methods, Bag of Features (BOF) and Product Manifolds (PM), for clustering video clips of human facial expressions, hand gestures, and full-body actions, with the goal of better understanding how well these very different approaches to behavior recognition produce semantically relevant clusterings of the data.
20.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called the Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is a Dynamic Conditional Random Field, from a probabilistic perspective; the other is a Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, where the proposed framework proves highly effective and robust compared with bag-of-feature methods.