Similar Documents
20 similar documents found
1.
In this paper, a human action recognition method is presented in which pose representation is based on the contour points of the human silhouette, and actions are learned from sequences of multi-view key poses. Our contribution is twofold. First, our approach achieves state-of-the-art success rates without compromising the speed of the recognition process, making it suitable for online recognition and real-time scenarios. Second, dissimilarities among different actors performing the same action are handled by taking into account variations in shape (shifting the test data to the known domain of key poses) and speed (accommodating inconsistent time scales during classification). Experimental results on the publicly available Weizmann, MuHAVi and IXMAS datasets show high and stable success rates, achieving, to the best of our knowledge, the best rate reported so far on the MuHAVi Novel Actor test.
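As a hedged illustration of the key-pose idea described above, the following Python sketch learns per-action key poses by clustering fixed-length contour descriptors and classifies a test sequence by nearest-key-pose voting; the descriptor, the number of key poses and the voting rule are assumptions for illustration, not the authors' exact formulation.

```python
# Hypothetical sketch of key-pose learning and nearest-pose classification;
# descriptors and voting rule are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def contour_descriptor(contour, n_points=64):
    """Resample a silhouette contour (N x 2) to a fixed-length,
    translation- and scale-normalized descriptor."""
    idx = np.linspace(0, len(contour) - 1, n_points).astype(int)
    pts = contour[idx].astype(float)
    pts -= pts.mean(axis=0)                 # translation invariance
    return (pts / (np.linalg.norm(pts) + 1e-9)).ravel()  # scale invariance

def learn_key_poses(train_descriptors_per_action, k=10):
    """Cluster each action's pose descriptors into k key poses."""
    return {action: KMeans(n_clusters=k, n_init=10).fit(np.vstack(d)).cluster_centers_
            for action, d in train_descriptors_per_action.items()}

def classify_sequence(test_descriptors, key_poses):
    """Assign each test frame to its nearest key pose (shifting the test
    data to the known domain) and vote over the whole sequence."""
    votes = {a: 0.0 for a in key_poses}
    for d in test_descriptors:
        dists = {a: np.min(np.linalg.norm(kp - d, axis=1))
                 for a, kp in key_poses.items()}
        votes[min(dists, key=dists.get)] += 1
    return max(votes, key=votes.get)
```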

2.
In this paper, we present a new silhouette-based gait recognition method via deterministic learning theory, which combines the spatio-temporal motion characteristics and physical parameters of a human subject by analyzing shape parameters of the subject's silhouette contour. The method has so far been validated only on lateral-view sequences recorded under laboratory conditions. The ratio of the silhouette's height and width (H–W ratio), the width of the outer contour of the binarized silhouette, the silhouette area and the vertical coordinate of the centroid of the outer contour are combined as gait features for recognition. They represent the dynamics of gait motion and can more effectively reflect the subtle differences between gait patterns. The gait recognition approach consists of two phases: a training phase and a test phase. In the training phase, the gait dynamics underlying different individuals' gaits are locally and accurately approximated by radial basis function (RBF) networks via deterministic learning theory, and the obtained knowledge of the approximated gait dynamics is stored in constant RBF networks. In the test phase, a bank of dynamical estimators is constructed for all the training gait patterns, with the constant RBF networks obtained from the training phase embedded in the estimators. Comparing this set of estimators with a test gait pattern generates a set of recognition errors, and the average L1 norms of these errors are taken as the similarity measure between the dynamics of the training gait patterns and the dynamics of the test gait pattern. The test gait pattern is then matched to the most similar training pattern according to the smallest-error principle. Finally, the recognition performance of the proposed algorithm is compared with published gait recognition approaches on the best-known public gait databases: CASIA, CMU MoBo and TUM GAID.
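The smallest-error principle described above can be made concrete with a short numpy sketch, assuming the gait dynamics are modeled as a one-step prediction of the four-dimensional feature state; the centers, widths and least-squares fitting are illustrative simplifications of deterministic learning.

```python
# Minimal sketch of the smallest-error principle with RBF approximators.
import numpy as np

def rbf_features(X, centers, width=1.0):
    """Gaussian RBF activations for state vectors X (T x d)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def train_gait_model(X, centers):
    """Fit constant RBF weights approximating the dynamics x_{t+1} - x_t."""
    Phi, dX = rbf_features(X[:-1], centers), X[1:] - X[:-1]
    W, *_ = np.linalg.lstsq(Phi, dX, rcond=None)
    return W

def recognition_error(X_test, W, centers):
    """Average L1 norm of the one-step prediction error (similarity measure)."""
    pred = rbf_features(X_test[:-1], centers) @ W
    return np.abs((X_test[1:] - X_test[:-1]) - pred).mean()

def recognize(X_test, models, centers):
    """Smallest-error principle: pick the training pattern whose estimator
    yields the lowest average L1 error on the test sequence."""
    errors = {sid: recognition_error(X_test, W, centers)
              for sid, W in models.items()}
    return min(errors, key=errors.get)
```

Here each row of `X` would hold the four gait features named above (H–W ratio, contour width, area, centroid height).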

3.
4.
A complete, fast and practical isolated object recognition system has been developed which is very robust with respect to scale, position and orientation changes of the objects, as well as to noise and local deformations of shape (due to perspective projection, segmentation errors and the non-rigid material of some objects). The system has been tested on a wide variety of three-dimensional objects with different shapes, materials and surface properties. A light-box setup is used to obtain silhouette images, which are segmented to obtain the physical boundaries of the objects; these boundaries are classified as either convex or concave. Convex curves are recognized using their four high-scale curvature extrema points, while curvature scale space (CSS) representations are computed for concave curves. The CSS representation is a multi-scale organization of the natural, invariant features of a curve (curvature zero-crossings or extrema) and is useful for very reliable recognition of the correct model, since it places no constraints on the shape of objects. A three-stage, coarse-to-fine matching algorithm prunes the search space in stage one by applying the CSS aspect ratio test. The maxima of contours in the CSS representations of the surviving models are used for fast CSS matching in stage two. Finally, stage three verifies the best match and resolves any ambiguities by determining the distance between the image and model curves. Transformation parameter optimization is then used to find the best fit of the input object to the correct model.
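The CSS representation itself is straightforward to compute. The following sketch, assuming a closed contour sampled as coordinate arrays, records the arc-length positions of curvature zero-crossings while the contour is smoothed by Gaussians of increasing width; the scale range is an illustrative choice.

```python
# Hedged sketch of a curvature scale space (CSS) image for a closed contour.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_image(x, y, sigmas):
    """Return, per scale, the indices of curvature zero-crossings of the
    closed curve (x(u), y(u)); 'wrap' mode keeps the contour closed."""
    zero_crossings = []
    for s in sigmas:
        xs = gaussian_filter1d(x, s, mode='wrap')
        ys = gaussian_filter1d(y, s, mode='wrap')
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        # signed curvature of the smoothed curve
        kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-12) ** 1.5
        zero_crossings.append(np.where(np.diff(np.sign(kappa)) != 0)[0])
    return zero_crossings
```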

5.
Silhouette-based occluded object recognition through curvature scale space
A complete and practical system for occluded object recognition has been developed which is very robust with respect to noise and local deformations of shape (due to weak perspective distortion, segmentation errors and non-rigid material) as well as to scale, position and orientation changes of the objects. The system has been tested on a wide variety of free-form 3D objects. An industrial application is envisaged in which a fixed camera and a light-box are used to obtain images. Within the constraints of the system, every rigid 3D object can be modeled by a limited number of classes of 2D contours corresponding to the object's resting positions on the light-box, and the contours in each class are related to each other by a 2D similarity transformation. The Curvature Scale Space technique [26, 28] is then used to obtain a novel multi-scale segmentation of the image and the model contours. Object indexing [16, 32, 36] is used to narrow down the search space, and an efficient local matching algorithm is utilized to select the best matching models.
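The coarse CSS matching stage can be approximated by comparing sets of CSS maxima under circular shifts of the arc-length parameter, as in the hedged sketch below; the cost function and the sampling of shifts are assumptions, and the indexing and final verification stages are omitted.

```python
# Hedged sketch of coarse CSS matching via (position, scale) maxima sets.
import numpy as np

def match_css_maxima(maxima_a, maxima_b, n_samples=200):
    """maxima_*: arrays of (u, sigma) pairs, u normalized to [0, 1).
    Returns the lowest alignment cost over circular shifts of contour A."""
    best = np.inf
    for shift in np.linspace(0.0, 1.0, n_samples, endpoint=False):
        a = maxima_a.copy()
        a[:, 0] = (a[:, 0] + shift) % 1.0
        cost = 0.0
        for u, s in a:
            # circular arc-length distance to B's maxima, plus a scale gap
            du = np.minimum(np.abs(maxima_b[:, 0] - u),
                            1 - np.abs(maxima_b[:, 0] - u))
            cost += np.min(du + np.abs(maxima_b[:, 1] - s)
                           / maxima_b[:, 1].max())
        best = min(best, cost)
    return best
```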

6.
Conventional human action recognition algorithms do not work well when the amount of training video is insufficient. We address this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in an auxiliary domain to assist recognition tasks in the target domain. The TTM is characterized by two aspects: 1) it uses the bag-of-words model trained on the auxiliary domain to represent videos in the target domain; and 2) it assumes each human action is a mixture of a set of topics and uses the topics learned from the auxiliary domain to regularize topic estimation in the target domain, where the regularizer is a sum of Kullback-Leibler divergences between topic pairs of the two domains. Exploiting the auxiliary-domain knowledge improves the generalization ability of the learned topic model. Experiments on the Weizmann and KTH human action databases demonstrate the effectiveness of the proposed TTM for cross-domain human action recognition.
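The KL-divergence regularizer is simple to state in code. The sketch below, assuming row-stochastic topic-word matrices and a one-to-one pairing of auxiliary and target topics, computes the penalty as described.

```python
# Illustrative computation of the TTM regularizer: a sum of KL divergences
# between paired auxiliary and target topics. Shapes and pairing are assumptions.
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q))

def ttm_regularizer(target_topics, aux_topics):
    """Both arguments: (K x V) topic-word distributions, rows summing to 1.
    Penalizes target topics that drift away from their auxiliary counterparts."""
    return sum(kl(t, a) for t, a in zip(target_topics, aux_topics))
```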

7.
The understanding of human activity is one of the key research areas in human-centered robotic applications. In this paper, we propose complexity-based motion features for recognizing human actions. Using a time-series-complexity measure, the proposed method evaluates the amount of useful information in subsequences in order to select meaningful temporal parts of a human motion trajectory. From these meaningful subsequences, motion codewords are learned using a clustering algorithm, and motion features are then generated and represented as a histogram of the motion codewords. Furthermore, we propose a multiscale sliding window for generating motion codewords, which addresses the sensitivity of recognition performance to a fixed sliding-window length. As a classifier, we employ a random forest. To validate the proposed method, we present experimental results on two open datasets: the MSR Action 3D and UTKinect datasets.
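The pipeline above maps naturally onto standard tools. In the hedged sketch below, the time-series-complexity measure is replaced by a cheap stand-in (standard deviation of first differences), codewords come from k-means, and histograms are fed to a random forest; window sizes and the keep ratio are illustrative.

```python
# End-to-end sketch of the described pipeline under stated assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def complexity(seq):
    """Cheap stand-in for a time-series-complexity estimate."""
    return np.std(np.diff(seq, axis=0))

def multiscale_subsequences(traj, window_sizes=(8, 16, 32), keep_ratio=0.5):
    """Slide windows of several lengths over a trajectory (T x d), keep the
    most complex subsequences, and resample them to a common length."""
    subs = []
    for w in window_sizes:
        wins = [traj[i:i + w] for i in range(0, len(traj) - w + 1, w // 2)]
        wins.sort(key=complexity, reverse=True)
        for s in wins[:max(1, int(len(wins) * keep_ratio))]:
            idx = np.linspace(0, w - 1, 8).astype(int)
            subs.append(s[idx].ravel())
    return np.array(subs)

def histogram_feature(subs, codebook):
    """Represent one motion as a histogram over learned motion codewords."""
    labels = codebook.predict(subs)
    h = np.bincount(labels, minlength=codebook.n_clusters).astype(float)
    return h / (h.sum() + 1e-9)

def train(trajs_train, y_train, n_codewords=64):
    """Cluster subsequences into codewords, then fit a random forest."""
    all_subs = np.vstack([multiscale_subsequences(t) for t in trajs_train])
    codebook = KMeans(n_clusters=n_codewords, n_init=10).fit(all_subs)
    X = np.array([histogram_feature(multiscale_subsequences(t), codebook)
                  for t in trajs_train])
    return codebook, RandomForestClassifier(n_estimators=200).fit(X, y_train)
```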

8.
This paper proposes a boosting EigenActions algorithm for human action recognition. A spatio-temporal Information Saliency Map (ISM) is calculated from a video sequence by estimating the pixel density function. A continuous human action is segmented into a set of primitive periodic motion cycles from the information saliency curve. Each cycle of motion is represented by a Salient Action Unit (SAU), which is used to determine the EigenActions via principal component analysis. A human action classifier is developed using a multi-class AdaBoost algorithm with a Bayesian hypothesis as the weak classifier. Given a human action video sequence, the proposed method effectively locates the SAUs in the video and recognizes the human actions by categorizing the SAUs. Two publicly available human action databases, namely KTH and Weizmann, are selected for evaluation. The average recognition accuracies are 81.5% and 98.3% for the KTH and Weizmann databases, respectively. Comparative results against two recent methods and robustness test results are also reported.
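The EigenAction step is ordinary PCA over the frames of one Salient Action Unit, as in the minimal sketch below; the saliency-based segmentation and the AdaBoost classifier are not reproduced here.

```python
# Hedged sketch of the EigenAction step: PCA over vectorized SAU frames.
import numpy as np

def eigen_actions(sau_frames, n_components=10):
    """sau_frames: (T x H x W) frames of one primitive motion cycle.
    Returns the top principal components ('EigenActions') and the mean."""
    X = sau_frames.reshape(len(sau_frames), -1).astype(float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return Vt[:n_components], mean

def project(frame, components, mean):
    """Project a new frame into the EigenAction subspace."""
    return components @ (frame.ravel().astype(float) - mean)
```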

9.
10.
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine-learning-based algorithm (bag of words) gives an initial estimate of the action classification from video sequences by performing image feature analysis. These results are then passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimate yielded by the machine learning algorithm. This second stage draws on the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates from the machine learning techniques alone are significantly improved by the second stage, in which common-sense knowledge and reasoning capabilities are leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline.
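A toy version of the two-stage idea is sketched below: first-stage classifier scores are re-weighted by a hand-encoded plausibility table given scene context. The actions, contexts and plausibility values are invented placeholders, not the paper's knowledge base.

```python
# Toy illustration of correcting ML scores with common-sense plausibility.
import numpy as np

ACTIONS = ["drink", "run", "type"]
# Hypothetical prior plausibility of each action in each context.
PLAUSIBILITY = {
    "office": np.array([0.8, 0.1, 1.0]),
    "street": np.array([0.5, 1.0, 0.1]),
}

def second_stage(ml_scores, context):
    """Correct the first-stage estimate with common-sense knowledge."""
    corrected = ml_scores * PLAUSIBILITY[context]
    return ACTIONS[int(np.argmax(corrected))]

# e.g. a bag-of-words classifier that slightly favours "run" in an office
print(second_stage(np.array([0.30, 0.40, 0.30]), "office"))  # -> "type"
```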

11.
In this paper, we present a machine learning approach for subject-independent human action recognition using a depth camera, emphasizing the importance of depth in the recognition of actions. The proposed approach uses the flow information in all three dimensions to classify an action. We first compute the 2-D optical flow and use it together with the depth image to obtain the depth flow (Z motion vectors); the resulting flow captures the dynamics of the actions in space-time. Feature vectors are obtained by averaging the 3-D motion over a grid laid over the silhouette in a hierarchical fashion, so that these fine-to-coarse windows capture the motion dynamics of the object at various scales. The extracted features are used to train a Meta-cognitive Radial Basis Function Network (McRBFN) that uses a Projection Based Learning (PBL) algorithm, referred to as PBL-McRBFN henceforth. PBL-McRBFN begins with zero hidden neurons and builds the network based on the best human learning strategy, namely, self-regulated learning in a meta-cognitive environment. When a sample is used for learning, PBL-McRBFN uses sample overlapping conditions and a projection-based learning algorithm to estimate the parameters of the network. The performance of PBL-McRBFN is compared to that of Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers, with every person and action represented in the training and testing datasets; this study shows that PBL-McRBFN outperforms these classifiers in recognizing actions in 3-D. Further, a subject-independent study is conducted using a leave-one-subject-out strategy to test generalization performance, and it shows that McRBFN is capable of generalizing to unseen subjects accurately. The performance of the proposed approach is benchmarked on the Video Analytics Lab (VAL) dataset and the Berkeley Multi-modal Human Action Database (MHAD).
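The depth-flow computation can be sketched with OpenCV, under the assumption that Z motion is obtained by warping the next depth frame back along the 2-D flow and differencing; the Farneback parameters and grid levels are illustrative.

```python
# Sketch of depth flow (Z motion) plus hierarchical grid features.
import numpy as np
import cv2

def depth_flow(gray1, gray2, depth1, depth2):
    """Return (u, v, w): 2-D optical flow plus depth (Z) motion vectors."""
    flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray1.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    warped_depth2 = cv2.remap(depth2.astype(np.float32), map_x, map_y,
                              cv2.INTER_LINEAR)
    return flow[..., 0], flow[..., 1], warped_depth2 - depth1.astype(np.float32)

def grid_average(u, v, z, mask, levels=(2, 4, 8)):
    """Average 3-D motion over hierarchical coarse-to-fine grids laid over
    the silhouette mask, concatenated into one feature vector."""
    feats = []
    h, w = mask.shape
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = (slice(i * h // n, (i + 1) * h // n),
                        slice(j * w // n, (j + 1) * w // n))
                m = mask[cell] > 0
                for comp in (u, v, z):
                    feats.append(comp[cell][m].mean() if m.any() else 0.0)
    return np.array(feats)
```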

12.
Ren Ziliang, Zhang Qieshi, Gao Xiangyang, Hao Pengyi, Cheng Jun. Multimedia Tools and Applications, 2021, 80(11): 16185-16203.
Multi-modality-based human action recognition is a topic of increasing interest. Multi-modality can provide more abundant and complementary information than single...

13.
Human action recognition (HAR) is a core technology for human-computer interaction and video understanding, attracting significant research and development attention in the field of computer vision. However, in uncontrolled environments, achieving effective HAR is still challenging due to the widely varying nature of video content. In previous research efforts, trajectory-based video representations have been widely used for HAR. Although these approaches show state-of-the-art HAR performance on various datasets, issues such as high computational complexity and the presence of redundant trajectories still need to be addressed in order to solve the problem of real-world HAR. In this paper, we propose a novel method for HAR that integrates a technique for rejecting redundant trajectories, which mainly originate from camera movement, without degrading the effectiveness of HAR. Furthermore, to make the optical flow estimation that precedes trajectory extraction efficient, we integrate a technique for dynamic frame skipping, so that only a small subset of the frames in a video clip is used for optical flow estimation. Comparative experiments on five publicly available human action datasets show that the proposed method outperforms state-of-the-art HAR approaches in terms of effectiveness, while simultaneously reducing computational complexity.
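One plausible realization of the trajectory-rejection idea, assuming camera motion is modeled by a global homography estimated with RANSAC, is sketched below; the thresholds are illustrative, and the paper's dynamic frame-skipping logic is not reproduced.

```python
# Hedged sketch: reject trajectories explained by global (camera) motion.
import numpy as np
import cv2

def camera_consistent(points_prev, points_next, H, thresh=2.0):
    """Boolean mask: True where a point pair is explained by camera motion H."""
    p = cv2.perspectiveTransform(
        points_prev.reshape(-1, 1, 2).astype(np.float32), H)
    return np.linalg.norm(p.reshape(-1, 2) - points_next, axis=1) < thresh

def filter_trajectories(points_prev, points_next):
    """Estimate camera motion robustly, then keep only foreground tracks."""
    H, _ = cv2.findHomography(points_prev.astype(np.float32),
                              points_next.astype(np.float32),
                              cv2.RANSAC, 3.0)
    keep = ~camera_consistent(points_prev, points_next, H)
    return points_prev[keep], points_next[keep]
```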

14.
15.
This paper discusses the task of continuous human action recognition, where "continuous" refers to videos that contain multiple actions joined together. This task is important for applications such as video surveillance and content-based video retrieval. It aims to identify the action category and to detect the start and end key frames of each action. It is a challenging task due to the frequent transitions between human actions and the ambiguity of action boundaries. In this paper, a novel and efficient continuous action recognition framework is proposed. Our approach is based on the bag-of-words representation: a local visual pattern is regarded as a word, and an action is modeled by the distribution of words. A generative, translation- and scale-invariant probabilistic Latent Semantic Analysis (pLSA) model is presented. The continuous action recognition result is obtained frame by frame and updated over time. Experimental results show that this approach recognizes both isolated and continuous actions effectively and efficiently.
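For reference, a minimal pLSA estimated by EM over bag-of-words counts is sketched below; it stands in for the paper's translation- and scale-invariant variant, whose invariances and frame-by-frame updating are not reproduced.

```python
# Minimal pLSA (EM) over bag-of-words counts; a hedged stand-in only.
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: (D x V) word counts per clip. Returns P(z|d) and P(w|z)."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    p_z_d = rng.random((D, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, V)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibility of topic z for word w in document d
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]        # D x Z x V
        resp = joint / (joint.sum(1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts
        nz = counts[:, None, :] * resp                       # D x Z x V
        p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = nz.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z
```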

16.
17.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of cortical neurons, and substantial experimental evidence in neuroscience suggests that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating discriminative information into SFA learning and by considering the spatial relationship of body parts. In particular, we consider four SFA learning strategies: the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large number of training cuboids obtained by random sampling within motion boundaries. Afterward, to represent action sequences, the squared first-order temporal derivatives are accumulated over all transformed cuboids into one feature vector, termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large-scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-Interaction databases, to demonstrate the effectiveness of SFA for human action recognition. The experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves recognition performance, (2) requires fewer intermediate processing steps while achieving comparable or even better performance, and (3) has good potential for recognizing complex multi-person activities.
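Linear SFA and the ASD feature are compact enough to sketch directly: whiten the input, take the directions along which the whitened signal varies most slowly, and accumulate squared first-order derivatives of the slow outputs. The discriminative and spatial variants (D-SFA, SD-SFA) are not reproduced here.

```python
# Linear SFA via whitening + eigen-decomposition of the derivative covariance,
# plus the Accumulated Squared Derivative (ASD) feature described above.
import numpy as np

def linear_sfa(X, n_slow=5):
    """X: (T x d) signal. Returns projection W mapping x to slow features."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    white = Vt.T / S * np.sqrt(len(X))       # whitening transform (d x d)
    Z = Xc @ white                           # whitened signal, unit covariance
    dZ = np.diff(Z, axis=0)
    # smallest eigenvalues of the derivative covariance = slowest directions
    evals, evecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))
    return white @ evecs[:, :n_slow]

def asd_feature(X, W):
    """Accumulated Squared Derivative of the slow features of a sequence."""
    Y = (X - X.mean(axis=0)) @ W
    return (np.diff(Y, axis=0) ** 2).sum(axis=0)
```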

18.
A survey on vision-based human action recognition
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.

19.
Silhouette-based multi-sensor smoke detection
Fire is one of the leading hazards affecting everyday life around the world, and the sooner a fire is detected, the better the chances of survival. Today's fire alarm systems, such as video-based smoke detectors, however, still pose many problems. In order to accomplish more accurate video-based smoke detection and to reduce false alarms, this paper proposes a multi-sensor smoke detector which takes advantage of the different kinds of information captured by visual and thermal imaging sensors. The detector analyzes the silhouette coverage of moving objects in registered (i.e., aligned) visual and long-wave infrared images. The registration is performed using a contour mapping algorithm which estimates the rotation, scale and translation between moving objects in the multi-spectral images. The geometric parameters found at this stage are then used to coarsely map the silhouette images, and the coverage between them is calculated. Since smoke is invisible in long-wave infrared, its silhouette will, contrary to those of ordinary moving objects, only be detected in the visual images. As a result, the coverage of thermal and visual silhouettes starts to decrease in the presence of smoke. Due to the dynamic character of smoke, the visual silhouette also shows a high degree of disorder. By focusing on both silhouette behaviors, the system is able to detect smoke accurately. Experiments on smoke and non-smoke multi-sensor sequences indicate that the automated smoke detection algorithm is able to coarsely map the multi-sensor images. Furthermore, using the low-cost silhouette analysis, a fast warning can be given with a low number of false alarms.
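A toy version of the silhouette-coverage cue is sketched below, assuming already registered binary silhouette masks; the disorder proxy (squared perimeter over area) and the thresholds are assumptions for illustration.

```python
# Toy silhouette-coverage smoke cue: coverage drops (smoke is invisible in
# long-wave infrared) while the visual silhouette becomes disordered.
import numpy as np
import cv2

def smoke_score(visual_mask, thermal_mask):
    """Masks: binary uint8 images (0/255) of moving-object silhouettes,
    already registered by the contour-mapping stage."""
    vis_area = max(visual_mask.sum() / 255, 1)
    coverage = (cv2.bitwise_and(visual_mask, thermal_mask).sum() / 255) / vis_area
    contours, _ = cv2.findContours(visual_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    perim = sum(cv2.arcLength(c, True) for c in contours)
    disorder = perim ** 2 / vis_area          # high for ragged smoke blobs
    return coverage, disorder

def is_smoke(coverage, disorder, cov_thresh=0.4, dis_thresh=100.0):
    return coverage < cov_thresh and disorder > dis_thresh
```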

20.
In this paper, we fully investigate the concept of fundamental ratios, demonstrate their application and significance in view-invariant action recognition, and explore the importance of different body parts in action recognition. A moving plane observed by a fixed camera induces a fundamental matrix F between two frames, where the ratios among the elements in the upper-left 2 × 2 submatrix are herein referred to as the fundamental ratios. We show that fundamental ratios are invariant to camera intrinsic parameters and orientation, and hence can be used to identify similar motions of line segments from varying viewpoints. By representing the human body as a set of points, we decompose a body posture into a set of line segments; the similarity between two actions is therefore measured by the motion of line segments and hence by their associated fundamental ratios. We further investigate to what extent a body part plays a role in the recognition of different actions and propose a generic method of assigning weights to different body points. Experiments are performed on three categories of data: the controlled CMU MoCap dataset, the partially controlled IXMAS data, and the more challenging uncontrolled UCF-CIL dataset collected from the internet. Extensive experiments are reported on (i) view-invariance, (ii) robustness to noisy localization of body points, (iii) the effect of assigning different weights to different body points, (iv) the effect of partial occlusion on recognition accuracy, and (v) how soon the method recognizes an action correctly from the starting point of the query video.
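The fundamental-ratio descriptor can be sketched with OpenCV, assuming point correspondences between two frames are available; normalizing the upper-left 2 × 2 submatrix of F keeps only the ratios among its elements.

```python
# Hedged sketch of fundamental ratios from two frames of body points.
import numpy as np
import cv2

def fundamental_ratios(pts1, pts2):
    """pts1, pts2: (N x 2) corresponding body points in two frames (N >= 8)."""
    F, _ = cv2.findFundamentalMat(pts1.astype(np.float32),
                                  pts2.astype(np.float32), cv2.FM_8POINT)
    sub = F[:2, :2].ravel()
    return sub / (np.linalg.norm(sub) + 1e-12)   # keep only the ratios

def motion_similarity(r1, r2):
    """Compare two fundamental-ratio descriptors (sign of F is arbitrary)."""
    return min(np.linalg.norm(r1 - r2), np.linalg.norm(r1 + r2))
```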

