Similar Documents
20 similar documents found.
1.
Objective: Describing human actions is a key problem in action recognition. To make full use of the training data and thus ensure that the features describe actions well, a human action recognition method based on direction-weighted local spatio-temporal features is proposed. Method: First, the intensity-gradient component of each local spatio-temporal feature is decomposed into three directions (X, Y, Z), each describing the action separately. A standard visual vocabulary codebook is constructed directly for each action's three directional descriptor sets, and the training videos yield a standard three-direction word distribution for every action. For a test video, the corresponding three-direction word distributions are then computed against each action's standard codebooks, and the action is recognized by a weighted similarity measure against each action's standard three-direction distributions. Results: Experiments on the Weizmann and KTH datasets give average recognition rates of 96.04% and 96.93%, respectively. Conclusion: Compared with other action recognition methods, the proposed method clearly improves the average recognition rate.
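As a rough illustration of the weighted similarity step, the sketch below compares a test video's three directional word histograms against each action's reference histograms. The histogram-intersection similarity and the direction weights are hypothetical stand-ins; the abstract does not give the exact measure or weights.

```python
# A minimal sketch, assuming histogram-intersection similarity and
# uniform direction weights (both hypothetical).
import numpy as np

def hist_intersection(p, q):
    """Similarity between two L1-normalized word histograms."""
    return np.minimum(p, q).sum()

def classify(test_hists, action_refs, weights=(1.0, 1.0, 1.0)):
    """test_hists: direction -> histogram for the test video.
    action_refs: action -> (direction -> reference histogram).
    Returns the action with the largest weighted similarity."""
    scores = {}
    for action, refs in action_refs.items():
        scores[action] = sum(
            w * hist_intersection(test_hists[d], refs[d])
            for w, d in zip(weights, ("x", "y", "z")))
    return max(scores, key=scores.get)

# Toy usage with random 50-word histograms for two actions.
rng = np.random.default_rng(0)

def rand_hist():
    h = rng.random(50)
    return h / h.sum()

refs = {a: {d: rand_hist() for d in "xyz"} for a in ("walk", "wave")}
test = {d: rand_hist() for d in "xyz"}
print(classify(test, refs))
```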

2.
Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.
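A minimal sketch contrasting the two fusion strategies mentioned above: early fusion concatenates feature vectors before a single classifier, while late fusion combines per-feature classifier scores afterwards. The features and labels are random placeholders; the paper's actual features are scene-context and motion descriptors.

```python
# Early vs. late fusion, sketched with toy random features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
scene = rng.random((60, 16))        # stand-in scene-context features
motion = rng.random((60, 24))       # stand-in motion features
y = rng.integers(0, 3, 60)          # three action classes

# Early fusion: one classifier on the concatenated descriptor.
early = SVC(probability=True).fit(np.hstack([scene, motion]), y)

# Late fusion: one classifier per feature type, scores averaged afterwards.
clf_s = SVC(probability=True).fit(scene, y)
clf_m = SVC(probability=True).fit(motion, y)
probs = (clf_s.predict_proba(scene) + clf_m.predict_proba(motion)) / 2
late_pred = clf_s.classes_[probs.argmax(axis=1)]
print(early.predict(np.hstack([scene, motion]))[:5], late_pred[:5])
```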

3.
Because global motion features are difficult to extract accurately, this paper represents human actions with local spatio-temporal features. To address the large quantization error of the hard-assignment step in the traditional bag-of-words model, a soft-assignment scheme inspired by fuzzy clustering is proposed. Visual words are extracted from videos with an interest-point detector and clustered with the K-means algorithm to build a codebook. To compute the classification feature, the distance from each visual word to every codeword is computed first, the membership probability of the word to each codeword is derived from these distances, and finally the frequency of each codeword in the video is accumulated. The proposed method is validated on the Weizmann and KTH datasets: the recognition rate improves over the traditional bag-of-words model by 8% on Weizmann and by 9% on KTH, showing that the proposed algorithm recognizes human actions more effectively.
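A minimal sketch of the soft-assignment step described above, assuming a Gaussian kernel over descriptor-to-codeword distances (the abstract does not state the exact membership function); the descriptors and sigma are placeholders.

```python
# Soft bag-of-words: each descriptor contributes to every codeword in
# proportion to its membership probability, instead of to one codeword.
import numpy as np
from sklearn.cluster import KMeans

def soft_bow(descriptors, codebook, sigma=1.0):
    """descriptors: (n, d) local features from one video.
    codebook: (k, d) codewords from K-means.
    Returns an L1-normalized soft word histogram of length k."""
    # Pairwise squared distances, descriptor -> codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Membership probability of each descriptor to each codeword.
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    h = w.sum(axis=0)
    return h / h.sum()

# Toy usage: cluster random training descriptors into a 20-word codebook.
rng = np.random.default_rng(0)
train = rng.random((500, 32))
codebook = KMeans(n_clusters=20, n_init=10,
                  random_state=0).fit(train).cluster_centers_
print(soft_bow(rng.random((80, 32)), codebook).shape)  # (20,)
```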

4.

Human action recognition based on silhouette images has wide applications in computer vision, human-computer interaction, and intelligent surveillance. It is a challenging task due to the complexity of natural human actions. In this paper, a human action recognition method is proposed based on the distance transform and entropy features of human silhouettes. In the first stage, background subtraction is performed by applying a correlation-coefficient-based frame difference technique to extract silhouette images. In the second stage, distance-transform-based features and entropy features are extracted from the silhouette images; they provide shape and local variation information, respectively. These features are given as input to neural networks to recognize various human actions. The proposed method is tested on three datasets: Weizmann, KTH, and UCF50, obtaining accuracies of 92.5%, 91.4%, and 80%, respectively. The experimental results show that the proposed method is comparable to other state-of-the-art human action recognition methods.
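The following sketch illustrates the two feature types, using SciPy's Euclidean distance transform and a histogram-based entropy; the 4×4 grid pooling and bin count are assumed choices, not necessarily the paper's exact layout.

```python
# Distance-transform (shape) and entropy (local variation) features
# pooled over a grid of silhouette cells.
import numpy as np
from scipy.ndimage import distance_transform_edt

def silhouette_features(mask, grid=4, bins=16):
    """mask: 2-D boolean silhouette. Returns one mean distance-transform
    value and one entropy value per grid cell."""
    dt = distance_transform_edt(mask)
    h, w = mask.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = dt[i * h // grid:(i + 1) * h // grid,
                      j * w // grid:(j + 1) * w // grid]
            feats.append(cell.mean())              # shape cue
            p, _ = np.histogram(cell, bins=bins)
            p = p / max(p.sum(), 1)
            p = p[p > 0]
            feats.append(-(p * np.log2(p)).sum())  # local variation cue
    return np.array(feats)

# Toy usage: a filled rectangle as a stand-in silhouette.
mask = np.zeros((64, 48), bool)
mask[10:54, 8:40] = True
print(silhouette_features(mask).shape)  # (32,) = 16 cells * 2 features
```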


5.
6.
7.

Deep learning models have attained great success for an extensive range of computer vision applications, including image and video classification. However, the complex architecture of the most recently developed networks imposes memory and computational limitations, especially for human action recognition applications. Unsupervised deep convolutional neural networks such as PCANet can alleviate these limitations and significantly reduce the computational complexity of the whole recognition system. In this work, instead of using a 3D convolutional neural network architecture to learn temporal features of video actions, the unsupervised convolutional PCANet model is extended into PCANet-TOP, which effectively learns spatiotemporal features from Three Orthogonal Planes (TOP). For each video sequence, spatial frames (XY) and temporal planes (XT and YT) are used to train three different PCANet models. The learned features are then fused, after dimensionality reduction with whitening PCA, into a spatiotemporal representation of the action video. Finally, a Support Vector Machine (SVM) classifier is applied for action classification. The proposed method is evaluated on four well-known benchmark datasets, namely Weizmann, KTH, UCF Sports, and YouTube. The recognition results show that PCANet-TOP provides discriminative and complementary features from the three orthogonal planes and achieves promising results comparable with state-of-the-art methods.
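To make the TOP idea concrete, the sketch below slices one XY frame and the central XT and YT planes out of a video volume. Taking a single central slice per plane is a simplification; the paper trains a PCANet on many planes per orientation.

```python
# Extracting the Three Orthogonal Planes (TOP) views of a video volume.
import numpy as np

def top_planes(video):
    """video: (T, H, W) grayscale volume.
    Returns the central XY frame, XT plane, and YT plane."""
    t, h, w = video.shape
    xy = video[t // 2, :, :]   # spatial appearance at the middle frame
    xt = video[:, h // 2, :]   # horizontal motion over time (T x W)
    yt = video[:, :, w // 2]   # vertical motion over time (T x H)
    return xy, xt, yt

# Toy usage: a random 30-frame clip.
clip = np.random.default_rng(0).random((30, 64, 48))
for p in top_planes(clip):
    print(p.shape)  # (64, 48), (30, 48), (30, 64)
```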


8.
To represent the spatio-temporal characteristics of human actions effectively, skeleton features are passed through a Hough transform to build the action representation. Specifically, OpenPose extracts the human skeleton keypoints from each video frame; skeletal joints are then constructed and mapped into Hough space, converting joint trajectories into point traces. Fisher vector (FV) encodings of the angle and trajectory features are fused as the input of a linear SVM classifier. Experiments on the classic public datasets KTH, Weizmann, KARD, and Drone-Action show that the Hough transform improves feature robustness and the performance of human action recognition.
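A minimal sketch of mapping one skeletal bone (a segment between two joints) to a point (rho, theta) in Hough space; the joint coordinates are hypothetical stand-ins for OpenPose output, and the full pipeline (trajectory point traces, FV encoding, SVM) is not shown.

```python
# Line through two joints in normal form: x*cos(theta) + y*sin(theta) = rho.
import numpy as np

def bone_to_hough(p1, p2):
    """p1, p2: (x, y) joint coordinates. Returns (rho, theta) of the
    line through both joints."""
    x1, y1 = p1
    x2, y2 = p2
    theta = np.arctan2(x2 - x1, -(y2 - y1))  # angle of the segment's normal
    rho = x1 * np.cos(theta) + y1 * np.sin(theta)
    return rho, theta

# Toy usage: an "upper arm" from shoulder (120, 80) to elbow (135, 110).
print(bone_to_hough((120.0, 80.0), (135.0, 110.0)))
```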

9.
10.
This paper proposes a boosting EigenActions algorithm for human action recognition. A spatio-temporal Information Saliency Map (ISM) is calculated from a video sequence by estimating the pixel density function. A continuous human action is segmented into a set of primitive periodic motion cycles from the information saliency curve. Each cycle of motion is represented by a Salient Action Unit (SAU), which is used to determine the EigenAction using principal component analysis. A human action classifier is developed using a multi-class AdaBoost algorithm with a Bayesian hypothesis as the weak classifier. Given a human action video sequence, the proposed method effectively locates the SAUs in the video and recognizes the human actions by categorizing the SAUs. Two publicly available human action databases, namely KTH and Weizmann, are selected for evaluation. The average recognition accuracies are 81.5% and 98.3% for the KTH and Weizmann databases, respectively. Comparative results with two recent methods and robustness test results are also reported.
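A minimal sketch of the EigenAction step only: PCA over vectorized salient action units (SAUs). The SAU vectors here are random stand-ins, and the saliency segmentation and AdaBoost stages are not shown.

```python
# PCA basis ("EigenActions") and low-dimensional SAU descriptors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
saus = rng.random((40, 32 * 32))     # 40 SAUs, each flattened to a vector
pca = PCA(n_components=8).fit(saus)
eigenactions = pca.components_       # principal-axis basis
coeffs = pca.transform(saus)         # descriptors used for classification
print(eigenactions.shape, coeffs.shape)  # (8, 1024) (40, 8)
```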

11.
Machine-based human action recognition has become very popular in the last decade. Automatic unattended surveillance systems, interactive video games, machine learning, and robotics are only a few of the areas that involve human action recognition. This paper examines the capability of a known transform, the so-called Trace transform, for human action recognition and proposes two new feature extraction methods based on it. The first extracts Trace transforms from binarized silhouettes representing different stages of a single action period; a final history template composed from these transforms represents the whole sequence and captures much of the valuable spatio-temporal information contained in a human action. The second uses the Trace transform to construct a set of invariant features that represent the action sequence and can cope with variations that usually appear in video capture. This method exploits the natural properties of the Trace transform to produce noise-robust features that are invariant to translation, rotation, and scaling, and are effective, simple, and fast to compute. Classification experiments performed on two well-known and challenging action datasets (KTH and Weizmann) using a Radial Basis Function (RBF) kernel SVM gave very competitive results, indicating the potential of the proposed techniques.
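To relate the Trace transform to something readily available: it traces an image along all lines and applies a functional to each line's samples, and with the plain integral functional this reduces to the Radon transform, so skimage's radon() computes that special case. Other functionals (max, median, ...) yield further Trace features. The binary blob below is a toy silhouette, not the paper's data.

```python
# Trace transform, integral-functional special case, via the Radon transform.
import numpy as np
from skimage.transform import radon

silhouette = np.zeros((64, 64))
silhouette[16:48, 24:40] = 1.0       # toy binarized silhouette

angles = np.arange(0, 180, 5, dtype=float)
trace_sum = radon(silhouette, theta=angles)  # sum of samples along each line
print(trace_sum.shape)                       # (n_rho, n_angles)
```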

12.
In this paper we propose a novel method for continuous visual event recognition (CVER) on a large-scale video dataset using a max-margin Hough transform framework. Because of the dataset's scale, diverse real-world conditions, and wide scene variability, directly applying action recognition/detection methods, such as spatio-temporal interest point (STIP) local-feature techniques, to the whole dataset is practically infeasible. To address this, we apply a motion-region extraction technique based on motion segmentation and region clustering to identify candidate "events of interest" as a preprocessing step. A STIP detector is applied to these candidate regions and local motion features are computed. For activity representation we use a generalized Hough transform framework in which each feature point casts a weighted vote for a possible activity class centre. A max-margin framework is applied to learn the codebook weights. For activity detection, peaks in the Hough voting space are taken into account and an initial event hypothesis is generated from the spatio-temporal information of the participating STIPs. For event recognition a verification Support Vector Machine is used. An extensive evaluation on a large-scale benchmark video surveillance dataset (VIRAT), as well as on a smaller benchmark dataset (MSR), shows that the proposed method is applicable to a wide range of continuous visual event recognition applications with extremely challenging conditions.
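A minimal sketch of the Hough voting stage: each STIP-based feature casts a weighted vote for the activity centre its codeword predicts, and detections are peaks in the accumulator. The offsets and weights below are hypothetical; the paper learns the weights with a max-margin solver.

```python
# Generalized Hough voting into a 3-D (x, y, t) accumulator.
import numpy as np

def vote(features, accumulator_shape):
    """features: list of (x, y, t, offset, weight), where offset is the
    codeword's displacement to the predicted activity centre."""
    acc = np.zeros(accumulator_shape)
    for x, y, t, (dx, dy, dt), w in features:
        cx, cy, ct = x + dx, y + dy, t + dt
        if (0 <= cx < acc.shape[0] and 0 <= cy < acc.shape[1]
                and 0 <= ct < acc.shape[2]):
            acc[cx, cy, ct] += w
    return acc

# Toy usage: two STIPs voting for the same centre (12, 10, 5).
feats = [(10, 10, 4, (2, 0, 1), 0.7), (14, 9, 6, (-2, 1, -1), 0.5)]
acc = vote(feats, (32, 32, 16))
print(np.unravel_index(acc.argmax(), acc.shape))  # (12, 10, 5)
```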

13.
Action recognition using 3D DAISY descriptor

14.
Action recognition based on sparse-coding spatio-temporal pyramid matching
For action recognition in complex scenes, an action recognition method based on sparse-coding spatio-temporal pyramid matching is proposed. Sparse coding is used to learn a more discriminative codebook and compute sparse representations of local cuboids; actions are then classified with max-pooling-based spatio-temporal pyramid matching. The method is evaluated on the two public datasets KTH and YouTube. Compared with K-means-based spatio-temporal pyramid matching, the recognition rate improves by roughly 2%-7%, and good recognition results are obtained on complex videos.
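A minimal sketch of the two stages named above: sparse codes for cuboid descriptors, then max pooling inside spatio-temporal pyramid cells. The random dictionary and the (1, 2) pyramid levels are assumptions for illustration; the paper learns its dictionary from training cuboids.

```python
# Sparse coding + max pooling over a spatio-temporal pyramid.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
k, d = 64, 32                        # codebook size, descriptor dimension
D = rng.standard_normal((k, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)

def pyramid_max_pool(descs, positions, levels=(1, 2)):
    """descs: (n, d) cuboid descriptors; positions: (n, 3) normalized
    (x, y, t) in [0, 1). Max-pools |codes| within each pyramid cell."""
    codes = np.abs(coder.transform(descs))
    pooled = []
    for g in levels:                 # g x g x g cells at this level
        cell = np.minimum((positions * g).astype(int), g - 1)
        for idx in range(g ** 3):
            i, j, t = idx // (g * g), (idx // g) % g, idx % g
            m = (cell == (i, j, t)).all(axis=1)
            pooled.append(codes[m].max(axis=0) if m.any() else np.zeros(k))
    return np.concatenate(pooled)

feat = pyramid_max_pool(rng.standard_normal((100, d)), rng.random((100, 3)))
print(feat.shape)  # (576,) = (1 + 8) cells * 64 atoms
```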

15.
This paper presents a novel approach for action recognition, localization and video matching based on a hierarchical codebook model of local spatio-temporal video volumes. Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a target video dataset. The method is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm codes a video as a compact set of spatio-temporal volumes, while considering their spatio-temporal compositions in order to account for spatial and temporal contextual information. This hierarchy is achieved by first constructing a codebook of spatio-temporal video volumes. Then a large contextual volume containing many spatio-temporal volumes (ensemble of volumes) is considered. These ensembles are used to construct a probabilistic model of video volumes and their spatio-temporal compositions. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to other approaches, especially in the case of a single training example and cross-dataset action recognition.

16.
Objective: Human action recognition is an important research topic in computer vision with broad application prospects. To address the limitations of both local and global spatio-temporal features in action recognition, a novel and effective mid-level spatio-temporal feature is proposed. Method: The feature describes the structured distribution of local features in the neighborhood of each spatio-temporal interest point, strengthening the interest point's discriminative power while avoiding a global description of the action, so it adapts flexibly to intra-class variation. Mutual information measures the relevance between mid-level features and action classes, and a video is recognized as the class with which it shares maximal mutual information. Results: Experiments show that the mid-level feature outperforms local-feature-based and other methods, reaching recognition accuracies of 96.3% on the KTH dataset and 98.0% on the Activities of Daily Living (ADL) dataset. Conclusion: By exploiting the spatio-temporal distribution of local features, the proposed mid-level feature markedly strengthens discriminative power and can effectively recognize a variety of complex human actions.
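A minimal sketch of the mutual-information relevance measure, using scikit-learn's estimator on a toy binary "feature present" variable against three class labels; the discretization is an assumption, since the abstract does not specify how the mutual information is estimated.

```python
# Mutual information between a feature-occurrence variable and labels.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, 200)                   # three action classes
present = (labels == 1) ^ (rng.random(200) < 0.2)  # noisily tracks class 1
print(mutual_info_score(labels, present))          # high MI => relevant feature
```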

17.
18.
Among traditional methods, improved dense trajectories (IDT) achieve the best action recognition results, but their optical flow computation is too slow. Introducing FlowNet2.0 for optical flow estimation in place of the Farneback method keeps performance unchanged while computing nearly 7 times faster. Using LIBLINEAR speeds up the SVM. Good results are obtained on the UCF-101, KTH, and Weizmann datasets, further optimizing the IDT method.
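For reference, the Farneback baseline that the abstract replaces can be run with OpenCV as below; the parameter values are common defaults used only for illustration, and FlowNet2.0 itself (which requires its own trained network weights) is not shown.

```python
# Dense optical flow between two frames with the Farneback method.
import cv2
import numpy as np

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (120, 160), dtype=np.uint8)
curr = np.roll(prev, 2, axis=1)     # synthetic 2-pixel horizontal motion

flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
print(flow.shape)                   # (120, 160, 2): per-pixel (dx, dy)
```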

19.
20.