Similar Documents
20 similar documents retrieved
1.
2.
Deep learning has achieved good results in human action recognition, but current methods still do not fully exploit the appearance and motion information of the people in a video. To use both the spatial and the temporal information in video for recognizing actions, a spatio-temporal two-stream model for video action recognition is proposed. The model first uses two convolutional neural networks to extract the spatial and temporal features of a video clip separately, then fuses the two networks to extract mid-level spatio-temporal features, and finally feeds those mid-level features into a 3D convolutional neural network to recognize the action in the video. Action recognition experiments on the UCF101 and HMDB51 datasets show that the proposed two-stream 3D convolutional neural network model recognizes human actions in video effectively.
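A minimal numpy sketch of the two-stream idea described above, reduced to score-level fusion of a spatial (appearance) and a temporal (motion) stream; the paper's mid-level feature fusion and 3D CNN are not reproduced, and the logits, class count, and fusion weight here are illustrative only:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_streams(spatial_logits, temporal_logits, w_spatial=0.5):
    """Weighted score-level fusion of the two streams' class probabilities."""
    return (w_spatial * softmax(spatial_logits)
            + (1.0 - w_spatial) * softmax(temporal_logits))

# Toy logits for 3 action classes from each stream.
spatial = np.array([2.0, 0.5, 0.1])   # appearance stream weakly favors class 0
temporal = np.array([0.2, 2.5, 0.1])  # motion stream strongly favors class 1
fused = fuse_two_streams(spatial, temporal)
predicted_class = int(np.argmax(fused))
```

Because the fused scores remain a probability distribution, the two streams can veto each other: here the confident motion stream overrides the weaker appearance evidence.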

3.
This paper proposes a new exemplar-based method for real-time human motion recognition using Motion Capture (MoCap) data. We have formalized streamed recognizable actions, coming from an online MoCap engine, into a motion graph that is similar to an animation motion graph. This graph is used as an automaton to recognize known actions as well as to add new ones. We have defined and used a spatio-temporal metric for similarity measurements to achieve more accurate feedback on classification. The proposed method has the advantage of being linear and incremental, making the recognition process very fast and the addition of a new action straightforward. Furthermore, actions can be recognized with a score even before they are fully completed. Thanks to the use of a skeleton-centric coordinate system, our recognition method is view-invariant. We have successfully tested our action recognition method on both synthetic and real data. We have also compared our results with four state-of-the-art methods using three well-known datasets for human action recognition. In particular, the comparisons have clearly shown the advantage of our method through better recognition rates.

4.
A human action recognition method is proposed that reduces the dimensionality of high-dimensional human silhouette data by tensor subspace learning. Given the silhouette image sequence of an action, the method first projects the high-dimensional silhouette images into a low-dimensional subspace with tensor subspace learning to describe the spatio-temporal characteristics of the motion, while preserving as much of the spatial geometric information among the silhouette pixels as possible. The Hausdorff distance then measures the similarity between actions, and actions are classified under a nearest-neighbor framework. Two experiments, on action recognition and on robustness, were designed to validate the algorithm. The results show that it not only recognizes human actions effectively but is also fairly robust.
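The classification step described above can be sketched in a few lines of numpy: symmetric Hausdorff distance between two point sets (here standing in for low-dimensional projections of silhouette sequences) followed by nearest-neighbor labeling. The toy 2D "embeddings" below are illustrative, not the paper's tensor features:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (rows = points)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def nearest_neighbor_label(query, gallery):
    """gallery: list of (point_set, label); returns the label of the closest set."""
    dists = [hausdorff(query, g) for g, _ in gallery]
    return gallery[int(np.argmin(dists))][1]

# Toy low-dimensional embeddings of two gallery action sequences and a query.
walk = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]])
wave = np.array([[5.0, 5.0], [5.1, 4.9], [5.2, 5.1]])
query = np.array([[0.05, 0.02], [0.15, 0.12]])
label = nearest_neighbor_label(query, [(walk, "walk"), (wave, "wave")])
```

The Hausdorff distance compares whole sets rather than aligned frames, which is why it tolerates sequences of different lengths.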

5.
Objective: Human action recognition has extremely broad applications in video surveillance, ambient assisted living, human-computer interaction, and intelligent driving. Occlusion of target objects, background shadows, illumination changes, viewpoint changes, multi-scale variation, and changes in clothing and appearance all make video processing and analysis difficult. This work therefore constructs a tensor-based linear dynamical model from forward and reversed time series, estimates its parameters as the action-sequence descriptor, and builds a more complete observation matrix. Method: Human joints are first extracted from depth images to build forward and reversed skeleton sequences in tensor form. A tensor-based linear dynamical system and Tucker decomposition then learn the parameter tuple (A_F, A_I, C), where C captures the spatial information of the skeleton and A_F and A_I describe the dynamics of the forward and reversed time series respectively. An observation matrix is built from this tuple, so an action is represented by the subspace of its observation matrix, i.e., a point on a Grassmann manifold; recognition is completed by dictionary learning and sparse coding on that manifold. Results: On the MSR-Action3D dataset the algorithm outperforms the Eigenjoints algorithm by 13.55%, the local tangent bundle SVM (LTBSVM) by 2.79%, and the tensor-based linear dynamical system (tLDS) by 1%. On the UT-Kinect dataset its recognition rate is 5.8% higher than LTBSVM and 1.3% higher than tLDS. Conclusion: Extensive evaluation verifies that the tLDS model built from forward and reversed time series addresses the problems above well and improves the human action recognition rate.

6.
An efficient human action recognition method is proposed. Frame differencing converts the three projection views of a depth sequence into depth motion silhouette sequences (DMOS); a spatio-temporal pyramid then subdivides the DMOS along the temporal and spatial dimensions, the histograms of oriented gradients (HOG) of the resulting spatial cells are fused into a feature vector, and a linear SVM performs classification. The recognition rate and processing speed were evaluated on the MSR Action3D dataset under different spatio-temporal pyramid parameters; the results show that the method achieves a higher recognition rate than comparable algorithms.
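The frame-differencing step described above can be sketched with numpy: absolute differences between consecutive depth frames are thresholded and accumulated into a single motion map per view. The threshold and toy frame sizes are illustrative, and the HOG/SVM stages are not reproduced:

```python
import numpy as np

def depth_motion_map(frames, threshold=0.05):
    """Accumulate thresholded absolute frame differences into one motion map.
    frames: (T, H, W) array of depth projections from a single view."""
    diffs = np.abs(np.diff(frames, axis=0))          # (T-1, H, W)
    return (diffs > threshold).astype(float).sum(axis=0)

# Toy sequence: the 'pixel' at (1, 1) changes every frame, the rest are static.
frames = np.zeros((4, 3, 3))
for t in range(4):
    frames[t, 1, 1] = 0.2 * t
dmm = depth_motion_map(frames)
```

Static background cancels out in the differences, so the accumulated map highlights exactly the regions swept by the moving body.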

7.
8.
This paper presents a novel approach for action recognition, localization and video matching based on a hierarchical codebook model of local spatio-temporal video volumes. Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a target video dataset. The method is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm codes a video as a compact set of spatio-temporal volumes, while considering their spatio-temporal compositions in order to account for spatial and temporal contextual information. This hierarchy is achieved by first constructing a codebook of spatio-temporal video volumes. Then a large contextual volume containing many spatio-temporal volumes (ensemble of volumes) is considered. These ensembles are used to construct a probabilistic model of video volumes and their spatio-temporal compositions. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to other approaches, especially in the case of a single training example and cross-dataset action recognition.

9.
Recognizing human actions in unconstrained videos remains a major challenge in computer vision because of degraded feature-classification accuracy, so minimizing classification errors is essential for improving performance. In this work, we propose a hybrid CNN-GWO approach for the recognition of human actions from unconstrained videos. The weight initializations of the deep Convolutional Neural Network (CNN) classifiers depend on the solutions generated by the Grey Wolf Optimization (GWO) algorithm, which in turn minimizes the classification errors. Action-bank and local spatio-temporal features are generated for each video and fed into the CNN classifiers, which are trained by a gradient descent algorithm to reach a local minimum during the fitness computation of the GWO search agents. The global-search capability of GWO and the local-search capability of gradient descent are combined to identify a solution close to the global optimum. Finally, classification performance is further enhanced by fusing the classifier evidences produced by the GWO algorithm. The efficiency of the proposed classification framework is evaluated on four action recognition datasets, namely HMDB51, UCF50, Olympic Sports and Virat Release 2.0; the experimental validation shows strong results, with 99.9% recognition accuracy.
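A toy numpy sketch of the GWO position update on a stand-in fitness function; the CNN-weight encoding, the gradient-descent inner loop, and all constants below are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy fitness to minimize (stand-in for the CNN classification error)."""
    return float(np.sum(x ** 2))

def gwo_step(wolves, a):
    """One GWO update: every wolf moves toward the alpha/beta/delta leaders."""
    fitness = np.array([sphere(w) for w in wolves])
    alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
    new = np.empty_like(wolves)
    for i, w in enumerate(wolves):
        moved = []
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(w.shape), rng.random(w.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            moved.append(leader - A * np.abs(C * leader - w))
        new[i] = np.mean(moved, axis=0)
    return new

wolves = rng.uniform(-5, 5, size=(8, 3))
initial_best = min(sphere(w) for w in wolves)
best = initial_best
for step in range(30):
    a = 2 - 2 * step / 30          # exploration parameter decays from 2 to 0
    wolves = gwo_step(wolves, a)
    best = min(best, min(sphere(w) for w in wolves))
```

As `a` decays, the update shifts from exploration (large random steps around the leaders) to exploitation (collapse onto the leaders), which is the global/local balance the abstract leans on.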

10.
Super-interest-point feature representation and recognition of human actions (cited 1 time: 0 self-citations, 1 by others)
An action feature description based on super interest points is proposed for human action recognition. Interest-point features describe the local points that change significantly during an action, but their biggest drawback is that the discrete interest points lack temporal and spatial structural relations. Based on the spatio-temporal distance between interest points, a breadth-first neighbor search aggregates nearby interest points into super interest points; as a whole, such a structure reflects how a limb moves within a certain spatio-temporal extent. Compared with existing recognition algorithms based on local interest points, this algorithm adds the overall spatio-temporal structure among interest points and improves feature discriminability. A two-layer classification scheme classifies the super-interest-point features, and the experimental results show a good recognition rate.
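The breadth-first grouping described above can be sketched with numpy: points whose joint space-time distance to a group member falls below a radius are merged into one super interest point. The distance measure and radius below are illustrative, not the paper's exact definitions:

```python
import numpy as np
from collections import deque

def group_interest_points(points, radius):
    """Breadth-first grouping: a point within `radius` of any member of a
    group joins that group; each group is one 'super interest point'."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] == -1 and np.linalg.norm(points[i] - points[j]) <= radius:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels

# Toy (x, y, t) interest points: two clusters far apart in space-time.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.5, 0.5, 0.5],
                [10.0, 10.0, 9.0], [10.5, 10.0, 9.5]])
labels = group_interest_points(pts, radius=2.0)
```

Because membership propagates through the queue, two points can end up in the same group without being direct neighbors, which is what lets a super interest point cover an extended limb movement.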

11.
Objective: In action recognition, properly exploiting spatio-temporal modeling and inter-channel correlation is crucial for capturing rich motion information. Although graph convolutional networks have made steady progress on skeleton-based action recognition, previous attention mechanisms applied to them have not clearly improved classification. Considering the importance of both spatio-temporal interaction and channel dependency, a multi-dimensional feature fusion attention mechanism (M2FA) is proposed. Method: Unlike widely used designs such as the convolutional block attention module (CBAM) or the two-stream adaptive graph convolutional network (2s-AGCN), M2FA explicitly obtains integrated dependency information through a feature-fusion module embedded in the attention framework. For a given feature map, M2FA infers feature descriptors along the spatial, temporal, and channel dimensions using global average pooling. The feature map is filtered by the fused multi-dimensional descriptors to refine adaptive features, and multi-scale dynamic information is obtained by interleaving a global-feature branch that compresses global dynamics with a local-feature branch that uses only pointwise convolution layers. Results: Experiments on the skeleton action recognition datasets NTU-RGBD and Kinetics-Skeleton compare the recognition accuracy of M2FA with its baseline 2s-AGCN and recently proposed graph convolutional models. On the Kinetics-Skeleton validation set, M2FA improves classification accuracy over the baseline 2s-AGCN by 1.8%; on the two NTU-RGBD benchmarks it improves over 2s-AGCN by 1.6% and 1.0% respectively. Ablation experiments verify the effectiveness of the multi-dimensional fusion mechanism, and the results show that M2FA improves the classification performance of graph convolutional skeleton action recognition. Conclusion: Compared with the baseline 2s-AGCN and current mainstream graph convolutional models, the multi-dimensional feature fusion attention mechanism achieves the highest recognition accuracy and can be integrated into skeleton-based architectures for end-to-end training, yielding more accurate classification.
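The pooling-and-reweighting core described above can be sketched with numpy for a (channels, frames, joints) skeleton feature map. The fusion here is a plain broadcast sum followed by a sigmoid; M2FA's learned fusion module and global/local branches are not reproduced, and the map sizes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_dim_attention(x):
    """Re-weight a (C, T, V) feature map with per-dimension descriptors
    obtained by global average pooling along the other two dimensions."""
    c_desc = x.mean(axis=(1, 2))          # (C,) channel descriptor
    t_desc = x.mean(axis=(0, 2))          # (T,) temporal descriptor
    v_desc = x.mean(axis=(0, 1))          # (V,) spatial (joint) descriptor
    weights = sigmoid(c_desc[:, None, None]
                      + t_desc[None, :, None]
                      + v_desc[None, None, :])   # broadcast to (C, T, V)
    return x * weights

x = np.random.default_rng(1).normal(size=(4, 6, 5))  # channels, frames, joints
y = multi_dim_attention(x)
```

Each location's weight combines evidence from its channel, its frame, and its joint, so the three dimensions modulate each other instead of being attended to independently.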

12.
To address the complex feature extraction and low recognition rates of existing action recognition algorithms, a network structure is proposed that combines batch normalization with the GoogLeNet model, bringing the batch-normalization idea from image classification into action recognition as a training improvement: the network inputs of the video action training samples are normalized per mini-batch. RGB images serve as the input of the spatial network and optical flow fields as the input of the temporal network; the two networks are then fused to obtain the final action recognition result. Experiments on UCF101 and HMDB51 achieve accuracies of 93.50% and 68.32% respectively, showing that the improved architecture attains high recognition accuracy on video human action recognition.
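The mini-batch normalization applied above is standard batch normalization; a minimal numpy sketch (training-time statistics only, with fixed rather than learned gamma/beta, and toy data):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch to zero mean and unit
    variance, then apply the scale/shift (gamma, beta) of batch normalization."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# A mini-batch of 4 samples with 2 features on very different scales.
batch = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
normed = batch_norm(batch)
```

Normalizing per mini-batch keeps the input distribution of each layer stable across training steps, which is what lets the improved network train faster.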

13.
Traditional human action recognition algorithms often focus on one specific class of behavior and do not generalize. To address this, a recognition algorithm is proposed that fuses high-level features through local-evidence RBF self-similarity. First, borrowing the notion of time-varying generalized self-similarity, local features are extracted from the optical-flow field at spatio-temporal interest points to build a self-similarity-matrix description of local human action features. Second, after individual actions are recognized with an SVM, the proposed evidence-theoretic RBF (Radial Basis Function) high-level feature fusion optimizes the classification structure and thereby improves classification accuracy. Simulation experiments show that the proposed scheme clearly improves both the efficiency and the recognition accuracy of human action recognition.
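The self-similarity-matrix description mentioned above can be sketched directly: the matrix of pairwise distances between per-frame descriptors. The toy sinusoidal "descriptors" below stand in for the paper's optical-flow features:

```python
import numpy as np

def self_similarity_matrix(features):
    """Pairwise-distance matrix between per-frame descriptors; its banded,
    repeating patterns encode the temporal structure of the motion."""
    d = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(d, axis=-1)

# Toy per-frame descriptors for a periodic motion (two full cycles).
t = np.linspace(0, 4 * np.pi, 8)
frames = np.stack([np.sin(t), np.cos(t)], axis=1)
ssm = self_similarity_matrix(frames)
```

A periodic action produces near-zero entries off the diagonal (frame 0 vs. frame 7 here), and that repetition pattern, rather than the raw features, is what the descriptor exposes to the classifier.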

14.
Wu Feng, Wang Ying. 《计算机应用》 (Journal of Computer Applications), 2017, 37(8): 2240-2243
The information-gain-based visual-dictionary construction of the bag-of-words (BoW) model ignores the effect of term frequency on action recognition. To improve recognition accuracy, a dictionary-construction method based on improved information gain is proposed. First, spatio-temporal interest points are extracted from human action videos with the 3D Harris detector and clustered with K-means into an initial visual dictionary. Information gain is then improved by introducing intra-class term-frequency concentration and inter-class term-frequency dispersion; the improved information gain of each word in the initial dictionary is computed, and the words with large values form the new dictionary. Finally, a support vector machine (SVM) performs human action recognition with the dictionary built from improved information gain. Validation on the KTH and Weizmann human action databases shows that, compared with conventional information gain, the dictionaries built with improved information gain raise recognition accuracy by 1.67% and 3.45% respectively, indicating that the proposed method selects visual words with strong discriminative power and improves action recognition accuracy.
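A numpy sketch of the classic information gain of a binary visual word with respect to class labels, the quantity the paper starts from; the paper's additional intra-class concentration and inter-class dispersion terms are not reproduced here, and the toy word/label data are illustrative:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(word_presence, labels):
    """Classic IG of a binary visual-word feature w.r.t. class labels:
    H(labels) minus the presence-weighted conditional entropies."""
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    ig = entropy(counts / counts.sum())
    for present in (True, False):
        mask = word_presence == present
        if mask.sum() == 0:
            continue
        _, sub = np.unique(labels[mask], return_counts=True)
        ig -= mask.mean() * entropy(sub / sub.sum())
    return ig

# Word A appears only in 'wave' clips; word B appears in both classes equally.
labels = ["wave", "wave", "walk", "walk"]
word_a = np.array([True, True, False, False])
word_b = np.array([True, False, True, False])
ig_a = information_gain(word_a, labels)
ig_b = information_gain(word_b, labels)
```

Word A perfectly predicts the class (IG = 1 bit) while word B carries no class information (IG = 0), which is exactly the ranking the dictionary-selection step exploits.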

15.
Jiang Guanghao, Jiang Xiaoyan, Fang Zhijun, Chen Shanshan. 《Applied Intelligence》, 2021, 51(10): 7043-7057

Due to illumination changes, varying postures, and occlusion, accurately recognizing actions in videos is still a challenging task. A three-dimensional convolutional neural network (3D CNN), which can simultaneously extract spatio-temporal features from sequences, is one of the mainstream models for action recognition. However, most of the existing 3D CNN models ignore the importance of individual frames and spatial regions when recognizing actions. To address this problem, we propose an efficient attention module (EAM) that contains two sub-modules, that is, a spatial efficient attention module (EAM-S) and a temporal efficient attention module (EAM-T). Specifically, without dimensionality reduction, EAM-S concentrates on mining category-based correlation by local cross-channel interaction and assigns high weights to important image regions, while EAM-T estimates the importance score of different frames by cross-frame interaction between each frame and its neighbors. The proposed EAM module is lightweight yet effective, and it can be easily embedded into 3D CNN-based action recognition models. Extensive experiments on the challenging HMDB-51 and UCF-101 datasets showed that our proposed module achieves state-of-the-art performance and can significantly improve the recognition accuracy of 3D CNN-based action recognition methods.


16.
To address the slow training and limited recognition accuracy of existing action recognition algorithms, an action recognition method based on sparse-coded local spatio-temporal descriptors is proposed. Surface normals are first extracted from the depth images, and an adaptive spatio-temporal pyramid based on motion energy partitions the action frames into blocks; the normals are then aggregated locally to obtain salient local spatio-temporal descriptors. Sparse coding of these descriptors yields a set of dictionary vectors that reconstruct the sample data, and finally a simplified particle swarm optimization (sPSO) tunes an SVM classifier to find the classification model best suited to the data. The experiments reach recognition rates of 93.80% on the public MSRAction3D dataset and 95.83% on MSRGesture3D, with clearly faster training than traditional methods, demonstrating the effectiveness and robustness of the approach.

17.
Conventional convolutional neural networks have a single spatio-temporal receptive-field scale and struggle to extract the varied spatio-temporal information in video. Exploiting the property of the (2+1)D model that it decouples temporal from spatial information to a certain extent, a (2+1)D multi-spatio-temporal-information fusion convolutional residual network is proposed for human action recognition. The model combines a primary 3×3 spatial receptive field and an auxiliary 1×1 spatial receptive field with three different temporal receptive fields to build six spatio-temporal receptive-field scales. The proposed fusion model captures spatio-temporal information at several scales simultaneously and extracts richer human action features, so it recognizes actions of different durations and amplitudes more effectively. A video temporal-augmentation method is also proposed that enlarges the video dataset in both the spatial and the temporal dimension, enriching the training samples. On the public human action datasets UCF101 and HMDB51, the sub-video recognition rate of the proposed method exceeds or approaches that of the latest video action recognition methods.
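The (2+1)D decoupling mentioned above can be illustrated by parameter counting: a t×k×k 3D convolution is replaced by a 1×k×k spatial convolution into `mid` channels followed by a t×1×1 temporal convolution. The mid-channel formula below is the parameter-matching choice from the R(2+1)D literature, which this model resembles; the layer sizes are illustrative, not the paper's:

```python
def conv3d_params(c_in, c_out, t, k):
    """Weight count of a full t x k x k 3D convolution."""
    return c_in * c_out * t * k * k

def conv2plus1d_params(c_in, c_out, t, k, mid):
    """(2+1)D: a 1 x k x k spatial conv into `mid` channels, then a
    t x 1 x 1 temporal conv, decoupling space from time."""
    return c_in * mid * k * k + mid * c_out * t

c_in, c_out, t, k = 64, 64, 3, 3
# Choose `mid` so the factorized block matches the 3D conv's budget.
mid = (t * k * k * c_in * c_out) // (k * k * c_in + t * c_out)
full = conv3d_params(c_in, c_out, t, k)
factored = conv2plus1d_params(c_in, c_out, t, k, mid)
```

At equal parameter count the factorized form inserts an extra nonlinearity between the spatial and temporal stages, and its two kernel shapes can be varied independently, which is what makes the six receptive-field combinations above cheap to build.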

18.
This paper proposes a novel method based on Spectral Regression (SR) for efficient scene recognition. First, a new SR approach, called Extended Spectral Regression (ESR), is proposed to perform manifold learning on a huge number of data samples. Then, an efficient Bag-of-Words (BOW) based method is developed which employs ESR to encapsulate local visual features with their semantic, spatial, scale, and orientation information for scene recognition. In many applications, such as image classification and multimedia analysis, a training set contains a huge number of low-level feature samples, which prohibits the direct application of SR to perform manifold learning on such a dataset. In ESR, we first group the samples into tiny clusters, and then devise an approach to reduce the size of the similarity matrix for graph learning. In this way, subspace learning on the graph Laplacian of a vast dataset becomes computationally feasible on a personal computer. In the ESR-based scene recognition, we first propose an enhanced low-level feature representation which combines the scale, orientation, spatial position, and local appearance of a local feature. Then, ESR is applied to embed the enhanced low-level image features. The ESR-based feature embedding not only generates a low-dimensional feature representation but also integrates various aspects of the low-level features into the compact representation. The bag-of-words is then generated from the embedded features for image classification. Comparative experiments on open benchmark datasets for scene recognition demonstrate that the proposed method outperforms baseline approaches and is suitable for real-time applications on mobile platforms, e.g. tablets and smartphones.
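The graph-Laplacian subspace learning that SR builds on can be sketched with numpy: embed graph nodes using the eigenvectors of the Laplacian with the smallest non-trivial eigenvalues (SR then fits this embedding with a regression step, which is omitted here). The toy graph below, two triangles joined by a weak edge, is illustrative:

```python
import numpy as np

def laplacian_embedding(W, dim):
    """Embed nodes of a weighted graph with the eigenvectors of L = D - W
    for the smallest non-trivial eigenvalues (np.linalg.eigh returns
    eigenvalues in ascending order)."""
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + dim]      # skip the trivial constant eigenvector

# Two triangles (nodes 0-2 and 3-5) connected by one weak edge of weight 0.1.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0.1, 0, 0],
              [0, 0, 0.1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
emb = laplacian_embedding(W, dim=1)
```

The one-dimensional embedding (the Fiedler vector) assigns opposite signs to the two triangles, i.e. the manifold structure of the data survives the drastic dimensionality reduction.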

19.
20.
A key problem in human action recognition is how to represent high-dimensional human motion and how to build an accurate, stable classification model. An effective recognition algorithm based on hybrid features is proposed: it fuses polar-coordinate features of important body joints, derived from appearance structure, with optical-flow-based motion features, capturing the motion information in a video sequence more effectively and improving recognition responsiveness. A frame-based selective-ensemble rotation forest classification model (SERF) is also proposed, which effectively incorporates a selective-ensemble strategy into the choice of rotation-forest base classifiers, thereby increasing the diversity among the base classifiers. Experiments show that the SERF model attains high classification accuracy and strong robustness.
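The appearance-structure half of the hybrid feature above can be sketched as polar coordinates of 2D joint positions relative to a body center; the abstract does not specify the exact descriptor, so the center choice and joint layout below are generic assumptions:

```python
import numpy as np

def joint_polar_features(joints, center):
    """Radius and angle of each 2D joint position relative to a body
    center (e.g. the hip): a compact appearance-structure descriptor."""
    rel = joints - center
    r = np.linalg.norm(rel, axis=1)
    theta = np.arctan2(rel[:, 1], rel[:, 0])
    return r, theta

# Toy joints: right hand, left hand, and a foot, around the origin as center.
joints = np.array([[1.0, 1.0], [-1.0, 1.0], [0.0, -2.0]])
center = np.zeros(2)
r, theta = joint_polar_features(joints, center)
```

Expressing joints as (radius, angle) pairs relative to the body center makes the descriptor invariant to where the person stands in the frame, leaving the optical-flow half to supply the motion information.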


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号