Similar Articles
20 similar articles found.
1.
Recognizing human interactions is a more challenging task than recognizing single-person activities and has attracted much attention from the computer vision community. This paper proposes an innovative and effective way to recognize human interactions that combines the advantages of a global motion context (MC) feature with the spatio-temporal (S-T) correlation of local spatio-temporal interest point features. The MC feature is used to train a random forest, where a genetic algorithm (GA) is applied in the training phase to achieve a good compromise between reliability and efficiency. In addition, we propose an S-T correlation-based match, in which the MC's structure and the Needleman–Wunsch algorithm are used to compute the spatial and temporal correlation scores of two videos, respectively. Experiments on the UT-Interaction dataset show that our approaches outperform other prevalent machine learning methods, and that the combination of the GA search-based random forest and S-T correlation achieves state-of-the-art performance.
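The temporal-correlation step in the abstract above rests on the Needleman–Wunsch algorithm, a standard global-alignment dynamic program. A minimal sketch (the scoring parameters and sequences here are illustrative placeholders, not the paper's actual features):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences via dynamic programming.

    score[i][j] is the best score aligning the first i items of `a`
    with the first j items of `b`.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # align a[:i] against nothing
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + hit,   # (mis)match
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a
    return score[n][m]
```

Applied to per-frame codewords of two videos, the resulting score can serve as the temporal half of an S-T correlation measure.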

2.
3.
4.
Objective The dynamic variation of the human skeleton is essential for action recognition. Viewed through joint trajectories, the trajectories of the joints that discriminate between action classes carry the most important information. Across repeated performances of the same action, the trajectory of a given joint generally keeps a similar basic shape, but its concrete form is subject to a degree of distortion. Based on an analysis of these distortion factors, the common transformations of joint trajectories in human motion are modeled as spatio-temporal dual affine transformations. Method First, a unified expression describes the dual affine transformation as a pair of inner and outer transforms. Dual-affine differential invariants, derived from the differential relation between trajectory curves before and after the transformation, describe the local properties of joint trajectories. Exploiting the structural isomorphism between the differential invariants and the joint coordinates, a channel-augmentation method is proposed: the input data are extended along the channel dimension with the differential invariants before being fed into a neural network for training and evaluation, improving the network's generalization. Result Experiments on two large action recognition datasets, NTU (Nanyang Technological University) RGB+D (NTU 60) and NTU RGB+D 120 (NTU 120), compare the method against several recent methods and two baselines, with clear improvements under both evaluation protocols (cross-subject and cross-view recognition). Relative to spatio-temporal graph convolutional networks (ST-GCN) on raw data, cross-subject and cross-view accuracy improve by 1.9% and 3.0% on NTU 60, and cross-subject and cross-setup accuracy improve by 5.6% and 4.5% on NTU 120. Compared with data augmentation, the invariant-based channel augmentation gives clear gains under both protocols and raises the network's generalization more effectively. Conclusion The proposed invariant features and channel augmentation combine the strengths of hand-crafted features and deep learning in an intuitive, effective way, improving the accuracy of skeleton-based action recognition and the generalization of the network.

5.
Objective In action recognition, properly exploiting spatio-temporal modeling and inter-channel correlation is crucial for capturing rich motion information. Although graph convolutional networks have made steady progress in skeleton-based action recognition, previous attention mechanisms applied to them have brought no clear gain in classification. Given the importance of jointly handling spatio-temporal interaction and channel dependence, a multi-dimensional feature fusion attention mechanism (M2FA) is proposed. Method Unlike widely used frameworks such as the convolutional block attention module (CBAM) and the two-stream adaptive graph convolutional network (2s-AGCN), M2FA explicitly obtains integrated dependency information through a feature-fusion module embedded in the attention framework. For a given feature map, M2FA infers feature descriptors along the spatial, temporal, and channel dimensions using global average pooling. The feature map is then filtered with the fused multi-dimensional descriptors to refine adaptive features, through a global branch that compresses global dynamic information and a local branch that uses only pointwise convolution…

6.
Human action classification is fundamental technology for robots that have to interpret a human's intended actions and make appropriate responses, as they will have to do if they are to be integrated into our daily lives. Improved measurement of human motion, using an optical motion capture system or a depth sensor, allows robots to recognize human actions from superficial motion data, such as camera images containing human actions or positions of human bodies. But existing technology for motion recognition does not handle the contact force that always exists between the human and the environment that the human is acting upon. More specifically, humans perform feasible actions by controlling not only their posture but also the contact forces. Furthermore, these contact forces require appropriate muscle tensions throughout the body. These muscle tensions or activities are expected to be useful for robots observing human actions to estimate the human's somatosensory states and consequently understand the intended action. This paper proposes a novel approach to classifying human actions using only the activities of all the muscles in the human body. Continuous spatio-temporal data of the activity of an individual muscle is encoded into a discrete hidden Markov model (HMM), and the set of HMMs for all the muscles forms a classifier for the specific action. Our classifiers were tested on muscle activities estimated from captured human motions, electromyography data, and reaction forces. The results demonstrate their superiority over commonly used HMM-based classifiers.
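The HMM-based classifier described above scores an observation sequence under each action's model and picks the best. A minimal single-sequence sketch of that scoring step, using the forward algorithm (the model parameters below are toy values, not estimated muscle activities, and a real system would combine one such score per muscle):

```python
import math

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (no scaling; fine for short clips).
    pi: initial state probs, A: transition matrix, B: emission matrix."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [B[s][o] * sum(alpha[r] * A[r][s] for r in range(n))
                 for s in range(n)]
    return math.log(sum(alpha))

def classify(obs, models):
    """Pick the action whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda action: hmm_loglik(obs, *models[action]))
```

For example, with two single-state toy models whose emissions favor symbol 0 ("reach") or symbol 1 ("hold"), the sequence `[0, 0, 1]` is assigned to "reach".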

7.
石祥滨, 李怡颖, 刘芳, 代钦. 《计算机应用研究》, 2021, 38(4): 1235-1239, 1276
To address the problems that two-stream video action recognition ignores the relationships among feature channels and that features contain large amounts of redundant spatio-temporal information, an end-to-end two-stream spatio-temporal attention model, T-STAM, is proposed, making full use of the key spatio-temporal information in video. First, channel attention is introduced into the two-stream base networks: modeling the dependencies among feature channels recalibrates channel information and strengthens feature expressiveness. Second, a CNN-based temporal attention model is proposed that learns per-frame attention scores with few parameters, focusing on frames with pronounced motion. A multi-spatial attention model is also proposed that computes attention scores for each position in a frame from different perspectives, extracting several motion-salient regions; the spatio-temporal features are then fused to further strengthen the video representation. Finally, the fused features are fed into a classification network, and the outputs of the two streams are fused with different weights to obtain the recognition result. Experiments on the HMDB51 and UCF101 datasets show that T-STAM recognizes actions in video effectively.

8.
Temporal dependency is a very important cue for modeling human actions. However, approaches using latent topic models, e.g., probabilistic latent semantic analysis (pLSA), adopt the bag-of-words assumption, so word dependencies are usually ignored. In this work, we propose a new approach, structural pLSA (SpLSA), that models word orders explicitly by introducing latent variables. More specifically, we develop an action categorization approach that learns action representations as distributions over latent topics in an unsupervised way, where each action frame is characterized by a codebook representation of local shape context. The effectiveness of this approach is evaluated on both the WEIZMANN and MIT datasets. Results show that the proposed approach outperforms the standard pLSA. Additionally, our approach compares favorably with six existing models (GMM, logistic regression, HMM, SVM, CRF, and HCRF) given the same feature representation: it achieves higher categorization accuracy than the first five and is comparable to the state-of-the-art hidden conditional random field (HCRF) model using the same feature set.

9.
10.
Deep learning has performed well at human action recognition, but the appearance and motion information of people in video still needs to be exploited fully. To use both the spatial and the temporal information in video for recognizing human actions, a spatio-temporal two-stream action recognition model is proposed. The model first uses two convolutional neural networks to extract spatial and temporal features from a video clip, then fuses the two networks to extract mid-level spatio-temporal features, and finally feeds the mid-level features into a 3D convolutional neural network to recognize the action in the video. Action recognition experiments on the UCF101 and HMDB51 datasets show that the proposed two-stream 3D convolutional neural network model recognizes human actions in video effectively.

11.
Super-interest-point feature representation and recognition of human actions
A super-interest-point action feature descriptor is proposed for human action recognition. Interest-point features describe the local points that change markedly as the body moves, but their biggest flaw is that the discrete interest points lack temporal and spatial structural association. Based on the spatio-temporal distance between interest points, a breadth-first neighbor search aggregates nearby interest points into super interest points; as a whole, this structure reflects how the limbs change within a certain spatio-temporal range. Compared with existing action recognition algorithms based on local interest points, the proposed algorithm adds the overall spatio-temporal structural relations among interest points, improving the discriminability of the features. A two-layer classification scheme is applied to the super-interest-point features, and experiments show that the algorithm achieves a good recognition rate.
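The aggregation step above (breadth-first grouping of interest points by spatio-temporal distance) can be sketched as follows; the (x, y, t) tuples and the radius are illustrative, not the paper's actual detector output:

```python
from collections import deque

def cluster_interest_points(points, radius):
    """Group (x, y, t) interest points into 'super interest points':
    breadth-first search links any two points within `radius` of each other."""
    def near(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) <= radius ** 2

    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            linked = [j for j in unvisited if near(points[i], points[j])]
            for j in linked:          # absorb neighbors, then expand from them
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append([points[k] for k in members])
    return clusters
```

Because the search chains through neighbors, a line of points spaced just under the radius ends up in one cluster even though its endpoints are far apart.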

12.
Multi-level deep analysis of group behavior is an important open problem in action recognition. Building on deep neural network research, a hierarchical analysis model for group activity recognition is proposed. Transfer learning based on a modulation network achieves temporally consistent detection of the multiple bodies in a group; fused spatio-temporal feature learning recognizes individual actions of unconstrained duration within the group; and fusing the individual action classes in the scene with interaction-scene context achieves stable and effective recognition of the group activity. Extensive experiments on public datasets show that, compared with existing methods, the model performs well on group activity analysis and recognition.

13.
A method is proposed for recognizing human actions from the electrostatic signals those actions generate. Based on an analysis of the charging characteristics of the human body, an electrostatic signal detection system was designed to record the electrostatic induction signals of five typical actions (walking, stepping in place, sitting down, picking up an object, and waving). Feature parameters were extracted from the signals of the five actions, analyzed for significant differences, and optimized for classification. On the Weka platform, three classification algorithms (support vector machine, C4.5 decision tree, and random forest) were evaluated on 250 collected samples with 10-fold cross-validation; random forest performed best, reaching 99.6% accuracy. The results show that the proposed single-person, electrostatic-signal-based method can effectively recognize typical human actions.
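The evaluation protocol above (k-fold cross-validation over labeled feature vectors) can be sketched in plain Python. The nearest-centroid stand-in and the toy two-class data below are illustrative, not the paper's Weka classifiers or electrostatic features, and 5 folds over 10 samples stand in for the paper's 10 folds over 250:

```python
import random

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(0).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean training vector is closest."""
    by_class = {}
    for xi, yi in zip(train_X, train_y):
        by_class.setdefault(yi, []).append(xi)
    def sqdist(vectors):
        mean = [sum(col) / len(col) for col in zip(*vectors)]
        return sum((a - b) ** 2 for a, b in zip(mean, x))
    return min(by_class, key=lambda label: sqdist(by_class[label]))

# Toy, well-separated "feature vectors" for two action classes.
X = [(0.0, 0.1), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (0.0, 0.2),
     (5.0, 5.1), (5.1, 5.2), (5.2, 5.1), (5.1, 5.0), (5.0, 5.2)]
y = ["walk"] * 5 + ["wave"] * 5

correct = 0
for train, test in k_fold_indices(len(X), 5):
    tX = [X[j] for j in train]
    ty = [y[j] for j in train]
    correct += sum(nearest_centroid(tX, ty, X[i]) == y[i] for i in test)
accuracy = correct / len(X)
```

Every sample is tested exactly once across the folds, so `accuracy` is an unbiased estimate of held-out performance.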

14.
We propose a novel approach to model spatio-temporal distribution of local features for action recognition in videos. The proposed approach is based on the Lie Algebrized Gaussians (LAG) which is a feature aggregation approach and yields high-dimensional video signature. In the framework of LAG, local features extracted from a video are aggregated to train a video-specific Gaussian Mixture Model (GMM). Then the video-specific GMM is encoded as a vector based on Lie group theory and this step is also referred to as GMM vectorization. As the video-specific GMM gives a soft partition of the feature space, for each cell of the feature space (i.e. each Gaussian component), we use a GMM to model the spatio-temporal locations of the local features assigned to the Gaussian component. The location GMMs are encoded as vectors just like the local feature GMM. We term those vectors of location GMMs spatio-temporal LAG (STLAG). In addition, although the LAG and the popular Fisher Vector (FV) are derived from distinct theory perspectives, we find that they are closely related. Hence the power and ℓ2 normalization proposed for the FV are also beneficial to the LAG. Experimental results show that STLAG is very effective to model spatio-temporal layout compared with other techniques such as spatio-temporal pyramid and feature augmentation. Using the state-of-the-art dense trajectory features, our approach achieves state-of-the-art performance on two challenging datasets: Hollywood2 and HMDB51.

15.
This paper proposes a new exemplar-based method for real-time human motion recognition using Motion Capture (MoCap) data. We have formalized streamed recognizable actions, coming from an online MoCap engine, into a motion graph that is similar to an animation motion graph. This graph is used as an automaton to recognize known actions as well as to add new ones. We have defined and used a spatio-temporal metric for similarity measurements to achieve more accurate feedback on classification. The proposed method has the advantage of being linear and incremental, making the recognition process very fast and the addition of a new action straightforward. Furthermore, actions can be recognized with a score even before they are fully completed. Thanks to the use of a skeleton-centric coordinate system, our recognition method is view-invariant. We have successfully tested our action recognition method on both synthetic and real data. We have also compared our results with four state-of-the-art methods using three well-known datasets for human action recognition. In particular, the comparisons have clearly shown the advantage of our method through better recognition rates.

16.
Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.

17.
18.
An efficient sparse modeling pipeline for classifying human actions from video is developed here. Spatio-temporal features that characterize local changes in the image are first extracted. This is followed by the learning of a class-structured dictionary encoding the individual actions of interest. Classification is then based on reconstruction, where the label assigned to each video comes from the optimal sparse linear combination of the learned basis vectors (action primitives) representing the actions. A low computational cost deep-layer model learning the inter-class correlations of the data is added for increasing discriminative power. In spite of its simplicity and low computational cost, the method outperforms previously reported results for virtually all standard datasets.
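Classification by reconstruction, as described above, assigns a video to the class whose dictionary reconstructs its feature vector with the smallest residual. A minimal sketch with plain least squares standing in for the sparse solver (the class names and dictionaries below are illustrative):

```python
import numpy as np

def classify_by_reconstruction(dictionaries, x):
    """Assign x to the class whose basis reconstructs it with the smallest
    residual. Least squares stands in for the sparse coding step here."""
    best, best_err = None, float("inf")
    for label, D in dictionaries.items():   # D: columns are basis vectors
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        err = np.linalg.norm(x - D @ coef)  # reconstruction residual
        if err < best_err:
            best, best_err = label, err
    return best
```

A vector lying in one class's span gets a near-zero residual there and a large residual under the other dictionaries, which is what drives the decision.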

19.
A sparse representation of two-person interactions is proposed, fusing trajectory features that capture global variation with spatio-temporal features that highlight regional motion. First, a bag-of-words model produces a sparse representation of the trajectory features. Then the extracted spatio-temporal features are decomposed by a three-level spatio-temporal pyramid into multi-level features, which are sparse-coded and fused by multi-scale max pooling into a local sparse feature. Finally, the two kinds of sparse features are weighted and concatenated to form the sparse representation of the interaction. A recognition algorithm based on latent-dynamic conditional random fields is used to validate the extracted sparse representation, and experiments demonstrate its effectiveness.
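The three-level pyramid plus multi-scale max pooling in the abstract above can be sketched as follows; the frame codes are illustrative 2-D tuples rather than real codebook activations:

```python
def pyramid_max_pool(codes, levels=3):
    """Concatenate per-segment element-wise maxima over a temporal pyramid:
    level l splits the frame-code sequence into 2**l equal segments."""
    pooled, n = [], len(codes)
    for level in range(levels):
        segs = 2 ** level
        for s in range(segs):
            lo, hi = s * n // segs, (s + 1) * n // segs
            segment = codes[lo:hi] or codes[lo:lo + 1]  # guard empty slice
            pooled.extend(max(col) for col in zip(*segment))
    return pooled
```

The pooled vector has one copy of the code dimension per pyramid cell (1 + 2 + 4 cells for three levels), so coarse global maxima and finer per-segment maxima sit side by side in the final representation.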

20.

The main principle of diagnostic pathology is the reliable interpretation of individual cells in the context of the tissue architecture. In particular, confident examination of bone marrow specimens depends on valid classification of myeloid cells. In this work, we propose a novel rotation-invariant learning scheme for multi-class echo state networks (ESNs), which achieves very high performance in automated bone marrow cell classification. Based on representing static images as temporal sequences of rotations, we show how ESNs robustly recognize cells of arbitrary rotations by taking advantage of their short-term memory capacity. The performance of our approach is compared to a random forest classifier that learns rotation-invariance in a conventional way by exhaustively training on multiple rotations of individual samples. The methods were evaluated on a human bone marrow image database consisting of granulopoietic and erythropoietic cells in different maturation stages. Our ESN approach to cell classification does not rely on segmentation of cells or manual feature extraction and can therefore be applied directly to image data.

