Human skeleton-based action recognition algorithm based on spatiotemporal attention graph convolutional network model
Cite this article: LI Yangzhi, YUAN Jiazheng, LIU Hongzhe. Human skeleton-based action recognition algorithm based on spatiotemporal attention graph convolutional network model[J]. Journal of Computer Applications, 2021, 41(7): 1915-1921.
Authors: LI Yangzhi  YUAN Jiazheng  LIU Hongzhe
Affiliation: 1. Beijing Key Laboratory of Information Service Engineering (Beijing Union University), Beijing 100101, China; 2. Department of Scientific Research and Foreign Affairs, Beijing Open University, Beijing 100081, China
Funding: National Natural Science Foundation of China (61871028, 61871039, 61906017, 61802019); Leading Talent Program of Beijing Union University (BPHR2019AZ01); Beijing Municipal Education Commission Projects (KM202111417001, KM201911417001); Graduate Research and Innovation Program of Beijing Union University (YZ2020K001).
Abstract: Aiming at the problem that existing human skeleton-based action recognition algorithms cannot fully exploit the spatiotemporal characteristics of motion, a human skeleton-based action recognition algorithm based on a Spatiotemporal Attention Graph Convolutional Network (STA-GCN) model was proposed. The model contains a spatial attention mechanism and a temporal attention mechanism. The spatial attention mechanism, on the one hand, uses the instantaneous motion information in optical-flow features to locate spatial regions with salient motion; on the other hand, it introduces global average pooling and an auxiliary classification loss during training so that the model can also attend to discriminative non-motion regions. The temporal attention mechanism automatically mines discriminative temporal segments from long, complex videos. The two mechanisms were fused into a unified Graph Convolutional Network (GCN) framework, enabling end-to-end training. Comparative experimental results on two public datasets, Kinetics and NTU RGB+D, show that the algorithm based on the STA-GCN model is highly robust and stable: compared with the recognition algorithm based on the Spatial Temporal Graph Convolutional Network (ST-GCN) model, it improves Top-1 and Top-5 accuracy on Kinetics by 5.0 and 4.5 percentage points respectively, and improves Top-1 accuracy on the CS and CV benchmarks of NTU RGB+D by 6.2 and 6.7 percentage points respectively. It also outperforms state-of-the-art (SOA) action recognition methods such as Res-TCN (Residual Temporal Convolutional Network), STA-LSTM, and the Actional-Structural Graph Convolutional Network (AS-GCN). These results indicate that the proposed algorithm better meets the practical application requirements of human action recognition.
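To make the architecture described above concrete, the following is a minimal PyTorch sketch of one spatiotemporal-attention graph convolution block: a per-joint (spatial) attention gate, a graph convolution over the skeleton adjacency, a per-frame (temporal) attention gate, and a temporal convolution. The module name, tensor shapes, and the simple sigmoid gates are illustrative assumptions, not the authors' code; in the paper the spatial weights are driven by optical-flow motion saliency together with global average pooling and an auxiliary classification loss, which this self-contained sketch replaces with learned gates.

```python
# Minimal sketch of one spatiotemporal-attention graph convolution block for
# skeleton sequences. Shapes, module names, and the sigmoid attention gates
# are illustrative assumptions, not the authors' published STA-GCN code.
import torch
import torch.nn as nn

class STAttentionGCNBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, A: torch.Tensor):
        super().__init__()
        self.register_buffer('A', A)                        # (V, V) skeleton adjacency
        self.gcn = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # per-joint channel mapping
        self.tcn = nn.Conv2d(out_ch, out_ch,
                             kernel_size=(9, 1), padding=(4, 0))  # temporal convolution
        # Spatial attention: one weight per joint (a stand-in for the paper's
        # optical-flow motion saliency plus auxiliary-classification-loss branch).
        self.spatial_att = nn.Sequential(nn.Linear(in_ch, 1), nn.Sigmoid())
        # Temporal attention: one weight per frame, to emphasize discriminative segments.
        self.temporal_att = nn.Sequential(nn.Linear(out_ch, 1), nn.Sigmoid())
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) = batch, channels, frames, joints
        s = self.spatial_att(x.mean(dim=2).permute(0, 2, 1))   # (N, V, 1)
        x = x * s.permute(0, 2, 1).unsqueeze(2)                # reweight joints
        x = torch.einsum('nctv,vw->nctw', x, self.A)           # mix joints via adjacency
        x = self.gcn(x)                                        # map channels
        t = self.temporal_att(x.mean(dim=3).permute(0, 2, 1))  # (N, T, 1)
        x = x * t.permute(0, 2, 1).unsqueeze(3)                # reweight frames
        return self.relu(self.tcn(x))
```

Inputs follow the (batch, channels, frames, joints) layout common to ST-GCN-style models; stacking several such blocks and ending with global average pooling and a linear classifier yields an end-to-end trainable network of the kind the abstract describes.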

Keywords: Graph Convolutional Network (GCN)  human skeleton-based action recognition  attention mechanism  human joint  video behavior understanding
Received: 2020-09-29
Revised: 2020-12-17
