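Skeleton-Based Action Recognition Combining a Spatio-Temporal Attention Mechanism and an Adaptive Graph Convolutional Network |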
| |
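Cite this article: | ZHANG Jiaxiang, LIU Ruhao, JIN Chenxi, LU Xianling. Skeleton-based action recognition combining a spatio-temporal attention mechanism and an adaptive graph convolutional network[J]. Journal of Signal Processing (信号处理), 2021, 37(7): 1226-1234. DOI: 10.16798/j.issn.1003-0530.2021.07.012 |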
| |
Authors: | ZHANG Jiaxiang (张家想), LIU Ruhao (刘如浩), JIN Chenxi (金辰曦), LU Xianling (卢先领) |
| |
Affiliation: | Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University |
| |
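Funding: | National Natural Science Foundation of China (61573167); "Cloud-Data Integration for Science and Education Innovation" Fund of the Science and Technology Development Center, Ministry of Education (2017A13055) |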
| |
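Abstract: | To address insufficient spatio-temporal feature extraction and the difficulty of capturing global context information in skeleton-based action recognition, this work studies a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network. First, a spatio-temporal attention module based on the non-local operation is constructed to help the model focus on the most discriminative frames and regions in the skeleton sequence; second, using the feature learning ability of a Gaussian embedding function and a lightweight convolutional neural network, and considering the effect of human prior knowledge in different periods...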
|
Keywords: | human skeleton; action recognition; non-local block; attention mechanism; graph convolutional network |
Received: | 2021-03-16 |
Skeleton-Based Action Recognition on Spatio-Temporal Attention Mechanism and Adaptive Graph Convolutional Network |
| |
Affiliation: | Key Laboratory for Advanced Process Control for Light Industry of the Education Ministry of China, Jiangnan University |
| |
Abstract: | To address insufficient spatio-temporal feature extraction and the difficulty of capturing global context information in skeleton-based action recognition, this paper studies a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network. First, a spatio-temporal attention module based on the non-local operation is constructed to help the model focus on the most discriminative frames and regions in the skeleton sequence. Second, an adaptive graph convolutional network is constructed using the feature learning ability of a Gaussian embedding function and a lightweight convolutional neural network, taking into account the effect of human prior knowledge in different periods. Finally, with the adaptive graph convolutional network as the basic framework, the spatio-temporal attention module is embedded to build a two-stream fusion model that uses joint information, bone information, and their respective motion information. The algorithm achieves accuracies of 90.2% and 96.2% under the two evaluation standards of the NTU RGB+D dataset. Results on the large-scale Kinetics dataset reflect the generality of the model and support the claim that the algorithm is superior at extracting spatio-temporal features and capturing global context information. |
| |
Keywords: | |
|
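
The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weight shapes, the additive combination of a prior adjacency with a learned one, and the zero-padded frame difference are all assumptions chosen for illustration. It shows an embedded-Gaussian affinity (a row-wise softmax over pairwise dot products), a non-local attention step with a residual connection, an adaptive graph convolution, and the derivation of bone and motion inputs from joint coordinates.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def embedded_gaussian_affinity(x, w_theta, w_phi):
    # x: (N, C) features for N positions (joints or frames).
    # Returns an (N, N) matrix whose rows sum to 1.
    theta = x @ w_theta
    phi = x @ w_phi
    return softmax(theta @ phi.T, axis=-1)

def non_local_attention(x, w_theta, w_phi, w_g):
    # Each position aggregates features from all positions,
    # then the result is added back to the input (residual).
    affinity = embedded_gaussian_affinity(x, w_theta, w_phi)
    return x + affinity @ (x @ w_g)

def adaptive_graph_conv(x, a_prior, w_theta, w_phi, w_out):
    # Assumption: the fixed skeleton graph (prior) and the
    # data-dependent affinity are simply summed.
    a_learned = embedded_gaussian_affinity(x, w_theta, w_phi)
    return (a_prior + a_learned) @ x @ w_out

def bone_features(joints, parents):
    # joints: (T, V, C). A bone is a joint minus its parent joint.
    return joints - joints[:, parents, :]

def motion_features(sequence):
    # Difference between consecutive frames; last frame is zero.
    motion = np.zeros_like(sequence)
    motion[:-1] = sequence[1:] - sequence[:-1]
    return motion

# Toy data: 4 frames, 5 joints, 3 coordinates (all hypothetical).
rng = np.random.default_rng(0)
T, V, C, E, C_OUT = 4, 5, 3, 2, 8
joints = rng.normal(size=(T, V, C))
parents = np.array([0, 0, 1, 2, 2])  # joint 0 is its own parent (root)
w_theta = rng.normal(size=(C, E))
w_phi = rng.normal(size=(C, E))
w_g = rng.normal(size=(C, C))
w_out = rng.normal(size=(C, C_OUT))
a_prior = np.eye(V)  # placeholder for a normalized skeleton adjacency

attended = non_local_attention(joints[0], w_theta, w_phi, w_g)
conv_out = adaptive_graph_conv(joints[0], a_prior, w_theta, w_phi, w_out)
bones = bone_features(joints, parents)
joint_motion = motion_features(joints)
bone_motion = motion_features(bones)
```

In a two-stream setup of the kind the abstract describes, the joint, bone, and motion arrays would each feed a network, and the per-class scores would then be fused; how the streams are grouped and weighted is not specified in the abstract and is left out here.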
|