Skeleton-Based Action Recognition on Spatio-Temporal Attention Mechanism and Adaptive Graph Convolutional Network


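Cite this article:	Zhang Jiaxiang, Liu Ruhao, Jin Chenxi, Lu Xianling. Skeleton-Based Action Recognition on Spatio-Temporal Attention Mechanism and Adaptive Graph Convolutional Network[J]. Journal of Signal Processing (信号处理), 2021, 37(7): 1226-1234.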

Authors:	Zhang Jiaxiang (张家想), Liu Ruhao (刘如浩), Jin Chenxi (金辰曦), Lu Xianling (卢先领)

Affiliation:	Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University

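Funding:	National Natural Science Foundation of China (61573167); Science and Technology Development Center of the Ministry of Education, "Cloud-Data Fusion for Science and Education Innovation" Fund (2017A13055)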

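Abstract:	Skeleton-based action recognition often extracts spatio-temporal features insufficiently and struggles to capture global context information. To address this, a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network is studied. First, a spatio-temporal attention module based on the non-local operation is built to help the model attend to the most discriminative frames and regions of the skeleton sequence. Second, an adaptive graph convolutional network is built using the feature learning ability of the Gaussian embedding function and a lightweight convolutional neural network, while accounting for the influence of human prior knowledge in different periods. Finally, the adaptive graph convolutional network serves as the basic framework, the spatio-temporal attention module is embedded in it, and a two-stream fusion model is built from joint information, bone information, and their respective motion information. The algorithm reaches accuracies of 90.2% and 96.2% under the two evaluation protocols of the NTU RGB+D dataset, and it shows the generality of the model on the large-scale Kinetics dataset, verifying its advantage in extracting spatio-temporal features and capturing global context information.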
Keywords:	human skeleton action recognition; non-local block; attention mechanism; graph convolutional network
Received:	2021-03-16

Affiliation:	Key Laboratory for Advanced Process Control for Light Industry of the Education Ministry of China, Jiangnan University

Abstract:	To address insufficient spatio-temporal feature extraction and the difficulty of capturing global context information in skeleton-based action recognition, a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network is studied. First, a spatio-temporal attention module based on the non-local operation is constructed to help the model focus on the most discriminative frames and regions in the skeleton sequence. Second, an adaptive graph convolutional network is constructed using the feature learning ability of the Gaussian embedding function and a lightweight convolutional neural network, taking into account the effect of human prior knowledge in different periods. Finally, with the adaptive graph convolutional network as the basic framework and the spatio-temporal attention module embedded in it, a two-stream fusion model is constructed from joint information, bone information, and their respective motion information. The algorithm achieves accuracies of 90.2% and 96.2% under the two evaluation protocols of the NTU RGB+D dataset. Results on the large-scale Kinetics dataset reflect the generality of the model and verify that the algorithm is superior in extracting spatio-temporal features and capturing global context information.
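
The adaptive graph idea in the abstract can be pictured with a small NumPy sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names, the embedding weights `w_theta` and `w_phi`, the mixing factor `alpha`, and the chain-shaped prior graph are all hypothetical. It assumes the embedded-Gaussian form of the non-local similarity (a row-wise softmax over dot products of two linear embeddings) and adds the resulting data-dependent adjacency to a fixed prior adjacency before one graph-convolution step.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def embedded_gaussian_similarity(x, w_theta, w_phi):
    # x: (V, C) features of V joints; w_theta, w_phi: (C, D) embedding weights
    # (hypothetical parameters). Returns a (V, V) matrix whose rows sum to 1.
    theta = x @ w_theta
    phi = x @ w_phi
    return softmax(theta @ phi.T, axis=-1)

def adaptive_graph_conv(x, a_prior, w_theta, w_phi, w_out, alpha=1.0):
    # Mix the fixed prior graph with the data-dependent graph, then aggregate
    # neighbour features and project them. alpha is an assumed knob for how
    # strongly the prior skeleton structure is weighted.
    a_learned = embedded_gaussian_similarity(x, w_theta, w_phi)
    a = alpha * a_prior + a_learned
    return a @ x @ w_out

rng = np.random.default_rng(0)
V, C, D, C_OUT = 25, 3, 8, 16          # 25 joints with 3-D coordinates, as in NTU RGB+D skeletons
x = rng.standard_normal((V, C))

# Placeholder prior: self-loops plus a simple chain, row-normalised.
# A real model would use the actual bone connections of the skeleton.
a_prior = np.eye(V)
for i in range(V - 1):
    a_prior[i, i + 1] = a_prior[i + 1, i] = 1.0
a_prior /= a_prior.sum(axis=1, keepdims=True)

w_theta = rng.standard_normal((C, D))
w_phi = rng.standard_normal((C, D))
w_out = rng.standard_normal((C, C_OUT))

out = adaptive_graph_conv(x, a_prior, w_theta, w_phi, w_out, alpha=0.5)
print(out.shape)  # (25, 16)
```

Lowering `alpha` in this sketch reduces the weight of the prior skeleton graph relative to the learned similarity, which is one plausible way to read the abstract's remark about prior knowledge mattering differently in different periods.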

