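Multiscale hypergraph convolutional network for skeleton-based action recognition
Cite this article: QIN Xiaofei, ZHAO Ying, ZHANG Yijie, DU Ruijie, QIAN Hanwen, CHEN Meng, ZHANG Wenqi, ZHANG Xuedian. Multiscale hypergraph convolutional network for skeleton-based action recognition[J]. Optical Instruments, 2022, 44(4): 39-48.
Authors: QIN Xiaofei; ZHAO Ying; ZHANG Yijie; DU Ruijie; QIAN Hanwen; CHEN Meng; ZHANG Wenqi; ZHANG Xuedian
Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Institute of Aerospace System Engineering of Shanghai, Shanghai 201109, China
Funding: Shanghai Artificial Intelligence Program (2019-RGZN-01077)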
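Abstract: Action recognition is one of the fundamental tasks in computer vision. Because skeleton sequences carry most of the information about an action, skeleton-based action recognition algorithms have drawn attention from many researchers. The human skeleton is mathematically a natural graph, so graph convolution is widely applied to action recognition. Ordinary graph convolution, however, aggregates only low-order information between pairs of nodes and cannot model high-order, complex relationships among multiple nodes. To address this problem, this paper proposes a multiscale hypergraph convolutional network that aggregates richer information in both the spatial and temporal dimensions and thereby improves action recognition accuracy. The network adopts an encoder-decoder structure: the encoder uses a hypergraph convolution module to aggregate the relevant information among the multiple nodes in a hyperedge, and the decoder uses a hypergraph fusion module to restore the original skeleton structure. In addition, a multiscale temporal graph convolution module based on dilated convolution is designed to better aggregate motion information along the temporal dimension. Experimental results on the NTU-RGB+D and Kinetics datasets verify the effectiveness of the algorithm.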

Keywords: action recognition; graph convolution; hypergraph convolution; dilated convolution
Received: 2022/1/6
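
The abstract's point that a hyperedge joins several nodes at once, where an ordinary graph edge joins only two, can be illustrated with a minimal sketch. This is not the authors' hypergraph convolution module: it is a plain, unweighted two-step mean aggregation (nodes to hyperedges, then hyperedges back to nodes) with no learnable parameters. The function name `hypergraph_conv`, the 5-joint toy skeleton, and the grouping of joints into body parts are all assumptions made for illustration.

```python
def hypergraph_conv(features, hyperedges):
    """One unweighted hypergraph aggregation step (illustrative only).

    features:   list of per-node feature vectors (num_nodes x channels)
    hyperedges: list of node-index lists; a hyperedge may join any number of nodes
    """
    num_nodes = len(features)
    channels = len(features[0])

    # Step 1 (nodes -> hyperedges): average the features of all member nodes.
    edge_feats = []
    for members in hyperedges:
        edge_feats.append([
            sum(features[v][c] for v in members) / len(members)
            for c in range(channels)
        ])

    # Step 2 (hyperedges -> nodes): average the hyperedges each node belongs to.
    out = []
    for v in range(num_nodes):
        incident = [e for e, members in enumerate(hyperedges) if v in members]
        if not incident:
            out.append(list(features[v]))  # isolated node keeps its own feature
            continue
        out.append([
            sum(edge_feats[e][c] for e in incident) / len(incident)
            for c in range(channels)
        ])
    return out


# Hypothetical 5-joint toy skeleton: 0 = torso, 1-2 = left arm, 3-4 = right arm.
joints = [[0.0], [1.0], [2.0], [3.0], [4.0]]
parts = [[0, 1, 2], [0, 3, 4]]  # each hyperedge groups three joints at once
result = hypergraph_conv(joints, parts)
```

In this toy case the torso joint belongs to both hyperedges, so its output mixes information from both arms in a single step, while each arm joint only sees its own part.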

Multiscale hypergraph convolutional network for skeleton-based action recognition
QIN Xiaofei, ZHAO Ying, ZHANG Yijie, DU Ruijie, QIAN Hanwen, CHEN Meng, ZHANG Wenqi, ZHANG Xuedian. Multiscale hypergraph convolutional network for skeleton-based action recognition[J]. Optical Instruments, 2022, 44(4): 39-48.
Authors: QIN Xiaofei; ZHAO Ying; ZHANG Yijie; DU Ruijie; QIAN Hanwen; CHEN Meng; ZHANG Wenqi; ZHANG Xuedian
Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Institute of Aerospace System Engineering of Shanghai, Shanghai 201109, China
Abstract: Action recognition is one of the basic tasks of computer vision. Skeleton sequences contain most of the action information, so skeleton-based action recognition has attracted considerable research attention. Mathematically, the human skeleton is a natural graph, so graph convolution is widely used in action recognition. However, ordinary graph convolution aggregates only low-order information between pairs of nodes and cannot model high-order complex relationships among multiple nodes. To solve this problem, a multiscale hypergraph convolutional network is proposed that aggregates richer information in both the spatial and temporal dimensions to improve the accuracy of action recognition. The network has an encoder-decoder structure: the encoder uses a hypergraph convolution module to aggregate relevant information among the multiple nodes in a hyperedge, and the decoder uses a hypergraph fusion module to restore the original skeleton structure. In addition, a multiscale temporal graph convolution module based on dilated convolution is designed to better aggregate motion information along the temporal dimension. Experimental results on the NTU-RGB+D and Kinetics datasets verify the effectiveness of the algorithm.
Keywords: action recognition; graph convolution; hypergraph convolution; dilated convolution
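
The idea of using dilated convolution to cover several temporal scales can be sketched as follows. This is an assumption-laden illustration, not the module described in the paper: it runs one 1D branch per dilation rate over a single joint coordinate, and the function names, the dilation rates (1, 2, 3), and the fixed all-ones kernel standing in for learned weights are all hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D cross-correlation with zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(signal):       # out-of-range taps count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multiscale_temporal(signal, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and return all branch outputs."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# One joint coordinate over 7 frames; an all-ones kernel stands in for learned weights.
trajectory = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
branches = multiscale_temporal(trajectory, kernel=[1.0, 1.0, 1.0])
```

With a 3-tap kernel, the three branches span 3, 5, and 7 frames respectively, so each branch responds to motion at a different time scale while using the same number of weights.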