Multi-modal human behavior recognition algorithm based on attention mechanism
Citation: SONG Zhendong, YANG Guochao, MA Yupeng, FENG Xiaoyi. Multi-modal human behavior recognition algorithm based on attention mechanism[J]. Computer Measurement & Control, 2022, 30(2): 276-283.
Authors: SONG Zhendong, YANG Guochao, MA Yupeng, FENG Xiaoyi
Affiliation: School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China; Shaanxi Huaming Putai Medical Equipment Co., Ltd., Xi'an 710119, China; College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
Foundation: Key Research and Development Program of Shaanxi Province (2021ZDLGY15-01, 2021ZDLGY09-04, 2021GY-004, 2020GY-050); Shenzhen International Cooperation Research Project (GJHZ20200731095204013); National Natural Science Foundation of China (61772419)
Abstract: A multi-modal human behavior recognition algorithm based on an attention mechanism is proposed. To address the effective fusion of multi-modal features, a two-stream feature-fusion convolutional network, TAM3DNet (two-stream attention mechanism 3D network), is designed. The backbone, AM3DNet (attention mechanism 3D network), combines a 3D network with an attention mechanism: the feature map is weighted by the attention map to obtain weighted behavior features, so that the network focuses on limb-motion regions and the influence of the background and static body regions is weakened. The color and depth modalities of the RGB-D data are fed into the two streams of the network, color and depth behavior features are obtained from the two branch networks, and the fused features are then classified to produce the human behavior recognition result.
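To make the attention-weighting step concrete, the sketch below shows one way a 3D feature map can be multiplied by a learned attention map so that limb-motion regions are emphasized. It is a minimal PyTorch illustration only: the module name AttentionWeight3D, the 1x1x1 convolution, and the sigmoid gating are assumptions, not the published AM3DNet design.

```python
# Minimal sketch (assumed design, not the paper's AM3DNet): a single-channel
# attention map is produced by a 1x1x1 convolution + sigmoid, then multiplied
# element-wise with the 3D feature map to obtain weighted behavior features.
import torch
import torch.nn as nn

class AttentionWeight3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.att_conv = nn.Conv3d(channels, 1, kernel_size=1)  # hypothetical attention head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) feature map from a 3D CNN
        att = torch.sigmoid(self.att_conv(x))   # attention map with values in [0, 1]
        return x * att                          # weighted behavior features

if __name__ == "__main__":
    feat = torch.randn(2, 64, 16, 28, 28)       # dummy feature map
    print(AttentionWeight3D(64)(feat).shape)    # torch.Size([2, 64, 16, 28, 28])
```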

Keywords: RGB-D image; multi-modal features; human behavior; two-stream network; attention mechanism; feature fusion
Received: 2022-01-07
Revised: 2022-01-11

Multi-modal human behavior recognition algorithm based on attention mechanism
SONG Zhendong, YANG Guochao, MA Yupeng, FENG Xiaoyi. Multi-modal human behavior recognition algorithm based on attention mechanism[J]. Computer Measurement & Control, 2022, 30(2): 276-283.
Authors: SONG Zhendong, YANG Guochao, MA Yupeng, FENG Xiaoyi
Affiliation: School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China; Shaanxi Huaming Putai Medical Equipment Co., Ltd., Xi'an 710119, China; College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
Abstract: A multi-modal human behavior recognition algorithm based on an attention mechanism is proposed. To address the problem of effectively fusing multi-modal features, a two-stream feature-fusion convolutional network, TAM3DNet (two-stream attention mechanism 3D network), is designed. Its backbone, AM3DNet (attention mechanism 3D network), incorporates an attention mechanism that weights the feature map by the attention map to produce weighted behavior features, so the network focuses on limb-movement regions and the influence of the background and static body regions is reduced. The color and depth modalities of the RGB-D data serve as the inputs of the two streams; color and depth behavior features are extracted by the two branch networks, and the fused features are then classified to obtain the human behavior recognition result.
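The two-stream arrangement described above (separate color and depth branches whose features are fused and then classified) can likewise be sketched in PyTorch. The stand-in branch layers, the class name TwoStreamFusion, and feature concatenation as the fusion operator are illustrative assumptions; the paper's TAM3DNet is not reproduced here.

```python
# Illustrative two-stream sketch (assumed layers, not the paper's TAM3DNet):
# each branch extracts a feature vector from its modality, the vectors are
# concatenated, and a linear layer outputs class scores.
import torch
import torch.nn as nn

def branch(in_channels: int, feat_dim: int = 128) -> nn.Sequential:
    # Tiny stand-in 3D backbone used only to keep the example self-contained.
    return nn.Sequential(
        nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
        nn.Linear(32, feat_dim),
    )

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.rgb_branch = branch(3, feat_dim)    # color stream
        self.depth_branch = branch(1, feat_dim)  # depth stream
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.classifier(fused)            # recognition scores per action class

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 16, 112, 112)        # (batch, C, T, H, W) color clips
    depth = torch.randn(2, 1, 16, 112, 112)      # depth clips
    print(TwoStreamFusion(num_classes=10)(rgb, depth).shape)  # torch.Size([2, 10])
```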
Keywords: RGB-D image; multi-modal features; human behavior; two-stream network; attention mechanism; feature fusion