Foundation items: National Natural Science Foundation of China (61872042, 61572077); Beijing Municipal Education Commission Science and Technology Key Project (KZ201911417048); Beijing Union University "Strengthening the University with Talents" Premium Funding Program (BPHR2020AZ01, BPHR2020EZ01); Beijing Union University Graduate Research and Innovation Program (YZ2020K001); Support Plan for the Construction of High-level Teachers in Beijing Municipal Universities during the 13th Five-Year Plan Period (CIT&TCD201704069)
Received: 2020-10-24
Revised: 2020-11-24

Human Motion Recognition Method Based on Attention Mechanism of 3D DenseNet
ZHANG Congcong, HE Ning, SUN Qixiang, YIN Xiaojie. Human Motion Recognition Method Based on Attention Mechanism of 3D DenseNet[J]. Computer Engineering, 2021, 47(11): 313-320.
Authors: ZHANG Congcong, HE Ning, SUN Qixiang, YIN Xiaojie
Affiliation:1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;2. Smart City College, Beijing Union University, Beijing 100101, China
Abstract: Traditional human motion recognition algorithms cannot fully exploit the spatial and temporal information of human motions in videos, and their recognition accuracy is low. To address this problem, a three-dimensional dense convolutional network is proposed for recognizing human motions in videos. The model takes the two-stream network as its basic framework: the spatial stream uses a 3D dense network with an attention mechanism to extract appearance features of the motions in the videos, while the temporal stream extracts motion features from the optical flow of consecutive video sequences. The spatio-temporal features and the classification layers are then fused to obtain the final motion recognition result. To extract features more accurately and to model the interactions between the spatial and temporal networks, cross-stream connections are added between the two streams to fuse features at the convolutional layers. Experimental results show that the proposed model achieves a recognition accuracy of 94.52% on the UCF101 dataset and 69.64% on the HMDB51 dataset, demonstrating that it can make full use of the spatio-temporal information in videos and extract the key information of motions.
Keywords:motion recognition  attention mechanism  3D DenseNet  two-stream network  feature fusion  
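The abstract describes applying an attention mechanism to 3D feature maps in the spatial stream. The record does not specify the attention design, so the following is only an illustrative sketch assuming a squeeze-and-excitation-style channel attention over a (C, T, H, W) feature map; the weights `w1`, `w2` and the reduction ratio are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def channel_attention_3d(feat, w1, w2):
    """Squeeze-and-excitation style channel attention over a 3D
    feature map of shape (C, T, H, W): global-average-pool over the
    spatio-temporal axes, pass the channel descriptor through a small
    two-layer MLP, and rescale each channel by a sigmoid gate."""
    squeeze = feat.mean(axis=(1, 2, 3))          # (C,) global channel descriptor
    hidden = np.maximum(squeeze @ w1, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid gate in (0, 1), shape (C,)
    return feat * gate[:, None, None, None]      # reweight channels

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 6, 6))   # C=8 channels, T=4 frames, 6x6 spatial
w1 = rng.standard_normal((8, 2)) * 0.1     # bottleneck: reduction ratio 4
w2 = rng.standard_normal((2, 8)) * 0.1
out = channel_attention_3d(feat, w1, w2)
print(out.shape)  # (8, 4, 6, 6): same shape, channels rescaled
```

Because the gate lies in (0, 1), the output preserves the feature-map shape while attenuating less informative channels, which matches the role the abstract assigns to attention in the spatial stream.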
This article has been indexed by Wanfang Data and other databases.