Action detection model fused with non-local neural network
Authors: HUANG Wen-ming, YANG Mu-li, LAN Ru-shi, DENG Zhen-rong, LUO Xiao-nan
Affiliation: School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
Funding: Open Fund of the Guangxi Key Laboratory of Image and Graphic Intelligent Processing Cultivation Base, Guilin University of Electronic Technology (GIIP2011)
Abstract: To address the limited ability of convolutional neural networks (CNNs) to model temporal information in video action detection, an action detection model that fuses a non-local neural network is proposed. The model adopts a two-branch CNN architecture that extracts the spatial and motion features of a video separately. Taking a single video frame and a sequence of consecutive frames as inputs, the spatial network extracts 2D CNN features from the current frame, while the spatio-temporal network uses a 3D CNN fused with non-local blocks to capture global relations between video frames. To further enhance contextual semantic information, a channel fusion mechanism is used to aggregate the features of the two branches, and the fused features are finally used for frame-level detection. Experiments on the UCF101-24 and JHMDB datasets show that the method fully integrates spatial and temporal information and achieves high detection accuracy on video-based spatio-temporal action detection tasks.
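The key temporal component described above is the non-local operation embedded in the 3D CNN branch. As a rough illustration only, the following minimal PyTorch-style sketch shows a 3D non-local block in its embedded-Gaussian form (Wang et al., 2018); the class name, bottleneck ratio, and tensor sizes are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock3D(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.inter_channels = in_channels // 2  # bottleneck ratio is an assumed default
        # 1x1x1 projections used for the pairwise similarity and the value transform
        self.theta = nn.Conv3d(in_channels, self.inter_channels, kernel_size=1)
        self.phi = nn.Conv3d(in_channels, self.inter_channels, kernel_size=1)
        self.g = nn.Conv3d(in_channels, self.inter_channels, kernel_size=1)
        # maps the aggregated response back to in_channels for the residual sum
        self.w_z = nn.Conv3d(self.inter_channels, in_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, H, W) feature map from the 3D CNN branch
        n, c, t, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (N, THW, C')
        phi = self.phi(x).flatten(2)                      # (N, C', THW)
        g = self.g(x).flatten(2).transpose(1, 2)          # (N, THW, C')
        # similarity between every pair of space-time positions -> global context
        attn = F.softmax(theta @ phi, dim=-1)             # (N, THW, THW)
        y = (attn @ g).transpose(1, 2).reshape(n, self.inter_channels, t, h, w)
        return x + self.w_z(y)                            # residual connection

# Example: a short clip-level feature map (batch 2, 64 channels, 4 frames, 14x14)
feats = torch.randn(2, 64, 4, 14, 14)
print(NonLocalBlock3D(64)(feats).shape)  # torch.Size([2, 64, 4, 14, 14])

Because the attention matrix relates every space-time position to every other one, each frame-level feature can aggregate evidence from the whole clip, which is the global dependency the spatio-temporal branch is intended to capture.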

Keywords: action detection; non-local block; 3D convolution; attention mechanism

Action detection model fused with non-local neural network
Authors: HUANG Wen-ming, YANG Mu-li, LAN Ru-shi, DENG Zhen-rong, LUO Xiao-nan
Affiliation: School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
Abstract: Convolutional neural networks (CNNs) have limited ability to model temporal information in video action detection. To address this problem, we propose an action detection model that fuses a non-local neural network, combining non-local blocks with a 3D CNN to capture global relations between video frames. The model uses a two-stream architecture of a 2D CNN and a 3D CNN to extract the spatial and motion features of the video, respectively, taking single video frames and video frame sequences as inputs. To further enhance contextual semantic information, an improved attention-based channel fusion mechanism is used to aggregate the features of the two streams, and the fused features are finally used for frame-level detection. We conducted experiments and comparisons on the UCF101-24 and JHMDB datasets. The results show that our method fully integrates spatial and temporal information and achieves high detection accuracy on video-based action detection tasks.
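The abstract mentions an attention-based channel fusion step that aggregates the two streams but does not detail its design. Purely as a hedged sketch, the code below assumes an SE-style channel gating (squeeze-and-excitation) applied to the concatenated per-frame features; the module name, reduction ratio, and feature shapes are hypothetical placeholders rather than the paper's actual mechanism.

import torch
import torch.nn as nn

class ChannelFusion(nn.Module):
    def __init__(self, spatial_channels, temporal_channels, reduction=16):
        super().__init__()
        fused = spatial_channels + temporal_channels
        # squeeze (global average pool) then excite (two FC layers) to get channel weights
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(fused, fused // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(fused // reduction, fused),
            nn.Sigmoid(),
        )

    def forward(self, spatial_feat, temporal_feat):
        # both inputs: (N, C_i, H, W) frame-level maps aligned to the current key frame
        x = torch.cat([spatial_feat, temporal_feat], dim=1)
        w = self.gate(x).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1) per-channel weights
        return x * w                                  # reweighted fused features

fusion = ChannelFusion(spatial_channels=256, temporal_channels=256)
out = fusion(torch.randn(2, 256, 14, 14), torch.randn(2, 256, 14, 14))
print(out.shape)  # torch.Size([2, 512, 14, 14])

The reweighted, concatenated map would then feed the frame-level detection head; the gating lets the network emphasize whichever stream's channels are more informative for the current frame.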
Keywords: action detection; non-local neural network; 3D convolution; attention mechanism