Research on Human Behavior Recognition Based on Temporal and Spatial Information Fusion (基于时空信息融合的人体行为识别研究)

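Cite this article:	YU Haigang (于海港), HE Ning (何宁), LIU Shengjie (刘圣杰), HAN Wenjing (韩文静). Research on Human Behavior Recognition Based on Temporal and Spatial Information Fusion[J]. Computer Engineering and Applications (计算机工程与应用), 2023, 59(3): 202-208. DOI: 10.3778/j.issn.1002-8331.2109-0213

Authors:	YU Haigang (于海港), HE Ning (何宁), LIU Shengjie (刘圣杰), HAN Wenjing (韩文静)

Affiliation:	Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101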

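Abstract (translated from the Chinese):	Human behavior recognition is an important research topic in video understanding, but in video sequences spatio-temporal information is hard to fuse and recognition accuracy is low. To address these problems, a two-stream spatio-temporal residual convolutional network model based on spatio-temporal information fusion is proposed. The video is sampled in segments to extract RGB images and optical flow images, which are fed into the two-stream spatio-temporal residual network; the designed spatio-temporal residual module extracts the deep spatio-temporal features of the video, and the category results of the individual video segments are fused by weighting to obtain the behavior category. The proposed two-stream spatio-temporal residual module introduces a small number of 3D convolutions and a mixed attention mechanism, so it captures spatio-temporal information at different scales while suppressing invalid information, balances the capture of spatio-temporal information against computational cost, and improves accuracy. The experiments build on the TSN network model and are validated on the UCF101 dataset; the results show that the proposed model is 0.9 percentage points more accurate than the original TSN network model and captures spatio-temporal information more efficiently.
Keywords (translated from the Chinese):	behavior recognition; two-stream network; residual structure; attention mechanism; temporal information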
Research on Human Behavior Recognition Based on Temporal and Spatial Information Fusion

YU Haigang,HE Ning,LIU Shengjie,HAN Wenjing. Research on Human Behavior Recognition Based on Temporal and Spatial Information Fusion[J]. Computer Engineering and Applications, 2023, 59(3): 202-208. DOI: 10.3778/j.issn.1002-8331.2109-0213

Authors:	YU Haigang, HE Ning, LIU Shengjie, HAN Wenjing

Affiliation:	Beijing Key Laboratory of Information Service Engineering, College of Smart City, Beijing Union University, Beijing 100101, China

Abstract:	Human behavior recognition is an important research topic in video understanding, but fusing the temporal and spatial information in video sequences is difficult and recognition accuracy is low. To solve these problems, this paper proposes a two-stream spatio-temporal residual convolutional network model based on spatio-temporal information fusion. First, the video is sampled in segments to extract RGB images and optical flow images, which are input into the two-stream spatio-temporal residual network. The deep spatio-temporal features of the video are then extracted by the designed spatio-temporal residual module. Finally, the category results of the individual video segments are weighted and fused to obtain the behavior category. The proposed two-stream spatio-temporal residual module introduces a small number of three-dimensional convolutions and a mixed attention mechanism, which together capture spatio-temporal information at different scales and suppress invalid information. The module thereby balances the capture of spatio-temporal information against computational cost while improving accuracy. The experiments build on the TSN network model and are validated on the UCF101 dataset. The results show that the proposed model is 0.9 percentage points more accurate than the original TSN network model and captures spatio-temporal information more efficiently.
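The segment sampling and weighted score fusion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice of the center frame per segment, the uniform segment weights, and the 1:1.5 RGB-to-flow stream weights are all assumptions made for illustration, and the per-segment scores stand in for the outputs of the two network streams.

```python
import math


def sample_segment_indices(num_frames, num_segments):
    """Split a video into equal-length segments and pick each segment's center frame.

    Center-frame sampling is an assumption; random sampling within each
    segment is another common choice.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(score_lists, weights):
    """Weighted average of several equal-length score vectors."""
    total_weight = sum(weights)
    num_classes = len(score_lists[0])
    return [
        sum(w * s[c] for w, s in zip(weights, score_lists)) / total_weight
        for c in range(num_classes)
    ]


def fuse_two_streams(rgb_segment_scores, flow_segment_scores,
                     segment_weights=None, stream_weights=(1.0, 1.5)):
    """Fuse per-segment class scores from an RGB stream and an optical-flow stream.

    segment_weights and stream_weights are hypothetical values, not taken
    from the paper. Returns (predicted_class_index, class_probabilities).
    """
    if segment_weights is None:
        segment_weights = [1.0] * len(rgb_segment_scores)
    # Step 1: combine the segment-level scores within each stream.
    rgb_video = weighted_average(rgb_segment_scores, segment_weights)
    flow_video = weighted_average(flow_segment_scores, segment_weights)
    # Step 2: combine the two video-level score vectors across streams.
    fused = weighted_average([rgb_video, flow_video], list(stream_weights))
    probs = softmax(fused)
    return probs.index(max(probs)), probs
```

For example, `sample_segment_indices(30, 3)` selects one frame from each third of a 30-frame clip, and `fuse_two_streams` takes one score vector per selected segment from each stream and returns the fused prediction.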

Keywords:	behavior recognition; two-stream network; residual structure; attention mechanism; temporal information
