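Video summarization generation model based on improved bi-directional long short-term memory network
Cite this article: WU Guangli, LI Leiting, GUO Zhenzhou, WANG Chengxiang. Video summarization generation model based on improved bi-directional long short-term memory network[J]. Journal of Computer Applications, 2021, 41(7): 1908-1914. DOI: 10.11772/j.issn.1001-9081.2020091512
Authors: WU Guangli  LI Leiting  GUO Zhenzhou  WANG Chengxiang
Affiliation: 1. School of Cyberspace Security, Gansu University of Political Science and Law, Lanzhou 730070; 2. Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education (Northwest Minzu University), Lanzhou 730030
Funding: Natural Science Foundation of Gansu Province (17JR5RA161); Youth Science and Technology Fund Program of Gansu Province (18JR3RA193); Lanzhou Talent Innovation and Entrepreneurship Project (2020-RC-27); Innovation Ability Improvement Project of Higher Education Institutions of Gansu Province (2020B-167); Longyuan Youth Innovation and Entrepreneurship Talent Project (2021LQGR20).
Abstract: To address the problems that traditional video summarization methods often ignore temporal information and that the extracted video features are overly complex and prone to overfitting, a video summarization generation model based on an improved bi-directional long short-term memory (BiLSTM) network is proposed. First, deep features of the video frames are extracted by a convolutional neural network (CNN), and, to make the generated video summaries more diverse, a BiLSTM network is used to convert the deep feature recognition task into a temporal feature annotation task over the video frames...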

Keywords: video summarization  convolutional neural network  bi-directional long short-term memory network  max pooling
Received: 2020-09-28
Revised: 2020-12-22

Video summarization generation model based on improved bi-directional long short-term memory network
WU Guangli, LI Leiting, GUO Zhenzhou, WANG Chengxiang. Video summarization generation model based on improved bi-directional long short-term memory network[J]. Journal of Computer Applications, 2021, 41(7): 1908-1914. DOI: 10.11772/j.issn.1001-9081.2020091512
Authors: WU Guangli  LI Leiting  GUO Zhenzhou  WANG Chengxiang
Affiliation: 1. School of Cyberspace Security, Gansu University of Political Science and Law, Lanzhou, Gansu 730070, China; 2. Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education (Northwest Minzu University), Lanzhou, Gansu 730030, China
Abstract: To address the problems that traditional video summarization methods often ignore temporal information and that the extracted video features are overly complex and prone to overfitting, a video summarization generation model based on an improved Bi-directional Long Short-Term Memory (BiLSTM) network was proposed. Firstly, the deep features of the video frames were extracted by a Convolutional Neural Network (CNN), and, to make the generated video summaries more diverse, the BiLSTM was adopted to convert the deep feature recognition task into a sequence feature annotation task over the video frames, so that the model could obtain more context information. Secondly, considering that the generated video summary should be representative, max pooling was fused into the model to reduce the feature dimension, highlighting key information and weakening redundant information, so that the model could learn representative features; the reduced feature dimension also decreased the number of parameters required in the fully connected layer, which helped avoid overfitting. Finally, the importance scores of the video frames were predicted and converted into shot scores, which were used to select the key shots for generating the video summary. Experimental results show that the improved model increases the accuracy of video summarization generation on two standard datasets, TvSum and SumMe: its F1-score values are 1.4 and 0.3 percentage points higher, respectively, than those of DPPLSTM (Determinantal Point Process Long Short-Term Memory), an existing video summarization model based on the Long Short-Term Memory (LSTM) network.
Keywords: video summarization  Convolutional Neural Network (CNN)  Bi-directional Long Short-Term Memory (BiLSTM) network  max pooling
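
Illustrative sketch (not the authors' code): the abstract names two steps that are easy to show in a few lines, namely max pooling to shrink the feature dimension and the conversion of frame importance scores into shot scores for key-shot selection. The Python sketch below uses NumPy only. The abstract does not give the details, so the following are assumptions: the non-overlapping pooling window, averaging frame scores within a shot, and selecting shots with a 0/1 knapsack under a length budget. The function names max_pool_features, shot_scores and select_shots are hypothetical.

```python
import numpy as np


def max_pool_features(features, window):
    """Non-overlapping 1-D max pooling over the feature axis.

    features: array of shape (num_frames, dim); dim must be divisible
    by window. Returns an array of shape (num_frames, dim // window).
    The window size is an assumption; the abstract does not state it.
    """
    num_frames, dim = features.shape
    if dim % window != 0:
        raise ValueError("dim must be divisible by window")
    return features.reshape(num_frames, dim // window, window).max(axis=2)


def shot_scores(frame_scores, boundaries):
    """Turn frame scores into one score per shot.

    boundaries: list of (start, end) frame indices, end exclusive.
    Averaging within a shot is an assumption made for this sketch.
    """
    return np.array([frame_scores[s:e].mean() for s, e in boundaries])


def select_shots(scores, lengths, budget):
    """0/1 knapsack (assumed selection rule): choose shots that maximize
    the total score while the total length stays within budget frames."""
    n = len(scores)
    best = np.zeros((n + 1, budget + 1))
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i, b] = best[i - 1, b]  # option 1: skip shot i-1
            if lengths[i - 1] <= b:      # option 2: take shot i-1
                cand = best[i - 1, b - lengths[i - 1]] + scores[i - 1]
                if cand > best[i, b]:
                    best[i, b] = cand
    # Backtrack to recover which shots were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i, b] != best[i - 1, b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)
```

In use, the frame scores would come from the model's output layer and the shot boundaries from a separate shot segmentation step; the budget would typically be set as a fraction of the video length. All three inputs are outside what the abstract specifies.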