
Image caption generation algorithm based on multi-attention and multi-scale feature fusion
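Cite this article: CHEN Longjie, ZHANG Yu, ZHANG Yumei, WU Xiaojun. Image caption generation algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications, 2019, 39(2): 354-359.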
Author names: CHEN Longjie  ZHANG Yu  ZHANG Yumei  WU Xiaojun
Author affiliations (the same four are listed for each author): Key Laboratory of Modern Teaching Technology, Ministry of Education (Shaanxi Normal University), Xi'an 710062; Engineering Laboratory of Teaching Information Technology of Shaanxi Province (Shaanxi Normal University), Xi'an 710119; Culture, Education and Intelligent Communication Engineering Technology Research Center (Shaanxi Normal University), Xi'an 710119; School of Computer Science, Shaanxi Normal University, Xi'an 710119
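Funding: National Natural Science Foundation of China (11772178, 61741208, 11502133); Fundamental Research Funds for the Central Universities (GK201801004, GK201803089, GK201703082); Natural Science Foundation of Shaanxi Province (2017JQ6074); National Key Research and Development Program of China (2017YFB1402102); Natural Science Basic Research Plan of Shaanxi Province (2017JM6103, 2017JM6060); Shaanxi Normal University 2017 University-Level Comprehensive Teaching Reform Research Project (17JG33).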
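Abstract: To address the low quality of detail description, the insufficient use of image features, and the single-level recurrent neural network structure in image caption generation, an image caption generation algorithm based on multi-attention and multi-scale feature fusion is proposed. The algorithm uses a pre-trained object detection network to extract image features from different layers of a convolutional neural network, feeds these features layer by layer into a multi-attention structure, and connects the attention structures in turn to a multi-layer recurrent neural network, yielding a multi-level image caption generation model. Residual connections are added to the multi-layer recurrent neural network to improve network performance and to avoid the degradation caused by deepening the network. On the MSCOCO test set, the proposed algorithm reaches BLEU-1 and CIDEr scores of 0.804 and 1.167 respectively, clearly outperforming the top-down image caption generation algorithm based on a single attention structure; manual comparison shows that the captions generated by the proposed algorithm convey image details better.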

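Keywords: Long Short-Term Memory (LSTM) network; image caption; multi-attention mechanism; multi-scale feature fusion; deep neural network
Received: 2018-07-17
Revised: 2018-09-12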

Image caption generation algorithm based on multi-attention and multi-scale feature fusion
CHEN Longjie, ZHANG Yu, ZHANG Yumei, WU Xiaojun. Image caption generation algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications, 2019, 39(2): 354-359.
Authors: CHEN Longjie  ZHANG Yu  ZHANG Yumei  WU Xiaojun
Affiliation: 1. Key Laboratory of Modern Teaching Technology, Ministry of Education (Shaanxi Normal University), Xi'an Shaanxi 710062, China; 2. Engineering Laboratory of Teaching Information Technology of Shaanxi Province (Shaanxi Normal University), Xi'an Shaanxi 710119, China; 3. Culture, Education and Intelligent Communication Engineering Technology Research Center (Shaanxi Normal University), Xi'an Shaanxi 710119, China; 4. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710119, China
Abstract: To address the low quality of detail description, the insufficient use of image features, and the single-level structure of the recurrent neural network in image caption generation, an image caption generation algorithm based on multi-attention and multi-scale feature fusion was proposed. A pre-trained object detection network was used to extract image features from different layers of the convolutional neural network, and these features were fed layer by layer into the multi-attention structure. Each attention part, with features of a different level, was connected in turn to the multi-level recurrent neural network, constructing a multi-level image caption generation network model. By introducing residual connections in the recurrent networks, network performance was improved and the degradation caused by deepening the network was avoided. On the MSCOCO test set, the proposed algorithm achieved BLEU-1 and CIDEr scores of 0.804 and 1.167 respectively, clearly outperforming the top-down image caption generation algorithm based on a single attention structure. Manual observation and comparison also indicate that the captions generated by the proposed algorithm show better image details.
Keywords: Long Short-Term Memory (LSTM) network; image caption; multi-attention mechanism; multi-scale feature fusion; deep neural network
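
The abstract describes the architecture only at a high level, so the following is a minimal sketch of one decoding step under stated assumptions, not the authors' implementation. It assumes dot-product soft attention queried by each layer's previous hidden state, a plain tanh recurrent cell standing in for an LSTM cell, and randomly initialised toy features and weights. All names (`attend`, `rnn_cell`, `decode_step`, `levels`) and all sizes are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(regions, query):
    """Soft attention: weight each region vector by its similarity to the query."""
    weights = softmax([dot(r, query) for r in regions])
    context = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(len(query))]
    return context, weights

def rnn_cell(W, x, context, h_prev):
    """Simplified tanh recurrent cell (assumption: stands in for an LSTM cell)."""
    z = x + context + h_prev  # list concatenation, length 3 * DIM
    return [math.tanh(dot(row, z)) for row in W]

def decode_step(levels, layer_weights, x, hidden):
    """Layer l attends over feature level l, then adds a residual connection."""
    new_hidden, attn_maps = [], []
    layer_input = x
    for regions, W, h_prev in zip(levels, layer_weights, hidden):
        context, attn = attend(regions, h_prev)
        h = rnn_cell(W, layer_input, context, h_prev)
        new_hidden.append(h)
        attn_maps.append(attn)
        # Residual connection: the next layer sees this layer's output plus its input.
        layer_input = [a + b for a, b in zip(h, layer_input)]
    return layer_input, new_hidden, attn_maps

# Toy setup (all sizes are illustrative assumptions).
random.seed(0)
DIM, NUM_LAYERS = 4, 3
REGIONS_PER_LEVEL = [6, 4, 2]  # one feature level per recurrent layer
levels = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(n)]
          for n in REGIONS_PER_LEVEL]
layer_weights = [[[random.uniform(-0.5, 0.5) for _ in range(3 * DIM)]
                  for _ in range(DIM)] for _ in range(NUM_LAYERS)]
x = [random.uniform(-1, 1) for _ in range(DIM)]  # stand-in for a word embedding
hidden = [[0.0] * DIM for _ in range(NUM_LAYERS)]

output, hidden, attn_maps = decode_step(levels, layer_weights, x, hidden)
print([len(a) for a in attn_maps])
```

Because the initial hidden states are all zero in this sketch, the attention weights at the first step are uniform over each level's regions; they only become selective once the hidden states carry information from earlier steps. The residual additions require every layer to share the same dimension, which is why a single `DIM` is used throughout.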
This article is indexed in databases including VIP and Wanfang Data.