Image Caption Generation with Joint Self-Attention and Recurrent Networks

Citation: WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang, ZHANG Yi-tian. Image Caption Generation with Joint Self-Attention and Recurrent Networks [J]. Computer Science, 2021, 48(4): 157-163.
Authors: WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang, ZHANG Yi-tian
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; China Industrial Control Systems Cyber Emergency Response Team, Beijing 100000, China

Abstract: Most current image caption generation models consist of an image encoder based on a convolutional neural network (CNN) and a caption decoder based on a recurrent neural network (RNN). The encoder extracts visual features from the image, and the decoder generates a caption from those features through an attention mechanism. Although such an attention-based RNN decoder can model the interaction between image features and the caption, it ignores self-attention over the interactions within the image or within the caption itself. This paper therefore proposes a model for image caption generation that combines the strengths of recurrent networks and self-attention networks: through self-attention it captures intra-modal and inter-modal interactions simultaneously within a unified attention area, while retaining the inherent advantages of the recurrent network. Experimental results on the MSCOCO dataset show that the proposed method raises the CIDEr score from 1.135 to 1.166, effectively improving the performance of image caption generation.

Keywords: image caption; self-attention mechanism; recurrent neural network

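The abstract describes the architecture only at a high level. Below is a minimal sketch of the core idea in PyTorch (an illustrative assumption, not the authors' code): a decoding step that runs self-attention over a unified sequence concatenating CNN region features and the embeddings of previously generated caption tokens, so intra-modal and inter-modal interactions are modeled in one attention area, and then feeds the attended context into an LSTM cell. All module and variable names are hypothetical.

    # Minimal sketch of the joint self-attention + LSTM decoding step
    # (illustrative assumption, not the authors' released code).
    import torch
    import torch.nn as nn

    class JointAttentionDecoderStep(nn.Module):
        """One decoding step: self-attention over a unified image+caption
        sequence, whose output feeds a recurrent (LSTM) cell."""
        def __init__(self, d_model=512, n_heads=8, vocab_size=10000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # A single self-attention layer sees image regions and caption
            # tokens together, capturing intra- and inter-modal interactions.
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.lstm_cell = nn.LSTMCell(2 * d_model, d_model)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, regions, prev_words, h, c):
            # regions:    (B, R, d_model) CNN region features
            # prev_words: (B, T)          ids of tokens generated so far
            # h, c:       (B, d_model)    LSTM hidden/cell state
            words = self.embed(prev_words)                 # (B, T, d_model)
            unified = torch.cat([regions, words], dim=1)   # unified attention area
            attended, _ = self.self_attn(unified, unified, unified)
            context = attended[:, -1, :]                   # view at newest token
            h, c = self.lstm_cell(
                torch.cat([context, words[:, -1, :]], dim=-1), (h, c))
            return self.out(h), h, c                       # next-word logits + state

    # Example usage with random tensors:
    step = JointAttentionDecoderStep()
    regions = torch.randn(2, 36, 512)           # e.g. 36 detected regions
    prev = torch.randint(0, 10000, (2, 5))      # 5 tokens generated so far
    h = c = torch.zeros(2, 512)
    logits, h, c = step(regions, prev, h, c)    # logits: (2, 10000)

This sketch mirrors the design choice stated in the abstract: the unified attention area lets caption tokens attend to image regions and to each other in a single operation, while the LSTM cell preserves the step-by-step generation behavior of a recurrent decoder.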
Indexed by: VIP, Wanfang Data, and other databases.