Past is important: Improved image captioning by looking back in time
Abstract: A major development in the area of image captioning is the incorporation of visual attention into the design of language generation models. However, most previous studies only emphasize its role in enhancing visual composition at the current time step, while neglecting its role in global sequence reasoning. This problem appears not only in the captioning model itself but also in the reinforcement learning framework. To tackle this issue, we first propose a Visual Reserved model that enables previous visual context to be considered in the current sequence reasoning. Next, an Attentional-Fluctuation Supervised model is proposed for the reinforcement learning framework. Compared with traditional strategies that take only non-differentiable Natural Language Processing (NLP) metrics as the reward signal, the proposed model regards the fluctuation of previous attention matrices as an important indicator for judging the convergence of the captioning model. The proposed methods have been tested on the MS-COCO captioning dataset and achieve competitive results on the evaluation server of the MS COCO captioning challenge.
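The abstract describes using the fluctuation of successive attention matrices as a convergence indicator during reinforcement learning. A minimal sketch of that idea is shown below; the function names, the mean-absolute-difference measure, and the threshold rule are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch: treat the change between successive attention matrices
# as a stability signal. Small fluctuation suggests the captioning
# model's attention has settled. The measure below (mean absolute
# difference) and the threshold test are assumptions for illustration.

def attention_fluctuation(prev_attn, curr_attn):
    """Mean absolute difference between two attention matrices
    (given as lists of rows of equal shape)."""
    total, count = 0.0, 0
    for prev_row, curr_row in zip(prev_attn, curr_attn):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0

def has_converged(attn_history, threshold=0.01):
    """Declare convergence when the latest step-to-step fluctuation
    falls below the (assumed) threshold."""
    if len(attn_history) < 2:
        return False
    return attention_fluctuation(attn_history[-2], attn_history[-1]) < threshold
```

In this sketch, an unchanged attention matrix yields zero fluctuation and signals convergence, while a large shift in attention keeps training going.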
Keywords: Image captioning; Reinforcement learning; Visual attention
This article has been indexed by ScienceDirect and other databases.