Past is important: Improved image captioning by looking back in time
Abstract: A major development in the area of image captioning is the incorporation of visual attention into the design of language generation models. However, most previous studies only emphasize its role in enhancing visual composition at the current time step, while neglecting its role in global sequence reasoning. This problem appears not only in the captioning model itself but also in the reinforcement learning framework. To tackle this issue, we first propose a Visual Reserved model that enables previous visual context to be considered in the current sequence reasoning. Next, an Attentional-Fluctuation Supervised model is proposed for the reinforcement learning framework. Compared with traditional strategies that take only non-differentiable Natural Language Processing (NLP) metrics as the reward signal, the proposed model regards the fluctuation of previous attention matrices as an important indicator for judging the convergence of the captioning model. The proposed methods have been tested on the MS-COCO captioning dataset and achieve competitive results on the evaluation server of the MS COCO captioning challenge.
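The abstract describes using the fluctuation of successive attention matrices as a convergence indicator during reinforcement learning. A minimal sketch of that idea is shown below; the function names, the mean-absolute-difference measure, and the threshold rule are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch: treat the change between successive attention matrices
# as a stability signal. Small fluctuation suggests the captioning
# model's attention has settled. The measure below (mean absolute
# difference) and the threshold test are assumptions for illustration.

def attention_fluctuation(prev_attn, curr_attn):
    """Mean absolute difference between two attention matrices
    (given as lists of rows of equal shape)."""
    total, count = 0.0, 0
    for prev_row, curr_row in zip(prev_attn, curr_attn):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0

def has_converged(attn_history, threshold=0.01):
    """Declare convergence when the latest step-to-step fluctuation
    falls below the (assumed) threshold."""
    if len(attn_history) < 2:
        return False
    return attention_fluctuation(attn_history[-2], attn_history[-1]) < threshold
```

In this sketch, an unchanged attention matrix yields zero fluctuation and signals convergence, while a large shift in attention keeps training going.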
Keywords: Image captioning; Reinforcement learning; Visual attention
This article has been indexed by ScienceDirect and other databases.