Parallel-fusion LSTM with synchronous semantic and visual information for image captioning

Affiliation:	1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; 2. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China

Abstract:	To synchronously combine dynamic semantic and visual information in the decoder of an image captioning model, we propose a novel parallel-fusion LSTM (pLSTM) structure. Two parallel LSTMs, one driven by image attributes and the other by visual information, are fused through their hidden states at every time step, so that the attribute and visual information complement or reinforce each other and yield more accurate captions. Depending on how semantic information from the attribute LSTM is integrated into the visual LSTM, we propose two models: pLSTM with attention (pLSTM-A) and pLSTM with guiding (pLSTM-G). pLSTM-A automatically captures the crucial semantic and visual information for generating captions, whereas pLSTM-G uses the synchronous semantic information to directly adjust the hidden state of the visual LSTM toward the critical image regions. To verify the effectiveness of pLSTM, we conduct a series of experiments on the MSCOCO and Flickr30K datasets, and the results show that our models outperform several state-of-the-art image captioning methods.
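The abstract does not give the pLSTM equations, so the following is only a minimal sketch, under stated assumptions, of the parallel-fusion idea: two LSTM cells stepped in lockstep, one on an attribute vector and one on a visual feature vector, with their hidden states fused at every time step. The functions `fuse_attention` and `fuse_guiding` are hypothetical stand-ins for pLSTM-A and pLSTM-G: the first takes a softmax-weighted sum of the two hidden states, and the second computes a sigmoid gate from the semantic hidden state, rescales the visual hidden state with it, and feeds the result back into the visual LSTM. All names, the bias-free cells, the random weights, and the reuse of the same inputs at every step (a real caption decoder would also feed the previous word embedding) are assumptions for illustration, not the authors' method.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Bias-free LSTM cell (simplification); gates read the concatenation [x; h_prev]."""

    def __init__(self, input_size, hidden_size, rng):
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        self.W = {g: rand_matrix(hidden_size, input_size + hidden_size, rng) for g in "ifoc"}

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev)
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]
        c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


def fuse_attention(h_sem, h_vis, w_score):
    # Hypothetical pLSTM-A-style fusion: score both hidden states with a shared
    # vector, normalize with a softmax, and return the weighted sum.
    scores = [sum(a * b for a, b in zip(w_score, h)) for h in (h_sem, h_vis)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    return [alpha[0] * s + alpha[1] * v for s, v in zip(h_sem, h_vis)]


def fuse_guiding(h_sem, h_vis, W_gate):
    # Hypothetical pLSTM-G-style fusion: a gate computed from the semantic
    # hidden state rescales the visual hidden state element-wise.
    gate = [sigmoid(a) for a in matvec(W_gate, h_sem)]
    return [gi * v for gi, v in zip(gate, h_vis)]


def parallel_fusion_decode(attr_vec, vis_vec, steps, hidden_size, mode, seed=0):
    rng = random.Random(seed)
    sem_lstm = LSTMCell(len(attr_vec), hidden_size, rng)
    vis_lstm = LSTMCell(len(vis_vec), hidden_size, rng)
    w_score = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
    W_gate = rand_matrix(hidden_size, hidden_size, rng)
    h_s, c_s = [0.0] * hidden_size, [0.0] * hidden_size
    h_v, c_v = [0.0] * hidden_size, [0.0] * hidden_size
    fused_states = []
    for _ in range(steps):
        # Both LSTMs advance at the same time step, so their states stay synchronous.
        h_s, c_s = sem_lstm.step(attr_vec, h_s, c_s)
        h_v, c_v = vis_lstm.step(vis_vec, h_v, c_v)
        if mode == "attention":
            fused = fuse_attention(h_s, h_v, w_score)
        elif mode == "guiding":
            fused = fuse_guiding(h_s, h_v, W_gate)
            h_v = fused  # guided state replaces the visual hidden state (assumption)
        else:
            raise ValueError(f"unknown fusion mode: {mode!r}")
        fused_states.append(fused)  # would feed a word classifier in a full model
    return fused_states
```

For example, `parallel_fusion_decode([1.0, 0.0, 0.5], [0.2, 0.4, 0.1, 0.3], steps=3, hidden_size=4, mode="guiding")` returns three fused hidden states of length 4. Because each hidden state is `o * tanh(c)`, and both fusions are either a convex combination or a gating by values in (0, 1), every fused component stays strictly inside (-1, 1).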

Keywords:	Image captioning; Parallel-fusion LSTM; Attention mechanism; Guiding LSTM
This article is indexed in ScienceDirect and other databases.