Research on a visual question answering algorithm based on a spatial attention reasoning mechanism

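Cite this article:	Li Zhitao, Zhou Zhiping, Ye Qin. Research on a visual question answering algorithm based on a spatial attention reasoning mechanism[J]. Application Research of Computers (计算机应用研究), 2021, 38(3): 952-955.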

Authors:	Li Zhitao, Zhou Zhiping, Ye Qin

Affiliation:	School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China (all three authors)

Funding:	Project supported by the National Natural Science Foundation of China

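Abstract:	Addressing existing attention-based multimodal learning, this paper studies in depth the self-association among words in the text context and the spatial positional relationships among object regions in the image. Building on an analysis of existing attention networks, it proposes using a self-attention (SA) module and a spatial reasoning attention (SRA) module to map text information to image objects, finally producing a fused feature output. Compared with other attention mechanisms, SA and SRA match text information to image object regions more effectively. The model is trained and validated on the VQAv2 dataset, where it achieves an accuracy of 64.01%.
Keywords:	visual question answering; attention mechanism; multimodal learning; self-attention; spatial reasoning attention
Received:	2019/12/16
Revised:	2021/2/3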
Algorithm of visual question answering based on spatial attention reasoning mechanism

Li Zhitao, Zhou Zhiping, Ye Qin. Algorithm of visual question answering based on spatial attention reasoning mechanism[J]. Application Research of Computers, 2021, 38(3): 952-955.

Authors:	Li Zhitao, Zhou Zhiping and Ye Qin

Affiliation:	School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China

Abstract:	Addressing existing multimodal learning based on attention mechanisms, this paper studies the self-association within the text context and the spatial positional relationships among object regions of the image. Based on an analysis of existing attention networks, it proposes using a self-attention (SA) module and a spatial reasoning attention (SRA) module to map text information to image objects, finally obtaining a fused feature output. Compared with other attention mechanisms, SA and SRA can better match text information to image object regions. The model is trained and validated on the VQAv2 dataset, where it achieves an accuracy of 64.01%.

Keywords:	visual question answering (VQA); attention mechanism; multimodal learning; self-attention; spatial reasoning attention
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).