Visual question answering method based on relational reasoning and gating mechanism

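Cite this article:	WANG Xin, CHEN Qiao-hong, SUN Qi, JIA Yu-bo. Visual question answering method based on relational reasoning and gating mechanism[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(1): 36-46. DOI: 10.3785/j.issn.1008-973X.2022.01.004

Authors:	WANG Xin, CHEN Qiao-hong, SUN Qi, JIA Yu-bo

Affiliation:	School of Information, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China

Funding:	Natural Science Foundation of Zhejiang Province (LY17E050028)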

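Abstract:	Existing attention mechanisms have limited ability to understand the relationships between visual objects, and their accuracy is poor. To address this, a relational reasoning module and an adaptive gating mechanism are added on top of the attention mechanism. The method uses the attention mechanism to attend to multiple visual regions related to the question, and uses the binary and multiple relational reasoning in the relational reasoning module to strengthen the connections between visual regions. The resulting visual attention feature and visual relationship feature are fed into the adaptive gate, which dynamically controls the contribution of the two features to the predicted answer. Experimental results on the VQA1.0 and VQA2.0 datasets show that the model improves overall accuracy by about 2% compared with advanced models such as DCN, MFB, MFH and MCB. The model based on relational reasoning and gating mechanism understands image content better and effectively improves the accuracy of visual question answering.
Keywords:	visual question answering (VQA); attention mechanism; visual region; relational reasoning; adaptive gating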
Visual question answering method based on relational reasoning and gating mechanism

Xin WANG, Qiao-hong CHEN, Qi SUN, Yu-bo JIA. Visual question answering method based on relational reasoning and gating mechanism[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(1): 36-46. DOI: 10.3785/j.issn.1008-973X.2022.01.004

Authors:	Xin WANG, Qiao-hong CHEN, Qi SUN, Yu-bo JIA

Abstract:	Existing attention mechanisms lack an understanding of the relationships between visual objects and have low accuracy. To address these problems, a relational reasoning module and an adaptive gating mechanism were added on top of the attention mechanism. The attention mechanism was used to focus on multiple visual regions related to the question. Binary (pairwise) relational reasoning and multiple relational reasoning in the relational reasoning module were used to strengthen the connections between the visual regions. The resulting visual attention feature and visual relationship feature were fed into the adaptive gate, which dynamically controlled the contribution of the two features to the predicted answer. Experimental results on the VQA1.0 and VQA2.0 datasets showed that the model improved overall accuracy by about 2% compared with advanced models such as DCN, MFB, MFH and MCB. The model based on relational reasoning and gating mechanism can better understand image content and effectively improve the accuracy of visual question answering.

Keywords:	visual question answering (VQA); attention mechanism; visual region; relational reasoning; adaptive gating
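
The abstract says an adaptive gate dynamically controls how much the visual attention feature and the visual relationship feature each contribute to the answer, but it does not give the gate's formula. The sketch below is one common way to build such a gate, offered only as an illustration under stated assumptions: a sigmoid gate computed from the concatenated features, used for an element-wise convex combination. The function name `adaptive_gate`, the parameters `weights` and `bias`, and the fusion rule itself are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_gate(v_att, v_rel, weights, bias):
    """Fuse two equally sized feature vectors with an element-wise gate.

    Assumed form (not from the paper):
        g     = sigmoid(W [v_att; v_rel] + b)
        fused = g * v_att + (1 - g) * v_rel

    v_att   : visual attention feature, length d
    v_rel   : visual relationship feature, length d
    weights : d rows, each of length 2 * d (the matrix W)
    bias    : length d (the vector b)
    Returns the gate values and the fused feature.
    """
    d = len(v_att)
    if len(v_rel) != d or len(weights) != d or len(bias) != d:
        raise ValueError("inconsistent feature or parameter sizes")

    concat = list(v_att) + list(v_rel)
    # One gate value per feature dimension.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    # A gate near 1 favors the attention feature, near 0 the relation feature.
    fused = [g * a + (1.0 - g) * r for g, a, r in zip(gate, v_att, v_rel)]
    return gate, fused


if __name__ == "__main__":
    # Toy values chosen only for illustration.
    attention_feature = [1.0, 3.0]
    relation_feature = [3.0, 1.0]
    w = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.5, -0.5]]
    b = [0.0, 0.0]
    gate, fused = adaptive_gate(attention_feature, relation_feature, w, b)
    print("gate:", gate)
    print("fused:", fused)
```

In a trained model the gate parameters would be learned jointly with the rest of the network, and the gate could also be conditioned on the question representation; both choices are left open here because the abstract does not specify them.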
