Approach for Visual Question Answering Based on Equal Attention Graph Networks

Cite this article:	WANG Tian-xing, YUAN Jia-bin, LIU Xin. Approach for Visual Question Answering Based on Equal Attention Graph Networks[J]. Computer and Modernization (计算机与现代化), 2021, 0(11): 1-6.

Authors:	WANG Tian-xing, YUAN Jia-bin, LIU Xin

Affiliations:	College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106; Information Office (Information Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106

Funding:	National Key Research and Development Program of China (2017YFB0802303); National Natural Science Foundation of China (62076127, 61571226)

Abstract:	Visual question answering (VQA) is a task that combines computer vision with natural language processing. It requires understanding the scene in an image, especially the interactions between different objects. VQA research has advanced considerably in recent years, but traditional methods use holistic feature representations that largely ignore the structure of the given image and cannot effectively locate the objects in the scene. Graph networks, which rely on high-level image representations, can capture semantic and spatial relationships. However, previous graph-network-based VQA methods overlooked the role that the correspondence between relations and the question plays in the answering process. To address this, a VQA model based on equal attention graph networks, named EAGN, is proposed. Its equal attention mechanism gives relation edges the same importance as object nodes, and combining the two provides a fuller basis for answering the question. Experiments show that, compared with other related methods, EAGN performs well and is competitive, and it provides a basis for subsequent related research.

Keywords:	visual question answering; graph networks; computer vision; natural language processing

Received:	2021-12-13
This article is indexed in Wanfang Data and other databases.