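Approach for Visual Question Answering Based on Equal Attention Graph Networks
Citation: WANG Tian-xing, YUAN Jia-bin, LIU Xin. Approach for Visual Question Answering Based on Equal Attention Graph Networks[J]. Computer and Modernization, 2021, 0(11): 1-6. DOI: 10.3969/j.issn.1006-2475.2021.11.001
Authors: WANG Tian-xing (王天星), YUAN Jia-bin (袁家斌), LIU Xin (刘昕)
Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106; Office of Informatization (Information Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106
Funding: National Key Research and Development Program of China (2017YFB0802303); National Natural Science Foundation of China (62076127, 61571226)
Abstract: Visual question answering (VQA) combines computer vision and natural language processing. It requires understanding the scene in an image, in particular the interactions between different objects. VQA research has advanced considerably in recent years, but traditional methods use holistic feature representations, which largely ignore the structure of the given image and cannot effectively localize the objects in the scene. Graph networks, by contrast, build on high-level image representations and can capture semantic and spatial relationships. However, previous graph-network-based VQA methods ignore the role that the correspondence between relations and the question plays in answering. This paper therefore proposes EAGN, a VQA model based on equal attention graph networks. Its equal attention mechanism gives relation edges the same importance as object nodes, and combining the two gives the model a fuller basis for answering a question. Experiments show that, compared with other related methods, EAGN performs well and is competitive, and it provides a basis for subsequent related research.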

Keywords: visual question answering; graph networks; computer vision; natural language processing
Received: 2021-12-13
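The abstract states that the equal attention mechanism gives relation edges the same importance as object nodes, but it does not give the mechanism's equations. The Python sketch below is only a rough illustration of that idea under loudly stated assumptions; it is not the EAGN implementation. It assumes node and edge features have already been projected into the same space as a question vector, scores each feature by a dot product with the question, normalizes with softmax, and fuses the node summary and the edge summary with fixed equal weights (0.5 each). The scoring function, the fixed 0.5/0.5 fusion, and every function name here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question: Sequence[float],
           features: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Assumed question-guided attention: score each feature by its dot product
    with the question, softmax the scores, and return the weighted sum of the
    features together with the attention weights."""
    if not features:
        raise ValueError("features must be non-empty")
    weights = softmax([dot(question, f) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def equal_attention_fuse(question: Sequence[float],
                         node_feats: Sequence[Sequence[float]],
                         edge_feats: Sequence[Sequence[float]]) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical 'equal importance' fusion: attend over object nodes and over
    relation edges separately, then average the two summaries with the same
    weight (0.5 each) so that neither side dominates by construction."""
    node_vec, node_w = attend(question, node_feats)
    edge_vec, edge_w = attend(question, edge_feats)
    fused = [0.5 * n + 0.5 * e for n, e in zip(node_vec, edge_vec)]
    return fused, node_w, edge_w


if __name__ == "__main__":
    # Toy 2-d example: two object nodes and one relation edge.
    q = [1.0, 0.0]
    nodes = [[1.0, 0.0], [0.0, 1.0]]
    edges = [[0.5, 0.5]]
    fused, node_w, edge_w = equal_attention_fuse(q, nodes, edges)
    print("node weights:", node_w)
    print("edge weights:", edge_w)
    print("fused:", fused)
```

Fixed equal weights are the most literal reading of "the same importance"; a learned gate between the node and edge summaries would be an obvious alternative, and the paper's actual formulation may differ from both.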
