
Image question answering based on multi-view semantic graph network

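Cite this article:	Qiao You Tian, Zhang Hai Jun, Lu Ming. Image question answering based on multi-view semantic graph network[J]. Application Research of Computers (计算机应用研究), 2023, 40(2).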

Authors:	Qiao You Tian (乔有田), Zhang Hai Jun (张海军), Lu Ming (路明)

Affiliations:	Yangzhou Vocational University (扬州市职业大学), Beijing Wuzi University (北京物资学院), Beihang University (北京航空航天大学)

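Funding:	Beijing Natural Science Foundation (4182037); Beijing Social Science Foundation (21XCB005); Science and Technology Program of the Beijing Municipal Education Commission (KM201810037001)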

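Abstract:	Image question answering based on the fusion of visual and textual features has become one of the most active research topics in automatic question answering. Most existing models use attention mechanisms to mine the associations between an image and the question sentence, but they overlook the associations among image regions and question words within the same modality and across different views. To address this problem, a multi-view semantic graph network model for image question answering (MSGN) is proposed, which mines the semantic associations between images and questions from multiple perspectives. MSGN uses a graph neural network to mine fine-grained intra-modal and inter-modal associations between image regions and question words, thereby improving the accuracy of answer prediction. Experimental results on public image question answering datasets show that mining the semantic associations between images and questions from multiple perspectives improves answer prediction performance.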
Keywords:	image question answering; multi-head attention; automatic question answering; feature fusion; cross-modal analysis
Received:	2022/6/23
Revised:	2022/8/25
Image question answering based on multi-view semantic graph network

Qiao You Tian, Zhang Hai Jun and Lu Ming. Image question answering based on multi-view semantic graph network[J]. Application Research of Computers, 2023, 40(2).

Authors:	Qiao You Tian, Zhang Hai Jun and Lu Ming

Affiliation:	Yangzhou Vocational University, Beijing Wuzi University, Beihang University

Abstract:	Image question answering based on the fusion of visual and textual features has recently become one of the most active research topics in automatic question answering. Most existing models rely on attention mechanisms to explore the relationship between the image and the question sentence, ignoring the correlations between image regions and question words within the same modality and across different views. To address this problem, this paper proposes an image question answering model based on a multi-view semantic graph network (MSGN), which mines the semantic correlation between images and questions from multiple views. The model uses a graph neural network to mine fine-grained intra-modal and inter-modal correlations between image regions and question words. Extensive experiments on public datasets show that MSGN improves the performance of image question answering.

Keywords:	image question answering; multi-head attention model; automatic question answering; feature fusion; cross-modal analysis
