Question Answering Algorithm on Image Fragmentation Information Based on Deep Neural Network (基于深度神经网络的图像碎片化信息问答算法)

Cite this article:	王一蕾, 卓一帆, 吴英杰, 陈铭钦. 基于深度神经网络的图像碎片化信息问答算法[J]. 计算机研究与发展, 2018, 55(12): 2600-2610. DOI: 10.7544/issn1000-1239.2018.20180606

Authors:	Wang Yilei (王一蕾), Zhuo Yifan (卓一帆), Wu Yingjie (吴英杰), Chen Mingqin (陈铭钦)

Affiliation:	1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108 (yilei@fzu.edu.cn)

Funding:	Natural Science Foundation of Fujian Province (2018J01779)

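Abstract:	Large amounts of fragmented information, with disordered structure and one-sided content, exist in different modalities such as text, images, video and Web pages, and are stored in a highly dispersed manner across different data sources. Existing research extracts, represents and understands multi-modal fragmented information by building visual question answering (VQA) systems. In the VQA task, given a question related to an image, the system infers the corresponding answer. Against this background, the goal is to design a complete framework and algorithm for question answering on image fragmentation information, focusing on models and algorithms for image feature extraction, question text feature extraction, multi-modal feature fusion and answer reasoning. Deep neural network models are built to extract features that represent the image and the question, and an attention mechanism is combined with variational inference to associate the two modalities (image and question) and to infer the answer. Experimental results show that the model can effectively extract and understand multi-modal fragmented information and improves accuracy on the VQA task.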
Keywords:	artificial intelligence; fragmented information; neural network; deep learning; visual question answering
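The abstract names an attention mechanism as what relates image and question features, but it does not give the exact form. As a rough illustration only, the sketch below assumes a common VQA pattern that may differ from the authors' model: dot-product attention of a question vector over image-region feature vectors, element-wise product fusion of the attended image vector with the question vector, and a linear classifier over a fixed answer vocabulary. The function names (`attend`, `fuse`, `predict_answer`) and the toy vectors are hypothetical, and plain Python lists stand in for real CNN/RNN features.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(regions: List[Vector], question: Vector) -> Vector:
    """Hypothetical question-guided attention: score each image-region
    feature by its dot product with the question vector, normalize the
    scores with softmax, and return the weighted sum of region features."""
    weights = softmax([dot(r, question) for r in regions])
    dim = len(question)
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def fuse(image_vec: Vector, question: Vector) -> Vector:
    """Assumed fusion step: element-wise product of the two modalities."""
    return [a * b for a, b in zip(image_vec, question)]


def predict_answer(fused: Vector, answer_weights: List[Vector], answers: List[str]) -> str:
    """Assumed answer reasoning: linear scores over a fixed answer
    vocabulary, softmax, then pick the most probable answer."""
    probs = softmax([dot(w, fused) for w in answer_weights])
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best]
```

For example, `attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])` puts about 0.88 of the attention weight on the second region, because that region aligns with the question vector. Treating answering as classification over frequent answers is a widespread VQA convention, assumed here rather than taken from the paper.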
Question Answering Algorithm on Image Fragmentation Information Based on Deep Neural Network

Wang Yilei, Zhuo Yifan, Wu Yingjie, Chen Mingqin. Question Answering Algorithm on Image Fragmentation Information Based on Deep Neural Network[J]. Journal of Computer Research and Development, 2018, 55(12): 2600-2610. DOI: 10.7544/issn1000-1239.2018.20180606

Authors:	Wang Yilei, Zhuo Yifan, Wu Yingjie, Chen Mingqin

Affiliation:	1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108

Abstract:	A large amount of fragmented information, characterized by disordered structure and one-sided content, is highly dispersed across different data sources in modalities such as text, images, video and Web pages. Current research extracts, represents and understands multi-modal fragmented information by constructing visual question answering (VQA) systems. The VQA task requires providing the correct answer to a given question about a corresponding image. Under the basic setting of the VQA task, this paper aims to design a complete framework and algorithm for question answering on image fragmentation information. The main research covers image feature extraction, question text feature extraction, multi-modal feature fusion and answer reasoning. Deep neural networks are constructed to extract features representing the image and the question. An attention mechanism and variational inference are combined to fuse the image and question features and to reason about answers. Experimental results show that the model can effectively extract and understand multi-modal fragmented information and improves the accuracy of VQA.

Keywords:	artificial intelligence; fragmented information; neural network; deep learning; visual question answering (VQA)
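The abstract also credits variational inference with relating the two modalities, again without giving the formulation. One standard way variational inference appears inside neural networks is a Gaussian latent variable trained with the reparameterization trick and an evidence lower bound (ELBO) that subtracts a KL-divergence term. The sketch below shows only those generic building blocks; whether the authors use a Gaussian latent, and how its mean and log-variance would be computed from image and question features, are assumptions here, and the names `reparameterize`, `kl_to_standard_normal` and `elbo` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1),
    with sigma = exp(log_var / 2). The randomness lives only in eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )
    = -1/2 * sum(1 + log_var - mu^2 - exp(log_var))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def elbo(log_likelihood: float, mu: Vector, log_var: Vector) -> float:
    """Evidence lower bound: (estimated) answer log-likelihood minus the KL term."""
    return log_likelihood - kl_to_standard_normal(mu, log_var)
```

For instance, `kl_to_standard_normal([1.0], [0.0])` returns 0.5, and the KL term is zero when `mu` is all zeros and `log_var` is all zeros. In a full model (assumed, not described in the abstract), `mu` and `log_var` would be produced by a layer applied to the fused image-question feature and `log_likelihood` would come from the answer classifier; written in an autodiff framework, the same arithmetic keeps the sample differentiable with respect to `mu` and `log_var`.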
