
Survey of Cross-media Question Answering and Reasoning Based on Vision and Language

Cite this article:	WU A-ming, JIANG Pin, HAN Ya-hong. Survey of Cross-media Question Answering and Reasoning Based on Vision and Language[J]. Computer Science (计算机科学), 2021, 48(3): 71-78.

Authors:	WU A-ming (武阿明), JIANG Pin (姜品), HAN Ya-hong (韩亚洪)

Affiliation:	College of Intelligence and Computing, Tianjin University, Tianjin 300350, China (all three authors)

Funding:	National Natural Science Foundation of China (Key Program); Research on Key Theories and Methods for Cross-media Intelligent Question Answering and Reasoning

Abstract:	Cross-media question answering and reasoning based on vision and language is an active research topic in artificial intelligence. Given visual content and a related question, the goal is for a model to return the correct answer. With the rapid development of deep learning and its wide application in computer vision and natural language processing, the field has progressed quickly. This paper first systematically reviews current work on cross-media question answering and reasoning based on vision and language, covering research progress on models and algorithms for image-based visual question answering and reasoning, video-based visual question answering and reasoning, and visual commonsense reasoning. Image-based visual question answering and reasoning is subdivided into three categories: methods based on multi-modal fusion, on attention mechanisms, and on reasoning. Visual commonsense reasoning is subdivided into two categories: reasoning-based and pre-training-based methods. The paper then summarizes the commonly used question answering and reasoning datasets, together with the experimental results of representative models on these datasets. Finally, it discusses future directions for cross-media question answering and reasoning based on vision and language.

Keywords:	Cross-media question answering and reasoning; Image-based question answering and reasoning; Video-based question answering and reasoning; Visual commonsense question answering and reasoning; Multi-modal fusion; Attention mechanism; Pre-training
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
