VQA: Visual Question Answering
Authors: Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
Affiliation: 1. Virginia Tech, Blacksburg, USA; 2. Microsoft Research, Redmond, USA; 3. Facebook AI Research, Menlo Park, USA; 4. Georgia Institute of Technology, Atlanta, USA
Abstract: We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
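The automatic evaluation the abstract alludes to is, in the paper, a consensus metric over the ten human answers collected per question: a predicted answer is credited min(#humans who gave that answer / 3, 1), so it scores full marks once at least three annotators agree with it. Below is a minimal sketch of that metric; the function name is illustrative, and normalization is deliberately simplified to lowercasing and whitespace trimming (the official evaluator also handles punctuation, articles, and number words).

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus VQA accuracy: min(#matching human answers / 3, 1).

    `human_answers` is the list of (typically 10) annotator answers
    for a single question. Normalization here is intentionally minimal.
    """
    pred = predicted.strip().lower()
    matches = sum(ans.strip().lower() == pred for ans in human_answers)
    return min(matches / 3.0, 1.0)


# Example: 4 of 10 annotators said "yellow", so it earns full credit;
# "gold" was given by only 1 annotator, so it earns partial credit.
humans = ["yellow", "yellow", "yellow", "yellow", "gold",
          "yellow-ish", "amber", "yellow and white", "cream", "tan"]
print(vqa_accuracy("yellow", humans))  # 1.0
print(vqa_accuracy("gold", humans))    # 0.333...
```

Averaging this per-question score over the dataset gives the headline accuracy the paper uses to compare baselines against human performance.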
Indexed in SpringerLink and other databases.