Research advances in explainable visual question answering
Citation: Zhang Yifei, Meng Chunyun, Jiang Zhou, Luan Li, Ernest Domanaanmwi Ganaa. Research advances in explainable visual question answering[J]. Application Research of Computers (计算机应用研究), 2024, 41(1): 10-20.
Authors: Zhang Yifei  Meng Chunyun  Jiang Zhou  Luan Li  Ernest Domanaanmwi Ganaa
Affiliations: 1. School of Economics and Management, Jiangsu University of Science and Technology; 2. School of Computer Science and Communication Engineering, Jiangsu University; 3. School of Public Affairs, University of Science and Technology of China
Funding: Key Project of the National Social Science Fund of China (16AJL008); Youth Project of the Jiangsu Social Science Fund (22EYC001); General Project of Philosophy and Social Science Research in Jiangsu Universities (2019SJA1927)
Abstract: In visual question answering (VQA) tasks, "explainability" refers to the various methods used to explain why a model is effective at a given task. Some existing VQA models lack explainability, so their safe use in real life cannot be guaranteed; especially in fields such as autonomous driving and healthcare, this raises ethical and moral issues and prevents industrial deployment. This paper surveys the various ways of achieving explainability in VQA tasks, dividing them into five categories: image explanation, text explanation, multi-modal explanation, modular explanation, and graph explanation. It discusses the characteristics of each approach and further subdivides some of them. In addition, it introduces several VQA datasets designed to enhance explainability, mainly by incorporating external knowledge bases and annotating image information. Finally, it summarizes the commonly used explainability methods for VQA and, based on the shortcomings of current methods, proposes directions for future research.

Keywords: visual question answering  visual reasoning  explainability  artificial intelligence  natural language processing  computer vision
Received: 2023-05-01
Revised: 2023-12-16

Research advances in explainable visual question answering
Zhang Yifei, Meng Chunyun, Jiang Zhou, Luan Li and Ernest Domanaanmwi Ganaa. Research advances in explainable visual question answering[J]. Application Research of Computers, 2024, 41(1): 10-20.
Authors: Zhang Yifei  Meng Chunyun  Jiang Zhou  Luan Li  Ernest Domanaanmwi Ganaa
Affiliation: 1. School of Economics and Management, Jiangsu University of Science and Technology; 2. School of Computer Science and Communication Engineering, Jiangsu University; 3. School of Public Affairs, University of Science and Technology of China
Abstract: In the context of visual question answering (VQA) tasks, "explainability" refers to the various ways in which researchers can explain why a model works on a given task. The lack of explainability in some existing VQA models means there is no assurance that they can be used safely in real-life applications, especially in fields such as autonomous driving and healthcare; this raises ethical and moral issues that hinder their adoption in industry. This paper introduced various approaches to enhancing explainability in VQA tasks and categorized them into five main categories: image interpretation, text interpretation, multi-modal interpretation, modular interpretation, and graph interpretation. This paper discussed the characteristics of each approach and further presented subdivisions for some of them. Furthermore, it presented several VQA datasets that aim to enhance explainability, primarily by incorporating external knowledge bases and annotating image information. In summary, this paper provided an overview of commonly used interpretable methods for VQA tasks and proposed future research directions based on the identified shortcomings of current approaches.
Keywords: visual question answering  visual reasoning  explainability  artificial intelligence  natural language processing  computer vision