Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey
Citation: CUI Zheng, HU Yongli, SUN Yanfeng, YIN Baocai. Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey[J]. Journal of Beijing University of Technology, 2022, 48(10): 1088-1099. DOI: 10.11936/bjutxb2021040030
Authors: CUI Zheng  HU Yongli  SUN Yanfeng  YIN Baocai
Affiliation: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Funding: National Natural Science Foundation of China (61672071, U1811463, U19B2039)
Abstract: Collaborative analysis and processing of cross-modal data has long been a difficult and active topic in modern artificial intelligence; the main challenge lies in the semantic and heterogeneity gaps between modalities. In recent years, with the rapid development of deep learning theory and technology, deep-learning-based algorithms have made great progress in image and text processing, giving rise to the research topic of visual question answering (VQA). A VQA system takes visual information and a question in text form as input and produces the corresponding answer; its core is the collaborative understanding and processing of visual and textual information. This paper therefore surveys VQA methods in detail, dividing existing methods by their underlying principle into three categories: data fusion, cross-modal attention, and knowledge reasoning. It comprehensively summarizes and analyzes the latest progress of VQA methods, introduces the commonly used VQA datasets, and discusses future research directions.
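The pipeline the abstract describes (encode each modality, fuse, then predict an answer) can be sketched minimally for the data-fusion category of methods. This is an illustrative assumption, not code from the paper: the dimensions, the random stand-ins for pretrained encoders, and the element-wise-product fusion are all hypothetical choices.

```python
# Minimal sketch of a data-fusion VQA forward pass: project visual and
# textual features into a shared space, fuse them, classify an answer.
# All dimensions and weights here are illustrative, not from the survey.
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_TXT, D_JOINT, N_ANSWERS = 2048, 300, 512, 10

# Stand-ins for pretrained encoder outputs (e.g. a CNN image feature
# and an averaged word-embedding question feature).
image_feat = rng.standard_normal(D_IMG)      # visual input
question_feat = rng.standard_normal(D_TXT)   # textual input

# Project both modalities into a shared joint space.
W_img = rng.standard_normal((D_JOINT, D_IMG)) / np.sqrt(D_IMG)
W_txt = rng.standard_normal((D_JOINT, D_TXT)) / np.sqrt(D_TXT)
v = np.tanh(W_img @ image_feat)
q = np.tanh(W_txt @ question_feat)

# Data fusion: element-wise (Hadamard) product of the projected vectors.
joint = v * q

# Answer prediction framed as classification over a closed answer set.
W_ans = rng.standard_normal((N_ANSWERS, D_JOINT)) / np.sqrt(D_JOINT)
logits = W_ans @ joint
answer_id = int(np.argmax(logits))
```

Cross-modal attention methods differ mainly in replacing the single fused vector with question-guided weighting over many image regions; knowledge-reasoning methods additionally consult external facts before classification.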



Keywords: cross-modal data  deep learning  visual question answering (VQA)  data fusion  cross-modal attention  knowledge reasoning
Received: 2021-04-28
Revised: 2021-06-07
