Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey
Citation: CUI Zheng, HU Yongli, SUN Yanfeng, YIN Baocai. Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey[J]. Journal of Beijing University of Technology, 2022, 48(10): 1088-1099. DOI: 10.11936/bjutxb2021040030
Authors: CUI Zheng  HU Yongli  SUN Yanfeng  YIN Baocai
Affiliation: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Funding: National Natural Science Foundation of China (61672071, U1811463, U19B2039)
Abstract: Collaborative analysis and processing of cross-modal data has long been a difficult and active topic in modern artificial intelligence; the main challenge lies in the semantic and heterogeneity gaps between modalities. In recent years, with the rapid development of deep learning theory and technology, deep-learning-based algorithms have made great progress in image and text processing, giving rise to the research topic of visual question answering (VQA). A VQA system takes visual information and a question in text form as input and produces the corresponding answer; its core is the collaborative understanding and processing of visual and textual information. This paper therefore surveys VQA methods in detail, dividing existing methods by their underlying principle into three categories: data fusion, cross-modal attention, and knowledge reasoning. It comprehensively summarizes and analyzes the latest progress of VQA methods, introduces the commonly used VQA datasets, and discusses future research directions.
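The pipeline the abstract describes (encode each modality, fuse, then predict an answer) can be sketched minimally for the data-fusion category of methods. This is an illustrative assumption, not code from the paper: the dimensions, the random stand-ins for pretrained encoders, and the element-wise-product fusion are all hypothetical choices.

```python
# Minimal sketch of a data-fusion VQA forward pass: project visual and
# textual features into a shared space, fuse them, classify an answer.
# All dimensions and weights here are illustrative, not from the survey.
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_TXT, D_JOINT, N_ANSWERS = 2048, 300, 512, 10

# Stand-ins for pretrained encoder outputs (e.g. a CNN image feature
# and an averaged word-embedding question feature).
image_feat = rng.standard_normal(D_IMG)      # visual input
question_feat = rng.standard_normal(D_TXT)   # textual input

# Project both modalities into a shared joint space.
W_img = rng.standard_normal((D_JOINT, D_IMG)) / np.sqrt(D_IMG)
W_txt = rng.standard_normal((D_JOINT, D_TXT)) / np.sqrt(D_TXT)
v = np.tanh(W_img @ image_feat)
q = np.tanh(W_txt @ question_feat)

# Data fusion: element-wise (Hadamard) product of the projected vectors.
joint = v * q

# Answer prediction framed as classification over a closed answer set.
W_ans = rng.standard_normal((N_ANSWERS, D_JOINT)) / np.sqrt(D_JOINT)
logits = W_ans @ joint
answer_id = int(np.argmax(logits))
```

Cross-modal attention methods differ mainly in replacing the single fused vector with question-guided weighting over many image regions; knowledge-reasoning methods additionally consult external facts before classification.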



Keywords: cross-modal data  deep learning  visual question answering (VQA)  data fusion  cross-modal attention  knowledge reasoning
Received: 2021-04-28
Revised: 2021-06-07
