
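A Survey of Multimodal Information Processing Frontiers: Application, Fusion and Pre-training
Cite this article: WU Youzheng, LI Haoran, YAO Ting, HE Xiaodong. A Survey of Multimodal Information Processing Frontiers: Application, Fusion and Pre-training[J]. Journal of Chinese Information Processing, 2022, 36(5): 1-20.
Authors: WU Youzheng  LI Haoran  YAO Ting  HE Xiaodong
Affiliation: JD AI Research, Beijing 100101
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0108600)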
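Abstract: With breakthroughs in single-modality artificial intelligence technologies such as vision, hearing, and language, enabling computers to understand multimodal information in a way closer to how humans do has attracted wide attention from researchers. Meanwhile, the emergence of applications such as image-text social media, short video, video conferencing, live streaming, and virtual digital humans places higher demands on multimodal information processing technology, while also providing massive data and rich application scenarios for multimodal research. This paper first introduces multimodal applications that have recently attracted considerable attention in the natural language processing field. It then surveys the mainstream multimodal fusion methods from the perspectives of single-modality feature representation, the stage at which multimodal features are fused, the network architecture of fusion models, and multimodal fusion under unaligned and missing modalities. It also provides a comprehensive analysis of the latest progress in vision-language cross-modal pre-trained models.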

Keywords: multimodal information processing  multimodal fusion  multimodal pre-training  natural language processing

A Survey of Multimodal Information Processing Frontiers: Application, Fusion and Pre-training
WU Youzheng, LI Haoran, YAO Ting, HE Xiaodong. A Survey of Multimodal Information Processing Frontiers: Application, Fusion and Pre-training[J]. Journal of Chinese Information Processing, 2022, 36(5): 1-20.
Authors:WU Youzheng  LI Haoran  YAO Ting  HE Xiaodong
Affiliation:JD AI Research, Beijing 100101, China
Abstract:Over the past decade, steady innovation and a series of breakthroughs have pushed the limits of modeling single modalities, e.g., vision, speech and language. Beyond this progress in single modalities, the rise of multimodal social networks, short video applications, video conferencing, live video streaming and digital humans creates strong demand for multimodal intelligence and offers fertile ground for multimodal analysis. This paper reviews recent multimodal applications that have attracted intensive attention in the field of natural language processing, and summarizes the mainstream multimodal fusion approaches from the perspectives of single-modality representation, multimodal fusion stage, fusion network architecture, fusion of unaligned modalities, and fusion with missing modalities. In addition, this paper analyzes the latest progress in vision-language pre-training.
Keywords:multimodal information processing  multimodal fusion  multimodal pre-training  natural language processing  