Survey of Multimodal Deep Learning (多模态深度学习综述)
Cite this article: SUN Yingying, JIA Zhentang, ZHU Haoyu. Survey of Multimodal Deep Learning[J]. Computer Engineering and Applications (计算机工程与应用), 2020, 56(21): 1-10. DOI: 10.3778/j.issn.1002-8331.2002-0342
Authors: SUN Yingying (孙影影), JIA Zhentang (贾振堂), ZHU Haoyu (朱昊宇)
Affiliation: College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China
Funding: Young Scientists Fund of the National Natural Science Foundation of China
Abstract: A modality is a channel through which people receive information, such as hearing, vision, smell, or touch. Multimodal learning exploits the complementarity between modalities and removes the redundancy among them in order to learn better feature representations. Its goal is to build models that can process and relate information from multiple modalities. It is an active multidisciplinary field of growing importance and considerable potential. Current research concentrates on multimodal learning across images, video, audio, and text. This paper surveys practical applications of multimodal learning, including audio-visual speech recognition, image-text sentiment analysis, and collaborative annotation, and it discusses the core problems of the field: matching and classification, and alignment representation learning. It also introduces the datasets commonly used in multimodal learning and outlines future trends in the field.
Keywords: multimodal learning; multimodal application; multimodal fusion; shared representation space
This article is indexed in Wanfang Data (万方数据) and other databases.