
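Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval
Cite this article: LIU Ying, GUO Yingying, FANG Jie, FAN Jiulun, HAO Yu, LIU Jiming. Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 489-511.
Authors: LIU Ying  GUO Yingying  FANG Jie  FAN Jiulun  HAO Yu  LIU Jiming
Affiliations: Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121; International Joint Research Center for Wireless Communication and Information Processing Technology of Shaanxi Province, Xi'an 710121; Key Laboratory of Electronic Information Application Technology for Crime Scene Investigation, Ministry of Public Security, Xi'an University of Posts and Telecommunications, Xi'an 710121, Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121; Key Laboratory of Electronic Information Application Technology for Crime Scene Investigation, Ministry of Public Security, Xi'an University of Posts and Telecommunications, Xi'an 710121, School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121
Abstract: With the rise of deep neural networks, multimodal learning has attracted wide attention. Cross-modal retrieval is an important branch of multimodal learning. Its aim is to mine the relationships between samples of different modalities, that is, to use a sample of one modality to retrieve samples of another modality with similar semantics. In recent years, cross-modal retrieval has gradually become a frontier and hot topic of academic research in China and abroad, and it is an important direction for the future development of information retrieval. First, focusing on the latest progress in deep learning image-text cross-modal retrieval, those based on...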

Keywords: cross-modal retrieval  deep learning  feature learning  image-text matching  real-value representation  binary representation

Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval
LIU Ying,GUO Yingying,FANG Jie,FAN Jiulun,HAO Yu,LIU Jiming.Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval[J].Journal of Frontiers of Computer Science and Technology,2022,16(3):489-511.
Authors:LIU Ying  GUO Yingying  FANG Jie  FAN Jiulun  HAO Yu  LIU Jiming
Affiliation:(Center for Image and Information Processing,Xi'an University of Posts and Telecommunications,Xi'an 710121,China;International Joint Research Center for Wireless Communication and Information Processing Technology of Shaanxi Province,Xi'an 710121,China;Key Laboratory of Electronic Information Application Technology for Crime Scene Investigation,Ministry of Public Security,Xi'an University of Posts and Telecommunications,Xi'an 710121,China;School of Communications and Information Engineering,Xi'an University of Posts and Telecommunications,Xi'an 710121,China)
Abstract: With the rapid development of deep neural networks, multimodal learning has attracted wide attention. Cross-modal retrieval is an important branch of multimodal learning. Its fundamental purpose is to uncover the relationships between samples of different modalities, that is, to use a sample of one modality to retrieve samples of another modality with similar semantics. In recent years, cross-modal retrieval has gradually become a frontier and hot topic of academic research, and it is an important direction for the future development of information retrieval. This paper focuses on the latest developments in deep learning-based image-text cross-modal retrieval and systematically reviews the development trends of real-value representation-based and binary representation-based learning methods. Of these, real-value representation-based methods are used to improve semantic relevance and accuracy, while binary representation-based methods are used to improve the efficiency of image-text cross-modal retrieval and reduce storage space. In addition, the common open datasets in the field of image-text cross-modal retrieval are summarized, and the performance of various algorithms on different datasets is compared. In particular, this paper summarizes and analyzes specific applications of cross-modal retrieval techniques in the fields of public security, media and medicine. Finally, in light of state-of-the-art technologies, development trends and future research directions are discussed.
Keywords:cross-modal retrieval  deep learning  feature learning  image-text matching  real value representation  binary representation
This article is indexed in databases including VIP and Wanfang Data.