
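Anchor-based Unsupervised Cross-modal Hashing (基于锚点的无监督跨模态哈希算法)
Cite this article: 胡鹏, 彭玺, 彭德中. 基于锚点的无监督跨模态哈希算法[J]. 软件学报, 2024, 35(8): 3739-3751
Authors: 胡鹏 (HU Peng), 彭玺 (PENG Xi), 彭德中 (PENG De-Zhong)
Affiliations: College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China; College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China; Chengdu Ruibei Yingte Information Technology Co., Ltd., Chengdu 610094, Sichuan, China
Funding: National Natural Science Foundation of China (62102274, 62176171, U21B2040, U19A2078); Sichuan Science and Technology Program (2021YFS0389, 2022YFQ0014, 2022YFSY0047, 2022YFH0021); Fundamental Research Funds for the Central Universities (YJ202140); China Postdoctoral Science Foundation (2021M692270)
Abstract: Graph-based unsupervised cross-modal hash learning offers low storage cost and high retrieval efficiency; it has attracted wide attention from academia and industry and has become an indispensable tool for cross-modal retrieval. However, the high computational complexity of graph construction hinders its use in large-scale multi-modal applications. This work addresses two key challenges facing graph-based unsupervised cross-modal hash learning: 1) how to construct the graph efficiently in unsupervised cross-modal hash learning, and 2) how to handle discrete-value optimization in cross-modal hash learning. For these two problems, it proposes anchor-graph-based cross-modal learning and a differentiable hash layer, respectively. Specifically, a number of image-text pairs are first randomly selected from the training set as the anchor set, which serves as an intermediary for computing the graph matrix of each batch of data; this graph matrix guides cross-modal hash learning, greatly reducing space and time overhead. Second, the proposed differentiable hash layer computes directly with binary codes during forward propagation and still produces gradients for network updates during backpropagation, without continuous-value relaxation, and thus achieves better hash encoding. Finally, a cross-modal ranking loss is introduced so that ranking results are taken into account during training, improving cross-modal retrieval accuracy. Comparisons with 10 cross-modal hashing algorithms on 3 widely used datasets verify the effectiveness of the proposed algorithm.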

Keywords: unsupervised hash learning; cross-modal retrieval; anchor graph; differentiable hashing; common Hamming space
Received: 2021-08-30
Revised: 2022-10-13
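
The anchor-graph step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact construction, so the code below uses the classical anchor-graph form S = Z Λ⁻¹ Zᵀ as an assumption: `Z` holds softmax affinities between each batch item and a randomly chosen anchor set, and `Λ` is the diagonal matrix of column sums of `Z`. The function names, the `temperature` parameter, the distance-based similarity, and the single-modality toy features are all hypothetical; with image-text pairs, one such matrix would presumably be computed per modality.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def anchor_affinity(batch, anchors, temperature=1.0):
    """Z[i][k]: soft assignment of batch item i to anchor k (rows sum to 1)."""
    z = []
    for x in batch:
        # Assumption: negative squared Euclidean distance as the similarity.
        scores = [-sum((a - b) ** 2 for a, b in zip(x, anchor)) / temperature
                  for anchor in anchors]
        z.append(softmax(scores))
    return z


def anchor_graph(z):
    """Batch graph S = Z * diag(column sums of Z)^-1 * Z^T, of size n x n."""
    n, m = len(z), len(z[0])
    col_sums = [sum(z[i][k] for i in range(n)) for k in range(m)]
    return [[sum(z[i][k] * z[j][k] / col_sums[k] for k in range(m))
             for j in range(n)] for i in range(n)]


random.seed(0)
# Toy stand-in for training features (50 items, 4 dimensions).
train_features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
anchors = random.sample(train_features, 5)  # randomly selected anchor set
batch = train_features[:8]                  # one mini-batch
z = anchor_affinity(batch, anchors)
s = anchor_graph(z)
```

Each batch only needs its n × m affinities to the m anchors, so no N × N graph over the full training set is ever stored. In this particular form, `s` is symmetric and each of its rows sums to 1.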

Anchor-based Unsupervised Cross-modal Hashing
HU Peng, PENG Xi, PENG De-Zhong. Anchor-based Unsupervised Cross-modal Hashing[J]. Journal of Software, 2024, 35(8): 3739-3751
Authors: HU Peng, PENG Xi, PENG De-Zhong
Affiliation: College of Computer Science, Sichuan University, Chengdu 610065, China; College of Computer Science, Sichuan University, Chengdu 610065, China; Chengdu Ruibei Yingte Information Technology Co., Ltd., Chengdu 610094, China
Abstract: Thanks to its low storage cost and high retrieval speed, graph-based unsupervised cross-modal hash learning has attracted much attention from academia and industry and has become an indispensable tool for cross-modal retrieval. However, the high computational complexity of graph construction prevents its use in large-scale multi-modal applications. This study addresses two important challenges facing graph-based unsupervised cross-modal hash learning: 1) how to construct graphs efficiently in unsupervised cross-modal hash learning, and 2) how to handle discrete optimization in cross-modal hash learning. To address these two problems, the study presents anchor-based cross-modal learning and a differentiable hash layer, respectively. Specifically, it first randomly samples a number of image-text pairs from the training set as an anchor set and uses this anchor set as an intermediary to compute the graph matrix of each batch of data. The graph matrix then guides cross-modal hash learning, which greatly reduces space and time costs. Second, the proposed differentiable hash layer computes directly with binary codes during forward propagation and still produces gradients to update the network during backpropagation, without continuous-value relaxation, and thus yields better hash codes. Finally, the study introduces a cross-modal ranking loss so that ranking results are considered during training, which improves cross-modal retrieval accuracy. Comparisons with 10 cross-modal hashing algorithms on three commonly used datasets verify the effectiveness of the proposed algorithm.
Keywords: unsupervised hashing learning; cross-modal retrieval; anchor graph; differentiable hashing; common Hamming space
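
The abstract says the hash layer uses binary codes in the forward pass yet still yields gradients in the backward pass, but it does not describe the mechanism. As a rough illustration only, the sketch below uses the clipped straight-through estimator, a commonly used stand-in that may differ from the authors' layer. All names (`hash_forward`, `hash_backward`, `train_step`), the toy squared-error loss, and the values of `clip` and `lr` are assumptions.

```python
def hash_forward(h):
    """Forward pass: binarise real-valued activations to codes in {-1, +1}."""
    return [1.0 if v >= 0 else -1.0 for v in h]


def hash_backward(grad_codes, h, clip=1.0):
    """Backward pass of the clipped straight-through estimator.

    sign() has zero gradient almost everywhere, so the incoming gradient is
    copied through to the activations, and zeroed where |h| > clip.
    """
    return [g if abs(v) <= clip else 0.0 for g, v in zip(grad_codes, h)]


def train_step(h, target, lr=0.1):
    """One gradient step on L = 0.5 * sum((sign(h) - target)^2) w.r.t. h."""
    codes = hash_forward(h)
    grad_codes = [c - t for c, t in zip(codes, target)]  # dL/dcodes
    grad_h = hash_backward(grad_codes, h)                # straight-through
    return [v - lr * g for v, g in zip(h, grad_h)]


# Toy activations and the binary codes we would like them to produce.
h = [0.3, -0.2, 0.05, -0.6]
target = [-1.0, 1.0, 1.0, -1.0]
for _ in range(10):
    h = train_step(h, target)
codes = hash_forward(h)
```

The loss is evaluated on the binary codes themselves rather than on a relaxed surrogate such as `tanh`, and the update still moves the underlying activations until `codes` matches `target` in this toy setup.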