
A Cross-Modal Hashing Method Based on a Feature Pyramid Fusion Representation Network
Cite this article: RUAN Haitao, ZENG Huanqiang, ZHU Jianqing, WEN Tingxi, CAI Canhui. A cross-modal hashing method based on a feature pyramid fusion representation network [J]. Journal of Signal Processing, 2021, 37(7): 1252-1259.
Authors: RUAN Haitao, ZENG Huanqiang, ZHU Jianqing, WEN Tingxi, CAI Canhui
Affiliation: School of Information Science and Engineering, Huaqiao University
Funding: National Natural Science Foundation of China (61871434, 61802136); Natural Science Foundation of Fujian Province for Distinguished Young Scholars (2019J06017); Science and Technology Major Project of Xiamen (3502ZCQ20191005); Industry-University-Research Collaborative Innovation Project of the Xiamen Science and Technology Bureau (3502Z20203033); Education Reform Project of Fujian Province (FBJG20180038)
Abstract: With the explosive growth of multi-modal data, cross-modal retrieval, as the most commonly used method for searching multi-modal data, has attracted increasing attention. However, most existing deep learning methods use only the output of the final fully connected layer at the back end of the model as the modality-specific high-level semantic representation, ignoring the semantic correlations among features of different scales across multiple levels, which is a notable limitation. To address this, this paper proposes a cross-modal hashing retrieval method based on a feature pyramid fusion representation network. The method designs a feature pyramid fusion representation network that extracts and fuses features at multiple levels and different scales, mines the semantic correlations of modality features of different scales across multiple levels, and fully exploits modality-specific features, so that the semantic representation output by the network is more representative. Finally, a triple loss function comprising an inter-modal loss, an intra-modal loss, and a Hamming-space loss is designed to train the model. Experimental results show that the proposed method achieves good cross-modal retrieval performance on both the MIRFLICKR-25K and NUS-WIDE datasets.
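The abstract does not give the exact fusion architecture, but the general idea of pooling feature maps from several network depths, concatenating them, and projecting into a joint hash representation can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the level shapes, the 64-bit code length, and the projection matrix `W` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_avg_pool(fmap):
    # Collapse a (C, H, W) feature map to a (C,) channel descriptor.
    return fmap.mean(axis=(1, 2))

def pyramid_fuse(feature_maps, W_proj):
    # Pool each pyramid level, concatenate across levels, then project
    # linearly into a joint representation; tanh keeps outputs in (-1, 1)
    # so they can later be binarized into hash codes.
    pooled = [global_avg_pool(f) for f in feature_maps]
    fused = np.concatenate(pooled)
    return np.tanh(W_proj @ fused)

# Hypothetical three-level pyramid: shallow, mid, and deep CNN feature maps.
levels = [rng.standard_normal((64, 32, 32)),
          rng.standard_normal((128, 16, 16)),
          rng.standard_normal((256, 8, 8))]
W = rng.standard_normal((64, 64 + 128 + 256)) * 0.01  # 64-dim hash representation
h = pyramid_fuse(levels, W)
codes = np.sign(h)  # binary hash codes in {-1, +1}
```

Each modality (image and text) would have its own such branch; only the fusion pattern, not the backbone, is shown here.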

Keywords: cross-modal retrieval; feature pyramid fusion representation; hashing
Received: 2021-02-19

Feature pyramid fusion representation network for cross-modal hashing
Affiliation:School of Information Science and Engineering, Huaqiao University
Abstract: With the explosive growth of multi-modal data, cross-modal retrieval, as the most commonly used method for searching multi-modal data, has received extensive attention. However, most current deep learning methods use only the output of the final fully connected layer as the modality-specific high-level semantic representation, ignoring the semantic correlation between features of different scales extracted at multiple levels, and thus have certain limitations. In this paper, we propose a cross-modal hash retrieval method based on a feature pyramid fusion representation network. Through feature extraction and fusion at multiple levels and different scales, the network mines the semantic correlation of modality features of different scales at multiple levels and fully exploits modality-specific features, making the semantic representation output by the network more representative. Finally, a triple loss function, comprising an inter-modal loss, an intra-modal loss, and a Hamming-space loss, is designed to train the model. Experimental results on both the MIRFLICKR-25K and NUS-WIDE datasets show that the proposed method achieves good cross-modal retrieval performance.
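The abstract names the three loss terms but not their formulas. A minimal sketch of one plausible instantiation follows, assuming the pairwise negative-log-likelihood form common in deep cross-modal hashing (e.g., DCMH-style) for the inter- and intra-modal terms, and a quantization penalty for the Hamming-space term; the weights `eta` and `gamma` and the exact formulation are assumptions, not the paper's definitions.

```python
import numpy as np

def pairwise_nll_loss(F, G, S):
    # Negative log-likelihood over pairwise similarities:
    # theta_ij = 0.5 * <f_i, g_j>; S_ij in {0, 1} marks semantic similarity.
    theta = 0.5 * F @ G.T
    return float(np.mean(np.log1p(np.exp(theta)) - S * theta))

def total_loss(F_img, F_txt, S, eta=1.0, gamma=1.0):
    # Joint binary codes shared by both modalities.
    B = np.sign(F_img + F_txt)
    # Inter-modal loss: align image features with text features.
    inter = pairwise_nll_loss(F_img, F_txt, S)
    # Intra-modal loss: preserve similarity within each modality.
    intra = pairwise_nll_loss(F_img, F_img, S) + pairwise_nll_loss(F_txt, F_txt, S)
    # Hamming-space (quantization) loss: push real-valued outputs toward binary codes.
    hamming = np.mean((B - F_img) ** 2) + np.mean((B - F_txt) ** 2)
    return inter + eta * intra + gamma * hamming

# Toy batch: 4 image/text pairs with 8-bit representations.
rng = np.random.default_rng(1)
F_img = rng.standard_normal((4, 8))
F_txt = rng.standard_normal((4, 8))
S = (rng.random((4, 4)) > 0.5).astype(float)
L = total_loss(F_img, F_txt, S)
```

In training, `F_img` and `F_txt` would be the pyramid-fused network outputs, and the loss would be minimized by gradient descent over the network parameters.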
Keywords: cross-modal retrieval; feature pyramid fusion representation; hashing