Adaptive hybrid attention hashing for deep cross-modal retrieval
Citation: Xinghua LIU, Guitao CAO, Qiubin LIN, Wenming CAO. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670.
Authors: Xinghua LIU  Guitao CAO  Qiubin LIN  Wenming CAO
Affiliation: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
Guangdong Multimedia Information Service Engineering Technology Research Center (Shenzhen University), Shenzhen, Guangdong 518060, China
Software Engineering Institute, East China Normal University, Shanghai 200062, China
Fund: National Natural Science Foundation of China (61771322)
Abstract: Existing hashing methods cannot distinguish the importance of the feature information in different regions during feature learning, nor can they fully exploit label information to mine the correlation between modalities in depth. To address these problems, an Adaptive Hybrid Attention Hashing (AHAH) model for deep cross-modal retrieval was proposed. Firstly, channel attention and spatial attention were combined through weights obtained by self-learning, strengthening the attention paid to relevant target regions in the feature map while weakening the attention paid to irrelevant regions. Secondly, the modality labels were analyzed statistically and the proposed similarity measurement was used to quantify the similarity as a number between 0 and 1, representing the similarity between modalities more finely. On four commonly used datasets, MIRFLICKR-25K, NUS-WIDE, MSCOCO and IAPR TC-12, with a hash code length of 16 bit, the retrieval mean Average Precision (mAP) of the proposed method is 2.25%, 1.75%, 6.8% and 2.15% higher, respectively, than that of the state-of-the-art method Multi-Label Semantics Preserving Hashing (MLSPH). In addition, ablation experiments and an efficiency analysis also demonstrate the effectiveness of the proposed method.

Keywords: cross-modal retrieval  hashing method  deep neural network  adaptive  hybrid attention
Received: 2021-10-22
Revised: 2021-12-20
