Non-local attention dual-branch network based cross-modal barefoot footprint retrieval
Cite this article: Bao Wenxia, Mao Lili, Wang Nian, Tang Jun, Yang Xianjun, Zhang Yan. Non-local attention dual-branch network based cross-modal barefoot footprint retrieval[J]. Journal of Image and Graphics, 2022, 27(7): 2199-2213.
Authors: Bao Wenxia, Mao Lili, Wang Nian, Tang Jun, Yang Xianjun, Zhang Yan
Affiliation: College of Electronic Information Engineering, Anhui University, Hefei 230601, China; Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
Funding: National Key Research and Development Program of China (2020YFF0303803); National Natural Science Foundation of China (61772032); Key Project of Natural Science Research of Anhui Higher Education Institutions (KJ2021ZD0004, KJ2019A0027)

Received: 2020-12-31
Revised: 2021-04-14

Abstract: Objective Footprints are among the most frequently left and extracted forms of physical evidence at crime scenes, and footprint retrieval and comparison play an important role in criminal investigation. Footprint features are determined by the foot shape and bone structure of the person involved, and are therefore both specific and stable. Footprints also reflect physiological and behavioral characteristics such as height, body shape, gender, age, and walking habits. Medical research shows that each person's footprint pressure distribution is unique. Improving the rate of discovery, extraction, and utilization of footprints in criminal investigation remains challenging, so footprint image retrieval is of great significance and can provide a theoretical basis and technical support for footprint comparison and identification. Because footprints are extracted in different scenarios and with different tools, footprint images exist in different modalities. The global information of a barefoot image is unique across modalities, which makes cross-modal retrieval feasible: a query image in one modality retrieves the corresponding image in the other modality. Traditional cross-modal retrieval methods are mainly based on subspace methods and model-based methods, and they struggle to obtain discriminative features. Deep learning based retrieval methods instead construct a multi-modal common space with a convolutional neural network (CNN); the high-level semantic features of an image are captured through iterative optimization of the network parameters, which reduces multi-modal heterogeneity.

Method A cross-modal barefoot footprint retrieval algorithm based on a non-local attention two-branch network is proposed to address the large intra-class distance and small inter-class distance typical of fine-grained images. The collected barefoot footprint images comprise an optical modality and a pressure modality. A median filter is applied to remove noise from all images, and data augmentation is used to expand the footprint images of each modality. In the feature extraction module, a pre-trained ResNet50 is used as the basic network of each branch to extract the inherent features of each modality. In the feature embedding module, parameter sharing is realized by splicing the feature vectors, and a multi-modal shared space is constructed. All residual blocks in Layer2 and Layer3 of the ResNet50 use a non-local attention mechanism to capture long-range dependencies, obtain a larger receptive field, and quickly highlight the features common to both modalities. At the same time, cross-entropy loss and triplet loss are used to better learn the multi-modal shared space, reducing intra-class differences and increasing inter-class differences between features. Experiments run on two NVIDIA 2070Ti graphics cards, and the network is built in PyTorch. The barefoot footprint images are 224 × 224 pixels. The stochastic gradient descent (SGD) optimizer is used for training; the number of iterations is 81, and the initial learning rate is 0.01. The trained network is evaluated on the validation set to obtain the mean average precision (mAP) and rank values, and the optimal model is saved according to the highest rank-1 value. The saved model is then run on the test set, and the final experimental results are recorded.
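The method description above is concrete enough to sketch in PyTorch. The snippet below is a minimal illustration under stated assumptions, not the authors' released code: it appends a single embedded-Gaussian non-local block after ResNet50's layer2 and layer3 (the paper places the mechanism inside every residual block of those stages), and the 512-dimensional embedding, the triplet margin of 0.3, and all module names are assumptions.

```python
# Minimal PyTorch sketch of the described dual-branch architecture.
# NOT the authors' code: embedding size (512), triplet margin (0.3), and the
# placement of one non-local block after layer2/layer3 are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block: attends over all spatial positions."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, 1)   # query projection
        self.phi = nn.Conv2d(channels, inter, 1)     # key projection
        self.g = nn.Conv2d(channels, inter, 1)       # value projection
        self.out = nn.Conv2d(inter, channels, 1)     # restore channel count

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.phi(x).flatten(2)                     # B x C' x HW
        v = self.g(x).flatten(2).transpose(1, 2)       # B x HW x C'
        attn = F.softmax(q @ k, dim=-1)                # long-range dependencies
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

def make_branch():
    """ResNet50 backbone with non-local attention after layer2 and layer3."""
    r = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    return nn.Sequential(
        r.conv1, r.bn1, r.relu, r.maxpool,
        r.layer1,
        r.layer2, NonLocalBlock(512),     # layer2 outputs 512 channels
        r.layer3, NonLocalBlock(1024),    # layer3 outputs 1024 channels
        r.layer4,
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class DualBranchNet(nn.Module):
    """Two modality-specific backbones feeding one shared embedding."""
    def __init__(self, num_ids, dim=512):
        super().__init__()
        self.optical = make_branch()              # optical footprint branch
        self.pressure = make_branch()             # pressure footprint branch
        self.embed = nn.Linear(2048, dim)         # shared multi-modal space
        self.classifier = nn.Linear(dim, num_ids)

    def forward(self, x_opt, x_prs):
        f = torch.cat([self.optical(x_opt), self.pressure(x_prs)], dim=0)
        e = F.normalize(self.embed(f), dim=1)     # both modalities share weights
        return e, self.classifier(e)

# Double-constraint loss: identity cross-entropy plus triplet metric loss.
ce_loss = nn.CrossEntropyLoss()
tri_loss = nn.TripletMarginLoss(margin=0.3)       # margin is an assumption
```

A training step would combine the two terms, e.g. `loss = ce_loss(logits, ids) + tri_loss(anchor, positive, negative)`, optimized with SGD at the paper's initial learning rate of 0.01.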
Result A cross-modal retrieval dataset is collected and constructed from 138 subjects. Comparative experiments verify the effect of the non-local attention mechanism on retrieval performance, as well as the effects of different loss functions and different pooling methods in the feature embedding module. The proposed algorithm is also compared with the fine-grained cross-modal retrieval method FGC (fine-grained cross-modal) and the RGB-infrared cross-modal person re-identification method HC (hetero-center). The training, validation, and test sets contain 82, 28, and 28 subjects, with 16,400, 5,600, and 5,600 images, respectively. The ratio of query images to gallery images in the validation and test sets is 1:2. The evaluation indexes are the mean mAP (mAP_Avg) and the mean rank-1 (rank1_Avg) over the two retrieval directions. The proposed algorithm achieves the highest precision, with an mAP_Avg of 83.95% and a rank1_Avg of 96.5%; these two indexes are 40.01% and 36.50% higher than FGC, and 26.07% and 19.32% higher than HC, respectively.

Conclusion A cross-modal barefoot footprint retrieval algorithm based on a non-local attention dual-branch network is presented, integrating a non-local attention mechanism with a double-constraint loss. The algorithm accounts for both the uniqueness and the correlation of intra-modal and inter-modal features, and further improves the performance of cross-modal barefoot footprint retrieval, providing a theoretical basis and technical support for footprint comparison and identification.
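Since the Result paragraph reports mAP_Avg and rank1_Avg, the sketch below shows how those figures are typically computed for one retrieval direction. It follows the standard re-identification evaluation protocol rather than the paper's own evaluation script, and all names are placeholders.

```python
# Generic mAP / rank-1 evaluation for one retrieval direction
# (standard protocol sketch; not the authors' evaluation script).
import numpy as np

def evaluate(query_feats, query_ids, gallery_feats, gallery_ids):
    """Features are L2-normalized arrays: Q x D queries, G x D gallery."""
    sim = query_feats @ gallery_feats.T            # cosine similarity matrix
    order = np.argsort(-sim, axis=1)               # most similar first
    aps, rank1 = [], 0.0
    for i in range(len(query_ids)):
        hits = gallery_ids[order[i]] == query_ids[i]   # boolean relevance
        if not hits.any():
            continue
        rank1 += float(hits[0])                    # top-1 match correct?
        prec = np.cumsum(hits) / (np.arange(len(hits)) + 1)
        aps.append((prec * hits).sum() / hits.sum())   # average precision
    return float(np.mean(aps)), rank1 / len(query_ids)

# mAP_Avg and rank1_Avg then average the two retrieval directions:
# optical -> pressure and pressure -> optical.
```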
Keywords: image retrieval; cross-modal footprint retrieval; non-local attention mechanism; two-branch network; barefoot footprint image