Received: 2021-08-24
Revised: 2021-12-06

Cross-modal person re-identification model based on dynamic dual-attention mechanism
Dawei LI, Zhiyong ZENG. Cross-modal person re-identification model based on dynamic dual-attention mechanism[J]. Journal of Computer Applications, 2022, 42(10): 3200-3208.
Authors: Dawei LI, Zhiyong ZENG
Affiliation: College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian 350117, China
Digital Fujian Institute of Big Data Security Technology, Fujian Normal University, Fuzhou, Fujian 350117, China
Abstract: To address the large modal discrepancy between images in cross-modal person re-identification, most existing methods adopt pixel alignment or feature alignment to match images across modalities. To further improve the matching accuracy between the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person captured by different cameras were added to each training batch, so that the network could learn sufficient feature information from a limited number of samples. Secondly, gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge: they retain the structural information of the visible-light images while eliminating the color information, which weakens the network's dependence on color and strengthens its ability to mine structural information. Finally, a Weighted Six-Directional triple Ranking (WSDR) loss suitable for images of the three modalities was proposed; it makes full use of the cross-modal triplet relationships under different viewpoints, optimizes the relative distances between features of multiple modalities, and improves robustness to modal changes. Experimental results on the SYSU-MM01 dataset show that, compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model, the proposed model improves Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points respectively.
Keywords: cross-modal; person re-identification; multi-input dual-stream network; homogeneous augmentation; Weighted Six-Directional triple Ranking (WSDR) loss
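
As a rough illustration of the approach summarized in the abstract, the following PyTorch-style sketch shows how grayscale "bridge" images could be produced by homogeneous augmentation and how a weighted triplet term could be accumulated over the six ordered directions among visible (V), infrared (I) and grayscale (G) features. The function names, the batch-hard soft-margin form and the uniform default weights are assumptions made here for illustration; this is not the paper's exact WSDR formulation.

# Illustrative sketch only (not the authors' released code).
import torch
import torch.nn.functional as F
from torchvision import transforms

# Homogeneous augmentation: converting a visible-light image to 3-channel
# grayscale keeps structural information, drops color, and lets the result
# share the visible stream of the two-stream network.
to_gray = transforms.Grayscale(num_output_channels=3)   # e.g. gray_img = to_gray(visible_img)

def directional_triplet(anchor_feat, anchor_lab, gallery_feat, gallery_lab):
    """Batch-hard soft-margin triplet term for one direction
    (anchor modality -> gallery modality). Assumes every identity in the
    batch appears in both modalities."""
    dist = torch.cdist(anchor_feat, gallery_feat)                  # (Na, Ng) Euclidean distances
    same_id = anchor_lab.unsqueeze(1) == gallery_lab.unsqueeze(0)  # identity mask
    d_ap = dist.masked_fill(~same_id, float('-inf')).max(dim=1).values  # hardest positive
    d_an = dist.masked_fill(same_id, float('inf')).min(dim=1).values    # hardest negative
    return F.softplus(d_ap - d_an).mean()                          # log(1 + exp(d_ap - d_an))

def six_directional_loss(feats, labels, weights=None):
    """Weighted sum of triplet terms over the six ordered pairs of {V, I, G}.
    feats / labels: dicts keyed by 'V', 'I', 'G' holding (batch, dim) features
    and (batch,) identity labels for each modality."""
    mods = ('V', 'I', 'G')
    directions = [(a, b) for a in mods for b in mods if a != b]    # six ordered cross-modal pairs
    if weights is None:
        weights = {d: 1.0 for d in directions}                     # uniform weighting by default
    total = sum(weights[(a, b)] * directional_triplet(feats[a], labels[a], feats[b], labels[b])
                for a, b in directions)
    return total / len(directions)

Passing a non-uniform weights dict would let the easier visible-grayscale directions contribute differently from the harder visible-infrared ones; how the paper actually weights the six directions is not specified in the abstract.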