Received: 2021-08-24
Revised: 2021-12-06

Cross-modal person re-identification model based on dynamic dual-attention mechanism
Dawei LI, Zhiyong ZENG. Cross-modal person re-identification model based on dynamic dual-attention mechanism[J]. Journal of Computer Applications, 2022, 42(10): 3200-3208.
Authors: Dawei LI, Zhiyong ZENG
Affiliation: College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian 350117, China
Digital Fujian Institute of Big Data Security Technology, Fujian Normal University, Fuzhou, Fujian 350117, China
Abstract: To address the large modal discrepancy between images in cross-modal person re-identification, most existing methods adopt pixel alignment or feature alignment to match images across modalities. To further improve the matching accuracy between the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person captured by different cameras were added to each training batch, so that the network could learn sufficient feature information from a limited number of samples. Secondly, gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge: they retain the structural information of the visible-light images while eliminating the color information, which weakens the network's dependence on color and strengthens its ability to mine structural information. Finally, a Weighted Six-Directional triple Ranking (WSDR) loss suitable for images of the three modalities was proposed; it makes full use of the cross-modal triplet relationships under different viewpoints, optimizes the relative distances between features of multiple modalities, and improves robustness to modal changes. Experimental results on the SYSU-MM01 dataset show that, compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model, the proposed model improves Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points respectively.
Keywords: cross-modal; person re-identification; multi-input dual-stream network; homogeneous augmentation; Weighted Six-Directional triple Ranking (WSDR) loss
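
As a rough illustration of the approach summarized in the abstract, the following PyTorch-style sketch shows how grayscale "bridge" images could be produced by homogeneous augmentation and how a weighted triplet term could be accumulated over the six ordered directions among visible (V), infrared (I) and grayscale (G) features. The function names, the batch-hard soft-margin form and the uniform default weights are assumptions made here for illustration; this is not the paper's exact WSDR formulation.

# Illustrative sketch only (not the authors' released code).
import torch
import torch.nn.functional as F
from torchvision import transforms

# Homogeneous augmentation: converting a visible-light image to 3-channel
# grayscale keeps structural information, drops color, and lets the result
# share the visible stream of the two-stream network.
to_gray = transforms.Grayscale(num_output_channels=3)   # e.g. gray_img = to_gray(visible_img)

def directional_triplet(anchor_feat, anchor_lab, gallery_feat, gallery_lab):
    """Batch-hard soft-margin triplet term for one direction
    (anchor modality -> gallery modality). Assumes every identity in the
    batch appears in both modalities."""
    dist = torch.cdist(anchor_feat, gallery_feat)                  # (Na, Ng) Euclidean distances
    same_id = anchor_lab.unsqueeze(1) == gallery_lab.unsqueeze(0)  # identity mask
    d_ap = dist.masked_fill(~same_id, float('-inf')).max(dim=1).values  # hardest positive
    d_an = dist.masked_fill(same_id, float('inf')).min(dim=1).values    # hardest negative
    return F.softplus(d_ap - d_an).mean()                          # log(1 + exp(d_ap - d_an))

def six_directional_loss(feats, labels, weights=None):
    """Weighted sum of triplet terms over the six ordered pairs of {V, I, G}.
    feats / labels: dicts keyed by 'V', 'I', 'G' holding (batch, dim) features
    and (batch,) identity labels for each modality."""
    mods = ('V', 'I', 'G')
    directions = [(a, b) for a in mods for b in mods if a != b]    # six ordered cross-modal pairs
    if weights is None:
        weights = {d: 1.0 for d in directions}                     # uniform weighting by default
    total = sum(weights[(a, b)] * directional_triplet(feats[a], labels[a], feats[b], labels[b])
                for a, b in directions)
    return total / len(directions)

Passing a non-uniform weights dict would let the easier visible-grayscale directions contribute differently from the harder visible-infrared ones; how the paper actually weights the six directions is not specified in the abstract.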