
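Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention
Cite this article: CHEN Xiaolei, ZHANG Pengcheng, LU Yubing, CAO Baoning. Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2246-2255. doi: 10.11999/JEIT220684
Authors: CHEN Xiaolei, ZHANG Pengcheng, LU Yubing, CAO Baoning
Affiliations: 1. School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China; 2. Gansu Provincial Key Laboratory of Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou 730050, China; 3. National Experimental Teaching Demonstration Center of Electrical and Control Engineering, Lanzhou University of Technology, Lanzhou 730050, China
Funding: National Natural Science Foundation of China (61967012)
Abstract: Current saliency detection methods for panoramic images suffer from low detection accuracy, slow model convergence, and heavy computation. To address these problems, this paper proposes a U-shaped network based on a Robust vision transformer and Multiple attention (URMNet). The model uses spherical convolution to extract multi-scale features of the panoramic image, which mitigates the distortion introduced by equirectangular projection. A robust vision transformer module extracts the salient information contained in feature maps at four scales, using convolutional embedding to reduce the resolution of the feature maps and improve the robustness of the model. A multiple attention module selectively fuses multi-dimensional attention according to the relationship between spatial attention and channel attention. Finally, features from multiple layers are fused progressively to form the saliency map of the panoramic image. A latitude-weighted loss function gives the model faster convergence. Experiments on two public datasets show that, owing to the robust vision transformer module and the multiple attention module, the proposed model outperforms six other state-of-the-art methods and further improves the saliency detection accuracy for panoramic images.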

Keywords: Panoramic image; Saliency detection; Convolutional Neural Network (CNN); Vision transformer; Attention mechanism
Received: 2022-05-26
Revised: 2022-08-18
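The abstract does not give the form of the latitude-weighted loss. As a hedged illustration only, the sketch below assumes the cos(latitude) row weighting commonly used for equirectangular images (as in the WS-PSNR metric) and applies it to a plain mean squared error between a predicted and a ground-truth saliency map. The function names `latitude_weights` and `latitude_weighted_mse` and the choice of MSE are assumptions for this example, not the authors' formulation.

```python
import math


def latitude_weights(height: int) -> list[float]:
    """Per-row weights cos(latitude) for an equirectangular (ERP) image.

    Row i is centred at latitude phi_i = (0.5 - (i + 0.5) / height) * pi, so rows
    near the poles (stretched most by the projection) get weights close to 0 and
    rows near the equator get weights close to 1.
    """
    return [math.cos((0.5 - (i + 0.5) / height) * math.pi) for i in range(height)]


def latitude_weighted_mse(pred: list[list[float]], target: list[list[float]]) -> float:
    """Latitude-weighted mean squared error between two H x W saliency maps.

    ASSUMPTION: a plain weighted MSE normalised by the total weight; the paper's
    actual latitude-weighted loss may use a different base loss or weighting.
    """
    height = len(pred)
    if height == 0 or height != len(target):
        raise ValueError("pred and target must be non-empty and have equal height")
    weights = latitude_weights(height)
    weighted_error = 0.0
    total_weight = 0.0
    for w, p_row, t_row in zip(weights, pred, target):
        if len(p_row) != len(t_row):
            raise ValueError("pred and target rows must have equal width")
        for p, t in zip(p_row, t_row):
            weighted_error += w * (p - t) ** 2  # each pixel scaled by its row weight
            total_weight += w
    if total_weight == 0.0:
        raise ValueError("maps must have at least one column")
    return weighted_error / total_weight


if __name__ == "__main__":
    target = [[0.0]] * 4
    # The same unit error costs less in a polar row than in a near-equator row.
    print(latitude_weighted_mse([[1.0], [0.0], [0.0], [0.0]], target))
    print(latitude_weighted_mse([[0.0], [1.0], [0.0], [0.0]], target))
```

For a 4-row, 1-column map, a unit error in the polar row contributes about 0.146 to the loss, against about 0.354 for the same error in a row next to the equator, so the loss emphasises the less distorted regions of the projection.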
