RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness

Citation: SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng. RGB-D salient object detection based on cross-modal interactive fusion and global awareness [J]. Journal of Software, 2024, 35(4): 1899-1913.
Authors: SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng
Affiliation: School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
Funding: National Natural Science Foundation of China (61976042, 61972068); Liaoning Revitalization Talents Program (XLYC2007023); Innovative Talents Support Program for Universities of Liaoning Province (LR2019020)
Abstract: In recent years, RGB-D salient object detection methods have exploited the rich geometric structure and spatial position information in depth maps to achieve better performance than RGB-only models, and they have attracted considerable attention from the research community. However, existing RGB-D detection models still face the challenge of continuously improving detection performance. The recently emerged Transformer excels at modeling global information, whereas convolutional neural networks (CNNs) excel at extracting local details; effectively combining the strengths of the two to mine both global and local information can therefore improve the accuracy of salient object detection. To this end, this study proposes an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness, which embeds a Transformer network into a U-Net so that the global attention mechanism is combined with local convolution for better feature extraction. First, the U-Net encoder-decoder structure efficiently extracts multi-level complementary features and decodes them stage by stage to generate the saliency map. Then, a Transformer module learns global dependencies among high-level features to enhance their representation, and a progressive upsampling fusion strategy is applied to its inputs to limit the introduction of noise. Furthermore, to mitigate the negative impact of low-quality depth maps, a cross-modal interactive fusion module is designed to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm significantly outperforms other state-of-the-art algorithms.
Keywords: salient object detection (SOD); cross-modal; global attention mechanism; RGB-D detection model
Received: 2022-06-29
Revised: 2022-09-01
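
The abstract describes the first stage only at a high level: a U-Net encoder-decoder extracts multi-level complementary features and decodes them stage by stage into a saliency map. As an illustration only (the paper's exact layers are not given here), the following minimal PyTorch sketch shows such a top-down decoder; the name SaliencyDecoder and all channel widths are assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyDecoder(nn.Module):
    # Hypothetical decoder: fuse multi-level encoder features top-down and
    # predict a one-channel saliency map, as in a generic U-Net-style design.
    def __init__(self, channels=(64, 128, 256, 512)):
        super().__init__()
        # 1x1 convs align every encoder level to a common width before fusion
        self.lateral = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(64, 64, 3, padding=1) for _ in channels[:-1]])
        self.head = nn.Conv2d(64, 1, 1)          # per-pixel saliency logit

    def forward(self, feats):                    # feats[0] is the shallowest level
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        x = laterals[-1]                         # start decoding from the deepest level
        for i in range(len(laterals) - 2, -1, -1):
            x = F.interpolate(x, size=laterals[i].shape[-2:],
                              mode='bilinear', align_corners=False)
            x = self.smooth[i](x + laterals[i])  # skip connection, then smooth
        return torch.sigmoid(self.head(x))       # saliency map in [0, 1]

# Toy multi-level features, e.g. strides 4/8/16/32 of a 256x256 input.
feats = [torch.randn(2, c, 64 >> i, 64 >> i)
         for i, c in enumerate((64, 128, 256, 512))]
print(SaliencyDecoder()(feats).shape)            # torch.Size([2, 1, 64, 64])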

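For the global-awareness step, the abstract states that a Transformer module learns global dependencies among high-level features and that its inputs are fused through progressive upsampling to limit noise. Below is a minimal sketch of that idea, under the assumption that each spatial location becomes one token and that upsampling proceeds in repeated 2x stages; GlobalContextBlock and ProgressiveUpsample are hypothetical names, not the paper's modules.

import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    # Hypothetical: flatten a high-level CNN feature map into tokens and let a
    # standard Transformer encoder model global dependencies between locations.
    def __init__(self, channels=256, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C): one token per location
        tokens = self.encoder(tokens)            # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class ProgressiveUpsample(nn.Module):
    # Hypothetical: upsample in repeated 2x steps, smoothing with a conv after
    # each step, instead of one large interpolation that can amplify noise.
    def __init__(self, channels=256, steps=2):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
            for _ in range(steps)])

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

feat = torch.randn(2, 256, 12, 12)               # e.g. a stride-32 backbone output
out = ProgressiveUpsample()(GlobalContextBlock()(feat))
print(out.shape)                                  # torch.Size([2, 256, 48, 48])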
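
The cross-modal interactive fusion module is described only by its goal: down-weight unreliable depth information before fusing it with RGB features. One plausible reading, not the authors' actual design, is mutual channel gating, in which the pooled global context of each modality re-weights the other before a joint convolution; CrossModalInteractiveFusion below is a hypothetical name.

import torch
import torch.nn as nn

class CrossModalInteractiveFusion(nn.Module):
    # Hypothetical: each modality's pooled context produces per-channel gates
    # for the other modality, so low-quality depth can be suppressed, and the
    # gated features are concatenated and fused by a 3x3 convolution.
    def __init__(self, channels=64):
        super().__init__()
        self.gate_rgb = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True))

    def forward(self, f_rgb, f_depth):           # both: (B, C, H, W)
        f_rgb_gated = f_rgb * self.gate_depth(f_depth)    # depth context gates RGB
        f_depth_gated = f_depth * self.gate_rgb(f_rgb)    # RGB context gates depth
        return self.fuse(torch.cat([f_rgb_gated, f_depth_gated], dim=1))

fused = CrossModalInteractiveFusion()(torch.randn(2, 64, 56, 56),
                                      torch.randn(2, 64, 56, 56))
print(fused.shape)                                # torch.Size([2, 64, 56, 56])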