Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion
Citation: YU Na, LIU Yan, WEI Xiongju, WAN Yuan. Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion[J]. Journal of Computer Applications, 2022, 42(3): 844-853. DOI: 10.11772/j.issn.1001-9081.2021030392
Authors: YU Na  LIU Yan  WEI Xiongju  WAN Yuan
Affiliation: College of Science, Wuhan University of Technology, Wuhan Hubei 430070, China
Fund project: National Undergraduate Innovation and Entrepreneurship Training Program (202010497047)
Abstract: To address the problem that existing RGB-D indoor scene semantic segmentation methods cannot effectively fuse multi-modal features, a semantic segmentation network named APFNet (Attention mechanism and Pyramid Fusion Network) was proposed, for which two new modules were designed: an attention mechanism fusion module and a pyramid fusion module. The attention mechanism fusion module extracted the attention allocation weights of the RGB features and the Depth features respectively, making full use of the complementarity of the two kinds of features so that the network focused on the multi-modal feature domain carrying more information. The pyramid fusion module fused local and global information with pyramid features at four different scales, thereby extracting scene context and improving the segmentation accuracy of object edges and small-scale objects. By integrating the two fusion modules into a three-branch "encoder-decoder" network, an "end-to-end" output was realized. Comparative experiments were conducted with state-of-the-art methods, such as the multi-level RGB-D residual feature Fusion network (RDF-152), the Attention Complementary features Network (ACNet) and the Spatial information Guided convolution Network (SGNet), on the SUN RGB-D and NYU Depth v2 datasets. Compared with the best-performing method RDF-152, APFNet increased the Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU) by 0.4, 1.1 and 3.2 percentage points respectively, while reducing the number of layers of the encoder network from 152 to 50. The semantic segmentation accuracies for small-scale objects such as pillows and photos, and for large-scale objects such as boards and ceilings, were increased by 0.9 to 3.4 and 12.4 to 18 percentage points respectively. The results show that the proposed APFNet has advantages in dealing with the semantic segmentation of indoor scenes.
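Below is a minimal PyTorch sketch, for illustration only, of the two fusion ideas the abstract describes: a per-modality channel-attention gate (the attention mechanism fusion) and four-scale pyramid pooling over the fused feature (the pyramid fusion). The class names, channel sizes, reduction ratio and pyramid bin sizes (1, 2, 3, 6) are assumptions and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Weights RGB and Depth feature maps with per-modality channel attention
    before summing them (SE-style gates; an assumption, not the paper's exact design)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
        self.rgb_gate = gate()
        self.depth_gate = gate()

    def forward(self, rgb_feat, depth_feat):
        # Attention allocation weights for each modality, then weighted fusion.
        return rgb_feat * self.rgb_gate(rgb_feat) + depth_feat * self.depth_gate(depth_feat)


class PyramidFusion(nn.Module):
    """Pools the fused feature at four scales and concatenates the upsampled
    results with the input, mixing local detail with global scene context."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = channels // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.project = nn.Conv2d(channels + branch_channels * len(bins), channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return self.project(torch.cat([x] + pooled, dim=1))


# Example: fuse hypothetical encoder features of shape (N, 256, 60, 80) from both modalities.
rgb = torch.randn(2, 256, 60, 80)
depth = torch.randn(2, 256, 60, 80)
fused = AttentionFusion(256)(rgb, depth)
context = PyramidFusion(256)(fused)  # same spatial size, 256 channels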

Keywords: RGB-D semantic segmentation  attention mechanism  pyramid fusion  multi-modal  deep supervision
Received: 2021-03-16
Revised: 2021-05-16
