Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion
Citation: YU Na, LIU Yan, WEI Xiongju, WAN Yuan. Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion[J]. Journal of Computer Applications, 2022, 42(3): 844-853. DOI: 10.11772/j.issn.1001-9081.2021030392
Authors: YU Na  LIU Yan  WEI Xiongju  WAN Yuan
Affiliation: College of Science, Wuhan University of Technology, Wuhan Hubei 430070, China
Fund project: National Undergraduate Innovation and Entrepreneurship Training Program (202010497047)
Abstract: To address the problem that existing RGB-D indoor scene semantic segmentation methods cannot effectively fuse multi-modal features, a semantic segmentation network named APFNet (Attention mechanism and Pyramid Fusion Network) was proposed, for which two new modules were designed: an attention mechanism fusion module and a pyramid fusion module. The attention mechanism fusion module extracted the attention allocation weights of the RGB features and the Depth features respectively, making full use of the complementarity of the two kinds of features so that the network focused on the multi-modal feature domain carrying more information. The pyramid fusion module fused local and global information with pyramid features at four different scales, thereby extracting scene context and improving the segmentation accuracy of object edges and small-scale objects. By integrating the two fusion modules into a three-branch "encoder-decoder" network, an "end-to-end" output was realized. Comparative experiments were conducted with state-of-the-art methods, such as the multi-level RGB-D residual feature Fusion network (RDF-152), the Attention Complementary features Network (ACNet) and the Spatial information Guided convolution Network (SGNet), on the SUN RGB-D and NYU Depth v2 datasets. Compared with the best-performing method RDF-152, APFNet increased the Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU) by 0.4, 1.1 and 3.2 percentage points respectively, while reducing the number of layers of the encoder network from 152 to 50. The semantic segmentation accuracies for small-scale objects such as pillows and photos, and for large-scale objects such as boards and ceilings, were increased by 0.9 to 3.4 and 12.4 to 18 percentage points respectively. The results show that the proposed APFNet has advantages in dealing with the semantic segmentation of indoor scenes.
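Below is a minimal PyTorch sketch, for illustration only, of the two fusion ideas the abstract describes: a per-modality channel-attention gate (the attention mechanism fusion) and four-scale pyramid pooling over the fused feature (the pyramid fusion). The class names, channel sizes, reduction ratio and pyramid bin sizes (1, 2, 3, 6) are assumptions and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Weights RGB and Depth feature maps with per-modality channel attention
    before summing them (SE-style gates; an assumption, not the paper's exact design)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
        self.rgb_gate = gate()
        self.depth_gate = gate()

    def forward(self, rgb_feat, depth_feat):
        # Attention allocation weights for each modality, then weighted fusion.
        return rgb_feat * self.rgb_gate(rgb_feat) + depth_feat * self.depth_gate(depth_feat)


class PyramidFusion(nn.Module):
    """Pools the fused feature at four scales and concatenates the upsampled
    results with the input, mixing local detail with global scene context."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = channels // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.project = nn.Conv2d(channels + branch_channels * len(bins), channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return self.project(torch.cat([x] + pooled, dim=1))


# Example: fuse hypothetical encoder features of shape (N, 256, 60, 80) from both modalities.
rgb = torch.randn(2, 256, 60, 80)
depth = torch.randn(2, 256, 60, 80)
fused = AttentionFusion(256)(rgb, depth)
context = PyramidFusion(256)(fused)  # same spatial size, 256 channels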

Keywords: RGB-D semantic segmentation  attention mechanism  pyramid fusion  multi-modal  deep supervision
Received: 2021-03-16
Revised: 2021-05-16
