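Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion (original Chinese title: 多尺度特征金字塔融合的街景图像语义分割)
Cite this article: QU Hai-Cheng, WANG Ying, DONG Kang-Long, LIU Wan-Jun. Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion[J]. Computer Systems & Applications, 2024, 33(3): 73-84
Authors: QU Hai-Cheng  WANG Ying  DONG Kang-Long  LIU Wan-Jun
Affiliation: Software College, Liaoning Technical University, Huludao 125105, China
Funding: General Program of the National Natural Science Foundation of China (42271409); Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LIKMZ20220699)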
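Abstract: To address the large variation in object size and the difficulty of extracting multi-scale features efficiently in semantic segmentation of street scene images, this paper proposes a semantic segmentation network, LDPANet. First, dilated convolution is combined with depthwise separable convolution that incorporates a residual learning unit to optimize the encoder structure, which reduces computational complexity while alleviating the vanishing gradient problem. Next, an iterative atrous spatial pyramid with layer-wise passing fuses top-down feature information step by step, improving the effective interaction of contextual information; after multi-scale feature fusion, an attribute attention module is introduced so that the network suppresses redundant information and strengthens important features. In addition, channel expansion upsampling replaces bilinear interpolation upsampling as the decoder, further improving the resolution of the feature maps. Finally, LDPANet reaches an accuracy of 91.8% on the Cityscapes dataset and 87.52% on the CamVid dataset. Compared with network models from recent years, the proposed model extracts pixel position information and spatial dimension information accurately and improves semantic segmentation accuracy.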

Keywords: semantic segmentation  MDSDC  IDCP-LC  attribute attention  channel expansion upsampling  feature fusion
Received: 2023-08-31
Revised: 2023-09-26

Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion
QU Hai-Cheng, WANG Ying, DONG Kang-Long, LIU Wan-Jun. Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion[J]. Computer Systems & Applications, 2024, 33(3): 73-84
Authors: QU Hai-Cheng  WANG Ying  DONG Kang-Long  LIU Wan-Jun
Affiliation: Software College, Liaoning Technical University, Huludao 125105, China
Abstract: This study proposes a semantic segmentation network called LDPANet to address two challenges in semantic segmentation of street scene images: large variation in target size and the difficulty of extracting multi-scale features efficiently. First, dilated (atrous) convolution is combined with depthwise separable convolution that incorporates a residual learning unit to optimize the encoder structure, which reduces computational complexity and alleviates the vanishing gradient problem. Second, the network uses a layer-wise iterative atrous spatial pyramid to fuse top-down feature information sequentially, enhancing the effective interaction of contextual information. After multi-scale feature fusion, an attribute attention module is introduced to suppress redundant information and strengthen important features. Furthermore, channel expansion upsampling replaces bilinear interpolation upsampling as the decoder to further improve the resolution of the feature maps. Finally, LDPANet reaches an accuracy of 91.8% on the Cityscapes dataset and 87.52% on the CamVid dataset. Compared with network models from recent years, the proposed model extracts pixel position information and spatial dimension information accurately and improves semantic segmentation accuracy.
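The abstract's claim that depthwise separable convolution lowers computational cost, and that dilation widens what a kernel covers, can be illustrated with simple arithmetic. The sketch below uses the standard textbook weight counts and the usual effective-kernel-size formula; the layer sizes (256 channels, 3 x 3 kernels) and the dilation rates are hypothetical values chosen for illustration and are not taken from LDPANet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out


def effective_kernel_size(k, dilation):
    """Span, in pixels, covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 256, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std} weights, separable: {sep} weights")
    print(f"reduction factor: {std / sep:.2f}")
    # Hypothetical dilation rates: the weight count stays fixed while the span grows.
    for d in (1, 2, 4, 8):
        print(f"dilation {d}: effective kernel size {effective_kernel_size(k, d)}")
```

The dilated kernel keeps the same number of weights while covering a wider span, which is why the two techniques are often paired when multi-scale context is needed at low cost.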
Keywords: semantic segmentation  mixed depthwise separable dilated convolution (MDSDC)  iterative dilated convolution pyramid with layer cascade (IDCP-LC)  attribute attention  channel expansion upsampling  feature fusion
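The abstract does not spell out how channel expansion upsampling works. One common way to upsample by expanding channels is a depth-to-space rearrangement (often called pixel shuffle): a convolution first produces r*r times as many channels, and those channels are then rearranged into an r-times larger spatial grid. The sketch below shows only the rearrangement step on nested lists. That LDPANet uses this exact scheme is an assumption, and the function name `depth_to_space` is hypothetical.

```python
def depth_to_space(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Uses the usual pixel-shuffle channel ordering:
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w].
    """
    channels = len(x)
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(x[0]), len(x[0][0])
    c_out = channels // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # channel feeding sub-pixel (i, j)
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


if __name__ == "__main__":
    # Four 1 x 1 channels become one 2 x 2 channel when r = 2.
    features = [[[1]], [[2]], [[3]], [[4]]]
    print(depth_to_space(features, 2))
```

Unlike bilinear interpolation, which computes new pixels as fixed weighted averages of their neighbours, this rearrangement lets the preceding learned convolution decide the value of every output pixel.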