Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism

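Citation:	FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu. Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism[J]. Computer Engineering, 2023, 49(2): 288-295.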

Authors:	FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu

Affiliation:	College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China

Funding:	Guizhou Provincial Science and Technology Foundation (Qiankehe Jichu-ZK[2021] Key 001).

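Abstract:	Semantic segmentation of road scenes can help vehicles perceive their surroundings so that they can avoid pedestrians, other vehicles, and obstacles of all kinds, including small objects, which improves driving safety. To address the low recognition accuracy for small objects and the excessive number of network parameters in road scene semantic segmentation, a semantic segmentation model based on a multi-scale attention mechanism is proposed. Exploiting the multi-scale, multi-frequency analysis property of the wavelet transform, a multi-scale wavelet attention module is designed and embedded into the encoder; by fusing feature information across different scales and frequencies, it preserves more edge and contour detail. Hierarchical connections between the encoder and the decoder, together with an improved pyramid pooling module, extract features from multiple aspects, retaining contextual feature information while capturing more image detail. A multi-level loss function is designed to train the network, which accelerates convergence. Experimental results on the Cambridge-driving Labeled Video Database (CamVid) show that the model achieves a mean Intersection over Union (mIoU) of 60.21% with nearly 30% fewer parameters than DeepLabV3+ and DenseASPP. The model improves segmentation accuracy without introducing additional parameters and shows good robustness across different scenes.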
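The wavelet attention module described above rests on one property: a 2D discrete wavelet transform splits a feature map into a low-frequency approximation and three high-frequency detail subbands at half resolution, and repeating the split on the approximation yields a multi-scale pyramid. The sketch below illustrates only that decomposition, assuming the orthonormal Haar wavelet on a plain nested list. The wavelet choice, the function names `haar_dwt2` and `haar_wavedec2`, and the subband labels are hypothetical, chosen for illustration; this is not the authors' module, and how it weights and fuses the subbands is not specified in the abstract and is not shown here.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Details = Tuple[Matrix, Matrix, Matrix]  # (lh, hl, hh); naming is an assumption


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Details]:
    """One level of the orthonormal 2D Haar DWT on an even-sized grid.

    Returns the low-frequency approximation ll and three half-resolution
    detail subbands. Subband naming conventions differ between libraries.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, h, 2):
        r_ll: List[float] = []
        r_lh: List[float] = []
        r_hl: List[float] = []
        r_hh: List[float] = []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # scaled local average (low frequency)
            r_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            r_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            r_hh.append((a - b - c + d) / 2)  # diagonal differences
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, (lh, hl, hh)


def haar_wavedec2(x: Matrix, levels: int) -> Tuple[Matrix, List[Details]]:
    """Multi-scale decomposition: re-apply haar_dwt2 to the approximation."""
    details: List[Details] = []
    approx = x
    for _ in range(levels):
        approx, det = haar_dwt2(approx)
        details.append(det)  # details[0] is the finest scale
    return approx, details


if __name__ == "__main__":
    # A 4x4 image whose intensity jumps between columns 0 and 1,
    # i.e. inside the first 2x2 block of every block row.
    image = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
    ll, (lh, hl, hh) = haar_dwt2(image)
    print(ll)  # [[1.0, 2.0], [1.0, 2.0]]
    print(hl)  # [[-1.0, 0.0], [-1.0, 0.0]]
```

Because the transform is orthonormal, the total energy of the four subbands equals that of the input, so no information is discarded; the vertical edge in the example survives only in the `hl` detail subband, which is the kind of edge and contour cue an attention module over subbands could emphasize.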
Keywords:	deep learning; semantic segmentation; attention mechanism; wavelet transform; pyramid pooling
Received:	2021-11-17
Revised:	2022-03-06
Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism

FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu. Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism[J]. Computer Engineering, 2023, 49(2): 288-295.

Authors:	FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu

Affiliation:	College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China

Abstract:	Semantic segmentation of road scenes can help vehicles perceive the surrounding environment and avoid pedestrians, other vehicles, and small obstacles of all kinds, further improving driving safety. This study proposes a semantic segmentation network based on a multi-scale attention mechanism to address two problems in deep-learning-based road scene segmentation: low recognition accuracy for small objects, and a large number of network parameters that hinders deployment. Based on the multi-scale, multi-frequency analysis capability of the wavelet transform, a multi-scale wavelet attention module is designed and embedded into the encoder. By fusing features of different scales and frequencies, the module retains more edge and contour detail. Hierarchical connections between the encoder and the decoder, together with an improved pyramid pooling module, extract features from multiple aspects, capturing more image detail while retaining contextual information. Training the network with a multi-stage loss function accelerates convergence. Experimental results on the Cambridge-driving Labeled Video Database (CamVid) show that the model achieves a mean Intersection over Union (mIoU) of 60.21% while using nearly 30% fewer parameters than the DeepLabV3+ and DenseASPP models. The model improves segmentation accuracy without additional parameters and shows good robustness in different scenes.
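The headline result is a mean Intersection over Union (mIoU) score: for each class k, IoU is TP / (TP + FP + FN), and mIoU averages these per-class values. The sketch below computes it from a pixel-level confusion matrix accumulated over flattened label lists. The abstract does not describe the evaluation protocol, so the `ignore_index` handling for unlabeled pixels, accumulating one matrix over all pixels, and skipping classes absent from both ground truth and prediction are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int,
                     ignore_index: Optional[int] = None) -> List[List[int]]:
    """Pixel-level confusion matrix: rows are ground truth, columns predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if ignore_index is not None and t == ignore_index:
            continue  # skip unlabeled/void pixels (assumed convention)
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU_k = TP / (TP + FP + FN); None for classes absent from both maps."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class-k pixels labeled as another class
        fp = sum(cm[r][k] for r in range(n)) - tp  # other pixels labeled as class k
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over classes that occur in the ground truth or the prediction."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    if not valid:
        raise ValueError("no class occurs in ground truth or prediction")
    return sum(valid) / len(valid)


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2]
    pred = [0, 1, 1, 1, 2]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(per_class_iou(cm))       # [0.5, 0.6666666666666666, 1.0]
    print(round(mean_iou(cm), 4))  # 0.7222
```

Accumulating a single confusion matrix over the whole test set, rather than averaging per-image scores, keeps rare small-object classes from being dominated by images in which they barely appear; whether the reported 60.21% was computed this way is not stated in the abstract.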

Keywords:	deep learning; semantic segmentation; attention mechanism; wavelet transform; pyramid pooling
