
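Gated Multi-Layer Fusion for Real-Time Semantic Segmentation
Citation: Zhang Canlong, Cheng Qinghe, Li Zhixin, Wang Zhiwen. Gated Multi-Layer Fusion for Real-Time Semantic Segmentation[J]. Journal of Computer-Aided Design & Computer Graphics, 2020, 32(9): 1442-1449.
Authors: Zhang Canlong  Cheng Qinghe  Li Zhixin  Wang Zhiwen
Affiliations: Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin 541004; College of Computer Science and Communication Engineering, Guangxi University of Science and Technology, Liuzhou 545006
Funding: Innovation Project of the School of Engineering; Systematic Research Project of the Guangxi Key Laboratory of Multi-Source Information Mining and Security; Guangxi Bagui Scholars Innovative Research Team; Natural Science Foundation of Guangxi; National Natural Science Foundation of China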
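Abstract: In semantic segmentation, pixels lost during a model's downsampling stages are difficult to recover precisely during upsampling. To address this, a real-time semantic segmentation method based on gated multi-layer fusion is proposed. To meet real-time requirements, a lightweight model is used as the backbone network for feature extraction. To improve pixel recovery, a laterally connected gated attention structure is designed: it filters the target features and, by passing them laterally from the downsampling path to the upsampling path, increases the diversity of information in the upsampled feature maps, which improves the accuracy with which those feature maps are restored. In addition, a multi-layer fusion structure is proposed that integrates semantic information from different network layers and exploits the differences in their semantic representations to fill in missing pixels. Experiments on the CamVid and VOC datasets with 512×512 input images show that the method reaches a semantic segmentation accuracy of 72.9% at an average speed of 43.1 frames/s.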
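The abstract does not give the formula for the gated lateral connection, so the following is only a minimal sketch under an assumed design: an attention-gate-style skip connection in which an encoder (downsampling-path) feature map is weighted by a sigmoid gate computed from both itself and the upsampled decoder map, then added to the decoder map. The names `upsample2x` and `gated_lateral`, and the scalar parameters `w_s`, `w_d`, `b` (standing in for 1×1 convolutions on a single channel), are hypothetical and not taken from the paper.

```python
import math
from typing import List

Map = List[List[float]]  # single-channel H x W feature map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def upsample2x(fmap: Map) -> Map:
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    out: Map = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def gated_lateral(skip: Map, dec: Map, w_s: float, w_d: float, b: float) -> Map:
    """Hypothetical gate: g = sigmoid(w_s*skip + w_d*dec + b); out = dec + g*skip.

    skip: encoder feature map passed laterally at the target resolution.
    dec:  decoder feature map already upsampled to that same resolution.
    The gate g lies in (0, 1) per pixel, so it filters how much of each
    encoder pixel is injected into the upsampled map.
    """
    assert len(skip) == len(dec) and len(skip[0]) == len(dec[0])
    out: Map = []
    for s_row, d_row in zip(skip, dec):
        out.append([d + sigmoid(w_s * s + w_d * d + b) * s
                    for s, d in zip(s_row, d_row)])
    return out


if __name__ == "__main__":
    coarse = [[1.0, 0.0], [0.0, 1.0]]          # 2x2 decoder map
    skip = [[0.5, 0.5, 0.0, 0.0],               # 4x4 encoder map
            [0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.5, 0.5],
            [0.0, 0.0, 0.5, 0.5]]
    print(gated_lateral(skip, upsample2x(coarse), w_s=1.0, w_d=1.0, b=0.0))
```

With a strongly negative bias the gate closes and the output falls back to the plain upsampled map; with a zero input to the sigmoid the gate is exactly 0.5, so half of each encoder pixel is added back.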

Keywords: image semantic segmentation  multi-layer fusion  gated attention mechanism
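The abstract likewise leaves open how the multi-layer fusion structure combines layers. As a rough illustration only, the sketch below assumes that feature maps from different depths are brought to the finest resolution by nearest-neighbour upsampling and combined by a weighted sum, so that a deeper, coarser layer can fill in a pixel that is missing (zero) in a shallower one. The functions `upsample` and `multi_layer_fuse` and the per-layer `weights` are hypothetical.

```python
from typing import List, Sequence

Map = List[List[float]]  # single-channel H x W feature map


def upsample(fmap: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling by an integer factor."""
    out: Map = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def multi_layer_fuse(maps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Hypothetical fusion: upsample every map to the finest resolution,
    then accumulate a per-layer weighted sum pixel by pixel."""
    assert len(maps) == len(weights) and maps
    height = max(len(m) for m in maps)
    width = max(len(m[0]) for m in maps)
    fused = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(maps, weights):
        factor = height // len(fmap)
        # Each layer must be an exact integer downscale of the finest one.
        assert factor * len(fmap) == height and factor * len(fmap[0]) == width
        up = upsample(fmap, factor)
        for i in range(height):
            for j in range(width):
                fused[i][j] += w * up[i][j]
    return fused


if __name__ == "__main__":
    shallow = [[1.0, 1.0], [1.0, 0.0]]  # bottom-right pixel "lost"
    deep = [[4.0]]                      # coarser layer covering the same area
    print(multi_layer_fuse([shallow, deep], [1.0, 0.5]))
```

In this toy input the lost pixel receives 0.5 × 4.0 = 2.0 from the deeper layer. For scale, the reported 43.1 frames/s corresponds to roughly 1000/43.1 ≈ 23.2 ms per 512×512 frame.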
