
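Medical image segmentation method combining multi-scale and multi-head attention
Cite this article:WANG Wan-liang, WANG Tie-jun, CHEN Jia-cheng, YOU Wen-bo. Medical image segmentation method combining multi-scale and multi-head attention[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(9): 1796-1805. DOI: 10.3785/j.issn.1008-973X.2022.09.013
Authors:WANG Wan-liang  WANG Tie-jun  CHEN Jia-cheng  YOU Wen-bo
Affiliations:1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023; 2. College of Information Science and Technology, Zhejiang Shuren University, Hangzhou, Zhejiang 310015
Funding:National Natural Science Foundation of China (61873240)
Abstract:To extract regions of interest from medical images automatically and accurately, a neural-network-based segmentation model, MS2Net, is proposed. Because conventional convolution cannot capture long-range dependencies, an architecture that fuses convolution with a Transformer is proposed to extract contextual information more effectively. The Transformer-based context extraction module uses multi-head self-attention to obtain similarity relations between pixels and fuses the features of each pixel according to those relations, giving the network a global field of view; relative positional encoding lets the Transformer retain the structural information of the input feature map. To adapt the network to variations in the shape of regions of interest, MS2Net uses the multi-scale features of the decoder and introduces a multi-scale attention mechanism. Grouped channel attention and grouped spatial attention are applied in turn to the multi-scale feature maps, so that the network adaptively selects appropriate multi-scale semantic information. On the ISBI 2017 and CVC-ColonDB datasets, MS2Net achieves a higher intersection-over-union than advanced methods such as U-Net, CE-Net, DeepLab v3+, and UTNet, and shows good generalization ability.

Keywords:medical image segmentation  deep learning  attention  Transformer  multi-scale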
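
The abstract compares methods by intersection-over-union (IoU). As a reminder of what that metric measures, here is a minimal sketch for a pair of binary masks. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not say how the authors handle that case or how they average over images.

```python
import numpy as np


def iou(pred, target):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)  # 2 / 4 = 0.5
```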

Medical image segmentation method combining multi-scale and multi-head attention
Wan-liang WANG,Tie-jun WANG,Jia-cheng CHEN,Wen-bo YOU. Medical image segmentation method combining multi-scale and multi-head attention[J]. Journal of Zhejiang University(Engineering Science), 2022, 56(9): 1796-1805. DOI: 10.3785/j.issn.1008-973X.2022.09.013
Authors:Wan-liang WANG  Tie-jun WANG  Jia-cheng CHEN  Wen-bo YOU
Abstract:A neural-network-based segmentation model, MS2Net, was proposed to automatically and accurately extract regions of interest from medical images. To better extract context information, a network architecture combining convolution and Transformer was proposed, which addressed the inability of traditional convolution operations to capture long-range dependencies. In the Transformer-based context extraction module, multi-head self-attention was used to obtain the similarity relationships between pixels. The features of each pixel were fused according to these relationships, giving the network a global view, while relative positional encoding enabled the Transformer to retain the structural information of the input feature map. To make the network adapt to differences in the regions of interest, MS2Net used the multi-scale features of the decoder, and a multi-scale attention mechanism was proposed. Group channel attention and group spatial attention were applied to the multi-scale feature maps in turn, so that the network adaptively selected appropriate multi-scale semantic information. MS2Net achieved better intersection-over-union than advanced methods such as U-Net, CE-Net, DeepLab v3+, and UTNet on both the ISBI 2017 and CVC-ColonDB datasets, which reflected its good generalization ability.
Keywords:medical image segmentation  deep learning  attention  Transformer  multi-scale  
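
The abstract describes multi-head self-attention that computes pixel-to-pixel similarity and fuses pixel features accordingly. The sketch below shows the generic scaled dot-product form of that idea on a small feature map; it is not the MS2Net implementation. The function name, the projection matrices `w_q`, `w_k`, `w_v`, and the toy sizes are all assumptions, and the relative positional encoding mentioned in the abstract is deliberately omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(feat, w_q, w_k, w_v, num_heads):
    """Generic multi-head self-attention over the pixels of a feature map.

    feat: (H, W, C) feature map; w_q, w_k, w_v: (C, C) projections
    (hypothetical parameters, randomly initialised in the usage below).
    """
    h, w, c = feat.shape
    d = c // num_heads                      # channels per head
    tokens = feat.reshape(h * w, c)         # one token per pixel

    def split_heads(x):
        # (N, C) -> (heads, N, d)
        return x.reshape(h * w, num_heads, d).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    # Pixel-to-pixel similarity, one (N, N) matrix per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = softmax(scores, axis=-1)

    # Each output pixel is a similarity-weighted mix of all pixels.
    out = weights @ v                       # (heads, N, d)
    out = out.transpose(1, 0, 2).reshape(h, w, c)
    return out, weights


# Usage on a toy 4x4 feature map with 8 channels and 2 heads.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out, weights = multi_head_self_attention(feat, w_q, w_k, w_v, num_heads=2)
```

Because every row of `weights` spans all 16 pixels, each output location can draw on the whole feature map, which is the "global view" the abstract contrasts with the local receptive field of convolution.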