Medical image segmentation method combining multi-scale and multi-head attention


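Cite this article:	Wan-liang WANG, Tie-jun WANG, Jia-cheng CHEN, Wen-bo YOU. Medical image segmentation method combining multi-scale and multi-head attention[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(9): 1796-1805. DOI: 10.3785/j.issn.1008-973X.2022.09.013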

Authors:	Wan-liang WANG, Tie-jun WANG, Jia-cheng CHEN, Wen-bo YOU

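Affiliations:	1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China; 2. College of Information Science and Technology, Zhejiang Shuren University, Hangzhou 310015, Zhejiang, China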

Funding:	National Natural Science Foundation of China (61873240)

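Abstract:	A neural-network-based segmentation model, MS²Net, is proposed to extract regions of interest from medical images automatically and accurately. Because conventional convolution lacks the ability to capture long-range dependencies, an architecture combining convolution and Transformer is proposed to better extract context information. The Transformer-based context extraction module uses multi-head self-attention to obtain the similarity between pixels and fuses the features of each pixel according to that similarity, giving the network a global field of view; relative positional encoding lets the Transformer retain the structural information of the input feature map. To adapt the network to differences in the shape of regions of interest, MS²Net uses decoder-side multi-scale features, and a multi-scale attention mechanism is proposed. Group channel attention and group spatial attention are applied in turn to the multi-scale feature map, so that the network adaptively selects appropriate multi-scale semantic information. On both the ISBI 2017 and CVC-ColonDB datasets, MS²Net achieves a better intersection-over-union than advanced methods such as U-Net, CE-Net, DeepLab v3+ and UTNet, and shows good generalization ability.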
Keywords:	medical image segmentation; deep learning; attention; Transformer; multi-scale
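
The multi-head self-attention step described in the abstract (pixel-to-pixel similarity, then similarity-weighted fusion of pixel features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes identity projections (queries, keys and values are the input features themselves), treats the feature map as a flattened list of pixels, and stands in for relative positional encoding with a hypothetical 1-D bias table `rel_bias` indexed by pixel offset. All function and parameter names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(feats, rel_bias=None):
    """Single-head self-attention over n pixel feature vectors.

    Assumption: identity projections (Q = K = V = feats) for brevity;
    a real Transformer learns separate projection matrices.
    rel_bias (hypothetical) holds 2n - 1 scalars indexed by offset j - i.
    """
    n, d = len(feats), len(feats[0])
    fused = []
    for i in range(n):
        # Scaled dot-product similarity between pixel i and every pixel j.
        scores = [
            sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d)
            for j in range(n)
        ]
        if rel_bias is not None:
            # Add a bias that depends only on the relative offset j - i.
            scores = [s + rel_bias[j - i + n - 1] for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Fuse all pixel features, weighted by their similarity to pixel i.
        fused.append(
            [sum(weights[j] * feats[j][k] for j in range(n)) for k in range(d)]
        )
    return fused


def multi_head_self_attention(feats, num_heads, rel_bias=None):
    """Split channels into heads, attend per head, concatenate the results."""
    d = len(feats[0])
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads
    heads = [
        attention_head([f[h * head_dim:(h + 1) * head_dim] for f in feats], rel_bias)
        for h in range(num_heads)
    ]
    return [[v for head in heads for v in head[i]] for i in range(len(feats))]
```

Because every output vector is a softmax-weighted average over all pixels, each position can draw on the whole feature map, which is the "global field of view" the abstract refers to. The offset-indexed bias shows only the general idea of position-dependent attention scores; the paper's actual encoding is not specified in this abstract.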
Medical image segmentation method combining multi-scale and multi-head attention

Wan-liang WANG, Tie-jun WANG, Jia-cheng CHEN, Wen-bo YOU. Medical image segmentation method combining multi-scale and multi-head attention[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(9): 1796-1805. DOI: 10.3785/j.issn.1008-973X.2022.09.013

Authors:	Wan-liang WANG, Tie-jun WANG, Jia-cheng CHEN, Wen-bo YOU

Abstract:	A neural-network-based segmentation model, MS²Net, was proposed to automatically and accurately extract regions of interest from medical images. To better extract context information, a network architecture combining convolution and Transformer was proposed, addressing the inability of traditional convolution operations to capture long-range dependencies. In the Transformer-based context extraction module, multi-head self-attention was used to obtain the similarity between pixels. The features of each pixel were fused according to this similarity, giving the network a global view, while relative positional encoding enabled the Transformer to retain the structural information of the input feature map. To make the network adapt to regions of interest of different sizes, MS²Net used decoder-side multi-scale features, and a multi-scale attention mechanism was proposed. Group channel attention and group spatial attention were applied in turn to the multi-scale feature map, so that the network adaptively selected appropriate multi-scale semantic information. MS²Net achieved a better intersection-over-union than advanced methods such as U-Net, CE-Net, DeepLab v3+ and UTNet on both the ISBI 2017 and CVC-ColonDB datasets, which reflected its excellent generalization ability.

Keywords:	medical image segmentation; deep learning; attention; Transformer; multi-scale
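
For readers unfamiliar with the evaluation metric named in the abstract, intersection-over-union (IoU) for a binary segmentation mask is the number of pixels labeled foreground in both the prediction and the ground truth, divided by the number labeled foreground in either. The sketch below is a generic illustration and not the paper's evaluation code. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def iou(pred, target):
    """Intersection-over-union of two binary masks given as 2-D lists of 0/1.

    Assumption: two empty masks count as a perfect match (IoU = 1.0);
    other conventions (for example 0.0, or skipping the image) also exist.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # foreground in both masks
            if p or t:
                union += 1  # foreground in at least one mask
    return intersection / union if union else 1.0


# Usage: one overlapping pixel out of three foreground pixels in total.
predicted = [[1, 1],
             [0, 0]]
ground_truth = [[1, 0],
                [1, 0]]
score = iou(predicted, ground_truth)  # 1 / 3
```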
