Scene text detection based on dual attention and multi-scale feature fusion
Cite this article: QIANG Guanchen, YANG Qian, ZHANG Lizhen, XIONG Wei, LI Lirong. Scene text detection based on dual attention and multi-scale feature fusion [J]. Journal of Optoelectronics·Laser, 2024, 35(6): 570-579.
Authors: QIANG Guanchen  YANG Qian  ZHANG Lizhen  XIONG Wei  LI Lirong
Affiliation: QIANG Guanchen, YANG Qian, ZHANG Lizhen, XIONG Wei and LI Lirong: School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China; XIONG Wei and LI Lirong: also with Hubei Key Laboratory of Solar Energy Efficient Utilization and Energy Storage Operation Control, Hubei University of Technology, Wuhan, Hubei 430068, China; XIONG Wei: also with Hubei Engineering Research Center for Safety Monitoring of New Energy and Power Grid Equipment, Hubei University of Technology, Wuhan, Hubei 430068, China, and Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29201, USA
Funding: Supported by the National Natural Science Foundation of China (62202148), the Natural Science Foundation of Hubei Province (2019CFB530), the Major Project of the Hubei Provincial Department of Science and Technology (2019ZYYD020), and the China Scholarship Council (201808420418).
Abstract: To address the challenges of text detection in complex natural scenes, this paper presents a scene text detection method built on dual attention and multi-scale feature fusion. A dual-attention fusion mechanism strengthens the correlation between text feature channels and improves overall detection performance. Considering the semantic information that can be lost when deep feature maps are up- and down-sampled, a dilated convolution multi-scale feature fusion pyramid (MFPN) is proposed; it uses a dual fusion mechanism to enhance semantic features and overcome the effect of scale variation. To resolve the semantic conflicts and the limited multi-scale feature representation caused by fusing information of different densities, a multi-scale feature fusion module (MFFM) is introduced. In addition, a feature refinement module (FRM) is introduced for small text that is easily masked by conflicting information. Experiments show that the method is effective for text detection in complex scenes, reaching F-measures of 85.6%, 87.1% and 86.3% on the CTW1500, ICDAR2015 and Total-Text datasets, respectively.

Keywords: text detection  attention fusion  multi-scale  feature fusion pyramid
Received: 2023-09-04
Revised: 2023-11-24
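
The abstract describes an architecture built from dilated-convolution multi-scale fusion and attention-based reweighting (MFPN, MFFM, FRM, dual attention), but the paper's concrete designs are not reproduced here. As a purely illustrative aid, the following is a minimal PyTorch sketch of one generic building block in that spirit: parallel dilated convolutions fused by a 1x1 convolution and reweighted by squeeze-and-excitation channel attention. Everything in the sketch is an assumption for illustration: the class names ChannelAttention and DilatedFusionBlock, the dilation rates (1, 2, 4), the reduction factor, and the 256-channel feature width are not taken from the paper.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style channel attention: a common, generic stand-in
    # for the channel branch of a dual-attention design (illustrative only).
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pool -> per-channel weights -> rescale the feature map.
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w[:, :, None, None]


class DilatedFusionBlock(nn.Module):
    # Parallel 3x3 convolutions with increasing dilation rates enlarge the
    # receptive field without down-sampling; the branches are concatenated,
    # fused back to the input width with a 1x1 convolution, reweighted by
    # channel attention, and added back to the input (residual connection).
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.attn = ChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.attn(self.fuse(multi)) + x


if __name__ == "__main__":
    # Toy check with a hypothetical 256-channel backbone feature map.
    feat = torch.randn(1, 256, 64, 64)
    print(DilatedFusionBlock(256)(feat).shape)  # -> torch.Size([1, 256, 64, 64])

Because the dilated branches keep spatial resolution while widening the receptive field, a block of this kind is a common way to limit the semantic loss that repeated down-sampling introduces in a feature pyramid, which is the motivation the abstract gives for MFPN.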
