Text Detection Based on Pixel to Box Assignment
Citation: JI Xunsheng, YU Zhi, XU Xiaoxiang. Text Detection Based on Pixel to Box Assignment[J]. Computer Measurement & Control, 2023, 31(7): 21-27.
Authors: JI Xunsheng, YU Zhi, XU Xiaoxiang
Affiliation: Jiangnan University
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Abstract: To address the shortcomings of existing scene text detection methods, a scene text detection method based on pixel-to-box assignment is proposed, in which a criss-cross attention module and a multi-scale feature adaptive module refine feature extraction in the spatial and channel dimensions, respectively. To enrich feature representations at different scales, the multi-scale feature adaptive module automatically assigns weights to the features of each scale. To capture contextual information effectively, the features extracted by the backbone network are fed into the criss-cross attention module: for each pixel, context is first collected along its horizontal and vertical paths, and a recurrent operation then lets every pixel gather context from the whole image. Within a fully convolutional, multi-task learning framework, the geometric attributes of text instances are learned; the multi-task outputs are combined to assign each pixel to a text box, and after simple post-processing the polygonal bounding boxes of text instances are reconstructed. On the arbitrary-shape benchmark Total-Text, the method achieves a recall of 75.71%, a precision of 89.15%, and an F-measure of 81.89%; it also performs well on the multi-oriented benchmark ICDAR2015, with a recall of 79.06%, a precision of 89.24%, and an F-measure of 83.84%, demonstrating the effectiveness of the proposed method.
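The row-and-column context gathering described above (each pixel attends to its own horizontal and vertical paths, and a second recurrent pass propagates that context over the full image) follows the general pattern of criss-cross attention. The following is a minimal PyTorch sketch of that mechanism, not the authors' implementation: the class name, the channel reduction factor, and the learnable residual scale gamma are illustrative assumptions, and the masking of the doubly counted self-position used in the original criss-cross attention formulation is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrissCrossAttention(nn.Module):
    """Each pixel attends to the pixels in its own row and column."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x)  # (b, c', h, w)
        k = self.key(x)    # (b, c', h, w)
        v = self.value(x)  # (b, c,  h, w)

        # Affinity of each pixel (i, j) with its column (varying row index k)
        # and with its row (varying column index k).
        energy_col = torch.einsum('bcij,bckj->bijk', q, k)  # (b, h, w, h)
        energy_row = torch.einsum('bcij,bcik->bijk', q, k)  # (b, h, w, w)
        attn = F.softmax(torch.cat([energy_col, energy_row], dim=-1), dim=-1)
        attn_col, attn_row = attn[..., :h], attn[..., h:]

        # Weighted aggregation of value features along the column and the row.
        out_col = torch.einsum('bijk,bckj->bcij', attn_col, v)
        out_row = torch.einsum('bijk,bcik->bcij', attn_row, v)
        return self.gamma * (out_col + out_row) + x


if __name__ == "__main__":
    cca = CrissCrossAttention(in_channels=256)
    feat = torch.randn(2, 256, 32, 32)   # stand-in for a backbone feature map
    out = cca(cca(feat))                 # two passes: full-image context
    print(out.shape)                     # torch.Size([2, 256, 32, 32])
```

Applying the module twice, as in the usage stub, corresponds to the recurrent operation mentioned in the abstract: after the second pass, information from every position has reached every other position through a shared row or column.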

Keywords: image processing; text detection; criss-cross attention; pixel-to-box assignment
Received: 2023-02-08
Revised: 2023-03-06
