
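Natural scene text recognition algorithm based on multilevel feature selection
Cite this article: LI Lirong, ZHANG Kai, ZHANG Yunliang, YUE Ling, ZHOU Lei, GONG Pengcheng. Natural scene text recognition algorithm based on multilevel feature selection[J]. Journal of Optoelectronics·Laser (光电子·激光), 2022, 33(5): 479-487
Authors: LI Lirong  ZHANG Kai  ZHANG Yunliang  YUE Ling  ZHOU Lei  GONG Pengcheng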
Affiliations: 1. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430064; 2. Hubei Engineering Research Center for New Energy and Power Grid Equipment Safety Monitoring, Wuhan, Hubei 430064 (LI Lirong and GONG Pengcheng: 1, 2; ZHANG Kai, ZHANG Yunliang, YUE Ling, and ZHOU Lei: 1)
Funding: Supported by the National Natural Science Foundation of China (62071172) and the Open Research Fund of the Hubei Engineering Research Center for New Energy and Power Grid Equipment Safety Monitoring (HBSKF202121)
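Abstract: Existing scene text recognition methods attend only to the classification of characters in a local sequence and overlook the global information of the whole word. To address this, a multilevel feature selection scene text recognition (MFSSTR) algorithm is proposed. The algorithm adopts a stacked block architecture and uses a multilevel feature selection module to capture contextual features and semantic features, respectively, from the visual features. For character prediction, a novel multilevel attention selection decoder (MASD) is proposed: it concatenates the visual, contextual, and semantic features into a new feature space and re-weights that space with a self-attention mechanism, so that the decoder attends to the internal relations of the feature sequence while selecting the more valuable features for decoding and prediction. Intermediate supervision is also introduced during training to refine the text prediction step by step. Experimental results show that the algorithm achieves high recognition accuracy on several public scene text datasets; in particular, it reaches 87.1% accuracy on the irregular text dataset SVTP, about 2% higher than current popular algorithms.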

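Keywords: scene text recognition   feature sequence   self-attention mechanism   multilevel attention selection decoder   intermediate supervision
Received: 2021-11-12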
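
The abstract says that intermediate supervision is applied during training but does not state the loss. One common way to realize the idea, given here as a sketch and not as the authors' formulation, is to add a cross-entropy term for the prediction of every stacked block, so that earlier blocks are also pushed toward the ground-truth text. The names `cross_entropy`, `supervised_loss`, and `stage_weights` are hypothetical, and equal weighting of the stages is an assumption.

```python
import math


def cross_entropy(probs, target):
    """Mean negative log-likelihood of the target characters.

    probs  -- one probability distribution (list of floats) per character position
    target -- one class index per character position
    """
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(p[t] + eps) for p, t in zip(probs, target))
    return -total / len(target)


def supervised_loss(stage_probs, target, stage_weights=None):
    """Weighted sum of the per-stage losses (assumed form of intermediate supervision).

    stage_probs   -- predictions of each stacked block, earliest block first
    stage_weights -- hypothetical per-stage weights; defaults to 1.0 for every stage
    """
    if stage_weights is None:
        stage_weights = [1.0] * len(stage_probs)
    return sum(w * cross_entropy(p, target)
               for w, p in zip(stage_weights, stage_probs))


if __name__ == "__main__":
    target = [0, 1]                       # two characters, two classes
    early = [[0.5, 0.5], [0.5, 0.5]]      # uncertain early block
    late = [[0.9, 0.1], [0.1, 0.9]]       # more confident later block
    print(supervised_loss([early, late], target))
```

Because every block contributes its own term, a block that is still uncertain keeps a large share of the total loss until its predictions improve, which is the gradual refinement the abstract refers to.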

Natural scene text recognition algorithm based on multilevel feature selection
LI Lirong, ZHANG Kai, ZHANG Yunliang, YUE Ling, ZHOU Lei and GONG Pengcheng. Natural scene text recognition algorithm based on multilevel feature selection[J]. Journal of Optoelectronics·Laser, 2022, 33(5): 479-487
Authors: LI Lirong  ZHANG Kai  ZHANG Yunliang  YUE Ling  ZHOU Lei  GONG Pengcheng
Affiliation: 1. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430064, China; 2. Hubei Engineering Research Center for New Energy and Power Grid Equipment Safety Monitoring, Wuhan, Hubei 430064, China (LI Lirong and GONG Pengcheng: 1, 2; ZHANG Kai, ZHANG Yunliang, YUE Ling, and ZHOU Lei: 1)
Abstract: Existing scene text recognition methods focus only on classifying the characters of a local sequence and ignore the global information of the entire word. To address this problem, a multilevel feature selection scene text recognition (MFSSTR) algorithm is proposed. The algorithm uses a stacked block architecture and applies a multilevel feature selection module to capture contextual and semantic features from the visual features. For character prediction, a novel multilevel attention selection decoder (MASD) is proposed, which concatenates the visual, contextual, and semantic features into a new feature space and re-weights that space through a self-attention mechanism. While attending to the internal relations of the feature sequence, the decoder selects the more valuable features to take part in decoding and prediction. In addition, intermediate supervision is introduced during training to refine the text prediction gradually. Experimental results show that the algorithm reaches a high level of recognition accuracy on multiple public scene text datasets. In particular, its accuracy on the irregular text dataset SVTP reaches 87.1%, about 2% higher than current popular algorithms.
Keywords: scene text recognition   feature sequence   self-attention mechanism   multilevel attention selection decoder   intermediate supervision
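
The decoder step described in the abstract (concatenate the visual, contextual, and semantic features, then re-weight the result with self-attention) can be illustrated with a small sketch. This is not the published MASD: it assumes the three feature sequences have the same length and are concatenated per time step along the channel axis, and it uses plain scaled dot-product self-attention without the learned query, key, and value projections a trained model would have. The function name `fuse_and_reweight` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_reweight(visual, context, semantic):
    """Concatenate three feature sequences and re-weight them with self-attention.

    Each argument is a list of T feature vectors (lists of floats).
    Assumption: concatenation is per time step along the channel axis.
    Assumption: no learned projections, so queries = keys = values = fused features.
    """
    fused = [v + c + s for v, c, s in zip(visual, context, semantic)]
    d = len(fused[0])
    out = []
    for query in fused:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in fused]
        weights = softmax(scores)
        # Attention-weighted average of all fused vectors.
        out.append([sum(w * key[j] for w, key in zip(weights, fused))
                    for j in range(d)])
    return out


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    context = [[0.5], [0.2], [0.1]]
    semantic = [[0.3, 0.3], [0.9, 0.1], [0.0, 0.4]]
    for row in fuse_and_reweight(visual, context, semantic):
        print([round(x, 3) for x in row])
```

Each output vector is a weighted average of all fused vectors, so time steps whose features agree with many others receive more weight. In a trained decoder the learned projections would decide which features count as valuable; the sketch only shows the data flow.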