
An end-to-end handwritten text recognition method using residual attention networks

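Cite this article:	WANG Yin-tong, ZHENG Hao, CHANG He-you, LI Shuo. An end-to-end handwritten text recognition method using residual attention networks[J]. Control and Decision, 2023, 38(7): 1825-1834.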

Authors:	WANG Yin-tong, ZHENG Hao, CHANG He-you, LI Shuo

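Affiliation:	School of Information Engineering, Nanjing Xiaozhuang University, Nanjing 211171; College of Computer Science and Technology, Zhejiang University, Hangzhou 310058; Institute of Artificial Intelligence, De Montfort University, Leicester LE19BH, United Kingdom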

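Funding:	National Natural Science Foundation of China (62177028, 61976118, 61806098); Natural Science Foundation of Jiangsu Province (BK20180142); Qing Lan Project of Jiangsu Province.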

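Abstract:	Chinese handwritten text recognition is one of the active research problems in pattern recognition. It faces a large number of character classes, large variation in writing style, and training datasets that are difficult to label. To address these problems, a segmentation-free, recurrence-free residual attention network is proposed for end-to-end handwritten text recognition. First, with ResNet-26 as the backbone, depthwise separable convolutions extract meaningful features, and a residual attention gate module raises the importance of key regions in the text image. Second, a batch bilinear interpolation model stretches and squeezes the input representation, performing text-line up-sampling from a two-dimensional text representation to a one-dimensional text-line representation. Finally, connectionist temporal classification is used as the loss function of the recognition model to establish the correspondence between the high-level extracted representation and the character sequence labels. Experiments on the CASIA-HWDB2.x and ICDAR2013 datasets show that the proposed method effectively performs end-to-end handwritten text recognition without any position information for characters or text lines, and that it outperforms existing methods.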
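Keywords:	handwritten text recognition; depthwise separable convolution; residual attention gate; bilinear interpolation; text-line up-sampling; connectionist temporal classification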
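
As a rough illustration of why depthwise separable convolution is attractive for a backbone of this kind, the sketch below compares weight counts for a standard convolution and for a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not the paper's configuration.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For these assumed sizes the separable form needs roughly one eighth of the weights, which is the usual motivation for using it in deep recognition networks.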
An end-to-end handwritten text recognition method using residual attention networks

WANG Yin-tong, ZHENG Hao, CHANG He-you, LI Shuo. An end-to-end handwritten text recognition method using residual attention networks[J]. Control and Decision, 2023, 38(7): 1825-1834.

Authors:	WANG Yin-tong, ZHENG Hao, CHANG He-you, LI Shuo

Affiliation:	School of Information Engineering, Nanjing Xiaozhuang University, Nanjing 211171, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; Institute of Artificial Intelligence, De Montfort University, Leicester LE19BH, United Kingdom

Abstract:	Handwritten Chinese text recognition is a long-standing focus of pattern recognition research. It is challenging because it involves thousands of character categories, widely varying writing styles, and a tedious data collection process. To address these issues, we propose a segmentation-free and recurrence-free residual attention network for end-to-end handwritten text recognition. First, with ResNet-26 as the main architecture, depthwise separable convolution extracts the representation features, and a residual attention gate block increases the importance of the key areas of the input text image. Then, text-line up-sampling based on batch bilinear interpolation maps the two-dimensional text representation to a one-dimensional text-line representation. Finally, connectionist temporal classification (CTC) is employed as the loss function to establish the correspondence between the high-level extracted features and the character sequence labels. Experiments on the CASIA-HWDB2.x and ICDAR2013 datasets indicate that the method can effectively perform end-to-end handwritten text recognition without any position information for characters or text lines, and that it outperforms existing methods.
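
The last two stages described in the abstract can be sketched in a few lines of plain Python. The sketch below is one plausible reading, not the authors' implementation: it assumes the 2D-to-1D step resizes a feature map to height 1 with bilinear interpolation, and it shows best-path (greedy) decoding, the usual inference-time counterpart of a CTC loss. The function names, the align-corners sampling convention, and the toy inputs are all assumptions made for illustration.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2D list of floats with bilinear interpolation.

    Uses an align-corners convention; a single output row or column
    samples the centre of the input (an assumption of this sketch).
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else (in_h - 1) / 2
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else (in_w - 1) / 2
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along the width, then along the height.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


if __name__ == "__main__":
    # Toy 2 x 2 feature map squeezed to height 1 and stretched to width 3.
    print(bilinear_resize([[0.0, 2.0], [4.0, 6.0]], 1, 3))

    # Toy per-frame class scores; index 0 is the CTC blank.
    alphabet = ["<blank>", "a", "b"]
    frames = [
        [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(frames, alphabet))
```

The blank frame between the two runs of "a" is what lets CTC emit a doubled character, which is why no character-level position labels are needed during training.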

