End-to-end Chinese character detection in natural scene based on improved YOLOv2


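Cite this article:	LIU Jie, ZHU Xuan, SONG Mi-mi. 改进YOLOv2的端到端自然场景中文字符检测 (End-to-end Chinese character detection in natural scene based on improved YOLOv2)[J]. 控制与决策 (Control and Decision), 2021, 36(10): 2483-2489.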

Authors:	LIU Jie (刘杰), ZHU Xuan (朱旋), SONG Mi-mi (宋密密)

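Affiliation:	School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China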

Funding:	National Natural Science Foundation of China (51607049); Natural Science Foundation of Heilongjiang Province (LH2019E067).

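Abstract:	To address the low detection rate of Chinese characters in natural scenes, the difficulty of detecting small characters, and the large number of character categories, an improved method based on YOLOv2 is proposed and applied to Chinese character detection in natural scenes. First, the k-means++ clustering algorithm is used to cluster the number and aspect ratios of the character candidate boxes (anchors). A multi-layer feature fusion strategy is then proposed: the feature map output before the fourth max-pooling layer of the original network is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 4 to obtain local features; the feature map output before the fifth max-pooling layer is likewise convolved with 3×3 and 1×1 kernels and downsampled by a factor of 2 to obtain local features; and the local features are fused with the global features. At the same time, the number of consecutive, repeated 3×3×1024 convolutional layers in the high-level convolutions is increased from 3 to 5. Finally, YOLOv2 and the improved YOLOv2 are compared on the Chinese Text in the Wild (CTW) dataset. The results show that the improved YOLOv2 reaches a mean average precision (mAP) of 78.3% for Chinese character detection, 7.3% higher than the original YOLOv2 and clearly higher than other Chinese character detection methods for natural scenes.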
Keywords:	computer vision; deep learning; natural scene; Chinese character detection; YOLOv2
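
The anchor clustering step named in the abstract can be illustrated with a small sketch. The abstract does not state the distance metric or the update rule, so this example assumes the 1 - IoU distance used by the original YOLOv2 anchor clustering and a mean-based centroid update; the function names and the toy box sizes are hypothetical and are not taken from the paper.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_init(boxes, k, rng):
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance (1 - IoU) to the nearest seed."""
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centroids) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a seed already
            centroids.append(rng.choice(boxes))
        else:
            centroids.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centroids


def cluster_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors (assumed 1 - IoU metric)."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            groups[best].append(b)
        new = []
        for g, c in zip(groups, centroids):
            if g:
                new.append((sum(w for w, _ in g) / len(g),
                            sum(h for _, h in g) / len(g)))
            else:
                new.append(c)  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


def mean_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Toy character box sizes in pixels (hypothetical data).
boxes = [(12, 14), (10, 11), (11, 13), (30, 33), (28, 31),
         (32, 30), (62, 66), (58, 61), (60, 64)]
anchors = cluster_anchors(boxes, k=3)
print(anchors, round(mean_iou(boxes, anchors), 3))
```

Running the clustering for several values of `k` and comparing `mean_iou` is one common way to choose the number of anchors; whether the authors selected it this way is not stated in the abstract.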
End-to-end Chinese character detection in natural scene based on improved YOLOv2

LIU Jie, ZHU Xuan, SONG Mi-mi. End-to-end Chinese character detection in natural scene based on improved YOLOv2[J]. Control and Decision, 2021, 36(10): 2483-2489.

Authors:	LIU Jie, ZHU Xuan, SONG Mi-mi

Affiliation:	School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China

Abstract:	This paper proposes an improved method based on YOLOv2 to address the low detection rate of Chinese characters, the difficulty of detecting small characters, and the large number of character categories in natural scenes, and applies it to Chinese character detection in natural scenes. First, the k-means++ clustering algorithm is used to cluster the number and aspect ratios of the character candidate boxes (anchors). Then a multi-layer feature fusion strategy is proposed: the feature map output before the fourth max-pooling layer of the original network is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 4 to obtain local features, and the feature map output before the fifth max-pooling layer is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 2 to obtain local features. The local features are then fused with the global features. At the same time, the number of consecutive, repeated 3×3×1024 convolutional layers in the high-level convolutions is increased from 3 to 5. Finally, the Chinese Text in the Wild (CTW) dataset is used to compare the original YOLOv2 algorithm with the improved one. The experimental results show that the improved YOLOv2 algorithm achieves a mean average precision (mAP) of 78.3% in Chinese character detection, which is 7.3% higher than the mAP of the original YOLOv2 algorithm and significantly higher than that of other Chinese character detection methods for natural scenes.
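
The downsampling factors in the fusion strategy follow from the feature-map sizes, which a short shape calculation makes concrete. This is a sketch under stated assumptions: a 416×416 input (the usual YOLOv2 detection size, not given in the abstract), space-to-depth rearrangement as the downsampling operator (as in the YOLOv2 passthrough layer), fusion by channel concatenation, and hypothetical channel counts after the 1×1 convolutions. Only the 1024-channel global feature map comes from the abstract.

```python
def side_before_pool(input_size, pool_index):
    """Side length of the feature map entering the pool_index-th
    2x2, stride-2 max-pooling layer (1-based index)."""
    return input_size // (2 ** (pool_index - 1))


def space_to_depth_shape(height, width, channels, stride):
    """Output shape when each stride x stride spatial block is moved into
    the channel axis (assumed downsampling operator, not confirmed)."""
    if height % stride or width % stride:
        raise ValueError("spatial size must be divisible by stride")
    return (height // stride, width // stride, channels * stride * stride)


INPUT_SIZE = 416                      # assumption: standard YOLOv2 input
side4 = side_before_pool(INPUT_SIZE, 4)   # map before the 4th max-pool
side5 = side_before_pool(INPUT_SIZE, 5)   # map before the 5th max-pool
global_side = INPUT_SIZE // 32            # map after all five max-pools

# Hypothetical channel counts after the 3x3 and 1x1 convolutions.
C4, C5 = 16, 64
C_GLOBAL = 1024                       # from the 3x3x1024 high-level layers

local4 = space_to_depth_shape(side4, side4, C4, stride=4)
local5 = space_to_depth_shape(side5, side5, C5, stride=2)

# Assumed fusion: concatenate along channels once spatial sizes agree.
fused = (global_side, global_side, local4[2] + local5[2] + C_GLOBAL)
print(side4, side5, local4, local5, fused)
```

Under these assumptions both local branches land on the same 13×13 grid as the global features, which is why the shallower branch needs a factor of 4 and the deeper one a factor of 2.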

Keywords:	computer vision; deep learning; natural scene; Chinese character detection; YOLOv2
