Multi-font Chinese character recognition with convolutional neural networks
Cite this article: Chai Weijia, Wang Lianming. Multi-font Chinese character recognition with convolutional neural networks[J]. Journal of Image and Graphics, 2018, 23(3): 410-417.
Authors: Chai Weijia, Wang Lianming
Affiliation: School of Physics, Northeast Normal University, Changchun 130024, China
Funding: National Natural Science Foundation of China (21227008); Jilin Province Key Scientific and Technological Research Project (20170204035GX)
Abstract: Objective Multi-font Chinese character recognition has broad application prospects in automatic Chinese-text processing and intelligent input, and it is an important topic in pattern recognition. In recent years, with the emergence of deep learning, Chinese character recognition based on deep convolutional neural networks has made breakthrough progress in both methods and performance. However, existing approaches require large sample sizes, long training times, and difficult parameter tuning, so achieving optimal results on large-category Chinese character recognition remains hard. Method For unoccluded images of printed and handwritten Chinese characters, an end-to-end deep convolutional neural network model is proposed. Excluding auxiliary layers, the network consists of three convolutional layers, two pooling layers, one fully connected layer, and a softmax regression layer. To address the shortage of samples, a data augmentation method combining wave distortion, translation, rotation, and scaling is proposed. To reduce the difficulty of tuning the deep network and shorten training time, strategies such as batch normalization and fine-tuning the network with a combination of optimizers are proposed. Result The deep model was used to recognize the 3,755 Chinese character classes of GB2312 Level 1, reaching a final recognition accuracy of 98.336%. Several sets of comparative experiments verified the contribution of each proposed method to the final model: data augmentation, the mixed optimization method, and batch normalization improved the recognition rate on the test samples by 8.0%, 0.3%, and 1.4%, respectively. Conclusion Compared with methods in other studies that combine handcrafted features with convolutional neural networks, the proposed approach reduces the workload of manual feature extraction; compared with classic convolutional neural networks, it extracts features more effectively, achieves a higher recognition rate, and trains in less time.

Keywords: Chinese character recognition  convolutional neural network  deep learning  data augmentation  batch normalization
Received: 2017-07-19
Revised: 2017-11-16

Recognition of Chinese characters using deep convolutional neural network
Chai Weijia and Wang Lianming. Recognition of Chinese characters using deep convolutional neural network[J]. Journal of Image and Graphics, 2018, 23(3): 410-417.
Authors:Chai Weijia and Wang Lianming
Affiliation: School of Physics, Northeast Normal University, Changchun 130024, China
Abstract: Objective The recognition of Chinese characters has broad application prospects in automatic Chinese-text processing and intelligent input, and it is an important subject in the field of pattern recognition. With the emergence of deep learning in recent years, Chinese character recognition based on deep convolutional neural networks has made breakthroughs in both method and performance. However, many problems remain, such as the need for a large sample size, long training time, and great difficulty in parameter tuning. Thus, achieving the best recognition result for Chinese characters, which comprise numerous categories, is difficult. Method An end-to-end deep convolutional neural network model was proposed for unoccluded images of printed and handwritten Chinese characters. Excluding additional layers, such as batch normalization and dropout layers, the network consisted of three convolutional layers, two pooling layers, one fully connected layer, and a softmax regression layer. This paper proposed a data augmentation method that combines wave distortion, translation, rotation, and zooming to address the small sample size. A large number of pseudo-samples were generated by randomly selecting the translation offsets, zoom scales, and rotation angles and by controlling the amplitude and period of the sine function that produced the wave distortion. The overall structure of the characters was preserved, and the number of training samples could be increased almost without limit. Advanced strategies, such as batch normalization and fine-tuning the model with a combination of two optimizers, namely, stochastic gradient descent (SGD) and adaptive moment estimation (Adam), were used to reduce the difficulty of parameter tuning and shorten the long training time.
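The wave-distortion augmentation described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation; the function name `wave_distort` and the `amplitude`/`period` parameters are hypothetical stand-ins for the sine amplitude and period that the paper randomizes to generate pseudo-samples.

```python
import numpy as np

def wave_distort(img, amplitude=2.0, period=16.0, rng=None):
    """Shift each row of a 2-D grayscale image horizontally by an
    amount given by a sine of its vertical position, producing a
    wave-like distortion that preserves the character's overall
    structure. Randomizing the phase (and, in the paper, the
    amplitude and period) yields many distinct pseudo-samples."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    phase = rng.uniform(0.0, 2.0 * np.pi)
    out = np.empty_like(img)
    for y in range(h):
        shift = int(round(amplitude * np.sin(2.0 * np.pi * y / period + phase)))
        out[y] = np.roll(img[y], shift)  # wrap-around horizontal shift
    return out
```

Translation, rotation, and zooming can be layered on top with any standard image library; each augmented copy keeps the label of its source character.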
Batch normalization normalizes the input data of each training mini-batch during stochastic gradient descent, so that the distribution in each dimension becomes a stable distribution with mean 0 and standard deviation 1. We define internal covariate shift as the change in the distribution of network activations caused by the change in network parameters during training. Without normalization, the network must adapt to a different distribution at each iteration, which greatly slows training; batch normalization is an effective way to solve this problem. In the proposed network, the batch normalization layer was placed before the activation function. Classic convolutional neural networks are usually trained with mini-batch stochastic gradient descent, but selecting suitable hyper-parameters is difficult: choices such as the learning rate and initial weights greatly affect training speed and classification results. Adam is a first-order gradient-based optimizer for stochastic objective functions that computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. Its greatest advantage is that the magnitudes of parameter updates are invariant to rescaling of the gradient, so training can be accelerated tremendously. However, Adam alone cannot guarantee state-of-the-art results. Therefore, this paper presents a training method that combines the novel optimizer, Adam, with the traditional method, SGD. The training process was divided into two steps. First, Adam was adopted to adapt parameters such as the learning rate automatically, avoiding manual adjustment and making the network converge quickly.
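The forward pass of batch normalization for one mini-batch can be sketched as follows. This is a NumPy illustration of the standard transform, not the paper's code; `gamma` and `beta` denote the learned scale and shift.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch to zero mean and
    unit standard deviation, then apply the learned scale (gamma)
    and shift (beta). x has shape (batch_size, num_features)."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * x_hat + beta
```

At inference time, running averages of `mu` and `var` collected during training replace the mini-batch statistics.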
This first step lasted 200 iterations, after which the best model was saved. Second, SGD with a minimal learning rate was used to fine-tune the trained model to achieve the best classification result; the initial learning rate in this step was set to 0.0001 and decayed exponentially. Through these methods, the network performed well in terms of both training speed and generalization ability. Result A seven-layer deep model was trained to categorize 3,755 Chinese characters, and the recognition accuracy reached 98.336%. The contribution of each proposed method to the final model was verified by several sets of comparative experiments. The recognition rate increased by 8.0%, 0.3%, and 1.4% with data augmentation, the combination of the two optimizers, and batch normalization, respectively. The training time was 483 minutes shorter than with SGD alone and 43 minutes shorter than without batch normalization. Conclusion Compared with traditional recognition methods that combine handcrafted features with convolutional neural networks, the proposed method reduces the workload of manual feature extraction. Compared with the classic convolutional neural network, it achieves superior performance: a higher recognition rate, stronger feature extraction ability, and shorter training time.
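The two-step schedule (Adam first, then SGD with a small exponentially decaying learning rate) can be illustrated on a toy objective. This is a sketch in NumPy, not the authors' training code; the helper names `adam_step` and `train_two_stage`, the decay factor, and all hyper-parameters other than the 200 Adam iterations and the 0.0001 SGD learning rate stated in the abstract are illustrative assumptions.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive per-parameter step sizes from
    bias-corrected first (m) and second (v) moment estimates."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def train_two_stage(grad, w, adam_iters=200, sgd_iters=100,
                    sgd_lr=1e-4, decay=0.99):
    # Step 1: Adam converges quickly without manual learning-rate tuning.
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, adam_iters + 1):
        w, m, v = adam_step(w, grad(w), m, v, t)
    # Step 2: plain SGD with a minimal, exponentially decaying learning
    # rate fine-tunes the saved model toward the best result.
    lr = sgd_lr
    for _ in range(sgd_iters):
        w = w - lr * grad(w)
        lr *= decay
    return w
```

In the real network, `grad` would be the back-propagated gradient of the classification loss on a mini-batch; here any differentiable objective serves to show the schedule.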
Keywords:recognition of Chinese characters  convolutional neural network  deep learning  data augmentation  batch normalization
