Similar literature
 20 similar documents found
1.
Chinese characters are constructed from strokes according to structural rules. Therefore, the geometric configurations of characters are important features for character recognition. In handwritten characters, stroke shapes and their spatial relations may vary to some extent. The attribute value of a structural identification is then a fuzzy quantity rather than a binary quantity. Recognizing these facts, we propose a fuzzy attribute representation (FAR) to describe the structural features of handwritten Chinese characters for an on-line Chinese character recognition (OLCCR) system. With a FAR, a fuzzy attribute graph is created for each handwritten character, and the character recognition process is thus transformed into a simple graph matching problem. This character representation and our proposed recognition method allow us to relax the constraints on stroke order and stroke connection. The graph model provides a generalized character representation that can easily incorporate newly added characters into an OLCCR system with an automatic learning capability. The fuzzy representation can describe the degree of structural deformation in handwritten characters. The character matching algorithm is designed to tolerate structural deformations to some extent. Therefore, even input characters with deformations can be recognized correctly once the reference dictionary of the recognition system has been trained using a few representative learning samples. Experimental results are provided to show the effectiveness of the proposed method.

2.
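王寅同, 郑豪, 常合友, 李朔. 《控制与决策》 (Control and Decision), 2023, 38(7): 1825-1834
Handwritten Chinese text recognition is an active research topic in pattern recognition. It faces a large number of character classes, wide variation in writing styles, and training data that is difficult to label. To address these problems, a segmentation-free, recurrence-free residual attention network is proposed for end-to-end handwritten text recognition. First, with ResNet-26 as the backbone, depthwise separable convolutions extract meaningful features, and a residual attention gate module increases the importance of key regions in the text image. Second, a batch bilinear interpolation model stretches and squeezes the input representation, upsampling the two-dimensional text representation into a one-dimensional text-line representation. Finally, connectionist temporal classification (CTC) is used as the loss function of the recognition model, establishing the correspondence between the high-level extracted representation and the character sequence labels. Experiments on the CASIA-HWDB2.x and ICDAR2013 datasets show that the proposed method achieves end-to-end handwritten text recognition without any character or text-line position information and outperforms existing methods.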
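
CTC relates a sequence of per-frame outputs to a shorter label sequence by merging repeated labels and then removing a special blank label. The following is a minimal sketch of that collapsing rule as greedy decoding, not the network or training code of the paper; the function name `ctc_greedy_decode`, the blank index 0, and the toy scores are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    # Index of the highest score in each frame (the "best path").
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    # Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy example: 5 frames, 3 labels (0 = blank, 1 and 2 = characters).
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
decoded = ctc_greedy_decode(scores)
```

Here the best path is 1, 1, blank, 1, 2, so the two leading 1s merge while the 1 after the blank survives as a separate character.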

3.
A handwritten Chinese character recognition method based on primitive and compound fuzzy features using the SEART neural network model is proposed. The primitive features are extracted from local and global views. Since handwritten Chinese characters vary a great deal, the fuzzy concept is used to extract the compound features from a structural view. We combine the two categories of features and use a fast classifier, called the Supervised Extended ART (SEART) neural network model, to recognize handwritten Chinese characters. The SEART classifier has excellent performance, is fast, and has good generalization and exception handling abilities in complex problems. Using fuzzy set theory in feature extraction and the neural network model as a classifier helps reduce the effects of distortions, noise and variations. In spite of the poor thinning, an average recognition rate of 90.24% was obtained for the 605 test character categories. The database used is CCL/HCCR3 (provided by CCL, ITRI, Taiwan). The experiment not only confirms the feasibility of the proposed system, but also suggests that applying fuzzy set theory and neural networks to the recognition of handwritten Chinese characters is an efficient and promising approach.

4.
In this paper, we propose an off-line recognition method for handwritten Korean characters based on stroke extraction and representation. To recognize handwritten Korean characters, strokes and their sequence must be extracted so that a two-dimensional input character can be described by a one-dimensional representation. We define 28 primitive strokes to represent characters and introduce 300 stroke separation rules to extract proper strokes from Korean characters. To find a stroke sequence, we use the stroke code and the relationship between consecutive strokes. The input characters are recognized using character recognition trees. The proposed method was tested on the 1000 most frequently used characters written by 400 different writers and achieved a recognition rate of 94.3%.

5.
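A neural network algorithm for large-scale pattern recognition problems   Cited by: 16 (self-citations: 1, citations by others: 15)
吴鸣锐, 张钹. 《软件学报》 (Journal of Software), 2001, 12(6): 851-855
Many practical pattern recognition problems, such as handwritten Chinese character recognition, are large-scale pattern recognition problems. Conventional neural network algorithms currently offer no effective solution for such problems. Building on the sphere neighborhood model, this paper proposes a neural network training algorithm for large-scale pattern recognition problems, aiming to strengthen the ability of neural networks to handle large-scale problems, and evaluates it on handwritten Chinese character recognition. The experimental results indicate that the proposed algorithm is an effective and promising approach to large-scale pattern recognition problems.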

6.
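To overcome the limitations of the feature extraction methods used in conventional two-stage handwritten Chinese character recognition systems, a system is proposed that uses a convolutional neural network to learn effective features of similar Chinese characters automatically and recognize them. The method trains the model on large-scale data from a handwriting cloud platform and generates similar-character subsets from frequency statistics, further improving the recognition rate. Experiments show that its recognition rate improves over conventional support vector machine and nearest-neighbor classifiers based on gradient features.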

7.
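Handwritten Chinese character recognition is the basis of handwritten Chinese character input. Current handwriting input methods on smart devices cannot dynamically adjust the recognition model to a user's writing habits in order to improve the recognition rate. Based on a study of recent deep learning algorithms and training models, a design is proposed for a personalized handwritten Chinese character input system built on real-time collection of the user's handwriting samples. The method collects the user's handwritten characters as incremental samples and retrains the recognition model generated on the server side, so that the model adapts better to the user's writing habits and the recognition rate of the input system improves. Finally, on the basis of this method and a newly designed deep residual network, comparative handwritten Chinese character recognition experiments were carried out. The results show that retraining with samples collected in real time substantially improves the recognition rate of the model and better meets users' needs for handwritten input on smart devices.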

8.
A primary reason for performance degradation in unconstrained online handwritten Chinese character recognition is the subtle differences between similar characters. Various methods have been proposed in previous works to address the problem posed by similar characters. These methods basically comprise two components: similar character discovery and cascaded classifiers. The goal of similar character discovery is to make similar character pairs/sets cover as many misclassified samples as possible. It is observed that the confidence of a convolutional neural network (CNN) is output in an end-to-end manner and can be understood as one type of probability metric. In this paper, we propose an algorithm that leverages CNN confidence for discovering similar character pairs/sets. Specifically, a deep CNN is applied to output the top-ranked candidates and the corresponding confidence scores, followed by an accumulating and averaging procedure. We found experimentally that the number of similar character pairs differs from class to class and that the degree of confusion varies between pairs. To address these problems, we propose an entropy-based similarity measurement to rank these similar character pairs/sets and reject those with low similarity. The experimental results indicate that by using 30,000 similar character pairs, our method achieves hit rates of 98.44% and 98.05% on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.0–1.2 datasets, respectively, which are significantly higher than the corresponding results produced by the MQDF-based method (95.42% and 94.49%). Furthermore, recognition of ten randomly selected similar character subsets with a two-stage classification scheme results in a relative error reduction of 30.11% compared with the traditional single-stage scheme, showing the potential of the proposed method.

9.
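Online handwritten Chinese characters based on Scalable Vector Graphics (SVG) use SVG images as the character image format and the SVG path object as the basic storage unit of a stroke for display and storage; stroke outlines are determined by the coordinate values recorded during writing. Based on this SVG storage and representation, this paper proposes a graph-theoretic, multi-step segmentation method for online continuous handwritten Chinese characters. The method builds an undirected graph over the handwritten stroke sequence from the coordinate relations between strokes and uses breadth-first search on the graph to split the original stroke sequence into mutually disconnected stroke components, so that characters with widely separated radicals and non-touching characters are segmented correctly. An improved Tarjan algorithm then segments touching characters within the components. Finally, based on the spacing between stroke components, an iterative two-class classification algorithm classifies the gaps to find the globally optimal segmentation positions, and over-segmented components are recombined and merged. Experimental results show that the method is effective and feasible for segmenting online handwritten Chinese characters.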
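
The first step described above, building an undirected graph over strokes and using breadth-first search to split it into disconnected components, can be sketched as follows. This is only an illustration under stated assumptions: each stroke is reduced to a bounding box `(x_min, y_min, x_max, y_max)`, two strokes are linked when their boxes overlap or lie within `gap`, and the names `boxes_touch` and `stroke_components` are hypothetical. The abstract does not specify the actual adjacency rule, and the later Tarjan and merging steps are not covered.

```python
from collections import deque


def boxes_touch(a, b, gap=0):
    """True when two bounding boxes overlap or lie within `gap` of each other."""
    return (a[0] <= b[2] + gap and b[0] <= a[2] + gap and
            a[1] <= b[3] + gap and b[1] <= a[3] + gap)


def stroke_components(boxes, gap=0):
    """Group stroke indices into connected components with breadth-first search."""
    n = len(boxes)
    # Undirected graph: an edge joins every pair of touching strokes.
    adjacency = {
        i: [j for j in range(n) if j != i and boxes_touch(boxes[i], boxes[j], gap)]
        for i in range(n)
    }
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Five strokes: two overlapping pairs and one isolated stroke.
strokes = [(0, 0, 10, 10), (8, 2, 15, 12), (30, 0, 40, 10),
           (38, 5, 45, 9), (100, 0, 110, 10)]
groups = stroke_components(strokes)
```

Raising `gap` joins strokes that are close without overlapping, which is one simple way to keep the parts of a single character together.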

10.
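Applying coarse classification to off-line handwritten Chinese character recognition, as a multi-level classification strategy, can effectively improve recognition performance and accuracy. This paper proposes a coarse classification method for handwritten Chinese characters based on structural features of the four corner regions. Building on an analysis of basic Chinese character strokes, and taking into account the deformation characteristics of handwritten characters and the requirements of the recognition algorithm, a new set of stroke units is defined. These stroke units are compared with the structures in specific regions of a character to obtain a four-digit structural feature code, which serves as the basis for coarse classification of off-line handwritten Chinese characters. Sampling and recognition experiments on some handwritten characters from the GB2312 level-1 character set demonstrate the effectiveness of the improved four-corner structural features for coarse classification.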

11.
This paper proposes an effective segmentation-free approach using a hybrid neural network hidden Markov model (NN-HMM) for offline handwritten Chinese text recognition (HCTR). In the general Bayesian framework, the handwritten Chinese text line is sequentially modeled by HMMs, each representing one character class, while the NN-based classifier is adopted to calculate the posterior probability of all HMM states. The key issues in feature extraction, character modeling, and language modeling are comprehensively investigated to show the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. As for the training procedure, label refinement using forced alignment and sequence training yield significant gains on top of the frame-level cross-entropy criterion. Second, a deep convolutional neural network (DCNN) with automatically learned discriminative features demonstrates its superiority to the DNN in the HMM framework. Moreover, to solve the challenging problem of distinguishing highly confusable classes arising from the large vocabulary of Chinese characters, the NN-based classifier should output 19900 HMM states as the classification units via high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM yields a promising character error rate (CER) of 5.24% by making a good trade-off between computational complexity and recognition accuracy. To the best of our knowledge, DCNN-HMM achieves the best published CER, 3.53%.
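
For readers unfamiliar with the metric, a character error rate is commonly computed as the edit (Levenshtein) distance between the recognized text and the reference text, divided by the reference length. The sketch below follows that common definition; the abstract does not state the exact scoring protocol of the competition, so treat this as an assumption, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character and one missing character in a 6-character line.
cer = character_error_rate("手写汉字识别", "手写汉子识")
```

Because insertions count as errors, a CER computed this way can exceed 100% when the hypothesis is much longer than the reference.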

12.
Analysis of stroke structures of handwritten Chinese characters   Cited by: 3 (self-citations: 0, citations by others: 3)
Most handwritten Chinese character recognition systems suffer from variations in geometrical features across writing styles. The stroke structures of different styles have proved to be more consistent than geometrical features. In an on-line recognition system, the stroke structure can be obtained from the writing sequence captured by a pen-based input device such as a tablet. But in an off-line recognition system, the input characters are scanned optically and saved as raster images, so the stroke structure information is not available. In this paper, we propose a method to extract strokes from an off-line handwritten Chinese character. We have developed four new techniques: 1) a new thinning algorithm based on Euclidean distance transformation and gradient-oriented tracing, 2) a new line approximation method based on curvature segmentation, 3) artifact removal strategies based on geometrical analysis, and 4) stroke segmentation rules based on splitting, merging and directional analysis. Using these techniques, we can extract and trace the strokes in an off-line handwritten Chinese character accurately and efficiently.

13.
This paper proposes a novel framework of writer adaptation based on deeply learned features for online handwritten Chinese character recognition. Our motivation is to further boost the state-of-the-art deep learning-based recognizer by using writer adaptation techniques. First, to perform an effective and flexible writer adaptation, we propose a tandem architecture design for the feature extraction and classification. Specifically, a deep neural network (DNN) or convolutional neural network (CNN) is adopted to extract the deeply learned features, which are used to build a discriminatively trained prototype-based classifier initialized by Linde–Buzo–Gray clustering techniques. In this way, the feature extractor can fully utilize the useful information of a DNN or CNN. Meanwhile, the prototype-based classifier can be designed to be more compact and efficient as a practical solution. Second, writer adaptation is performed via a linear transformation of the deeply learned features, which is optimized with a sample separation margin-based minimum classification error criterion. Furthermore, we improve the generalization capability of the previously proposed discriminative linear regression approach for writer adaptation by using the linear interpolation of two transformations and adaptation data perturbation. Experiments on both the CASIA-OLHWDB benchmark and an in-house corpus with a vocabulary of 20,936 characters demonstrate the effectiveness of our proposed approach.

14.
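A glyph-structure ordering method for online handwritten Chinese character recognition   Cited by: 6 (self-citations: 0, citations by others: 6)
Chinese characters are line drawings on a two-dimensional plane. One advantage of online Chinese character recognition is that the order in which stroke segments are written can be exploited, giving a one-dimensional representation. However, different people write the same character with different stroke orders, which complicates processing and recognition. This paper presents a stroke-segment-based ordering method that depends only on the structure of the character shape and arranges the stroke segments of the two-dimensional space in a stable order in one dimension. This order is independent of the order in which the strokes were written, which lays a good foundation for online handwritten Chinese character recognition.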

15.
The purpose of this study is to investigate a new representation of shape and its use in handwritten online character recognition by a Kohonen associative memory. This representation is based on the empirical distribution of features such as tangents and tangent differences at regularly spaced points along the character signal. Recognition is carried out by a Kohonen neural network trained using the representation. In addition to the Euclidean distance traditionally used in the Kohonen training algorithm to measure the similarities among feature vectors, we also investigate the Kullback–Leibler divergence and the Hellinger distance, functions that measure distance between distributions. Furthermore, we perform operations (pruning and filtering) on the trained memory to improve its classification potency. We report on extensive experiments using a database of online Arabic characters produced without constraints by a large number of writers. Comparative results show the pertinence of the representation and the superior performance of the scheme.
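
The two distribution measures named above have standard definitions, which the sketch below applies to a normalised histogram of tangent directions. It is a minimal illustration, not the representation used in the study: the 8-bin histogram, the point format, and the function names are assumptions, and the Kohonen network itself is not shown.

```python
import math


def tangent_histogram(points, bins=8):
    """Normalised histogram of tangent directions between consecutive points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    counts = [0] * bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # min() guards against rounding that lands exactly on the upper edge.
        counts[min(int(angle / (2 * math.pi) * bins), bins - 1)] += 1
    total = sum(counts)
    return [count / total for count in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); infinite if q is 0 where p is not."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # the term 0 * log(0 / q_i) is taken as 0
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


def hellinger_distance(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    return math.sqrt(0.5 * sum((math.sqrt(p_i) - math.sqrt(q_i)) ** 2
                               for p_i, q_i in zip(p, q)))


horizontal = tangent_histogram([(0, 0), (1, 0), (2, 0)])
vertical = tangent_histogram([(0, 0), (0, 1), (0, 2)])
distance = hellinger_distance(horizontal, vertical)
```

Unlike the Euclidean distance, the Kullback–Leibler divergence is asymmetric and can be infinite when one histogram has an empty bin, so in practice histograms are often smoothed before it is applied; the Hellinger distance is symmetric and always finite.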

16.
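To address the effect of random noise on the recognition rate of handwritten Chinese character images, a new algorithm combining deep learning with noise suppression is proposed. The algorithm targets handwritten Chinese character images containing random noise. In a Python environment, a model combining noise suppression with a convolutional neural network is built on the Caffe platform; the model removes the noise and recognizes the handwritten characters correctly. In addition, the algorithm removes noise without altering the character shape, preserving the original information of the character. Experiments were repeated under two kinds of noise (Gaussian noise and salt-and-pepper noise) with gradually increasing noise intensity and compared with other methods, giving an average recognition rate of 97.05%. The experimental results show that the model and algorithm are efficient and have strong recognition ability.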
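
The two noise types used in the experiments can be simulated on a grayscale image as sketched below. This only illustrates how such test inputs might be generated; it is not the authors' Caffe model or their noise settings. Images are assumed to be lists of rows of 0-255 integers, and the function names and parameters (`sigma`, `amount`) are hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip to 0..255."""
    return [[min(255, max(0, round(pixel + rng.gauss(0, sigma))))
             for pixel in row]
            for row in image]


def add_salt_and_pepper(image, amount, rng):
    """Replace a fraction `amount` of pixels with pure white or pure black."""
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < amount:
                new_row.append(255 if rng.random() < 0.5 else 0)
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


rng = random.Random(0)  # fixed seed so runs are repeatable
clean = [[128] * 4 for _ in range(4)]
gaussian = add_gaussian_noise(clean, sigma=20, rng=rng)
speckled = add_salt_and_pepper(clean, amount=0.1, rng=rng)
```

Increasing `sigma` or `amount` step by step gives a series of progressively noisier inputs.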

17.
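Neural-network-based handwritten Chinese character recognition converts the dot-matrix image of a character into an electrical signal, which is fed to a digital signal processor or computer for processing; a classification algorithm then finds the matching character among the many Chinese characters. This paper describes the design goals of an experimental handwritten Chinese character recognition system, analyzes the preprocessing of handwritten characters and its principles, and describes feature extraction for handwritten characters in detail.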

18.
The problem of recognizing offline handwritten Chinese characters has been investigated extensively. One difficulty is due to the existence of characters with very similar shapes. In this paper, we propose a “critical region analysis” technique which highlights the critical regions that distinguish one character from another similar character. The critical regions are identified automatically based on the output of the Fisher's discriminant. Additional features are extracted from these regions and contribute to the recognition process. By incorporating this technique into the character recognition system, a record high recognition rate of 99.53% on the ETL-9B database is obtained.

19.
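This work targets continuous recognition of handwritten character sequence input signals. It analyzes the characteristics of Chinese characters and online handwritten text, and proposes and constructs a set of handwritten Chinese character components. Based on this component set, the 6,763 Chinese characters of GB2312-80 were decomposed and encoded into components, and the component set was tested. Statistics on the encoded data show that the distribution of characters by number of handwritten components is log-normal. The paper analyzes and discusses the character-forming capacity of the handwritten components from the perspectives of statistics and character recognition technology; the design of the component set meets the design requirements in both component selection and character decomposition. Experiments show that a component recognizer built on the handwritten components achieves component recognition rates of 70.21% on handwritten characters and 58.49% on continuous characters.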

20.
The main problem in handwritten character recognition (HCR) systems is to describe each character by a set of features that can distinguish it from the other characters. Thus, in this paper, we propose a robust set of features extracted from isolated Amazigh characters, based on decomposing the character image into zones and calculating the density and the total length of the histogram projection in each zone. In the experimental evaluation, we test the proposed set of features with different classification algorithms on a large database of handwritten Amazigh characters to show its performance. The obtained recognition rates reach 99.03%, which we consider good and satisfactory compared to other approaches, and show that our proposed set of features is useful for describing Amazigh characters.
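
The zoning idea described above can be illustrated with the density feature alone: the character image is cut into a grid of zones and the fraction of ink pixels in each zone becomes one feature. The sketch assumes a binary image given as a list of rows of 0/1 values and a hypothetical function name `zone_densities`; the projection-histogram length feature mentioned in the abstract is not reproduced, because its exact definition is not given there.

```python
def zone_densities(image, rows, cols):
    """Fraction of ink pixels (value 1) in each zone of a rows x cols grid."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            # Empty zones (possible when the grid is finer than the image) get 0.
            features.append(ink / area if area else 0.0)
    return features


# 4 x 4 binary image split into a 2 x 2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zone_densities(glyph, rows=2, cols=2)
```

The resulting vector, ordered zone by zone from top-left to bottom-right, can be passed to any standard classifier.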

