Similar literature
 20 similar documents found (search time: 15 ms)
1.
We propose a novel language model for Hangul text recognition. Without relying on prior linguistic knowledge in training, the proposed model learns variable-length Hangul character sequences, which comprise the elementary tokens of the Korean language, and their probabilities from the statistics of a raw text corpus. Experiments in handwritten Hangul recognition show that the proposed language model is effective in postprocessing recognition results.

2.
3.
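Korean script is an alphabetic writing system composed of vowels and consonants. A commonly used approach to Korean character recognition is therefore to separate each letter from the character, recognize the letters, and then determine the character. Combining structural analysis, this paper thins the background of the character image to find the dividing lines between letters and separate each letter, and then recognizes the letters using two-layer peripheral distance features. Preliminary experiments on four commonly used printed Korean fonts show an average letter segmentation accuracy of 97.4% and a recognition rate above 99% on the letter sample set.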

4.
This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.
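The path search over an oversegmented text line can be illustrated with a minimal beam search sketch. This is not the paper's refined algorithm: the lattice layout (`candidates`), the `bigram_logp` callback, `lm_weight`, and `beam_width` are all hypothetical names chosen for illustration, and the score is a plain sum of log-probabilities with no geometric context and no length normalization.

```python
def beam_search(candidates, n_segments, bigram_logp, lm_weight=1.0, beam_width=3):
    """Find the best character path through an oversegmentation lattice.

    candidates: dict mapping (start, end) segment boundaries to a list of
                (char, log_prob) recognition hypotheses (assumed layout).
    bigram_logp: function (previous_char, char) -> log-probability; "^"
                 stands for the start of the line (assumed convention).
    Returns (score, text) for the best complete path, or None.
    """
    # beams[pos] holds partial paths that end exactly at boundary pos.
    beams = {0: [(0.0, "")]}
    for pos in range(n_segments):
        # All paths reaching pos start earlier, so this list is complete;
        # keep only the beam_width best before expanding.
        paths = sorted(beams.get(pos, []), reverse=True)[:beam_width]
        for score, text in paths:
            for (start, end), options in candidates.items():
                if start != pos:
                    continue
                for char, logp in options:
                    prev = text[-1] if text else "^"
                    new_score = score + logp + lm_weight * bigram_logp(prev, char)
                    beams.setdefault(end, []).append((new_score, text + char))
    final = beams.get(n_segments, [])
    return max(final) if final else None


# Toy lattice: segments 0-1 and 1-2 can be read separately or as one character.
lattice = {
    (0, 1): [("a", -1.0)],
    (1, 2): [("b", -1.0)],
    (0, 2): [("c", -0.5)],
    (2, 3): [("d", -0.2)],
}
best = beam_search(lattice, 3, lambda prev, char: 0.0)
print(best[1])
```

Because scores are summed, this sketch favors paths with fewer characters; a practical path evaluation function would have to compensate for that.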

5.
In structural character recognition, a character is usually viewed as a set of strokes and the spatial relationships between them. Therefore, strokes and their relationships should be properly modeled for effective character representation. For this purpose, we propose a modeling scheme by which strokes as well as relationships are stochastically represented by utilizing the hierarchical characteristics of target characters. A character is defined by a multivariate random variable over the components, and its probability distribution is learned from a training data set. To overcome the learning difficulties caused by the high order of the probability distribution (the curse of dimensionality), the probability distribution is factorized and approximated by a set of lower-order probability distributions by applying the idea of relationship decomposition recursively to components and subcomponents. Based on the proposed method, a handwritten Hangul (Korean) character recognition system is developed. Recognition experiments conducted on a public database show the effectiveness of the proposed relationship modeling. The recognition accuracy increased by 5.5 percent in comparison to the most successful system ever reported.

6.
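Segmentation of handwritten Chinese character strings based on component merging   Total citations: 5, self-citations: 0, citations by others: 5
Lü Yue (吕岳), Shi Pengfei (施鹏飞), Zhang Kehua (张克华). Journal of Software (《软件学报》), 2000, 11(11): 1554-1559
A great deal of research has been done on offline recognition of isolated handwritten Chinese characters, yet progress toward practical use has been slow. Besides unsatisfactory single-character recognition rates, correctly segmenting individual characters from text is a major difficulty, because character recognition depends on correct segmentation. Using the basic structural features of Chinese characters, components are merged into complete character images according to the above-below, left-right, and enclosing relationships between two components. Analyzing the widths of the components and the gaps between adjacent components across the whole character string helps in merging components that stand in a left-right relationship. Experimental results show that the method segments handwritten Chinese character strings well.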
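The left-right merging step can be sketched as follows. This is a simplified illustration, not the authors' procedure: components are reduced to horizontal extents, and the thresholds `max_gap` and `max_width` are hypothetical parameters that would in practice be derived from the width and gap statistics of the whole string.

```python
def merge_components(boxes, max_gap, max_width):
    """Greedily merge horizontally adjacent components into characters.

    boxes: list of (left, right) x-extents of connected components.
    A component is merged into the previous one when the gap between them
    is small and the merged width still looks like a single character.
    """
    merged = []
    for left, right in sorted(boxes):
        if merged:
            prev_left, prev_right = merged[-1]
            gap = left - prev_right
            new_right = max(right, prev_right)
            if gap <= max_gap and new_right - prev_left <= max_width:
                merged[-1] = (prev_left, new_right)
                continue
        merged.append((left, right))
    return merged


# Five components; the narrow gaps (2 px) join parts of the same character.
components = [(0, 10), (12, 30), (40, 70), (72, 80), (95, 125)]
print(merge_components(components, max_gap=3, max_width=45))
# [(0, 30), (40, 80), (95, 125)]
```

Above-below and enclosing relationships would need the vertical extents as well; they are left out here to keep the sketch short.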

7.
8.
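Offline handwritten digit recognition based on principal component analysis   Total citations: 1, self-citations: 0, citations by others: 1
Zhang Guohua (张国华), Wan Junli (万钧力). Computer Engineering (《计算机工程》), 2007, 33(18): 219-221
To address the difficulty of fusing statistical and structural features in handwritten digit recognition, principal component analysis is used to extract statistical information about the structural features of digit characters, rebuild the digit model, and estimate the reconstruction error. The height-to-width ratio and the Euler feature of each digit are also extracted, and digits are recognized by combining the results of the Bayesian classifiers corresponding to the three features. Tested on the samples in the sample library, the method achieved a correct recognition rate of 90.73%.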

9.
In this paper, we intensively study the behavior of three part-based methods for handwritten digit recognition. The principle of the proposed methods is to represent a handwritten digit image as a set of parts and recognize the image by aggregating the recognition results of individual parts. Since part-based methods do not rely on the global structure of a character, they are expected to be more robust against various deformations that may damage the global structure. The three proposed methods are based on the same principle but differ in their details, for example, in the way they aggregate the individual results. Thus, the methods perform differently. Experimental results show that even the simplest part-based method can achieve a recognition rate as high as 98.42%, while the improved one achieved 99.15%, which is comparable to or even higher than some state-of-the-art methods. This result is important because it reveals that characters can be recognized without their global structure. The results also show that the part-based method is robust against the deformations that usually appear in handwriting.
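The part-based principle (recognize each part, then aggregate) can be sketched with nearest-neighbor matching and majority voting. Both choices are assumptions made for illustration; the abstract does not specify how parts are described or aggregated, and the feature vectors below are toy values.

```python
from collections import Counter


def nearest_label(part, references):
    """Return the label of the reference part closest to `part`.

    references: list of (feature_vector, label) pairs (assumed layout).
    """
    def squared_distance(ref):
        return sum((a - b) ** 2 for a, b in zip(part, ref[0]))
    return min(references, key=squared_distance)[1]


def recognize(parts, references):
    """Classify an image from its parts by majority vote over part labels."""
    votes = Counter(nearest_label(part, references) for part in parts)
    return votes.most_common(1)[0][0]


# Toy reference parts for two classes and three parts from an unknown image.
references = [((0.0, 0.0), "0"), ((1.0, 1.0), "1")]
parts = [(0.1, 0.2), (0.9, 0.8), (0.0, 0.1)]
print(recognize(parts, references))  # 0
```

No step in this sketch looks at where a part sits in the image, which is the sense in which the global structure is ignored.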

10.
This paper proposes a model-based structural matching method for handwritten Chinese character recognition (HCCR). This method is able to obtain reliable stroke correspondence and enable structural interpretation. In the model base, the reference character of each category is described by an attributed relational graph (ARG). The input character is described with feature points and line segments. The strokes and inter-stroke relations of the input character are not determined until it is matched with a reference character. The structural matching is accomplished in two stages: candidate stroke extraction and consistent matching. All candidate input strokes to match the reference strokes are extracted by line following, and then the consistent matching is achieved by heuristic search. Some structural post-processing operations are applied to improve the stroke correspondence. Recognition experiments were carried out on an image database collected at KAIST, and promising results have been achieved.

11.
Classifier combination methods have proved to be an effective tool for increasing the performance of classification techniques and can be used in any pattern recognition application. Despite a significant number of publications describing successful classifier combination implementations, the theoretical basis is still not mature enough and the improvements achieved are inconsistent. In this paper, we propose a novel statistical validation technique, known as the correlation‐based classifier combination technique, for combining classifiers in any pattern recognition problem. This validation has a significant influence on the performance of combinations, and its use is necessary for a complete theoretical understanding of combination algorithms. The analysis presented is statistical in nature but promises to lead to a class of algorithms for rank‐based decision combination. The theoretical and practical issues in implementation are illustrated by applying the technique to 2 standard datasets in the pattern recognition domain, namely, the handwritten digit recognition and letter image recognition datasets taken from the UCI Machine Learning Database Repository ( http://www.ics.uci.edu/_mlearn ). An empirical evaluation using 8 well‐known distinct classifiers confirms the validity of our approach compared with some other algorithms for combining multiple classifiers. Finally, we also suggest a methodology for determining the best mix of individual classifiers.

12.
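To address the shortcoming that a single-scale Gabor filter bank is sensitive only to handwritten Chinese characters of one particular stroke thickness, a novel multi-scale local Gabor filter bank is proposed. To evaluate its recognition performance, a handwritten Chinese character recognition system based on Gabor features is presented. Experiments show that the multi-scale global Gabor filter bank clearly improves recognition performance, while the local Gabor filter bank markedly reduces feature dimensionality, computation, and memory requirements with recognition performance largely maintained. The novelty of the method lies in the selection of local Gabor filters. On the 863 HCL2000 handwritten Chinese character database, the highest average recognition rate reached 92.32%, which demonstrates the effectiveness of the method for handwritten Chinese character recognition.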

13.
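Korean is a common East Asian script. Compared with English and Chinese characters, Korean characters have a large number of classes, very high similarity between classes, and basic stroke units arranged in a two-dimensional geometric layout, so online handwritten Korean character recognition has long been a difficult problem. A Korean letter recognition approach based on the PCGM model is proposed: letters are segmented from Korean characters using the PCGM models of the letters, and the segmented letters are then used to train the models until they converge stably. Experimental results show that the method is markedly effective for Korean character recognition.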

14.
15.
16.
This paper presents a new Bayesian-based method for unconstrained handwritten offline Chinese text line recognition. In this method, a sample of a real character or non-character in realistic handwritten text lines is jointly recognized by a traditional isolated character recognizer and a character verifier, which requires just a moderate number of handwritten text lines for training. To improve its ability to distinguish between real characters and non-characters, the isolated character recognizer is negatively trained using a linear discriminant analysis (LDA)-based strategy, which employs the outputs of a traditional MQDF classifier and the LDA transform to re-compute the posterior probability of isolated character recognition. In tests with 383 text lines from the HIT-MW database, the proposed method achieved character-level recognition rates of 71.37% without any language model and 80.15% with a bi-gram language model. These promising results show the effectiveness of the proposed method for unconstrained handwritten offline Chinese text line recognition.

17.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle and isolated). The proposed method can be integrated into any handwritten Arabic word recognition system based on an explicit segmentation process. First, three preprocessing operations are applied to character images: thinning, contour tracing and connected components detection. These operations extract structural characteristics used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification is done using the Fuzzy ARTMAP neural network, which is very fast in training and supports incremental learning. Five Fuzzy ARTMAP neural networks are employed; each one is designed to recognize one subset of characters. The recognition process is achieved in two steps: in the first, a clustering method assigns characters to one of the five character subsets. In the second, the pseudo-Zernike features are used by the appropriate Fuzzy ARTMAP classifier to identify the character. Training and tests were performed on a set of character images manually extracted from the IFN/ENIT database. A high recognition rate was reported.

18.
19.
The main problem in handwritten character recognition (HCR) systems is to describe each character by a set of features that can distinguish it from the other characters. Thus, in this paper, we propose a robust set of features extracted from isolated Amazigh characters, based on decomposing the character image into zones and calculating the density and the total length of the histogram projection in each zone. In the experimental evaluation, we test the proposed set of features with different classification algorithms on a large database of handwritten Amazigh characters. The results give recognition rates that reach 99.03%, which we consider good and satisfactory compared with other approaches, and show that the proposed set of features is useful for describing Amazigh characters.
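The zoning idea can be illustrated with a minimal sketch that computes only the per-zone ink density; the histogram projection length mentioned above is not covered. The function name, the grid parameters, and the binary-image representation (nested lists of 0/1) are assumptions made for the example.

```python
def zone_densities(image, rows, cols):
    """Split a binary image into a rows x cols grid and return ink densities.

    image: list of equal-length rows of 0/1 pixels (1 = ink).
    Returns one value per zone, in row-major order, equal to the fraction
    of ink pixels in that zone. Assumes rows <= height and cols <= width.
    """
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            features.append(ink / ((bottom - top) * (right - left)))
    return features


# A 4x4 toy character split into 2x2 zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(zone_densities(glyph, 2, 2))  # [1.0, 0.0, 0.0, 0.75]
```

The resulting vector can be passed to any classifier, which matches the abstract's setup of testing one feature set with several classification algorithms.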

20.
A handwritten Chinese character recognition method based on primitive and compound fuzzy features using the SEART neural network model is proposed. The primitive features are extracted in local and global views. Since handwritten Chinese characters vary a great deal, the fuzzy concept is used to extract the compound features in the structural view. We combine the two categories of features and use a fast classifier, called the Supervised Extended ART (SEART) neural network model, to recognize handwritten Chinese characters. The SEART classifier has excellent performance, is fast, and has good generalization and exception handling abilities in complex problems. Using fuzzy set theory in feature extraction and the neural network model as a classifier helps reduce the effects of distortions, noise and variations. In spite of poor thinning, an average recognition rate of 90.24% was obtained for the 605 test character categories. The database used is CCL/HCCR3 (provided by CCL, ITRI, Taiwan). The experiment not only confirms the feasibility of the proposed system, but also suggests that applying fuzzy set theory and neural networks to the recognition of handwritten Chinese characters is an efficient and promising approach.
