Similar Literature
 20 similar documents found
1.
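Research on Mathematical Expressions of Chinese Characters   Total citations: 14, self-citations: 1, citations by others: 14
Through an in-depth analysis of the structure of Chinese characters, a new method of representing Chinese characters is proposed. It expresses each character as a mathematical expression in which 505 components serve as operands and 6 positional relations between components serve as operators. The representation is close to natural, structurally simple, and can be processed by definite operation rules like an ordinary mathematical expression. It is widely applicable to typesetting and printing, advertising, packaging design, network transmission, and Chinese mobile communication, and has been applied successfully to automatic generation of character glyphs, cross-platform transmission of Chinese character information over the Internet, and mining of knowledge about character structure.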

2.
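A Preliminary Study of Automatic Recognition of Handwritten Chinese Text Based on Linguistic Knowledge   Total citations: 2, self-citations: 0, citations by others: 2
The paper first analyzes, in terms of information cost, the amount of information needed to recognize one Chinese character. The analysis shows that single-character recognition is an equiprobable model and requires the most information. Chinese text can therefore be treated as a Markov model in which the occurrence of the current character depends only on the preceding m characters. Statistics gathered from text yield a large amount of linguistic statistical information, on which a sentence-based automatic text recognition method using linguistic knowledge is designed. During recognition, the character currently being recognized is matched only within the set of characters that can follow the previous character; after a whole sentence has been recognized, linguistic knowledge processing is applied before the result is output. ...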

3.
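Implications of Cognitive Psychology Research on Chinese Characters for Automatic Machine Recognition of Chinese Characters   Total citations: 4, self-citations: 1, citations by others: 3
Several cognitive psychology experiments have consistently confirmed, from different angles, that the four equal quadrants of a square Chinese character carry different amounts of glyph feature information and play different roles when humans recognize characters. The upper-left quadrant is the most important, while the lower-right quadrant plays a much weaker role. Drawing on the frequencies with which components occur in each quadrant, this paper discusses what these results imply for machine recognition of Chinese characters.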

4.
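Handwritten Chinese Character Recognition Based on Character Prototypes   Total citations: 7, self-citations: 1, citations by others: 6
This paper reviews three existing computer representations of Chinese characters and two traditional methods of analyzing character structure. Applying basic principles of topology and geometry, it analyzes the structure of Chinese characters and its constraint relations and identifies four classes of basic relations that make up characters. On this basis, Chinese character prototypes are implemented, and finally an example of applying the prototypes to handwritten Chinese character recognition is given.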

5.
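王建平, 蔺菲, 陈军. 《计算机工程》 (Computer Engineering), 2007, 33(10): 230-232, 248
A method is proposed for extracting the stroke width of handwritten Chinese characters and normalizing the characters based on the extracted stroke width. The idea of stroke reconstruction for handwritten characters is presented, an algorithm that reconstructs characters from extracted strokes and finally recognizes them is implemented, and a handwritten Chinese character recognition system is built. Experiments confirm that the method preserves the original stroke feature information and recognizes handwritten Chinese characters effectively.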

6.
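This paper implements a post-processing system for Chinese character recognition based on a comprehensive matching method. The method uses the information in the recognition results, the contextual constraints among characters in Chinese (that is, how characters combine into words), and word usage frequency.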

7.
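Based on extracted information about the adjacency relations among sub-strokes, this paper proposes a new relaxation matching method for recognizing handwritten Chinese characters (HCC). To ensure that the relaxation process converges, a new iteration scheme is designed, together with a support function that handles the wide variation in writing and certain unavoidable shortcomings of the preprocessing operations. The likelihood of each sub-stroke match is reflected in a distance function, which is determined by linear programming so as to obtain the best result. Experiments were carried out on characters from the ETL-8 database, and the results show that the proposed ...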

8.
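In handwritten Chinese characters, strokes, components, and their positional relations all vary considerably, and this variation is the main cause of unstable features. To reduce this adverse effect and make the feature description of handwritten characters more stable, this paper presents a method of recognizing handwritten Chinese characters based on fuzzy relations between character primitives.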

9.
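Machine Discovery of Deformation Rules of Handwritten Chinese Characters   Total citations: 1, self-citations: 0, citations by others: 1
Irregularity is a major characteristic of handwritten Chinese characters and an important factor affecting the recognition rate. Using machine discovery, general rules of handwritten character deformation are sought from a large number of handwritten samples and applied in the character recognition process. This can greatly improve classification accuracy, reduce the size of the sample dictionary, and speed up matching.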

10.
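Formation Rules of Chinese Character Stroke Segments and a Method for Extracting Them   Total citations: 8, self-citations: 0, citations by others: 8
Starting from the connected pixel runs in the rows (columns) of a bitmap image, this paper studies how stroke segments make up Chinese character images. It finds that a Chinese character bitmap consists of only two types of stroke segments, staircase segments and parallel long segments, and summarizes the rules by which each type is formed. Based on these formation rules, a stroke segment extraction method is proposed that converts a pixel-level character image into an image whose units are stroke segments, which benefits character recognition, character thinning, and automatic font generation. Finally, experimental results of stroke segment extraction on printed and handwritten Chinese characters are given.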

11.
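To address the very large number of classes in Chinese character recognition, a Bayesian network classifier is introduced into offline handwritten Chinese character recognition on small-sample character sets. A recognition system is built for a small-sample character set of handwritten capital-form Chinese numerals and compared with the traditional Euclidean distance method. Experiments show that the algorithm raises the recognition rate to 92.4% and is practical and readily extensible for offline handwritten recognition on small-sample character sets.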

12.
This paper presents a new Bayesian-based method of unconstrained handwritten offline Chinese text line recognition. In this method, a sample of a real character or non-character in realistic handwritten text lines is jointly recognized by a traditional isolated character recognizer and a character verifier, which requires just a moderate number of handwritten text lines for training. To improve its ability to distinguish between real characters and non-characters, the isolated character recognizer is negatively trained using a linear discriminant analysis (LDA)-based strategy, which employs the outputs of a traditional MQDF classifier and the LDA transform to re-compute the posterior probability of isolated character recognition. In tests with 383 text lines in the HIT-MW database, the proposed method achieved character-level recognition rates of 71.37% without any language model and 80.15% with a bi-gram language model. These promising results show the effectiveness of the proposed method for unconstrained handwritten offline Chinese text line recognition.

13.
This paper gives an introduction to, and remarks on, two review papers on Chinese character recognition. One review is by Chinese authors, the other by American scientists. They investigate Chinese characters from different language environments and approach the research from different points of view. Together they give readers a more comprehensive view of Chinese character recognition, which is an important branch of pattern recognition. In addition, one article focuses on online processing and the other on offline recognition, so the two complement each other. The author is the Associate Editor-in-Chief of Frontiers of Computer Science in China.

14.
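Applying coarse classification to offline handwritten Chinese character recognition, as part of a multi-level classification strategy, can effectively improve recognition performance and accuracy. This paper proposes a method of coarsely classifying handwritten Chinese characters using structural features of the four corner regions. Based on an analysis of the basic strokes of Chinese characters, and taking into account the deformation characteristics of handwriting and the requirements of the recognition algorithm, a new set of stroke units is defined. These units are compared with the structure within specific regions of a character to obtain a 4-digit structural feature code, which serves as the basis for coarse classification. Sampling and recognition experiments on some handwritten characters from the GB2312 level-1 character set demonstrate the effectiveness of the improved four-corner structural features for coarse classification.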

15.
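陈站, 邱卫根, 张立臣. 《计算机应用研究》 (Application Research of Computers), 2020, 37(4): 1244-1246, 1251
Because glyph shapes are complex and variable, offline handwritten Chinese character recognition has long been a difficult problem in pattern recognition, and the development of deep convolutional neural networks offers a direct and effective solution. This work studies offline handwritten Chinese character recognition based on neural networks with the inception structure and proposes an improved inception structure that is simpler, easier to extend in depth, and requires fewer training parameters. The method was validated on the CASIA-HWDB1.1 dataset; trained with stochastic gradient descent, the model reached an average accuracy of 96.95%. The results indicate that the improved inception structure is more robust for image classification and easier to extend to other application domains.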

16.
This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database, CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.

17.
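Chinese characters come in a rich variety of fonts, and different fonts differ markedly in character structure. Current OCR technology focuses on recognizing characters and pays little attention to font recognition. A text-dependent single-character font recognition method is proposed that uses text-dependent prior information and structural font features and measures font similarity with a vector space model. Experiments on 66 commonly used simplified Chinese fonts yielded a good average recognition rate.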

18.
The problem of recognizing offline handwritten Chinese characters has been investigated extensively. One difficulty is due to the existence of characters with very similar shapes. In this paper, we propose a “critical region analysis” technique which highlights the critical regions that distinguish one character from another similar character. The critical regions are identified automatically based on the output of the Fisher's discriminant. Additional features are extracted from these regions and contribute to the recognition process. By incorporating this technique into the character recognition system, a record high recognition rate of 99.53% on the ETL-9B database is obtained.

19.
This paper proposes an effective segmentation-free approach using a hybrid neural network hidden Markov model (NN-HMM) for offline handwritten Chinese text recognition (HCTR). In the general Bayesian framework, the handwritten Chinese text line is sequentially modeled by HMMs with each representing one character class, while the NN-based classifier is adopted to calculate the posterior probability of all HMM states. The key issues in feature extraction, character modeling, and language modeling are comprehensively investigated to show the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. As for the training procedure, the label refinement using forced alignment and the sequence training can yield significant gains on top of the frame-level cross-entropy criterion. Second, a deep convolutional neural network (DCNN) with automatically learned discriminative features demonstrates its superiority to DNN in the HMM framework. Moreover, to solve the challenging problem of distinguishing quite confusing classes due to the large vocabulary of Chinese characters, the NN-based classifier should output 19900 HMM states as the classification units via a high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM yields a promising character error rate (CER) of 5.24% by making a good trade-off between the computational complexity and recognition accuracy. To the best of our knowledge, DCNN-HMM can achieve a best published CER of 3.53%.

20.
A Chinese handwriting database named HIT-MW is presented to facilitate the offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or middleman instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters that are produced under an unconstrained condition without preprinted character boxes. The statistics show that the database has an excellent representation of the real handwriting. Many new applications concerning real handwriting recognition can be supported by the database.
