
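Research on Rejection of Missegmented Characters in Document Recognition
Cite this article: Chen Zhengang, Ding Xiaoqing, Liu Changsong, Peng Liangrui. Research on Rejection of Missegmented Characters in Document Recognition[J]. Computer Engineering and Applications, 2002, 38(17): 69-72.
Authors: Chen Zhengang, Ding Xiaoqing, Liu Changsong, Peng Liangrui
Affiliation: State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084
Funding: National High Technology Research and Development Program of China (863 Program, No. 2001AA114081); National Natural Science Foundation of China (No. 69972024)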
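Abstract: In automatic document recognition, a character segmentation algorithm that relies only on metric information such as size and position easily produces missegmented image blocks; accurate segmentation requires feedback from the character classifier. To this end, a new rejection algorithm is proposed whose goal is to reject invalid characters as accurately as possible. The paper analyzes the confidence and generalized confidence of distance-based classifiers, improves on that basis the commonly used generalized confidence mapping function, and designs a rejection rule based on sample learning, which makes the rejection algorithm more adaptable. Experiments on Chinese, Japanese, and Korean document samples show that the proposed algorithm clearly improves system performance and is broadly applicable to the recognition of lower-quality printed text.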

Keywords: OCR; character recognition; confidence; rejection rule
Article number: 1002-8331-(2002)17-0069-04
Revised: May 1, 2002

Research on the Missegmented Character Rejection in Document Recognition
Chen Zhengang, Ding Xiaoqing, Liu Changsong, Peng Liangrui. Research on the Missegmented Character Rejection in Document Recognition[J]. Computer Engineering and Applications, 2002, 38(17): 69-72.
Authors: Chen Zhengang, Ding Xiaoqing, Liu Changsong, Peng Liangrui
Abstract: In OCR systems, the character segmentation algorithm may generate missegmented blocks, especially when it uses only geometric information such as size and location. Feedback from the character classifier is necessary to achieve higher segmentation accuracy. This paper proposes a novel rejection algorithm that rejects these invalid characters more accurately. First, the confidence and generalized confidence of distance-based classifiers are analyzed, and the commonly used generalized confidence mapping function is then modified. A new sample-based rejection rule is also proposed, which is more adaptive and flexible. Experiments on Chinese, Japanese, and Korean document recognition show that the new rejection algorithm clearly improves system performance, especially for low-quality printed documents.
Keywords:OCR  Character Recognition  Confidence  Rejection Rule
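
The abstract names two ingredients, a generalized confidence for a distance-based classifier and a rejection rule learned from samples, but this record does not give their formulas. The sketch below is only an illustration under stated assumptions: it uses the widely seen ratio form 1 - d1/d2 of the two smallest candidate distances as the generalized confidence, and a single threshold chosen to minimize errors on labeled samples as the rejection rule. The function names `generalized_confidence`, `learn_threshold`, and `is_rejected` are hypothetical, and neither the mapping nor the rule should be read as the paper's modified versions.

```python
from typing import Sequence


def generalized_confidence(distances: Sequence[float]) -> float:
    """Assumed form 1 - d1/d2, with d1 <= d2 the two smallest distances.

    This is a common textbook-style measure, not the mapping from the paper.
    """
    d1, d2 = sorted(distances)[:2]
    if d2 <= 0.0:
        return 0.0  # both candidates at zero distance: no separation
    return 1.0 - d1 / d2


def learn_threshold(valid: Sequence[float], invalid: Sequence[float]) -> float:
    """Pick the confidence cut that misclassifies the fewest labeled samples.

    `valid` holds confidences of correctly segmented characters, `invalid`
    those of missegmented blocks. Both must be non-empty.
    """
    candidates = sorted(set(valid) | set(invalid))
    best_t, best_err = candidates[0], float("inf")
    for t in candidates:
        # A block is rejected when its confidence falls below t.
        err = sum(c < t for c in valid) + sum(c >= t for c in invalid)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def is_rejected(distances: Sequence[float], threshold: float) -> bool:
    """Reject a block whose generalized confidence is below the threshold."""
    return generalized_confidence(distances) < threshold
```

In a segmentation loop, a block for which `is_rejected` returns true would be handed back to the segmenter for re-splitting or merging; how the paper's system uses the feedback is not described in this record.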
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.