
A Japanese OCR Post-Processing System Based on a Hybrid Language Model

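Cite this article:	Xie Xudong, Ding Xiaoqing, Peng Liangrui, Liu Changsong. A Japanese OCR post-processing system based on a hybrid language model [J]. Computer Engineering and Applications, 2002, 38(14): 68-72.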

Authors:	Xie Xudong, Ding Xiaoqing, Peng Liangrui, Liu Changsong

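Affiliation:	Department of Electronic Engineering, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems, Beijing 100084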

Funding:	National 863 High-Tech Research and Development Program (No. 2001AA114081); National Natural Science Foundation of China (No. 69972024)

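Abstract:	In a character recognition system, the post-processing module is an important stage for further improving the text recognition rate. Targeting the linguistic characteristics of Japanese, this paper builds a hybrid language model that combines statistical methods with rules, and implements a Japanese recognition post-processing system. The system first uses the Viterbi algorithm to obtain the optimal output of the statistical model, compares it with the recognition result supplied by the front-end recognizer to locate suspicious characters, and then applies contextual word matching and a grammar rule base to detect and correct errors in those characters. Experiments show that the post-processing system reduces the error rate on printed Japanese text by 21.4% on average.
Keywords:	Japanese character recognition; post-processing; language model; statistical method; knowledge base
Article ID:	1002-8331-(2002)14-0068-05
Revision date:	March 1, 2002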
Statistical and Structural Combined Method-based Post-Processing Algorithm for Japanese OCR

Xie Xudong, Ding Xiaoqing, Peng Liangrui, Liu Changsong. Statistical and Structural Combined Method-based Post-Processing Algorithm for Japanese OCR[J]. Computer Engineering and Applications, 2002, 38(14): 68-72.

Authors:	Xie Xudong, Ding Xiaoqing, Peng Liangrui, Liu Changsong

Abstract:	The post-processing module plays an important role in an OCR system. This paper describes the Japanese post-processing system in the TH-OCR multilingual OCR software, which combines statistical methods and rules to construct a mixed language model. The system first uses the Viterbi algorithm to obtain the optimal result of the statistical model, then locates suspicious characters by comparing that result with the output of the classifier. Finally, a contextual matching algorithm and a grammar rule base are used to detect and correct errors in the suspicious characters. Experiments show that the error rate on printed Japanese text decreases by 21.4% on average, which indicates that the method is useful.

Keywords:	Japanese OCR; post-processing; language model; statistical method; knowledge base
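
The first two steps named in the abstract (Viterbi decoding over the statistical model, then comparison with the recognizer's output to locate suspicious characters) can be illustrated with a minimal sketch. The abstract does not specify the form of the statistical model, so a character bigram model is assumed here; the function names `viterbi_best_path` and `suspicious_positions`, the candidate lists, and all probabilities are hypothetical toy values, not taken from the paper. The later word-matching and grammar-rule stage is not shown.

```python
import math

def viterbi_best_path(candidates, bigram_logp, unseen_logp=math.log(1e-4)):
    """Return the highest-scoring character string through a candidate lattice.

    candidates: one list per character position, each holding
                (char, recognizer_log_score) pairs, top-1 candidate first.
    bigram_logp: dict mapping (prev_char, char) to a log probability
                 (an assumed bigram model; unseen pairs get unseen_logp).
    """
    # scores[i][j]: best total log score of any path ending at candidates[i][j]
    scores = [[rec for _, rec in candidates[0]]]
    backptr = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row_scores, row_back = [], []
        for ch, rec in candidates[i]:
            best_prev, best_val = 0, -math.inf
            for k, (prev_ch, _) in enumerate(candidates[i - 1]):
                val = scores[i - 1][k] + bigram_logp.get((prev_ch, ch), unseen_logp)
                if val > best_val:
                    best_prev, best_val = k, val
            row_scores.append(best_val + rec)
            row_back.append(best_prev)
        scores.append(row_scores)
        backptr.append(row_back)

    # Backtrack from the best final candidate to recover the path.
    j = max(range(len(scores[-1])), key=scores[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j][0])
        if i > 0:
            j = backptr[i][j]
    return "".join(reversed(path))

def suspicious_positions(candidates, best_path):
    """Positions where the recognizer's top-1 choice differs from the Viterbi path."""
    top1 = [cands[0][0] for cands in candidates]
    return [i for i, (a, b) in enumerate(zip(top1, best_path)) if a != b]

# Toy lattice: the recognizer prefers 木 over 本 at position 1.
candidates = [
    [("日", math.log(0.9)), ("目", math.log(0.1))],
    [("木", math.log(0.6)), ("本", math.log(0.4))],
    [("語", math.log(0.95)), ("話", math.log(0.05))],
]
# Toy bigram log probabilities (hypothetical values).
bigram_logp = {
    ("日", "本"): math.log(0.5),
    ("本", "語"): math.log(0.4),
    ("日", "木"): math.log(0.01),
    ("木", "語"): math.log(0.001),
}

best = viterbi_best_path(candidates, bigram_logp)
print(best)                                    # 日本語
print(suspicious_positions(candidates, best))  # [1]
```

In this toy case the language model overrides the recognizer's first choice at position 1, so that position is the one flagged for the subsequent checking stage.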
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
