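一种利用校对信息的汉字识别自适应后处理方法 (An Adaptive Post processing Method using Proofreading Information for Chinese Character Recognition)
Cite this article: 李元祥,刘长松,丁晓青.一种利用校对信息的汉字识别自适应后处理方法[J].中文信息学报,2001,15(1):46-52.
Authors: 李元祥 (LI Yuan-xiang), 刘长松 (LIU Chang-Song), 丁晓青 (DING Xiao-qing)
Affiliation: Department of Electronic Engineering, Tsinghua University (清华大学电子工程系)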
Funding: National "863" High-Tech Program (project 863-306-ZT03-03-1); National Natural Science Foundation of China (project 69972024)
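Abstract: Post-processing is an important component of a Chinese character recognition system. Conventional recognition post-processing depends heavily on a pre-trained statistical language model and ignores the particular characteristics of the text being processed; it also makes no use of the recognizer's dynamic recognition behavior. This paper uses text that has already been partially proofread and corrected in two ways. First, it builds an adaptive language model that promptly captures the linguistic characteristics of the text being processed. Second, it exploits the recognizer's dynamic recognition behavior to revise the candidate character set. Post-processing of the subsequent text thereby becomes adaptive. Tests on 400,000 characters of data show that the method lowers the average text error rate by 35.24% relative to conventional post-processing. This can greatly reduce the workload of data-entry staff, and the method has high practical value.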

Keywords: Chinese character recognition; post-processing; language model; adaptation; candidate set modification
Revised: March 14, 2000
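
The abstract does not say how the adaptive language model is built. One common way to make a language model track the text at hand is to interpolate a static model with counts gathered from the proofread text, and the sketch below illustrates that idea only. The class name `AdaptiveBigramModel`, the interpolation weight, the probability floor, and the toy probabilities are all assumptions introduced for illustration, not details from the paper.

```python
from collections import Counter


class AdaptiveBigramModel:
    """Hypothetical sketch: a static character bigram model interpolated
    with bigram counts collected from already proofread text."""

    def __init__(self, static_probs, cache_weight=0.3, floor=1e-6):
        self.static_probs = static_probs  # {(prev, cur): P_static(cur | prev)}
        self.cache_weight = cache_weight  # assumed interpolation weight
        self.floor = floor                # assumed probability for unseen pairs
        self.pair_counts = Counter()      # counts of (prev, cur) in proofread text
        self.prev_counts = Counter()      # counts of prev as a bigram context

    def update(self, proofread_text):
        """Add the bigrams of a newly proofread passage to the cache."""
        for prev, cur in zip(proofread_text, proofread_text[1:]):
            self.pair_counts[(prev, cur)] += 1
            self.prev_counts[prev] += 1

    def prob(self, prev, cur):
        """Return the interpolated estimate of P(cur | prev)."""
        p_static = self.static_probs.get((prev, cur), self.floor)
        seen = self.prev_counts[prev]
        if seen == 0:
            # No proofread evidence for this context: use the static model alone.
            return p_static
        p_cache = self.pair_counts[(prev, cur)] / seen
        return (1 - self.cache_weight) * p_static + self.cache_weight * p_cache


if __name__ == "__main__":
    model = AdaptiveBigramModel({("识", "别"): 0.2})
    print(model.prob("识", "别"))   # static estimate only
    model.update("汉字识别后处理识别")
    print(model.prob("识", "别"))   # estimate after seeing proofread text
```

In this sketch, each proofread passage is fed to `update`, so bigrams that recur in the current document gain probability for the text that has not yet been processed.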

An Adaptive Post processing Method using Proofreading Information for Chinese Character Recognition
LI Yuan-xiang, LIU Chang-Song, DING Xiao-qing. An Adaptive Post processing Method using Proofreading Information for Chinese Character Recognition[J]. Journal of Chinese Information Processing, 2001, 15(1): 46-52.
Authors:LI Yuan-xiang  LIU Chang-Song  DING Xiao-qing
Affiliation: Department of Electronic Engineering, Tsinghua University
Abstract: Post-processing is a key component of a Chinese character recognition system. Conventional post-processing methods, which rely to a large extent on a statistical language model, cannot track dependencies within an article, nor can they take the dynamic idiosyncrasies of the recognizer into account. This paper presents a novel adaptive post-processing method that utilizes partly corrected text. This text can be used to construct an adaptive language model and to capture the idiosyncrasies of the recognizer, which helps adjust the candidate set dynamically. The method thus makes the post-processing of subsequent text adaptive. Experiments on about 400,000 Chinese characters show that the proposed method achieves an average error reduction of 35.24% compared with the conventional post-processing method. It can substantially reduce the workload of large-scale data entry and is of high practical value.
Keywords: Chinese character recognition; post-processing; language model; adaptation; candidate set modification
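
The abstract states that corrected text reveals the recognizer's idiosyncrasies and that these help adjust the candidate set, but it gives no procedure. The sketch below shows one plausible reading: count which characters the recognizer's output was corrected to during proofreading, then append frequently observed corrections to later candidate lists that lack them. The class `CandidateSetAdjuster`, the `min_count` threshold, and the restriction to one-to-one substitution errors are assumptions made for illustration, not the paper's method.

```python
from collections import Counter, defaultdict


class CandidateSetAdjuster:
    """Hypothetical sketch: learn recognizer confusions from proofread text
    and use them to extend the candidate sets of later characters."""

    def __init__(self, min_count=2):
        self.min_count = min_count  # assumed evidence threshold
        # recognized character -> Counter of characters it was corrected to
        self.confusions = defaultdict(Counter)

    def observe(self, recognized, proofread):
        """Record substitutions from two aligned strings of equal length.

        Assumes substitution errors only (no insertions or deletions)."""
        if len(recognized) != len(proofread):
            raise ValueError("recognized and proofread text must be aligned")
        for rec_char, true_char in zip(recognized, proofread):
            if rec_char != true_char:
                self.confusions[rec_char][true_char] += 1

    def adjust(self, candidates):
        """Return a copy of `candidates` (best first) with frequent corrections
        of the top candidate appended when they are missing."""
        if not candidates:
            return []
        adjusted = list(candidates)
        seen = self.confusions.get(candidates[0], Counter())
        for true_char, count in seen.most_common():
            if count >= self.min_count and true_char not in adjusted:
                adjusted.append(true_char)
        return adjusted


if __name__ == "__main__":
    adjuster = CandidateSetAdjuster(min_count=2)
    adjuster.observe("末来", "未来")
    adjuster.observe("末知", "未知")
    print(adjuster.adjust(["末", "本"]))
```

Appending rather than reordering keeps the recognizer's own ranking intact, leaving the final choice among candidates to the language model.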
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.