Post-processing Study of Chinese Document Recognition Based on HMM (基于HMM的汉语文本识别后处理研究)
Cite this article: Li Yuanxiang, Ding Xiaoqing, Liu Changsong. Post-processing Study of Chinese Document Recognition Based on HMM [J]. Journal of Chinese Information Processing (中文信息学报), 1999, 13(4): 30-35.
Authors: Li Yuanxiang (李元祥), Ding Xiaoqing (丁晓青), Liu Changsong (刘长松)
Affiliation: Department of Electronic Engineering, Tsinghua University
Funding: National 863 Program; National Natural Science Foundation of China
Abstract: This paper uses a hidden Markov model (HMM) to describe post-processing for Chinese document recognition. The HMM combines two probabilistic models, a Chinese language model and a single-character recognition (SCR) model, so as to make full use of the information supplied by the single-character recognizer. The language-model parameters are estimated from corpus statistics. The SCR-model parameters are conditional probabilities which, by theoretical analysis, can be converted into posterior probabilities. Based on an analysis of SCR results on the training sample set, a statistical method is proposed for estimating the posterior probabilities of candidate characters. Experiments on off-line handwritten Chinese document recognition show that post-processing performance depends not only on the language model but also on accurate estimation of the posterior probabilities.

Keywords: Chinese character recognition; post-processing; N-gram language model; hidden Markov model; posterior probability
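
The decoding idea summarized in the abstract can be illustrated with a small Viterbi search over recognizer candidates. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a bigram language model, and it assumes the conversion from conditional to posterior probability follows Bayes' rule, so that P(image | char) is proportional to P(char | image) / P(char). The function name `viterbi_postprocess`, the `floor` smoothing constant, and all probability values are hypothetical and chosen purely for illustration.

```python
import math

def viterbi_postprocess(candidates, bigram, prior, floor=1e-6):
    """Pick the most likely character sequence from recognizer candidates.

    candidates: one dict per text position, {char: P(char | image)} (posteriors).
    bigram:     {(prev, cur): P(cur | prev)}  (assumed bigram language model).
    prior:      {char: P(char)}               (unigram probabilities).
    floor:      hypothetical smoothing value for unseen entries.
    """
    def emit(ch, post):
        # Assumed Bayes conversion: log P(image | ch) = log P(ch | image) - log P(ch) + const
        return math.log(post) - math.log(prior.get(ch, floor))

    # best[ch] = (log score of the best path ending in ch, that path)
    best = {ch: (math.log(prior.get(ch, floor)) + emit(ch, p), [ch])
            for ch, p in candidates[0].items()}

    for column in candidates[1:]:
        new_best = {}
        for ch, p in column.items():
            # Best predecessor under the language-model transition score
            score, path = max(
                ((s + math.log(bigram.get((prev, ch), floor)), pth)
                 for prev, (s, pth) in best.items()),
                key=lambda t: t[0])
            new_best[ch] = (score + emit(ch, p), path + [ch])
        best = new_best

    return "".join(max(best.values(), key=lambda t: t[0])[1])

# Toy input with invented numbers: the recognizer's top choice at position 1
# is "请", but the language model favors the sequence "清华".
candidates = [{"清": 0.4, "请": 0.6}, {"华": 0.9, "毕": 0.1}]
bigram = {("清", "华"): 0.5, ("请", "华"): 0.001}
prior = {"清": 0.01, "请": 0.01, "华": 0.01, "毕": 0.01}
result = viterbi_postprocess(candidates, bigram, prior)
print(result)
```

In this toy case the recognizer's first-choice string would be "请华", while the combined score selects "清华". How well such a search works depends on how reliable the posterior values are, which is the point the abstract makes about accurate posterior estimation.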
This article is indexed in CNKI and other databases.