
基于HMM的汉语文本识别后处理研究

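Cite this article:	Li Yuanxiang, Ding Xiaoqing, Liu Changsong. 基于HMM的汉语文本识别后处理研究 [J]. 中文信息学报 (Journal of Chinese Information Processing), 1999, 13(4): 30-35.

Authors:	李元祥 (Li Yuanxiang), 丁晓青 (Ding Xiaoqing), 刘长松 (Liu Changsong)

Affiliation:	Department of Electronic Engineering, Tsinghua University

Funding:	National 863 Program; National Natural Science Foundation of China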

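Abstract:	This paper uses an HMM (Hidden Markov Model) to describe post-processing for Chinese text recognition. It combines two probabilistic models, the Chinese language model and the single-character recognition model, so as to make full use of the information provided by the single-character recognizer. The language model parameters are estimated statistically from a corpus. The parameters of the single-character recognition model are conditional probabilities which, as a theoretical analysis shows, can be converted into posterior probabilities. Based on an analysis of single-character recognition results on the training sample set, a statistical method is proposed for estimating the posterior probabilities of candidate characters. Experiments applying the HMM to off-line handwritten Chinese text recognition show that post-processing performance depends not only on the language model but also on accurate estimation of the posterior probabilities.
Keywords:	Chinese character recognition; post-processing; language model; hidden Markov model; posterior probability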
Post-processing Study of Chinese Document Recognition Based on HMM

Li Yuanxiang, Ding Xiaoqing, Liu Changsong. Post-processing Study of Chinese Document Recognition Based on HMM [J]. Journal of Chinese Information Processing, 1999, 13(4): 30-35.

Authors:	Li Yuanxiang, Ding Xiaoqing, Liu Changsong

Affiliation:	Department of Electronic Engineering, Tsinghua University

Abstract:	In this paper, a post-processing method using an HMM (Hidden Markov Model) for Chinese document recognition is proposed. The HMM combines a language model with a single character recognition (SCR) model to make the best use of the SCR output. The parameters of the language model are derived from a corpus, while the parameters of the SCR model are conditional probabilities that, by theoretical analysis, can be converted into posterior probabilities. Based on an analysis of the SCR output on the training samples, the posterior probabilities of candidates are estimated by a statistical method. Experiments in off-line handwritten Chinese document recognition show that post-processing performance depends on accurate estimation of the posterior probabilities as well as on a proper language model.

Keywords:	Chinese Character Recognition; Post-processing; N-gram Language Model; Hidden Markov Model; Posterior Probability
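
The combination described in the abstract can be illustrated with a small Viterbi search over per-position candidate lists. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a bigram language model, and it uses Bayes' rule to replace the conditional probability P(x | c) with P(c | x) / P(c), since P(x) is the same for every candidate at a given position. The function name `viterbi_postprocess`, the `floor` smoothing constant, and all probabilities in the toy data are invented for illustration.

```python
import math


def viterbi_postprocess(candidates, bigram, unigram, floor=1e-8):
    """Pick the best character sequence from per-position candidate lists.

    candidates: non-empty list; each item maps a candidate character to its
                estimated posterior P(c | x_t) from the recognizer.
    bigram:     maps (prev, cur) to P(cur | prev)   (assumed bigram LM).
    unigram:    maps a character to its prior P(c).
    floor:      hypothetical smoothing value for unseen events.
    """
    def emit(c, post):
        # Bayes' rule: P(x | c) is proportional to P(c | x) / P(c).
        return math.log(max(post, floor)) - math.log(unigram.get(c, floor))

    # Initialisation: log P(c_1) + log P(x_1 | c_1).
    scores = {c: math.log(unigram.get(c, floor)) + emit(c, p)
              for c, p in candidates[0].items()}
    backptrs = []
    for position in candidates[1:]:
        new_scores, ptr = {}, {}
        for c, p in position.items():
            # Best predecessor under the language model transition.
            best = max(scores,
                       key=lambda q: scores[q] + math.log(bigram.get((q, c), floor)))
            new_scores[c] = (scores[best]
                             + math.log(bigram.get((best, c), floor))
                             + emit(c, p))
            ptr[c] = best
        scores = new_scores
        backptrs.append(ptr)

    # Backtrack from the best final state.
    path = [max(scores, key=scores.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


# Toy data (made up): the recognizer prefers the wrong character at position 3.
candidates = [{"清": 0.9, "请": 0.1}, {"华": 1.0}, {"太": 0.6, "大": 0.4}, {"学": 1.0}]
unigram = {ch: 0.01 for ch in "清请华太大学"}
bigram = {("清", "华"): 0.5, ("华", "大"): 0.3, ("大", "学"): 0.5}

result = viterbi_postprocess(candidates, bigram, unigram)
print(result)
```

In this toy case, taking the top candidate at each position independently would give 清华太学, while the search returns 清华大学 because the bigram scores outweigh the recognizer's slight preference for 太. It also shows why the posterior estimates matter: the result depends directly on how the values in `candidates` are weighed against the language model.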
This article is indexed in CNKI and other databases.