Statistical Language Model for Chinese Text Proofreading

Funding:	The Youth Fund of Science and Technology of Shanxi Province (20021015)

Received:	2003-05-30

ZHANG Yang-sen and CAO Yuan-da. Statistical Language Model for Chinese Text Proofreading[J]. Journal of Beijing Institute of Technology, 2003, 12(4): 441-445

Authors:	ZHANG Yang-sen and CAO Yuan-da

Affiliation:	1. Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Department of Computer Science, Shanxi University, Taiyuan, Shanxi 030006, China 2. Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Abstract:	Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located before and after the target word w_i, and of the relationship between non-adjacent words w_i and w_j in the linguistic environment (LE). First, the word association degree between w_i and w_j is defined using a distance-weighted factor, where w_j is l words apart from w_i in the LE. Then the Bayes formula is used to calculate the LE related degree of word w_i. Lastly, the LE related degree is taken as the criterion for predicting the reasonability of word w_i in its context. When the proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, the experimental results show that the system's error detection recall and precision are both improved.

Keywords:	statistical language model; n-gram; linguistic environment; text proofreading
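
The abstract names the ingredients of the model (a distance-weighted word association degree and an LE related degree for the target word) but does not give the formulas. The following is only a minimal Python sketch of that general idea, not the authors' method: the weight 1/l, the window size, the function names, and the toy pinyin corpus are all assumptions made for illustration, and a distance-weighted average stands in for the paper's Bayes-based computation.

```python
from collections import defaultdict


def distance_weight(l):
    """Hypothetical distance-weighted factor: closer context words count more."""
    return 1.0 / l


def train(sentences, window=2):
    """Accumulate distance-weighted co-occurrence mass for (target, context) pairs."""
    pair = defaultdict(float)   # weighted count of target w_i seen near context w_j
    total = defaultdict(float)  # weighted count of w_j acting as a context word
    for words in sentences:
        for i, wi in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                wt = distance_weight(abs(i - j))
                pair[(wi, words[j])] += wt
                total[words[j]] += wt
    return pair, total


def association(pair, total, wi, wj):
    """Assumed association degree: weighted estimate of P(w_i | w_j in context)."""
    denom = total.get(wj, 0.0)
    return pair.get((wi, wj), 0.0) / denom if denom else 0.0


def le_score(pair, total, words, i, window=2):
    """Assumed LE score: distance-weighted mean association of words[i]
    with the context words on both sides of it."""
    num = den = 0.0
    for j in range(max(0, i - window), min(len(words), i + window + 1)):
        if j == i:
            continue
        wt = distance_weight(abs(i - j))
        num += wt * association(pair, total, words[i], words[j])
        den += wt
    return num / den if den else 0.0


# Toy, already-segmented corpus (pinyin stand-ins for Chinese words).
corpus = [["wo", "xihuan", "chi", "pingguo"],
          ["wo", "xihuan", "he", "cha"]]
pair, total = train(corpus, window=2)

# "chi pingguo" was seen in training, "he pingguo" was not.
good = le_score(pair, total, ["wo", "xihuan", "chi", "pingguo"], 2)
bad = le_score(pair, total, ["wo", "xihuan", "he", "pingguo"], 2)
print(good, bad)
```

In this sketch a word whose score falls below some threshold would be flagged as a possible error; the threshold, like everything else here, would have to be chosen empirically. Unlike a left-to-right n-gram, the score uses context on both sides of the target word and lets non-adjacent words contribute with reduced weight.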
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.