Statistical Language Model for Chinese Text Proofreading
Cite this article: ZHANG Yang-sen, CAO Yuan-da. Statistical Language Model for Chinese Text Proofreading[J]. Journal of Beijing Institute of Technology, 2003, 12(4): 441-445
Authors: ZHANG Yang-sen (张仰森), CAO Yuan-da (曹元大)
Affiliations: [1] Department of Computer Science, Shanxi University, Taiyuan, Shanxi 030006, China [2] Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Funding: the Youth Fund of Science and Technology of Shanxi Province (20021015)
Abstract: Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. The model takes full account of the information located both before and after the target word w_i, and of the relationship between non-adjacent words w_i and w_j in the linguistic environment (LE). First, the word association degree between w_i and w_j is defined using a distance-weighted factor, where w_j is l words apart from w_i in the LE. Then the Bayes formula is used to calculate the LE related degree of word w_i. Lastly, the LE related degree is taken as the criterion for predicting the reasonability of word w_i appearing in its context. When the proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, the experimental results show that both the recall and the precision of error detection are improved.

Keywords: statistical language model; Chinese text proofreading; linguistic environment; n-gram model
Received: 2003-05-30

Statistical Language Model for Chinese Text Proofreading
ZHANG Yang-sen and CAO Yuan-da. Statistical Language Model for Chinese Text Proofreading[J]. Journal of Beijing Institute of Technology, 2003, 12(4): 441-445
Authors:ZHANG Yang-sen and CAO Yuan-da
Affiliation: 1. Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Department of Computer Science, Shanxi University, Taiyuan, Shanxi 030006, China
2. Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. The model takes full account of the information located both before and after the target word w_i, and of the relationship between non-adjacent words w_i and w_j in the linguistic environment (LE). First, the word association degree between w_i and w_j is defined using a distance-weighted factor, where w_j is l words apart from w_i in the LE. Then the Bayes formula is used to calculate the LE related degree of word w_i. Lastly, the LE related degree is taken as the criterion for predicting the reasonability of word w_i appearing in its context. When the proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, the experimental results show that both the recall and the precision of error detection are improved.
Keywords:statistical language model  n-gram  linguistic environment  text proofreading
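
Illustrative sketch (not taken from the paper): the abstract names a distance-weighted word association degree and an LE related degree but gives no formulas, so the Python below only illustrates the general idea under loudly stated assumptions. The weight 1/l, the window size, the normalisation by the count of w_j, the simple averaging over neighbours, and the threshold are all hypothetical choices, and the paper's Bayes-based combination is replaced here by a plain average. The function names are invented for this example.

```python
from collections import defaultdict


def train(sentences, window=3):
    """Collect distance-weighted co-occurrence statistics from segmented sentences.

    Assumption: a co-occurrence at distance l contributes weight 1/l.
    """
    word_count = defaultdict(float)
    pair_weight = defaultdict(float)  # (w_i, w_j) -> sum of 1/l
    for words in sentences:
        for i, w_i in enumerate(words):
            word_count[w_i] += 1.0
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_weight[(w_i, words[j])] += 1.0 / abs(i - j)
    return word_count, pair_weight


def association(w_i, w_j, word_count, pair_weight):
    """Assumed association degree: weighted co-occurrence divided by count(w_j).

    This is a score, not a calibrated probability.
    """
    if word_count.get(w_j, 0.0) == 0.0:
        return 0.0
    return pair_weight.get((w_i, w_j), 0.0) / word_count[w_j]


def le_related_degree(words, i, word_count, pair_weight, window=3):
    """Average association of words[i] with neighbours before and after it."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    scores = [association(words[i], words[j], word_count, pair_weight)
              for j in range(lo, hi) if j != i]
    return sum(scores) / len(scores) if scores else 0.0


def flag_suspects(words, word_count, pair_weight, threshold=0.1, window=3):
    """Return positions whose LE related degree falls below the threshold."""
    return [i for i in range(len(words))
            if le_related_degree(words, i, word_count, pair_weight, window) < threshold]


# Toy usage: "树" never co-occurs with its context in the tiny corpus.
corpus = [["我", "喜欢", "读", "书"],
          ["我", "喜欢", "看", "书"],
          ["他", "喜欢", "读", "书"]]
wc, pw = train(corpus)
suspects = flag_suspects(["我", "喜欢", "读", "树"], wc, pw)
print(suspects)  # → [3]
```

Unlike a left-to-right n-gram, this score looks at words on both sides of the target and lets non-adjacent words contribute with reduced weight, which is the property the abstract emphasises.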