
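Perplexity Measures for Language Models and Estimation of the Entropy of Chinese
Cite this article: ZHANG Yang-sen, CAO Yuan-da, YU Shi-wen. Perplexity Measuring of Language Model and the Entropy Estimating of Chinese[J]. Mini-micro Systems, 2006, 27(10): 1931-1934
Authors: ZHANG Yang-sen, CAO Yuan-da, YU Shi-wen
Affiliations: 1. Institute of Computational Linguistics, Peking University, Beijing 100871; Department of Computer and Automation, Beijing Information Science and Technology University, Beijing 100085
2. Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081
3. Institute of Computational Linguistics, Peking University, Beijing 100871
Funding: National Key Basic Research Program of China (973 Program); National High Technology Research and Development Program of China (863 Program); China Postdoctoral Science Foundation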
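Abstract:
Using information theory, this paper gives a quantitative derivation and description of perplexity measures for statistical language models from the standpoint of information entropy. It reaches two conclusions. First, the smaller a language model's estimate of the entropy of a language, the more accurately the model describes that language. Second, a new model formed by interpolating two (n-1)-gram models performs better than an (n-1)-gram model but not as well as an n-gram model. The paper also discusses how language models can be used to estimate the information entropy of Chinese.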
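
The relationship between entropy estimates and model quality described in the abstract can be illustrated with a small sketch. It is not the authors' method: the three-symbol source `p`, the two candidate models `q_uniform` and `q_skewed`, and the interpolation weight `0.5` are all hypothetical values chosen for illustration. The sketch relies only on two standard facts: the cross-entropy H(p, q) of a model q is never below the true entropy H(p), and the cross-entropy of a linear interpolation of two models is never above that of the worse component. It does not reproduce the paper's comparison between (n-1)-gram and n-gram models.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits of a distribution {symbol: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits: the entropy estimate obtained when
    model q describes a source whose true distribution is p."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

def perplexity(h_bits):
    """Perplexity corresponding to an entropy given in bits."""
    return 2.0 ** h_bits

def interpolate(q1, q2, lam):
    """Linear interpolation lam*q1 + (1-lam)*q2 of two models over the same symbols."""
    return {x: lam * q1[x] + (1.0 - lam) * q2[x] for x in q1}

# Hypothetical toy source and models (assumptions, not data from the paper).
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q_uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
q_skewed = {"a": 0.7, "b": 0.2, "c": 0.1}
q_mix = interpolate(q_uniform, q_skewed, 0.5)

for name, q in [("uniform", q_uniform), ("skewed", q_skewed), ("mix", q_mix)]:
    h = cross_entropy(p, q)
    print(f"{name}: entropy estimate = {h:.4f} bits, perplexity = {perplexity(h):.4f}")
print(f"true entropy = {entropy(p):.4f} bits")
```

Each model's estimate is an upper bound on the true entropy of the source, so a lower estimate means the model sits closer to the source, which is the sense in which a smaller entropy estimate indicates a better model.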

Keywords: language model; perplexity; language model evaluation
Article ID: 1000-1220(2006)10-1931-04
Received: 2005-06-27
Revised: 2005-06-27

Perplexity Measuring of Language Model and the Entropy Estimating of Chinese
ZHANG Yang-sen,CAO Yuan-da,YU Shi-wen. Perplexity Measuring of Language Model and the Entropy Estimating of Chinese[J]. Mini-micro Systems, 2006, 27(10): 1931-1934
Authors:ZHANG Yang-sen  CAO Yuan-da  YU Shi-wen
Affiliation: 1. Institute of Computational Linguistics, Peking University, Beijing 100871, China; Department of Computer and Automation, Beijing Information Science and Technology University, Beijing 100085, China; 2. Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China; 3. Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract:
Using information theory, the perplexity measures of language models are expressed quantitatively in terms of entropy. We conclude that the smaller the entropy estimated by a language model (LM), the better the model performs, and that a new model obtained by linearly combining two (n-1)-gram statistical language models performs better than an (n-1)-gram model but not as well as an n-gram model.
Keywords:language model   perplexity    entropy    perplexity measuring
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.