The Entropy of Chinese and the Perplexity of the Language Models
Citation: Wu Jun, Wang Zhuoying. The Entropy of Chinese and the Perplexity of the Language Models [J]. Acta Electronica Sinica, 1996, 24(10): 69-71, 86
Authors: Wu Jun, Wang Zhuoying
Affiliation: Department of Electronic Engineering, Tsinghua University
Abstract: This paper introduces a method for estimating the entropy of Chinese and, based on statistics over a large corpus, gives an upper bound on the entropy of Chinese of 5.17 bits per character.

Keywords: Entropy, Perplexity, Stochastic Language Model, Speech Signal Processing

The Entropy of Chinese and the Perplexity of the Language Models
Wu Jun, Wang Zhuoying. The Entropy of Chinese and the Perplexity of the Language Models [J]. Acta Electronica Sinica, 1996, 24(10): 69-71, 86
Authors: Wu Jun, Wang Zhuoying
Abstract: In this paper, a method of estimating an upper bound on the entropy of printed Chinese is presented. A bound of 5.17 bits/character is obtained by computing the entropy of a large sample of Chinese text. The perplexity of several language models, a quantitative measure of the ability of a language model, is discussed. A new method of approximating higher-order language models by lower-order ones is also presented.
Keywords: Entropy, Perplexity, Stochastic Language Model, Speech Signal Processing
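As a rough illustration of the kind of corpus-based estimate the abstract describes (not the authors' actual procedure, which uses higher-order statistics to tighten the bound), a unigram character-entropy and perplexity computation might be sketched as:

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Estimate per-character entropy (bits) from unigram frequencies.

    This is only a crude upper bound on the true entropy: modeling
    dependencies between adjacent characters (bigrams, trigrams, ...)
    would lower the estimate further.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def perplexity(entropy_bits: float) -> float:
    """Perplexity is 2 raised to the entropy in bits."""
    return 2 ** entropy_bits

# Toy corpus; a real estimate needs a large sample of running text.
sample = "汉语信息熵的估计需要大量语料的统计"
h = unigram_entropy(sample)
print(f"entropy = {h:.2f} bits/char, perplexity = {perplexity(h):.2f}")
```

Perplexity here is the effective branching factor of the model: a perplexity of 2^5.17 ≈ 36 means the model is, on average, as uncertain as a uniform choice among about 36 characters.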
This article is indexed in CNKI, VIP (Weipu), and other databases.