The Entropy of Chinese and the Perplexity of the Language Models
Citation: Wu Jun, Wang Zhuoying. The Entropy of Chinese and the Perplexity of the Language Models [J]. Acta Electronica Sinica, 1996, 24(10): 69-71, 86
Authors: Wu Jun, Wang Zhuoying
Affiliation: Department of Electronic Engineering, Tsinghua University
Abstract: This paper introduces a method for estimating the entropy of Chinese and, based on statistics over a large corpus, gives an upper bound on the entropy of Chinese of 5.17 bits per character.

Keywords: Entropy, Perplexity, Stochastic Language Model, Speech Signal Processing

The Entropy of Chinese and the Perplexity of the Language Models
Wu Jun, Wang Zhuoying. The Entropy of Chinese and the Perplexity of the Language Models [J]. Acta Electronica Sinica, 1996, 24(10): 69-71, 86
Authors: Wu Jun, Wang Zhuoying
Abstract: In this paper, a method of estimating an upper bound on the entropy of printed Chinese is presented. A bound of 5.17 bits/character is obtained by computing the entropy of a large sample of Chinese text. The perplexity of several language models, a quantitative measure of the ability of a language model, is discussed. A new method of approximating higher-order language models by lower-order ones is also presented.
Keywords: Entropy, Perplexity, Stochastic Language Model, Speech Signal Processing
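As a rough illustration of the kind of corpus-based estimate the abstract describes (not the authors' actual procedure, which uses higher-order statistics to tighten the bound), a unigram character-entropy and perplexity computation might be sketched as:

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Estimate per-character entropy (bits) from unigram frequencies.

    This is only a crude upper bound on the true entropy: modeling
    dependencies between adjacent characters (bigrams, trigrams, ...)
    would lower the estimate further.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def perplexity(entropy_bits: float) -> float:
    """Perplexity is 2 raised to the entropy in bits."""
    return 2 ** entropy_bits

# Toy corpus; a real estimate needs a large sample of running text.
sample = "汉语信息熵的估计需要大量语料的统计"
h = unigram_entropy(sample)
print(f"entropy = {h:.2f} bits/char, perplexity = {perplexity(h):.2f}")
```

Perplexity here is the effective branching factor of the model: a perplexity of 2^5.17 ≈ 36 means the model is, on average, as uncertain as a uniform choice among about 36 characters.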
This article is indexed in CNKI, VIP (Weipu), and other databases.