A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices
Authors: 吴根清 (Wu GenQing), 郑方 (Zheng Fang)
Affiliations: [1] Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China; [2] The first author is currently also with Beijing d-Ear Technologies Co., Ltd.
Abstract: In this paper, an important question is raised: can a small language model be practically accurate enough? The purpose of a language model, the problems it faces, and the factors that affect its performance are then analyzed. Finally, a novel language model compression method is proposed that makes a large language model usable for applications in handheld devices such as mobile phones, smart phones, personal digital assistants (PDAs), and handheld personal computers (HPCs). The proposed compression method comprises three aspects. First, the language model parameters are analyzed, and a criterion based on an importance measure of n-grams determines which n-grams are kept and which are removed. Second, a piecewise linear warping method is proposed to compress the uni-gram count values in the full language model. Third, a rank-based quantization method is adopted to quantize the bi-gram probability values. Experiments show that with this compression method the language model can be reduced dramatically to only about 1 MB while its performance remains almost unchanged. This provides good evidence that a language model compressed by a well-designed compression technique can be practically accurate enough, which makes language models usable in handheld devices.
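
The abstract describes the method in three steps: prune n-grams by an importance measure, warp the uni-gram counts piecewise-linearly, and quantize the bi-gram probabilities by rank. The Python sketch below illustrates one plausible reading of each step; the particular importance measure, the warping knots, the bucket count, and the toy corpus are illustrative assumptions, not values taken from the paper.

    import math
    from collections import Counter

    def build_counts(tokens):
        """Collect uni-gram and bi-gram counts from a token sequence."""
        uni = Counter(tokens)
        bi = Counter(zip(tokens, tokens[1:]))
        return uni, bi

    def prune_bigrams(uni, bi, threshold=1e-4):
        """Step 1 (assumed criterion): keep a bi-gram only if its weighted
        log-ratio against the backed-off uni-gram estimate exceeds a
        threshold; the paper's exact importance measure may differ."""
        total = sum(uni.values())
        kept = {}
        for (w1, w2), c in bi.items():
            p_bi = c / uni[w1]                  # P(w2 | w1)
            p_uni = uni[w2] / total             # back-off estimate P(w2)
            importance = (c / total) * abs(math.log(p_bi / p_uni))
            if importance > threshold:
                kept[(w1, w2)] = c
        return kept

    def warp_count(c, knots=((0, 0), (100, 100), (10000, 1000), (10**9, 10000))):
        """Step 2: piecewise linear warping of a uni-gram count. Small
        counts pass through nearly unchanged while large counts are
        squashed into a narrow range; the knot positions here are made up."""
        for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
            if c <= x1:
                return round(y0 + (c - x0) * (y1 - y0) / (x1 - x0))
        return knots[-1][1]

    def rank_quantize(bi_probs, bits=8):
        """Step 3: rank-based quantization. Sort the bi-gram probabilities,
        cut the ranking into 2**bits equal-size buckets, and replace every
        probability by a bucket index plus a per-bucket mean codebook."""
        items = sorted(bi_probs.items(), key=lambda kv: kv[1])
        n_buckets = min(2 ** bits, len(items))
        size = max(1, len(items) // n_buckets)
        codebook, quantized = [], {}
        for start in range(0, len(items), size):
            chunk = items[start:start + size]
            codebook.append(sum(p for _, p in chunk) / len(chunk))
            for key, _ in chunk:
                quantized[key] = len(codebook) - 1
        return quantized, codebook

    # Toy usage on a made-up corpus:
    tokens = "the cat sat on the mat and the cat ran off the mat".split()
    uni, bi = build_counts(tokens)
    kept = prune_bigrams(uni, bi, threshold=0.0)
    warped_uni = {w: warp_count(c) for w, c in uni.items()}
    bi_probs = {(w1, w2): c / uni[w1] for (w1, w2), c in kept.items()}
    quantized, codebook = rank_quantize(bi_probs, bits=4)

The space savings come from all three steps together: pruning shrinks the n-gram table itself, warping lets each uni-gram count fit in fewer bits, and rank quantization stores one small index per bi-gram plus a tiny shared codebook instead of a full floating-point probability.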

Keywords: language model, language model compression, piecewise linear warping, rank-based quantization

Citation: Wu GenQing, Zheng Fang. A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices[J]. Journal of Computer Science and Technology, 2003, 18(6).
This article is indexed by CNKI, VIP, Wanfang Data, SpringerLink, and other databases.