A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices
Authors: 吴根清 (Wu GenQing), 郑方 (Zheng Fang)
Affiliations: [1] Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China; [2] The first author is currently also with Beijing d-Ear Technologies Co., Ltd.
Abstract: In this paper, an important question is raised: can a small language model be practically accurate enough? The purpose of a language model, the problems it faces, and the factors that affect its performance are then analyzed. Finally, a novel language model compression method is proposed that makes a large language model usable for applications in handheld devices such as mobile phones, smart phones, personal digital assistants (PDAs), and handheld personal computers (HPCs). The proposed compression method comprises three aspects. First, the language model parameters are analyzed, and a criterion based on an importance measure of n-grams determines which n-grams are kept and which are removed. Second, a piecewise linear warping method is proposed to compress the uni-gram count values in the full language model. Third, a rank-based quantization method is adopted to quantize the bi-gram probability values. Experiments show that with this compression method the language model can be reduced dramatically to only about 1 MB while its performance remains almost unchanged. This provides good evidence that a language model compressed by a well-designed compression technique can be practically accurate enough, which makes language models usable in handheld devices.
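
The abstract describes the method in three steps: prune n-grams by an importance measure, warp the uni-gram counts piecewise-linearly, and quantize the bi-gram probabilities by rank. The Python sketch below illustrates one plausible reading of each step; the particular importance measure, the warping knots, the bucket count, and the toy corpus are illustrative assumptions, not values taken from the paper.

    import math
    from collections import Counter

    def build_counts(tokens):
        """Collect uni-gram and bi-gram counts from a token sequence."""
        uni = Counter(tokens)
        bi = Counter(zip(tokens, tokens[1:]))
        return uni, bi

    def prune_bigrams(uni, bi, threshold=1e-4):
        """Step 1 (assumed criterion): keep a bi-gram only if its weighted
        log-ratio against the backed-off uni-gram estimate exceeds a
        threshold; the paper's exact importance measure may differ."""
        total = sum(uni.values())
        kept = {}
        for (w1, w2), c in bi.items():
            p_bi = c / uni[w1]                  # P(w2 | w1)
            p_uni = uni[w2] / total             # back-off estimate P(w2)
            importance = (c / total) * abs(math.log(p_bi / p_uni))
            if importance > threshold:
                kept[(w1, w2)] = c
        return kept

    def warp_count(c, knots=((0, 0), (100, 100), (10000, 1000), (10**9, 10000))):
        """Step 2: piecewise linear warping of a uni-gram count. Small
        counts pass through nearly unchanged while large counts are
        squashed into a narrow range; the knot positions here are made up."""
        for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
            if c <= x1:
                return round(y0 + (c - x0) * (y1 - y0) / (x1 - x0))
        return knots[-1][1]

    def rank_quantize(bi_probs, bits=8):
        """Step 3: rank-based quantization. Sort the bi-gram probabilities,
        cut the ranking into 2**bits equal-size buckets, and replace every
        probability by a bucket index plus a per-bucket mean codebook."""
        items = sorted(bi_probs.items(), key=lambda kv: kv[1])
        n_buckets = min(2 ** bits, len(items))
        size = max(1, len(items) // n_buckets)
        codebook, quantized = [], {}
        for start in range(0, len(items), size):
            chunk = items[start:start + size]
            codebook.append(sum(p for _, p in chunk) / len(chunk))
            for key, _ in chunk:
                quantized[key] = len(codebook) - 1
        return quantized, codebook

    # Toy usage on a made-up corpus:
    tokens = "the cat sat on the mat and the cat ran off the mat".split()
    uni, bi = build_counts(tokens)
    kept = prune_bigrams(uni, bi, threshold=0.0)
    warped_uni = {w: warp_count(c) for w, c in uni.items()}
    bi_probs = {(w1, w2): c / uni[w1] for (w1, w2), c in kept.items()}
    quantized, codebook = rank_quantize(bi_probs, bits=4)

The space savings come from all three steps together: pruning shrinks the n-gram table itself, warping lets each uni-gram count fit in fewer bits, and rank quantization stores one small index per bi-gram plus a tiny shared codebook instead of a full floating-point probability.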

Keywords: language model, language model compression, piecewise linear warping, rank-based quantization

Citation: Wu GenQing, Zheng Fang. A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices[J]. Journal of Computer Science and Technology, 2003, 18(6).
This article is indexed by CNKI, VIP, Wanfang Data, SpringerLink, and other databases.