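A user-oriented language model and its machine learning method
Cite this article:LIU Bing-quan, WANG Xiao-long. A user-oriented language model and its machine learning method[J]. Journal of Harbin Institute of Technology, 2004, 36(2): 150-153
Authors:LIU Bing-quan  WANG Xiao-long
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China (both authors)
Funding:National Natural Science Foundation of China (69973015); National High Technology Research and Development Program of China (2001AA114041).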
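Abstract:To improve the adaptability of language models, a user-oriented language model is proposed. Structurally, it consists of a general-purpose language model, trained on a large-scale balanced corpus with its original parameters kept unchanged, and a user model, obtained through online learning with its parameters updated dynamically on a first-in, first-out basis. For data storage, the general-purpose model uses a multi-level index structure to address data sparseness, while the user model is stored as a linear structure and searched by binary search. Following the principles of correcting the language model's conversion errors as far as possible and avoiding imbalance in the language model, a machine learning method suited to the Chinese N-gram model is proposed. Experimental results show that this machine learning method has a "strengthening" characteristic and, together with the "progressive learning" mode, gives application systems a more flexible choice.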

Keywords:language model  N-gram  adaptation  Pinyin-to-character conversion
Article ID:0367-6234(2004)02-0150-04
Revised:2003-10-20

User-oriented Chinese language model and its machine learning
LIU Bing-quan, WANG Xiao-long. User-oriented Chinese language model and its machine learning[J]. Journal of Harbin Institute of Technology, 2004, 36(2): 150-153
Authors:LIU Bing-quan  WANG Xiao-long
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:To improve the adaptability of language models, a user-oriented language model is proposed. It consists of a general-purpose language model, obtained by training on a large-scale balanced corpus and kept with its original parameters unchanged, and a user model, obtained through on-line learning, whose parameters are dynamically updated using a first-in, first-out technique. For data storage, the general-purpose model uses a multi-level index structure to address the data sparseness problem, while the user model is represented by a linear structure and searched by binary search. A machine learning method suited to the Chinese N-gram model is proposed, following the principles of correcting as many of the language model's conversion errors as possible and avoiding language model imbalance. Experimental results indicate that this machine learning method has a "strengthening" characteristic and, together with the "progressive learning" mode, offers application systems a more flexible choice.
Keywords:language model  N-gram  adaptation  Pinyin-to-character conversion
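
The abstract describes the user model only at a high level: a linear structure, binary search for lookup, and first-in, first-out updating of parameters. The following Python sketch shows one way these three ideas could fit together. It is an illustration under stated assumptions, not the authors' implementation: the class name `UserModel`, the `capacity` parameter, the use of raw counts as "parameters", and the additive `combined_count` helper are all hypothetical, and the multi-level index of the general-purpose model is replaced here by a plain dictionary.

```python
from bisect import bisect_left
from collections import deque


class UserModel:
    """Hypothetical user model: sorted linear table plus FIFO updating."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed limit on remembered observations
        self.keys = []            # n-grams, kept sorted so binary search works
        self.counts = []          # counts[i] is the count of keys[i]
        self.history = deque()    # observations in arrival order (FIFO)

    def _find(self, ngram):
        # Binary search over the sorted linear table.
        i = bisect_left(self.keys, ngram)
        return i, i < len(self.keys) and self.keys[i] == ngram

    def count(self, ngram):
        i, found = self._find(ngram)
        return self.counts[i] if found else 0

    def learn(self, ngram):
        # Record one observation, inserting at the sorted position if new.
        i, found = self._find(ngram)
        if found:
            self.counts[i] += 1
        else:
            self.keys.insert(i, ngram)
            self.counts.insert(i, 1)
        self.history.append(ngram)
        # First in, first out: once over capacity, forget the oldest one.
        if len(self.history) > self.capacity:
            self._forget(self.history.popleft())

    def _forget(self, ngram):
        i, found = self._find(ngram)
        if found:
            self.counts[i] -= 1
            if self.counts[i] == 0:
                del self.keys[i]
                del self.counts[i]


# Stand-in for the fixed general-purpose model (assumption: a plain dict).
GENERAL_COUNTS = {("你", "好"): 5}


def combined_count(user, ngram):
    # Assumed combination rule: general count plus user count.
    return GENERAL_COUNTS.get(ngram, 0) + user.count(ngram)
```

In this sketch the general-purpose counts are never modified, which mirrors the statement that the general model's original parameters stay unchanged; only the user table grows and shrinks as new n-grams are learned and old ones fall out of the FIFO window.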
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.