
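Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models
Cite this article:YANG Lin, ZHANG Jian-ping, YAN Yong-hong. Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models[J]. Computer Engineering and Applications, 2006, 42(32): 14-16
Authors:YANG Lin  ZHANG Jian-ping  YAN Yong-hong
Affiliation:ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, China (all three authors)
Abstract:For speech recognition in a specific domain, building a high-performance language model from a limited corpus is key to improving system performance. This paper studies domain-specific language models to address that problem. It proposes using high-frequency new words to strengthen the model's domain characteristics and adopts two schemes: one adds the high-frequency new words directly to the original dictionary and increases their weight during training, so that the model better expresses domain-related features; the other builds a small domain-related vocabulary from statistics on the high-frequency new words. The two schemes are compared. Smoothing strategies suited to Chinese are investigated experimentally. The results show that, for domain-specific tasks, the choice of smoothing algorithm has a considerable effect on model performance; with Witten-Bell interpolated smoothing, which suits Chinese, the recognition rate reaches 88.4%, a relative improvement of 18.18% over the general-purpose model.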

Keywords:language model  specific domain  speech recognition  smoothing  dictionary
Article ID:1002-8331(2006)32-0014-03
Received:2006-09-01
Revised:2006-09-01
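
The abstract does not say how the weight of high-frequency new words is increased during training. One possible reading, shown here only as a minimal sketch, is to scale the n-gram counts that involve a new word before the model is estimated. The function name `weighted_bigram_counts`, the `boost` parameter, and the rule "boost any bigram that contains a new word" are all assumptions made for illustration and are not taken from the paper. Input sentences are assumed to be already segmented into words.

```python
from collections import Counter


def weighted_bigram_counts(sentences, new_words, boost=2.0):
    """Count bigrams, up-weighting those that contain a domain new word.

    `boost` and the weighting rule are hypothetical; the paper's actual
    scheme is not described in the abstract.
    """
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            # A bigram touching a new word contributes `boost` instead of 1.
            hit = prev in new_words or cur in new_words
            counts[(prev, cur)] += boost if hit else 1.0
    return counts


if __name__ == "__main__":
    corpus = [["a", "x"], ["a", "b"]]
    for bigram, count in sorted(weighted_bigram_counts(corpus, {"x"}).items()):
        print(bigram, count)
```

Fractional counts of this kind would then be passed to whatever estimator and smoothing method is in use.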

Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models
YANG Lin,ZHANG Jian-ping,YAN Yong-hong. Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models[J]. Computer Engineering and Applications, 2006, 42(32): 14-16
Authors:YANG Lin  ZHANG Jian-ping  YAN Yong-hong
Affiliation:ThinkIT Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, China
Abstract:Building a powerful language model from a limited corpus is important for domain-specific speech recognition. To address this problem, two methods for handling high-frequency new words in a specific domain are presented. One adds the new words directly to the dictionary and gives them a higher weight during training. The other derives a new dictionary from the new words. Comparative experiments are used to study these two methods and several smoothing algorithms in detail. The results show that the smoothing algorithm greatly affects language model performance, and that Witten-Bell interpolation raises the recognition rate to 88.4%, a relative improvement of 18.18% over the general language model.
Keywords:language model  specific domain  speech recognition  smoothing algorithm  dictionary
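
For readers unfamiliar with Witten-Bell interpolation, the following is a minimal bigram sketch of the standard textbook formulation, not the authors' implementation. The weight given to the lower-order model for a history h is T(h) / (c(h) + T(h)), where c(h) is the number of times h occurs as a history and T(h) is the number of distinct word types seen after it. The function name `train_witten_bell`, the `<unk>` slot, and the uniform base distribution under the unigram model are assumptions for illustration; the abstract does not state the n-gram order or toolkit used. Sentences are assumed to be already segmented into words.

```python
from collections import Counter, defaultdict


def train_witten_bell(sentences):
    """Return (prob, vocab) for an interpolated Witten-Bell bigram model."""
    unigram = Counter()           # c(w)
    history = Counter()           # c(h): occurrences of h as a history
    bigram = Counter()            # c(h, w)
    followers = defaultdict(set)  # distinct word types seen after h

    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[cur] += 1
            history[prev] += 1
            bigram[(prev, cur)] += 1
            followers[prev].add(cur)

    total = sum(unigram.values())
    types = len(unigram)
    vocab = sorted(unigram) + ["<unk>"]  # one extra slot for unseen words
    vocab_size = len(vocab)

    def p_unigram(word):
        # Witten-Bell unigram, interpolated with a uniform distribution.
        return (unigram[word] + types / vocab_size) / (total + types)

    def prob(word, prev):
        c_h = history[prev]
        if c_h == 0:
            # Unseen history: fall back entirely to the unigram model.
            return p_unigram(word)
        t_h = len(followers.get(prev, ()))
        # Equivalent to lam * ML(word | prev) + (1 - lam) * p_unigram(word)
        # with lam = c_h / (c_h + t_h).
        return (bigram[(prev, word)] + t_h * p_unigram(word)) / (c_h + t_h)

    return prob, vocab


if __name__ == "__main__":
    prob, vocab = train_witten_bell([["a", "b", "c"], ["a", "b"], ["a", "c"]])
    print(prob("b", "a"), sum(prob(w, "a") for w in vocab))
```

Histories followed by many different word types reserve more probability mass for the lower-order model, while histories with few distinct continuations stay close to the maximum-likelihood estimate.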
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.