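Research on Robust Character-Based Chinese Spoken Language Understanding Using Domain Information
Cite this article: BAO Changchun, XU Weiqun, LI Yali, PAN Jielin, YAN Yonghong. Research on Robust Character-Based Chinese Spoken Language Understanding Using Domain Information [J]. Microcomputer Applications, 2010, 31(6).
Authors: BAO Changchun, XU Weiqun, LI Yali, PAN Jielin, YAN Yonghong
Affiliation: ThinkIT Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190
Funding: National Key Technology R&D Program; National Natural Science Foundation of China
Abstract: Robustness is one of the most challenging key problems in spoken language understanding research. This paper adopts two strategies to improve the robustness of spoken language parsing. The first is to use a shallow statistical understanding framework that simplifies spoken language parsing to entity recognition and takes the character, rather than the word, as the basic processing unit. The second is to make full use of domain information within the statistical framework, from two angles: feature extraction and corpus expansion. Experimental results show that these methods effectively improve semantic parsing performance. On a human-machine dialogue test set, parsing performance (F1) rises from 75.27 to 90.24 when the input is speech recognition output, and from 80.59 to 97.14 when the input is manual transcription.

Keywords: Chinese spoken language understanding; domain information; robustness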

Robust Character-Based Chinese Spoken Language Understanding with Domain Information
BAO Changchun, XU Weiqun, LI Yali, PAN Jielin, YAN Yonghong. Robust Character-Based Chinese Spoken Language Understanding with Domain Information [J]. Microcomputer Applications, 2010, 31(6).
Authors: BAO Changchun, XU Weiqun, LI Yali, PAN Jielin, YAN Yonghong
Affiliation: ThinkIT Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Robustness is one of the most challenging key issues in spoken language understanding. To achieve good robustness, two strategies were investigated. One is to adopt a shallow statistical understanding framework, in which the task of spoken language understanding is simplified into (named) entity recognition. In this framework, the character, rather than the word, is chosen as the basic processing unit. The other is to efficiently exploit domain information through subword-enriched features and an enlarged training corpus...
Keywords:Chinese spoken language understanding  domain knowledge  robustness  
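
The abstract describes recasting spoken language parsing as entity recognition over characters, adding domain information as features, and scoring with F1. The following is a minimal sketch of that general idea, not the authors' system: the BIO tag scheme, the lexicon-match feature, the span-level F1 definition, and every name here (`spans_to_bio`, `bio_to_spans`, `char_features`, `f1_score`, the `CITY` label, the sample sentence) are assumptions made for illustration, since the abstract does not specify the actual features, tag set, or classifier.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label), in character offsets


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Encode entity spans as one BIO tag per character (no word segmentation)."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence back into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def char_features(text: str, i: int, lexicon: Dict[str, Set[str]]) -> Dict[str, object]:
    """Features for character i: its context plus hypothetical domain-lexicon flags."""
    feats: Dict[str, object] = {
        "char": text[i],
        "prev": text[i - 1] if i > 0 else "<s>",
        "next": text[i + 1] if i + 1 < len(text) else "</s>",
    }
    for label, entries in lexicon.items():
        # True if some lexicon entry matches the text at a position covering i.
        feats[f"in_{label}"] = any(
            text.startswith(entry, s)
            for entry in entries
            for s in range(max(0, i - len(entry) + 1), i + 1)
        )
    return feats


def f1_score(gold: Set[Span], pred: Set[Span]) -> float:
    """Span-level F1: a predicted entity counts only on an exact span and label match."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "我要去北京"  # illustrative utterance, not from the paper's corpus
    tags = spans_to_bio(text, [(3, 5, "CITY")])
    print(list(zip(text, tags)))
    print(char_features(text, 4, {"CITY": {"北京"}}))
    print(f1_score({(3, 5, "CITY")}, bio_to_spans(tags)))
```

Tagging characters directly means a recognition error or a segmentation mistake cannot shift word boundaries before the tagger sees the input, which is one plausible reading of why the abstract links the character unit to robustness. In a real system the feature dictionaries would feed a statistical sequence labeller; which model the authors used is not stated here.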
This article is indexed in CNKI, Wanfang Data, and other databases.