Word Sense Disambiguation Corpus Acquisition by Language Model Validation

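Cite this article:	GUO Yu-hang, CHE Wan-xiang, LIU Ting. Word Sense Disambiguation Corpus Acquisition by Language Model Validation[J]. Journal of Chinese Information Processing, 2008, 22(6): 38-42.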

Authors:	GUO Yu-hang, CHE Wan-xiang, LIU Ting

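Affiliation:	Information Retrieval Lab, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China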

Funding:	National Natural Science Foundation of China

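Abstract:	Manually annotated corpora are a scarce resource, and their shortage limits the large-scale application of supervised word sense disambiguation (WSD) systems. Previous work has proposed acquiring WSD training data automatically from raw, unannotated text, such as the Web, by collecting occurrences of the target word's monosemous synonyms (monosemous lexical relatives). In some contexts, however, the target word cannot appropriately replace such a monosemous relative, and this introduces noise into the acquired data. We therefore use a language model to validate these substitutions and filter out the noise. This purifies the training data and improves system performance. Experiments on the Senseval-3 Chinese lexical sample WSD dataset show that the system trained on language-model-filtered data clearly outperforms the system trained on unfiltered data.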
Keywords:	computer application; Chinese information processing; word sense disambiguation; language model; noise filtering
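The abstract does not say which language model or acceptance rule the authors use. The sketch below is a minimal illustration that rests on several assumptions. It uses an add-one smoothed bigram model trained on raw, tokenized text. It keeps an automatically labelled example only if substituting the target word for the monosemous relative lowers the sentence log-probability by at most a hypothetical `threshold`. The names `BigramLM`, `substitute`, `is_valid_substitution` and `acquire_corpus` are illustrative and do not come from the paper. This is not the authors' implementation.

```python
# Hypothetical sketch of language-model validation for corpus acquisition via
# monosemous-relative substitution. The model (add-one bigram) and the
# threshold-based decision rule are assumptions, not the paper's method.
import math
from collections import Counter
from typing import List, Tuple

Sentence = List[str]


class BigramLM:
    """Add-one (Laplace) smoothed bigram language model over tokenized sentences."""

    def __init__(self, sentences: List[Sentence]) -> None:
        self.history = Counter()  # how often each token occurs as a bigram history
        self.bigrams = Counter()  # counts of (previous, current) token pairs
        vocab = {"</s>"}
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            vocab.update(sent)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(vocab)  # possible next tokens: all words plus </s>

    def logprob(self, sent: Sentence) -> float:
        """Natural-log probability of a sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(tokens[:-1], tokens[1:])
        )


def substitute(sent: Sentence, relative: str, target: str) -> Sentence:
    """Replace every occurrence of the monosemous relative with the target word."""
    return [target if tok == relative else tok for tok in sent]


def is_valid_substitution(lm: BigramLM, sent: Sentence, relative: str,
                          target: str, threshold: float) -> bool:
    """Accept if the substitution lowers log-probability by at most `threshold`."""
    delta = lm.logprob(substitute(sent, relative, target)) - lm.logprob(sent)
    return delta >= -threshold


def acquire_corpus(lm: BigramLM, candidates: List[Tuple[Sentence, str, str]],
                   target: str, threshold: float = 1.0) -> List[Tuple[Sentence, str]]:
    """Turn (sentence, relative, sense) candidates into sense-tagged examples of
    the target word, discarding candidates that fail language-model validation."""
    corpus = []
    for sent, relative, sense in candidates:
        if is_valid_substitution(lm, sent, relative, target, threshold):
            corpus.append((substitute(sent, relative, target), sense))
    return corpus
```

The original and substituted sentences differ only at the substituted positions. For that reason, every bigram that does not involve the relative or the target cancels in `delta`. The test therefore depends only on how well the target word fits its immediate left and right neighbours. A higher-order model or a different normalization would widen or change this context window.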
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.