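Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation (伪实例与人工标注实例相结合的词义消歧方法)

Cite this article:	CHE Chao, TENG Hongfei. Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation [J]. Journal of Chinese Information Processing (中文信息学报), 2009, 23(6): 31-39.

Authors:	CHE Chao (车超), TENG Hongfei (滕弘飞)

Affiliation:	1. Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China; 2. School of Mechanical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China

Funding:	Supported by the National Natural Science Foundation of China and the National High Technology Research and Development Program of China (863 Program)

Abstract (translated from the Chinese):	Knowledge acquisition is the bottleneck that limits the performance of corpus-based word sense disambiguation (WSD) methods, and automatic corpus annotation with equivalent pseudowords has in recent years been an effective way to address it. An equivalent pseudoword is a word used in place of an ambiguous word to retrieve disambiguation samples from a corpus. However, some of the pseudo-samples obtained with equivalent pseudowords are of very poor quality, and equivalent pseudowords cannot be determined for ambiguous words that have no or very few synonyms. To address this, the paper proposes a WSD method that combines pseudo-samples obtained through equivalent pseudowords with manually tagged samples. The method computes the sentence similarity between each pseudo-sample and the context of the ambiguous word and removes the low-quality pseudo-samples. It also uses a manually tagged corpus to supply disambiguation samples for ambiguous words that have no equivalent pseudowords and to compute the distribution probability of each sense. In experiments on the Senseval-3 Chinese disambiguation task, the method achieves an average F-value of 0.79.
Keywords:	computer application; Chinese information processing; word sense disambiguation; HowNet; equivalent pseudowords; Bayesian classifier; automatically tagged corpus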
Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation

CHE Chao, TENG Hongfei. Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation [J]. Journal of Chinese Information Processing, 2009, 23(6): 31-39.

Authors:	CHE Chao, TENG Hongfei

Affiliation:	1. Department of Computer Science & Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China; 2. School of Mechanical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China

Abstract:	Corpus-based methods for word sense disambiguation (WSD) suffer from the "knowledge acquisition bottleneck" problem. Automatic lexical sample acquisition based on equivalent pseudowords (EPs) is an effective way to solve this problem. However, some pseudo-samples collected by EPs are of low quality, and EPs cannot be acquired when the ambiguous word has few monosemous synonyms. This paper proposes a WSD method that combines pseudo-samples with manually tagged samples. The method calculates the sentence similarity between each pseudo-sample and the context of the ambiguous word in order to remove low-quality pseudo-samples. Moreover, the method uses the manually tagged corpus to estimate the sense distribution probability and to provide samples for ambiguous words that have few monosemous synonyms. In WSD experiments on the Senseval-3 Chinese lexical sample task, the method achieves an average F-measure of 0.79.

Keywords:	computer application; Chinese information processing; word sense disambiguation; HowNet; equivalent pseudowords; Bayesian classifier; automatic sample acquisition
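
The pipeline outlined in the abstract (filter pseudo-samples by sentence similarity, take sense priors from the manually tagged corpus, classify with a Bayesian classifier) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which sentence similarity measure, threshold, or smoothing the authors use, so bag-of-words cosine similarity, a cut-off of 0.2, and add-one smoothing are stand-ins, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(tokens_a, tokens_b):
    """Bag-of-words cosine similarity (a stand-in for the paper's measure)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def filter_pseudo_samples(pseudo_samples, target_context, threshold=0.2):
    """Drop pseudo-samples whose context is unlike the target context.

    pseudo_samples: list of (sense, tokens) pairs retrieved with
    equivalent pseudowords. The threshold is hypothetical.
    """
    return [(sense, tokens) for sense, tokens in pseudo_samples
            if cosine_similarity(tokens, target_context) >= threshold]


def sense_priors(manual_samples):
    """Estimate P(sense) from the manually tagged samples only."""
    counts = Counter(sense for sense, _ in manual_samples)
    total = sum(counts.values())
    return {sense: count / total for sense, count in counts.items()}


def classify(target_context, training_samples, priors):
    """Naive Bayes over context words with add-one smoothing (assumed)."""
    word_counts = defaultdict(Counter)
    vocab = set(target_context)
    for sense, tokens in training_samples:
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    best_sense, best_score = None, float("-inf")
    for sense, prior in priors.items():
        total = sum(word_counts[sense].values())
        score = math.log(prior)
        for token in target_context:
            score += math.log((word_counts[sense][token] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

In this sketch the filtered pseudo-samples and the manually tagged samples are simply concatenated into one training set before calling `classify`, while the priors come from the manual samples alone. How the paper actually weights the two sources is not stated in the abstract.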
