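A Word Sense Disambiguation Method Combining Pseudo-Samples and Manually Tagged Samples (伪实例与人工标注实例相结合的词义消歧方法)
Cite this article:CHE Chao, TENG Hongfei. 伪实例与人工标注实例相结合的词义消歧方法[J]. Journal of Chinese Information Processing (中文信息学报), 2009, 23(6): 31-39.
Authors:CHE Chao (车超)  TENG Hongfei (滕弘飞)
Affiliation:1. Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024;
2. School of Mechanical Engineering, Dalian University of Technology, Dalian, Liaoning 116024
Funding:National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)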
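Abstract:Knowledge acquisition is the bottleneck that limits further performance gains in corpus-based word sense disambiguation (WSD), and automatic corpus annotation with equivalent pseudowords has in recent years been an effective way to address it. An equivalent pseudoword is a word that stands in for the ambiguous word when searching a corpus for disambiguation samples. However, some of the pseudo-samples obtained with equivalent pseudowords are of very poor quality, and equivalent pseudowords cannot be determined for ambiguous words that have few or no synonyms. This paper therefore proposes a WSD method that combines pseudo-samples obtained with equivalent pseudowords and manually tagged samples. The method computes the sentence similarity between each pseudo-sample and the context of the ambiguous word and deletes the low-quality pseudo-samples. It also draws on a manually tagged corpus to supply disambiguation samples for ambiguous words that have no equivalent pseudowords and to compute the distribution probability of each sense. In experiments on the Senseval-3 Chinese disambiguation task, the method achieves an average F-value of 0.79.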

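Keywords:computer applications  Chinese information processing  word sense disambiguation  HowNet (知网)  equivalent pseudowords  Bayesian classifier  automatically tagged corpus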
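
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sentence similarity measure the paper uses, so the sketch assumes a plain bag-of-words cosine similarity as a stand-in. The function names `cosine_similarity` and `filter_pseudo_samples` and the threshold value of 0.3 are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts.

    Assumption: this stands in for the paper's sentence similarity,
    which the abstract does not specify.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pseudo_samples(pseudo_samples, target_context, threshold=0.3):
    """Keep pseudo-samples whose similarity to the context of the
    ambiguous word reaches the (hypothetical) threshold."""
    return [
        sample
        for sample in pseudo_samples
        if cosine_similarity(sample, target_context) >= threshold
    ]


# Toy usage with English tokens; real input would be segmented Chinese.
context = ["he", "deposited", "money", "at", "the", "bank"]
samples = [
    ["she", "deposited", "money", "at", "the", "branch"],
    ["the", "river", "flooded"],
]
kept = filter_pseudo_samples(samples, context)
```

In this toy input the first pseudo-sample shares four of six tokens with the context and is kept, while the second shares only one and falls below the threshold.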
  

Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation
CHE Chao, TENG Hongfei. Combining Pseudo-Samples and Manually-Tagged Samples for Word Sense Disambiguation[J]. Journal of Chinese Information Processing, 2009, 23(6): 31-39.
Authors:CHE Chao  TENG Hongfei
Affiliation:1. Department of Computer Science & Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China;
2. School of Mechanical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract:The corpusbased method for word sense disambiguation (WSD) suffers from “knowledge acquisition bottleneck” problem. The automatic lexical sample acquisition method based on equivalent pseudowords (EPs) is an effective way to solve of this problem. However, some pseudosamples collected by EPs have low quality and the EPs can not be acquired when the ambiguous word has few monosemous synonyms. This paper proposes a WSD method combining pseudosamples and manacquired samples. The method calculates the sentence similarity with the context of the ambiguous word to remove pseudosamples with low quality. Moreover, the method utilizes the manuallytagged corpus to get the sense distribution probability and provide samples for the ambiguous words that have little monosemous synonym. Our method achieves an average Fmeasure of 0.79 through the WSD experiments performed on Senseval3 Chinese lexical sample task.
Keywords:computer application  Chinese information processing  word sense disambiguation  HowNet  equivalent pseudowords  Bayesian classifier  automatic sample acquisition
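
How the two sample sources might feed a Bayesian classifier can be sketched as follows. This is an illustrative naive Bayes setup, not the authors' implementation: it assumes that sense priors are estimated from the manually tagged samples only, that word likelihoods are estimated from manual and pseudo-samples pooled together, and that add-one smoothing is used. The names `train` and `classify` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train(manual, pseudo):
    """Build a naive Bayes model from (sense, tokens) pairs.

    Assumption: priors come from the manually tagged samples only;
    word counts pool manual and pseudo-samples.
    """
    sense_counts = Counter(sense for sense, _ in manual)
    total = sum(sense_counts.values())
    priors = {sense: count / total for sense, count in sense_counts.items()}

    word_counts = defaultdict(Counter)
    for sense, tokens in manual + pseudo:
        word_counts[sense].update(tokens)
    vocab = {t for counts in word_counts.values() for t in counts}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the sense with the highest log-posterior score.

    Only senses that appear in the manual data (and so have a prior)
    are scored; likelihoods use add-one smoothing.
    """
    best_sense, best_score = None, float("-inf")
    for sense, prior in priors.items():
        counts = word_counts[sense]
        denom = sum(counts.values()) + len(vocab)
        score = log(prior) + sum(log((counts[t] + 1) / denom) for t in tokens)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy usage with English tokens and two senses of "bank".
manual = [("finance", ["deposit", "money"]), ("river", ["muddy", "water"])]
pseudo = [
    ("finance", ["loan", "money", "account"]),
    ("river", ["shore", "water", "fishing"]),
]
priors, word_counts, vocab = train(manual, pseudo)
predicted = classify(["money", "loan"], priors, word_counts, vocab)
```

Taking the priors from the manual data alone reflects the abstract's statement that the manually tagged corpus supplies the sense distribution; pseudo-sample counts depend on how many sentences each equivalent pseudoword happens to retrieve, so they say little about how often each sense occurs.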