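Graph-Based Word Sense Disambiguation Based on Word Distance (original Chinese title: 基于词语距离的网络图词义消歧)
Cite this article: YANG Zhi-Zhuo (杨陟卓), HUANG He-Yan (黄河燕). 基于词语距离的网络图词义消歧[J]. 软件学报 (Journal of Software), 2012, 23(4): 776-785.
Authors: YANG Zhi-Zhuo (杨陟卓)  HUANG He-Yan (黄河燕)
Affiliations: Beijing Engineering Applications Research Center of High Volume Language Information Processing and Cloud Computing (Beijing Institute of Technology), Beijing 100081; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
Funding: National Natural Science Foundation of China (61132009); National Defense Basic Research Fund; Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Special Plan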
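Abstract (translated from the Chinese): Traditional knowledge-based word sense disambiguation (WSD) methods infer the sense of an ambiguous word from the words inside a context window of a given size. Every word in that window, near or far, influences the sense of the ambiguous word equally, which degrades disambiguation performance. To address this problem, a graph-based WSD model based on word distance is proposed. Building on the traditional graph-based WSD model, it takes full account of the effect of word distance on disambiguation. Through model reconstruction, optimization, parameter estimation, and comparative evaluation, the paper demonstrates the model's defining property: words close to the ambiguous word recommend its sense strongly, while words farther away recommend it only weakly. Experimental results show that the model effectively improves Chinese WSD performance: compared with the best result of SemEval-2007 task #5, the method improves MacroAve (macro-average accuracy) by 3.1%.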

Keywords: word distance  Markov chain  graph-based model  PageRank  parameter estimation
Received: 2011-03-18
Revised: 2011-09-02
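
The abstract reports its gain in MacroAve (macro-average accuracy) without defining it. As a minimal sketch, assuming the common definition (the unweighted mean of the per-target-word accuracies, as opposed to micro-average accuracy over all instances pooled), the two measures can be computed as follows. The function names and the input layout are hypothetical and chosen only for this illustration.

```python
def macro_average_accuracy(results):
    """Mean of per-word accuracies.

    results: dict mapping each target word to a list of
    (predicted_sense, gold_sense) pairs. Hypothetical layout.
    """
    per_word = []
    for pairs in results.values():
        if not pairs:
            continue  # skip target words without test instances
        correct = sum(1 for predicted, gold in pairs if predicted == gold)
        per_word.append(correct / len(pairs))
    return sum(per_word) / len(per_word) if per_word else 0.0


def micro_average_accuracy(results):
    """Accuracy over all instances pooled, regardless of target word."""
    pairs = [pair for word_pairs in results.values() for pair in word_pairs]
    if not pairs:
        return 0.0
    return sum(1 for predicted, gold in pairs if predicted == gold) / len(pairs)


# Toy data: word "a" has 2 instances (1 correct), word "b" has 4 (all correct).
toy_results = {
    "a": [("s1", "s1"), ("s1", "s2")],
    "b": [("s1", "s1")] * 4,
}
print(macro_average_accuracy(toy_results))
print(micro_average_accuracy(toy_results))
```

Under this definition a rare target word counts as much as a frequent one, so the macro average can differ noticeably from the micro average when instance counts are unbalanced.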

Graph Based Word Sense Disambiguation Method Using Distance Between Words
YANG Zhi-Zhuo and HUANG He-Yan. Graph Based Word Sense Disambiguation Method Using Distance Between Words[J]. Journal of Software, 2012, 23(4): 776-785.
Authors: YANG Zhi-Zhuo and HUANG He-Yan
Affiliation: 1. Beijing Engineering Applications Research Center of High Volume Language Information Processing and Cloud Computing (Beijing Institute of Technology), Beijing 100081, China; 2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: Almost all existing knowledge-based word sense disambiguation (WSD) methods exploit the context information contained in a window of a certain size around the ambiguous word. Their effectiveness is limited because all words in the window have the same impact on determining the sense of the ambiguous word, regardless of their distance from it. To solve this problem, this paper proposes a novel WSD model based on the distance between words, which builds on the traditional graph-based WSD model and makes full use of distance information. Through model reconstruction, optimization, parameter estimation, and comparative evaluation, the study demonstrates the key property of the new model: words near the ambiguous word have more impact on its final sense, while words far from it have less. Experimental results show that the proposed model improves Chinese WSD performance: compared with the best result of SemEval-2007 task #5, it increases MacroAve (macro-average accuracy) by 3.1%.
Keywords:word distance  Markov chain  graph based model  PageRank  parameter estimation
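
The abstract does not give the model's formulas, so the following is only a rough sketch of the general idea it describes: a PageRank-style ranking over a graph linking candidate senses to context words, where each edge is scaled down as the context word gets farther from the ambiguous word. The decay function `distance_weight`, the `relatedness` scores, the sense labels, and the simplified bipartite graph are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def distance_weight(distance, decay=0.5):
    """Hypothetical decay: nearer context words get larger weights."""
    return 1.0 / (1.0 + decay * distance)


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """PageRank on an undirected weighted graph given as {(u, v): weight}."""
    neighbors = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbors[u][v] = weight
        neighbors[v][u] = weight
    nodes = list(neighbors)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    total = {node: sum(neighbors[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(
                score[nb] * w / total[nb] for nb, w in neighbors[node].items()
            )
            updated[node] = (1.0 - damping) / n + damping * incoming
        score = updated
    return score


def disambiguate(target_pos, senses, context, relatedness, decay=0.5):
    """Pick the candidate sense with the highest rank.

    context: list of (word, position); relatedness: {(sense, word): score}.
    """
    edges = {}
    for word, pos in context:
        w_dist = distance_weight(abs(pos - target_pos), decay)
        for sense in senses:
            rel = relatedness.get((sense, word), 0.0)
            if rel > 0.0:
                edges[(sense, word)] = rel * w_dist
    scores = weighted_pagerank(edges)
    return max(senses, key=lambda s: scores.get(s, 0.0))


# Toy example with made-up relatedness scores for the ambiguous word "bank".
senses = ["bank#finance", "bank#river"]
relatedness = {
    ("bank#finance", "money"): 0.9, ("bank#river", "money"): 0.1,
    ("bank#finance", "water"): 0.1, ("bank#river", "water"): 0.9,
}
near_money = disambiguate(0, senses, [("money", 1), ("water", 6)], relatedness)
near_water = disambiguate(0, senses, [("money", 6), ("water", 1)], relatedness)
print(near_money, near_water)
```

In this toy setup the two context words support opposite senses equally strongly, so the chosen sense is decided by which word lies closer to the target. That is the behavior the abstract attributes to distance weighting.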
This article is indexed in CNKI, Wanfang Data, and other databases.