
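An Unsupervised Word Sense Disambiguation Method Based on HowNet Sememe Word Vector Representations
Cite this article: TANG Gongbo, YU Dong, XUN Endong. An Unsupervised Word Sense Disambiguation Method Based on HowNet Sememe Word Vector Representations[J]. Journal of Chinese Information Processing, 2015, 29(6): 23-29
Authors: TANG Gongbo  YU Dong  XUN Endong
Affiliations: 1. Institute of Big Data and Language Education, Beijing Language and Culture University, Beijing 100083;
2. College of Information Science, Beijing Language and Culture University, Beijing 100083
Funding: National Natural Science Foundation of China (61300081, 61170162); Beijing Language and Culture University Graduate Innovation Fund (Fundamental Research Funds for the Central Universities) (15YCX100)
Abstract: Word sense disambiguation has long been an important problem in natural language processing. This paper incorporates the sememe information in HowNet, which expresses the semantics of words, into the training of a language model. Representing words as vectors built from sememe vectors allows the semantic features of words to be learned automatically and makes feature learning more efficient. To disambiguate a polysemous word, the paper takes the word's context as features, forms a feature vector from it, and selects a sense by computing the similarity between the word vectors of the polysemous word and the feature vector. Being unsupervised, the method greatly reduces the computational and time cost of word sense disambiguation. On the SENSEVAL-3 test data it reaches an accuracy of 37.7%, slightly higher than that of other unsupervised word sense disambiguation methods on the same test set.

Keywords: word vector  HowNet  word sense disambiguation  unsupervised method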
  

An Unsupervised Word Sense Disambiguation Method Based on Sememe Vector in HowNet
TANG Gongbo, YU Dong, XUN Endong. An Unsupervised Word Sense Disambiguation Method Based on Sememe Vector in HowNet[J]. Journal of Chinese Information Processing, 2015, 29(6): 23-29
Authors: TANG Gongbo  YU Dong  XUN Endong
Affiliation: 1. Institute of Big Data and Language Education, Beijing Language and Culture University, Beijing 100083, China;
2. College of Information Science, Beijing Language and Culture University, Beijing 100083, China
Abstract: Word sense disambiguation (WSD) is a classic problem in natural language processing. In this paper, we incorporate the sememe information in HowNet, which represents word semantics, into the training of a language model, so that the semantic features of words are learned automatically and feature learning becomes more efficient. We then represent words by vectors of sememes and use the context of a polysemous word as features, forming a feature vector. We disambiguate the polysemous word by computing the cosine similarity between its word vectors and the feature vector. On the SENSEVAL-3 test set, the method achieves a precision of 37.7%, which is slightly higher than that of other unsupervised methods on the same test data.
Keywords: word embedding   HowNet   WSD   unsupervised methods
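
The disambiguation step described in the abstract can be illustrated with a minimal sketch. The abstract states only that words are represented through sememe vectors, that the context forms a feature vector, and that cosine similarity decides the sense. Everything else here is an assumption made for illustration: averaging sememe vectors to get a sense vector, averaging context word vectors to get the feature vector, the function names, and the toy three-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed composition rule)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(senses, context_words, sememe_vecs, word_vecs):
    """Pick the sense whose vector is closest to the context feature vector.

    senses maps a sense label to its list of sememes. Averaging sememe
    vectors into a sense vector is an assumption of this sketch.
    """
    known = [word_vecs[w] for w in context_words if w in word_vecs]
    context = mean_vector(known)
    scores = {
        label: cosine(mean_vector([sememe_vecs[s] for s in sememes]), context)
        for label, sememes in senses.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-made vectors (hypothetical; real vectors would come from training).
sememe_vecs = {
    "institution": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "land": [0.1, 0.9, 0.2],
    "water": [0.0, 0.8, 0.3],
}
word_vecs = {"deposit": [0.7, 0.2, 0.1], "loan": [0.9, 0.1, 0.0]}
senses = {
    "bank/finance": ["institution", "money"],
    "bank/river": ["land", "water"],
}

best, scores = disambiguate(senses, ["deposit", "loan"], sememe_vecs, word_vecs)
print(best, scores)
```

Because no sense-labelled training data enters this procedure, it matches the unsupervised setting the abstract describes; only the vectors themselves need to be learned beforehand.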