Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models
Citation: MA Xiaojun, GUO Jianyi, WANG Hongbin, ZHANG Zhikun, XIAN Yantuan, YU Zhengtao. Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(12): 1130-1137.
Authors: MA Xiaojun  GUO Jianyi  WANG Hongbin  ZHANG Zhikun  XIAN Yantuan  YU Zhengtao
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500
Funding: Supported by the National Natural Science Foundation of China (No. 61562052, 61462054, 61363044)
Abstract: When the Skip-gram word vector model deals with polysemous words, it computes only a single word vector that mixes multiple senses and cannot distinguish the different meanings of a polysemous word. To address this problem, an entity disambiguation method for specific domains combining word vectors and topic models is proposed. Word vectors of the mention (reference term) and of the candidate entities are obtained from the background text and the knowledge base, respectively. Combined with a domain knowledge base of hypernym-hyponym relations, the context similarity and the category-reference similarity are calculated. The latent Dirichlet allocation (LDA) topic model and the Skip-gram word vector model are used to obtain word vector representations for the different meanings of polysemous words, domain topic keywords are extracted, and the domain topic keyword similarity is calculated. Finally, the three types of features are combined, and the candidate entity with the highest similarity is selected as the final target entity. Experiments show that the proposed method achieves better disambiguation results than existing disambiguation methods.
Keywords: Entity Disambiguation  Word Vector Model  Domain Knowledge Base  Latent Dirichlet Allocation (LDA) Topic Model
Received: 2017-09-15
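
The abstract describes fusing three similarity features (context, category reference, and domain topic keywords) computed from Skip-gram word vectors and an LDA topic model, then picking the highest-scoring candidate entity. As a rough illustration of that final fusion step only, the Python sketch below (using gensim's Skip-gram implementation) scores candidates by a weighted sum of cosine similarities over averaged word vectors; the toy corpus, the weight values, and the candidate fields (`description`, `categories`, `topic_keywords`) are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of the three-similarity fusion described in the abstract.
# Corpus, weights and candidate fields are illustrative assumptions, not the authors' code.
import numpy as np
from gensim.models import Word2Vec

def avg_vector(words, wv):
    """Average the Skip-gram vectors of the words that are in the vocabulary."""
    vecs = [wv[w] for w in words if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Train a Skip-gram model (sg=1) on a toy background corpus; in the paper this
# would be the domain background text plus knowledge-base descriptions.
sentences = [["apple", "fruit", "orchard"], ["apple", "iphone", "company"]]
w2v = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)

def disambiguate(mention_context, candidates, weights=(0.4, 0.3, 0.3)):
    """Score each candidate by a weighted sum of context, category and
    topic-keyword similarities; return the best candidate and its score.
    (The topic keywords would come from an LDA model in the full pipeline.)"""
    ctx_vec = avg_vector(mention_context, w2v.wv)
    best, best_score = None, -1.0
    for cand in candidates:
        s_ctx = cosine(ctx_vec, avg_vector(cand["description"], w2v.wv))
        s_cat = cosine(ctx_vec, avg_vector(cand["categories"], w2v.wv))
        s_top = cosine(ctx_vec, avg_vector(cand["topic_keywords"], w2v.wv))
        score = float(np.dot(weights, [s_ctx, s_cat, s_top]))
        if score > best_score:
            best, best_score = cand["name"], score
    return best, best_score

# Usage example with two hypothetical candidate entities for the mention "apple".
candidates = [
    {"name": "Apple Inc.", "description": ["iphone", "company"],
     "categories": ["company"], "topic_keywords": ["iphone"]},
    {"name": "apple (fruit)", "description": ["fruit", "orchard"],
     "categories": ["fruit"], "topic_keywords": ["orchard"]},
]
print(disambiguate(["iphone", "company"], candidates))
```

The weighted-sum fusion and the 0.4/0.3/0.3 weights are stand-ins for whatever combination scheme the paper actually uses; only the overall structure (three similarities, argmax over candidates) follows the abstract.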
