
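Name Disambiguation Algorithm Based on Clustering Ensemble
Cite this article: Yang Yilin, Zhou Jie, Li Bicheng. Name disambiguation algorithm based on clustering ensemble[J]. Application Research of Computers, 2016, 33(9)
Authors: Yang Yilin, Zhou Jie, Li Bicheng
Affiliation (all three authors): School of Information System Engineering, PLA Information Engineering University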
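Abstract: In traditional person name disambiguation algorithms, each type of feature reflects only part of the information about a person entity, and different clustering algorithms each have their own strengths and weaknesses. This paper proposes a person name disambiguation algorithm based on clustering ensemble. First, context features, entity features, and social relation features are extracted from the text to obtain three similarity matrices, which are then fused into a single fused similarity matrix. Next, these four similarity matrices are used as input to different clustering algorithms, yielding different partitions. Finally, the partitions are combined using the Squared Error Adjacency Matrix Clustering (SEAM) algorithm to perform name disambiguation. Experiments on the CLP2010 person name disambiguation training corpus show that the new algorithm effectively improves the accuracy and robustness of name disambiguation.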

Keywords: clustering ensemble; name disambiguation; agglomerative hierarchical clustering; similarity matrix
Received: 2015-04-21
Revised: 2016-08-01

Name Disambiguation Algorithm Based on Ensemble
Yang Yilin, Zhou Jie and Li Bicheng. Name disambiguation algorithm based on ensemble[J]. Application Research of Computers, 2016, 33(9)
Authors: Yang Yilin, Zhou Jie, Li Bicheng
Affiliation (all three authors): Information System Engineering College, PLA Information Engineering University, Zhengzhou
Abstract: In traditional name disambiguation methods, each kind of feature reflects only partial information about an entity, and each clustering algorithm has its own advantages and disadvantages. This paper proposes a new name disambiguation algorithm based on clustering ensemble. First, context features, entity features, and social relation features are extracted to obtain three similarity matrices, which are then merged into a fused similarity matrix. Second, the four similarity matrices are fed to different clustering algorithms, which produce different partitions. Finally, these partitions are integrated by the squared error adjacency matrix clustering (SEAM) algorithm to perform name disambiguation. Experimental results on the Chinese name disambiguation evaluation corpus of CLP2010 show that the new algorithm effectively improves the accuracy and robustness of name disambiguation.
Keywords: ensemble; name disambiguation; hierarchical clustering algorithm; similarity matrix
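
The three-stage pipeline described in the abstract (fuse the feature similarity matrices, cluster each matrix, combine the partitions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy similarity values, the equal-weight fusion, the single thresholded connected-components clusterer standing in for the paper's "different clustering algorithms", and the majority vote on a co-association matrix standing in for SEAM, whose details the abstract does not give. The function names `fuse`, `threshold_partition`, `co_association`, and `consensus` are hypothetical.

```python
from itertools import combinations


def fuse(matrices, weights=None):
    """Element-wise weighted mean of equally sized similarity matrices.
    Equal weights are an assumption; the abstract does not state the fusion rule."""
    k = len(matrices)
    weights = weights or [1.0 / k] * k
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def threshold_partition(sim, threshold):
    """Stand-in clusterer: link i and j when sim[i][j] >= threshold,
    then label the connected components (union-find)."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(j)] = find(i)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def co_association(partitions):
    """Fraction of partitions that place documents i and j in the same cluster."""
    n = len(partitions[0])
    return [[sum(p[i] == p[j] for p in partitions) / len(partitions)
             for j in range(n)] for i in range(n)]


def consensus(partitions, vote=0.5):
    """Majority-vote ensemble on the co-association matrix (not SEAM itself)."""
    return threshold_partition(co_association(partitions), vote)


# Toy data: four documents mentioning the same name; 0 and 1 are one person, 2 and 3 another.
context = [[1, .8, .2, .1], [.8, 1, .3, .2], [.2, .3, 1, .4], [.1, .2, .4, 1]]
entity = [[1, .9, .1, .2], [.9, 1, .1, .1], [.1, .1, 1, .7], [.2, .1, .7, 1]]
social = [[1, .4, .1, .6], [.4, 1, .2, .1], [.1, .2, 1, .8], [.6, .1, .8, 1]]
fused = fuse([context, entity, social])

partitions = [threshold_partition(m, 0.5) for m in (context, entity, social, fused)]
final = consensus(partitions)
print(partitions)  # [[0, 0, 1, 2], [0, 0, 1, 1], [0, 1, 0, 0], [0, 0, 1, 1]]
print(final)       # [0, 0, 1, 1]
```

In this toy run the context features alone split documents 2 and 3, and the social relation features alone merge document 0 with the wrong group, yet the combined partition recovers the intended two persons. That is the kind of robustness gain the abstract attributes to the ensemble step.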