
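Label Propagation Algorithm Based on Weighted Clustering Ensemble
Cite this article: ZHANG Meiqin, BAI Liang, WANG Junbin. Label propagation algorithm based on weighted clustering ensemble[J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2018, 13(6): 994-998.
Authors: ZHANG Meiqin, BAI Liang, WANG Junbin
Affiliations: 1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Ministry of Education), Shanxi University, Taiyuan, Shanxi 030006
Abstract: The label propagation algorithm (LPA) is an efficient community detection algorithm for large-scale networks and has attracted wide attention because of its nearly linear time complexity. However, the label of each node in LPA depends on the labels of its neighbors, so the algorithm's iteration speed and clustering effectiveness are highly sensitive to the order in which label information is updated, which undermines the accuracy and stability of the detected communities. To address this problem, a label propagation algorithm based on weighted clustering ensemble is proposed. The algorithm takes the results of multiple LPA runs as a base clustering set and uses modularity to evaluate the importance of each base clustering; these importance scores serve as weights in a node-similarity measure, yielding a weighted similarity matrix. Finally, hierarchical clustering on this matrix produces the final community partition. In the experiments, the algorithm was compared on real-world datasets with five other representative improved LPAs, and the results show that the new algorithm effectively improves the community detection accuracy of LPA.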

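Keywords: data mining; network data; community detection; label propagation algorithm; clustering ensemble; base clustering; modularity; weighted measure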
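The order sensitivity described in the abstract comes from LPA's asynchronous update rule. The following is a minimal Python sketch of standard asynchronous LPA, not the authors' code: nodes are visited in a random order on every sweep, each node adopts a label that is most frequent among its neighbours (ties broken at random), and sweeps repeat until no node changes. The function name `label_propagation`, the adjacency-dict input and the `seed`/`max_iter` parameters are illustrative assumptions, and the sketch uses one common variant in which a node keeps its current label whenever that label is already among the most frequent.

```python
import random
from collections import Counter


def label_propagation(adj, seed=None, max_iter=100):
    """Asynchronous label propagation (LPA) on an undirected graph.

    adj maps each node to a list of its neighbours; returns node -> label.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}        # every node starts with a unique label
    order = list(adj)
    for _ in range(max_iter):
        rng.shuffle(order)              # a fresh random update order each sweep
        changed = False
        for v in order:
            if not adj[v]:
                continue                # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] in best:
                continue                # already a most frequent label: keep it
            labels[v] = rng.choice(best)  # ties broken at random
            changed = True
        if not changed:                 # every node holds a most frequent neighbour label
            break
    return labels


if __name__ == "__main__":
    # Two triangles joined by the bridge 2-3; the result may vary with the seed.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    for s in range(3):
        print(s, label_propagation(graph, seed=s))
```

Because both the visiting order and the tie-breaking depend on `seed`, running the sketch with different seeds on a graph whose community structure is ambiguous can return different partitions; this run-to-run instability is what motivates treating several LPA runs as a base clustering set.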

Label propagation algorithm based on weighted clustering ensemble
ZHANG Meiqin, BAI Liang, WANG Junbin. Label propagation algorithm based on weighted clustering ensemble[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 994-998.
Authors: ZHANG Meiqin, BAI Liang, WANG Junbin
Affiliation: 1. College of Computer Science and Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Symbol Computation and Knowledge Engineering (Shanxi University), Ministry of Education, Taiyuan 030006, China
Abstract: The label propagation algorithm (LPA) is an efficient community detection algorithm for large-scale network data. It has attracted much attention because its time complexity is nearly linear in the number of nodes. However, the label of each node depends on the labels of its neighbors, which makes the iteration speed and clustering performance of the algorithm very sensitive to the order in which labels are updated; this affects the accuracy and stability of the community detection result. To solve this problem, a new LPA based on weighted clustering ensemble is proposed. The new algorithm runs LPA multiple times and treats the resulting partitions as a base clustering set. The modularity measure is then used to evaluate the importance of each base clustering. Based on these evaluations, a weighted similarity measure between nodes is defined, yielding a weighted similarity matrix over all node pairs. Finally, hierarchical clustering on this similarity matrix produces the final community division. In the experimental analysis, the new algorithm is compared with five other representative improved LPAs on real-world network datasets. The experimental results show that the new algorithm effectively improves the community detection accuracy of LPA.
Keywords: data mining; network data; community detection; label propagation algorithm; clustering ensemble; base clustering; modularity measure; weighted measure
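The ensemble step summarized in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: each base clustering is weighted by its Newman modularity Q = sum over communities c of [L_c/m - (d_c/2m)^2] (clipped at zero, with equal weights as a fallback), the similarity of two nodes is the weighted fraction of base clusterings that place them in the same community, and average-linkage agglomerative clustering merges clusters until no pair reaches a similarity `threshold` (0.5 here). The abstract does not specify the exact weight normalization, linkage criterion or stopping rule; all function names and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, labels):
    """Newman modularity Q = sum_c [L_c/m - (d_c/(2m))**2] for an undirected,
    unweighted graph given as a non-empty edge list."""
    m = len(edges)
    inner = defaultdict(int)   # L_c: number of edges inside community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inner[labels[u]] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def weighted_coassociation(nodes, partitions, weights):
    """S[(a, b)] = weighted fraction of base partitions that put a and b together."""
    total = sum(weights)
    S = {pair: 0.0 for pair in combinations(nodes, 2)}
    for part, w in zip(partitions, weights):
        for a, b in S:
            if part[a] == part[b]:
                S[(a, b)] += w / total
    return S


def average_linkage(nodes, S, threshold):
    """Repeatedly merge the pair of clusters with the highest average similarity
    until that similarity falls below threshold."""
    def sim(a, b):
        return S[(a, b)] if (a, b) in S else S[(b, a)]

    def link(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[v] for v in nodes]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            s = link(clusters[i], clusters[j])
            if s > best:
                best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair              # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def weighted_ensemble(edges, partitions, threshold=0.5):
    """Combine base partitions (dicts node -> label) into one partition."""
    nodes = sorted(partitions[0])  # assumes every partition labels the same nodes
    # Assumption: weight = modularity clipped at 0; fall back to equal weights.
    weights = [max(modularity(edges, p), 0.0) for p in partitions]
    if sum(weights) == 0:
        weights = [1.0] * len(partitions)
    S = weighted_coassociation(nodes, partitions, weights)
    clusters = average_linkage(nodes, S, threshold)
    return {v: k for k, cluster in enumerate(clusters) for v in cluster}


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    merged = {v: 0 for v in range(6)}  # Q = 0, so this run gets weight 0
    print(weighted_ensemble(edges, [split, split, merged]))
    # -> {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

In practice the base partitions would come from repeated LPA runs with different random update orders. The pairwise loops here grow at least quadratically with the number of nodes, so the sketch is meant only for small illustrative graphs; a low-modularity run such as `merged` contributes nothing to the similarity matrix, which is the intended effect of the weighting.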