Clustering algorithm based on local gravity and distance
Cite this article: Jie DU, Yan MA, Hui HUANG. Clustering algorithm based on local gravity and distance[J]. Journal of Computer Applications, 2022, 42(5): 1472-1479.
Authors: Jie DU  Yan MA  Hui HUANG
Affiliation: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Funding: National Natural Science Foundation of China (61373004)
Abstract: The Density Peak Clustering (DPC) algorithm cannot accurately select cluster centers for datasets with varied densities and complex shapes, while the Clustering by Local Gravitation (LGC) algorithm has many parameters that must be tuned manually. To address these issues, a Clustering algorithm based on Local Gravity and Distance (LGDC) is proposed. First, the local gravity model is used to calculate the concentration (CE) of each data point, and the distance between each data point and the points with higher CE is determined according to CE. Then, the data points with both high CE and high distance values are selected as cluster centers. Finally, the remaining data points are assigned based on the idea that the CE of a cluster's internal points is much higher than that of its boundary points, and balanced k-nearest neighbors are used to adjust the parameters automatically. Experimental results show that LGDC achieves better clustering results on four synthetic datasets. On real datasets such as Wine, SCADI and Soybean, the Adjusted Rand Index (ARI) of LGDC is on average 0.1447 higher than that of algorithms such as DPC and LGC.
Keywords: density peak clustering  gravity clustering  local gravity model  concentration  distance
Received: 2021-04-06
Revised: 2021-07-09
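
The center-selection step described in the abstract (pick points that score high on concentration and also lie far from any point with a higher score) can be illustrated with a minimal sketch. The abstract does not give the CE formula, so the score below is a stand-in and an assumption: the inverse mean distance to the k nearest neighbors. The function name `select_centers` and the parameters `k` and `n_centers` are hypothetical, and the ranking by the product of score and distance follows the usual DPC convention rather than anything stated here. This is not the authors' implementation.

```python
import numpy as np

def select_centers(X, k=5, n_centers=2):
    """DPC-style center selection with a stand-in concentration score."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Stand-in for CE (assumption, not the paper's formula):
    # inverse mean distance to the k nearest neighbors (column 0 is the point itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    score = 1.0 / (knn.mean(axis=1) + 1e-12)
    # delta[i]: distance from point i to the nearest point with a higher score.
    # A point with no higher-scoring point gets its largest distance to any point.
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(score > score[i])
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    # Candidates for centers have both a high score and a high delta.
    gamma = score * delta
    centers = np.argsort(gamma)[::-1][:n_centers]
    return centers, score, delta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic blobs of 50 points each.
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(10, 0.3, (50, 2))])
    centers, _, _ = select_centers(X, k=5, n_centers=2)
    print("center indices:", centers)
```

On two well-separated blobs, the two selected indices should fall in different blobs, because only the locally top-scoring point of each blob has a large distance to a higher-scoring point.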