
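Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors
Cite as: ZHANG Qing-Hua, ZHOU Jing-Peng, DAI Yong-Yang, WANG Guo-Yin. Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors[J]. Journal of Software, 2023, 34(12): 5629-5648.
Authors: ZHANG Qing-Hua  ZHOU Jing-Peng  DAI Yong-Yang  WANG Guo-Yin
Affiliation: Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
Funding: National Key Research and Development Program of China (2020YFC2003502); National Natural Science Foundation of China (61876201); Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008)
Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC still has some shortcomings. On the one hand, it considers only the global distribution, so it clusters poorly on datasets whose clusters differ widely in density. On the other hand, its point allocation strategy easily leads to a "domino effect". For this reason, the RKNN-DPC algorithm is proposed on the basis of representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are then introduced to characterize the global distribution of the samples, giving a new local density. Next, using the KNN information of the samples, a weighted KNN allocation strategy is proposed to alleviate the "domino effect". Finally, comparative experiments against five clustering algorithms are carried out on artificial and real datasets. The results show that the proposed RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.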

Keywords: cluster analysis  density peaks clustering  representative point  K-nearest neighbors (KNN)
Received: 2021/12/21
Revised: 2022/4/18

Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors
ZHANG Qing-Hua, ZHOU Jing-Peng, DAI Yong-Yang, WANG Guo-Yin. Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors[J]. Journal of Software, 2023, 34(12): 5629-5648.
Authors: ZHANG Qing-Hua  ZHOU Jing-Peng  DAI Yong-Yang  WANG Guo-Yin
Affiliation: Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC has two shortcomings. First, it considers only the global distribution of the data, so it performs poorly on datasets whose clusters differ widely in density. Second, its point allocation strategy is prone to a domino effect, in which one misassigned point drags the points that follow it into the wrong cluster. To address these issues, this study proposes RKNN-DPC, a DPC algorithm based on representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are introduced to describe the global distribution of the samples, yielding a new local density. Then, a weighted KNN allocation strategy that uses the KNN information of the samples is proposed to alleviate the domino effect. Finally, RKNN-DPC is compared with five clustering algorithms on artificial and real datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.
Keywords: cluster analysis  density peaks clustering (DPC)  representative point  K-nearest neighbors (KNN)
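
For orientation, the baseline that the abstract builds on can be sketched in a few lines. The code below is a minimal illustration of classic DPC with a simple KNN-based density, not the authors' RKNN-DPC: the abstract gives no formulas, so the density `exp(-mean KNN distance)`, the function names `pairwise_dist`, `knn_density`, and `dpc`, and the default `k=5` are all assumptions made for this example. The final loop is the chain-style allocation in which each point inherits the label of its nearest higher-density neighbor, which is the step where a single wrong assignment can propagate (the domino effect the abstract refers to).

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_density(D, k):
    """Illustrative KNN density (an assumption, not the paper's formula):
    a point whose k nearest neighbors are close gets a high density."""
    knn = np.sort(D, axis=1)[:, 1:k + 1]   # column 0 is the distance to self
    return np.exp(-knn.mean(axis=1))

def dpc(X, n_clusters, k=5):
    """Classic DPC skeleton: density rho, separation delta, chain assignment."""
    n = len(X)
    D = pairwise_dist(X)
    rho = knn_density(D, k)
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density

    delta = np.zeros(n)            # distance to nearest higher-density point
    parent = np.full(n, -1)        # index of that point (-1 for the peak)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()  # convention for the global density peak
            continue
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j

    # Centers are the points with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf       # the global density peak is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Chain assignment in decreasing-density order: every parent is already
    # labeled when its followers are visited, so one error propagates down.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `dpc(X, n_clusters=2)` on an `(n, 2)` array returns one label per sample and the indices of the chosen centers. A weighted KNN allocation, as the abstract describes, would replace the final loop with a vote among each point's neighbors; its exact weighting is defined in the full paper and is not reproduced here.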