
Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor
Cite this article: TANG Xin-yao, ZHANG Zheng-jun, CHU Jie, YAN Tao. Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor[J]. Computer Science (计算机科学), 2021, 48(3): 151-157. DOI: 10.11896/jsjkx.200100112
Authors: TANG Xin-yao (汤鑫瑶), ZHANG Zheng-jun (张正军), CHU Jie (储杰), YAN Tao (严涛)
Affiliation: School of Science, Nanjing University of Science and Technology, Nanjing 210094, China (all four authors)
Abstract: The density peaks clustering (DPC) algorithm requires a manually specified cutoff distance $d_c$, and its simple definition of local density and its one-step assignment strategy cause it to perform poorly on complex data sets. To address these problems, this paper proposes a density peaks clustering algorithm based on natural nearest neighbors (NNN-DPC). The algorithm requires no user-specified parameters, making it a non-parametric clustering method. First, based on the definition of natural nearest neighbors, it gives a new local density formula that describes the distribution of the data and reveals the intrinsic relationships among the points. Next, it applies a two-step assignment strategy to partition the sample points. Finally, it defines a similarity between clusters and proposes a new cluster-merging rule, which merges clusters to produce the final clustering result. Experimental results show that, without any parameters, NNN-DPC generalizes well across many types of data sets; on manifold data, or on data whose clusters differ greatly in density, it identifies the number of clusters more accurately and assigns sample points to the correct clusters. Compared with DPC, FKNN-DPC (Fuzzy Weighted K-nearest Density Peak Clustering), and three other classic clustering algorithms, NNN-DPC performs better on the reported performance indicators.

Keywords: Clustering algorithm; Natural nearest neighbor; Density peaks; Local density
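The abstract gives no formulas, so the following Python sketch only illustrates the general pipeline under stated assumptions. It implements a common form of natural-neighbor search (increase the neighborhood size r until every point is some other point's r-nearest neighbor, or the number of points without such a reverse neighbor stops changing; the final r is the natural characteristic value λ, and natural neighbors are mutual λ-nearest neighbors), then computes the standard DPC decision values ρ, δ and γ = ρ·δ and applies DPC's original one-step assignment. NNN-DPC's actual density formula, its two-step assignment, and its cluster-merging rule are not reproduced here. The density function `nan_density`, the `max_stall` stopping threshold, and the `n_clusters` argument are hypothetical choices for illustration; NNN-DPC itself is parameter-free and settles the number of clusters through merging.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple


def natural_neighbors(points: Sequence[Sequence[float]], max_stall: int = 3
                      ) -> Tuple[int, List[Set[int]], List[List[float]]]:
    """Natural-neighbor search: grow r until every point has a reverse neighbor,
    or the count of points without one is unchanged for max_stall rounds."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Other points sorted by distance; ties broken by index for determinism.
    orders = [sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
              for i in range(n)]
    reverse_count = [0] * n
    r, prev_zero, stall = 0, -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[orders[i][r - 1]] += 1  # i's r-th neighbor gains a reverse neighbor
        zero = reverse_count.count(0)
        if zero == 0:
            break
        stall = stall + 1 if zero == prev_zero else 0
        if stall >= max_stall:
            break
        prev_zero = zero
    lam = r  # natural characteristic value (lambda)
    knn = [set(orders[i][:lam]) for i in range(n)]
    # Natural neighbors: points that are in each other's lambda-nearest sets.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, nan, dist


def nan_density(nan: List[Set[int]], dist: List[List[float]]) -> List[float]:
    """HYPOTHETICAL density: neighbor count divided by summed neighbor distance.
    Points with no natural neighbors get 0; the epsilon guards duplicate points."""
    return [len(nbrs) / (sum(dist[i][j] for j in nbrs) + 1e-12)
            for i, nbrs in enumerate(nan)]


def dpc_assign(rho: List[float], dist: List[List[float]], n_clusters: int) -> List[int]:
    """Standard DPC decision values and one-step assignment (not NNN-DPC's two-step rule)."""
    n = len(rho)
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # descending density
    delta = [0.0] * n
    parent: List[Optional[int]] = [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
        delta[i], parent[i] = dist[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    # The densest point is always a center; the others are the top-gamma points.
    centers = [order[0]] + sorted(order[1:], key=lambda i: -gamma[i])[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # each non-center inherits the label of its nearest denser point
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
lam, nan, dist = natural_neighbors(pts)
labels = dpc_assign(nan_density(nan, dist), dist, n_clusters=2)
print(lam, labels)  # prints: 3 [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Forcing the densest point to be a center guarantees that, when points are processed in descending density, every other point's nearest denser point already carries a label. The pairwise distance matrix makes this sketch O(n²) in time and memory, which is fine for toy data but not for large data sets.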
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).