首页 | 本学科首页   官方微博 | 高级检索  
     

基于自适应可达距离的密度峰值聚类算法
引用本文:章曼,张正军,冯俊淇,严涛. 基于自适应可达距离的密度峰值聚类算法[J]. 计算机应用, 2022, 42(6): 1914-1921. DOI: 10.11772/j.issn.1001-9081.2021040551
作者姓名:章曼  张正军  冯俊淇  严涛
作者单位:南京理工大学 理学院,南京 210094
基金项目:国家自然科学基金资助项目(11671205)~~;
摘    要:针对基于快速搜索和发现密度峰值的聚类(CFSFDP)算法中截断距离需要人工选取,以及最近邻分配带来的误差导致的在具有不同密度簇的复杂数据集上的聚类效果不佳的问题,提出了一种基于自适应可达距离的密度峰值聚类(ARD-DPC)算法。该算法利用非参数核密度估计方法计算点的局部密度,根据决策图选取聚类中心,并利用自适应可达距离分配数据点,从而得到最终的聚类结果。在4个合成数据集和6个UCI数据集上进行了仿真实验,将所提算法ARD-DPC与基于快速搜索和发现密度峰值的聚类(CFSFDP)、基于密度的噪声应用空间聚类(DBSCAN)、基于密度自适应距离的密度峰聚类(DADPC)算法进行了比较,实验结果表明,相比其他三种算法,ARD-DPC算法在7个数据集上的标准化互信息(NMI)、兰德指数(RI)和F1-measure取得了最大值,在2个数据集分别取得F1-measure和NMI的最大值,只对模糊度较高、聚类特征不明显的Pima数据集聚类效果不佳;同时,ARD-DPC算法在合成数据集上能准确地识别出聚类数目和具有复杂密度的簇。

关 键 词:聚类算法  密度峰值  截断距离  非参数核密度估计  自适应可达距离  
收稿时间:2021-04-12
修稿时间:2021-07-22

Density peak clustering algorithm based on adaptive reachable distance
Man ZHANG,Zhengjun ZHANG,Junqi FENG,Tao YAN. Density peak clustering algorithm based on adaptive reachable distance[J]. Journal of Computer Applications, 2022, 42(6): 1914-1921. DOI: 10.11772/j.issn.1001-9081.2021040551
Authors:Man ZHANG  Zhengjun ZHANG  Junqi FENG  Tao YAN
Affiliation:School of Science,Nanjing University of Science and Technology,Nanjing Jiangsu 210094,China
Abstract:Concerning the problem that the cutoff distance needs to be selected manually in Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, as well as the poor clustering effect on complex datasets with different density clusters due to the error caused by nearest neighbor assignment, a Density Peak Clustering algorithm based on Adaptive Reachable Distance (ARD-DPC) was proposed. In this algorithm, a non-parametric kernel density estimation method was used to calculate the local density of points, and the clustering centers were selected by the decision graph. Then, an adaptive reachable distance was used to assign the data points and obtain the final clustering result. Simulation experiments were conducted on 4 synthetic datasets and 6 UCI datasets, and the proposed algorithm was compared with CFSFDP (Clustering by Fast Search and Find of Density Peaks), DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and DADPC (Density Peaks Clustering based on Density Adaptive distance). Experimental results show that compared to the three other algorithms, the proposed ARD-DPC algorithm achieves the all highest Normalized Mutual Information (NMI), Rand Index (RI) and F1-measure on 4 synthetic datasets and 3 UCI datasets, the only highest NMI on UCI Breast dataset, the only highest F1-measure on UCI Heart dataset, but does not cluster UCI Pima dataset well, which has high fuzzyness and unclear clustering feature. At the same time, ARD-DPC algorithm can accurately identify the number of clusters and clusters with complex density on the synthetic datasets.
Keywords:clustering algorithm  density peak  cutoff distance  non-parametric kernel density estimation  adaptive reachable distance  
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号