
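Mixed density peaks clustering algorithm
Cite this article: WANG Jun, ZHOU Kai, CHENG Yong. Mixed density peaks clustering algorithm[J]. Journal of Computer Applications, 2019, 39(2): 403-408.
Authors: WANG Jun, ZHOU Kai, CHENG Yong
Affiliation: School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044; Technology Industry Department, Nanjing University of Information Science and Technology, Nanjing 210044
Funding: National Natural Science Foundation of China (41875184, 61373064); Jiangsu Province "Six Talent Peaks" Innovation Team Project (TD-XYDXX-004); CERNET Next Generation Internet Technology Innovation Project (NGII20170610, NGII20171204); Open Fund of the Jiangsu Key Laboratory of Agricultural Meteorology (KYQ1309).
Abstract: Density peaks clustering (DP) is a recent density-based clustering algorithm. When a single cluster contains multiple density peaks, DP treats each peak as a potential cluster center, which makes it difficult to determine the correct number of clusters in a dataset. To address this, a mixed density peaks clustering algorithm, C-DP, is proposed. First, the dataset is partitioned into sub-clusters, with the density peak points as the initial cluster centers. Then, borrowing from the hierarchical algorithm Clustering Using Representatives (CURE), scattered representative points are selected from each sub-cluster, the clusters that contain the closest pair of representative points are merged, and a contraction factor parameter is introduced to control cluster shape. Simulation results show that C-DP clusters better than DP on four synthetic datasets. A comparison of the Rand Index on real datasets shows that C-DP outperforms DP by 2.32% on dataset S1 and by 1.13% on dataset 4k2_far. C-DP therefore improves clustering accuracy on datasets in which a single cluster contains multiple density peaks.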

Keywords: density peak; hierarchical clustering; class merging; representative point; contraction factor
Received: 2018-07-02
Revised: 2018-08-24

Mixed density peaks clustering algorithm
WANG Jun, ZHOU Kai, CHENG Yong. Mixed density peaks clustering algorithm[J]. Journal of Computer Applications, 2019, 39(2): 403-408.
Authors: WANG Jun, ZHOU Kai, CHENG Yong
Affiliation: 1. School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China; 2. Technology Industry Department, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China
Abstract: Clustering by fast search and find of Density Peaks (DP) is a recent density-based clustering algorithm. When a single cluster contains multiple density peaks, DP regards each peak as a potential cluster center, so it is difficult to determine the correct number of clusters in the dataset. To solve this problem, a mixed density peaks clustering algorithm named C-DP is proposed. First, the density peak points are taken as the initial cluster centers and the dataset is divided into sub-clusters. Then, following the Clustering Using Representatives (CURE) algorithm, scattered representative points are selected from each sub-cluster, the clusters that contain the closest pair of representative points are merged, and a contraction factor parameter is introduced to control the shape of the clusters. Experimental results show that C-DP clusters better than DP on four synthetic datasets. A comparison of the Rand Index on real datasets shows that on the datasets S1 and 4k2_far, the Rand Index of C-DP is 2.32% and 1.13% higher than that of DP, respectively. C-DP thus improves clustering accuracy on datasets in which a single cluster contains multiple density peaks.
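The Rand Index used for the comparison is the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). A minimal sketch in plain Python follows; the function name `rand_index` is illustrative and is not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of point pairs on which two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Label names do not matter, only the grouping they induce.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

The score is 1.0 for identical partitions and decreases as more pairs are grouped differently.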
Keywords: density peak; hierarchical clustering; class merging; representative point; contraction factor
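The two stages described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how peaks are chosen or when merging stops, so the sketch assumes peaks are the `n_peaks` points with the largest product of density and separation, and that merging stops at a target cluster count `n_clusters`. All function and parameter names (`dp_subclusters`, `representatives`, `merge_subclusters`, `d_c`, `n_rep`, `alpha`) are hypothetical.

```python
import math
from itertools import combinations


def dp_subclusters(points, d_c, n_peaks):
    """Density-peak stage: label every point with a sub-cluster index."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: how many other points lie within the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n   # distance to the nearest point ranked denser
    parent = [-1] * n   # index of that point
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # Assumption: peaks are the n_peaks points with the largest rho * delta.
    peaks = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_peaks]
    labels = [-1] * n
    for c, i in enumerate(peaks):
        labels[i] = c
    for i in order:  # a parent is always labeled before its children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def representatives(pts, n_rep, alpha):
    """CURE-style representatives: scattered points shrunk toward the centroid."""
    dim = len(pts[0])
    centroid = tuple(sum(p[k] for p in pts) / len(pts) for k in range(dim))
    # Farthest-point selection, starting from the point farthest from the centroid.
    chosen = [max(pts, key=lambda p: math.dist(p, centroid))]
    while len(chosen) < min(n_rep, len(pts)):
        chosen.append(max(pts, key=lambda p: min(math.dist(p, r) for r in chosen)))
    # alpha = 0 leaves the points in place; alpha = 1 collapses them onto the centroid.
    return [tuple(x + alpha * (c - x) for x, c in zip(p, centroid)) for p in chosen]


def merge_subclusters(points, labels, n_clusters, n_rep=4, alpha=0.3):
    """Merge the closest pair of sub-clusters until n_clusters remain."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    def gap(pair):
        # Distance between two clusters = closest pair of their representatives.
        reps_a = representatives([points[i] for i in clusters[pair[0]]], n_rep, alpha)
        reps_b = representatives([points[i] for i in clusters[pair[1]]], n_rep, alpha)
        return min(math.dist(r, s) for r in reps_a for s in reps_b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=gap)
        clusters[a].extend(clusters.pop(b))
    final = [0] * len(points)
    for new_label, members in enumerate(clusters.values()):
        for i in members:
            final[i] = new_label
    return final


def blob(cx, cy):
    """A 3x3 grid of points with spacing 0.1, centered at (cx, cy)."""
    return [(cx + 0.1 * dx, cy + 0.1 * dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


# Two nearby blobs stand in for one cluster with two density peaks.
points = blob(0, 0) + blob(1, 0) + blob(10, 0)
sub = dp_subclusters(points, d_c=0.15, n_peaks=3)
merged = merge_subclusters(points, sub, n_clusters=2)
print(len(set(sub)), len(set(merged)))  # 3 2
```

In this toy data the density-peak stage alone reports three sub-clusters, and the representative-point merge joins the two neighboring blobs into one cluster. A larger `alpha` pulls the representatives toward the centroid, which favors compact clusters and reduces the influence of outlying points on the merge decision.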
This article is indexed in Wanfang Data and other databases.