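Improved density peaks clustering algorithm combining K-Nearest Neighbors
Cite this article: XUE Xiaona, GAO Shuping, PENG Hongming, WU Huihui. Improved density peaks clustering algorithm combining K-Nearest Neighbors[J]. Computer Engineering and Applications, 2018, 54(7): 36-43.
Authors: XUE Xiaona  GAO Shuping  PENG Hongming  WU Huihui
Affiliation: 1. School of Mathematics and Statistics, Xidian University, Xi'an 710126 2. School of Telecommunications Engineering, Xidian University, Xi'an 710071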
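Abstract: The density peaks clustering (DPC) algorithm performs poorly on datasets that are high-dimensional, noisy, or structurally complex. To address this, an improved density peaks clustering algorithm combining K-nearest neighbors (IDPCA) is proposed. The algorithm first gives a new local density measure to describe how each sample is distributed in space. It then introduces the concept of core points and, drawing on the K-nearest-neighbor idea, designs a global search allocation strategy that speeds up clustering by repeatedly assigning the unassigned K-nearest neighbors of core points to the correct cluster. Finally, it proposes a statistical learning allocation strategy based on K-nearest-neighbor weighting, which uses the weighted K-nearest-neighbor information of the remaining points to determine the probability that each is assigned to each local cluster, effectively improving clustering quality. Experimental results show that IDPCA is well suited to all 21 typical test datasets, and that its advantage is even more evident when its performance indexes are compared with those of DPC and three other typical clustering algorithms.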

Keywords: data mining  clustering algorithm  local density  density peaks  K-nearest neighbors

Improved density peaks clustering algorithm combining K-Nearest Neighbors
XUE Xiaona,GAO Shuping,PENG Hongming,WU Huihui.Improved density peaks clustering algorithm combining K-Nearest Neighbors[J].Computer Engineering and Applications,2018,54(7):36-43.
Authors:XUE Xiaona  GAO Shuping  PENG Hongming  WU Huihui
Affiliation:1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China 2. School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Abstract: The Density Peaks Clustering (DPC) algorithm performs poorly on datasets that are high-dimensional, noisy, or structurally complex. To address this, an Improved Density Peaks Clustering Algorithm (IDPCA) combining K-Nearest Neighbors is proposed. First, a new definition of local density is given to describe how the samples are distributed in space. Second, the concept of a core point is introduced, and a global search allocation strategy based on the K-Nearest Neighbors idea is designed; it assigns the unassigned K-Nearest Neighbors of core points to the correct clusters, which speeds up clustering. Third, a statistical learning allocation strategy is developed that uses the weighted K-Nearest Neighbors information of the remaining unassigned points to calculate the probability of each being assigned to each local cluster, which effectively improves clustering quality. Finally, in experiments on 21 test datasets, both synthetic and real-world, IDPCA outperforms DPC and three other classical clustering methods on four evaluation indexes.
Keywords:data mining  clustering algorithm  local density  density peaks  K-Nearest Neighbors  
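
For orientation, the following is a minimal sketch of the baseline DPC pipeline that the abstract sets out to improve: a local density per point, the distance from each point to its nearest higher-density point, center selection, and the one-pass label assignment. It does not reproduce IDPCA. The abstract does not state the paper's density formula or its two allocation strategies, so the K-nearest-neighbor density used here (the exponential of minus the mean distance to the k nearest neighbors) is an illustrative assumption, and all function names and parameters (`knn_density`, `dpc_labels`, `k`, `n_clusters`) are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_density(dist, k):
    """Illustrative local density: exp(-mean distance to the k nearest neighbors).

    This is an assumption made for the sketch, not the paper's definition.
    """
    rho = []
    for i, row in enumerate(dist):
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:k]
        rho.append(math.exp(-sum(nearest) / len(nearest)))
    return rho


def delta_and_parent(dist, rho):
    """Baseline DPC: distance to, and index of, the nearest denser point.

    The densest point has no denser neighbor; it gets its largest distance
    to any point as delta and a parent of -1.
    """
    n = len(rho)
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j
    return delta, parent, order


def dpc_labels(points, k, n_clusters):
    """Cluster labels from baseline DPC with the illustrative kNN density."""
    dist = pairwise_distances(points)
    rho = knn_density(dist, k)
    delta, parent, order = delta_and_parent(dist, rho)
    # Centers are the points with the largest rho * delta; on ties the
    # densest point (parent == -1) is preferred so every chain ends at a center.
    ranked = sorted(range(len(points)),
                    key=lambda i: (-rho[i] * delta[i], parent[i] != -1))
    labels = [-1] * len(points)
    for cluster_id, i in enumerate(ranked[:n_clusters]):
        labels[i] = cluster_id
    # Visit points from densest to sparsest so each parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    # Two small, well-separated groups, each a center with four neighbors.
    group = [(0, 0), (0, 1), (1, 0), (-1, 0), (0, -1)]
    points = group + [(x + 10, y + 10) for x, y in group]
    print(dpc_labels(points, k=2, n_clusters=2))
```

In this baseline, one wrongly assigned high-density point passes its label to every point that chains to it. The abstract's K-nearest-neighbor allocation strategies replace that final assignment loop.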