
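Density Peak Clustering Algorithm Based on K-nearest Neighbors and Optimized Allocation Strategy
Cite this article:SUN Lin, QIN Xiao-Ying, XU Jiu-Cheng, XUE Zhan-Ao. Density Peak Clustering Algorithm Based on K-nearest Neighbors and Optimized Allocation Strategy[J]. Journal of Software, 2022, 33(4): 1390-1411
Authors:SUN Lin  QIN Xiao-Ying  XU Jiu-Cheng  XUE Zhan-Ao
Affiliation:College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, Henan, China;Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang 453007, Henan, China
Funding:National Natural Science Foundation of China (62076089, 61976082, 61772176);Henan Province Science and Technology Research Project (212102210136)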
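Abstract:Density peak clustering (DPC) is a simple and effective method of cluster analysis. In practical applications, however, DPC has difficulty selecting the correct cluster centers for datasets with large density differences between clusters or with multiple density peaks within a cluster; moreover, the point allocation method in DPC suffers from a domino effect. To address these problems, a density peak clustering algorithm based on K-nearest neighbors (KNN) and an optimized allocation strategy is proposed. First, candidate cluster centers are determined based on KNN, the local density of points, and boundary points; a path distance is defined to reflect the similarity between candidate cluster centers, and based on the path distance a density factor and a distance factor are proposed to quantify the likelihood that a candidate cluster center is a cluster center, from which the cluster centers are determined. Then, to improve the accuracy of point allocation, a similarity is constructed from the shared nearest neighbors, the high-density nearest neighbor, the density difference, and the distance between KNN, and the concepts of neighborhood, similarity set, and similarity domain are given to assist point allocation; initial clustering results are determined from the similarity domains and boundary points, and intermediate clustering results are obtained based on the cluster centers. Finally, according to the intermediate clustering results and the similarity sets, each cluster is divided into multiple layers from the cluster center to the cluster boundary, and an allocation strategy is designed for each layer; for the points in a given layer, a positive value is proposed based on the similarity domain and the positive domain to determine the allocation order of points, and each point is allocated to the dominant cluster in its positive domain, yielding the final clustering results. Simulations were carried out on 11 synthetic datasets and 27 real datasets...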
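
For orientation, the following is a minimal sketch of the baseline DPC procedure that the abstract takes as its starting point, not the algorithm proposed in the paper. It assumes the commonly used cutoff-kernel density (the number of points closer than a cutoff distance) and the usual rule that each non-center point takes the label of its nearest neighbor of higher density. The function name `dpc_baseline` and the parameters `d_c` and `n_clusters` are hypothetical names chosen for this illustration.

```python
import math


def dpc_baseline(points, d_c, n_clusters):
    """Baseline DPC sketch (assumed cutoff kernel); not the paper's method."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points in descending density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that precedes i in the density order.
    # parent[i] remembers that point; the densest point has no parent.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Cluster centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Every remaining point copies the label of its parent, in density order,
    # so the parent is always labeled before the point itself.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Because each non-center point simply copies the label of its nearest higher-density neighbor, a single wrong link is inherited by every point chained behind it. This is the domino effect that the abstract refers to, and it is what the layered allocation strategy described above is meant to mitigate.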

Keywords:density peak clustering  K-nearest neighbors  cluster center  positive value  allocation strategy
Received:2021-01-10
Revised:2021-07-16

Density Peak Clustering Algorithm Based on K-nearest Neighbors and Optimized Allocation Strategy
SUN Lin,QIN Xiao-Ying,XU Jiu-Cheng,XUE Zhan-Ao. Density Peak Clustering Algorithm Based on K-nearest Neighbors and Optimized Allocation Strategy[J]. Journal of Software, 2022, 33(4): 1390-1411
Authors:SUN Lin  QIN Xiao-Ying  XU Jiu-Cheng  XUE Zhan-Ao
Affiliation:College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China;Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang 453007, China
Abstract:Density peak clustering (DPC) is a simple and effective clustering method. In practice, however, DPC struggles to select the correct cluster centers for datasets in which densities differ greatly between clusters or in which a cluster contains multiple density peaks. In addition, the point allocation method of DPC suffers from a domino effect. To address these issues, a density peak clustering algorithm based on K-nearest neighbors (KNN) and an optimized allocation strategy is proposed. First, candidate cluster centers are determined from the KNN, the local densities of points, and the boundary points. A path distance is defined to reflect the similarity between candidate cluster centers; based on it, a density factor and a distance factor are proposed to quantify how likely each candidate is to be a cluster center, and the cluster centers are then determined. Second, to improve the accuracy of point allocation, similarity measures are constructed from the shared nearest neighbors, the high-density nearest neighbor, the density difference, and the distance between KNN, and the concepts of neighborhood, similarity set, and similarity domain are introduced to assist the allocation of points. Initial clustering results are determined from the similarity domains and boundary points, and intermediate clustering results are then obtained based on the cluster centers. Finally, according to the intermediate clustering results and the similarity sets, each cluster is divided into multiple layers from the cluster center to the cluster boundary, and a separate allocation strategy is designed for each layer. To determine the allocation order of the points within a layer, a positive value is defined based on the similarity domain and the positive domain, and each point is allocated to the dominant cluster in its positive domain, which yields the final clustering results. Experimental results on 11 synthetic datasets and 27 real datasets show that the proposed algorithm achieves good clustering performance in terms of purity, F-measure, accuracy, Rand index, adjusted Rand index, and normalized mutual information when compared with state-of-the-art DPC algorithms.
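
Two of the evaluation metrics named in the abstract can be sketched from their standard definitions. The snippet below is an illustration only and is not taken from the paper. Purity credits each predicted cluster with its most frequent true class, and the Rand index is the fraction of point pairs on which the two labelings agree about "same cluster" versus "different cluster". The function names `purity` and `rand_index` are hypothetical, and the inputs are assumed to be equal-length label lists containing at least two points.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Both functions depend only on how the points are grouped, so renaming the predicted cluster ids does not change the score.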
Keywords:density peak clustering  K-nearest neighbors  cluster center  positive value  allocation strategy