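Optimization of Grid-Based Clustering by Fast Search and Find of Density Peaks
Citation: SUN Hao, ZHANG Ming-xin, DAI Jiao, SHANG Zhao-wei. Optimization of grid based clustering by fast search and find of density peaks[J]. Computer Engineering & Science, 2017, 39(5): 964-970.
Authors: SUN Hao, ZHANG Ming-xin, DAI Jiao, SHANG Zhao-wei
Affiliations: (1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; 2. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500; 3. College of Computer Science, Chongqing University, Chongqing 400030)
Funding: National Natural Science Foundation of China (61173130)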
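Abstract: CFSFDP (clustering by fast search and find of density peaks) is a new density-based clustering algorithm. It can cluster non-spherical data sets, runs quickly, and is simple to implement. However, it does not take the spatial distribution of the data into account when the global density threshold dc is specified. This lowers clustering quality, and the algorithm cannot accurately cluster data sets with multiple density peaks. To address these shortcomings, a grid-partition-based CFSFDP algorithm (GbCFSFDP) is proposed. The algorithm uses grid partitioning to divide the data set into partitions and clusters each partition locally, which avoids a global dc. It then merges the resulting sub-clusters, so that data sets with uneven data density, uneven inter-cluster distances, and multiple density peaks are clustered accurately. Simulation experiments on two typical data sets show that GbCFSFDP clusters more accurately than CFSFDP.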

Keywords: clustering; density threshold; grid partition; cluster merging
Received: 2016-01-15
Revised: 2017-05-25
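The baseline CFSFDP step that the abstract criticizes can be sketched as follows. This is a minimal illustration of the commonly published density-peak formulation, not the authors' code. It uses a cutoff-kernel density rho and delta, the distance to the nearest denser point. The function name `cfsfdp` and the choice of the k centres with the largest gamma = rho * delta (in place of reading a decision graph by eye) are assumptions made for the example. The sketch applies one `dc` to every point, which is the global threshold the abstract identifies as the weakness.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cfsfdp(points: Sequence[Point], dc: float, k: int) -> List[int]:
    """Density-peak clustering with a single global cutoff distance dc.

    Sketch only: the k points with the largest gamma = rho * delta are taken
    as cluster centres (an assumed stand-in for the decision graph).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the global dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index so that
    # every point has a well-defined set of "denser" predecessors.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    delta = [0.0] * n
    parent = [-1] * n  # nearest denser neighbour, -1 for the global peak
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:r], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j

    # The global peak always sorts first here: it has the largest rho, and
    # every other delta is at most its distance to the peak.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centres = set(sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:k])

    # Assign labels in density order: a centre opens a new cluster, every
    # other point inherits the label of its nearest denser neighbour.
    labels = [-1] * n
    next_label = 0
    for i in order:
        if i in centres:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    blob = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    shifted = [(x + 10, y + 10) for x, y in blob]
    print(cfsfdp(blob + shifted, dc=1.5, k=2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The pairwise distance matrix makes this O(n²) in time and memory. That is acceptable for illustration but not for large data sets.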

Optimization of grid based clustering by fast search and find of density peaks
SUN Hao,ZHANG Ming-xin,DAI Jiao,SHANG Zhao-wei.Optimization of grid based clustering by fast search and find of density peaks[J].Computer Engineering & Science,2017,39(5):964-970.
Authors:SUN Hao  ZHANG Ming-xin  DAI Jiao  SHANG Zhao-wei
Affiliation:(1.School of Computer Science and Technology,Soochow University,Suzhou 215006; 2.School of Computer Science and Engineering,Changshu Institute of Technology,Changshu 215500; 3.College of Computer Science,University of Chongqing,Chongqing 400030,China)
Abstract: CFSFDP (clustering by fast search and find of density peaks) is a density-peak-based clustering algorithm. It can cluster data sets of arbitrary shape, runs quickly, and is simple to implement. However, it relies on a single global density threshold dc that is specified without considering the spatial distribution of the data, which can degrade clustering quality. Moreover, data sets with multiple density peaks cannot be clustered accurately. To address these shortcomings, we propose an optimized grid-based CFSFDP algorithm (GbCFSFDP). To avoid using a global dc, the algorithm divides the data set into smaller partitions with a grid partitioning method and performs local clustering on each partition. GbCFSFDP then merges the resulting sub-clusters, so that data sets with uneven distributions and multiple density peaks are clustered correctly. Simulation experiments on two typical data sets show that GbCFSFDP clusters more accurately than CFSFDP.
Keywords:clustering  density threshold  grid partition  merging clusters  
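The abstract does not give the paper's partitioning rule, its local dc formula, or its sub-cluster merging criterion. The sketch below therefore only illustrates the general idea of replacing one global dc with a per-partition cutoff.

The following choices are all hypothetical:

- fixed square grid cells of side `cell_size`;
- each cell's dc set to the `fraction` quantile of its own pairwise distances, a variant of the 1-2% neighbour heuristic often used with CFSFDP;
- the helper names `grid_partition`, `local_dc` and `local_density`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_partition(points: Sequence[Point], cell_size: float) -> Dict[Cell, List[int]]:
    """Group point indices by the square grid cell (hypothetical scheme) containing them."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return dict(cells)


def local_dc(points: Sequence[Point], members: List[int], fraction: float) -> float:
    """Cutoff for one cell: the `fraction` quantile of its pairwise distances (assumed rule)."""
    d = sorted(math.dist(points[a], points[b])
               for ia, a in enumerate(members) for b in members[ia + 1:])
    if not d:
        return 0.0  # a single-point cell has no neighbours to measure
    return d[min(len(d) - 1, int(fraction * len(d)))]


def local_density(points: Sequence[Point], cell_size: float,
                  fraction: float = 0.02) -> Tuple[List[int], Dict[Cell, float]]:
    """rho_i computed inside each cell with that cell's own dc instead of a global one."""
    rho = [0] * len(points)
    dcs: Dict[Cell, float] = {}
    for cell, members in grid_partition(points, cell_size).items():
        dc = local_dc(points, members, fraction)
        dcs[cell] = dc
        for a in members:
            # Only neighbours in the same cell are counted in this sketch.
            rho[a] = sum(1 for b in members
                         if b != a and math.dist(points[a], points[b]) < dc)
    return rho, dcs


if __name__ == "__main__":
    dense = [(1.0, 1.0), (1.0, 1.5), (1.5, 1.0), (1.5, 1.5)]          # cell (0, 0)
    sparse = [(12.0, 12.0), (12.0, 14.0), (14.0, 12.0), (14.0, 14.0)]  # cell (1, 1)
    # A large fraction is used only because these toy cells hold four points each.
    rho, dcs = local_density(dense + sparse, cell_size=10.0, fraction=0.8)
    print(rho)  # [2, 2, 2, 2, 2, 2, 2, 2]
```

In the example, the dense cell's cutoff is about 0.71 and the sparse cell's is about 2.83. Both groups end up with comparable densities. A single global dc of about 0.71 would instead give every point in the sparse cell rho = 0.

Because neighbours across a cell border are ignored, a cluster that straddles a border is split into separate local sub-clusters. Those are the kind of fragments a later merging step would have to join. That step is not sketched because the abstract does not state its criterion.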