首页 | 本学科首页   官方微博 | 高级检索  
     

基于多维网格空间的改进K-means聚类算法
引用本文:邵伦,周新志,赵成萍,张旭.基于多维网格空间的改进K-means聚类算法[J].计算机应用,2018,38(10):2850-2855.
作者姓名:邵伦  周新志  赵成萍  张旭
作者单位:四川大学 电子信息学院, 成都 610065
基金项目:国家973计划项目(2013CB328903-2)。
摘    要:K-means算法是被广泛使用的一种聚类算法,传统的K-means算法中初始聚类中心的选择具有随机性,易使算法陷入局部最优,聚类结果不稳定。针对此问题,引入多维网格空间的思想,首先将样本集映射到一个虚拟的多维网格空间结构中,然后从中搜索出包含样本数最多且距离较远的子网格作为初始聚类中心网格,最后计算出各初始聚类中心网格中所包含样本的均值点来作为初始聚类中心。此法选择出来的初始聚类中心与实际聚类中心拟合度高,进而可据此初始聚类中心稳定高效地得到最终的聚类结果。通过使用计算机模拟数据集和UCI机器学习数据集进行测试,结果表明改进算法的迭代次数和错误率比较稳定,且均小于传统K-means算法测试结果的平均值,能有效避免陷入局部最优,并且聚类结果稳定。

关 键 词:K-means算法  聚类算法  初始聚类中心  多维网格空间  均值点  
收稿时间:2018-04-23
修稿时间:2018-06-24

Improved K-means clustering algorithm based on multi-dimensional grid space
SHAO Lun,ZHOU Xinzhi,ZHAO Chengping,ZHANG Xu.Improved K-means clustering algorithm based on multi-dimensional grid space[J].journal of Computer Applications,2018,38(10):2850-2855.
Authors:SHAO Lun  ZHOU Xinzhi  ZHAO Chengping  ZHANG Xu
Affiliation:College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China
Abstract:K-means algorithm is a widely used clustering algorithm, but the selection of the initial clustering centers in the traditional K-means algorithm is random, which makes the algorithm easily fall into local optimum and causes instability in the clustering result. In order to solve this problem, the idea of multi-dimensional grid space was introduced to the selection of initial clustering center. Firstly, the sample set was mapped to a virtual multi-dimensional grid space structure. Secondly, the sub-grids containing the largest number of samples and being far away from each other were searched as the initial cluster center grids in the space structure. Finally, the mean points of the samples in the initial cluster center grids were calculated as the initial clustering centers. The initial clustering centers chosen by this method are very close to the actual clustering centers, so that the final clustering result can be obtained stably and efficiently. By using computer simulation data set and UCI machine learning data sets to test, both the iterative number and error rate of the improved algorithm are stable, and smaller than the average of the traditional K-means algorithm. The improved algorithm can effectively avoid falling into local optimum and guarantee the stability of clustering result.
Keywords:K-means algorithm  clustering algorithm  initial clustering center  multi-dimensional grid space  mean point  
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号