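Accelerated K-means Clustering Method Based on Random Sampling
Cite this article: 王秀华 (WANG Xiu-hua). Accelerated K-means Clustering Method Based on Random Sampling [J]. 计算机与现代化 (Computer and Modernization), 2013(12): 27-29, 33.
Author: 王秀华 (WANG Xiu-hua)
Affiliation: School of Computer Science, Jinzhong College, Jinzhong, Shanxi 030600, China
Abstract: Traditional K-means clustering cannot effectively handle large-scale data. To improve its efficiency, this paper proposes an accelerated K-means clustering method based on random sampling (K-means Clustering Algorithm Based on Random Sampling, Kmeans_RS). First, a small working set is drawn from the large clustering dataset by random sampling, and traditional K-means clustering is run on the working set to obtain the cluster centers and radii, which form the sampling result. Then the remaining samples are classified by measuring their relationship to the sampling result. Because random sampling greatly reduces the size of the problem that K-means has to solve, clustering efficiency improves and the clustering of large-scale data becomes feasible. Experimental results show that, on large-scale datasets, Kmeans_RS greatly improves clustering efficiency while maintaining clustering quality.

Keywords: K-means clustering  random sampling  center  radius  working set  efficiency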

A Speeding K-means Clustering Method Based on Sampling
WANG Xiu-hua. A Speeding K-means Clustering Method Based on Sampling [J]. Computer and Modernization, 2013(12): 27-29, 33.
Authors: WANG Xiu-hua
Affiliation: School of Computer Science and Technology, Jinzhong College, Jinzhong 030600, China
Abstract: The traditional K-means clustering algorithm cannot effectively handle the clustering of large-scale datasets. To address this, this paper presents a speeding K-means clustering method based on random sampling, called the Kmeans_RS clustering algorithm. A working set is selected from the original clustering dataset by random sampling, and the traditional K-means clustering method is executed on this working set. The center and radius of every cluster are then computed to obtain the sampling result. The final clustering result for the whole dataset is obtained by measuring the relationship between the sampling result and the remaining data, and classifying the remaining data accordingly. Because random sampling reduces the size of the K-means clustering problem, clustering efficiency improves considerably and the method can be applied to large-scale clustering problems. Simulation results demonstrate that the Kmeans_RS method achieves excellent clustering efficiency.
Keywords: K-means clustering  random sampling  center  radius  working set  efficiency
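
The procedure outlined in the abstract (sample a working set, run traditional K-means on it, then classify the remaining data against the resulting centers and radii) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The abstract does not define the radius or the exact rule for the remaining data, so the sketch assumes that the radius is the largest distance from a center to a working-set member of its cluster, and that every point is assigned to its nearest center, with the radius used only to flag points lying outside the sampled cluster. The names `nearest`, `kmeans`, `kmeans_rs`, `sample_size`, and `inside` are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, max_iter=100, rng=None):
    """Traditional (Lloyd-style) K-means on a list of equal-length tuples."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # k sampled points as initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous center.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centers:  # assignments are stable, so stop
            break
        centers = updated
    return centers


def kmeans_rs(data, k, sample_size, seed=0):
    """Sketch of sampling-based K-means under the assumptions stated above."""
    rng = random.Random(seed)
    # Step 1: random sampling gives a small working set.
    working_set = rng.sample(data, sample_size)
    # Step 2: traditional K-means runs on the working set only.
    centers = kmeans(working_set, k, rng=rng)
    # Assumed radius: farthest working-set point from its nearest center.
    radii = [0.0] * k
    for p in working_set:
        j = nearest(p, centers)
        radii[j] = max(radii[j], math.dist(p, centers[j]))
    # Step 3: one pass over the full dataset, no further center updates.
    labels = [nearest(p, centers) for p in data]
    # Assumed use of the radius: flag points outside the sampled cluster.
    inside = [math.dist(p, centers[j]) <= radii[j] for p, j in zip(data, labels)]
    return centers, radii, labels, inside
```

A call such as `kmeans_rs(data, k=2, sample_size=50)` on a list of coordinate tuples returns the centers, the radii, one label per point, and the inside-radius flags. The iterative part touches only `sample_size` points, and the full dataset is scanned once at the end, which is where the efficiency gain described in the abstract would come from.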
This article is indexed in 维普 (VIP) and other databases.