
KBAC: An Adaptive Clustering Approach Based on K-means

Cite this article:	XU Xiao-min, XIAO Yang-hua. KBAC: An Adaptive Clustering Approach Based on K-means [J]. Mini-micro Systems (小型微型计算机系统), 2012(10): 2268-2272.

Authors:	XU Xiao-min (徐晓旻), XIAO Yang-hua (肖仰华)

Affiliation:	School of Computer Science and Technology, Fudan University

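Funding:	Supported by the National Natural Science Foundation of China (61003001, 71071098) and the Specialized Research Fund for the Doctoral Program of Higher Education (20100071120032)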

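Abstract:	One of the main shortcomings of the K-means clustering algorithm is that it requires the user to specify the number of cluster centers, and in typical application scenarios the user cannot supply a suitable value. On the other hand, K-means is parallelizable, which makes it well suited to cloud computing platforms for clustering large-scale data samples. This paper proposes the KBAC algorithm, which uses K-means as a pre-clustering step, implemented and optimized on a cloud platform, and which adaptively determines the best number of cluster centers before performing the clustering. Its core idea is to convert the clustering problem in the sample space into a community detection problem on a graph. Theoretical analysis and experiments show that, by parallelizing the K-means pre-clustering step under a cloud computing framework, KBAC can cluster large-scale data efficiently and obtain high-quality clustering results.
Keywords:	K-means; MapReduce; clustering; community detection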
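The abstract points to the parallelizable nature of K-means without giving details. As a rough illustration only, the following sketch shows how a single K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). It runs in one process with plain Python; the function names and the sample data are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def map_phase(points, centroids):
    """Map: emit (centroid index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)

def reduce_phase(pairs, centroids):
    """Reduce: sum the points per key, then divide by the count."""
    sums = {}
    counts = defaultdict(int)
    for key, (point, n) in pairs:
        if key in sums:
            sums[key] = [s + p for s, p in zip(sums[key], point)]
        else:
            sums[key] = list(point)
        counts[key] += n
    # A centroid that received no points keeps its previous position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]

def kmeans_iteration(points, centroids):
    """One full iteration: map, then reduce."""
    return reduce_phase(map_phase(points, centroids), centroids)

# Hypothetical sample data for illustration.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)  # [(0.0, 1.0), (10.0, 11.0)]
```

In a real MapReduce job the map calls would run on separate data partitions and the framework would group the emitted pairs by key before the reduce step; here both happen in a single loop.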
KBAC: K-means Based Adaptive Clustering for Massive Dataset

XU Xiao-min, XIAO Yang-hua. KBAC: K-means Based Adaptive Clustering for Massive Dataset [J]. Mini-micro Systems, 2012(10): 2268-2272.

Authors:	XU Xiao-min, XIAO Yang-hua

Affiliation:	(School of Computer Science, Fudan University, Shanghai 200433, China)

Abstract:	One of the main drawbacks of the K-means clustering algorithm is that the number of clusters must be specified by the user. In most real application scenarios, the user cannot provide the number of clusters beforehand. On the other hand, the algorithm's potential for parallelization provides a way to cluster massive datasets efficiently. In this paper, we propose the KBAC algorithm, which adopts K-means as a pre-clustering procedure to cluster massive data adaptively under the MapReduce cloud framework. The main idea of the algorithm is to reduce the problem of clustering in a vector space to a community detection problem on a graph. Theoretical and experimental results indicate that the KBAC algorithm improves clustering quality and efficiency in the cloud.

Keywords:	K-means; MapReduce; clustering; community detection
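The abstract states the core idea (pre-cluster with K-means, then treat clustering as community detection on a graph) but does not say how the graph is built or which community detection method is used. The sketch below is therefore not the KBAC algorithm itself. It is a minimal stand-in under stated assumptions: the data is over-clustered with plain K-means, centroids closer than a hypothetical `link_radius` are joined by an edge, and connected components are used as a simple substitute for community detection. All names and the sample data are illustrative.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's K-means from the given starting centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda j: math.dist(p, centroids[j]))
            groups[i].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def connected_components(n, edges):
    """Union-find over n nodes; returns one component label per node."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]

def adaptive_clusters(points, initial_centroids, link_radius):
    """Over-cluster, link nearby centroids, and count the resulting groups."""
    centroids = kmeans(points, initial_centroids)
    n = len(centroids)
    # Assumption: an edge joins two centroids within link_radius of each other.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(centroids[i], centroids[j]) <= link_radius]
    labels = connected_components(n, edges)
    return centroids, labels, len(set(labels))

# Hypothetical data: two well-separated blobs, deliberately over-clustered with k = 4.
points = [(0, 0), (0, 1), (2, 0), (2, 1), (20, 0), (20, 1), (22, 0), (22, 1)]
start = [(0, 0), (2, 0), (20, 0), (22, 0)]
centroids, labels, n_clusters = adaptive_clusters(points, start, link_radius=5)
print(n_clusters)  # 2
```

The number of final clusters falls out of the graph structure instead of being supplied up front, which is the property the abstract attributes to KBAC; a real community detection method would replace the connected-components step.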
This article is indexed in CNKI and other databases.
