Study of Fast Parallel Clustering Partition Algorithm for Large Data Sets

Citation:	NIU Xin-zheng, SHE Kun. Study of Fast Parallel Clustering Partition Algorithm for Large Data Sets[J]. Computer Science, 2012, 39(1): 134-137, 151.

Authors:	NIU Xin-zheng, SHE Kun

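Affiliation:	School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China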

Funding:	Fundamental Research Funds for the Central Universities (UESTC project)

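Abstract:	As the volume of data processed in cluster analysis grows rapidly, the traditional K-Means clustering algorithm faces serious challenges on large-scale data. To improve its efficiency, this paper takes two existing approaches, an MPI-based parallel K-Means algorithm and a Hadoop-based distributed K-Means cloud clustering algorithm, and proposes improvements and concrete implementations for each, focusing on cluster-center initialization and the communication pattern. Experimental results show that the proposed algorithms substantially reduce both communication and computation and achieve high execution efficiency. The findings provide a research basis for designing better fast parallel clustering partition algorithms for large-scale data in the future.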
Keywords:	Cloud computing; K-Means; large-scale data; MPI; Hadoop
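For readers unfamiliar with the baseline being improved, the following is a minimal sketch of a generic data-parallel K-Means iteration. It is not the authors' algorithm: it assumes the data are already split into shards (one per hypothetical worker, simulated here sequentially in one process) and that initial centers are supplied, so the paper's own initialization and communication improvements are not reproduced. Each worker computes per-cluster coordinate sums and point counts, and a reduction combines them. As a result, each worker exchanges only k·(d+1) numbers per iteration rather than its raw points. In an MPI implementation this reduction would typically be an `MPI_Allreduce`, and in Hadoop MapReduce a combiner can pre-aggregate each mapper's partial sums. The function names (`local_stats`, `reduce_stats`, `parallel_kmeans`) are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Return the index of the center closest to `point` (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for j, c in enumerate(centers):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = j, d
    return best


def local_stats(shard: Sequence[Point], centers: Sequence[Point]) -> Stats:
    """'Map' step on one worker: assign local points, accumulate sums and counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in shard:
        j = nearest(x, centers)
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += x[t]
    return sums, counts


def reduce_stats(all_stats: Sequence[Stats]) -> Stats:
    """'Reduce' step: element-wise sum of every worker's partial statistics."""
    k, dim = len(all_stats[0][1]), len(all_stats[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in all_stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for t in range(dim):
                total_sums[j][t] += sums[j][t]
    return total_sums, total_counts


def parallel_kmeans(shards: Sequence[Sequence[Point]],
                    init_centers: Sequence[Point],
                    max_iter: int = 20) -> List[Point]:
    """Data-parallel K-Means with the workers simulated sequentially in one process."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        # In a real deployment each call would run on a separate worker.
        partial = [local_stats(shard, centers) for shard in shards]
        # Only k * (dim + 1) numbers per worker are combined here, not raw points.
        sums, counts = reduce_stats(partial)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(v / counts[j] for v in sums[j]) if counts[j] else centers[j]
            for j in range(len(centers))
        ]
        if new_centers == centers:  # centers unchanged: converged
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(parallel_kmeans(shards, [(0.0, 0.0), (10.0, 10.0)]))
```

With the two small shards above, the sketch converges in two iterations to `[(0.0, 0.5), (10.0, 10.5)]`. Because the reduction only adds sums and counts, the combined result does not depend on how the points are split across workers.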
This article is indexed in CNKI, Wanfang Data, and other databases.
