
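MapReduce Model Implementation of the K-Means Clustering Algorithm

Cite this article:	WANG Peng, WANG Ruijie. MapReduce Model Implementation of the K-Means Clustering Algorithm[J]. Journal of Changchun University of Science and Technology (Natural Science Edition), 2015, 0(3): 120-124. DOI: 10.3969/j.issn.1672-9870.2015.03.029

Authors:	WANG Peng, WANG Ruijie

Affiliation:	School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022

Abstract:	To address the increasingly serious problems of long processing time and low execution speed in big data processing, this paper proposes, after in-depth analysis, a method for improving the efficiency of large-scale data clustering. Taking the K-means clustering algorithm as the prototype and exploiting the strengths of the MapReduce model for large-scale data processing, the original algorithm is parallelized, and a K-means clustering MapReduce model is designed for the Hadoop distributed cloud platform. The model was applied in a clustering experiment on simulated Taobao user data. The results show that the MapReduce implementation of K-means outperforms the original algorithm, shortening clustering time and improving clustering efficiency, and that it is particularly well suited to clustering massive data sets.
Keywords:	big data; MapReduce model; K-means clustering algorithm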
The K-means Clustering Algorithm Research Based on the MapReduce Model

WANG Peng, WANG Ruijie. The K-means Clustering Algorithm Research Based on the MapReduce Model[J]. Journal of Changchun University of Science and Technology, 2015, 0(3): 120-124. DOI: 10.3969/j.issn.1672-9870.2015.03.029

Authors:	WANG Peng, WANG Ruijie

Abstract:	To address the long processing times and low execution rates that increasingly affect big data processing, this paper presents a method for improving the efficiency of large-scale data clustering. Taking the K-means clustering algorithm as the prototype and exploiting the strengths of the MapReduce model for large-scale data processing, the original algorithm is parallelized, and a K-means clustering MapReduce model is designed for a Hadoop-based distributed cloud platform. The model is applied in a clustering experiment on simulated Taobao user data. The results demonstrate the feasibility of the method, which shortens clustering time and is especially suitable for clustering massive data sets.

Keywords:	big data; MapReduce programming model; K-means clustering algorithm
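
The abstract does not give the paper's map and reduce functions, so the following is only a minimal sketch of how one K-means iteration is commonly expressed in MapReduce terms, not the authors' implementation. It runs in a single Python process rather than on Hadoop, and every name in it (kmeans_map, kmeans_reduce, kmeans_iteration, kmeans) is hypothetical. The map step assigns each point to its nearest centroid, the shuffle groups points by centroid index, and the reduce step averages each group to produce the new centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (assumed form): average the points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One simulated MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat the job until no centroid moves by more than tol."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(new_centroids, centroids)):
            return new_centroids
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

On a real cluster the map calls would run in parallel over input splits, and each iteration would typically be a separate job that reads the current centroids from shared storage. Those details are assumptions here and are not described in the abstract.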
This article is indexed in CNKI, Wanfang Data, and other databases.
