
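A MapReduce Model Implementation of the K-Means Clustering Algorithm
Cite this article: WANG Peng, WANG Ruijie. A MapReduce Model Implementation of the K-Means Clustering Algorithm[J]. Journal of Changchun University of Science and Technology (Natural Science Edition), 2015, 0(3): 120-124. DOI: 10.3969/j.issn.1672-9870.2015.03.029
Authors: WANG Peng, WANG Ruijie
Affiliation: School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Abstract: To address the increasingly serious problems of long processing time and low execution rate in big data processing, this paper, after in-depth analysis, proposes a method for improving the efficiency of large-scale data clustering. Taking the K-means clustering algorithm as the prototype and exploiting the strengths of the MapReduce model in large-scale data processing, the original algorithm is parallelized, and a K-means clustering MapReduce model based on the Hadoop distributed cloud platform is designed. The model is applied in clustering experiments on simulated Taobao user data. The results show that the MapReduce implementation of K-means outperforms the original algorithm, shortening clustering time and improving clustering efficiency, and that it is particularly well suited to clustering massive data.

Keywords: big data; MapReduce model; K-means clustering algorithm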

The K-means Clustering Algorithm Research Based on the MapReduce Model
WANG Peng, WANG Ruijie. The K-means Clustering Algorithm Research Based on the MapReduce Model[J]. Journal of Changchun University of Science and Technology, 2015, 0(3): 120-124. DOI: 10.3969/j.issn.1672-9870.2015.03.029
Authors: WANG Peng, WANG Ruijie
Abstract: To address the long processing times and low execution rates that increasingly affect big data processing, this paper presents a method for improving the efficiency of large-scale data clustering. Taking the K-means clustering algorithm as the prototype and exploiting the advantages of the MapReduce model for large-scale data processing, the original algorithm is parallelized, yielding a K-means clustering MapReduce model built on the Hadoop distributed cloud platform. A clustering experiment on simulated Taobao user data demonstrates the feasibility of the method: it shortens clustering time and is especially suitable for clustering massive data.
Keywords: big data; MapReduce programming model; K-means clustering algorithm
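
The abstract names the approach (parallelizing K-means with MapReduce) but gives no implementation details. As a rough illustration only, the usual way to cast one K-means iteration as a map step and a reduce step can be sketched in plain Python. This is not the authors' Hadoop code: the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`), the Euclidean distance, the stopping tolerance, and the in-memory dictionary standing in for the shuffle phase are all assumptions made for this sketch.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step (assumed form): emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step (assumed form): average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One MapReduce-style pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid with no assigned points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat the pass until no centroid moves more than tol, or max_iter is hit."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Toy data, not the Taobao simulation data used in the paper.
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real Hadoop job the map calls would run in parallel across data splits and the framework would perform the grouping by key; each iteration would typically be submitted as a separate job with the current centroids distributed to every mapper.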
This article is indexed in CNKI, Wanfang Data, and other databases.