
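Improved K-means Clustering Algorithm Based on the MapReduce Framework
Cite this article: SONG Yang, SHI Hong-yan. Improved K-means Clustering Algorithm Based on MapReduce Framework[J]. Computer and Modernization (计算机与现代化), 2019, 0(8): 28-32, 43.
Authors: SONG Yang (宋阳), SHI Hong-yan (石鸿雁)
Affiliation: School of Science, Shenyang University of Technology, Shenyang, Liaoning 110870, China (both authors)
Funding: National Natural Science Foundation of China (61074005); Program for Liaoning Excellent Talents in University (LR2012005)
Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data, this paper proposes a distributed, parallel K-means programming model based on the MapReduce framework. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is introduced: the value of k is determined from the degree of dissimilarity among the data, and points with low dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, and the model itself is modified to speed up K-means on massive data. Experiments show that, compared with the traditional K-means algorithm, the improved MapReduce-based K-means algorithm achieves better accuracy and convergence time, and that the parallel clustering model scales well across different data sizes and numbers of compute nodes.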

Keywords: K-means algorithm; dissimilarity function; MapReduce model
Received: 2019-08-16

Improved K-means Clustering Algorithm Based on MapReduce Framework
SONG Yang,SHI Hong-yan.Improved K-means Clustering Algorithm Based on MapReduce Framework[J].Computer and Modernization,2019,0(8):28-32,43.
Authors:SONG Yang  SHI Hong-yan
Affiliation:(School of Science, Shenyang University of Technology, Shenyang 110870, China)
Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data, a distributed, parallel programming model for K-means clustering based on the MapReduce framework is proposed. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is introduced: the value of k is determined from the degree of dissimilarity among the data, and points with low dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, and the model itself is modified to speed up K-means on massive data. Experiments show that, compared with the traditional K-means algorithm, the improved MapReduce-based K-means algorithm achieves better accuracy and convergence time, and that the parallel clustering model scales well across different data sizes and numbers of compute nodes.
Keywords:K-means algorithm  dissimilarity function  MapReduce model  
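
The abstract describes running K-means on the MapReduce model. As a rough illustration of that general idea only, the sketch below expresses one K-means iteration as a map step (assign each point to its nearest center) and a reduce step (average the points per cluster). It runs in a single Python process rather than on Hadoop, uses plain Euclidean distance, and takes the initial centers as given. It does not reproduce the authors' dissimilarity function or their modified MapReduce model, neither of which is specified in the abstract. All function names (`map_assign`, `combine`, `reduce_centers`, `kmeans_mapreduce`) are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Merge two (coordinate_sum, count) pairs, as a combiner would."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_centers(pairs, centers):
    """Reduce step: average the points grouped under each cluster index."""
    partial = {}
    for key, value in pairs:
        partial[key] = combine(partial[key], value) if key in partial else value
    new_centers = list(centers)  # assumption: an empty cluster keeps its old center
    for key, (total, n) in partial.items():
        new_centers[key] = tuple(x / n for x in total)
    return new_centers


def kmeans_mapreduce(points, centers, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no center moves by more than tol."""
    for _ in range(max_iter):
        pairs = [map_assign(p, centers) for p in points]
        updated = reduce_centers(pairs, centers)
        shift = max(dist(c, u) for c, u in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(kmeans_mapreduce(data, [(0.0, 0.0), (10.0, 0.0)]))
```

Because `combine` is associative, partial sums could in principle be merged on each worker before being sent to the reducer, which is the usual way a MapReduce job of this shape reduces network traffic.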
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.