K-means algorithm of random sample based on MapReduce


Cite this article:	WANG Yonggui, WU Chao, DAI Wei. K-means algorithm of random sample based on MapReduce[J]. Computer Engineering and Applications, 2016, 52(8): 74-79.

Authors:	WANG Yonggui, WU Chao, DAI Wei

Affiliation:	Liaoning Technical University, Huludao, Liaoning 125105, China

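Abstract:	When processing massive data, the K-means algorithm is prone to system memory overflow. Implementing K-means on the MapReduce framework solves this problem, but the clustering results remain unstable and the accuracy is not high. This paper proposes an improved algorithm: when K-means is implemented on MapReduce, multiple rounds of random sampling are performed, and density, distance, and squared error are computed to select better initial cluster centers; a new method of computing center points is used during iteration. Experimental results show that the improved algorithm achieves good stability, accuracy, and speedup.
Keywords:	K-means; random sampling; massive data; MapReduce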
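
The abstract does not spell out how K-means is mapped onto MapReduce, so the following is only a minimal single-process sketch of the commonly used decomposition: the map step assigns each point to its nearest center, and the reduce step averages the points of each cluster. The function names (`kmeans_map`, `kmeans_combine`, `kmeans_iteration`) are hypothetical, and the plain mean used here is an assumption standing in for the paper's new center point calculation, which the abstract does not describe.

```python
from math import dist


def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def kmeans_combine(a, b):
    """Reduce helper: add two (coordinate-sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_iteration(points, centers):
    """One iteration: map every point, then reduce each group to its mean.

    Assumption: the new center is the plain arithmetic mean of the cluster.
    """
    partial = {}
    for p in points:
        key, value = kmeans_map(p, centers)
        partial[key] = kmeans_combine(partial[key], value) if key in partial else value
    new_centers = list(centers)  # an empty cluster keeps its old center
    for key, (coord_sum, count) in partial.items():
        new_centers[key] = tuple(s / count for s in coord_sum)
    return new_centers
```

Because each partial result is only a coordinate sum and a count, the pairs can be merged in any order, which is what allows the work to be spread across many mappers and reducers without holding the full data set in one machine's memory.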
K-means algorithm of random sample based on MapReduce

WANG Yonggui, WU Chao, DAI Wei. K-means algorithm of random sample based on MapReduce[J]. Computer Engineering and Applications, 2016, 52(8): 74-79.

Authors:	WANG Yonggui, WU Chao, DAI Wei

Affiliation:	Liaoning Technical University, Huludao, Liaoning 125105, China

Abstract:	When dealing with massive data, the K-means algorithm easily causes memory overflow. Although using the MapReduce framework to improve K-means solves this problem, the clustering results are not stable and the accuracy is not high. An improved algorithm is proposed that implements K-means on the MapReduce framework by means of random sampling and by calculating density, distance, and squared error. It selects better initial cluster centers and adopts a new method of center point calculation in the iteration. Experimental results show that the improved algorithm has good stability, accuracy, and speedup ratio.

Keywords:	K-means; random sampling; massive data; MapReduce
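
The abstract names the ingredients of the initial center selection (random sampling, density, distance, squared error) but not the formulas, so the sketch below is an illustration under stated assumptions rather than the authors' method. It assumes that density is the number of sample points within a fixed `radius`, that later centers maximize density times distance to the centers already chosen, and that the candidate set with the lowest sum of squared errors over the pooled samples wins. All function and parameter names are hypothetical.

```python
import random
from math import dist


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)


def candidate_centers(sample, k, radius):
    """Pick k centers from one sample with a density-times-distance score."""
    # Assumed density: number of sample points within `radius` (self included).
    density = [sum(dist(p, q) <= radius for q in sample) for p in sample]
    # First center: the densest point of the sample.
    chosen = [max(range(len(sample)), key=lambda i: density[i])]
    while len(chosen) < k:
        # Later centers: dense points far from every center chosen so far.
        def score(i):
            return density[i] * min(dist(sample[i], sample[j]) for j in chosen)
        chosen.append(max(range(len(sample)), key=score))
    return [sample[i] for i in chosen]


def choose_initial_centers(points, k, n_samples=5, sample_size=50,
                           radius=1.0, seed=0):
    """Draw several random samples; keep the candidate set with the lowest SSE."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    samples = [rng.sample(points, size) for _ in range(n_samples)]
    pooled = [p for s in samples for p in s]
    candidates = [candidate_centers(s, k, radius) for s in samples]
    # Score every candidate set on the same pooled data so SSEs are comparable.
    return min(candidates, key=lambda c: sse(pooled, c))
```

Working on small samples keeps the quadratic density computation cheap, and repeating the sampling several times reduces the chance that one unlucky draw determines the starting centers.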
