
Divide and conquer k-means clustering method based on MapReduce

Cite this article:	ZANG Yan-hui, XI Yun-jiang, ZHAO Xue-zhang. Divide and conquer k-means clustering method based on MapReduce[J]. Computer Engineering and Design (计算机工程与设计), 2020, 41(5): 1345-1351.

Authors:	ZANG Yan-hui (臧艳辉), XI Yun-jiang (席运江), ZHAO Xue-zhang (赵雪章)

Affiliations:	School of Electronic Information, Foshan Polytechnic, Foshan 528137, Guangdong, China; School of Economics and Management, South China University of Technology, Guangzhou 510000, Guangdong, China

Funding:	Foshan Science and Technology Program; National Natural Science Foundation of China

Abstract:	To address the long execution time and poor clustering quality of the original k-means method when it is modeled in MapReduce, a divide-and-conquer k-means clustering method based on MapReduce is proposed. A divide-and-conquer strategy is adopted to process large data sets: the whole data set is split into smaller blocks, and each block is stored in the main memory of one machine. The blocks are spread across the available machines, and each block is clustered independently by the machine it is assigned to. The minimum weighted distance is used to determine the cluster to which each data point should be assigned, and convergence is then checked. Experimental results show that, compared with the traditional k-means clustering method and the streaming k-means clustering method, the proposed method has a shorter running time and produces better clustering results.

Keywords:	data clustering; MapReduce-based clustering; divide and conquer; big data; k-means
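
The divide-and-conquer idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract does not define the weighted distance, the seeding rule, or the merge step, so the sketch assumes that each block's local centroids are merged by a second k-means pass in which every local centroid is weighted by the number of points it represents. The function names (`farthest_first`, `weighted_kmeans`, `divide_and_conquer_kmeans`) and the farthest-first seeding are hypothetical choices made for illustration, and the "machines" are simulated by a plain loop rather than a real MapReduce cluster.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption; the abstract names no seeding rule)."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(tuple(nxt))
    return centroids


def weighted_kmeans(
    points: List[Point], weights: List[float], k: int, max_iter: int = 100
) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-means in which point i counts weights[i] times.

    Returns the centroids and the total weight from the last assignment step.
    """
    dim = len(points[0])
    centroids = farthest_first(points, k)
    totals = [0.0] * k
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Weighted mean per cluster; an empty cluster keeps its old centroid.
        updated = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, totals


def divide_and_conquer_kmeans(points: List[Point], k: int, n_blocks: int) -> List[Point]:
    """Split the data, cluster each block independently, then merge the results."""
    # Divide: one block per simulated machine.
    blocks = [points[i::n_blocks] for i in range(n_blocks)]
    summary_points: List[Point] = []
    summary_weights: List[float] = []
    for block in blocks:  # "map" phase: blocks do not interact
        if not block:
            continue
        centroids, totals = weighted_kmeans(block, [1.0] * len(block), k)
        for c, w in zip(centroids, totals):
            if w > 0:
                summary_points.append(c)
                summary_weights.append(w)
    # "reduce" phase (assumed): cluster the local centroids, weighted by size.
    final, _ = weighted_kmeans(summary_points, summary_weights, k)
    return final
```

Calling `divide_and_conquer_kmeans(points, k=2, n_blocks=4)` on a list of coordinate tuples returns `k` merged centroids. Because each block is summarized by at most `k` weighted centroids, the merge step only handles about `k * n_blocks` points, which is the property that makes the per-block work easy to distribute.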
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
