
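Divide-and-conquer k-means clustering method based on MapReduce
Cite this article: ZANG Yan-hui, XI Yun-jiang, ZHAO Xue-zhang. Divide and conquer k-means clustering method based on MapReduce[J]. Computer Engineering and Design, 2020, 41(5): 1345-1351.
Authors: ZANG Yan-hui  XI Yun-jiang  ZHAO Xue-zhang
Affiliations: School of Electronic Information, Foshan Polytechnic, Foshan 528137, Guangdong, China; School of Economics and Management, South China University of Technology, Guangzhou 510000, Guangdong, China
Funding: Foshan Science and Technology Program; National Natural Science Foundation of China
Abstract: To address the long execution time and poor clustering quality of the original k-means method when modeled in MapReduce, a divide-and-conquer k-means clustering method based on MapReduce is proposed. Divide and conquer is used to handle large data sets: the whole data set is split into smaller blocks, which are stored in the main memory of each machine. The blocks are spread across the available machines, and each block is clustered independently by the machine it is assigned to. The minimum weighted distance determines the cluster to which each data point is assigned, and convergence is then checked. Experimental results show that, compared with the traditional k-means method and the streaming k-means method, the proposed method runs in less time and produces better clustering results.

Keywords: data clustering  MapReduce-based clustering  divide and conquer  big data  k-means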

Divide and conquer k-means clustering method based on MapReduce
ZANG Yan-hui, XI Yun-jiang, ZHAO Xue-zhang. Divide and conquer k-means clustering method based on MapReduce[J]. Computer Engineering and Design, 2020, 41(5): 1345-1351.
Authors:ZANG Yan-hui  XI Yun-jiang  ZHAO Xue-zhang
Affiliation:(School of Electronic Information, Foshan Polytechnic, Foshan 528137, China; School of Economics and Management, South China University of Technology, Guangzhou 510000, China)
Abstract:To address the long execution time and poor clustering results of the original k-means method when modeled in MapReduce, a divide-and-conquer k-means clustering method based on MapReduce was proposed. Divide and conquer was adopted to process large data sets. The whole data set to be processed was broken into smaller blocks, which were stored in the main memory of each machine. The blocks were spread across the available machines, and each block was clustered independently by the machine it was allocated to. The minimum weighted distance was used to determine the cluster to which each data point should be assigned, and convergence was then checked. Experimental results show that, compared with the traditional k-means clustering method and the streaming k-means clustering method, the proposed method has a shorter running time and produces better clustering results.
Keywords:data clustering  MapReduce-based clustering  divide and conquer method  big data  k-means method
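
The abstract describes the method only at a high level, so the following is a minimal single-process sketch of the general divide-and-conquer idea, not the authors' implementation. It splits the data into blocks, clusters each block independently (the "map" step), and then merges the local centroids with a second, weighted k-means pass (the "reduce" step). The function names `weighted_kmeans` and `divide_and_conquer_kmeans`, the round-robin split, and the use of local cluster sizes as weights are all assumptions made for illustration; the abstract does not define its "minimum weighted distance", and this sketch does not claim to reproduce it.

```python
import random
from math import dist


def weighted_kmeans(points, k, weights=None, iters=50, seed=0):
    """Lloyd's k-means where each point carries a weight.

    Returns (centroids, cluster_weights); cluster_weights[j] is the total
    weight assigned to centroid j in the last assignment step.
    """
    if weights is None:
        weights = [1.0] * len(points)
    dim = len(points[0])
    centroids = random.Random(seed).sample(points, k)
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid (Euclidean distance).
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # New centroid = weighted mean; an empty cluster keeps its old centroid.
        updated = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, totals


def divide_and_conquer_kmeans(points, k, n_chunks, seed=0):
    """Cluster blocks independently, then merge the local centroids.

    Assumption: each block holds at least k points. The blocks are processed
    sequentially here; in a MapReduce job each would go to its own mapper.
    """
    # Divide: round-robin split so every block samples the whole data set.
    chunks = [points[i::n_chunks] for i in range(n_chunks)]

    # "Map": run k-means on each block and keep the non-empty local clusters.
    local_centroids, local_sizes = [], []
    for i, chunk in enumerate(chunks):
        cents, sizes = weighted_kmeans(chunk, k, seed=seed + i)
        for c, s in zip(cents, sizes):
            if s > 0:
                local_centroids.append(c)
                local_sizes.append(s)

    # "Reduce": cluster the local centroids, weighting each by its cluster size
    # (an illustrative choice, not the paper's stated weighting).
    final, _ = weighted_kmeans(local_centroids, k, weights=local_sizes, seed=seed)
    return final
```

Calling `divide_and_conquer_kmeans(points, k=2, n_chunks=4)` on a list of coordinate tuples returns `k` merged centroids. Weighting the merge by local cluster size keeps a centroid that summarizes many points from being pulled around by one that summarizes few.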
This article is indexed by VIP (维普), Wanfang Data (万方数据), and other databases.