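Parallel computation algorithm for big data clustering based on MapReduce
Cite this article: ZHANG Wenjie, JIANG Liehui. Parallel computation algorithm for big data clustering based on MapReduce[J]. Application Research of Computers, 2020, 37(1): 53-56.
Authors: ZHANG Wenjie, JIANG Liehui
Affiliation: Faculty of Cyberspace Security, PLA Information Engineering University, Zhengzhou 450001; State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001
Funding: Henan Province Basic and Frontier Research Project; Henan Province Science and Technology Research Program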
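Abstract: To address the large scale and computational complexity of big data, this paper adopts a two-stage progressive clustering approach on the MapReduce framework and proposes an improved, parallelized K-means method for big data clustering. In the first stage, the Canopy algorithm initializes the partition of cluster centers, quickly yielding coarse-precision cluster center points. In the second stage, a parallel computation scheme built on MapReduce clusters or merges each data point around its neighboring Canopy center, refining the coarse result so that big data can be clustered quickly and accurately. The algorithm was validated on the MapReduce parallel framework, and the experimental results indicate that it improves parallel computing efficiency, reduces computing time, and improves the clustering accuracy on big data.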
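The first stage can be illustrated with a minimal sketch of the generic Canopy procedure. The abstract does not give the authors' thresholds or implementation, so the function name `canopy_centers`, the loose and tight thresholds `t1` and `t2`, and the Euclidean distance are all assumptions made for illustration.

```python
import math
import random


def canopy_centers(points, t1, t2, seed=0):
    """Generic Canopy pass (illustrative, not the paper's code).

    t1 is the loose threshold and t2 the tight one, with t1 > t2.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)  # Canopy picks candidate centers in arbitrary order
    canopies = []
    while remaining:
        center = remaining.pop()
        members = [center]
        still_candidates = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)  # loosely belongs to this canopy
            if d >= t2:
                still_candidates.append(p)  # may still seed another canopy
        remaining = still_candidates
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
    for center, members in canopy_centers(data, t1=3.0, t2=1.0):
        print(center, len(members))
```

The centers returned by such a pass are only coarse: they depend on the visiting order and on the two thresholds, which is why a refinement stage is needed afterwards.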

Keywords: big data  MapReduce  parallel computation  data clustering
Received: 2018-05-06
Revised: 2018-06-27

Parallel computation algorithm for big data clustering based on MapReduce
ZHANG Wenjie and JIANG Liehui. Parallel computation algorithm for big data clustering based on MapReduce[J]. Application Research of Computers, 2020, 37(1): 53-56.
Authors: ZHANG Wenjie and JIANG Liehui
Affiliation: Faculty of Cyberspace Security, PLA Information Engineering University,
Abstract: To address the large scale and computational complexity of big data, this paper adopts a two-stage progressive clustering approach and proposes a parallel computation algorithm for big data clustering based on MapReduce. In the first stage, the method initializes the cluster centers with the Canopy algorithm, quickly obtaining coarse cluster center points. In the second stage, it presents a parallel computation scheme based on the MapReduce framework in which each data point is clustered or merged around its nearest Canopy center. In this way, the algorithm makes data clustering both fast and accurate. Experiments run on MapReduce show that the algorithm improves the efficiency of parallel computing, reduces computing time, and improves the clustering accuracy on big data.
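The second stage can be sketched as a K-means update written in map and reduce form. This is a single-process simulation under stated assumptions, not the authors' scheme: the names `kmeans_map`, `kmeans_reduce`, `kmeans_iteration` and `kmeans` are hypothetical, the in-memory `grouped` dictionary stands in for the framework's shuffle phase, and the initial centers are simply passed in (for example, coarse centers from a Canopy pass).

```python
import math
from collections import defaultdict


def kmeans_map(shard, centers):
    """Map: emit (index of nearest center, (point, 1)) for each point in a shard."""
    for p in shard:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        yield idx, (p, 1)


def kmeans_reduce(values):
    """Reduce: average all points assigned to one center."""
    total = None
    count = 0
    for p, n in values:
        total = list(p) if total is None else [a + b for a, b in zip(total, p)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(shards, centers):
    """One map-shuffle-reduce round; centers with no points stay unchanged."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for shard in shards:  # each shard would go to a separate mapper
        for key, value in kmeans_map(shard, centers):
            grouped[key].append(value)
    new_centers = list(centers)
    for idx, values in grouped.items():
        new_centers[idx] = kmeans_reduce(values)
    return new_centers


def kmeans(shards, centers, max_iter=20, tol=1e-6):
    """Repeat rounds until the largest center shift falls below tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(shards, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    shards = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
    print(kmeans(shards, [(1, 1), (9, 9)]))
```

Because each mapper only needs the current list of centers and its own shard, the assignment step parallelizes without communication between mappers. On a real cluster a combiner would typically pre-sum the points per center on each node before the shuffle to cut network traffic.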
Keywords: big data  MapReduce  parallel computation  data clustering