
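Improved Canopy-Kmeans Algorithm Based on MapReduce
Cite this article: MAO Dianhui. Improved Canopy-Kmeans algorithm based on MapReduce[J]. Computer Engineering and Applications, 2012, 48(27): 22-26, 68.
Author: MAO Dianhui
Affiliation: School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048
Funding: National Natural Science Foundation of China (No.2009ZX05038-001); Science and Technology and Graduate Education Innovation Project for Beijing Municipal Universities (No.PXM2012_014213_000037)
Abstract: To address the randomness of Canopy selection in the distributed Canopy-Kmeans algorithm, the algorithm is improved using the "min-max principle", which avoids blind Canopy selection. The algorithm is then parallelized with the MapReduce framework so that it can make full use of a cluster's computing and storage capacity and thus handle massive-data scenarios. The improved algorithm is evaluated experimentally on the clustering of massive Internet news data. The results show that, compared with randomly selecting Canopies, the method clearly improves classification accuracy and noise resistance, and that it shows a considerable performance advantage when processing massive data.

Keywords: Canopy-Kmeans algorithm  MapReduce  distributed clustering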

Improved Canopy-Kmeans algorithm based on MapReduce
MAO Dianhui. Improved Canopy-Kmeans algorithm based on MapReduce[J]. Computer Engineering and Applications, 2012, 48(27): 22-26, 68.
Authors: MAO Dianhui
Affiliation: School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
Abstract: To avoid the random selection of Canopies in the Canopy-Kmeans algorithm, this paper introduces an improved algorithm based on the minimum-maximum principle and uses the MapReduce framework to process massive data. The algorithm is applied to the clustering of massive Internet news. Experiments show that Canopy selection based on the minimum-maximum principle achieves higher classification accuracy and noise immunity than the random strategy.
Keywords:Canopy-Kmeans  MapReduce  distributed aggregation
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
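
The abstract does not spell out how the min-max principle selects Canopy centers. One common reading is the max-min distance heuristic (farthest-first traversal): each new center is the point whose distance to its nearest already-chosen center is largest. The sketch below illustrates that heuristic on a single machine under this assumption; the function names, the choice of the first point as the initial center, and the Euclidean metric are illustrative and not taken from the paper, and the MapReduce parallelization is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(points, k):
    """Pick up to k seed centers by farthest-first traversal.

    Assumption: the first point is used as the initial center.
    """
    if not points or k <= 0:
        return []
    centers = [points[0]]
    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [euclidean(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Choose the point that is farthest from all current centers.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[idx] == 0.0:
            break  # every remaining point coincides with a chosen center
        centers.append(points[idx])
        min_dist = [
            min(d, euclidean(p, points[idx])) for d, p in zip(min_dist, points)
        ]
    return centers


# Hypothetical toy data: two tight groups plus one distant point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.0, 9.0)]
seeds = max_min_centers(points, 3)
```

On this toy data the three seeds land in three different regions of the plane, which is the property that makes such seeding less sensitive to an unlucky random draw.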