Parallel DBSCAN Algorithm Based on Cloud Computing Platform
Cite this article: CAI Yong-Qiang, CHEN Ping-Hua, LI Hui. Parallel DBSCAN Algorithm Based on Cloud Computing Platform[J]. Journal of Guangdong University of Technology, 2016, 33(1): 51-56. DOI: 10.3969/j.issn.1007-7162.2016.01.010
Authors: CAI Yong-Qiang, CHEN Ping-Hua, LI Hui
Affiliation: School of Computers, Guangdong University of Technology, Guangzhou 510006, China
Funding: Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2012B091000058); Guangdong Province Specialized Town Service Platform Construction Project for Small, Medium and Micro Enterprises (2012B040500034)
Abstract: DBSCAN is a typical density-based clustering algorithm. It is fast and can identify noise points, but on large-scale data it suffers from low clustering efficiency, heavy memory and I/O consumption, and reduced clustering accuracy. Advances in cluster computing, and in cloud computing in particular, offer a way to address these shortcomings. This paper proposes PMDBSCAN, a parallel DBSCAN algorithm based on data pre-partitioning. The algorithm partitions the data as a preprocessing step before clustering, parallelizes DBSCAN with the MapReduce programming model, and uses overlapping partitions to reduce I/O consumption. Experimental results show that on large-scale data sets PMDBSCAN increases clustering speed, reduces I/O consumption, and improves clustering quality.

Keywords: large-scale database; DBSCAN algorithm; overlapping partition; MapReduce
Received: 2014-04-02
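
The abstract names three ingredients (pre-partitioning, a MapReduce-style parallel DBSCAN, and overlapping partitions) but gives no algorithmic detail, so the following is not the authors' PMDBSCAN. It is a minimal single-process sketch of the general overlapping-partition idea, under several assumptions of ours: the data are 2-D points, the partitions are equal-width strips along x, each strip is widened by `eps` on both sides, and local clusters are merged when they share a point that was found to be a core point in some partition. All function names (`local_dbscan`, `partition`, `parallel_dbscan`) are hypothetical.

```python
from collections import defaultdict
from math import dist

NOISE = -1

def local_dbscan(points, ids, eps, min_pts):
    """Plain O(n^2) DBSCAN restricted to the points named by ids.
    Returns ({point id: local label}, set of ids that are core locally)."""
    nbrs = {i: [j for j in ids if dist(points[i], points[j]) <= eps] for i in ids}
    core = {i for i in ids if len(nbrs[i]) >= min_pts}  # a point counts itself
    labels = {i: NOISE for i in ids}
    cluster = 0
    for i in ids:
        if i not in core or labels[i] != NOISE:
            continue
        labels[i] = cluster
        stack = [i]                      # only core points are ever pushed
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    if q in core:
                        stack.append(q)
        cluster += 1
    return labels, core

def partition(points, n_parts, eps):
    """Assumed scheme: equal-width x-strips, each widened by eps on both
    sides so that neighbouring strips overlap."""
    xs = [p[0] for p in points]
    lo = min(xs)
    width = (max(xs) - lo) / n_parts or 1.0   # avoid zero-width strips
    parts = []
    for k in range(n_parts):
        a, b = lo + k * width, lo + (k + 1) * width
        parts.append([i for i, p in enumerate(points) if a - eps <= p[0] <= b + eps])
    return parts

def parallel_dbscan(points, eps, min_pts, n_parts):
    parts = partition(points, n_parts, eps)
    # "Map": partitions are clustered independently; in a real deployment
    # each call could run on a separate worker.
    results = [local_dbscan(points, ids, eps, min_pts) for ids in parts]

    # "Reduce": union-find over (partition, local label) pairs.
    parent = {}
    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    memberships = defaultdict(list)      # point id -> [(partition, label)]
    core_anywhere = set()
    for k, (labels, core) in enumerate(results):
        core_anywhere |= core
        for i, lab in labels.items():
            if lab != NOISE:
                memberships[i].append((k, lab))
    # A core point shared by two local clusters joins them into one.
    for i in core_anywhere:
        first = find(memberships[i][0])
        for c in memberships[i][1:]:
            parent[find(c)] = first

    final, global_ids = [], {}
    for i in range(len(points)):
        if memberships[i]:
            root = find(memberships[i][0])
            final.append(global_ids.setdefault(root, len(global_ids)))
        else:
            final.append(NOISE)
    return final

if __name__ == "__main__":
    pts = [(0.5 * k, 0.0) for k in range(11)] + [(9.0, 0.0), (9.5, 0.0), (10.0, 0.0)]
    print(parallel_dbscan(pts, eps=0.6, min_pts=3, n_parts=4))
```

The `eps`-wide overlap is what lets each partition be processed without consulting its neighbours: every point inside a strip sees its complete `eps`-neighbourhood locally, so no extra data has to be read during clustering. Whether PMDBSCAN sizes its overlap or merges clusters this way is not stated in the abstract.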
This article is indexed in Wanfang Data and other databases.