A Fast Parallel Clustering Algorithm for Large Spatial Databases
Authors: Xiaowei Xu, Jochen Jäger, Hans-Peter Kriegel
Affiliations: (1) Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, D-81730 München, Germany; (2) Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany
Abstract: The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture, in which multiple computers are interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data are spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method on a number of workstations connected via 10 Mbit/s Ethernet. A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
Keywords: clustering algorithms; parallel algorithms; distributed algorithms; scalable data mining; distributed index structures; spatial databases
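
The density-based notion of clusters mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not PDBSCAN and does not use the dR*-tree: it is a plain, brute-force rendering of the general DBSCAN idea, offered only as an illustration. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size) and the helper `region_query` are assumptions introduced here, not taken from this record. A spatial index would normally replace the linear scan in `region_query`.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i].

    Brute-force linear scan (assumption for illustration); the point
    itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not dense enough to start a cluster; may later be
            # relabeled as a border point of another cluster.
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is dense as well, so the cluster grows through it.
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels
```

Calling `dbscan(points, eps, min_pts)` on a list of coordinate tuples returns one label per point. Because clusters grow through chains of dense neighborhoods rather than around a centroid, they can take arbitrary shapes, and isolated points are left labeled as noise.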
This article is indexed in SpringerLink and other databases.