A Fast Parallel Clustering Algorithm for Large Spatial Databases |
| |
Authors: | Xiaowei Xu Jochen Jäger Hans-Peter Kriegel |
| |
Affiliation: | (1) Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, D-81730 München, Germany;(2) Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany |
| |
Abstract: | The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior. |
| |
Keywords: | clustering algorithms parallel algorithms distributed algorithms scalable data mining distributed index structures spatial databases |
本文献已被 SpringerLink 等数据库收录! |
|