Multi-step density-based clustering
Authors:Stefan Brecheisen  Hans-Peter Kriegel  Martin Pfeifle
Affiliation:(1) Institute for Informatics, University of Munich, Oettingenstr. 67, 80538 Munich, Germany
Abstract:Data mining in large databases of complex objects from scientific, engineering, or multimedia applications is becoming increasingly important. In many areas, complex distance measures are the first choice, but simpler distance functions that can be computed much more efficiently are also available. In this paper, we demonstrate how the paradigm of multi-step query processing, which relies on exact as well as lower-bounding approximated distance functions, can be integrated into the two density-based clustering algorithms DBSCAN and OPTICS, resulting in a considerable efficiency boost. Our approach tries to confine itself to ɛ-range queries on the simple distance functions and carries out complex distance computations only at those stages of the clustering algorithm where they are required to compute the correct clustering result. Furthermore, we show how our approach can be used for approximated clustering, allowing the user to find an individual trade-off between quality and efficiency. To assess the quality of the resulting clusterings, we introduce suitable quality measures that can be used generally for evaluating approximated partitioning and hierarchical clusterings. In a broad experimental evaluation based on real-world test data sets, we demonstrate that our approach accelerates the generation of exact density-based clusterings by more than one order of magnitude. We also show that our approximated clustering approach yields high-quality clusterings, where the desired quality is scalable with respect to (w.r.t.) the overall number of exact distance computations.
Stefan Brecheisen is a teaching and research assistant in Prof. Hans-Peter Kriegel's group. He works in the field of similarity search in spatial objects.
Hans-Peter Kriegel is a full professor at the University of Munich and has been head of the database group since 1991. He studied computer science at the University of Karlsruhe, Germany, and finished his doctoral thesis there in 1976. He has more than 200 publications in international journals and reviewed conference proceedings. His research interests are database systems for complex objects (molecular biology, medical science, multimedia, CAD, etc.), in particular query processing, similarity search, and high-dimensional index structures, as well as knowledge discovery in databases and data mining.
Martin Pfeifle is a teaching and research assistant in Prof. Hans-Peter Kriegel's group. He finished his doctoral thesis on “Spatial Database Support for Virtual Engineering” in the spring of 2004.
Keywords:Approximated clustering  Complex objects  Data mining  Density-based clustering
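The multi-step (filter-and-refine) ɛ-range query that the abstract builds on can be illustrated with a minimal sketch. This is not the paper's algorithm, which further restricts where exact distances are computed inside DBSCAN and OPTICS; it only shows the generic idea that a cheap lower-bounding distance can safely prune objects before the expensive exact distance is evaluated. The function names, the choice of Euclidean distance as the "exact" measure, and the projection onto the first `k` coordinates as the "filter" measure are all assumptions made for illustration.

```python
import math


def exact_dist(a, b):
    """Stand-in for an expensive exact distance (assumed: full Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_dist(a, b, k=1):
    """Cheap lower bound (assumed): Euclidean distance on the first k coordinates.

    Dropping non-negative squared terms can only shrink the sum, so
    filter_dist(a, b) <= exact_dist(a, b) always holds.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))


def multi_step_range_query(db, q, eps):
    """Return all objects within eps of q, plus the number of exact computations."""
    hits = []
    exact_calls = 0
    for obj in db:
        # Filter step: if even the lower bound exceeds eps, the exact
        # distance must exceed eps too, so the object is pruned safely.
        if filter_dist(obj, q) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        exact_calls += 1
        if exact_dist(obj, q) <= eps:
            hits.append(obj)
    return hits, exact_calls


if __name__ == "__main__":
    db = [(0, 0), (1, 0), (0, 3), (5, 5)]
    hits, exact_calls = multi_step_range_query(db, (0, 0), 1.5)
    print(hits, exact_calls)
```

Because the filter distance never overestimates the exact one, the result set is identical to that of a plain scan with the exact distance; only the number of exact computations changes.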
This article is indexed in SpringerLink and other databases.