Discovering pattern-based subspace clusters by pattern tree |
Authors: | Jihong Guan, Yanglan Gan, Hao Wang |
Affiliation: | (a) Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; (b) Department of Computer Science and Technology, Hefei University of Technology, Hefei 23009, China |
Abstract: | Traditional clustering models based on distance similarity are not always effective at capturing correlation among data objects, whereas pattern-based clustering is well suited to identifying correlation hidden among data objects. However, state-of-the-art pattern-based clustering methods are inefficient and provide no metric for measuring clustering quality. This paper presents a new pattern-based subspace clustering method that addresses both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces in a single scan of the database, which remains efficient on large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method lets users flexibly set quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms existing methods in both efficiency and effectiveness. |
Keywords: | Clustering analysis; Subspace clustering; Pattern similarity; Pattern tree |
This article is indexed in ScienceDirect and other databases. |
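The abstract's opening claim, that distance similarity can miss correlation which pattern similarity captures, can be illustrated with a small sketch. The abstract does not state the paper's own similarity measure, so the code below assumes the pScore commonly used in the pattern-based clustering literature, |(x_i - x_j) - (y_i - y_j)|. The function names and the two sample objects are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Plain Euclidean distance over all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pscore(x, y, i, j):
    """Assumed pattern score on dimensions i and j:
    |(x_i - x_j) - (y_i - y_j)|. Zero means x and y rise and
    fall together on these two dimensions."""
    return abs((x[i] - x[j]) - (y[i] - y[j]))


def max_pscore(x, y, dims):
    """Largest pScore over every pair of dimensions in dims."""
    return max(pscore(x, y, i, j) for i, j in combinations(dims, 2))


# Hypothetical objects: b equals a shifted by +10 on dimensions 0..2
# and is unrelated to a on dimension 3.
a = [1.0, 5.0, 3.0, 9.0]
b = [11.0, 15.0, 13.0, 2.0]

dist_full = euclidean(a, b)                  # large: the objects look far apart
score_sub = max_pscore(a, b, [0, 1, 2])      # 0.0: identical pattern in the subspace
score_full = max_pscore(a, b, [0, 1, 2, 3])  # 17.0: the pattern breaks on dimension 3

print(dist_full, score_sub, score_full)
```

Under these assumptions the two objects are far apart by distance yet form a perfect pattern in the subspace {0, 1, 2}, which is the kind of subspace cluster a pattern-based method aims to find.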