“Best K”: critical clustering structures in categorical datasets
Authors: Keke Chen (1), Ling Liu (2)
Affiliation: (1) Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA; (2) College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Abstract: The demand for cluster analysis of categorical data has continued to grow over the last decade. A well-known problem in categorical clustering is determining the best number of clusters, K. Although several categorical clustering algorithms have been developed, surprisingly, none has satisfactorily addressed the problem of the best K for categorical clustering. Because categorical data has no inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shapes and density distributions are not appropriate for it. In this paper, we study the entropy property between clustering results of categorical data with different numbers of clusters K, and propose the BKPlot method to address three important cluster validation problems: (1) How can we determine whether there is significant clustering structure in a categorical dataset? (2) If there is significant clustering structure, what is the set of candidate “best Ks”? (3) If the dataset is large, how can we efficiently and reliably determine the best Ks?
Keywords: Categorical data clustering; Entropy; Cluster validation
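The abstract does not spell out how entropy is computed for a clustering, so the following is only an illustrative sketch and not the BKPlot method itself. It assumes a commonly used entropy criterion for categorical data: the entropy of a cluster is the sum of its per-attribute Shannon entropies, and the expected entropy of a partition is the size-weighted average over clusters. The function names (`column_entropy`, `cluster_entropy`, `expected_entropy`) are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cluster_entropy(rows):
    """Sum of per-attribute entropies over the rows of one cluster."""
    if not rows:
        return 0.0
    # zip(*rows) transposes the rows into attribute columns.
    return sum(column_entropy(col) for col in zip(*rows))


def expected_entropy(rows, labels):
    """Size-weighted average of cluster entropies for a given partition."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    return sum(len(members) / n * cluster_entropy(members)
               for members in clusters.values())


# Toy dataset with two categorical attributes (illustrative values only).
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
one_cluster = expected_entropy(rows, [0, 0, 0, 0])
two_clusters = expected_entropy(rows, [0, 0, 1, 1])
```

Comparing such values across partitions with different K is the kind of quantity an entropy-based validation method would examine; how BKPlot turns these differences into candidate best Ks is described in the paper, not here.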
This article is indexed in SpringerLink and other databases.