Hierarchical clustering of mixed data based on distance hierarchy
Authors: Chung-Chian Hsu, Chin-Long Chen, Yu-Wei Su
Affiliation: Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin 640, Taiwan
Abstract: Data clustering is an important data mining technique that partitions data according to a similarity criterion. Many algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike numerical attributes, whose distance can be measured by subtraction, categorical values have no standard distance measure. In this article, we propose a distance representation scheme, the distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies distance measurement for numerical and categorical values. We then apply the scheme to mixed-data clustering, in particular by integrating it with a hierarchical clustering algorithm. The resulting approach handles numerical and categorical data uniformly and takes the similarity between categorical values into account. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
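The abstract does not give the exact definitions, so the following is only a minimal Python sketch under stated assumptions. It assumes that each categorical attribute has a tree-shaped hierarchy with weighted links, that the distance between two categorical values is the weighted path length between their nodes, and that a numerical attribute behaves like a two-node hierarchy, so its distance reduces to |x - y|. It further assumes that per-attribute distances are scaled to [0, 1] and combined Euclidean-style, and that the clustering step is plain average-linkage agglomerative clustering. The names `DistanceHierarchy`, `mixed_distance` and `agglomerative`, and the drink example, are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations


class DistanceHierarchy:
    """Hypothetical distance hierarchy: a tree of categorical values with weighted links."""

    def __init__(self, root="ANY"):
        self.parent = {}          # node -> parent node (the root has no entry)
        self.depth = {root: 0.0}  # node -> weighted distance from the root

    def add(self, node, parent, weight=1.0):
        # Attach a node below an existing parent; its depth accumulates link weights.
        self.parent[node] = parent
        self.depth[node] = self.depth[parent] + weight

    def ancestors(self, node):
        # Path from the node up to the root, node first.
        path = [node]
        while node in self.parent:
            node = self.parent[node]
            path.append(node)
        return path

    def distance(self, a, b):
        # Weighted path length between a and b via their lowest common ancestor.
        seen = set(self.ancestors(a))
        lca = next(n for n in self.ancestors(b) if n in seen)
        return self.depth[a] + self.depth[b] - 2 * self.depth[lca]

    def max_distance(self):
        # Assumed normalizer: an upper bound on any path length through the root.
        return 2 * max(self.depth.values())


def mixed_distance(x, y, hierarchies, ranges):
    # Assumption: per-attribute distances are scaled to [0, 1] and combined Euclidean-style.
    total = 0.0
    for i, (u, v) in enumerate(zip(x, y)):
        if i in hierarchies:  # categorical attribute: distance in its hierarchy
            d = hierarchies[i].distance(u, v) / hierarchies[i].max_distance()
        else:                 # numerical attribute: degenerate hierarchy, i.e. |u - v|
            d = abs(u - v) / ranges[i]
        total += d * d
    return total ** 0.5


def agglomerative(points, k, dist):
    # Plain average-linkage agglomerative clustering down to k clusters (O(n^3), small data only).
    clusters = [[i] for i in range(len(points))]
    pair = {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def linkage(a, b):
        return sum(pair[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves index a valid
        del clusters[b]
    return clusters


# Illustrative data: one categorical attribute (drink) and one numerical attribute (age).
drinks = DistanceHierarchy()
drinks.add("Soft", "ANY")
drinks.add("Hot", "ANY")
for leaf, group in [("Coke", "Soft"), ("Pepsi", "Soft"), ("Coffee", "Hot"), ("Tea", "Hot")]:
    drinks.add(leaf, group)

data = [("Coke", 20.0), ("Pepsi", 22.0), ("Coffee", 60.0), ("Tea", 58.0)]
clusters = agglomerative(data, 2, lambda x, y: mixed_distance(x, y, {0: drinks}, {1: 40.0}))
print(drinks.distance("Coke", "Pepsi"), drinks.distance("Coke", "Coffee"))  # 2.0 4.0
print(clusters)  # [[0, 1], [2, 3]]
```

Under a flat 0/1 overlap measure, Coke would be exactly as far from Pepsi as from Coffee; the hierarchy is what lets the two soft drinks count as closer to each other than to a hot drink.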
Keywords: Categorical data; Distance hierarchy; Hierarchical clustering; k-Means; Mixed data
This article is indexed in ScienceDirect and other databases.