Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering
Affiliation: 1. Laboratório de Inteligência Computacional do Araripe, Instituto Federal do Sertão Pernambucano, Brazil; 2. Centro de Informática, Universidade Federal de Pernambuco, Brazil; 1. Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong; 2. Department of Mathematics, Chinese University of Hong Kong, Shatin, Hong Kong; 1. Guangdong Provincial Key Lab. of Computer Integrated Manufacturing Systems, School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China; 2. Knowledge Management and Innovation Research Centre, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong 999077, China; 1. Department of Biomedical Engineering, National Institute of Technology, Raipur, Chhattisgarh, India; 2. Department of Computer Applications, National Institute of Technology, Raipur, Chhattisgarh, India; 3. Department of Electrical Engineering, National Institute of Technology, Raipur, Chhattisgarh, India
Abstract: The success rates of expert and intelligent systems depend on selecting the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems, but it depends heavily both on its initial solution and on the distance function it uses. A number of algorithms have been proposed to address the centroid initialization problem, but the solutions they produce are still not optimal clusters. This paper proposes three algorithms: (i) C-LCA, a search algorithm that improves on the League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm, the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which consists of two computation stages. Instead of constant values, C-LCA uses chaotic adaptation for the retreat and approach parameters, which can enhance its search capability. Furthermore, because the Euclidean distance used by the original k-means algorithm cannot properly handle categorical attributes, we adopt the Gower distance together with a mechanism that enforces the discrete-value requirement of categorical attributes. The proposed algorithms handle not only purely numeric data but also mixed-type data, and they can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. SC-LCA and KSC-LCA were compared against 16 established algorithms: k-means, k-means++, global k-means, four search clustering algorithms, and nine hybrids of k-means with several state-of-the-art evolutionary algorithms. The experimental results show that SC-LCA produces the clusters with the highest F-Measure on the purely categorical dataset, and KSC-LCA produces the clusters with the highest F-Measure on the purely numeric and mixed-type datasets tested.
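The Gower distance mentioned in the abstract, and one way of keeping categorical centroid values discrete, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes equal attribute weights, no missing values, numeric differences scaled by each attribute's range, and 0/1 matching for categorical attributes. The centroid update (mean for numeric attributes, mode for categorical ones, in the style of k-prototypes) is an assumed stand-in for the paper's discrete-value mechanism, which the abstract does not describe. The initial centroids are simply passed in here rather than found by a C-LCA search, and all function names are hypothetical.

```python
from collections import Counter


def attribute_ranges(data, is_numeric):
    """Range R_j of every numeric attribute over the dataset; None for categorical ones."""
    ranges = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[j]) for row in data]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(x, y, is_numeric, ranges):
    """Equal-weight Gower distance: mean of per-attribute dissimilarities in [0, 1]."""
    total = 0.0
    for j, numeric in enumerate(is_numeric):
        if numeric:
            r = ranges[j]
            # Range-normalised absolute difference; a constant attribute contributes 0.
            total += abs(float(x[j]) - float(y[j])) / r if r else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x[j] == y[j] else 1.0
    return total / len(is_numeric)


def assign_clusters(data, centroids, is_numeric, ranges):
    """Index of the nearest centroid (under Gower distance) for every record."""
    labels = []
    for row in data:
        distances = [gower_distance(row, c, is_numeric, ranges) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(data, labels, old_centroids, is_numeric):
    """Assumption: mean for numeric attributes, mode for categorical ones."""
    new_centroids = []
    for c, old in enumerate(old_centroids):
        members = [row for row, label in zip(data, labels) if label == c]
        if not members:
            new_centroids.append(list(old))  # keep an empty cluster's centroid unchanged
            continue
        centroid = []
        for j, numeric in enumerate(is_numeric):
            column = [m[j] for m in members]
            if numeric:
                centroid.append(sum(float(v) for v in column) / len(column))
            else:
                # The mode is always an existing category, so the value stays discrete.
                centroid.append(Counter(column).most_common(1)[0][0])
        new_centroids.append(centroid)
    return new_centroids


def mixed_kmeans(data, initial_centroids, is_numeric, max_iter=20):
    """Plain k-means loop using Gower distance; stops when assignments no longer change."""
    ranges = attribute_ranges(data, is_numeric)
    centroids = [list(c) for c in initial_centroids]
    labels = assign_clusters(data, centroids, is_numeric, ranges)
    for _ in range(max_iter):
        centroids = update_centroids(data, labels, centroids, is_numeric)
        new_labels = assign_clusters(data, centroids, is_numeric, ranges)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    data = [[1.0, "red"], [1.5, "red"], [9.0, "blue"], [8.5, "blue"]]
    labels, centroids = mixed_kmeans(data, [data[0], data[2]], [True, False])
    print(labels, centroids)  # [0, 0, 1, 1] [[1.25, 'red'], [8.75, 'blue']]
```

Because every attribute contributes a dissimilarity in [0, 1], a categorical mismatch weighs as much as a numeric difference spanning the full range, which is what lets numeric and categorical attributes be mixed in a single distance.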
On 13 of the 14 datasets, the centroids produced by SC-LCA achieved a better F-Measure than those produced by the k-means algorithm. On the Tic-Tac-Toe dataset, which contains only categorical attributes, SC-LCA achieves an F-Measure of 66.61, which is 21.74 points higher than that of the k-means algorithm (44.87). KSC-LCA produced better centroids than the k-means algorithm on all 14 datasets, with a maximum F-Measure improvement of 11.59 points. In terms of computational cost, however, SC-LCA and KSC-LCA required a larger number of function evaluations (NFEs) than k-means and its variants; among the hybrid clustering and search clustering algorithms tested, KSC-LCA ranks first and SC-LCA ranks fourth. Therefore, SC-LCA and KSC-LCA are general and effective clustering algorithms that could be used when an expert or intelligent system requires accurate, high-speed cluster selection.
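The abstract reports F-Measure scores on a 0 to 100 scale but does not define which variant is used. The sketch below assumes one common clustering F-Measure: each true class is matched to the cluster that maximizes the harmonic mean of precision and recall, the per-class scores are weighted by class size, and the result is multiplied by 100. It is an illustration under that assumption, not the paper's evaluation code, and the function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted best-match F-Measure, scaled to 0-100 (assumed variant)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))  # n_ij: records of class i in cluster j
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each class's best F score by the fraction of records it contains.
        total += (n_i / n) * best
    return 100.0 * total


if __name__ == "__main__":
    print(clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 100.0
    print(round(clustering_f_measure(["a", "a", "a", "b"], [0, 0, 1, 1]), 2))  # 76.67
```

On this 0 to 100 scale, a difference such as the Tic-Tac-Toe gap is a plain subtraction: 66.61 − 44.87 = 21.74 points.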
This article is indexed in ScienceDirect and other databases.