Partition-and-merge based fuzzy genetic clustering algorithm for categorical data |
| |
Affiliation: | 1. School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, 610039, China; 2. School of Electrical and Electronic Engineering, University of Adelaide, Adelaide, SA 5005, Australia; 3. School of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan, 610039, China; 4. Research Group of Natural Computing, Department of Computer Science and Artificial Intelligence, University of Seville, Sevilla, 41012, Spain |
| |
Abstract: | Categorical data clustering is challenging because categorical attributes have no natural order. This study proposes a two-stage method, the partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA), for categorical data. In the first stage, PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters. In the merge stage, two of the clusters generated in the first stage are selected based on their inter-cluster distances and merged into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, particular instances in each cluster are considered for re-assignment to other clusters based on the intra-cluster distances. PM-FGCA is evaluated on ten categorical datasets from the UCI Machine Learning Repository. To assess clustering performance, it is compared with existing methods such as the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and the non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Three clustering validation indices are used: the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) as external indices, and the Davies–Bouldin (DB) index as an internal index. The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices. |
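The merge stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how inter-cluster distance is computed, so this sketch assumes the Hamming (simple matching) distance between per-attribute cluster modes. The names `cluster_mode`, `hamming`, and `merge_until` are hypothetical, and the fuzzy genetic partition stage and the final re-assignment step are not covered.

```python
from collections import Counter
from itertools import combinations


def cluster_mode(cluster):
    """Return the most frequent category of each attribute in a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def hamming(a, b):
    """Count the attributes on which two categorical vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_until(clusters, k):
    """Repeatedly merge the two closest clusters until only k remain.

    Assumption: closeness is the Hamming distance between cluster modes;
    the paper may define inter-cluster distance differently.
    """
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > k:
        modes = [cluster_mode(c) for c in clusters]
        # Pick the pair (i < j) whose modes are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: hamming(modes[pair[0]], modes[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because j > i, so index i is unaffected
    return clusters


# Three over-partitioned clusters reduced to a predetermined k = 2.
initial = [
    [("red", "s"), ("red", "s")],
    [("red", "m"), ("red", "m")],
    [("blue", "l"), ("blue", "l")],
]
merged = merge_until(initial, k=2)
```

In this toy input the two "red" clusters differ in one attribute while each differs from the "blue" cluster in two, so they are the pair that gets merged.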
| |
Keywords: | Categorical data; Fuzzy clustering; Genetic algorithm; Partition-and-merge |
This article is indexed in ScienceDirect and other databases. |
|