A subspace decision cluster classifier for text classification |
| |
Authors: | Yan Li Edward Hung Korris Chung |
| |
Affiliation: | 1. Liuxian Road, West Campus, Shenzhen Polytechnic, Shenzhen 518055, China; 2. PQ Building, MailBox 56, The Hong Kong Polytechnic University, Hung Hom, KLN, Hong Kong |
| |
Abstract: | This paper proposes a new classification method, subspace decision cluster classification (SDCC), for high-dimensional text data with multiple classes. An SDCC model consists of a set of disjoint subspace decision clusters, each labeled with its dominant class; a new object is assigned the class of the cluster it falls in. A cluster tree is first generated from a training data set by recursively calling a subspace clustering algorithm, Entropy Weighting k-Means. The SDCC model is then extracted from this subspace decision cluster tree. Various tests, including the Anderson–Darling test, are used to determine when the tree stops growing. In a series of experiments on real text data sets, SDCC outperforms existing methods such as decision trees and SVM. SDCC is particularly suitable for large, high-dimensional, sparse text data with many classes. |
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
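The procedure outlined in the abstract (recursively cluster the training data into a tree, label each leaf cluster with its dominant class, then classify new objects by the cluster they fall in) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: plain k-means stands in for Entropy Weighting k-Means (so no per-cluster feature weights are learned), a simple purity threshold stands in for the Anderson–Darling and other stopping tests, and "falling in a cluster" is approximated by nearest leaf centroid. All function and parameter names (`kmeans`, `build_leaves`, `classify`, `purity`, `min_size`, `max_depth`) are hypothetical.

```python
import random
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (stand-in for Entropy Weighting k-Means).
    Returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return assign

def build_leaves(points, labels, k=2, min_size=2, purity=0.9,
                 depth=0, max_depth=10):
    """Grow the cluster tree recursively and return its leaves as
    (centroid, dominant_class) pairs. The purity / size / depth checks
    are an assumed stand-in for the paper's stopping tests."""
    dominant, top = Counter(labels).most_common(1)[0]
    leaf = [(mean(points), dominant)]
    if (len(points) < max(min_size, k) or depth >= max_depth
            or top / len(points) >= purity):
        return leaf
    assign = kmeans(points, k, seed=depth)
    groups = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    groups = [g for g in groups if g]
    if len(groups) < 2:  # clustering failed to split this node
        return leaf
    leaves = []
    for g in groups:
        leaves += build_leaves([points[i] for i in g], [labels[i] for i in g],
                               k, min_size, purity, depth + 1, max_depth)
    return leaves

def classify(leaves, x):
    """Assign x the dominant class of the nearest leaf centroid (assumption)."""
    return min(leaves, key=lambda leaf: sq_dist(x, leaf[0]))[1]

# Toy data: three well-separated groups standing in for document vectors.
points = ([[0, 0], [0, 1], [1, 0], [1, 1]]
          + [[100, 100], [100, 101], [101, 100], [101, 101]]
          + [[0, 100], [0, 101], [1, 100], [1, 101]])
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
leaves = build_leaves(points, labels)
print(classify(leaves, [0.5, 0.5]))
```

Returning only the leaves mirrors the idea of extracting a flat model from the tree: once the tree is grown, classification needs just the set of disjoint labeled clusters, not the internal nodes.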
|