Constraint Score: A new filter method for feature selection with pairwise constraints |
| |
Authors: | Daoqiang Zhang, Songcan Chen, Zhi-Hua Zhou |
| |
Affiliation: | Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China (Daoqiang Zhang, Songcan Chen); National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China (Zhi-Hua Zhou) |
| |
Abstract: | Feature selection is an important preprocessing step in mining high-dimensional data. Supervised feature selection methods, which exploit supervision information, generally outperform unsupervised methods, which have none. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form of supervision information for feature selection, namely pairwise constraints, which specify whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). Pairwise constraints arise naturally in many tasks and are more practical and less expensive to obtain than class labels. This topic has not yet been addressed in feature selection research. We call our pairwise-constraint-guided feature selection algorithm Constraint Score and compare it with the well-known Fisher Score and Laplacian Score algorithms. Experiments are carried out on several high-dimensional UCI and face data sets. The results show that, with very few pairwise constraints, Constraint Score performs comparably to or even better than Fisher Score using full class labels on the whole training set, and significantly outperforms Laplacian Score. |
| |
Keywords: | |
This article has been indexed in ScienceDirect and other databases.
|
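The abstract does not spell out how a feature is scored from pairwise constraints. Below is a minimal Python sketch that assumes one natural reading: a good feature keeps must-link pairs close and pushes cannot-link pairs apart. Under that assumption, each feature is scored by the ratio of its squared differences over must-link pairs to those over cannot-link pairs, and lower is better. For contrast, the sketch also includes the standard Fisher Score (between-class over within-class variance, higher is better), which needs a class label for every sample. The function names `constraint_score` and `fisher_score` and the toy data are hypothetical illustrations, and the paper's exact formulation may differ.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, j): row indices of two samples in X


def constraint_score(X: Sequence[Sequence[float]],
                     must_link: List[Pair],
                     cannot_link: List[Pair]) -> List[float]:
    """Assumed ratio form (hypothetical sketch): per-feature squared spread
    over must-link pairs divided by squared spread over cannot-link pairs.
    Lower is better."""
    scores = []
    for r in range(len(X[0])):
        within = sum((X[i][r] - X[j][r]) ** 2 for i, j in must_link)
        between = sum((X[i][r] - X[j][r]) ** 2 for i, j in cannot_link)
        # A feature that never separates any cannot-link pair is ranked last.
        scores.append(within / between if between > 0 else float("inf"))
    return scores


def fisher_score(X: Sequence[Sequence[float]],
                 labels: Sequence[int]) -> List[float]:
    """Standard Fisher Score: between-class over within-class variance of
    each feature. Needs a label for every sample. Higher is better."""
    scores = []
    for r in range(len(X[0])):
        col = [row[r] for row in X]
        mu = sum(col) / len(col)
        num = den = 0.0
        for c in set(labels):
            vals = [v for v, y in zip(col, labels) if y == c]
            mu_c = sum(vals) / len(vals)
            num += len(vals) * (mu_c - mu) ** 2
            den += sum((v - mu_c) ** 2 for v in vals)  # equals n_c * var_c
        if den > 0:
            scores.append(num / den)
        else:
            scores.append(float("inf") if num > 0 else 0.0)
    return scores


# Toy data: feature 0 separates the two groups, feature 1 is noise.
X = [[1.0, 5.0], [1.1, 1.0], [3.0, 4.9], [3.1, 1.2]]
must_link = [(0, 1), (2, 3)]    # pairs known to share a class
cannot_link = [(0, 2), (1, 3)]  # pairs known to differ in class

cs = constraint_score(X, must_link, cannot_link)
fs = fisher_score(X, labels=[0, 0, 1, 1])
print(sorted(range(2), key=lambda r: cs[r]))   # lowest first -> [0, 1]
print(sorted(range(2), key=lambda r: -fs[r]))  # highest first -> [0, 1]
```

Both scores rank feature 0 first here, but they consume different supervision: `constraint_score` sees only four sample pairs, while `fisher_score` needs a class label for every sample. Note also that the two scores are sorted in opposite directions.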