首页 | 本学科首页   官方微博 | 高级检索  
     


A feature cluster taxonomy based feature selection technique
Affiliation:1. Institute of Engineering and Management, Kolkata, India;2. University of Calcutta, Kolkata, India;3. Iwate Prefectural University, Japan;1. Department of Civil, Environmental, Aerospace, and Material Engineering, Polytechnic School, University of Palermo, Italy, Viale delle Scienze, Ed 8, 90128 Palermo, ITALY;2. Department of Energy, Information Engineering and Mathematical Models, Polytechnic School, University of Palermo, Italy, Viale delle Scienze, Ed 8, 90128 Palermo, ITALY
Abstract:Feature subset selection is basically an optimization problem for choosing the most important features from various alternatives in order to facilitate classification or mining problems. Though lots of algorithms have been developed so far, none is considered to be the best for all situations and researchers are still trying to come up with better solutions. In this work, a flexible and user-guided feature subset selection algorithm, named as FCTFS (Feature Cluster Taxonomy based Feature Selection) has been proposed for selecting suitable feature subset from a large feature set. The proposed algorithm falls under the genre of clustering based feature selection techniques in which features are initially clustered according to their intrinsic characteristics following the filter approach. In the second step the most suitable feature is selected from each cluster to form the final subset following a wrapper approach. The two stage hybrid process lowers the computational cost of subset selection, especially for large feature data sets. One of the main novelty of the proposed approach lies in the process of determining optimal number of feature clusters. Unlike currently available methods, which mostly employ a trial and error approach, the proposed method characterises and quantifies the feature clusters according to the quality of the features inside the clusters and defines a taxonomy of the feature clusters. The selection of individual features from a feature cluster can be done judiciously considering both the relevancy and redundancy according to user’s intention and requirement. The algorithm has been verified by simulation experiments with different bench mark data set containing features ranging from 10 to more than 800 and compared with other currently used feature selection algorithms. The simulation results prove the superiority of our proposal in terms of model performance, flexibility of use in practical problems and extendibility to large feature sets. Though the current proposal is verified in the domain of unsupervised classification, it can be easily used in case of supervised classification.
Keywords:
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号