Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Feature selection is a dimensionality reduction technique that helps to improve data visualization, simplify learning, and enhance the efficiency of learning algorithms. The existing redundancy-based approach, which relies on relevance and redundancy criteria, does not account for feature complementarity. Complementarity implies information synergy, in which additional class information becomes available due to feature interaction. We propose a novel filter-based approach to feature selection that explicitly characterizes and uses feature complementarity in the search process. Using theories from multi-objective optimization, the proposed heuristic penalizes redundancy and rewards complementarity, thus improving over the redundancy-based approach that penalizes all feature dependencies. The heuristic uses an adaptive cost function based on the redundancy–complementarity ratio to automatically update the trade-off rule between relevance, redundancy, and complementarity. We show on benchmark datasets that this adaptive approach outperforms many existing feature selection methods.

2.
Feature subset selection is a substantial problem in data classification. Its purpose is to find an efficient subset of the original features that increases both efficiency and accuracy and reduces the cost of classification. High-dimensional datasets with a very large number of predictive attributes but few instances require techniques for selecting an optimal feature subset. In this paper, a hybrid method is proposed for efficient subset selection in high-dimensional datasets. The proposed algorithm runs filter and wrapper algorithms in two phases. In the filter phase, the symmetrical uncertainty (SU) criterion is used to weight features by how well they discriminate the classes. In the wrapper phase, both FICA (fuzzy imperialist competitive algorithm) and IWSSr (Incremental Wrapper Subset Selection with replacement) are executed in the weighted feature space to find relevant attributes. The new scheme is successfully applied to 10 standard high-dimensional datasets, especially from the biosciences and medicine, where the number of features is large compared to the number of samples, inducing a severe curse-of-dimensionality problem. Comparison with other algorithms confirms that our method achieves the highest accuracy and also finds an efficient, compact subset.
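The SU criterion named in the filter phase is commonly defined as SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), which lies between 0 and 1. The following is a minimal sketch of that weighting for discrete features, not the authors' implementation; the function names and the toy data are illustrative assumptions, and the FICA and IWSSr wrapper stages are not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); returns 0.0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Mutual information via I(X; Y) = H(X) + H(Y) - H(X, Y).
    mi = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mi / (hx + hy)


# Toy data (assumed for illustration): one feature mirrors the class, one does not.
labels = [0, 0, 1, 1]
mirrors_class = [5, 5, 7, 7]
unrelated = [0, 1, 0, 1]

su_informative = symmetrical_uncertainty(mirrors_class, labels)
su_unrelated = symmetrical_uncertainty(unrelated, labels)
print(su_informative, su_unrelated)
```

A feature that determines the class gets weight 1 and a feature independent of the class gets weight 0, so the resulting scores can serve directly as filter-phase weights.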

3.
Feature selection is an active research area in machine learning. The main idea of feature selection is to choose a subset of the available features by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated. There are many approaches to feature selection, but most of them work only with crisp data. So far, few approaches can work directly with both crisp and low quality (imprecise and uncertain) data. We therefore propose a new feature selection method that can handle both crisp and low quality data. The proposed approach is based on a Fuzzy Random Forest and integrates filter and wrapper methods into a sequential search procedure that improves the classification accuracy of the selected features. The approach consists of the following main steps: (1) scaling and discretization of the feature set, and feature pre-selection using the discretization process (filter); (2) ranking of the pre-selected features using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of this approach are demonstrated through several experiments using both high dimensional and low quality datasets. The approach shows good performance (in classification accuracy as well as in the number of features selected) and good behavior both with high dimensional datasets (microarray datasets) and with low quality datasets.

4.
“Dimensionality” is one of the major problems affecting the quality of the learning process in most machine learning and data mining tasks. Training a classification model on a high dimensional dataset may lead to “overfitting” of the learned model to the training data. Overfitting reduces the generalization of the model and therefore causes poor classification accuracy on new test instances. Another disadvantage of high dimensionality is the high CPU time required for learning and testing the model. Applying feature selection to the dataset before the learning process is essential to improve the performance of the classification task. In this study, a new hybrid method that combines the artificial bee colony optimization technique with the differential evolution algorithm is proposed for feature selection in classification tasks. The developed hybrid method is evaluated on fifteen datasets from the UCI Repository that are commonly used in classification problems. For a complete evaluation, the proposed hybrid feature selection method is compared with feature selection methods based on artificial bee colony optimization and on differential evolution, as well as with the three most popular feature selection techniques: information gain, chi-square, and correlation feature selection. In addition, the performance of the proposed method is compared with studies in the literature that use the same datasets. The experimental results show that the developed hybrid method is able to select good features for classification tasks, improving the run-time performance and accuracy of the classifier. The proposed hybrid method may also be applied to other search and optimization problems, as its performance for feature selection is better than that of pure artificial bee colony optimization and differential evolution.

5.

Although automated depression diagnosis has made great progress, most recent works have focused on combining multiple modalities rather than strengthening a single one. In this research work, we present a unimodal framework for depression detection based on facial expressions and facial motion analysis. We investigate a wide set of visual features extracted from different facial regions. Due to the high dimensionality of the obtained feature sets, identifying informative and discriminative features is a challenge. This paper suggests a hybrid dimensionality reduction approach that leverages the advantages of the filter and wrapper methods. First, we use a univariate filter method, the Fisher Discriminant Ratio, to initially reduce the size of each feature set. Subsequently, we propose an Incremental Linear Discriminant Analysis (ILDA) approach to find an optimal combination of complementary and relevant feature sets. We compare the performance of the proposed ILDA with the batch-mode LDA and with the Composite Kernel based Support Vector Machine (CKSVM) method. The experiments conducted on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset demonstrate that the best depression classification performance is obtained by using different feature extraction methods in combination rather than individually. ILDA generates better depression classification results than the CKSVM. Moreover, ILDA based wrapper feature selection incurs lower computational cost than the CKSVM and the batch-mode LDA methods. The proposed framework significantly improves the depression classification performance, with an F1 Score of 0.805, which is better than all the video based depression detection models suggested in the literature for the DAIC-WOZ dataset. Salient facial regions and well performing visual feature extraction methods are also identified.
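For a two-class problem, the Fisher Discriminant Ratio of a single feature is commonly computed as the squared difference of the class means divided by the sum of the class variances. The sketch below illustrates that univariate filter step only; it is not the authors' code, the helper names and toy columns are assumptions, and the ILDA wrapper stage is not shown.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher ratio: (mean1 - mean0)^2 / (var1 + var0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # Both classes are constant: perfectly separable or identical.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / denom


def top_k_by_fisher(columns, labels, k):
    """Keep the k feature names with the largest Fisher ratio."""
    scores = {name: fisher_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumed): one feature separates the classes, one overlaps heavily.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "separating": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    "overlapping": [1.0, 3.0, 2.0, 1.1, 2.9, 2.1],
}
selected = top_k_by_fisher(columns, labels, k=1)
print(selected)
```

Because each feature is scored independently, this step is cheap and suits an initial reduction before a costlier wrapper search.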


6.
Besides optimizing classifier predictive performance and addressing the curse of dimensionality problem, feature selection techniques help keep a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on the Bat Algorithm (BA) and Optimum-Path Forest (OPF), in which we model feature selection as a binary optimization problem, guided by BA using the OPF accuracy over a validation set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted over six public datasets demonstrated that the proposed approach provides statistically significantly more compact sets and, in some cases, can indeed improve the classification effectiveness.

7.
This paper studies a new feature selection method for data classification that efficiently combines the discriminative capability of features with the ridge regression model. It first sets up the global structure of the training data with linear discriminant analysis, which assists in identifying the discriminative features. Then, the ridge regression model is employed to assess the feature representation and the discrimination information, so as to obtain the representative coefficient matrix. The importance of features can be calculated from this representative coefficient matrix. Finally, the new subset of selected features is applied to a linear Support Vector Machine for data classification. To validate the efficiency, sets of experiments are conducted with twenty benchmark datasets. The experimental results show that the proposed approach performs much better than the state-of-the-art feature selection algorithms in terms of classification evaluation metrics. The proposed feature selection algorithm is also competitive with existing feature selection algorithms with regard to computational cost.

8.
Feature selection is used to choose a subset of relevant features for effective classification of data. In high dimensional data classification, the performance of a classifier often depends on the feature subset used for classification. In this paper, we introduce a greedy feature selection method using mutual information. This method combines both feature–feature mutual information and feature–class mutual information to find an optimal subset of features that minimizes redundancy and maximizes relevance among features. The effectiveness of the selected feature subset is evaluated using multiple classifiers on multiple datasets. On twelve real-life datasets of varied dimensionality and number of instances, our method has been found to perform significantly better, in terms of both classification accuracy and execution time, than several competing feature selection techniques.
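A greedy search that trades feature–class mutual information (relevance) against feature–feature mutual information (redundancy) can be sketched as follows. The abstract does not state the exact scoring rule, so this sketch assumes an mRMR-style difference criterion (relevance minus mean redundancy with the features already chosen); the function names and toy data are also assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """I(X; Y) in bits for two equally long sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def greedy_select(features, labels, k):
    """Greedily add the feature with the best relevance-minus-redundancy score."""
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        return relevance[f] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (assumed): the class depends on "a" and "b"; "a_copy" duplicates "a".
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

chosen = greedy_select(features, labels, k=2)
print(chosen)
```

In this toy case all three features are equally relevant on their own, but the duplicate is skipped in the second round because its redundancy with the first pick cancels its relevance.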

9.
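Feature selection methods can pick a small number of suitable features out of thousands, making models more effective and efficient. Considering that features in real-world high-dimensional datasets are interrelated, and that a complex network structure describes the feature space globally and reasonably, this paper proposes an unsupervised feature selection method based on the degree centrality of nodes in a complex network. First, a threshold on the strength of the correlation between features determines which associations are retained. The retained associations are then used to build an undirected, unweighted network whose nodes are the features. Finally, a degree centrality measure is used to select the most influential set of nodes in this network, which is the optimal feature subset. The method adds flexibility in handling feature importance and feature redundancy. In comparative experiments, the method is compared with commonly used feature selection and feature extraction methods on several high-dimensional datasets. The experimental analysis shows that the method is effective and generally applicable.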
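The three steps described above (threshold the correlations, build an undirected unweighted feature network, rank nodes by degree centrality) can be sketched as follows. The abstract does not name the correlation measure, so absolute Pearson correlation is assumed here; the threshold value, function names, and toy data are likewise illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def degree_centrality_ranking(features, threshold):
    """Rank features by degree centrality in the thresholded correlation graph."""
    names = list(features)
    degree = {name: 0 for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            # Keep an edge only when the association is strong enough.
            if abs(pearson(features[u], features[v])) >= threshold:
                degree[u] += 1
                degree[v] += 1
    denom = max(len(names) - 1, 1)
    centrality = {name: d / denom for name, d in degree.items()}
    ranking = sorted(names, key=centrality.get, reverse=True)
    return ranking, centrality


# Toy data (assumed): "hub" is the sum of "left" and "right"; "isolated" is unrelated.
features = {
    "left": [1, -1, 1, -1],
    "right": [1, 1, -1, -1],
    "hub": [2, 0, 0, -2],
    "isolated": [1, -1, -1, 1],
}
ranking, centrality = degree_centrality_ranking(features, threshold=0.6)
print(ranking, centrality)
```

Raising or lowering the threshold changes how many associations survive, which is one way the approach can be tuned toward stricter or looser handling of redundancy.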

10.
11.
With the rapid development of information techniques, the dimensionality of data in many application domains, such as text categorization and bioinformatics, is getting higher and higher. High-dimensional data may cause many adverse effects, such as overfitting, poor performance, and low efficiency, for traditional learning algorithms in pattern classification. Feature selection aims at reducing the dimensionality of data and providing discriminative features for pattern learning algorithms. Due to its effectiveness, feature selection is now gaining increasing attention from a variety of disciplines, and many efforts have been made in this field. In this paper, we propose a new supervised feature selection method to pick important features by using information criteria. Unlike other selection methods, the main characteristic of our method is that it not only takes both maximal relevance to the class labels and minimal redundancy to the selected features into account, but also works like feature clustering in an agglomerative way. To measure the relevance and redundancy of features precisely, two different information criteria, i.e., mutual information and coefficient of relevance, have been adopted in our method. The performance evaluations on 12 benchmark data sets show that the proposed method can achieve better performance than other popular feature selection methods in most cases.

12.
13.
An efficient filter feature selection (FS) method is proposed in this paper, the SVM-FuzCoC approach, achieving a satisfactory trade-off between classification accuracy and dimensionality reduction. Additionally, the method has reasonably low computational requirements, even in high-dimensional feature spaces. To assess the quality of features, we introduce a local fuzzy evaluation measure with respect to patterns that embraces fuzzy membership degrees of every pattern in their classes. Accordingly, the above measure reveals the adequacy of data coverage provided by each feature. The required membership grades are determined via a novel fuzzy output kernel-based support vector machine, applied on single features. Based on a fuzzy complementary criterion (FuzCoC), the FS procedure iteratively selects features with maximum additional contribution in regard to the information content provided by previously selected features. This search strategy leads to small subsets of powerful and complementary features, alleviating the feature redundancy problem. We also devise different SVM-FuzCoC variants by employing seven other methods to derive fuzzy degrees from SVM outputs, based on probabilistic or fuzzy criteria. Our method is compared with a set of existing FS methods, in terms of performance capability, dimensionality reduction, and computational speed, via a comprehensive experimental setup, including synthetic and real-world datasets.

14.
High dimensional remote sensing data sets typically contain redundancy amongst the features. Traditional approaches to feature selection are prone to instability and selection of sub-optimal features in these circumstances. They can also be computationally expensive, especially when dealing with very large remote sensing data sets. This article presents an efficient, deterministic feature ranking method that is robust to redundancy. Affinity propagation is used to group correlated features into clusters. A relevance criterion is evaluated for each feature. Clusters are then ranked based on the median of the relevance values of their constituent features. The most relevant individual features can then be selected automatically from the best clusters. Other criteria, such as computation time or measurement cost, can optionally be considered interactively when making this selection. The proposed feature selection method is compared to competing filter approach methods on a number of remote sensing data sets containing feature redundancy. Mutual information and naive Bayes relevance criteria were evaluated in conjunction with the feature selection methods. Using the proposed method it was shown that the stability of selected features improved under different data samplings, while similar or better classification accuracies were achieved compared to competing methods.
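The ranking step described above can be sketched in a few lines once the clusters and per-feature relevance values exist. This sketch assumes the clustering (affinity propagation in the article) and the relevance scores have already been computed elsewhere; the cluster names, feature names, and relevance values are made up for illustration.

```python
from statistics import median


def rank_clusters(clusters, relevance):
    """Order cluster names by the median relevance of their member features."""
    return sorted(
        clusters,
        key=lambda name: median(relevance[f] for f in clusters[name]),
        reverse=True,
    )


def select_features(clusters, relevance, n_clusters):
    """Take the single most relevant feature from each of the best clusters."""
    ranked = rank_clusters(clusters, relevance)
    return [max(clusters[name], key=relevance.get) for name in ranked[:n_clusters]]


# Hypothetical clusters of correlated bands and hypothetical relevance values.
clusters = {
    "visible": ["b1", "b2", "b3"],
    "near_infrared": ["b4", "b5"],
    "texture": ["t1"],
}
relevance = {"b1": 0.2, "b2": 0.3, "b3": 0.25, "b4": 0.8, "b5": 0.6, "t1": 0.5}

ranked = rank_clusters(clusters, relevance)
picked = select_features(clusters, relevance, n_clusters=2)
print(ranked, picked)
```

Using the median rather than the maximum makes a cluster's rank less sensitive to a single unusually high relevance value, and picking one feature per cluster avoids selecting several redundant features from the same group.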

15.
This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features.

16.
Quality data mining analysis based on microarray gene expression data is a good approach for disease classification and other fields, such as pharmacology, as well as a useful tool for medical innovation. One of the challenges in classification is that microarrays involve high dimensionality and a large number of redundant and irrelevant features. Feature selection is the most popular method for determining the optimal number of features that will be used for classification. Feature selection is important for accelerating learning, since the data are then represented only by the optimal feature subset. The current approach to microarray feature selection with the filter method is to simply select the top-ranked genes, e.g., keeping the 50 or 100 best-ranked genes. However, this approach is determined by human intuition; it requires trial and error and is thus time-consuming. Accordingly, this study proposes a metaheuristic approach for selecting the top n relevant genes in drug microarray data to enhance the minimum redundancy–maximum relevance (mRMR) filter method. Three metaheuristics are applied, namely, particle swarm optimization (PSO), cuckoo search (CS), and artificial bee colony (ABC). Subsequently, k-nearest neighbor and support vector machine are used as classifiers to evaluate classification performance. The experiment used a microarray gene dataset of liver xenobiotic and pharmacological responses. Experimental results show that the metaheuristics are more efficient approaches that reduce the complexity of the classifier. Furthermore, the results show that mRMR-CS exhibits the best performance compared with mRMR-PSO and mRMR-ABC.

17.
Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for decreasing the dimensionality of the data and increasing the efficiency of learning algorithms. Specifically, feature selection realized in the absence of class labels, namely unsupervised feature selection, is challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, building on the technique of matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique for dealing with high-dimensional data. Third, an effective method for feature selection with numeric data is put forward, instead of relying on a discretization process. Fourth, this new criterion provides a sound foundation for embedding kernel tricks into feature selection. In this regard, an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods using six publicly available datasets. Experimental results demonstrate that, in terms of clustering results, the two proposed algorithms perform better than the others on almost all of the datasets used in the experiments.

18.
In this paper, a novel hybrid method, which integrates an effective filter, maximum relevance minimum redundancy (MRMR), and a fast classifier, the extreme learning machine (ELM), is introduced for diagnosing erythemato-squamous (ES) diseases. In the proposed method, MRMR is employed as a feature selection tool for dimensionality reduction in order to further improve the diagnostic accuracy of the ELM classifier. The impact of the type of activation function, the number of hidden neurons and the size of the feature subsets on the performance of ELM has been investigated in detail. The effectiveness of the proposed method has been rigorously evaluated in terms of classification accuracy on the ES disease dataset, a benchmark dataset from the UCI machine learning database. Experimental results have demonstrated that our method has achieved the best classification accuracy of 98.89% and an average accuracy of 98.55% via 10-fold cross-validation. The proposed method might serve as a new candidate among powerful methods for diagnosing ES diseases.

19.
Hyperspectral images usually consist of hundreds of spectral bands, which can be used to precisely characterize different land cover types. However, the high dimensionality also has some disadvantages, such as the Hughes effect and a high storage demand. Band selection is an effective method to address these issues. However, most band selection algorithms operate on the high-dimensional band images, which brings high computational complexity and may degrade the selection performance. In this paper, spatial feature extraction is used to reduce the dimensionality of band images and improve the band selection performance. The experimental results obtained on three real hyperspectral datasets confirmed that the spatial feature extraction-based approach exhibits more robust classification accuracy than other methods. In addition, the proposed method can dramatically reduce the dimensionality of each band image, which makes it possible for band selection to be implemented in real-time situations.

20.
One of the fundamental motivations for feature selection is to overcome the curse of dimensionality problem. This paper presents a novel feature selection method utilizing a combination of the differential evolution (DE) optimization method and a proposed repair mechanism based on feature distribution measures. The new method, abbreviated as DEFS, utilizes the DE float number optimizer in the combinatorial optimization problem of feature selection. In order to make the solutions generated by the float optimizer suitable for feature selection, a roulette wheel structure is constructed and supplied with the probabilities of the feature distribution. These probabilities are constructed during iterations by identifying the features that contribute to the most promising solutions. The proposed DEFS is used to search for optimal subsets of features in datasets with varying dimensionality. It is then utilized to aid in the selection of the Wavelet Packet Transform (WPT) best basis for classification problems, thus acting as a part of a feature extraction process. Practical results indicate the significance of the proposed method in comparison with other feature selection methods.
