Similar Literature
20 similar documents found
1.
To obtain an optimal feature subset for multi-dimensional datasets, a wrapper feature selection algorithm based on feature clustering is proposed. In the initial stage, three-way decision theory is used to dynamically partition the original feature set into several feature subspaces, and a feature clustering algorithm groups the features within each subspace. A representative feature is then chosen from each feature cluster, the remaining features are ranked in descending order by neighborhood mutual information and considered iteratively, and a wrapper evaluates whether each candidate feature should be selected, yielding an optimal feature subset with the lowest classification error rate. Experimental results on UCI datasets show that, compared with other feature selection algorithms, the proposed algorithm effectively improves classification accuracy on the libSVM, J48, Naive Bayes, and KNN classifiers.
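Below is a minimal sketch of the cluster-then-wrap workflow described in this abstract, assuming scikit-learn and SciPy. Correlation-based hierarchical clustering and mutual_info_classif are used here as stand-ins for the paper's three-way-decision partitioning and neighborhood mutual information, so the snippet illustrates only the general idea, not the authors' exact method.

    # Sketch: cluster features, pick one representative per cluster, then let a
    # wrapper (cross-validated KNN accuracy) decide which remaining features to add.
    # Hypothetical simplification: correlation clustering replaces the paper's
    # three-way-decision subspaces; mutual_info_classif replaces neighborhood MI.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # 1) Cluster features by correlation distance.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    condensed = dist[np.triu_indices_from(dist, k=1)]
    clusters = fcluster(linkage(condensed, method="average"), t=10, criterion="maxclust")

    # 2) Representative per cluster = feature with highest mutual information with y.
    mi = mutual_info_classif(X, y, random_state=0)
    selected = [max(np.where(clusters == c)[0], key=lambda f: mi[f])
                for c in np.unique(clusters)]

    # 3) Wrapper step: greedily add remaining features (descending MI order)
    #    only if cross-validated accuracy improves.
    def cv_acc(feats):
        return cross_val_score(KNeighborsClassifier(5), X[:, feats], y, cv=5).mean()

    best = cv_acc(selected)
    for f in sorted(set(range(X.shape[1])) - set(selected), key=lambda f: -mi[f]):
        acc = cv_acc(selected + [f])
        if acc > best:
            selected, best = selected + [f], acc

    print(sorted(selected), round(best, 4))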

2.
An ensemble of classifiers is a learning paradigm in which many classifiers are jointly used to solve a problem. Research has shown that ensembles are very effective for classification tasks. Diversity and accuracy are two basic requirements for ensemble creation. In this paper, we propose an ensemble creation method based on GA wrapper feature selection. Preliminary experimental results on real-world data show that the proposed method is promising, especially when the amount of training data is limited.
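A compact sketch of the GA-wrapper idea, assuming scikit-learn and a hand-rolled bit-string GA; the paper's ensemble-construction step is omitted, so this only shows how a GA can wrap a classifier to score candidate feature masks.

    # Sketch: a bit-string GA whose fitness is the cross-validated accuracy of a
    # classifier trained on the selected features (the "wrapper" part).
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X, y = load_wine(return_X_y=True)
    n_feat, pop_size, n_gen = X.shape[1], 20, 15

    def fitness(mask):
        if not mask.any():
            return 0.0
        clf = KNeighborsClassifier(n_neighbors=5)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    pop = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        # Tournament selection of parents.
        parents = pop[[max(rng.choice(pop_size, 3, replace=False), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # One-point crossover + bit-flip mutation.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_feat)
            children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:], parents[i, cut:]
        children ^= rng.random(children.shape) < 0.05
        # Elitism: carry over the best individual of this generation.
        children[0] = pop[scores.argmax()]
        pop = children

    scores = np.array([fitness(ind) for ind in pop])
    best = pop[scores.argmax()]
    print(np.flatnonzero(best), round(scores.max(), 4))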

3.
Multi-objective optimization of information engineering supervision based on the bat algorithm
To optimize the multi-objective problems that arise in information engineering supervision, the weight-assignment method for each objective is improved and a multi-objective control optimization model is built for each supervision stage. The bat algorithm is used to solve the model and is compared with particle swarm optimization. Simulation results show that the algorithm is suitable for searching for optimal solutions to multi-objective optimization problems in information engineering supervision and outperforms the basic particle swarm algorithm.
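For reference, a minimal single-objective sketch of the standard bat algorithm (frequency, velocity, loudness and pulse-rate updates) on a toy sphere function; the paper's multi-objective supervision model and weighting scheme are not reproduced here.

    # Sketch: the standard (single-objective) bat algorithm on a sphere function.
    import numpy as np

    rng = np.random.default_rng(1)
    dim, n_bats, n_iter = 10, 30, 200
    f_min, f_max = 0.0, 2.0              # pulse frequency range
    alpha, gamma = 0.9, 0.9              # loudness decay / pulse-rate growth

    def objective(z):                    # toy objective: sphere function
        return float(np.sum(z ** 2))

    x = rng.uniform(-5.0, 5.0, (n_bats, dim))   # bat positions
    v = np.zeros((n_bats, dim))                  # velocities
    A = np.ones(n_bats)                          # loudness
    r0 = rng.uniform(0.3, 0.7, n_bats)           # initial pulse emission rates
    r = np.zeros(n_bats)
    fit = np.array([objective(b) for b in x])
    best, best_fit = x[fit.argmin()].copy(), fit.min()

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] += (x[i] - best) * freq
            cand = x[i] + v[i]
            if rng.random() > r[i]:      # local random walk around the current best
                cand = best + 0.01 * A.mean() * rng.standard_normal(dim)
            f_cand = objective(cand)
            if f_cand <= fit[i] and rng.random() < A[i]:
                x[i], fit[i] = cand, f_cand
                A[i] *= alpha                            # accepted: get quieter ...
                r[i] = r0[i] * (1 - np.exp(-gamma * t))  # ... and pulse more often
            if f_cand < best_fit:
                best, best_fit = cand.copy(), f_cand

    print(round(best_fit, 6))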

4.
This paper investigates feature selection based on rough sets for dimensionality reduction in Case-Based Reasoning classifiers. In order to be useful, Case-Based Reasoning systems should be able to manage imprecise, uncertain and redundant data to retrieve the most relevant information in a potentially overwhelming quantity of data. Rough Set Theory has been shown to be an effective tool for data mining and for uncertainty management. This paper has two central contributions: (1) it develops three strategies for feature selection, and (2) it proposes several measures for estimating attribute relevance based on Rough Set Theory. Although we concentrate on Case-Based Reasoning classifiers, the proposals are general enough to be applicable to a wide range of learning algorithms. We applied these proposals to twenty data sets from the UCI repository and examined the impact of feature selection on classification performance. Our evaluation shows that all three proposals benefit the basic Case-Based Reasoning system. They also present robustness in comparison to well-known feature selection strategies.

5.
Derived from traditional manifold learning algorithms, local discriminant analysis methods identify the underlying submanifold structure while employing discriminative information for dimensionality reduction. Mathematically, they can all be unified into a graph embedding framework with different construction criteria. However, such learning algorithms are limited by the curse of dimensionality when the original data lie on a high-dimensional manifold. Unlike existing algorithms, we consider discriminant embedding as a kernel analysis approach in the sample space, and a kernel-view based discriminant method is proposed for embedded feature extraction, in which both PCA pre-processing and the pruning of data can be avoided. Extensive experiments on high-dimensional data sets show the robustness and outstanding performance of the proposed method.

6.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves on the state-of-the-art sequential forward floating selection algorithm. The improvement is an additional search step, called "replacing the weak feature", which checks at each sequential step whether removing any feature in the currently selected subset and adding a new one improves the current feature subset. Our method provides optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computation than optimal feature selection algorithms. Our experimental results on four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms, especially when the original number of features in the database is large.
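A rough sketch of the extra IFFS step described above: after a sequential step, every selected feature is tentatively swapped for every unselected one and the best improving swap is kept. The wrapper criterion (cross-validated KNN accuracy) and the dataset are illustrative assumptions, not the authors' setup.

    # Sketch of the "replacing the weak feature" step of IFFS: try swapping each
    # selected feature for each unselected one and keep the best improving swap.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)

    def score(feats):
        # Illustrative wrapper criterion: 5-fold cross-validated KNN accuracy.
        return cross_val_score(KNeighborsClassifier(5), X[:, list(feats)], y, cv=5).mean()

    def replace_weak_feature(selected, all_feats):
        """One IFFS-style replacement pass over the current subset."""
        best_subset, best_score = set(selected), score(selected)
        for weak in selected:
            for new in set(all_feats) - set(selected):
                cand = (set(selected) - {weak}) | {new}
                s = score(cand)
                if s > best_score:
                    best_subset, best_score = cand, s
        return best_subset, best_score

    selected = {0, 4, 9}                      # e.g. a subset from a forward step
    improved, acc = replace_weak_feature(selected, range(X.shape[1]))
    print(sorted(improved), round(acc, 4))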

7.
A cross-entropy bat algorithm for high-dimensional function optimization
To improve the global search ability and accuracy of the bat algorithm on high-dimensional function optimization problems, the cross-entropy method is combined with the bat algorithm to form a cross-entropy bat algorithm. The algorithm embeds the cross-entropy global stochastic optimization method, which is based on importance sampling and the Kullback-Leibler distance, into the bat algorithm, and uses adaptive smoothing to speed up convergence. The ergodicity, adaptivity, and robustness of the cross-entropy method effectively suppress the premature convergence of the bat algorithm. Simulation results on classic benchmark functions and the CEC2005 test functions show that the algorithm offers strong global search ability, high solution accuracy, and good robustness.

8.
We focus on a hybrid approach to feature selection. We begin our analysis with a filter model that exploits the geometrical information contained in the minimum spanning tree (MST) built on the learning set. This model uses a statistical test of relative certainty gain within a forward selection algorithm. In the second part of the paper, we show that the MST can be replaced by the 1-nearest-neighbor graph without challenging the statistical framework. This leads to a feature selection algorithm belonging to a new category of hybrid models (filter-wrapper). Experimental results on readily available synthetic and natural domains are presented and discussed.

9.
We introduce a novel wrapper algorithm for feature selection using Support Vector Machines with kernel functions. Our method is based on sequential backward selection, using the number of errors on a validation subset as the measure for deciding which feature to remove in each iteration. We compare our approach with other algorithms, such as a filter method and Recursive Feature Elimination with SVM, to demonstrate its effectiveness and efficiency.
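A minimal sketch of the described wrapper, assuming scikit-learn: sequential backward elimination around a kernel SVM, where the number of errors on a held-out validation subset decides which feature to drop at each iteration. Dataset, kernel and split are illustrative choices.

    # Sketch: sequential backward selection with an RBF-kernel SVM as the wrapper;
    # at each step, drop the feature whose removal yields the fewest validation errors.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    def val_errors(feats):
        clf = SVC(kernel="rbf", gamma="scale").fit(X_tr[:, feats], y_tr)
        return int(np.sum(clf.predict(X_val[:, feats]) != y_val))

    remaining = list(range(X.shape[1]))
    history = [(list(remaining), val_errors(remaining))]
    while len(remaining) > 1:
        # Remove the feature whose elimination hurts the least (fewest errors).
        errs = {f: val_errors([g for g in remaining if g != f]) for f in remaining}
        drop = min(errs, key=errs.get)
        remaining.remove(drop)
        history.append((list(remaining), errs[drop]))

    best_feats, best_err = min(history, key=lambda h: (h[1], len(h[0])))
    print(best_feats, best_err)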

10.
Audio feature selection based on VPRSM
Preserving audio features is very important in audio indexing, but in many cases the number of features is very large and processing such massive data directly is time-consuming. Feature selection, as a preprocessing step in data mining, has been used successfully to reduce feature dimensionality and eliminate irrelevant data. An audio feature selection algorithm based on the variable precision rough set model (VPRSM) is proposed. Experimental results show that the algorithm obtains a minimal reduct while preserving the characteristics of the audio data to the greatest extent, thereby improving retrieval efficiency.

11.
Dimensionality reduction is an important step in text classification. Building on existing feature selection methods, and taking into account how feature terms are distributed across the positive and negative classes, a new feature selection method, the comprehensive ratio (CR) method, is proposed by combining four indicators that measure a feature's class-discriminating ability. The K-nearest neighbor (KNN) classifier is used to examine the effectiveness of the CR method, and the experimental results show that it achieves better dimensionality reduction than existing feature selection methods.

12.
Subspace based feature selection for pattern recognition
Feature selection is an essential topic in the field of pattern recognition. The feature selection strategy has a direct influence on the accuracy and processing time of pattern recognition applications. Features can be evaluated with either univariate approaches, which examine features individually, or multivariate approaches, which consider possible feature correlations and examine features as a group. Although univariate approaches do not take the correlation among features into consideration, they can provide the individual discriminatory power of the features, and they are also much faster than multivariate approaches. In pattern recognition applications where it is crucial to know which individual features are more or less informative, univariate approaches are therefore more useful. This paper proposes subspace based separability measures to determine the individual discriminatory power of the features. These measures are then employed to sort and select features in a multi-class manner. The feature selection performance of the proposed measures is evaluated and compared with the univariate forms of classic separability measures (Divergence, Bhattacharyya, Transformed Divergence, and Jeffries-Matusita) on several datasets. The experimental results clearly indicate that the new measures yield comparable or even better performance than the classic ones in terms of classification accuracy and dimension reduction rate.
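For illustration, a short sketch of univariate separability ranking using the classic Bhattacharyya distance between two Gaussian-modelled classes (one of the baseline measures named above, not the paper's subspace measures), assuming scikit-learn only for the sample dataset.

    # Sketch: rank features by the univariate Bhattacharyya distance between two
    # classes, assuming each feature is modelled as Gaussian within each class.
    import numpy as np
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    X0, X1 = X[y == 0], X[y == 1]

    def bhattacharyya_1d(a, b, eps=1e-12):
        # D_B = (1/8)(m0-m1)^2 / v + (1/2) ln(v / sqrt(v0*v1)), with v = (v0+v1)/2.
        m0, m1 = a.mean(), b.mean()
        v0, v1 = a.var() + eps, b.var() + eps
        v = 0.5 * (v0 + v1)
        return 0.125 * (m0 - m1) ** 2 / v + 0.5 * np.log(v / np.sqrt(v0 * v1))

    dists = np.array([bhattacharyya_1d(X0[:, j], X1[:, j]) for j in range(X.shape[1])])
    ranking = np.argsort(dists)[::-1]        # most separable feature first
    print(ranking[:10], np.round(dists[ranking[:10]], 3))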

13.
This paper develops a manifold-oriented stochastic neighbor projection (MSNP) technique for feature extraction. MSNP is designed to find a linear projection that captures the underlying pattern structure of observations that actually lie on a nonlinear manifold. In MSNP, the similarity information of observations is encoded with a stochastic neighbor distribution based on a geodesic distance metric, and the same distribution is then required to hold in the feature space. This learning criterion not only enables MSNP to extract nonlinear features through a linear projection, but also makes MSNP competitive, because distribution preservation is more workable and flexible than rigid distance preservation. MSNP is evaluated in three applications: data visualization for face images, face recognition, and palmprint recognition. Experimental results on several benchmark databases suggest that the proposed MSNP provides an unsupervised feature extraction approach with powerful pattern-revealing capability for complex manifold data.

14.
A new scheme incorporating dimensionality reduction and clustering is proposed, suitable for classifying a large volume of remotely sensed data using a small amount of memory. The scheme involves transforming the data from multidimensional n-space to a 3-dimensional primary color space of blue, green and red coordinates. The dimensionality reduction is followed by data reduction, which involves assigning the 3-dimensional samples to a 2-dimensional array. Finally, a multi-stage ISODATA technique incorporating a novel seed-point picking method is used to obtain the desired number of clusters.

The storage requirements are reduced to a low value by making five passes through the data and storing necessary information during each pass. The first three passes are used to find the minimum and maximum values of some of the variables. The data reduction is done and a classification table is formed during the fourth pass. The classification map is obtained during the fifth pass. The computer memory required is about 2K machine words.

The efficacy of the algorithm is justified by simulation studies using multispectral LANDSAT data.


15.
Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Such features can be especially harmful with relatively small training sets, where irrelevancy and redundancy are harder to evaluate. On the other hand, this extreme number of features also raises memory problems when representing the dataset. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and remove redundant features. The learning model thus receives a concise structure without forfeiting predictive accuracy, since only the selected prominent features are used. FS is therefore an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization with Particle Swarm Optimization (PSO), namely PSO-based Relative Reduct (PSO-RR) and PSO-based Quick Reduct (PSO-QR), are presented for disease diagnosis. Experimental results on several standard medical datasets demonstrate the efficiency of the proposed technique as well as its enhancements over existing feature selection techniques.

16.
This paper presents a novel wrapper feature selection algorithm for classification problems, namely the hybrid genetic algorithm (GA)- and extreme learning machine (ELM)-based feature selection algorithm (HGEFS). It uses a GA to wrap an ELM and search for optimal subsets in the huge feature space, and a set of these subsets is then selected to form an ensemble that improves the final prediction accuracy. To prevent the GA from being trapped in local optima, we propose a novel and efficient mechanism, designed specifically for feature selection problems, to maintain the GA's diversity. To measure each subset's quality fairly and efficiently, we adopt a modified ELM called the error-minimized extreme learning machine (EM-ELM), which automatically determines an appropriate network architecture for each feature subset. Moreover, EM-ELM has good generalization ability and extreme learning speed, which allows us to perform the wrapper feature selection process in an affordable time; in other words, we simultaneously optimize the feature subset and the classifiers' parameters. After the GA search finishes, to further improve prediction accuracy and obtain a stable result, we select a set of EM-ELMs from the final population to form the ensemble according to a specific ranking and selection strategy. To verify the performance of HGEFS, empirical comparisons are carried out between HGEFS and other feature selection methods on benchmark datasets. The results reveal that HGEFS is a useful method for feature selection problems and consistently outperforms the other algorithms in the comparison.

17.
Software defect prediction aims to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated with a k-nearest neighbor (KNN) model and measured by the area under curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or comparably to the compared feature selection approaches in terms of classification performance.
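A minimal sketch of this pipeline, assuming scikit-learn: a Relief-style weighting stands in for the paper's similarity measure, the features are sorted into a ranking list, and nested subsets taken from the ranking are each scored with a KNN model and the AUC metric.

    # Sketch: Relief-style feature weighting (a stand-in for the paper's similarity
    # measure), descending ranking, and evaluation of nested subsets with KNN + AUC.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import MinMaxScaler

    X, y = load_breast_cancer(return_X_y=True)     # illustrative binary dataset
    X = MinMaxScaler().fit_transform(X)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # 1) Relief-style weights: reward features that differ on the nearest miss
    #    and agree on the nearest hit.
    w = np.zeros(X_tr.shape[1])
    for i in range(len(X_tr)):
        d = np.linalg.norm(X_tr - X_tr[i], axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y_tr == y_tr[i], d, np.inf))
        miss = np.argmin(np.where(y_tr != y_tr[i], d, np.inf))
        w += np.abs(X_tr[i] - X_tr[miss]) - np.abs(X_tr[i] - X_tr[hit])

    # 2) Ranking list and nested subsets, each scored by KNN + AUC on the test split.
    ranking = np.argsort(w)[::-1]
    results = []
    for k in range(1, len(ranking) + 1):
        feats = ranking[:k]
        knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, feats], y_tr)
        auc = roc_auc_score(y_te, knn.predict_proba(X_te[:, feats])[:, 1])
        results.append((k, auc))

    best_k, best_auc = max(results, key=lambda r: r[1])
    print(best_k, round(best_auc, 4))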

18.
After surveying existing feature selection procedures based upon the Karhunen-Loeve (K-L) expansion, the paper describes a new K-L technique that overcomes some of the limitations of the earlier procedures. The new method takes into account information on both the class variances and means, but lays particular emphasis on the classification potential of the latter. The results of a series of experiments concerned with the classification of real vector-electrocardiogram and artificially generated data demonstrate the advantages of the new method. They suggest that it is particularly useful for pattern recognition when combined with classification procedures based upon discriminant functions obtained by recursive least squares analysis.

19.
This paper presents a novel approach to feature selection based on an analysis of the class regions generated by a fuzzy classifier. A measure for feature evaluation, the exception ratio, is proposed. The exception ratio represents the degree of overlap in the class regions, in other words, the degree to which exceptions occur inside the fuzzy rules generated by the fuzzy classifier. It is shown that, for a given set of features, the subset with the lowest sum of exception ratios tends to contain the most relevant features, compared with the other subsets of the same size. An algorithm is then proposed that eliminates irrelevant features: given the set of remaining features, it removes the feature whose elimination minimizes the sum of the exception ratios. A terminating criterion is also given, under which the algorithm stops when the next elimination would cause a significant increase in the sum of the exception ratios. Experiments show that the proposed algorithm performs well in eliminating irrelevant features while limiting the increase in recognition error rates on unknown data for the classifiers in use.

20.
It is well known that Fuzzy Logic Systems (FLSs) are a powerful method for tackling diverse problems involving lack of knowledge and/or uncertainty. In the literature, different fuzzy inference mechanisms based on fuzzy variables and fuzzy rules exist for obtaining a solution. In this work we introduce a generalization of the inference algorithm proposed by Mamdani, using overlap functions and overlap indices. A challenging issue is the selection of the most suitable overlap expressions for each problem. To this end, we propose to use a convex combination of several of them. In this way, the conclusions obtained by our FLSs avoid the poor results produced by an inadequate overlap expression. We test our proposal on a real problem of forest fire detection using a wireless sensor network.
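A tiny sketch of the convex-combination idea: three known overlap functions (product, minimum, geometric mean) are blended with non-negative weights that sum to one. The weights are illustrative, and the generalized Mamdani inference and overlap indices of the paper are not reproduced here.

    # Sketch: a convex combination of three known overlap functions.
    from typing import Callable, Sequence

    Overlap = Callable[[float, float], float]

    def product(x: float, y: float) -> float:
        return x * y

    def minimum(x: float, y: float) -> float:
        return min(x, y)

    def geometric_mean(x: float, y: float) -> float:
        return (x * y) ** 0.5

    def convex_combination(overlaps: Sequence[Overlap],
                           weights: Sequence[float]) -> Overlap:
        # Non-negative weights summing to one keep the result an overlap function.
        assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
        def combined(x: float, y: float) -> float:
            return sum(w * o(x, y) for w, o in zip(weights, overlaps))
        return combined

    O = convex_combination([product, minimum, geometric_mean], [0.5, 0.3, 0.2])
    # Degree to which a rule antecedent with memberships 0.8 and 0.6 "fires".
    print(round(O(0.8, 0.6), 4))   # ~0.5586, a blend of the three component values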
