Similar articles
 20 similar articles found (search time: 15 ms)
1.
Ensemble selection, which aims to select a proper subset of the original ensemble, can be viewed as a combinatorial optimization problem, and the pruned ensemble often performs better than the original one. Greedy ensemble selection has drawn much attention, and many greedy algorithms have been proposed, most focusing on the design of new evaluation measures or the study of different search directions. Diversity is widely accepted to play a crucial role in ensemble selection, and many diversity-based evaluation measures have been proposed with considerable success. However, most existing work neglects the substantial local-optimum problem of greedy methods, which is the central issue addressed in this paper. We propose GraspEnS, an ensemble selection algorithm based on the Greedy Randomized Adaptive Search Procedure (GRASP). GraspEnS improves the typical greedy approach by incorporating a random factor, realizes multi-start search, and appropriately expands the search range of typical greedy methods. Experimental results demonstrate that GraspEnS yields a final pruned subensemble with performance comparable to or better than that of its competitors.
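The abstract names the two GRASP ingredients (a randomized greedy step and multi-start restarts) without pseudocode. The following is a minimal sketch of that idea, not the paper's exact algorithm: the evaluation is plain majority-vote validation accuracy, and the restricted-candidate-list parameter `alpha` is an illustrative assumption.

```python
import random

def grasp_ensemble_selection(classifiers, X_val, y_val, n_starts=10, alpha=0.3):
    """GRASP-style ensemble selection sketch: instead of always adding
    the single best classifier (pure greedy), each step picks at random
    from a restricted candidate list (RCL) of top candidates, and the
    whole construction is restarted several times (multi-start)."""
    def ensemble_accuracy(subset):
        # Majority vote of the subset on the validation set.
        votes = [clf.predict(X_val) for clf in subset]
        correct = 0
        for i, y in enumerate(y_val):
            preds = [v[i] for v in votes]
            if max(set(preds), key=preds.count) == y:
                correct += 1
        return correct / len(y_val)

    best_subset, best_acc = None, -1.0
    for _ in range(n_starts):                          # multi-start search
        remaining, subset = list(classifiers), []
        while remaining:
            scored = sorted(remaining,
                            key=lambda c: ensemble_accuracy(subset + [c]),
                            reverse=True)
            rcl = scored[:max(1, int(alpha * len(scored)))]
            pick = random.choice(rcl)                  # randomized greedy step
            if subset and ensemble_accuracy(subset + [pick]) <= ensemble_accuracy(subset):
                break                                  # stop when no improvement
            subset.append(pick)
            remaining.remove(pick)
        acc = ensemble_accuracy(subset)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset
```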

2.
Although greedy algorithms are highly efficient, they often yield suboptimal solutions to the ensemble pruning problem because their exploration of the search space is largely limited. Another marked defect of almost all existing ensemble pruning algorithms, including greedy ones, is that they simply discard every classifier that fails the ensemble selection competition, wasting useful resources and information. Motivated by these observations, this work proposes a greedy Reverse Reduce-Error (RRE) pruning algorithm that incorporates a subtraction operation. RRE makes use of the defeated candidate networks: the Worst Single Model (WSM) is chosen and its votes are subtracted from the votes of the components selected into the pruned ensemble, the rationale being that in most cases the WSM mispredicts the test samples. Unlike classical Reduce-Error (RE) pruning, the near-optimal solution is chosen based on the pruned error over all available sequential subensembles, the backfitting step of RE is replaced by the selection of a WSM, and ties can be resolved more naturally. Finally, soft voting is employed when testing the RRE algorithm. The performance of RE and RRE, together with two baselines, selecting the Best Single Model (BSM) of the initial ensemble and retaining all member networks (ALL), is evaluated on seven benchmark classification tasks under different initial ensemble setups. The empirical results show the superiority of RRE over the other three ensemble pruning algorithms.
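The distinctive step is the vote subtraction. Below is an illustrative sketch of that one operation under the assumption that soft votes are accumulated in per-class score matrices; it is not the authors' implementation, and the `weight` parameter is a hypothetical knob.

```python
import numpy as np

def subtract_worst_model(ensemble_votes, worst_votes, weight=1.0):
    """Sketch of RRE's core subtraction step.

    ensemble_votes : (n_samples, n_classes) summed soft votes of the
                     selected subensemble
    worst_votes    : (n_samples, n_classes) soft votes of the Worst
                     Single Model (WSM)

    The WSM's votes are subtracted on the assumption that they mostly
    point toward wrong classes; scores are clipped at zero so the
    argmax prediction stays well defined.
    """
    adjusted = np.clip(ensemble_votes - weight * worst_votes, 0.0, None)
    return adjusted.argmax(axis=1)   # final class predictions
```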

3.
Ensemble learning aggregates the decisions of different learners/models. Fundamentally, ensemble performance depends on the accuracy of the individual learners' predictions and on the diversity among them; the trade-off between accuracy and diversity must be optimized to find the best-performing group of learners. In this article we propose a novel ensemble selection algorithm, focused on clustering problems, that selects the subset of the ensemble containing models that are both accurate and diverse. Existing ensemble selection algorithms require the number of learners in the subset to be fixed before selection, yet the cardinality of the subset affects prediction accuracy. The proposed algorithm determines both how many learners to keep and which ones they are. Compared with recent ensemble clustering selection algorithms across different cardinalities, our method produces better predictions that closely approximate the optimal solutions.

4.
Feature selection in MLPs and SVMs based on maximum output information
This paper presents feature selection algorithms for multilayer perceptrons (MLPs) and multiclass support vector machines (SVMs) that use the mutual information between class labels and classifier outputs as the objective function. This objective function involves inexpensive computation of information measures on discrete variables only, provides immunity to prior class probabilities, and brackets the probability of error of the classifier. The maximum output information (MOI) algorithms employ this function for feature subset selection via greedy elimination and directed search. The output of an MOI algorithm is a feature subset of user-defined size together with the associated trained classifier (MLP/SVM). These algorithms compare favorably with a number of other methods on various artificial and real-world data sets.
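As a rough sketch of the greedy-elimination variant of this idea, the loop below drops, at each step, the feature whose removal costs the least mutual information between labels and classifier outputs. It assumes `X` is a NumPy array, that `train_fn(X, y)` is a user-supplied function returning a fitted classifier with `.predict`, and that each candidate subset is retrained from scratch, which the actual MOI algorithms organize more cheaply.

```python
from sklearn.metrics import mutual_info_score

def moi_backward_elimination(X, y, train_fn, target_size):
    """Greedy MOI-style feature elimination sketch: at each step, drop
    the feature whose removal least reduces the mutual information
    I(class labels; classifier outputs)."""
    features = list(range(X.shape[1]))
    while len(features) > target_size:
        best_mi, worst_feature = -1.0, None
        for f in features:
            cand = [g for g in features if g != f]
            clf = train_fn(X[:, cand], y)
            mi = mutual_info_score(y, clf.predict(X[:, cand]))
            if mi > best_mi:            # removing f hurts MI the least
                best_mi, worst_feature = mi, f
        features.remove(worst_feature)
    return features, train_fn(X[:, features], y)
```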

5.
Feature selection algorithms are important tools for microarray data analysis, and their classification performance and stability are critical to it. To improve both, this paper proposes an ensemble feature selection algorithm for high-dimensional microarray data that compensates for the limited information carried by any single gene subset. The algorithm first selects a number of discriminative genes using the signal-to-noise ratio; for each discriminative gene, it then evaluates the relevance between candidate genes and that gene using a conditional information correlation coefficient, producing multiple correlated gene subsets; finally, these similar gene subsets are integrated through ensemble learning. Experimental results show that, in most cases, the proposed ensemble feature selection algorithm outperforms methods that select only a single gene subset in both classification performance and stability.

6.
Theory and experiments show that ensemble classifiers with a larger margin distribution on the training set generalize better. This paper introduces the margin concept into ensemble pruning and uses it to guide the design of pruning methods. On this basis, a margin-based measure (MBM) is constructed to evaluate the importance of a base classifier relative to the whole ensemble, and a greedy ensemble selection method (MBMEP) is proposed to reduce ensemble size while improving classification accuracy. Experiments on 30 randomly selected UCI data sets show that, compared with other advanced greedy ensemble selection algorithms, the subensembles selected by MBMEP generalize better.

7.
Iterated greedy algorithms belong to the class of stochastic local search methods. They are based on the simple and effective principle of generating a sequence of solutions by iterating over a constructive greedy heuristic using destruction and construction phases. This paper first presents an efficient randomized iterated greedy approach for the minimum weight dominating set problem, where, given a vertex-weighted graph, the goal is to identify a subset of the graph's vertices with minimum total weight such that each vertex is either in the subset or has a neighbor in it. Our proposed approach works on a population of solutions rather than a single one, and is based on a fast randomized construction procedure that makes use of two different greedy heuristics. Second, we present a hybrid algorithmic model in which the proposed iterated greedy algorithm is combined with the mathematical programming solver CPLEX: the best solution found by the iterated greedy algorithm is improved with CPLEX's solution polishing feature. The simulation results obtained on a widely used set of benchmark instances show that our proposed algorithms outperform current state-of-the-art approaches.
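A single-solution sketch of the destruction/construction loop for this problem is shown below, under simplifying assumptions: the paper's version is population-based, uses two greedy heuristics, and adds CPLEX polishing, all of which are omitted here. `graph` maps each vertex to its neighbor set, and `destroy_frac` is an illustrative parameter.

```python
import random

def greedy_cover(graph, weights, partial=None):
    """Greedy construction for minimum weight dominating set: starting
    from `partial` (possibly empty), repeatedly add the vertex that
    dominates the most not-yet-dominated vertices per unit weight."""
    solution = set(partial or ())
    dominated = {u for v in solution for u in graph[v] | {v}}
    while dominated != set(graph):
        v = max(graph, key=lambda u: len((graph[u] | {u}) - dominated) / weights[u])
        solution.add(v)
        dominated |= graph[v] | {v}
    return solution

def iterated_greedy_mwds(graph, weights, iters=100, destroy_frac=0.3):
    """Single-solution iterated greedy loop: destroy part of the current
    solution at random, then greedily repair it, keeping the best."""
    weight_of = lambda s: sum(weights[v] for v in s)
    best = greedy_cover(graph, weights)
    current = set(best)
    for _ in range(iters):
        # Destruction: drop a random fraction, then reconstruction.
        keep = random.sample(sorted(current), int((1 - destroy_frac) * len(current)))
        current = greedy_cover(graph, weights, partial=keep)
        if weight_of(current) < weight_of(best):
            best = set(current)
    return best
```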

8.
In applications of learning from examples to real-world tasks, feature subset selection is important to speed up training and to improve generalization performance. Ideally, an inductive algorithm should use as small a subset of features as possible. In this paper, however, the authors show that selecting the minimum subset of features is NP-hard. The paper then presents a greedy algorithm for feature subset selection and reports the result of running it on a hand-written numeral recognition problem.

9.
Feature selection is the process of choosing a relevant subset of features from a high-dimensional dataset to enhance classifier performance, and it has been studied extensively. Algorithms such as Naïve Bayes (NB), decision trees, and genetic algorithms have been applied to high-dimensional datasets to select relevant features and increase computational speed. The proposed model selects features using ensemble classifier algorithms: it combines minimum redundancy maximum relevance (mRMR) with the forest optimization algorithm (FOA). Ensemble-based classifiers, support vector machine (SVM), K-nearest neighbor (KNN), and NB, are then used to enhance classification performance. Using mRMR-FOA to select relevant features from various datasets yields a 21% to 24% improvement in feature selection, and the ensemble classifiers further improve performance, reaching an accuracy of 96%.
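For context, the mRMR criterion that forms the first stage of mRMR-FOA can be sketched as a greedy loop (the forest optimization stage is omitted here). Features are assumed discrete or discretized so that `mutual_info_score` applies; this is an illustration of the standard criterion, not the paper's code.

```python
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: pick the feature maximizing relevance
    I(f; y) minus mean redundancy I(f; s) over already-selected s."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(f):
            relevance = mutual_info_score(y, X[:, f])
            if not selected:
                return relevance
            redundancy = sum(mutual_info_score(X[:, f], X[:, s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```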

10.
In practice, classifiers in an ensemble are not independent. This paper is the continuation of our previous work on ensemble subset selection [A. Ula?, M. Semerci, O.T. Y?ld?z, E. Alpayd?n, Incremental construction of classifier and discriminant ensembles, Information Sciences, 179 (9) (2009) 1298–1318] and has two parts: first, we investigate the effect of four factors on correlation: (i) algorithms used for training, (ii) hyperparameters of the algorithms, (iii) resampled training sets, (iv) input feature subsets. Simulations using 14 classifiers on 38 data sets indicate that hyperparameters and overlapping training sets have higher effect on positive correlation than features and algorithms. Second, we propose postprocessing before fusing using principal component analysis (PCA) to form uncorrelated eigenclassifiers from a set of correlated experts. Combining the information from all classifiers may be better than subset selection where some base classifiers are pruned before combination, because using all allows redundancy.  相似文献   
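The linear algebra behind the eigenclassifier idea is shown in the sketch below: PCA over the matrix of base-classifier outputs yields uncorrelated combined outputs. This is only an illustration of the decorrelation step for one class-score matrix, not the paper's full pipeline.

```python
import numpy as np

def eigenclassifier_outputs(output_matrix):
    """PCA postprocessing sketch.

    output_matrix : (n_samples, n_classifiers) array of base-classifier
                    scores for one class.
    Returns the samples re-expressed as uncorrelated 'eigenclassifier'
    scores, the principal directions, and the explained variances.
    """
    centered = output_matrix - output_matrix.mean(axis=0)
    # PCA via SVD of the centered output matrix.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt.T          # uncorrelated combined outputs
    variances = (S ** 2) / (len(output_matrix) - 1)
    return scores, Vt, variances
```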

11.
Feature selection in high-dimensional data is an active research area in pattern recognition. Most algorithms in this area select a subset of features so as to maximize classification accuracy, regardless of the number of selected features, which affects classification time. This article proposes a new feature selection method for high-dimensional data that can control the trade-off between accuracy and classification time. The method is based on the greedy metaheuristic known as the greedy randomized adaptive search procedure (GRASP) and uses an extended version of simulated annealing (SA) for local search; the extended SA embeds new parameters that allow the algorithm to control the accuracy/time trade-off. Experimental results show the superiority of the proposed method over previous GRASP variants for feature selection, and demonstrate how the trade-off is controlled by the introduced parameters.

12.
This paper describes a novel feature selection algorithm for unsupervised clustering that combines the clustering ensembles method with the population-based incremental learning algorithm. The main idea is to search for the subset of all features on which a clustering algorithm achieves the solution most similar to the one obtained by an ensemble learning algorithm. In particular, a clustering solution is first obtained by a clustering ensembles method, and population-based incremental learning is then used to find the feature subset that best fits that solution. One advantage of the proposed algorithm is that it is dimensionality-unbiased; in addition, it leverages the consensus across multiple clustering solutions. Experimental results on several real data sets demonstrate that the proposed algorithm often obtains a better feature subset than existing unsupervised feature selection algorithms.

13.
We present an adaptation of model-based clustering for partially labeled data that is capable of finding hidden cluster labels. All the originally known and newly discovered clusters are represented using localized feature subset selections (subspaces), yielding clusters that global feature subset selection cannot discover. The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach that uses a greedy forward search to estimate the final number of clusters. The quality of SeSProC is assessed on synthetic data, demonstrating its effectiveness under different data conditions, both at classifying instances with known labels and at discovering completely hidden clusters in different subspaces. SeSProC also outperforms three related baseline algorithms in most scenarios on synthetic and real data sets.

14.
Reducing the dimensionality of data is a challenging task in data mining and machine learning applications, where irrelevant and redundant features negatively affect the efficiency and effectiveness of learning algorithms. Feature selection is a dimension reduction technique that allows a better understanding of the data and improves the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels remains challenging. This paper proposes a novel method for unsupervised feature selection that efficiently selects features in a greedy manner. It first defines an effective criterion that measures the reconstruction error of the data matrix based on the selected subset of features, and then presents a novel algorithm that greedily minimizes this error, based on an efficient recursive formula for computing it. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with state-of-the-art unsupervised feature selection methods.
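A naive sketch of the greedy criterion follows; the paper's contribution is an efficient recursive update of the reconstruction error, whereas here each candidate is evaluated from scratch for clarity.

```python
import numpy as np

def greedy_reconstruction_selection(A, k):
    """Greedy feature selection by reconstruction error (naive sketch).

    A : (n_samples, n_features) data matrix
    Picks k features whose columns best reconstruct A, in the
    least-squares sense, via projection onto their span.
    """
    def reconstruction_error(cols):
        S = A[:, cols]                         # selected columns
        A_hat = S @ np.linalg.pinv(S) @ A      # project A onto span(S)
        return np.linalg.norm(A - A_hat, 'fro') ** 2

    selected, remaining = [], list(range(A.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda f: reconstruction_error(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```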

15.
Social networks provide great opportunities for people to communicate, share, and disseminate information. At the same time, it is quite challenging to exploit a social network efficiently to increase commercial profit or alleviate social problems. One feasible approach is to select a subset of individuals that positively influences the maximum number of other individuals in the network, and several algorithms have been proposed for this optimal subset selection problem. However, most existing work ignores time constraints, assuming either that time is infinite or that only snapshot selection problems need to be solved; both assumptions are impractical in real systems. We therefore study the problem of selecting the optimal subset of individuals to diffuse positive influence when time is bounded. We prove that this problem is NP-hard and propose a heuristic algorithm based on a greedy strategy. Experimental results on both simulated and real-world social networks, based on trace data from Shanghai, show that our algorithm significantly outperforms existing algorithms, especially when the network structure is sparse.
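The flavor of a time-bounded greedy heuristic can be sketched as below, using an independent-cascade diffusion model truncated at a deadline as an illustrative stand-in for the paper's model; the propagation probability `p`, `deadline`, and Monte Carlo `trials` are all assumed parameters.

```python
import random

def spread_within_deadline(graph, seeds, p=0.1, deadline=3, trials=200):
    """Monte Carlo estimate of influence under an independent-cascade
    model that stops after `deadline` rounds (the time bound)."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), set(seeds)
        for _ in range(deadline):                  # bounded diffusion time
            nxt = {v for u in frontier for v in graph[u]
                   if v not in active and random.random() < p}
            if not nxt:
                break
            active |= nxt
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_seed_selection(graph, k, **kw):
    """Standard greedy loop: repeatedly add the node with the largest
    marginal gain in estimated time-bounded spread."""
    seeds = set()
    for _ in range(k):
        gain = lambda v: spread_within_deadline(graph, seeds | {v}, **kw)
        seeds.add(max(set(graph) - seeds, key=gain))
    return seeds
```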

16.
Feature selection is a key preprocessing technique for machine learning and data mining tasks. Traditional greedy feature selection methods consider only the best feature in the current round, so the resulting subset is merely a local optimum, and an optimal or near-optimal feature set cannot be obtained. Evolutionary search explores the feature space effectively, but individual evolutionary algorithms have their own limitations. This paper combines the evolutionary strengths of the genetic algorithm (GA) and particle swarm optimization (PSO), uses an information-entropy measure for evaluation, and obtains the final feature subset through co-evolution. A bit-rate crossover operator and an information exchange strategy tailored to the feature selection problem are also proposed. Experimental results show that GA-PSO co-evolution outperforms either evolutionary search alone, both in its ability to search for feature subsets and on concrete classification tasks, and that the combinatorial judgment provided by evolutionary search is superior to greedy feature selection.

17.
Selective ensemble learning for classification problems
Chen Kai, 《计算机应用研究》 (Application Research of Computers), 2009, 26(7): 2457-2459
This paper proposes SECAdaBoostBagging Trees, a selective ensemble learning algorithm for classification problems that uses classification and regression trees as base learners, combines the characteristics of AdaBoost.M1 and Bagging, and performs selective ensemble learning with a variable-similarity clustering technique and a greedy algorithm. A comparative study against several common machine learning algorithms shows that it often achieves better generalization performance and higher running efficiency.

18.
A selective ensemble algorithm for regression problems
Chen Kai, 《计算机工程》 (Computer Engineering), 2009, 35(21): 17-19
This paper proposes SER-BagBoosting Trees, a selective ensemble learning algorithm for regression problems that uses classification and regression trees as base learners, combines the characteristics of Boosting and Bagging, and performs selective ensemble learning with a variable-similarity clustering technique and a greedy algorithm. A comparison with several common machine learning algorithms shows that it often achieves better generalization performance and higher running efficiency than other ensemble learning algorithms.

19.
An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members that constitutes a small ensemble, saving computational resources while performing as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that chooses a set of "sparse" combination weights, most of which are zero, to prune the ensemble. To obtain the sparse weights while satisfying their nonnegativity constraint, a left-truncated nonnegative Gaussian prior is adopted over every combination weight, and the expectation propagation (EP) algorithm is employed to approximate the posterior over the weight vector. The leave-one-out (LOO) error is obtained as a by-product of EP training without extra computation and is a good indicator of generalization error; it is therefore used together with the Bayesian evidence for model selection. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners while performing as well as, or better than, the unpruned ensemble, and the results are very competitive with other ensemble pruning algorithms.

20.
Evolutionary approaches are among the most accurate prototype selection algorithms, preprocessing techniques that select a subset of instances from the data before applying nearest neighbor classification. These algorithms achieve very high accuracy and reduction rates, but at a substantial computational cost. In this paper, we introduce a framework that efficiently reuses the intermediate results of prototype selection algorithms to further increase their accuracy. Instead of using only the fittest prototype subset generated by the evolutionary algorithm, we combine multiple prototype subsets in an ensemble setting; moreover, to classify a test instance, we use only those prototype subsets that accurately classify training instances in the neighborhood of that instance. In an experimental evaluation, we apply the framework to four state-of-the-art prototype selection algorithms and show that it yields more accurate results after fewer evaluations of the prototype selection method. We also present a case study with a prototype generation algorithm, showing that the framework extends easily to other preprocessing paradigms.
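The two framework ideas, keeping several prototype subsets from the evolutionary run and voting only with locally competent ones, can be sketched as follows. The competence threshold of 0.5 and the neighborhood size are assumptions for illustration, not values from the paper.

```python
import numpy as np
from collections import Counter

def knn_predict(prototypes, labels, x, k=1):
    """k-NN prediction from a prototype set (Euclidean distance)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    idx = np.argsort(d)[:k]
    return Counter(labels[idx]).most_common(1)[0][0]

def local_ensemble_predict(subsets, X_train, y_train, x, n_neighbors=10):
    """Vote with the prototype subsets that classify the test point's
    training neighborhood well.

    subsets : list of (prototypes, labels) pairs collected along the
              evolutionary run
    """
    # Training instances nearest to the test point x.
    near = np.argsort(np.linalg.norm(X_train - x, axis=1))[:n_neighbors]
    votes = []
    for protos, labs in subsets:
        local_acc = np.mean([knn_predict(protos, labs, X_train[i]) == y_train[i]
                             for i in near])
        if local_acc >= 0.5:                 # assumed competence threshold
            votes.append(knn_predict(protos, labs, x))
    if not votes:                            # fall back to all subsets
        votes = [knn_predict(p, l, x) for p, l in subsets]
    return Counter(votes).most_common(1)[0][0]
```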
