Similar Documents
 20 similar documents found.
1.
Cost Complexity-Based Pruning of Ensemble Classifiers
In this paper we study methods that combine multiple classification models learned over separate data sets. Numerous studies posit that such approaches provide the means to efficiently scale learning to large data sets, while also boosting the accuracy of the individual classifiers. These gains, however, come at the expense of increased demand for run-time system resources: the final ensemble meta-classifier may consist of a large collection of base classifiers that require more memory while also slowing down classification throughput. Here, we describe an algorithm for pruning the ensemble meta-classifier (i.e., discarding a subset of the available base classifiers) as a means of reducing its size while preserving its accuracy, and we present a technique for measuring the trade-off between predictive performance and available run-time system resources. The algorithm is independent of the method used initially to compute the meta-classifier. It is based on decision tree pruning methods and relies on mapping an arbitrary ensemble meta-classifier to a decision tree model. Through an extensive empirical study on meta-classifiers computed over two real data sets, we show that our pruning algorithm is a robust and competitive approach to discarding classification models without degrading the overall predictive performance of the smaller ensemble built from those that remain after pruning. Received 30 August 2000 / Revised 7 March 2001 / Accepted in revised form 21 May 2001

2.
Traditional data classification algorithms are mostly designed for balanced data sets and degrade when classifying imbalanced data, while practice shows that ensemble selection can effectively improve classification performance on imbalanced data sets. This paper therefore approaches imbalanced learning from the perspective of ensemble selection and proposes a new ensemble pruning method to improve the performance of ensemble classifiers on imbalanced data. Bagging is used to build a classifier library, the positive-class (minority-class) instances are used directly as the pruning set, and, guided by the MBM measure and the pruning set, an optimal or near-optimal subensemble is selected from the library as the target classifier for predicting unseen instances. Experiments on 12 UCI data sets show that, compared with EasyEnsemble, Bagging, and C4.5, the method not only substantially improves the subensemble's recall on the positive class but also improves overall accuracy.
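The pruning step described in the abstract above can be sketched in a few lines. This is a toy illustration, not the paper's method: the MBM measure is not reproduced here, so plain recall of a majority vote on the positive-class pruning set stands in for it, and the base classifiers are assumed to be simple 1-D threshold rules.

```python
def majority_vote(classifiers, x):
    # Predict positive if at least half of the base classifiers vote positive
    # (ties count as positive, which favors minority-class recall).
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 >= len(classifiers) else 0

def recall_on_positives(classifiers, positives):
    # Fraction of the positive-class pruning instances the subensemble recovers.
    hits = sum(majority_vote(classifiers, x) for x in positives)
    return hits / len(positives)

def prune_for_recall(pool, positives, target_size):
    # Greedy forward selection: repeatedly add the base classifier that
    # most improves the growing subensemble's recall on the pruning set.
    selected, remaining = [], list(pool)
    while remaining and len(selected) < target_size:
        best = max(remaining,
                   key=lambda c: recall_on_positives(selected + [c], positives))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy pool: threshold classifiers on 1-D inputs; positives cluster near 1.0.
pool = [lambda x, t=t: 1 if x > t else 0 for t in (0.2, 0.5, 0.8, 1.5)]
positives = [0.9, 1.0, 1.1, 1.2]
sub = prune_for_recall(pool, positives, target_size=2)
print(recall_on_positives(sub, positives))
```

The pruning set here contains only positive instances, mirroring the abstract's choice of using the minority class directly as the pruning set.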

3.
Selective ensemble (ensemble pruning) is one of the current research hotspots in machine learning. Because selective ensemble is NP-hard, heuristic methods are usually employed to transform it into other problems and obtain approximately optimal solutions. Since these algorithms start from different premises and are described from different angles, the large body of existing selective-ensemble algorithms appears scattered and unsystematic. To help researchers quickly grasp and apply the latest progress in this area, this paper classifies selective-ensemble algorithms into four categories according to the core strategy of the selection process: iterative optimization, ranking, clustering, and pattern mining. Representative algorithms are then compared experimentally on 20 commonly used data sets from the UCI repository, in terms of predictive performance, selection time, and the size of the resulting ensemble. Finally, the strengths and weaknesses of each category are summarized and future research priorities for selective ensemble are discussed.

4.
This paper proposes a new Modified Backtracking Ensemble Pruning algorithm (ModEnPBT). It builds on the design of our previously proposed Ensemble Pruning via Backtracking algorithm (EnPBT) while overcoming its drawback of a redundantly defined solution space. The solution space of ModEnPBT is compact, with no repeated solution vectors, so it searches more efficiently than EnPBT. ModEnPBT remains a backtracking algorithm, systematically searching for solutions in a depth-first manner, which suits large-scale combinatorial optimization problems. Experimental results on three benchmark classification tasks demonstrate the validity and effectiveness of the proposed ModEnPBT.

5.
Several pruning strategies that can be used to reduce the size and increase the accuracy of bagging ensembles are analyzed. These heuristics select subsets of complementary classifiers that, when combined, can perform better than the whole ensemble. The pruning methods investigated are based on modifying the order of aggregation of classifiers in the ensemble. In the original bagging algorithm, the order of aggregation is left unspecified. When this order is random, the generalization error typically decreases as the number of classifiers in the ensemble increases. If an appropriate ordering for the aggregation process is devised, the generalization error reaches a minimum at intermediate numbers of classifiers. This minimum lies below the asymptotic error of bagging. Pruned ensembles are obtained by retaining a fraction of the classifiers in the ordered ensemble. The performance of these pruned ensembles is evaluated in several benchmark classification tasks under different training conditions. The results of this empirical investigation show that ordered aggregation can be used for the efficient generation of pruned ensembles that are competitive, in terms of performance and robustness of classification, with computationally more costly methods that directly select optimal or near-optimal subensembles.
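The ordered-aggregation idea above lends itself to a compact sketch. This is a simplified illustration under assumed toy inputs (1-D threshold classifiers and a small labelled validation set): the pool is reordered so that each prefix of the ordering minimizes validation error, and only a fraction of the ordered ensemble is retained.

```python
def ensemble_error(classifiers, val):
    # 0/1 error of the majority vote on labelled validation pairs (x, y).
    wrong = 0
    for x, y in val:
        votes = sum(c(x) for c in classifiers)
        pred = 1 if votes * 2 > len(classifiers) else 0
        wrong += (pred != y)
    return wrong / len(val)

def ordered_aggregation(pool, val):
    # Reorder the pool so each prefix minimizes validation error: at every
    # step, append the classifier whose addition yields the lowest error
    # for the cumulative (prefix) ensemble.
    ordered, remaining = [], list(pool)
    while remaining:
        best = min(remaining, key=lambda c: ensemble_error(ordered + [c], val))
        ordered.append(best)
        remaining.remove(best)
    return ordered

def prune(pool, val, keep_fraction=0.5):
    # Retain only the leading fraction of the ordered ensemble.
    ordered = ordered_aggregation(pool, val)
    k = max(1, int(len(ordered) * keep_fraction))
    return ordered[:k]

pool = [lambda x, t=t: 1 if x > t else 0 for t in (0.0, 0.5, 1.0)]
val = [(0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1)]
pruned = prune(pool, val, keep_fraction=0.34)
print(len(pruned), ensemble_error(pruned, val))
```

In the abstract's terms, the prefix error curve of the ordered ensemble dips below bagging's asymptotic error at intermediate sizes, and `keep_fraction` picks a point near that minimum.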

6.
Ensemble learning and selective ensemble are current research hotspots in machine learning, but most published results are based on unpublished private experiments. This practice lowers research efficiency through large amounts of duplicated work, and it also hinders the practical adoption of ensemble learning. With the aims of reducing the experimental workload of research, improving the reproducibility of experiments, reducing discrepancies between the conclusions of different experiments, and pushing selective-ensemble techniques toward practical use, this paper discusses the issues to be considered in designing a research and development platform for selective ensemble, together with the system's architecture, and uses EPP (Ensemble Pruning Platform) as an example to describe the method and key workflow for implementing a selective-ensemble development platform in C++.

7.
The main goal is to find a fast pruning method for Bagging that reduces the algorithm's memory footprint, speeds up computation, and has the potential to improve classification accuracy. Traditional selective-ensemble research focuses on the diversity among base learners; this paper instead studies the problem from the angle of homogeneity and proposes a completely new idea for selective ensemble. By repeatedly selecting the worst member of the base-learner pool for removal, the Bagging ensemble is pruned quickly and hierarchically, yielding a new algorithm whose training speed is close to Bagging's while improving on Bagging's performance. The new algorithm trains in markedly less time than GASEN with comparable performance, and it retains the same capacity for parallel processing as Bagging.
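The worst-member removal strategy can be sketched as backward elimination. This is a hedged toy reading of the abstract, with assumed 1-D threshold classifiers: "worst" is interpreted as the member whose removal leaves the lowest validation error for the remaining ensemble.

```python
def vote_error(classifiers, val):
    # 0/1 error of the majority vote on labelled validation pairs (x, y).
    wrong = 0
    for x, y in val:
        votes = sum(c(x) for c in classifiers)
        pred = 1 if votes * 2 > len(classifiers) else 0
        wrong += (pred != y)
    return wrong / len(val)

def prune_worst_first(pool, val, target_size):
    # Backward elimination: while the ensemble is too large, drop the
    # member whose removal leaves the lowest validation error.
    ensemble = list(pool)
    while len(ensemble) > target_size:
        worst = min(ensemble,
                    key=lambda c: vote_error([m for m in ensemble if m is not c], val))
        ensemble.remove(worst)
    return ensemble

pool = [lambda x, t=t: 1 if x > t else 0 for t in (0.0, 0.5, 1.0)]
val = [(0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1)]
result = prune_worst_first(pool, val, target_size=1)
print(len(result), vote_error(result, val))
```

Each pass needs only one error evaluation per remaining member, which is what makes worst-first removal fast compared with searching over subensembles as GASEN does.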

8.
The main goal of this paper is to find a fast pruning method for Bagging that reduces the algorithm's memory footprint, speeds up computation, and has the potential to improve classification accuracy. It also proposes a new selective-ensemble idea that directly computes the diversity contribution of each base learner: the member of the base-learner pool whose removal most increases the diversity of the remaining learners is selected for deletion, and hierarchical pruning is used to accelerate the algorithm. Without hurting performance, the new algorithm can greatly reduce the ensemble size of Bagging; it also supports parallel computation, and it performs selective ensemble markedly faster than GASEN. An upper bound on the error of ensemble-learning classification tasks is also given.

9.
Extracting valuable information from massive gene microarray data is a hot topic in bioinformatics. Microarray data are high-dimensional, small-sample, and highly redundant. This paper therefore proposes a gene selection method based on intersection-neighborhood rough sets to pick out key genes for classifying microarray data. Pathway knowledge is first used for preliminary gene selection, with each pathway unit corresponding to a gene subset; a rough-set attribute-reduction method then filters out a non-redundant set of key genes. Because there are many pathway units, a correspondingly large number of base classifiers is generated, so selecting among the base classifiers is necessary to further increase their diversity and the efficiency of the ensemble. Affinity propagation clustering needs neither a preset number of clusters nor initial exemplars, and it clusters quickly and accurately; it is therefore used to group the base classifiers into highly diverse clusters, from each of which one classifier is chosen to build the ensemble classifier. Experiments on Arabidopsis microarray data sets related to biotic and abiotic stress responses show that the proposed method improves accuracy by up to 12% over existing ensemble methods.
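Affinity propagation itself is more involved than fits here, so the sketch below uses a simple greedy grouping by output agreement as a stand-in for it; this is an illustration of the "cluster the base classifiers, then keep one representative per cluster" step only, on assumed toy threshold classifiers, and is not the paper's algorithm.

```python
def disagreement(a, b, data):
    # Fraction of probe samples on which two base classifiers disagree.
    return sum(a(x) != b(x) for x in data) / len(data)

def group_and_select(pool, data, threshold=0.25):
    # Greedy grouping by output agreement (a stand-in for affinity
    # propagation): a classifier joins an existing group when it is within
    # `threshold` disagreement of that group's representative; otherwise it
    # founds a new group. One representative per group forms the final
    # diverse ensemble.
    representatives = []
    for clf in pool:
        if all(disagreement(clf, r, data) > threshold for r in representatives):
            representatives.append(clf)
    return representatives

# Toy pool: near-duplicate thresholds form natural groups.
pool = [lambda x, t=t: 1 if x > t else 0 for t in (0.1, 0.15, 0.5, 0.55, 0.9)]
data = [i / 10 for i in range(11)]  # probe grid 0.0 … 1.0
diverse = group_and_select(pool, data)
print(len(diverse))
```

The effect matches the abstract's intent: near-identical classifiers collapse into one cluster, so the selected representatives are mutually diverse.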

10.
Time series prediction (TSP) is an important problem in machine learning. This paper proposes an ensemble incremental learning method based on kernel density estimation (KDE) for time series prediction. The algorithm first generates a pool of base learners following ensemble-learning principles. A kernel density estimate is then computed over the pool's predictions for a sample, and this estimate is used to prune the pool. The resulting pruned ensemble is used to predict the sample's output. Finally, the algorithm performs incremental learning on the nearest-neighbor set selected for the sample from a dynamic selection set. Experimental results on the IAP, ICS, and MCD data sets show that the proposed time series prediction algorithm improves to some extent on currently popular algorithms.
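The KDE-based pruning step might look roughly like the following. This is a toy sketch under stated assumptions, not the paper's algorithm: a Gaussian KDE is built over the base learners' predictions for a single sample, and learners whose predictions fall in low-density regions (outliers) are dropped.

```python
import math

def gaussian_kde(points, bandwidth):
    # Return a Gaussian kernel density function estimated from the
    # base learners' predicted values.
    n = len(points)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-((x - p) / bandwidth) ** 2 / 2) for p in points) / norm
    return density

def kde_prune(predictions, bandwidth=0.5, quantile=0.5):
    # Keep the learners whose predictions lie in the high-density region of
    # the KDE built from the whole pool's outputs; drop low-density outliers.
    density = gaussian_kde(predictions, bandwidth)
    scores = [density(p) for p in predictions]
    cutoff = sorted(scores)[int(len(scores) * (1 - quantile))]
    return [i for i, s in enumerate(scores) if s >= cutoff]

preds = [1.0, 1.1, 0.9, 1.05, 5.0]  # learner 4 is an outlier
kept = kde_prune(preds)
print(kept)
```

The intuition is that learners whose predictions agree with the bulk of the pool sit near the density mode, while stray predictions get low density scores and are pruned away.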

11.
12.
13.
Ensemble selection, which aims to select a proper subset of the original whole ensemble, can be seen as a combinatorial optimization problem, and it usually yields a pruned ensemble with better performance than the original one. Ensemble selection by greedy methods has drawn much attention, and many greedy ensemble selection algorithms have been proposed, many of which focus on the design of a new evaluation measure or on the study of different search directions. It is well accepted that diversity plays a crucial role in ensemble selection, and many diversity-based evaluation measures have been proposed with good success. However, most existing research has neglected the substantial local-optimum problem of greedy methods, which is precisely the issue addressed in this paper, where a new ensemble selection algorithm (GraspEnS) based on the Greedy Randomized Adaptive Search Procedure (GRASP) is proposed. The typical greedy ensemble selection approach is improved by the random factor incorporated into GraspEnS. Moreover, GraspEnS realizes multi-start searching and appropriately expands the search range of typical greedy approaches. Experimental results demonstrate that the newly devised GraspEnS algorithm achieves a final pruned subensemble with performance comparable to or better than its competitors.
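A minimal GRASP-style selection sketch follows, assuming toy 1-D threshold classifiers; the restricted candidate list (RCL) and the multi-start loop are the two GRASP ingredients the abstract highlights, and the specifics (pool, data, parameters) are illustrative rather than taken from the paper.

```python
import random

def acc(subset, val):
    # Majority-vote accuracy of a candidate subensemble.
    right = 0
    for x, y in val:
        votes = sum(c(x) for c in subset)
        right += (1 if votes * 2 > len(subset) else 0) == y
    return right / len(val)

def grasp_select(pool, val, size, starts=10, rcl_size=2, seed=0):
    # GRASP-style selection: each start builds a subensemble greedily but
    # picks at random from the top-`rcl_size` candidates (the restricted
    # candidate list), which lets the search escape the local optima that
    # trap pure greedy selection; the best start wins.
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(starts):
        chosen, remaining = [], list(pool)
        while len(chosen) < size and remaining:
            remaining.sort(key=lambda c: acc(chosen + [c], val), reverse=True)
            pick = rng.choice(remaining[:rcl_size])
            chosen.append(pick)
            remaining.remove(pick)
        a = acc(chosen, val)
        if a > best_acc:
            best, best_acc = chosen, a
    return best, best_acc

pool = [lambda x, t=t: 1 if x > t else 0 for t in (0.0, 0.3, 0.5, 0.7, 1.0)]
val = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]
best, score = grasp_select(pool, val, size=3)
print(score)
```

Setting `rcl_size=1` recovers plain greedy selection; larger values trade per-start quality for the diversification that multi-start GRASP exploits.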

14.
The Image Foresting Transform (IFT) is a tool for the design of image processing operators based on connectivity, which reduces image processing problems to an optimum-path forest problem in a graph derived from the image. A new image operator is presented, which solves segmentation by pruning trees of the forest. An IFT is applied to create an optimum-path forest whose roots are seed pixels, selected inside a desired object. In this forest, object and background are connected by optimum paths (leaking paths), which cross the object’s boundary through its “most weakly connected” parts (leaking pixels). These leaking pixels are automatically identified and their subtrees are eliminated, such that the remaining forest defines the object. Tree pruning runs in linear time, is extensible to multidimensional images, is free of ad hoc parameters, and requires only internal seeds, with little interference from the heterogeneity of the background. These aspects favor solutions for automatic segmentation. We present a formal definition of the obtained objects, algorithms, sufficient conditions for tree pruning, and two applications involving automatic segmentation: 3D MR-image segmentation of the human brain and image segmentation of license plates. Since its most competitive alternative is the watershed transform by markers, we also include a comparative analysis between the two.

15.
Learning appropriate statistical models is a fundamental data analysis task which has been the topic of continuing interest. Recently, finite Dirichlet mixture models have proved to be an effective and flexible model learning technique in several machine learning and data mining applications. In this article, the problem of learning and selecting finite Dirichlet mixture models is addressed using an expectation propagation (EP) inference framework. Within the proposed EP learning method, for finite mixture models, all the involved parameters and the model complexity (i.e. the number of mixture components), can be evaluated simultaneously in a single optimization framework. Extensive simulations using synthetic data along with two challenging real-world applications involving automatic image annotation and human action videos categorization demonstrate that our approach is able to achieve better results than comparable techniques.

16.
Spatial verification methods permit geometrically stable image matching, but still involve a difficult trade-off between robustness as regards incorrect rejection of true correspondences and discriminative power in terms of mismatches. To address this issue, we ask whether an ensemble of weak geometric constraints that correlates with visual similarity only slightly better than a bag-of-visual-words model performs better than a single strong constraint. We consider a family of spatial verification methods and decompose them into fundamental constraints imposed on pairs of feature correspondences. Encompassing such constraints leads us to propose a new method, which takes the best of existing techniques and functions as a unified Ensemble of pAirwise GEometric Relations (EAGER), in terms of both spatial contexts and between-image transformations. We also introduce a novel and robust reranking method, in which the object instances localized by EAGER in high-ranked database images are reissued as new queries. EAGER is extended to develop a smoothness constraint where the similarity between the optimized ranking scores of two instances should be maximally consistent with their geometrically constrained similarity. Reranking is newly formulated as two label propagation problems: one is to assess the confidence of new queries and the other to aggregate new independently executed retrievals. Extensive experiments conducted on four datasets show that EAGER and our reranking method outperform most of their state-of-the-art counterparts, especially when large-scale visual vocabularies are used.

17.
A Fast Generalized Predictive Control Algorithm Based on an Improved BP Network
An improved globally optimizing, adaptive, fast BP algorithm is proposed and applied to generalized predictive control (GPC), solving the speed problem that limits GPC in real-time control. Simulation results demonstrate its effectiveness.

18.
Pruning algorithms-a survey
A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
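The "train a network larger than necessary, then remove what is not needed" recipe is easiest to see in magnitude-based weight pruning, one of the simplest methods of the kind the survey covers; a toy sketch:

```python
def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude fraction of the weights: the simplest
    # instance of "train big, then remove the parts that are not needed".
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.01, -0.8, 0.03, 1.2, -0.002, 0.5]
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)  # [0.0, -0.8, 0.0, 1.2, 0.0, 0.5]
```

In practice such pruning is usually interleaved with retraining so the remaining weights can compensate for the removed ones, which is a recurring theme in the surveyed methods.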

19.
Ensemble tracking
We consider tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background. The ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as either belonging to the object or the background, giving a confidence map. The peak of the map and, hence, the new position of the object, is found using mean shift. Temporal coherence is maintained by updating the ensemble with new weak classifiers that are trained online during tracking. We show a realization of this method and demonstrate it on several video sequences.
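The mean-shift step that locates the confidence-map peak can be sketched in one dimension. This is a toy illustration of mode seeking on a confidence map, not the paper's tracker; the map, bandwidth, and starting point are assumed values.

```python
import math

def mean_shift_peak(confidence, start, bandwidth=2.0, iters=50):
    # Mean shift on a 1-D confidence map: repeatedly move to the
    # confidence-weighted mean of positions inside a Gaussian window,
    # converging on a local peak (the object's new position).
    pos = float(start)
    for _ in range(iters):
        num = den = 0.0
        for i, c in enumerate(confidence):
            w = c * math.exp(-((i - pos) / bandwidth) ** 2 / 2)
            num += w * i
            den += w
        new = num / den
        if abs(new - pos) < 1e-6:
            break
        pos = new
    return pos

conf = [0.0, 0.1, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0]  # confidence peaks near index 4
print(mean_shift_peak(conf, start=1))  # lands near the peak
```

In the full tracker the same hill-climb runs in 2-D over the per-pixel confidence produced by the AdaBoost strong classifier.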

20.
The technology space that includes ubiquitous computing, information appliances, and pervasive and situated systems is the subject of intense interest among the research community. In this article, we examine some of the principles implicit in this space and introduce the idea of "ensemble computing" to describe technologies that extend this space.
