Similar Documents
10 similar documents found.
1.
Fern, Alan; Givan, Robert. 《Machine Learning》, 2003, 53(1-2): 71-109.
We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published boosting by filtering framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting and bagging-style online ensemble methods. Our results evaluate these methods on both our branch prediction domain and online variants of three familiar machine-learning benchmarks. Our data justifies three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and additionally are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried showed poor performance relative to the boosting-style algorithm (but still improve upon individual base learners). Third, we show that ensembles of small trees are often able to outperform large single trees with the same number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains such as branch prediction with tight space restrictions (i.e., the available real-estate on a microprocessor chip).
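To make the online-ensemble setting concrete, the following is a minimal sketch of an online bagging-style ensemble in the spirit of Poisson-based online bagging; it is not the ID4-based boosting-by-filtering method of the paper, and the SGDClassifier base learner, the Poisson(1) rate, and the majority vote are illustrative assumptions.

```python
# Illustrative online bagging-style ensemble (assumption: Poisson(1) resampling,
# SGDClassifier base learners); NOT the ID4-based ensembles studied in the paper.
import numpy as np
from sklearn.linear_model import SGDClassifier


class OnlineBaggingSketch:
    def __init__(self, n_members=10, classes=(0, 1), seed=0):
        self.members = [SGDClassifier(random_state=seed + i) for i in range(n_members)]
        self.classes = np.array(classes)
        self.rng = np.random.default_rng(seed)

    def update(self, x, y):
        """Present a single example; each member sees it k ~ Poisson(1) times,
        which approximates bootstrap resampling in the online setting."""
        x = np.asarray(x, dtype=float).reshape(1, -1)
        for member in self.members:
            for _ in range(self.rng.poisson(1.0)):
                member.partial_fit(x, [y], classes=self.classes)

    def predict(self, x):
        x = np.asarray(x, dtype=float).reshape(1, -1)
        votes = [m.predict(x)[0] for m in self.members if hasattr(m, "coef_")]
        # Simple majority vote over the members that have been trained so far.
        return max(set(votes), key=votes.count) if votes else self.classes[0]
```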

2.
Bagging, boosting, rotation forest and random subspace methods are well known re-sampling ensemble methods that generate and combine a diversity of learners using the same learning algorithm for the base-classifiers. Boosting and rotation forest algorithms are considered stronger than bagging and random subspace methods on noise-free data. However, there are strong empirical indications that bagging and random subspace methods are much more robust than boosting and rotation forest in noisy settings. For this reason, in this work we build an ensemble of bagging, boosting, rotation forest and random subspace ensembles, with six sub-classifiers in each, and use a voting methodology for the final prediction. We compare this technique with simple bagging, boosting, rotation forest and random subspace ensembles of 25 sub-classifiers, as well as with other well known combining methods, on standard benchmark datasets, and the proposed technique achieves better accuracy in most cases.
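A minimal sketch of the voting-over-ensembles idea, assuming scikit-learn: Rotation Forest has no scikit-learn implementation, so it is omitted here, and the random subspace method is approximated with a feature-subsampling BaggingClassifier. The dataset, base learners, and subset sizes are illustrative, not the paper's experimental setup.

```python
# Sketch of combining several component ensembles (6 sub-classifiers each) by voting.
# Assumptions: scikit-learn is available; Rotation Forest is omitted; random
# subspaces are approximated via feature subsampling without bootstrapping.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=6, random_state=0)
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=6,
                              random_state=0)
subspace = BaggingClassifier(DecisionTreeClassifier(), n_estimators=6,
                             bootstrap=False, max_features=0.5, random_state=0)

combined = VotingClassifier(
    estimators=[("bag", bagging), ("boost", boosting), ("rsm", subspace)],
    voting="hard",  # majority vote over the component ensembles
)
print("combined 5-fold CV accuracy:", cross_val_score(combined, X, y, cv=5).mean())
```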

3.
Experiments with AdaBoost.RT, an improved boosting scheme for regression
The application of boosting techniques to regression problems has received relatively little attention in contrast to research aimed at classification problems. This letter describes a new boosting algorithm, AdaBoost.RT, for regression problems. Its idea is to filter out the examples whose relative estimation error is higher than a preset threshold value, and then follow the AdaBoost procedure. Thus, it requires selecting a suboptimal value of the error threshold to demarcate examples as poorly or well predicted. Some experimental results using the M5 model tree as a weak learning machine for several benchmark data sets are reported. The results are compared to other boosting methods, bagging, artificial neural networks, and a single M5 model tree. The preliminary empirical comparisons show higher performance of AdaBoost.RT for most of the considered data sets.
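A minimal sketch of the thresholded-reweighting loop described above; the regression-tree weak learner, the threshold phi, and the power applied to the round error are illustrative choices rather than the paper's exact settings.

```python
# Sketch of an AdaBoost.RT-style loop: examples whose relative error exceeds the
# threshold phi are treated as "misclassified" for the weight update.
# Assumptions: depth-3 regression trees as weak learners, phi and power chosen ad hoc.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def adaboost_rt_sketch(X, y, rounds=10, phi=0.1, power=2):
    m = len(y)
    weights = np.full(m, 1.0 / m)
    learners, betas = [], []
    for _ in range(rounds):
        tree = DecisionTreeRegressor(max_depth=3).fit(X, y, sample_weight=weights)
        rel_err = np.abs(tree.predict(X) - y) / np.maximum(np.abs(y), 1e-12)
        bad = rel_err > phi                      # "poorly predicted" examples
        eps = weights[bad].sum()                 # weighted error rate of this round
        if eps <= 0.0 or eps >= 1.0:
            break
        beta = eps ** power
        weights[~bad] *= beta                    # shrink weights of well-predicted examples
        weights /= weights.sum()
        learners.append(tree)
        betas.append(beta)
    coef = np.log(1.0 / np.array(betas))
    coef /= coef.sum()

    def predict(X_query):
        # Final output: weak learners combined with weights proportional to log(1/beta_t).
        return sum(c * t.predict(X_query) for c, t in zip(coef, learners))

    return predict
```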

4.
An Analysis of the Effectiveness of AdaBoost
In machine learning, the weak learning theorem states that as long as a weak learning algorithm slightly better than random guessing can be found, a strong learning algorithm of arbitrary error precision can be constructed in a certain way. AdaBoost and Bagging are the most commonly used methods based on this theory. However, the error analyses of AdaBoost and Bagging are not yet unified; the training error used by AdaBoost is not the true training error but an error based on sample weights, and whether this is reasonable needs explanation; the conditions that ensure AdaBoost's effectiveness also need an intuitive explanation for practical use. After adjusting the error rate of Bagging and adopting weighted voting, this paper unifies the algorithmic procedures and error analyses of AdaBoost and Bagging, and analyzes the effectiveness of AdaBoost on the basis of interpreting and proving the weak learning theorem via the law of large numbers. It points out that the purpose of AdaBoost's sample-weight adjustment strategy is to keep the distribution of correctly classified samples uniform, that the training error it uses is equal in probability to the true training error, and it states the principles that should be followed when training the weak learners to ensure AdaBoost's effectiveness. This not only explains the effectiveness of AdaBoost but also provides a method for constructing new ensemble learning algorithms. Following AdaBoost, some suggestions are also made for Bagging's training-set selection strategy.
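For reference, the weighted "training error" and the weight-update rule the abstract refers to can be written in standard discrete AdaBoost notation (this is the textbook formulation, not the paper's own derivation):

```latex
% Standard discrete AdaBoost notation: weighted error, classifier coefficient,
% and the renormalized weight update discussed in the abstract.
\[
  \varepsilon_t = \sum_{i=1}^{m} D_t(i)\,\mathbf{1}\{h_t(x_i) \neq y_i\},
  \qquad
  \alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
  \qquad
  D_{t+1}(i) = \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}.
\]
```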

5.
AdaBoost can be derived by sequential minimization of the exponential loss function. It implements the learning process by exponentially reweighting examples according to classification results. However, the weights are often too sharply tuned, so that AdaBoost suffers from nonrobustness and overlearning. We propose a new boosting method that is a slight modification of AdaBoost. The loss function is defined by a mixture of the exponential loss and naive error loss functions. As a result, the proposed method incorporates the effect of forgetfulness into AdaBoost. The statistical significance of our method is discussed, and simulations are presented for confirmation.
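One plausible way to write such a mixed loss on the margin u = y f(x) is sketched below; the exact mixture used by the authors may differ, so this is only an illustration of the idea of blending the exponential loss with the naive (0-1) error:

```latex
% Illustrative mixed loss (assumption: convex combination with weight eta),
% blending the exponential loss with the naive 0-1 error on the margin u = y f(x).
\[
  L_{\eta}(u) \;=\; (1-\eta)\, e^{-u} \;+\; \eta\, \mathbf{1}\{u \le 0\},
  \qquad 0 \le \eta \le 1 .
\]
```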

6.
We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of ℓ1-norm-regularized AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution by maximizing margins and at the same time controlling the margin variance. We also theoretically prove that, approximately, ℓ1-norm-regularized AdaBoost maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to those of standard stagewise additive boosting algorithms but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.
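Schematically, the entropy-maximization duals referred to above take roughly the following form over the sample weights w (notation simplified and constraints abbreviated; see the paper for the exact formulation):

```latex
% Schematic entropy-maximization dual (simplified): maximize the Shannon entropy
% of the sample weights subject to bounded edges of all weak classifiers h_j.
\[
  \max_{w \in \Delta_m}\; -\sum_{i=1}^{m} w_i \log w_i
  \quad\text{s.t.}\quad
  \sum_{i=1}^{m} w_i\, y_i\, h_j(x_i) \;\le\; r \quad \forall j ,
\]
% where \Delta_m is the probability simplex and r is tied to the regularization strength.
```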

7.
Asymmetric AdaBoost Algorithms and Their Application to Object Detection
葛俊锋; 罗予频. 《自动化学报》(Acta Automatica Sinica), 2009, 35(11): 1403-1409.
Addressing the asymmetric classification problem in object detection, and building on an analysis of existing cost-sensitive (i.e., asymmetric) learning algorithms obtained by extending the discrete AdaBoost algorithm, this paper proposes a unified framework for deriving asymmetric AdaBoost algorithms, centered on three different upper bounds of the asymmetric error rate. Under this framework, not only do the relationships among existing discrete asymmetric AdaBoost algorithms become very clear, but the parts that do not conform to the theoretical derivation can also be corrected easily. Meanwhile, by minimizing these three upper bounds with different optimization methods, asymmetric extensions of real-valued AdaBoost algorithms are derived (denoted Asym-Real AdaBoost and Asym-Gentle AdaBoost). The new algorithms are not only more convenient than the existing discrete algorithms in computing the combination coefficients of the weak classifiers, but experiments also show that they achieve better performance than both the traditional symmetric AdaBoost algorithm and the discrete asymmetric AdaBoost algorithms in face detection and pedestrian detection.
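As a point of reference only, the sketch below shows a common, simple way to make discrete AdaBoost cost-sensitive by initializing the sample weights with asymmetric misclassification costs; it is not the Asym-Real or Asym-Gentle AdaBoost algorithms derived in the paper, and the stump weak learners and cost values are illustrative assumptions.

```python
# Illustrative cost-sensitive (asymmetric) discrete AdaBoost sketch.
# Assumptions: labels y in {-1, +1}, decision stumps as weak learners, and
# asymmetry introduced only through cost-weighted initial weights. This is NOT
# the Asym-Real / Asym-Gentle AdaBoost algorithms proposed in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def asymmetric_adaboost_sketch(X, y, rounds=20, c_pos=2.0, c_neg=1.0):
    y = np.asarray(y)
    cost = np.where(y == 1, c_pos, c_neg)        # missed positives cost more
    weights = cost / cost.sum()
    stumps, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = weights[pred != y].sum()
        if err <= 0.0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        weights *= np.exp(-alpha * y * pred)     # usual exponential reweighting
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(X_query):
        score = sum(a * s.predict(X_query) for a, s in zip(alphas, stumps))
        return np.sign(score)

    return predict
```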

8.
Rotation Forest, an effective ensemble classifier generation technique, works by using principal component analysis (PCA) to rotate the original feature axes so that different training sets for learning base classifiers can be formed. This paper presents a variant of Rotation Forest, which can be viewed as a combination of Bagging and Rotation Forest. Bagging is used here to inject more randomness into Rotation Forest in order to increase the diversity among the ensemble members. The experiments conducted with 33 benchmark classification data sets available from the UCI repository, in which a classification tree is adopted as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost and Rotation Forest. The bias–variance analysis of error performance shows that the proposed method improves the prediction error of a single classifier by reducing the variance term much more than the other considered ensemble procedures. Furthermore, the results computed on the data sets with artificial classification noise indicate that the new method is more robust to noise, and kappa-error diagrams are employed to investigate the diversity–accuracy patterns of the ensemble classifiers.
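A minimal sketch of the Bagging-plus-Rotation-Forest idea described above, assuming NumPy and scikit-learn: each tree is grown on a bootstrap sample whose feature space has been rotated by PCA fitted on disjoint feature subsets. The subset count, full-rank PCA, and unpruned trees are illustrative choices, not the paper's exact procedure.

```python
# Sketch of Bagging + Rotation Forest: bootstrap sample per tree, PCA rotation
# built from disjoint feature subsets. Assumes non-negative integer class labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_rotation_forest(X, y, n_trees=10, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ensemble = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, n)                  # bootstrap sample (the bagging step)
        rotation = np.zeros((d, d))
        for subset in np.array_split(rng.permutation(d), n_subsets):
            pca = PCA().fit(X[boot][:, subset])       # rotate this feature subset
            rotation[np.ix_(subset, subset)] = pca.components_.T
        tree = DecisionTreeClassifier().fit(X[boot] @ rotation, y[boot])
        ensemble.append((rotation, tree))
    return ensemble


def predict_bagged_rotation_forest(ensemble, X_query):
    votes = np.stack([tree.predict(X_query @ R) for R, tree in ensemble])
    # Majority vote over the trees, column by column.
    return np.apply_along_axis(lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```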

9.
Research on the Effectiveness of Linear Combinations of Classifiers and the Optimal Combination Problem
Improving classification accuracy by combining multiple classifiers is a major research topic in machine learning, and the weak learning theorem guarantees the feasibility of this line of research. Linear combination of classifiers, i.e., weighted voting, is the most commonly used combination method; the widely used AdaBoost and Bagging algorithms both adopt weighted voting. Both the effectiveness of classifier combination and the problem of finding the optimal combination need to be resolved. Under the conditions that the individual classifiers are mutually uncorrelated and the number of classifiers is large, this paper derives conditions on the combination coefficients under which the combination is effective, as well as a formula for the optimal combination coefficients, and gives an error analysis of the combined classifier. The conclusions show that when the error rates of the individual classifiers share a uniform bound, even simple voting ensures that the error rate of the combined classifier decreases exponentially as the number of classifiers increases. On this basis, and following AdaBoost, several new ensemble learning algorithms are proposed, in particular an ensemble learning algorithm aimed directly at rapidly improving the classification accuracy of the combined classifier; its rationality and soundness are analyzed, and it extends the traditional approach of training and selecting classifiers with the sole goal of minimizing the error rate. From another perspective, it is proved that the combination adopted in AdaBoost is not only effective but, under certain conditions, equivalent to the optimal combination. For multi-class problems, combination theory and conclusions similar to those for the two-class case are obtained, including conditions for effective combination, the optimal combination, and error estimates, and a certain extension of AdaBoost is also given.
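The exponential decrease claimed for simple voting can be illustrated with a standard Hoeffding-type bound (this is the textbook bound for independent voters, not the paper's own derivation): with N mutually independent classifiers each having error probability p < 1/2, the majority vote errs only when at least half of them err, so

```latex
% Illustrative Hoeffding bound for simple majority voting over N independent
% classifiers, each with error probability p < 1/2.
\[
  \Pr[\text{majority vote errs}]
  \;\le\;
  \Pr\!\Bigl[\tfrac{1}{N}\textstyle\sum_{i=1}^{N}\mathbf{1}\{h_i \text{ errs}\} \ge \tfrac{1}{2}\Bigr]
  \;\le\;
  \exp\!\bigl(-2N(\tfrac{1}{2}-p)^{2}\bigr).
\]
```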

10.
Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. However, this advantage has not been seen to materialize in experiments. Decision trees are easy to simplify using pruning. Reduced error pruning is one of the simplest decision tree pruning algorithms. For branching programs no pruning algorithms are known. In this paper we prove that reduced error pruning of branching programs is infeasible. Finding the optimal pruning of a branching program with respect to a set of pruning examples that is separate from the set of training examples is NP-complete. Because of this intractability result, we have to consider approximating reduced error pruning. Unfortunately, it turns out that even finding an approximate solution of arbitrary accuracy is computationally infeasible. In particular, reduced error pruning of branching programs is APX-hard. Our experiments show that, despite the negative theoretical results, heuristic pruning of branching programs can reduce their size without significantly altering the accuracy.
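For contrast with the hardness results above, reduced error pruning of an ordinary decision tree is straightforward; the sketch below uses a toy node representation (the Node class, its fields, and the tie-breaking choices are assumptions for illustration, not code from the paper): replace a subtree by its stored majority-class leaf whenever doing so does not increase error on a separate pruning set, proceeding bottom-up.

```python
# Illustrative reduced error pruning for a plain decision tree (toy representation).
# Assumption: each node stores the majority class of the training examples that
# reached it in `label`; a subtree is collapsed only if pruning-set error does
# not increase. This is the decision-tree baseline, not the branching-program case.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None      # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0                     # majority class of training examples at this node


def classify(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def errors(node, X, y):
    return sum(classify(node, xi) != yi for xi, yi in zip(X, y))


def reduced_error_prune(node, X_prune, y_prune):
    """Bottom-up: prune the children first, then try to collapse this node."""
    if node.feature is None or not X_prune:
        return node
    go_left = [xi[node.feature] <= node.threshold for xi in X_prune]
    node.left = reduced_error_prune(node.left,
                                    [xi for xi, g in zip(X_prune, go_left) if g],
                                    [yi for yi, g in zip(y_prune, go_left) if g])
    node.right = reduced_error_prune(node.right,
                                     [xi for xi, g in zip(X_prune, go_left) if not g],
                                     [yi for yi, g in zip(y_prune, go_left) if not g])
    leaf = Node(label=node.label)
    if errors(leaf, X_prune, y_prune) <= errors(node, X_prune, y_prune):
        return leaf                    # collapsing does not hurt pruning-set accuracy
    return node
```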

