Similar Documents
20 similar documents found
1.
Fern, Alan; Givan, Robert. Machine Learning, 2003, 53(1-2): 71-109
We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published boosting by filtering framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting and bagging-style online ensemble methods. Our results evaluate these methods on both our branch prediction domain and online variants of three familiar machine-learning benchmarks. Our data justifies three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and additionally are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried showed poor performance relative to the boosting-style algorithm (but still improve upon individual base learners). Third, we show that ensembles of small trees are often able to outperform large single trees with the same number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains such as branch prediction with tight space restrictions (i.e., the available real estate on a microprocessor chip).
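The paper's ID4-based trees and its own boosting and bagging variants are not spelled out in this abstract; as a generic illustration of the online-ensemble idea, here is a minimal sketch of Poisson-based online bagging in the style of Oza and Russell, with scikit-learn's SGDClassifier standing in for an online base learner (all data, names, and parameters here are illustrative, not the paper's):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def online_bagging_update(members, x, y, classes):
    """One step of Poisson-based online bagging: each ensemble member sees
    the new example k ~ Poisson(1) times, approximating a bootstrap online."""
    for m in members:
        for _ in range(rng.poisson(1.0)):
            m.partial_fit([x], [y], classes=classes)

# usage: an ensemble of any incremental base learners
members = [SGDClassifier(random_state=i) for i in range(10)]
online_bagging_update(members, x=[0.5, -1.2], y=1, classes=[0, 1])
```

Prediction would then be a plain (or weighted) vote over the members.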

2.
Bagging, boosting, rotation forest, and random subspace methods are well-known resampling ensemble methods that generate and combine a diversity of learners using the same learning algorithm for the base classifiers. Boosting and rotation forest algorithms are considered stronger than bagging and random subspace methods on noise-free data. However, there are strong empirical indications that bagging and random subspace methods are much more robust than boosting and rotation forest in noisy settings. For this reason, in this work we build an ensemble of bagging, boosting, rotation forest, and random subspace ensembles, with 6 sub-classifiers in each, and use a voting methodology for the final prediction. We compared this scheme with plain bagging, boosting, rotation forest, and random subspace ensembles of 25 sub-classifiers, as well as other well-known combining methods, on standard benchmark datasets; the proposed technique had better accuracy in most cases.
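A minimal sketch of the ensemble-of-ensembles idea, assuming scikit-learn and 6 sub-classifiers per component ensemble; rotation forest has no stock scikit-learn implementation and is omitted, and the random subspace method is emulated as feature-subset bagging:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              VotingClassifier)
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# three small homogeneous ensembles, combined by (hard) majority vote
combo = VotingClassifier([
    ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=6)),
    ("boosting", AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                    n_estimators=6)),
    # random subspace method: sample features, not examples
    ("subspace", BaggingClassifier(DecisionTreeClassifier(), n_estimators=6,
                                   bootstrap=False, max_features=0.5)),
])
print(combo.fit(X, y).score(X, y))
```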

3.
Experiments with AdaBoost.RT, an improved boosting scheme for regression
The application of boosting techniques to regression problems has received relatively little attention in contrast to research aimed at classification problems. This letter describes a new boosting algorithm, AdaBoost.RT, for regression problems. Its idea is to flag the examples whose relative estimation error exceeds a preset threshold value and then to follow the AdaBoost procedure, so it requires selecting a suitable value of the error threshold to demarcate examples as poorly or well predicted. Some experimental results using the M5 model tree as a weak learning machine for several benchmark data sets are reported. The results are compared to other boosting methods, bagging, artificial neural networks, and a single M5 model tree. The preliminary empirical comparisons show higher performance of AdaBoost.RT for most of the considered data sets.
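A compact sketch of the AdaBoost.RT loop described above, with a depth-limited regression tree standing in for the paper's M5 model tree; the threshold phi, the power on the error rate, and the tree depth are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def adaboost_rt(X, y, phi=0.1, T=10, power=2):
    """Minimal AdaBoost.RT sketch: examples whose absolute relative error
    exceeds phi count as 'misclassified'; the rest are down-weighted so the
    next round concentrates on the poorly predicted ones."""
    m = len(y)
    w = np.full(m, 1.0 / m)
    models, betas = [], []
    for _ in range(T):
        f = DecisionTreeRegressor(max_depth=4).fit(X, y, sample_weight=w)
        are = np.abs(f.predict(X) - y) / np.abs(y)   # assumes no zero targets
        eps = w[are > phi].sum()                     # weighted 'error rate'
        if eps >= 1.0:
            break
        beta = max(eps, 1e-12) ** power
        w = np.where(are <= phi, w * beta, w)        # shrink the easy examples
        w /= w.sum()
        models.append(f)
        betas.append(beta)
        if eps == 0.0:
            break
    coef = np.log(1.0 / np.array(betas))
    coef /= coef.sum()                               # log(1/beta_t) vote weights
    return lambda Xq: sum(c * f.predict(Xq) for c, f in zip(coef, models))
```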

4.
AdaBoost can be derived by sequential minimization of the exponential loss function. It implements the learning process by exponentially reweighting examples according to classification results. However, the weights are often too sharply tuned, so that AdaBoost suffers from nonrobustness and overlearning. We propose a new boosting method that is a slight modification of AdaBoost. The loss function is defined by a mixture of the exponential loss and the naive error loss functions. As a result, the proposed method incorporates the effect of forgetfulness into AdaBoost. The statistical significance of our method is discussed, and simulations are presented for confirmation.
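To illustrate the effect of mixing losses, the toy computation below compares normalized example weights under the pure exponential loss and under a mixture with a smooth, bounded surrogate for the naive error loss (the sigmoid surrogate and the mixing weight eta are assumptions for illustration, not the paper's exact functional form):

```python
import numpy as np

def weights_exponential(margin):
    return np.exp(-margin)                 # pure AdaBoost: unbounded growth

def weights_mixture(margin, eta=0.2):
    # mixture of the exponential loss and a smooth sigmoid stand-in for the
    # naive (0-1) error loss; only the exponential part keeps growing
    return eta * np.exp(-margin) + (1 - eta) / (1 + np.exp(margin))

m = np.array([-4.0, -1.0, 0.0, 1.0])       # one badly misclassified example
for w in (weights_exponential(m), weights_mixture(m)):
    print(np.round(w / w.sum(), 3))        # mixture concentrates less sharply
                                           # on the worst example
```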

5.
An Analysis of the Effectiveness of AdaBoost (关于AdaBoost有效性的分析)
In machine learning, the weak learnability theorem states that if a weak learning algorithm only slightly better than random guessing can be found, it can be used to construct, in a suitable way, a strong learning algorithm of arbitrary accuracy. AdaBoost and Bagging are the most widely used methods built on this theorem. However, the error analyses of AdaBoost and Bagging are not yet unified; the training error used by AdaBoost is not the true training error but an error based on sample weights, and whether this is reasonable needs to be explained; the conditions that guarantee AdaBoost's effectiveness also need an intuitive interpretation so they can be applied in practice. After adjusting Bagging's error rate and adopting weighted voting, this paper unifies the algorithmic flow and error analysis of AdaBoost and Bagging, and, building on an explanation and proof of the weak learnability theorem via the law of large numbers, analyzes the effectiveness of AdaBoost. It points out that the purpose of AdaBoost's sample-weight adjustment strategy is to keep the distribution of correctly classified samples uniform, that the training error AdaBoost uses is equal in probability to the true training error, and it states the principles that should be followed when training the weak learners to guarantee AdaBoost's effectiveness. This not only explains why AdaBoost works but also provides a way to construct new ensemble learning algorithms. Following AdaBoost, some suggestions for Bagging's training-set selection strategy are also proposed.
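The distinction between the weight-based round errors and the true training error of the combined vote can be inspected directly with scikit-learn's AdaBoost implementation; this snippet (illustrative data and parameters) prints both:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=20, random_state=0).fit(X, y)

# eps_t: each round's error measured under that round's sample weights
print(np.round(clf.estimator_errors_, 3))
# true 0-1 training error of the combined vote after each round
print([round(float(np.mean(p != y)), 3) for p in clf.staged_predict(X)])
```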

6.
We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of ℓ1-norm-regularized AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution by maximizing margins while at the same time controlling the margin variance. We also theoretically prove that, approximately, ℓ1-norm-regularized AdaBoost maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to those of standard stagewise additive boosting algorithms but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.
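The margin quantities discussed above are easy to compute for a trained binary AdaBoost ensemble; a small sketch assuming scikit-learn, whose decision_function already normalizes by the total estimator weight:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# normalized margins y * f(x) in [-1, 1]; for binary problems sklearn's
# decision_function divides the weighted vote by the total estimator weight
margins = np.where(y == clf.classes_[1], 1, -1) * clf.decision_function(X)
print("min margin:", margins.min())
print("average margin:", margins.mean(), " variance:", margins.var())
```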

7.
Asymmetric AdaBoost Algorithms and Their Application to Object Detection (非对称AdaBoost算法及其在目标检测中的应用)
Ge Junfeng; Luo Yupin. Acta Automatica Sinica (自动化学报), 2009, 35(11): 1403-1409
For the asymmetric classification problem in object detection, building on an analysis of the existing cost-sensitive (i.e., asymmetric) learning algorithms extended from discrete AdaBoost, this paper proposes a unified framework for deriving asymmetric AdaBoost algorithms centered on three different upper bounds of the asymmetric error rate. Under this framework, not only do the relationships among the existing discrete asymmetric AdaBoost algorithms become very clear, but the parts of them that do not conform to the theoretical derivation can also easily be corrected. Meanwhile, by minimizing these three upper bounds with different optimization methods, asymmetric extensions of real-valued AdaBoost (denoted Asym-Real AdaBoost and Asym-Gentle AdaBoost) are derived. The new algorithms are not only more convenient than the existing discrete algorithms in computing the combination coefficients of the weak classifiers, but experiments also show that they achieve better performance than the traditional symmetric AdaBoost and the discrete asymmetric AdaBoost algorithms on both face detection and pedestrian detection.
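The paper's framework of three asymmetric error-rate bounds is not reproduced here; as a generic illustration of cost-sensitive reweighting, here is an AdaC2-style update in which the usual exponential factor is scaled by a per-example cost (the cost values are illustrative):

```python
import numpy as np

def adac2_update(w, y, h, alpha, cost_pos=2.0, cost_neg=1.0):
    """One AdaC2-style cost-sensitive weight update (labels y and weak
    predictions h in {-1, +1}): the exponential factor is scaled by a
    per-example cost so misses on the rare target class are punished harder."""
    cost = np.where(y == 1, cost_pos, cost_neg)
    w = w * cost * np.exp(-alpha * y * h)
    return w / w.sum()
```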

8.
Rotation Forest, an effective ensemble classifier generation technique, works by using principal component analysis (PCA) to rotate the original feature axes so that different training sets for learning base classifiers can be formed. This paper presents a variant of Rotation Forest, which can be viewed as a combination of Bagging and Rotation Forest. Bagging is used here to inject more randomness into Rotation Forest in order to increase the diversity of the ensemble membership. Experiments conducted on 33 benchmark classification data sets from the UCI repository, with a classification tree adopted as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost, and Rotation Forest. The bias-variance analysis of error performance shows that the proposed method improves the prediction error of a single classifier by reducing the variance term much more than the other considered ensemble procedures. Furthermore, results computed on the data sets with artificial classification noise indicate that the new method is more robust to noise; kappa-error diagrams are employed to investigate the diversity-accuracy patterns of the ensemble classifiers.
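A minimal sketch of the per-tree training step that combines Bagging with Rotation Forest's PCA rotation, assuming NumPy and scikit-learn; the subset count and tree settings are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def train_rotation_tree(X, y, n_subsets=3, seed=None):
    """Train one base tree on a per-tree rotation of the feature space:
    features are split into subsets, PCA is fit per subset on a bootstrap
    sample (the extra Bagging-style randomness), and the per-subset loadings
    are assembled into a block rotation matrix R."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    subsets = np.array_split(rng.permutation(n), n_subsets)
    R = np.zeros((n, n))
    for idx in subsets:
        rows = rng.choice(len(X), size=len(X), replace=True)  # bootstrap
        R[np.ix_(idx, idx)] = PCA().fit(X[np.ix_(rows, idx)]).components_.T
    tree = DecisionTreeClassifier(random_state=0).fit(X @ R, y)
    return tree, R   # predict on new data with tree.predict(X_new @ R)
```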

9.
Research on the Effectiveness and Optimal Combination of Linear Classifier Combinations (分类器线性组合的有效性和最佳组合问题的研究)
Improving classification accuracy by combining multiple classifiers is a major research topic in machine learning, and the weak learnability theorem guarantees the feasibility of this line of research. Linear combination of classifiers, i.e., weighted voting, is the most common combination method; the widely used AdaBoost and Bagging algorithms both adopt weighted voting. Both the effectiveness of classifier combination and the question of the optimal combination need to be addressed. Under the conditions that the individual classifiers are mutually uncorrelated and sufficiently numerous, this paper obtains the conditions on the combination coefficients for an effective combination, as well as a formula for the optimal combination coefficients, and gives an error analysis of the combined classifier. The conclusions show that when the individual classifiers' error rates share a common bound, even simple voting ensures that the combined classifier's error rate decreases exponentially as the number of classifiers increases. On this basis, following AdaBoost, some new ensemble learning algorithms are proposed, in particular an ensemble learning algorithm aimed directly at rapidly improving the accuracy of the combined classifier; its rationality is analyzed. It extends the traditional approach of training and selecting classifiers by minimizing the error rate. From another angle, it is proved that the combination adopted by AdaBoost is not only effective but, under certain conditions, equivalent to the optimal combination. For multi-class problems, combination theory and conclusions similar to those for the binary case are obtained, including conditions for an effective combination, the optimal combination, and error estimates. AdaBoost is also extended to some extent.
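The exponential error decay under simple voting can be checked with a worked binomial-tail computation: if the individual classifiers err independently with a common rate below 1/2, the probability that the majority errs shrinks exponentially with the ensemble size (the numbers below are illustrative):

```python
from scipy.stats import binom

p = 0.4                      # each voter's error rate: independent, below 1/2
for n in (5, 25, 125, 625):  # odd ensemble sizes, so votes cannot tie
    majority_wrong = 1 - binom.cdf(n // 2, n, p)   # more than n/2 voters err
    print(n, majority_wrong)
```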

10.
Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. However, this advantage has not been seen to materialize in experiments. Decision trees are easy to simplify using pruning. Reduced error pruning is one of the simplest decision tree pruning algorithms. For branching programs no pruning algorithms are known. In this paper we prove that reduced error pruning of branching programs is infeasible. Finding the optimal pruning of a branching program with respect to a set of pruning examples that is separate from the set of training examples is NP-complete. Because of this intractability result, we have to consider approximating reduced error pruning. Unfortunately, it turns out that even finding an approximate solution of arbitrary accuracy is computationally infeasible. In particular, reduced error pruning of branching programs is APX-hard. Our experiments show that, despite the negative theoretical results, heuristic pruning of branching programs can reduce their size without significantly altering the accuracy.
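For contrast with the branching-program hardness results, reduced error pruning of a plain decision tree is straightforward; a self-contained sketch over a toy dict-based tree representation (the node layout is an assumption for illustration):

```python
import numpy as np

# A node is a dict: a leaf is {"label": c}; an internal node additionally has
# {"feature": j, "thresh": t, "left": node, "right": node}, with "label"
# caching the majority training class at that node.

def predict(node, x):
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["thresh"] else node["right"]
    return node["label"]

def accuracy(node, X, y):
    return float(np.mean(np.array([predict(node, x) for x in X]) == y))

def reduced_error_prune(node, X, y):
    """Bottom-up reduced error pruning against a held-out pruning set (X, y):
    a subtree is collapsed to its majority leaf unless that hurts accuracy."""
    if "feature" not in node or len(y) == 0:
        return node
    mask = X[:, node["feature"]] <= node["thresh"]
    node["left"] = reduced_error_prune(node["left"], X[mask], y[mask])
    node["right"] = reduced_error_prune(node["right"], X[~mask], y[~mask])
    leaf = {"label": node["label"]}
    if accuracy(leaf, X, y) >= accuracy(node, X, y):
        return leaf
    return node
```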

11.
Privacy protection in machine learning is currently one of the research hotspots in information security. For classification under privacy protection, this paper proposes an AdaBoost ensemble classification algorithm based on differential privacy: CART-DPsAdaBoost (CART-Differential Privacy structure of AdaBoost). During boosting, the algorithm incorporates the basic idea of Bagging to increase the diversity of the drawn samples. In the feature perturbation based on the random subspace algorithm, it uses the exponential mechanism to select split points for continuous features and the Gini index to select the best discrete features, builds CART boosted trees as the base classifiers of the ensemble, and adds noise according to the Laplace mechanism. Throughout the algorithm, the privacy budget is allocated appropriately to meet the requirements of differential privacy. The experiments analyze the effect of the privacy level on the ensemble model at different tree depths and derive the optimal tree depth and privacy-budget range. Compared with similar algorithms, the method requires no discretization preprocessing of the data; experimental results on the Adult and Census Income datasets show that the model achieves good classification accuracy while balancing privacy and utility. In addition, the introduction of the two randomization schemes, sample perturbation and feature perturbation, makes it possible to handle large-scale, high-dimensional classification problems effectively.
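The two differential-privacy building blocks the algorithm relies on, the Laplace mechanism for numeric releases and the exponential mechanism for selections such as split points, can be sketched in a few lines (a generic illustration, not the paper's CART-DPsAdaBoost code):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, epsilon):
    """Release value + Laplace noise with scale sensitivity/epsilon."""
    return value + rng.laplace(scale=sensitivity / epsilon)

def exponential_mechanism(candidates, scores, sensitivity, epsilon):
    """Pick a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = np.asarray(scores, dtype=float)
    p = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]
```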

12.
This paper proposes a new boosting algorithm, LAdaBoost. LAdaBoost uses a local error rate to update the probability that a sample is selected for training the next classifier; when classifying a new sample, it takes into account the similarity between that sample and each training sample in its neighborhood. In addition, the concept of an effective neighborhood is proposed. Two LAdaBoost variants, LAdaBoost-1 and LAdaBoost-2, are obtained from different combination methods. Experimental results on several UCI datasets show that LAdaBoost is more effective than AdaBoost and Bagging and is more robust.
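One plausible reading of the local error rate, as the fraction of misclassified training points in a sample's k-neighborhood, can be sketched as follows (this reading is an assumption for illustration; the paper's exact definition and its effective-neighborhood refinement are not reproduced):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_error_rates(X, y, y_pred, k=10):
    """Fraction of misclassified training points among each sample's k nearest
    neighbors (each sample is included as its own nearest neighbor)."""
    _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    mistakes = (y_pred != y).astype(float)
    return mistakes[idx].mean(axis=1)
```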

13.
Selective Ensemble Learning for Classification Problems (基于分类问题的选择性集成学习研究)
Chen Kai. Application Research of Computers (计算机应用研究), 2009, 26(7): 2457-2459
This paper proposes SECAdaBoostBagging Trees, a selective ensemble learning algorithm for classification problems that uses classification and regression trees as base learners, combines the characteristics of the AdaBoost.M1 and Bagging algorithms, and performs selective ensemble learning with a variable-similarity clustering technique and a greedy algorithm. A comparative study against several commonly used machine learning algorithms shows that it tends to achieve better generalization performance and higher running efficiency.
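The greedy half of such a selective-ensemble procedure can be sketched as forward selection by validation accuracy (the clustering step is omitted, and binary 0/1 labels are assumed):

```python
import numpy as np

def greedy_select(members, X_val, y_val):
    """Forward greedy selection: repeatedly add the member whose inclusion
    most improves majority-vote accuracy on a validation set; stop when no
    candidate improves it. Assumes binary 0/1 labels."""
    preds = np.array([m.predict(X_val) for m in members])
    chosen, best = [], -1.0
    while True:
        scores = []
        for i in range(len(members)):
            if i in chosen:
                scores.append(-1.0)
                continue
            vote = preds[chosen + [i]].mean(axis=0) > 0.5   # majority vote
            scores.append(float(np.mean(vote == y_val)))
        i_best = int(np.argmax(scores))
        if scores[i_best] <= best:
            return [members[i] for i in chosen]
        chosen.append(i_best)
        best = scores[i_best]
```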

14.
AdaBoost.M2 and AdaBoost.MH are boosting algorithms for learning from multiclass datasets. They have received less attention than other boosting algorithms because they require base classifiers that can handle the pseudoloss or Hamming loss, respectively. The difficulty with these loss functions is that each example is associated with k weights, where k is the number of classes. We address this issue by transforming an m-example dataset with k weights per example into a dataset with km examples and one weight per example. Minimising error on the transformed dataset is equivalent to minimising loss on the original dataset. Resampling the transformed dataset can be used for time efficiency and to accommodate base classifiers that cannot handle weighted examples. We empirically apply the transformation on several multiclass datasets using naive Bayes and decision trees as base classifiers. Our experiments show that the approach is competitive with AdaBoost.ECC, a boosting algorithm using output coding.
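The transformation itself is mechanical; a sketch assuming NumPy, where each of the km rows appends the class index to the original features:

```python
import numpy as np

def flatten_multiclass(X, y, W):
    """Turn an m-example dataset with k weights per example (W is m x k) into
    a binary dataset with k*m examples and one weight per example: row (i, c)
    is x_i with the class index c appended, binary label [y_i == c], and
    weight W[i, c]."""
    m, k = W.shape
    Xs, ys, ws = [], [], []
    for c in range(k):
        Xs.append(np.hstack([X, np.full((m, 1), c)]))
        ys.append((y == c).astype(int))
        ws.append(W[:, c])
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(ws)
```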

15.
Learning decision tree for ranking
The decision tree is one of the most effective and widely used methods for classification. However, many real-world applications require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic curve (AUC) has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
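The paper's similarity-weighted voting and naive Bayes estimators are not reproduced here, but the underlying point, that how leaf probabilities are estimated drives AUC, can be illustrated with the simpler C4.4-style Laplace correction (illustrative data; scikit-learn assumed):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
leaf_tr, leaf_te = tree.apply(Xtr), tree.apply(Xte)

def laplace_leaf_prob(leaf_tr, ytr, leaf_te, L=1.0):
    """P(class 1 | leaf) = (n1 + L) / (n + 2L): pure leaves of different sizes
    now get different scores, which sharpens the ranking."""
    p = {leaf: (ytr[leaf_tr == leaf].sum() + L)
               / ((leaf_tr == leaf).sum() + 2 * L)
         for leaf in np.unique(leaf_tr)}
    return np.array([p.get(l, 0.5) for l in leaf_te])

print(roc_auc_score(yte, tree.predict_proba(Xte)[:, 1]))    # raw frequencies
print(roc_auc_score(yte, laplace_leaf_prob(leaf_tr, ytr, leaf_te)))
```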

16.
While rule based control (RBC) is current practice in most building automation systems that issue discrete control signals, recent simulation studies suggest that advanced, optimization based control methods such as hybrid model predictive control (HMPC) can potentially outperform RBC in terms of energy efficiency and occupancy comfort. However, HMPC requires a more complex IT infrastructure and numerical optimization in the loop, which makes commissioning, operation of the building, and error handling significantly more involved than in the rule based setting. In this paper, we suggest an automated RBC synthesis procedure for binary decisions that extracts prevalent information from simulation data with HMPC controllers. The result is a set of simple decision rules that preserves much of the control performance of HMPC. The methods are based on standard machine learning algorithms, in particular support vector machines (SVMs) and adaptive boosting (AdaBoost). We also consider the ranking and selection of the measurements used for a decision, and show that this feature selection is useful both for reducing complexity and for cutting investment costs by pruning unnecessary sensors. The suggested methods are evaluated in simulation on six different case studies and shown to maintain the performance of HMPC despite a tremendous reduction in complexity.
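A toy sketch of the sensor-ranking idea: fit a boosted-stump classifier to logged (measurements, binary decision) pairs and rank sensors by importance. The data here is synthetic and the decision rule is an assumption purely for illustration:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                   # 8 candidate sensors (toy)
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # decision driven by 2 of them

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]
print(ranking)   # sensors 0 and 3 should dominate; the rest are candidates
                 # for pruning, cutting both complexity and sensor cost
```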

17.
We consider schemes for enacting task share changes (a process called reweighting) on real-time multiprocessor platforms. Our particular focus is reweighting schemes that are deployed in environments in which tasks may frequently request significant share changes. Prior work has shown that fair scheduling algorithms are capable of reweighting tasks with minimal allocation error and that partitioning-based scheduling algorithms can reweight tasks with better average-case performance, but greater error. However, preemption and migration overheads can be high in fair schemes. In this paper, we consider the question of whether non-fair, earliest-deadline-first (EDF) global scheduling techniques can improve the accuracy of reweighting relative to partitioning-based schemes and provide improved average-case performance relative to fair-scheduled systems. Our conclusion is that, for soft real-time systems, global EDF schemes provide a good mix of accuracy and average-case performance.
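The paper's reweighting analysis is not reproduced here; for reference, the global EDF dispatch rule the considered schemes build on is tiny:

```python
import heapq

def dispatch(ready_jobs, m):
    """Global EDF on m processors: of all ready jobs, run the m whose
    deadlines are earliest. Jobs are (deadline, task_id) tuples."""
    return heapq.nsmallest(m, ready_jobs)
```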

18.
A Boosting Perspective (Boosting视角)
AdaBoost is the foundational, representative algorithm of the Boosting family. This paper mainly surveys the generalization error analysis of AdaBoost and its relationship to structural risk minimization and the VC dimension, support vector machines, and margin theory, and interprets AdaBoost from the viewpoints of game theory and statistics, aiming to provide a fairly comprehensive perspective on Boosting.

19.
The decision tree method has grown fast in the past two decades and its performance in classification is promising. Tree-based ensemble algorithms have been used to improve the performance of an individual tree. In this study, we compared four basic ensemble methods, that is, bagging tree, random forest, AdaBoost tree, and AdaBoost random tree, in terms of tree size, ensemble size, band selection (BS), random feature selection, classification accuracy, and efficiency in ecological zone classification in Clark County, Nevada, using multi-temporal multi-source remote-sensing data. Furthermore, two BS schemes based on the feature importance of the bagging tree and AdaBoost tree were also considered and compared. We conclude that random forest or AdaBoost random tree can achieve accuracies at least as high as bagging tree or AdaBoost tree with higher efficiency; and although bagging tree and random forest can be more efficient, AdaBoost tree and AdaBoost random tree can provide significantly higher accuracy. All ensemble methods provided significantly higher accuracies than the single decision tree. Finally, our results showed that classification accuracy could increase dramatically by combining multi-temporal and multi-source datasets.
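A rough scikit-learn sketch of such a four-way comparison on synthetic data (the paper's "AdaBoost random tree" has no stock implementation, so a random-splitter tree is used as an approximation; all parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
models = {
    "bagging tree": BaggingClassifier(DecisionTreeClassifier(), random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "AdaBoost tree": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=3), random_state=0),
    "AdaBoost random tree": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=3, splitter="random"), random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```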

20.
Multi-Class Learning by Smoothed Boosting
AdaBoost.OC has been shown to be an effective method in boosting "weak" binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks of the AdaBoost.OC algorithm is that it is sensitive to noisy examples and tends to overfit training examples when they are noisy. In this paper, we propose a new boosting algorithm, named "MSmoothBoost", which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem of AdaBoost.OC. We prove bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application indicate that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning.
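A generic sketch of the kind of smoothing step the abstract alludes to, shrinking the boosting distribution toward uniform so no example's weight can explode (this is in the spirit of, not identical to, MSmoothBoost's update):

```python
import numpy as np

def smooth_weights(w, eta=0.1):
    """Shrink a boosting distribution toward uniform: w <- (1-eta)*w + eta/m.
    Every weight stays at least eta/m and at most (1-eta) + eta/m, limiting
    how much the procedure can fixate on (possibly noisy) hard examples."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return (1 - eta) * w + eta / len(w)
```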
