Similar Documents
20 similar documents found (search time: 31 ms)
1.
Schapire and Singer's improved version of AdaBoost for handling weak hypotheses with confidence-rated predictions represents an important advance in the theory and practice of boosting. Its success results from a more efficient use of information in weak hypotheses during updating. Instead of simple binary voting, a weak hypothesis is allowed to vote for or against a classification with a variable strength or confidence. The Pool Adjacent Violators (PAV) algorithm is a method for converting a score into a probability. We show how PAV may be applied to a weak hypothesis to yield a new weak hypothesis which is, in a sense, an ideal confidence-rated prediction, and that this leads to an optimal update for AdaBoost. The result is a new algorithm which we term PAV-AdaBoost. We give several examples illustrating problems for which this new algorithm provides advantages in performance. Editor: Robert Schapire
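A minimal sketch of the Pool Adjacent Violators idea may make the abstract concrete: PAV fits a monotone (isotonic) mapping from scores to empirical probabilities by merging adjacent blocks that violate monotonicity. The function below is an illustration only; its name and 0/1-label interface are assumptions, not the paper's implementation.

```python
import numpy as np

def pav_calibrate(scores, labels):
    """Pool Adjacent Violators sketch: fit a monotone mapping from scores
    to empirical probabilities of the positive class (labels in {0, 1})."""
    order = np.argsort(scores)
    y = np.asarray(labels, dtype=float)[order]      # labels sorted by score
    merged = []                                     # blocks as [label sum, count]
    for yi in y:
        merged.append([yi, 1.0])
        # Merge adjacent blocks whose means violate monotonicity.
        while len(merged) > 1 and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]:
            s, n = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += n
    # Expand block means back to per-example calibrated probabilities.
    probs = np.concatenate([np.full(int(n), s / n) for s, n in merged])
    out = np.empty_like(probs)
    out[order] = probs
    return out

# Example: the calibrated probabilities are monotone in the score.
print(pav_calibrate([0.1, 0.4, 0.35, 0.8], [0, 1, 0, 1]))
```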

2.
AdaBoost.M2 and AdaBoost.MH are boosting algorithms for learning from multiclass datasets. They have received less attention than other boosting algorithms because they require base classifiers that can handle the pseudoloss or Hamming loss, respectively. The difficulty with these loss functions is that each example is associated with k weights, where k is the number of classes. We address this issue by transforming an m-example dataset with k weights per example into a dataset with km examples and one weight per example. Minimising error on the transformed dataset is equivalent to minimising loss on the original dataset. Resampling the transformed dataset can be used for time efficiency and for base classifiers that cannot handle weighted examples. We empirically apply the transformation to several multiclass datasets using naive Bayes and decision trees as base classifiers. Our experiments show that the approach is competitive with AdaBoost.ECC, a boosting algorithm using output coding.
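A hedged sketch of the dataset transformation described above: each original example is paired with every class, so m examples with k weights become km singly weighted examples. The triple representation (features, candidate class, weight) is an assumption made purely for illustration.

```python
def expand_multiclass(X, weight_matrix):
    """Turn m examples with k weights each into k*m singly weighted examples.

    X             : list of m feature vectors
    weight_matrix : m x k nested list; weight_matrix[i][c] is the weight of
                    example i for class c
    Returns a list of (features, candidate_class, weight) triples.
    """
    expanded = []
    for x, weights in zip(X, weight_matrix):
        for c, w in enumerate(weights):
            expanded.append((x, c, w))   # one weight per transformed example
    return expanded

# Example: 2 examples with 3 class weights each -> 6 weighted examples.
X = [[0.2, 1.3], [0.7, 0.1]]
W = [[0.5, 0.2, 0.3], [0.1, 0.8, 0.1]]
print(len(expand_multiclass(X, W)))  # 6
```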

3.
Boosting neural networks   (Total citations: 15; self-citations: 0; citations by others: 15)
Boosting is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm, AdaBoost, has been applied with great success to several benchmark machine learning problems using mainly decision trees as base classifiers. In this article, we investigate whether AdaBoost also works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and weighting the cost function. The results suggest that random resampling of the training data is not the main reason for the improvements brought by AdaBoost. This is in contrast to bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of on-line handwritten digits from more than 200 writers. A boosted multilayer network achieved 1.5% error on the UCI letters and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.
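The two training schemes being compared, resampling the training set versus weighting the cost function, can be sketched as follows; the function names and interfaces are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(X, y, weights, n=None):
    """'Sampling' variant: draw a new training set according to the boosting weights.
    X and y are NumPy arrays; weights is a non-negative array of the same length as y."""
    n = n or len(y)
    p = weights / weights.sum()
    idx = rng.choice(len(y), size=n, p=p)
    return X[idx], y[idx]

def weighted_cost(per_example_loss, weights):
    """'Weighting' variant: leave the data alone and weight the cost function instead."""
    w = weights / weights.sum()
    return float(np.sum(w * per_example_loss))
```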

4.
Logistic Regression, AdaBoost and Bregman Distances   (Total citations: 8; self-citations: 0; citations by others: 8)
Collins, Michael; Schapire, Robert E.; Singer, Yoram. Machine Learning, 2002, 48(1-3): 253-285
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms are iterative and can be divided into two types based on whether the parameters are updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms that includes both a sequential- and a parallel-update algorithm as special cases, thus showing how the sequential and parallel approaches can themselves be unified. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with the iterative scaling algorithm. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
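For reference, the two objectives being unified can be written side by side. The notation below is an assumption of this summary, with f(x) = Σ_j α_j h_j(x) a weighted combination of weak hypotheses (or features) and labels y_i ∈ {−1, +1}.

```latex
% AdaBoost minimizes the exponential loss; logistic regression minimizes the log-loss.
\begin{aligned}
\text{AdaBoost:} \quad & \sum_{i=1}^{m} \exp\bigl(-y_i f(x_i)\bigr), \\
\text{Logistic regression:} \quad & \sum_{i=1}^{m} \ln\bigl(1 + \exp(-y_i f(x_i))\bigr),
\qquad y_i \in \{-1, +1\}.
\end{aligned}
```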

5.
Linear Programming Boosting via Column Generation   (Total citations: 4; self-citations: 0; citations by others: 4)
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column-generation-based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task then becomes constructing a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may converge in the limit only, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse in contrast with gradient-based methods. Computationally, LPBoost is competitive with AdaBoost in quality and computational cost.
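One common way to write the 1-norm soft-margin boosting LP referred to above is sketched below. The notation (weak-hypothesis weights a_j, margin ρ, slacks ξ_i, penalty D) is an assumption of this summary rather than the paper's formulation verbatim.

```latex
% A common statement of the 1-norm soft-margin boosting LP (notation assumed here).
\begin{aligned}
\max_{a,\,\rho,\,\xi} \quad & \rho - D \sum_{i=1}^{m} \xi_i \\
\text{s.t.} \quad & y_i \sum_{j} a_j h_j(x_i) \;\ge\; \rho - \xi_i, \qquad i = 1,\dots,m, \\
& \textstyle\sum_{j} a_j = 1, \qquad a_j \ge 0, \qquad \xi_i \ge 0 .
\end{aligned}
```

Column generation then works with a restricted version of this LP containing only the weak hypotheses generated so far, and repeatedly adds the hypothesis whose column most violates the dual constraints.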

6.
Boosting is a set of methods for the construction of classifier ensembles. The distinguishing feature of these methods is that they make it possible to obtain a strong classifier from the combination of weak classifiers. Therefore, it is possible to use boosting methods with very simple base classifiers. One of the simplest classifiers is the decision stump: a decision tree with only one decision node.

This work proposes a variant of the most well-known boosting method, AdaBoost. It is based on considering, as the base classifiers for boosting, not only the last weak classifier, but a classifier formed by the last r selected weak classifiers (r is a parameter of the method). If the weak classifiers are decision stumps, the combination of r weak classifiers is a decision tree.

The ensembles obtained with the variant are formed by the same number of decision stumps as the original AdaBoost. Hence, the original version and the variant produce classifiers with very similar sizes and computational complexities (for training and classification). The experimental study shows that the variant is clearly beneficial.
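For concreteness, a minimal weighted decision-stump learner of the kind used as the base classifier here is sketched below; the interface and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fit_stump(X, y, sample_weight):
    """Decision-stump sketch: pick the (feature, threshold, polarity) that
    minimizes weighted 0/1 error on labels y in {-1, +1}."""
    m, d = X.shape
    best = (np.inf, 0, 0.0, 1)                       # (error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (+1, -1):
                pred = np.where(X[:, j] <= thr, -polarity, polarity)
                err = sample_weight[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(X, feature, threshold, polarity):
    return np.where(X[:, feature] <= threshold, -polarity, polarity)

# Example: a single stump separates this 1-D toy set perfectly.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)
print(fit_stump(X, y, w))  # (0.0, 0, 0.2, 1)
```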


7.
Bauer, Eric; Kohavi, Ron. Machine Learning, 1999, 36(1-2): 105-139
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only hard areas but also outliers and noise.

8.
The last decade has seen an increase in the attention paid to the development of cost-sensitive learning algorithms that aim to minimize misclassification costs while still maintaining accuracy. Most of this attention has been on cost-sensitive decision tree learning, whereas relatively little attention has been paid to assessing whether it is possible to develop better cost-sensitive classifiers based on Bayesian networks. Hence, this paper presents EBNO, an algorithm that utilizes genetic algorithms to learn cost-sensitive Bayesian networks, where genes are utilized to represent the links between the nodes in Bayesian networks and the expected cost is used as a fitness function. An empirical comparison of the new algorithm has been carried out with respect to (a) an algorithm that induces cost-insensitive Bayesian networks to provide a baseline, (b) ICET, a well-known algorithm that uses genetic algorithms to induce cost-sensitive decision trees, (c) use of MetaCost to induce cost-sensitive Bayesian networks via bagging, (d) use of AdaBoost to induce cost-sensitive Bayesian networks, and (e) use of XGBoost, a gradient boosting algorithm, to induce cost-sensitive decision trees. An empirical evaluation on 28 data sets reveals that EBNO performs well in comparison with the algorithms that produce single interpretable models and performs just as well as algorithms that use bagging and boosting methods.

9.
Asymmetric AdaBoost Algorithms and Their Application in Object Detection   (Total citations: 1; self-citations: 0; citations by others: 1)
葛俊锋, 罗予频. 自动化学报 (Acta Automatica Sinica), 2009, 35(11): 1403-1409
Addressing the asymmetric classification problem in object detection, and building on an analysis of the existing cost-sensitive (i.e., asymmetric) learning algorithms obtained as extensions of the discrete AdaBoost algorithm, we propose a unified framework for deriving asymmetric AdaBoost algorithms, centered on three different upper bounds on the asymmetric error rate. Within this framework, the relationships among the existing discrete asymmetric AdaBoost algorithms become very clear, and the parts of them that are inconsistent with the theoretical derivation can easily be corrected. Furthermore, by minimizing these three upper bounds with different optimization methods, we derive asymmetric extensions of the real-valued AdaBoost algorithms (denoted Asym-Real AdaBoost and Asym-Gentle AdaBoost). The new algorithms are not only more convenient than the existing discrete algorithms in computing the combination coefficients of the weak classifiers, but experiments also show that, on both face detection and pedestrian detection, they achieve better performance than the traditional symmetric AdaBoost algorithm and the discrete asymmetric AdaBoost algorithms.
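As background for the kind of bound involved, one simple asymmetric upper bound of this general form (not necessarily one of the paper's three) follows from exp(−z) ≥ 1[z ≤ 0]: weighting each example by a class-dependent cost C_i gives

```latex
% An illustrative cost-weighted (asymmetric) exponential bound; C_i = C_+ for positive
% examples and C_i = C_- for negative ones (notation is an assumption of this summary):
\sum_{i=1}^{m} C_i \,\mathbf{1}\bigl[y_i f(x_i) \le 0\bigr]
\;\le\; \sum_{i=1}^{m} C_i \exp\bigl(-y_i f(x_i)\bigr).
```

Minimizing the cost-weighted exponential loss on the right therefore also drives down the asymmetric (cost-weighted) training error on the left.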

10.
In this paper, we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the -strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view on a large family of algorithms, we establish a connection between potential-based analysis in learning and its counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
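As a concrete instance of such a potential-based strategy, here is a minimal sketch of a Hedge-style multiplicative-weights update over experts; the learning-rate value and array layout are illustrative assumptions.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """Hedge / multiplicative-weights sketch.

    loss_matrix[t][j] is the loss of expert j at round t (assumed in [0, 1]).
    Returns the sequence of weight vectors played over the experts.
    """
    T, n = loss_matrix.shape
    w = np.full(n, 1.0 / n)
    played = []
    for t in range(T):
        played.append(w.copy())
        # Multiplicative update: experts with larger loss lose weight exponentially.
        w = w * np.exp(-eta * loss_matrix[t])
        w = w / w.sum()
    return played

# Example: expert 1 is consistently better, so its weight grows over the rounds.
losses = np.array([[1.0, 0.0], [0.8, 0.1], [0.9, 0.0]])
print(hedge(losses)[-1])
```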

11.
Boosting algorithms have been found successful in many areas of machine learning and, in particular, in ranking. For typical classes of weak learners used in boosting (such as decision stumps or trees), a large feature space can slow down the training, while a long sequence of weak hypotheses combined by boosting can result in a computationally expensive model. In this paper we propose a strategy that builds several sequences of weak hypotheses in parallel, and extends the ones that are likely to yield a good model. The weak hypothesis sequences are arranged in a boosting tree, and new weak hypotheses are added to promising nodes (both leaves and inner nodes) of the tree using some randomized method. Theoretical results show that the proposed algorithm asymptotically achieves the performance of the base boosting algorithm applied. Experiments are provided in ranking web documents and move ordering in chess, and the results indicate that the new strategy yields better performance when the length of the sequence is limited, and converges to similar performance as the original boosting algorithms otherwise.

12.
We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of ℓ1-norm-regularized AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution by maximizing margins and at the same time controlling the margin variance. We also theoretically prove that, approximately, ℓ1-norm-regularized AdaBoost maximizes the average margin rather than the minimum margin. The duality formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to those of standard stagewise additive boosting algorithms but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.

13.
Classification of data with an imbalanced class distribution poses a significant challenge to the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. The significant difficulty and frequent occurrence of the class imbalance problem indicate the need for extra research efforts. The objective of this paper is to investigate meta-techniques applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. The AdaBoost algorithm is reported as a successful meta-technique for improving classification accuracy. The insight gained from a comprehensive analysis of the AdaBoost algorithm in terms of its advantages and shortcomings in tackling the class imbalance problem leads to the exploration of three cost-sensitive boosting algorithms, which are developed by introducing cost items into the learning framework of AdaBoost. Further analysis shows that one of the proposed algorithms tallies with the stagewise additive modelling in statistics to minimize the cost exponential loss. These boosting algorithms are also studied with respect to their weighting strategies towards different types of samples, and their effectiveness in identifying rare cases through experiments on several real-world medical data sets, where the class imbalance problem prevails.
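One common way cost items enter AdaBoost's reweighting step is sketched below, with the costs multiplying the weight update; this illustrates the general idea only and is not claimed to be any of the specific variants analyzed in the paper. The base learner and parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_adaboost(X, y, costs, n_rounds=20):
    """Sketch of a cost-weighted AdaBoost loop.

    y in {-1, +1}; costs[i] > 0 is the misclassification cost of example i.
    """
    costs = np.asarray(costs, dtype=float)
    m = len(y)
    w = np.full(m, 1.0 / m)
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err <= 0 or err >= 0.5:      # degenerate cases skipped in this sketch
            break
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Cost-weighted update: costly examples gain weight faster when misclassified.
        w *= costs * np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.sign(score)
```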

14.
This paper presents a strategy to improve the AdaBoost algorithm with a quadratic combination of base classifiers. We observe that learning this combination is necessary to get better performance and is possible by constructing an intermediate learner operating on the combined linear and quadratic terms. This is not trivial, as the parameters of the base classifiers are not under direct control, obstructing the application of direct optimization. We propose a new method that realizes this iterative optimization indirectly. First, we train a classifier by randomizing the labels of training examples. Subsequently, the input learner is called repeatedly with a systematic update of the labels of the training examples in each round. We show that the quadratic boosting algorithm converges under the condition that the given base learner minimizes the empirical error. We also give an upper bound on the VC-dimension of the new classifier. Our experimental results on 23 standard problems show that quadratic boosting compares favorably with AdaBoost on large data sets at the cost of training speed. The classification time of the two algorithms, however, is equivalent.

15.
Multi-Class Learning by Smoothed Boosting   (Total citations: 1; self-citations: 0; citations by others: 1)
AdaBoost.OC has been shown to be an effective method in boosting “weak” binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks of the AdaBoost.OC algorithm is that it is sensitive to noisy examples and tends to overfit training examples when they are noisy. In this paper, we propose a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC. We prove bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application have indicated that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning. Editor: Nicolo Cesa-Bianchi

16.
Support vector machines for spam categorization   (Total citations: 44; self-citations: 0; citations by others: 44)
We study the use of support vector machines (SVM) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM performed best when using binary features. For both data sets, boosting trees and SVM had acceptable test performance in terms of accuracy and speed. However, SVM had significantly less training time.
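A minimal sketch of the binary-feature SVM setup described above, using scikit-learn; the toy messages, labels, and default parameters are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

messages = ["win money now", "meeting at noon", "cheap money offer", "lunch tomorrow?"]
labels   = [1, 0, 1, 0]                     # 1 = spam, 0 = non-spam

vectorizer = CountVectorizer(binary=True)   # binary features: word present / absent
X = vectorizer.fit_transform(messages)

clf = LinearSVC().fit(X, labels)
print(clf.predict(vectorizer.transform(["free money"])))   # likely [1] on this toy data
```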

17.
While rule-based control (RBC) is current practice in most building automation systems that issue discrete control signals, recent simulation studies suggest that advanced, optimization-based control methods such as hybrid model predictive control (HMPC) can potentially outperform RBC in terms of energy efficiency and occupancy comfort. However, HMPC requires a more complex IT infrastructure and numerical optimization in the loop, which makes commissioning, operation of the building, and error handling significantly more involved than in the rule-based setting. In this paper, we suggest an automated RBC synthesis procedure for binary decisions that extracts prevalent information from simulation data with HMPC controllers. The result is a set of simple decision rules that preserves much of the control performance of HMPC. The methods are based on standard machine learning algorithms, in particular support vector machines (SVMs) and adaptive boosting (AdaBoost). We also consider the ranking and selection of measurements that are used for a decision and show that this feature selection is useful both in complexity reduction and in reduction of investment costs by pruning unnecessary sensors. The suggested methods are evaluated in simulation for six different case studies and shown to maintain the performance of HMPC despite a tremendous reduction in complexity.

18.
AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration, the distribution over the sample given to the weak learner is proportional to the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore, we provide two techniques for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's. The analysis suggests that our algorithm is likely to perform better than AdaBoost on noisy data and with weak learners returning low-confidence hypotheses. Modest experiments confirm that our algorithm can perform better than AdaBoost in these situations.
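For AdaBoost specifically, the potential-based view mentioned above takes a simple form; the notation below is an assumption of this summary. With cumulative margin function f_{t−1}(x) = Σ_{s<t} α_s h_s(x) and exponential potential Φ(f) = Σ_i exp(−y_i f(x_i)), the distribution handed to the weak learner is the normalized gradient of the potential with respect to the margins:

```latex
D_t(i) \;=\; \frac{\exp\bigl(-y_i f_{t-1}(x_i)\bigr)}
                  {\sum_{i'} \exp\bigl(-y_{i'} f_{t-1}(x_{i'})\bigr)} .
```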

19.
We introduce a multi-class generalization of AdaBoost with binary weak-learners. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of punishments for failures and rewards for successes. Moreover, the stage-wise optimization of this model introduces an asymmetric boosting procedure whose costs depend on the number of classes separated by each weak-learner. In this way the boosting algorithm takes into account class imbalances when building the ensemble. The experiments performed compare this new approach favorably to AdaBoost.MH, GentleBoost and the SAMME algorithms.
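One standard vectorial codification and multi-class exponential loss of this general kind (as popularized by SAMME-style multi-class boosting; not necessarily the paper's exact definitions) is:

```latex
% Illustrative multi-class encoding and exponential loss (an assumption of this summary):
% the label of class c among K classes is coded as a vector y with
%   y_k = 1 if k = c, and y_k = -1/(K-1) otherwise,
% and the classifier response f(x) in R^K is scored with
L\bigl(y, f(x)\bigr) \;=\; \exp\!\Bigl(-\tfrac{1}{K}\, y^{\top} f(x)\Bigr).
```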

20.
We improve the analysis of the decision tree boosting algorithm proposed by Mansour and McAllester. For binary classification problems, the algorithm of Mansour and McAllester constructs a multi-way branching decision tree using a set of multi-class hypotheses. Mansour and McAllester proved that it works under certain conditions. We give a rigorous analysis of the algorithm and simplify the conditions. From this simplification, we can provide a simpler algorithm, for which no prior knowledge of the quality of weak hypotheses is necessary.

