Similar Documents
20 similar documents found.
1.
Multi-Class Learning by Smoothed Boosting   (Cited: 1; self: 0; other: 1)
AdaBoost.OC has been shown to be an effective method for boosting "weak" binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks of the AdaBoost.OC algorithm is that it is sensitive to noisy examples and tends to overfit the training examples when they are noisy. In this paper, we propose a new boosting algorithm, named "MSmoothBoost", which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem of AdaBoost.OC. We prove bounds for both the empirical training error and the margin training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application indicate that the proposed boosting algorithm is more robust and effective than AdaBoost.OC for multi-class learning. Editor: Nicolo Cesa-Bianchi
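The ECOC reduction described in this abstract is easy to picture in code. The following is a rough sketch, not the paper's implementation: a random binary codebook, one AdaBoost learner per code bit, and nearest-codeword decoding. The dataset, codebook size, and all parameter values are assumptions made for the example.

```python
# ECOC-style reduction of a multi-class problem to binary boosting.
# Illustrative sketch only -- not the AdaBoost.OC code from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_classes=4, n_informative=6,
                           random_state=0)
n_classes, n_bits = 4, 10
rng = np.random.default_rng(0)

# Random codebook: one binary codeword per class; re-draw any column that
# fails to split the classes into two non-empty groups.
codebook = np.zeros((n_classes, n_bits), dtype=int)
for b in range(n_bits):
    col = rng.integers(0, 2, size=n_classes)
    while col.min() == col.max():
        col = rng.integers(0, 2, size=n_classes)
    codebook[:, b] = col

# One binary AdaBoost learner per code bit.
learners = []
for b in range(n_bits):
    y_bin = codebook[y, b]   # relabel each example by bit b of its class codeword
    learners.append(AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y_bin))

# Decode: choose the class whose codeword is nearest in Hamming distance.
bits = np.column_stack([clf.predict(X) for clf in learners])
hamming = (bits[:, None, :] != codebook[None, :, :]).sum(axis=2)
print("training accuracy:", (hamming.argmin(axis=1) == y).mean())
```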

2.
Boosting is an effective means of improving the learning of base classifiers, yet research shows that its improvement over naive Bayes is insignificant. This paper proposes a new boosting algorithm, ActiveBoost. ActiveBoost combines active learning to mine the information carried by unlabeled samples, and introduces instability into the construction of the naive Bayes classifier. Experimental results on datasets from the UCI machine learning repository demonstrate the effectiveness of the algorithm.

3.
The Weighted Majority Algorithm   (Cited: 2; self: 0; other: 2)
We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case where the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool that makes at most m mistakes, then the Weighted Majority Algorithm will make at most c(log n + m) mistakes on that sequence, where n is the number of algorithms in the pool and c is a fixed constant.
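As an illustration of the method (a minimal sketch, not the paper's exact formulation): keep one weight per pool member, predict by weighted vote, and multiply the weight of every mistaken member by a factor β after each trial. The toy predictions below are invented for the example.

```python
# Minimal Weighted Majority sketch: experts vote, and the weights of the
# experts that erred are multiplied by beta after each trial.
def weighted_majority(expert_preds, labels, beta=0.5):
    """expert_preds: list over trials of per-expert {0,1} predictions."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        y_hat = 1 if vote1 >= vote0 else 0
        mistakes += (y_hat != y)
        # Multiplicative update: demote every expert wrong on this trial.
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w

# Tiny usage example: three experts over four trials.
preds = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
labels = [1, 0, 1, 0]
print(weighted_majority(preds, labels))
```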

4.
The margin distribution is key to Boosting algorithms, but existing margin-distribution generalization error bounds are hard to compute, which limits the development of Boosting. To address this problem, this paper proposes MOBoost, a moment-optimization Boosting algorithm that directly optimizes the margin distribution. First, a Boosting generalization error bound based on the first and second moments of the margin distribution (the moment generalization bound for Boosting) is derived, directly characterizing the influence of the margin distribution on Boosting. Then, based on this bound, a moment criterion for Boosting is given, which maximizes the first moment of the margin distribution while minimizing its second moment. Finally, primal and dual forms of the convex quadratic optimization problem arising from the moment criterion are given, providing an effective computational method for the criterion. Theoretical analysis and experiments show that MOBoost is effective and reliable.

5.
Boosted Naive Bayes   (Cited: 8; self: 0; other: 8)
Wang Shi, Gao Wen. Computer Science (《计算机科学》), 2000, 27(4): 46-49
Naive Bayes is a supervised classification learning method. In theory, applying it presupposes that an example's attribute values are independent given its class. This assumption is too strict to hold in most practical applications; even so, naive Bayes has achieved great success even when the assumption is violated. Recently, an improved method for naive Bayes, boosting, has attracted wide attention, with AdaBoost as its principal algorithm. When AdaBoost is used to combine several naive Bayes classifiers, it is mathematically equivalent to a feedback neural network with sparse-coded inputs, a single layer of hidden nodes, and sigmoid activation functions.

6.
An Adaptive Version of the Boost by Majority Algorithm   (Cited: 6; self: 0; other: 6)
Freund, Yoav. Machine Learning, 2001, 43(3): 293-318
We propose a new boosting algorithm. This boosting algorithm is an adaptive version of the boost by majority algorithm and combines the bounded goals of the boost by majority algorithm with the adaptivity of AdaBoost. The method used for making boost-by-majority adaptive is to consider the limit in which each of the boosting iterations makes an infinitesimally small contribution to the process as a whole. This limit can be modeled using the differential equations that govern Brownian motion. The new boosting algorithm, named BrownBoost, is based on finding solutions to these differential equations. The paper describes two methods for finding approximate solutions to the differential equations. The first is a method that results in a provably polynomial time algorithm. The second method, based on the Newton-Raphson minimization procedure, is much more efficient in practice but is not known to be polynomial.
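The second solver mentioned in the abstract relies on standard Newton-Raphson iteration. The snippet below shows only the generic procedure, as a sketch; it is not BrownBoost's actual update, and the example equation is chosen arbitrarily.

```python
# Generic Newton-Raphson root finding -- the numerical procedure underlying
# BrownBoost's second (practical) solver. Illustrative only.
def newton_raphson(f, df, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # assumes df(x) != 0 along the iteration path
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve x**3 - 2 = 0, i.e. compute the cube root of 2.
print(newton_raphson(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=1.0))
```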

7.
This paper first analyzes the properties of the sample-confidence measure used by noise-reduction ensemble algorithms and explains why that function is ill-suited to multi-class problems. A more targeted confidence measure is then designed, and based on it an enhanced noise-reduction parameter-ensemble algorithm is proposed. As a result, the discriminative Bayesian network parameter learning algorithm not only suppresses the influence of noise effectively but also avoids classifier overfitting, further extending the application of ensemble-trained discriminative Bayesian network classifiers to multi-class problems. Finally, experimental results and statistical hypothesis tests confirm that classifiers produced by this algorithm perform significantly better than those obtained by current ensemble-based Bayesian network parameter learning methods.

8.
Learning Binary Relations Using Weighted Majority Voting   (Cited: 2; self: 0; other: 2)
In this paper we demonstrate how weighted majority voting with multiplicative weight updating can be applied to obtain robust algorithms for learning binary relations. We first present an algorithm that obtains a nearly optimal mistake bound but at the expense of using exponential computation to make each prediction. However, the time complexity of our algorithm is significantly reduced from that of previously known algorithms that have comparable mistake bounds. The second algorithm we present is a polynomial time algorithm with a non-optimal mistake bound. Again, the mistake bound of our second algorithm is significantly better than previous bounds proven for polynomial time algorithms. A key contribution of our work is that we define a non-pure or noisy binary relation and then, by exploiting the robustness of weighted majority voting with respect to noise, we show that both of our algorithms can learn non-pure relations. These provide the first algorithms that can learn non-pure binary relations. The first author was supported in part by NSF grant CCR-91110108 and NSF National Young Investigator Grant CCR-9357707 with matching funds provided by Xerox Corporation, Palo Alto Research Center and WUTA. The second author was supported by ONR grant NO0014-91-J-1162 and NSF grant IRI-9123692.

9.
AdaBoost is a method for improving the classification accuracy of a given learning algorithm by combining hypotheses created by the learning algorithm. One of the drawbacks of AdaBoost is that its performance worsens when the training examples include noisy or exceptional examples, which are called hard examples. The phenomenon arises because AdaBoost assigns too high a weight to hard examples. In this research, we introduce thresholds into the weighting rule of AdaBoost in order to prevent weights from being assigned too high a value. During the learning process, we compare the upper bound of the classification error of our method with that of AdaBoost, and we set the thresholds such that the upper bound of our method is superior to that of AdaBoost. Our method shows better performance than AdaBoost.
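A rough sketch of the idea, under stated assumptions: binary AdaBoost in which example weights are clipped at a fixed cap after each update. The paper derives its thresholds from the error-bound comparison described above; the fixed cap w_max, the stump base learner, and all parameter values here are illustrative choices, not the authors' rule.

```python
# Binary AdaBoost with a cap on example weights, in the spirit of the
# thresholded weighting rule described above. Sketch only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_capped(X, y, n_rounds=20, w_max=0.05):
    n = len(y)                      # labels y must be in {-1, +1}
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5 or err == 0:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)
        # Cap weights so hard examples cannot dominate (renormalizing may
        # push weights slightly above the cap; adequate for a sketch).
        w = np.minimum(w / w.sum(), w_max)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(Xq):
        scores = sum(a * s.predict(Xq) for a, s in zip(alphas, stumps))
        return np.sign(scores)
    return predict
```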

10.
Many practical problems involve multi-class classification, a technique that effectively narrows the gap in understanding between users and computers. In traditional multi-class Boosting methods, the multi-class loss function is not necessarily guess-averse, and the combination of multi-class weak learners is restricted to a weighted linear sum. To obtain a highly accurate final classifier, the multi-class loss function should maximize the multi-class margin while being Bayes-consistent and guess-averse. Moreover, the shortcomings of weak learners may limit the performance of a linear classifier, whereas their nonlinear combination can provide stronger discriminative power. Based on these two observations, an adaptive multi-class Boosting classifier, SOHPBoost, is designed. At each iteration, SOHPBoost can integrate the optimal multi-class weak learner via vector addition or the Hadamard product, as sketched below. This adaptive process yields a sum of Hadamard-product vectors of multi-class weak learners, which in turn mines the hidden structure of the dataset. Experimental results show that SOHPBoost achieves better multi-class performance.
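The two combination operators just mentioned, applied to hypothetical weak-learner score vectors (the numbers are invented, and SOHPBoost's criterion for choosing between the operators is omitted):

```python
# Toy illustration of SOHPBoost's two per-round combination operators on
# multi-class score vectors. Values are hypothetical.
import numpy as np

h1 = np.array([0.2, -0.1, 0.7, 0.05])   # weak learner 1 scores (4 classes)
h2 = np.array([0.5, 0.3, -0.2, 0.1])    # weak learner 2 scores

additive = h1 + h2    # linear (vector-addition) combination
hadamard = h1 * h2    # Hadamard (elementwise-product) combination
print(additive, hadamard)
```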

11.
Boosting is an algorithm that has become popular in machine learning in recent years for improving learning accuracy. This paper uses AdaBoost as an example to introduce the basic principles of Boosting.

12.
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from existing approaches. We design a meta-semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm given a multitude of unlabeled data, 2) efficient computation via the iterative boosting algorithm, and 3) exploitation of both the manifold and the cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to state-of-the-art semi-supervised learning algorithms.
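The wrapper idea can be sketched as follows. This is a simplified, self-training-style approximation of the "improve a given supervised learner with unlabeled data" setting; it is not SemiBoost itself, whose objective weights unlabeled points by pairwise similarity. The base learner is assumed to expose predict_proba, and all names and parameters are invented for the example.

```python
# Simplified "wrapper" sketch: iteratively pseudo-label the most confident
# unlabeled examples and retrain the base supervised learner. NOT the
# actual SemiBoost objective.
import numpy as np
from sklearn.base import clone

def semi_supervised_improve(base, X_l, y_l, X_u, rounds=5, per_round=10):
    """base: any sklearn classifier with predict_proba; y_l in {0..k-1}."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        model = clone(base).fit(X_l, y_l)
        proba = model.predict_proba(X_u)
        conf = proba.max(axis=1)
        take = np.argsort(conf)[-per_round:]   # most confident unlabeled points
        X_l = np.vstack([X_l, X_u[take]])
        y_l = np.concatenate([y_l, proba[take].argmax(axis=1)])
        X_u = np.delete(X_u, take, axis=0)
    return clone(base).fit(X_l, y_l)
```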

13.
A Transfer Learning Algorithm Combining a Semi-Supervised Boosting Method   (Cited: 1; self: 0; other: 1)
Transfer learning is a research direction in data mining that tries to reuse data samples from related domains, "transferring" knowledge from those domains to help train models in a new domain. Current instance-based transfer learning algorithms are prone to overfitting and cannot make full use of the useful data in related domains. To avoid this problem, this paper introduces unlabeled samples from the target domain into training and, using a semi-supervised Boosting method, proposes a new transfer learning algorithm that can …

14.
This paper proposes a voting algorithm that combines majority and median voting, which considerably alleviates the shortcomings of plain majority voting: majority voting is tried first; when no majority decision can be reached, the dispersion of the data is computed, and if it is below a preset safety threshold, median voting handles the case, while if it exceeds the threshold, a "no decision" signal is output. Simulation tests show that the combined majority/median voting system is feasible and raises the rate of correct outputs; although the error rate rises slightly, the cases in which plain majority voting produces no output are handled effectively, and correctness improves to some degree. The proposed voting algorithm suits applications that need a higher rate of correct outputs and can tolerate somewhat reduced safety; tests show good results for both continuous and discrete signals.
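A minimal sketch of the combined voter just described, with an invented safety threshold:

```python
# Combined majority/median voter: majority first; on a tie, fall back to
# the median only if the values are tightly clustered, otherwise signal
# that no vote can be produced. Threshold choice is an assumption.
from collections import Counter
from statistics import median, pstdev

NO_DECISION = object()

def majority_median_vote(values, safety_threshold=1.0):
    counts = Counter(values).most_common()
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]                   # clear majority winner
    if pstdev(values) < safety_threshold:     # tie, but values agree closely
        return median(values)
    return NO_DECISION                        # tie and widely dispersed

print(majority_median_vote([5.0, 5.0, 7.0]))  # majority -> 5.0
print(majority_median_vote([5.0, 5.1]))       # tie, low dispersion -> 5.05
```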

15.
Face Detection Based on a Boosting Method   (Cited: 3; self: 0; other: 3)
This paper proposes a face detection algorithm based on a Boosting method. An eigenface method is first used to construct a threshold function, based on the signal-to-noise ratio of the reconstructed image, for face detection. On this basis, the Boosting method is used to construct a sequence of SNR-threshold detection functions, which are then combined in a fixed way into an overall detection function that decides whether an image is a face image. Experimental results show that this method clearly improves detection performance.

16.
Intrusion Detection Based on a Boosting Algorithm   (Cited: 1; self: 1; other: 1)
An intrusion detection method based on a Boosting algorithm is proposed. A neural network is first used to obtain an initial intrusion detection function; on this basis, the Boosting method constructs a sequence of neural-network-based detection functions, which are then combined in a fixed way into a strengthened overall detection function used for intrusion detection. Experimental results show that this method clearly improves detection performance.

17.
To handle online processing of real-time data, an online regression algorithm based on Boosting is proposed. By defining a confidence interval for each learner's fitness, a real-time test for concept drift is established; the individual learners in the ensemble are updated iteratively, one by one, with the newest incoming data block, achieving an online learning effect. Simulation models built on standard benchmark datasets verify that this online regression algorithm reaches accuracy similar to offline Boosting regression while occupying fewer memory units, learning faster, and adjusting learner parameters promptly. The algorithm can also be introduced into industrial production for real-time monitoring of production data.

18.
A Consensus Clustering Algorithm Based on a Voting Mechanism   (Cited: 1; self: 0; other: 1)
Taking a one-pass clustering algorithm as the base method for partitioning data, this paper studies the cluster ensemble problem. The one-pass algorithm is applied repeatedly, with randomly chosen thresholds and input orders, to obtain different clustering results; these clusterings are mapped into a pattern association matrix, and a voting mechanism over this matrix yields the final partition of the data. The proposed cluster ensemble algorithm is evaluated on real and synthetic data sets and compared against related clustering algorithms; experimental results show that it is effective and feasible.
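A rough sketch of the consensus step, under stated assumptions: KMeans with a randomly drawn k stands in for the paper's one-pass clustering algorithm, co-clustering counts form the association matrix, and a majority vote over the matrix followed by connected components yields the final partition. Parameter values are illustrative.

```python
# Vote-based consensus clustering over a co-association matrix. Sketch only.
import numpy as np
from sklearn.cluster import KMeans

def consensus_clusters(X, runs=10, vote_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(runs):                      # vary k to vary the partitions
        k = int(rng.integers(2, 6))
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=int(rng.integers(1 << 30))).fit_predict(X)
        co += (labels[:, None] == labels[None, :])
    # Majority vote on the association matrix, then connected components.
    adj = co / runs >= vote_ratio
    final = -np.ones(n, dtype=int)
    cid = 0
    for i in range(n):
        if final[i] < 0:
            stack = [i]
            while stack:                        # flood-fill one component
                j = stack.pop()
                if final[j] < 0:
                    final[j] = cid
                    stack.extend(np.flatnonzero(adj[j] & (final < 0)))
            cid += 1
    return final
```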

19.
Boosting Algorithms for Parallel and Distributed Learning   (Cited: 1; self: 0; other: 1)
The growing amount of available information and its distributed and heterogeneous nature have a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for efficiently integrating specialized classifiers learned over very large, distributed and possibly heterogeneous databases that cannot fit into main computer memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances adaptively set according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with the objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, they are combined according to the confidence of their predictions. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged together, although it can also be used for parallel learning where a massive data set is partitioned into several disjoint subsets for more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all classifier ensembles built on the disjoint data sets. The proposed methods, applied to several data sets, have shown that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting. Results from the experiments also indicate that distributed boosting has comparable or slightly improved classification accuracy over standard boosting, while requiring much less memory and computational time since it uses smaller data sets.
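The per-round combination step, sketched under assumptions: each processor's classifier contributes its class-probability output weighted by its own per-example confidence. This is a simplified reading of "combined according to the confidence of their predictions", not the paper's exact scheme; the probability arrays are invented.

```python
# Confidence-weighted combination of classifiers trained in parallel.
# Simplified illustration of the per-round merge step.
import numpy as np

def combine_by_confidence(prob_outputs):
    """prob_outputs: list of (n_samples, n_classes) probability arrays,
    one per processor/site."""
    combined = np.zeros_like(prob_outputs[0])
    for P in prob_outputs:
        conf = P.max(axis=1, keepdims=True)   # per-example confidence
        combined += conf * P                  # confident classifiers weigh more
    return combined.argmax(axis=1)

# Hypothetical outputs from two processors on three examples, two classes:
p1 = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
p2 = np.array([[0.7, 0.3], [0.3, 0.7], [0.4, 0.6]])
print(combine_by_confidence([p1, p2]))
```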

20.
Mannor, Shie; Meir, Ron. Machine Learning, 2002, 48(1-3): 219-251
We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve a weighted empirical error on the training set which is bounded from above by 1/2 − γ, γ > 0, for any distribution on the data set. Moreover, in order that the weak learner be useful in terms of generalization, γ must be sufficiently far from zero. While the existence of weak learners is essential to the success of boosting algorithms, a proof of their existence based on a geometric point of view has been hitherto lacking. In this work we show that under certain natural conditions on the data set, a linear classifier is indeed a weak learner. Our results can be directly applied to generalization error bounds for boosting, leading to closed-form bounds. We also provide a procedure for dynamically determining the number of boosting iterations required to achieve low generalization error. The bounds established in this work are based on the theory of geometric discrepancy.
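In symbols, the weak-learning condition stated above (using D for a distribution over the m training examples and h for the weak hypothesis):

```latex
% Weak-learning condition: for every distribution D over the m training
% examples, the weighted empirical error of the hypothesis h must stay a
% margin gamma below one half.
\[
  \operatorname{err}_D(h)
    \;=\; \sum_{i=1}^{m} D(i)\,\mathbb{1}\{h(x_i) \neq y_i\}
    \;\le\; \tfrac{1}{2} - \gamma,
  \qquad \gamma > 0.
\]
```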
