Similar Literature
20 similar articles were retrieved.
1.
Current research on boosting ensemble learning focuses on maximizing the margin, or soft margin, of a convex combination of weak learners. This convex combination uses almost all of the generated weak learners, yet many of them are correlated or redundant, which increases the time and space complexity of both training and classification. To address this problem, a selective boosting ensemble learning algorithm, called SelectedBoost, is proposed on the basis of LPBoost. After each new weak learner is generated, the algorithm decides whether to keep it by computing its correlation with and difference from the existing weak learners, combined with the accuracy of the current strong learner (ensemble). In addition, existing boosting algorithms (such as AdaBoost, LPBoost, and ERLPBoost) essentially update sample weights based on one or more of the weak learners generated so far, whereas the strong learner represents the current decision surface better than any single weak learner. SelectedBoost therefore introduces a stricter strong-learner margin constraint into the constrained margin-maximization problem, so that sample weights are updated with reference to both the weak-learner margins and the strong learner generated so far, which speeds up convergence. Experimental comparisons with other representative ensemble learning algorithms show that the proposed method has clear advantages in convergence rate, classification accuracy, and generalization ability.
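The selection step described above can be pictured roughly as follows. This is a minimal sketch only; the names correlation, diversity, tau_corr, and tau_div and the exact decision rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def should_select(new_pred, kept_preds, y, ens_pred,
                  tau_corr=0.9, tau_div=0.1):
    """Decide whether to keep a newly generated weak learner.

    new_pred   : predictions (+1/-1) of the new weak learner on the training set
    kept_preds : prediction vectors of the already selected weak learners
    y          : true labels (+1/-1)
    ens_pred   : predictions of the current strong learner (ensemble)
    """
    if not kept_preds:
        return True
    # correlation with already selected weak learners (fraction of agreements)
    corr = max(np.mean(new_pred == p) for p in kept_preds)
    # diversity: fraction of samples where the new learner disagrees
    # with the current ensemble but is correct
    div = np.mean((new_pred != ens_pred) & (new_pred == y))
    ens_acc = np.mean(ens_pred == y)
    # keep the learner only if it is not too similar to existing ones,
    # contributes some diversity, and the ensemble still has room to improve
    return corr < tau_corr and div > tau_div and ens_acc < 1.0
```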

2.
Ensemble learning, which reconstructs a strong classifier from multiple weak classifiers, is an important research direction in machine learning. Although many methods for generating diverse base classifiers have been proposed, their robustness still needs improvement. The shrinking-sample ensemble learning algorithm combines the ideas of the two most popular ensemble methods, boosting and bagging: it repeatedly removes the samples that are classified with high confidence, so that the training set shrinks step by step and samples whose importance was underestimated are trained sufficiently by the subsequent classifiers. This strategy yields a sequence of shrinking training subsets and thus a sequence of diverse base classifiers. Like boosting and bagging, the method combines the base classifiers by voting. Strict ten-fold cross-validation on 8 UCI data sets with 7 base classifiers shows that the shrinking-sample ensemble learning algorithm outperforms boosting and bagging overall.
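A minimal sketch of the shrinking-sample loop under illustrative assumptions: scikit-learn-style estimators, NumPy arrays, integer class labels, and a hypothetical confidence threshold conf_thr that the abstract does not specify.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def shrinking_sample_ensemble(X, y, base=None, rounds=10, conf_thr=0.9):
    """Train a sequence of base classifiers on progressively smaller training sets."""
    base = base or DecisionTreeClassifier(max_depth=3)
    models, Xt, yt = [], X.copy(), y.copy()
    for _ in range(rounds):
        if len(yt) == 0:
            break
        clf = clone(base).fit(Xt, yt)
        models.append(clf)
        conf = clf.predict_proba(Xt).max(axis=1)
        keep = conf < conf_thr          # drop samples already classified with high confidence
        Xt, yt = Xt[keep], yt[keep]
    return models

def vote(models, X):
    """Combine the base classifiers by simple majority voting
    (assumes non-negative integer class labels)."""
    preds = np.array([m.predict(X) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```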

3.
Semi-supervised learning and ensemble learning are both important approaches in machine learning: the former exploits unlabeled samples, while the latter combines multiple weak learners to improve classification accuracy. For nominal data, this paper proposes SUCE, a semi-supervised classification method that fuses clustering and ensemble learning. A large number of weak learners are generated by running several clustering algorithms under different parameter settings; the available class-label information is then used to evaluate and select weak learners. The selected weak learners are combined to pre-classify the test set, and samples with high confidence are moved into the training set. Finally, base algorithms such as ID3, Naive Bayes, kNN, C4.5, OneR, and Logistic are trained on the enlarged training set to classify the remaining samples. Experiments on UCI data sets show that when labeled training samples are scarce, the method consistently improves the accuracy of most of the base algorithms.

4.
To address concept drift in data stream classification, an improved Online Bagging algorithm is proposed: AdBagging (adaptive lambda bagging), an online bagging algorithm with an adaptive sampling parameter. The number of misclassified samples observed during classification is used to adjust the parameter of the Poisson distribution that drives sampling in Online Bagging, so that the weight of each new sample in the learners can be tuned dynamically: misclassified samples from the stream receive a higher learning weight factor, correctly classified samples receive a lower one, and the factor is further adjusted according to the order in which samples arrive, allowing the ensemble classifier to adapt its diversity dynamically. The algorithm retains the efficiency and simplicity of Online Bagging while handling concept drift in data streams; experiments on synthetic and real data sets demonstrate its effectiveness.
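For context, Online Bagging shows each incoming example to each base learner k times, where k is drawn from a Poisson distribution. The sketch below adjusts the Poisson parameter per example depending on whether the current ensemble misclassifies it; the names lam_ok and lam_err and the specific values are illustrative assumptions, and the learners are assumed to expose incremental predict/partial_fit methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def adbagging_update(models, x, y, lam_ok=1.0, lam_err=2.0):
    """One Online-Bagging-style update with an adaptive Poisson parameter.

    models : list of incremental learners exposing predict(X) and partial_fit(X, y)
    x, y   : one incoming example and its label
    """
    votes = [m.predict([x])[0] for m in models]
    ens_pred = max(set(votes), key=votes.count)   # majority vote of the ensemble
    # misclassified examples get a larger lambda, i.e. a higher expected weight
    lam = lam_err if ens_pred != y else lam_ok
    for m in models:
        k = rng.poisson(lam)                      # how many times this learner sees x
        for _ in range(k):
            m.partial_fit([x], [y])
```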

5.
钟向阳  凌捷 《计算机工程》2009,35(11):172-174
The AdaBoost algorithm uses single-threshold weak classifiers, which have difficulty fitting complex distributions, so its training converges slowly. To address this, a multi-threshold weak learner is designed: nodes are split by maximizing the reduction in the sum of squares to generate weak classifiers, which are then boosted into a strong classifier on the training set with the GAB algorithm. Experimental results show that, with the same number of weak classifiers, the method achieves a lower false-alarm rate on positive samples than AdaBoost.

6.
Constructing ensemble learning algorithms is an important research topic in machine learning. Although the weak learnability theorem states that weak and strong learning algorithms are equivalent, how to construct good ensemble learning algorithms remains an unsolved problem; the AdaBoost algorithm of Freund and Schapire and the real-valued (continuous) AdaBoost of Schapire and Singer solved it only partially. This paper proposes a definition of learning error and, by minimizing it, derives a general ensemble learning algorithm that covers most practical classification settings, including multi-class, cost-sensitive, imbalanced, multi-label, and fuzzy classification, and that unifies and generalizes the AdaBoost family. To guarantee the generalization ability of the combined prediction function, it is further shown that the simple prediction functions used in the algorithm can all be constructed from a single feature of the samples. Both theoretical analysis and experiments show that the learning error of the proposed algorithms can be made arbitrarily small without risk of overfitting.

7.
鲁淑霞  张振莲 《计算机科学》2021,48(11):184-191
To address imbalanced data classification, an AdaBoostv algorithm based on the optimal margin is proposed. The algorithm uses a modified SVM as the base classifier: a margin-mean term is added to the SVM optimization model, and both the margin-mean term and the loss term are weighted according to the class imbalance ratio; the model is solved with the stochastic variance reduced gradient (SVRG) method to speed up convergence. The proposed optimal-margin AdaBoostv introduces a new adaptive cost-sensitive function into the sample-weight update formula, assigning higher costs to minority-class samples, misclassified minority-class samples, and minority-class samples close to the decision boundary. In addition, a new base-classifier weighting strategy is derived by combining the new weight formula with an estimate of the optimal margin under a given accuracy parameter v, further improving classification accuracy. Comparative experiments show that, in both the linear and nonlinear cases, the proposed optimal-margin AdaBoostv achieves higher classification accuracy on imbalanced data sets than the other algorithms and obtains a larger minimum margin.

8.
Research on Boosting-based Classification Algorithms for Imbalanced Data (Cited by: 2; self-citations: 0; citations by others: 2)
Boosting-based classification algorithms for imbalanced data are studied and existing algorithms are reviewed; on this basis a weight-sampling boosting algorithm is proposed. Samples are resampled according to their weights, changing the original data distribution and yielding a classifier suited to imbalanced data. In essence, the sampling function reshapes the original boosting loss so that the classification loss on positive samples is emphasized further, making the classifier focus on discriminating positive samples effectively and raising their overall recognition rate. The algorithm is simple to implement and practical; experiments on UCI data sets show that for imbalanced classification, weight-sampling boosting outperforms the original boosting algorithm and earlier methods.
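A minimal sketch of one weight-based resampling round inside an AdaBoost-style loop, under illustrative assumptions: scikit-learn estimators, labels in {-1, +1}, and an extra multiplier pos_boost for positive samples that stands in for the paper's sampling function.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def weight_sampling_boost(X, y, rounds=10, pos_boost=2.0, base=None, seed=0):
    """AdaBoost-style loop that resamples the training set by weight each round."""
    n = len(y)
    base = base or DecisionTreeClassifier(max_depth=1)
    rng = np.random.default_rng(seed)
    w = np.ones(n) / n
    models, alphas = [], []
    for _ in range(rounds):
        p = w * np.where(y == 1, pos_boost, 1.0)        # emphasize the positive (minority) class
        p /= p.sum()
        idx = rng.choice(n, size=n, replace=True, p=p)  # weight-based resampling
        clf = clone(base).fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)                  # standard AdaBoost weight update
        w /= w.sum()
        models.append(clf)
        alphas.append(alpha)
    return models, alphas
```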

9.
杨浩  王宇  张中原 《计算机应用》2019,39(7):1883-1887
To handle the classification of imbalanced data sets and the fact that typical cost-sensitive learning algorithms do not extend to the multi-class case, an ensemble of cost-sensitive algorithms based on the average distance to the K nearest neighbors (KNN) is proposed. First, following the idea of maximizing the minimum margin, a resampling method that reduces the sample density near the decision boundary is introduced. Second, the average distance to each class's samples is used as the basis of the classification decision, and a learning algorithm consistent with Bayesian decision theory is proposed, making the improved algorithm cost-sensitive. Finally, the cost-sensitive algorithms obtained for different values of K are combined: with minimum cost as the criterion, the weights of the base learners are adjusted to obtain a cost-sensitive AdaBoost algorithm whose objective is to minimize the overall misclassification cost. Experimental results show that, compared with the traditional KNN algorithm, the improved algorithm reduces the average misclassification cost by 31.4 percentage points and has better cost sensitivity.
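A minimal sketch of a per-class average-distance decision rule of the kind described above. The cost dictionary and the way cost scales the score are illustrative assumptions, not the authors' exact rule.

```python
import numpy as np

def knn_avg_distance_predict(X_train, y_train, x, k=5, cost=None):
    """Classify x by the per-class average distance to its k nearest neighbours,
    scaled by a per-class misclassification cost (a costlier class is favoured).

    cost : dict {class_label: cost}; defaults to equal costs.
    """
    classes = np.unique(y_train)
    cost = cost or {c: 1.0 for c in classes}
    scores = {}
    for c in classes:
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)           # distances from x to all samples of class c
        avg = np.sort(d)[: min(k, len(d))].mean()    # average distance to the k nearest of class c
        scores[c] = avg / cost[c]                    # higher cost shrinks the effective distance
    return min(scores, key=scores.get)
```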

10.
高锋  黄海燕 《计算机科学》2017,44(8):225-229
Imbalanced data severely degrades the performance of traditional classification algorithms and lowers the recognition rate of the minority class. A hybrid sampling technique based on neighborhood features is proposed: the sampling weight of each sample is determined by the class distribution in its neighborhood, and hybrid sampling is then applied to obtain a balanced data set. A dynamic ensemble method based on local confidence is then used: base classifiers are generated by classification learning, and for each test sample the best base classifiers are selected and combined dynamically according to their local classification accuracy. Experiments on UCI benchmark data sets show that the method improves the classification accuracy of both the minority and the majority class on imbalanced data.

11.
Linear Programming Boosting via Column Generation (Cited by: 4; self-citations: 0; citations by others: 4)
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may converge in the limit only, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse in contrast with gradient based methods. Computationally, LPBoost is competitive in quality and computational cost to AdaBoost.
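For reference, the 1-norm soft-margin LP that this formulation corresponds to is usually written as below (weak-hypothesis weights $a_j$, margin $\rho$, slacks $\xi_i$, misclassification cost parameter $D$); column generation operates on the dual, whose variables are the per-sample misclassification costs, and each newly generated weak hypothesis contributes a new column to the restricted LP.

```latex
\begin{aligned}
\max_{a,\;\rho,\;\xi}\quad & \rho \;-\; D\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i \sum_{j} a_j\, h_j(x_i) \;\ge\; \rho - \xi_i, \qquad i=1,\dots,n,\\
& \sum_{j} a_j = 1,\qquad a_j \ge 0,\qquad \xi_i \ge 0 .
\end{aligned}
```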

12.
Boosting algorithms have been found successful in many areas of machine learning and, in particular, in ranking. For typical classes of weak learners used in boosting (such as decision stumps or trees), a large feature space can slow down the training, while a long sequence of weak hypotheses combined by boosting can result in a computationally expensive model. In this paper we propose a strategy that builds several sequences of weak hypotheses in parallel, and extends the ones that are likely to yield a good model. The weak hypothesis sequences are arranged in a boosting tree, and new weak hypotheses are added to promising nodes (both leaves and inner nodes) of the tree using some randomized method. Theoretical results show that the proposed algorithm asymptotically achieves the performance of the base boosting algorithm applied. Experiments are provided in ranking web documents and move ordering in chess, and the results indicate that the new strategy yields better performance when the length of the sequence is limited, and converges to similar performance as the original boosting algorithms otherwise.

13.
To avoid the drop in generalization performance caused by hard-margin algorithms over-emphasizing hard-to-classify samples, a new soft-margin AdaBoost-QP algorithm is proposed. A slack term is added to the sample hard margin to obtain a soft-margin formulation, which is used to optimize the margin distribution of the samples and adjust the weights of the weak classifiers. Experimental results show that the algorithm reduces the generalization error and improves the generalization performance of AdaBoost.
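As a sketch of what "adding a slack term to the hard margin" typically looks like for boosting, one standard soft-margin objective of this kind is shown below (weak-classifier weights $w_t$, margin $\rho$, slacks $\xi_i$, trade-off constant $C$); the paper's exact formulation may differ, and the squared slack penalty is an assumption chosen to make the problem a QP.

```latex
\begin{aligned}
\max_{w,\;\rho,\;\xi}\quad & \rho \;-\; C\sum_{i=1}^{n}\xi_i^{2}\\
\text{s.t.}\quad & y_i \sum_{t=1}^{T} w_t\, h_t(x_i) \;\ge\; \rho - \xi_i,\qquad i=1,\dots,n,\\
& \sum_{t=1}^{T} w_t = 1,\qquad w_t \ge 0,\qquad \xi_i \ge 0 .
\end{aligned}
```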

14.
Boosting algorithms build highly accurate prediction mechanisms from a collection of low-accuracy predictors. To do so, they employ the notion of weak-learnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. The equivalence is a direct consequence of von Neumann’s minimax theorem. Nonetheless, we derive the equivalence directly using Fenchel duality. We then use our derivation to describe a family of relaxations to the weak-learnability assumption that readily translates to a family of relaxations of linear separability with margin. This alternative perspective sheds new light on known soft-margin boosting algorithms and also enables us to derive several new relaxations of the notion of linear separability. Last, we describe and analyze an efficient boosting framework that can be used for minimizing the loss functions derived from our family of relaxations. In particular, we obtain efficient boosting algorithms for maximizing hard and soft versions of the ℓ1 margin.

15.
We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of ℓ1-norm-regularized AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution by maximizing margins and at the same time controlling the margin variance. We also theoretically prove that approximately, ℓ1-norm-regularized AdaBoost maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to that of standard stagewise additive boosting algorithms but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.

16.
Improved Boosting Algorithms Using Confidence-rated Predictions (Cited by: 55; self-citations: 0; citations by others: 55)
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler but as effective as techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
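In the confidence-rated setting, each weak hypothesis $h_t$ outputs a real number whose sign is the predicted label and whose magnitude is its confidence. One common way to write the resulting distribution update and final classifier (notation may differ slightly from the paper, which also allows $\alpha_t$ to be folded into $h_t$) is:

```latex
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
Z_t \;=\; \sum_{i} D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr),
\qquad
H(x) \;=\; \operatorname{sign}\!\Bigl(\sum_{t}\alpha_t\, h_t(x)\Bigr).
```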

17.
It has been shown that the Universum data, which do not belong to either class of the classification problem of interest, may contain useful prior domain knowledge for training a classifier [1], [2]. In this work, we design a novel boosting algorithm that takes advantage of the available Universum data, hence the name UBoost. UBoost is a boosting implementation of Vapnik's alternative capacity concept to the large margin approach. In addition to the standard regularization term, UBoost also controls the learned model's capacity by maximizing the number of observed contradictions. Our experiments demonstrate that UBoost can deliver improved classification accuracy over standard boosting algorithms that use labeled data alone.

18.
Feature-based ensemble learning, where weak hypotheses are learned within the associated feature subspaces constructed by repeated random feature selection, is described. The proposed ensemble approach is less affected by noisy features or outliers unique to the training set than the bagging and boosting algorithms due to the randomized selection of feature subsets from the entire training set. The individual weak hypotheses perform their own generalization processes, within the associated feature subspaces, independently of each other. This allows the proposed ensemble to provide improved performance on unseen data over other ensemble learning methods that randomly choose subsets of training samples in an input space. The weak hypotheses are combined through three different aggregating strategies: majority voting, weighted average and neural network-based aggregation. The proposed ensemble technique has been applied to hyperspectral chemical plume data and a performance comparison of the proposed and other existing ensemble methods is presented.
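A minimal sketch of repeated random feature-subspace construction with majority voting (the first of the three aggregation strategies mentioned). The base estimator, the subspace fraction frac, and the integer-label assumption are illustrative choices, not details from the abstract.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def feature_subspace_ensemble(X, y, n_members=15, frac=0.5, base=None, seed=0):
    """Train each weak hypothesis on a randomly selected feature subspace."""
    rng = np.random.default_rng(seed)
    base = base or DecisionTreeClassifier()
    d = X.shape[1]
    members = []
    for _ in range(n_members):
        feats = rng.choice(d, size=max(1, int(frac * d)), replace=False)
        clf = clone(base).fit(X[:, feats], y)
        members.append((feats, clf))
    return members

def predict_majority(members, X):
    """Majority voting over the subspace learners (non-negative integer labels)."""
    preds = np.array([clf.predict(X[:, feats]) for feats, clf in members])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```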

19.
侯勇  郑雪峰 《计算机应用》2013,33(4):998-1000
Kernel principal component analysis (KPCA) and the multilayer perceptron (MLP) are popular feature extraction algorithms, but they suffer from low efficiency and a tendency to get trapped in local optima. To address these problems, a new feature extraction algorithm is proposed: the enhanced feature extraction algorithm (EFE) based on maximum-margin hyperplanes. Independent of the probability distribution of the input samples, the algorithm extracts features by mapping the input samples into the subspace spanned by the normals of a set of pairwise-orthogonal, margin-maximizing separating hyperplanes. Feature extraction experiments on the real-world data sets wine and AR show that EFE exceeds KPCA and MLP in both execution efficiency and recognition accuracy.

20.
The η-one-class and η-outlier Problems and Their LP Learning Algorithms (Cited by: 1; self-citations: 0; citations by others: 1)
陶卿  齐红威  吴高巍  章显 《计算机学报》2004,27(8):1102-1108
One-class and outlier problems are studied with SVM methods. Viewing the one-class problem as a function estimation problem, the authors first define the generalization error of the η-one-class and η-outlier problems, then define linear separability and margin, and obtain maximum-margin, soft-margin, and v-soft-margin algorithms for solving the one-class problem. These learning algorithms are grounded in statistical learning theory and reduce to solving linear programming problems; their implementation follows an approach similar to boosting. Experimental results show that the proposed algorithms are of practical value.
