Similar Documents
1.
Boosting Algorithms for Parallel and Distributed Learning
The growing amount of available information and its distributed, heterogeneous nature have a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for efficiently integrating specialized classifiers learned over very large, distributed and possibly heterogeneous databases that cannot fit into main memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances set adaptively according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with the objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, the classifiers are combined according to the confidence of their predictions. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged, although it can also be used for parallel learning where a massive data set is partitioned into disjoint subsets for more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all the classifier ensembles built on the disjoint data sets. Applied to several data sets, the proposed methods show that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting. The experiments also indicate that distributed boosting achieves comparable or slightly better classification accuracy than standard boosting while requiring much less memory and computation time, since it works with smaller data sets.
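The confidence-based combination step lends itself to a short illustration. The sketch below trains one classifier per disjoint partition and merges their class-probability outputs, weighting each by its per-sample confidence; the partitioning, base learner, and weighting rule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for "disjoint data sites": split one data set into 4 partitions.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
partitions = np.array_split(np.arange(len(X)), 4)

# Train one classifier per site (in a real system these run in parallel).
site_models = [
    DecisionTreeClassifier(max_depth=5, random_state=0).fit(X[idx], y[idx])
    for idx in partitions
]

def combine_by_confidence(models, X_new):
    """Average class-probability outputs, weighting each model by its
    per-sample confidence (a simple stand-in for the paper's
    confidence-based combination)."""
    probas = np.stack([m.predict_proba(X_new) for m in models])  # (k, n, c)
    conf = probas.max(axis=2, keepdims=True)                     # peak probability
    weighted = (probas * conf).sum(axis=0) / conf.sum(axis=0)
    return weighted.argmax(axis=1)

preds = combine_by_confidence(site_models, X)
print("combined training accuracy:", (preds == y).mean())
```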

2.
Information Fusion, 2002, 3(4): 245-258
In classifier combination, diverse ensembles are believed to have greater potential for accuracy improvement than non-diverse ensembles. We put this hypothesis to the test for two methods of building ensembles, Bagging and Boosting, with two linear classifier models: the nearest mean classifier and the pseudo-Fisher linear discriminant classifier. To estimate diversity, we apply nine measures proposed in the recent literature on combining classifiers. Eight combination methods were used: minimum, maximum, product, average, simple majority, weighted majority, Naive Bayes and decision templates. We carried out experiments on seven data sets for different sample sizes, different numbers of classifiers in the ensembles, and the two linear classifiers. Altogether, we created 1364 ensembles with the Bagging method and the same number with the Boosting method. On each of these, we calculated the nine diversity measures and the accuracy of the eight combination methods, averaged over 50 runs. The results confirm in a quantitative way the intuitive explanation behind the success of Boosting for linear classifiers with increasing training sizes, and the poor performance of Bagging in this case. The diversity measures indicate that Boosting succeeds in inducing diversity even for stable classifiers, whereas Bagging does not.
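Most of the simple combiners named above have one-line definitions over stacked class-probability outputs. Here is a minimal sketch of five of them (weighted majority, Naive Bayes, and decision templates need per-classifier statistics and are omitted; the toy array is invented for illustration):

```python
import numpy as np

def combine(probas, rule):
    """Fixed combination rules over per-classifier class-probability outputs.

    probas: array of shape (n_classifiers, n_samples, n_classes).
    Returns predicted class indices per sample.
    """
    if rule == "minimum":
        scores = probas.min(axis=0)
    elif rule == "maximum":
        scores = probas.max(axis=0)
    elif rule == "product":
        scores = probas.prod(axis=0)
    elif rule == "average":
        scores = probas.mean(axis=0)
    elif rule == "majority":
        votes = probas.argmax(axis=2)            # each classifier's crisp vote
        n_classes = probas.shape[2]
        scores = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)],
                          axis=1)
    else:
        raise ValueError(rule)
    return scores.argmax(axis=1)

# Three classifiers, two samples, two classes:
p = np.array([[[0.6, 0.4], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]],
              [[0.4, 0.6], [0.3, 0.7]]])
for rule in ("minimum", "maximum", "product", "average", "majority"):
    print(rule, combine(p, rule))
```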

3.
Bagging, boosting, rotation forest and random subspace methods are well-known resampling ensemble methods that generate and combine a diversity of learners using the same learning algorithm for the base classifiers. Boosting and rotation forest are considered stronger than bagging and random subspace methods on noise-free data, but there are strong empirical indications that bagging and random subspace methods are much more robust than boosting and rotation forest in noisy settings. For this reason, in this work we build an ensemble of bagging, boosting, rotation forest and random subspace ensembles, with six sub-classifiers in each, and use a voting methodology for the final prediction; a sketch of the idea follows. We compared this with plain bagging, boosting, rotation forest and random subspace ensembles of 25 sub-classifiers, as well as with other well-known combining methods, on standard benchmark datasets, and the proposed technique achieved better accuracy in most cases.
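A rough sketch of the proposed construction under scikit-learn, which has no rotation forest implementation, so only three of the four sub-ensemble types appear; sizes and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Deliberately small sub-ensembles (6 members each), combined by majority
# vote; the random subspace method is approximated by bagging over random
# feature subsets without bootstrap sampling.
vote = VotingClassifier(
    estimators=[
        ("bagging", BaggingClassifier(n_estimators=6, random_state=0)),
        ("boosting", AdaBoostClassifier(n_estimators=6, random_state=0)),
        ("subspace", BaggingClassifier(n_estimators=6, max_features=0.5,
                                       bootstrap=False, random_state=0)),
    ],
    voting="hard",
).fit(X, y)
print("training accuracy:", vote.score(X, y))
```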

4.
Classification with imbalanced data-sets has become one of the most challenging problems in data mining. When one class is far more represented than the other, undesirable effects arise in both the learning and classification processes, mainly regarding the minority class. The problem calls for accurate tools, and classifier ensembles have lately emerged as a possible solution; among ensemble proposals, the combination of Bagging and Boosting with preprocessing techniques has proved its ability to enhance classification of the minority class. In this paper, we develop a new ensemble construction algorithm (EUSBoost) based on RUSBoost, one of the simplest and most accurate ensembles, which combines random undersampling with the Boosting algorithm. Our methodology aims to improve on existing proposals by enhancing the performance of the base classifiers through evolutionary undersampling. Besides, we promote diversity by favoring the use of different subsets of majority-class instances to train each base classifier. Focusing on two-class highly imbalanced problems, we show, supported by appropriate statistical analysis, that EUSBoost outperforms state-of-the-art ensemble-based methods. We also analyze its advantages using kappa-error diagrams, which we adapt to the imbalanced scenario.
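A minimal sketch of the RUSBoost-style round that EUSBoost builds on, with plain random undersampling standing in for the evolutionary search; the base learner, tree depth, and round count are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def rusboost_like(X, y, n_rounds=10, rng=np.random.default_rng(0)):
    """Minimal RUSBoost-style loop: AdaBoost weight updates, but each base
    classifier is trained on a balanced sample obtained by randomly
    undersampling the majority class. (EUSBoost replaces this random step
    with evolutionary undersampling; that search is omitted here.)"""
    n = len(y)
    w = np.full(n, 1 / n)
    models, alphas = [], []
    maj, mino = (0, 1) if (y == 0).sum() > (y == 1).sum() else (1, 0)
    for _ in range(n_rounds):
        # Balanced training set: all minority indices plus an equal-size
        # weighted draw from the majority class.
        min_idx = np.flatnonzero(y == mino)
        maj_idx = np.flatnonzero(y == maj)
        p = w[maj_idx] / w[maj_idx].sum()
        sel = rng.choice(maj_idx, size=len(min_idx), replace=False, p=p)
        idx = np.concatenate([min_idx, sel])
        m = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
        pred = m.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * np.where(pred == y, 1, -1))  # boost the mistakes
        w /= w.sum()
        models.append(m)
        alphas.append(alpha)
    return models, alphas

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
models, alphas = rusboost_like(X, y)
```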

5.
"Fuzzy" versus "nonfuzzy" in combining classifiers designed by Boosting   总被引:1,自引:0,他引:1  
Boosting is recognized as one of the most successful techniques for generating classifier ensembles. Typically, the classifier outputs are combined by the weighted majority vote. The purpose of this study is to demonstrate the advantages of some fuzzy combination methods for ensembles of classifiers designed by Boosting. We ran two-fold cross-validation experiments on six benchmark data sets to compare the fuzzy and nonfuzzy combination methods. On the "fuzzy side" we used the fuzzy integral and the decision templates with different similarity measures. On the "nonfuzzy side" we tried the weighted majority vote as well as simple combiners such as the majority vote, minimum, maximum, average, product, and the Naive-Bayes combination. In our experiments, the fuzzy combination methods performed consistently better than the nonfuzzy methods. The weighted majority vote showed a stable performance, though slightly inferior to the performance of the fuzzy combiners.
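Decision templates, on the "fuzzy side" above, have a compact definition worth spelling out: the template for each class is the class-mean of the classifiers' stacked outputs, and a sample is labeled by the nearest template. The sketch below uses squared Euclidean distance, one of several similarity measures the study compares; the array shapes and toy data are assumptions for illustration.

```python
import numpy as np

def fit_decision_templates(profiles, y, n_classes):
    """Decision templates: the template for class c is the mean 'decision
    profile' (stacked classifier outputs) over training samples of class c.

    profiles: array of shape (n_samples, n_classifiers, n_classes)."""
    return np.stack([profiles[y == c].mean(axis=0) for c in range(n_classes)])

def predict_decision_templates(templates, profiles):
    """Label each sample by the template closest (in squared Euclidean
    distance) to its decision profile."""
    d = ((profiles[:, None, :, :] - templates[None]) ** 2).sum(axis=(2, 3))
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
profiles = rng.random((100, 3, 2))      # 100 samples, 3 classifiers, 2 classes
y = rng.integers(0, 2, 100)
templates = fit_decision_templates(profiles, y, n_classes=2)
print(predict_decision_templates(templates, profiles)[:10])
```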

6.
An efficient approximation of L2 Boosting with component-wise smoothing splines is considered. Smoothing spline base-learners are replaced by P-spline base-learners, which yield similar prediction errors but are more advantageous from a computational point of view. A detailed analysis of the effect of various P-spline hyper-parameters on the boosting fit is given. In addition, a new theoretical result on the relationship between the boosting stopping iteration and the step length factor used for shrinking the boosting estimates is derived.
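To make "component-wise" concrete, here is a minimal L2 Boosting sketch with least-squares lines standing in for the P-spline base-learners (it assumes centered covariates; n_steps and nu are illustrative, and it is exactly this pair, the stopping iteration and the step-length factor, whose relationship the paper's new theoretical result concerns):

```python
import numpy as np

def componentwise_l2boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 Boosting: at each step, fit every single-covariate
    base-learner to the current residuals, keep the best one, and add a
    shrunken (step length nu) version of its fit."""
    n, p = X.shape
    intercept = y.mean()
    fit = np.full(n, intercept)
    coefs = np.zeros(p)
    for _ in range(n_steps):
        r = y - fit
        best_j, best_beta, best_rss = None, 0.0, np.inf
        for j in range(p):                       # try each component in turn
            xj = X[:, j]
            beta = xj @ r / (xj @ xj)
            rss = ((r - beta * xj) ** 2).sum()
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        coefs[best_j] += nu * best_beta          # shrunken update of winner
        fit += nu * best_beta * X[:, best_j]
    return intercept, coefs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X -= X.mean(axis=0)                              # centered, as assumed above
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
intercept, coefs = componentwise_l2boost(X, y)
print(np.round(coefs, 2))                        # mass concentrates on x0, x3
```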

7.
Ensemble design techniques based on training-set resampling are successfully used to reduce the classification errors of base classifiers. Boosting is one such technique, in which each training set is obtained by drawing samples with replacement from the available training set according to a weighted distribution that is modified for each new classifier to be included in the ensemble. The weighted resampling results in a set of classifiers, each accurate in different parts of the input space, as specified mainly by the sample weights. In this study, a dynamic integration of boosting-based ensembles is proposed to take into account the heterogeneity of the input sets. An evidence-theoretic framework is developed that accounts for the weights and distances of neighboring training samples when both training and testing boosting-based ensembles. The effectiveness of the proposed technique is compared to the AdaBoost algorithm using three different base classifiers.

8.
Precise train stopping is one of the key technologies in automatic train control systems for rail transit. Traditional precise-stopping techniques rely on complex physical models and expensive sensing equipment, yet still struggle to reach high accuracy. Working from the data itself, this study applies Gaussian process regression and Boosting regression from machine learning to the precise stopping problem and compares them with linear regression. Experiments show that machine learning methods are effective for precise train stopping: Gaussian process regression performs best, while gradient-based Boosting regression comes close to it even without prior knowledge and offers greater flexibility and adaptability in practical applications.
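The paper's operating data are not available, but the three-way comparison it describes is easy to reproduce in outline. The sketch below scores linear regression, Gaussian process regression, and gradient boosting regression by cross-validated error on synthetic stand-in features; the data-generating process is invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in features (speed, braking pressure, ...); the target
# mixes a linear trend with a mild nonlinearity, as stopping error might.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.1 * np.sin(X[:, 0]) \
    + rng.normal(scale=0.1, size=300)

for name, model in [("linear", LinearRegression()),
                    ("GP", GaussianProcessRegressor()),
                    ("boosting", GradientBoostingRegressor(random_state=0))]:
    score = cross_val_score(model, X, y,
                            scoring="neg_mean_absolute_error", cv=5).mean()
    print(f"{name}: MAE = {-score:.3f}")
```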

9.
As meta-models increasingly replace computationally intensive simulations for estimating real system behavior, there is a growing need to select meta-models that represent that behavior well. In most cases, however, designers do not know the behavior of the real system a priori, and therefore have trouble selecting a suitable meta-model. To provide robust prediction performance, ensembles of meta-models have been developed that linearly combine stand-alone meta-models. In this study, we propose a new pointwise ensemble of meta-models whose weights vary according to the prediction point of interest. The suggested method can include any kind of stand-alone meta-model in the ensemble, and can interpolate real system response values at training points even if regression models are included as stand-alone meta-models. To evaluate the effectiveness of the proposed method, its prediction performance is compared with that of existing ensembles of meta-models on well-known mathematical functions. The results show that our pointwise ensemble of meta-models provides more robust and accurate predictions than existing models for a majority of the test problems.
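As one way to picture weights that vary with the prediction point, the sketch below weights each stand-alone meta-model by the inverse of its cross-validation error near the query point; the proximity kernel and weighting rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def pointwise_ensemble_predict(x_new, metamodels, X_train, cv_errors, eps=1e-8):
    """Pointwise ensemble sketch: weight each meta-model (a callable) by the
    inverse of its cross-validation error at training points near x_new,
    so the weights change from one prediction point to the next.

    cv_errors: (n_train, n_models) absolute CV errors of each meta-model."""
    d = np.linalg.norm(X_train - x_new, axis=1)
    k = np.exp(-d / (d.mean() + eps))                  # proximity kernel
    local_err = (k[:, None] * cv_errors).sum(axis=0) / k.sum()
    w = 1.0 / (local_err + eps)
    w /= w.sum()
    return sum(wi * m(x_new) for wi, m in zip(w, metamodels))

rng = np.random.default_rng(0)
X_train = rng.random((50, 2))
metamodels = [lambda x: x.sum(), lambda x: x.prod()]   # toy stand-alone models
cv_errors = rng.random((50, 2))
print(pointwise_ensemble_predict(np.array([0.3, 0.7]),
                                 metamodels, X_train, cv_errors))
```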

10.
The idea of sufficient dimension reduction is used to improve the L2Boosting algorithm, yielding an L2Boosting algorithm based on local correlation (LCBoosting). In each iteration, the algorithm fully extracts information from the local correlation between the response and the covariates and lets the resulting linear combination take part in the Boosting iteration, so that the variables need not be analyzed one by one. Simulation results show that, compared with L2Boosting, LCBoosting converges faster and predicts more accurately.

11.
With the recent financial crisis and the European debt crisis, corporate bankruptcy prediction has become an increasingly important issue for financial institutions. Many statistical and intelligent methods have been proposed, yet no single method has proved best overall for predicting corporate bankruptcy. Recent studies suggest that ensemble learning methods may be applicable to this task. In this paper, a new and improved Boosting method, FS-Boosting, is proposed to predict corporate bankruptcy. By injecting a feature selection strategy into Boosting, FS-Boosting achieves better performance because its base learners gain both accuracy and diversity. For testing and illustration purposes, two real-world bankruptcy datasets were selected to demonstrate the effectiveness and feasibility of FS-Boosting. Experimental results reveal that FS-Boosting can serve as an alternative method for corporate bankruptcy prediction.
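A sketch of one FS-Boosting-style round: before fitting the base learner, score the features under the current sample weights and keep only the top k. The scoring rule and hyper-parameters here are assumptions; the paper's exact feature selection strategy is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def round_with_feature_selection(X, y, w, k):
    """One boosting round with injected feature selection (sketch):
    features are scored by absolute weighted covariance with the label,
    and the base learner sees only the k highest-scoring ones, which also
    pushes diversity across rounds as the weights w change."""
    Xc = X - np.average(X, axis=0, weights=w)
    yc = y - np.average(y, weights=w)
    score = np.abs((w[:, None] * Xc * yc[:, None]).sum(axis=0))
    top = np.argsort(score)[-k:]                 # k best features this round
    model = DecisionTreeClassifier(max_depth=2).fit(X[:, top], y, sample_weight=w)
    return model, top

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
w = np.full(len(y), 1 / len(y))                  # uniform AdaBoost start
model, top = round_with_feature_selection(X, y, w, k=5)
print("features used this round:", top)
```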

12.
Boosting algorithms are a class of general methods for improving the performance of regression analysis. The main idea is to maintain a distribution over the training set. To use this distribution directly, a modified PLS algorithm is proposed and used as the base learner for nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm yields better overall performance than the PLS algorithm alone.

13.
The ability to accurately predict business failure is a very important issue in financial decision-making; incorrect decisions in financial institutions are very likely to cause financial crises and distress. Bankruptcy prediction and credit scoring are two key problems in financial decision support. While many related studies develop financial distress models using machine learning techniques, more advanced techniques, such as classifier ensembles and hybrid classifiers, have not been fully assessed. The aim of this paper is to develop a novel hybrid financial distress model combining a clustering technique with classifier ensembles; single baseline classifiers, hybrid classifiers, and classifier ensembles are also developed for comparison. In particular, two clustering techniques, Self-Organizing Maps (SOMs) and k-means, and three classification techniques, logistic regression, multilayer perceptron (MLP) neural networks, and decision trees, are used to build these four types of bankruptcy prediction models. As a result, 21 different models are compared in terms of average prediction accuracy and Type I and II errors. Across five related datasets, combining SOMs with MLP classifier ensembles performs best, providing higher prediction accuracy and lower Type I and II errors.
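A rough sketch of a two-stage hybrid of this kind under scikit-learn, with KMeans standing in for the SOM (which scikit-learn does not provide) and clustering used as a data filter before training an MLP ensemble; whether filtering is the paper's exact way of combining the two stages is an assumption here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Stage 1 - clustering as a data filter (KMeans stands in for a SOM):
# drop the 10% of samples farthest from their cluster centroid.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
keep = dist < np.quantile(dist, 0.9)

# Stage 2 - an MLP classifier ensemble trained on the filtered data.
ens = BaggingClassifier(
    estimator=MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                            random_state=0),
    n_estimators=10, random_state=0,
).fit(X[keep], y[keep])
print("training accuracy:", ens.score(X, y))
```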

14.
Resampling Methods and Machine Learning
Boosting algorithms attempt to approximate complex natural models with linear combinations of weak learners, and their excellent interpretability and predictive power have attracted considerable attention in the computing community. However, Boosting has been viewed only as an optimization problem under a particular loss, and its statistical nature has not received sufficient attention. Tracing the method back to its roots, the authors propose viewing Boosting from a statistical perspective: within a statistical framework, Boosting is merely an interesting special case of resampling methods. The authors hope to change the current situation in which computer scientists emphasize algorithm performance while ignoring the nature of the data, in order to find methods better suited to problems involving high-dimensional, massive, uncontrolled data.

15.
Representative Algorithms of the AdaBoost Series in the Boosting Family
1. Introduction. Boosting, proposed by Freund and Schapire in 1990, is an effective tool for improving the predictive ability of learning systems and the most representative method in ensemble learning. Its representative algorithms fall into two series: Boost-by-majority and AdaBoost. Boosting manipulates the training examples to produce multiple hypotheses, building a set of predictors combined by voting. AdaBoost maintains a probability distribution over the training examples and adjusts it at each iteration: the error rate of each member classifier on the training examples is computed and used to update the distribution. The effect of the weight change is to place more weight on misclassified examples and less on correctly classified ones.
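The weight update described above is short enough to state exactly; a minimal sketch with the standard AdaBoost formulas for the binary case (the toy weights and mistake pattern are invented for illustration):

```python
import numpy as np

def adaboost_weight_update(w, correct, err):
    """One AdaBoost distribution update: compute alpha from the member
    classifier's weighted error, then shift probability mass toward
    misclassified examples and away from correctly classified ones."""
    alpha = 0.5 * np.log((1 - err) / err)        # classifier's vote weight
    w = w * np.exp(np.where(correct, -alpha, alpha))
    return w / w.sum(), alpha                    # renormalize to a distribution

w = np.full(5, 0.2)                              # uniform start on 5 examples
correct = np.array([True, True, False, True, False])
err = w[~correct].sum()                          # weighted error = 0.4
w_new, alpha = adaboost_weight_update(w, correct, err)
print(w_new)   # misclassified examples now carry more weight (0.25 vs 0.167)
```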

16.
1. Introduction. Boosting, proposed by Freund and Schapire in 1990, is an effective tool for improving the predictive ability of learning systems and the most representative method in ensemble learning; its representative algorithms fall into two series, Boost-by-majority and AdaBoost. Boosting manipulates the training examples to produce multiple hypotheses, building a set of predictors combined by voting. Boosting maintains a probability distribution over the training examples. Boost-by-majority, at each iteration, …

17.
Software defect prediction can effectively improve software reliability and help fix vulnerabilities in a system. Boosting-based resampling is a common way to address the shortage of samples in software defect prediction, but conventional Boosting performs poorly on the class imbalance typical of this domain. To address this, a cost-sensitive Boosting method for software defect prediction, CSBst, is proposed. Because missed and falsely reported defective modules carry different costs, CSBst updates sample weights with a cost-sensitive Boosting scheme: the weights of samples producing Type I errors are increased above the weights of non-defective samples and of Type II error samples, thereby raising the prediction rate for defective modules. A threshold-moving method integrates the results of multiple decision-tree base classifiers to mitigate overfitting. On this basis, an analysis gives the optimal settings of the weights and threshold during model construction. Experiments on the NASA software defect prediction datasets show that, with small samples, CSBst improves the BAL metric by 7% and 3% over the CSBKNN and CSCE methods respectively, while reducing time complexity by an order of magnitude.
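A sketch of the two mechanisms described above, with invented cost values and threshold (the paper derives the optimal weight and threshold settings analytically, which is not reproduced here):

```python
import numpy as np

def cost_sensitive_update(w, y, pred, c_fn=3.0, c_fp=1.0):
    """Cost-sensitive reweighting in the spirit of CSBst (sketch): samples
    whose misclassification is costlier, here defective modules predicted
    clean (Type I errors), receive the largest weight increase."""
    cost = np.ones_like(w)
    cost[(y == 1) & (pred == 0)] = c_fn          # missed defect: costliest
    cost[(y == 0) & (pred == 1)] = c_fp          # false alarm
    w = w * np.exp(cost * (pred != y))           # only mistakes are upweighted
    return w / w.sum()

def threshold_moving(scores, t=0.35):
    """Combine base-classifier outputs (rows = classifiers) by moving the
    decision threshold below 0.5 so the rare defective class is predicted
    more readily."""
    return (scores.mean(axis=0) >= t).astype(int)

w = np.full(6, 1 / 6)
y = np.array([1, 1, 0, 0, 0, 0])                 # 1 = defective module
pred = np.array([0, 1, 0, 1, 0, 0])              # one miss, one false alarm
print(cost_sensitive_update(w, y, pred))         # the miss gets the top weight
```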

18.
Accurate prediction of electricity consumption is essential for providing actionable insights to decision-makers managing volume and trends in future energy consumption. A single model may not suffice for the mix of linear and non-linear problems that arise in electricity consumption prediction, and many models cannot be applied in practice because they are either uninterpretable or generalize poorly. In this paper, a stacking ensemble model for short-term electricity consumption is proposed. We experimented with machine learning and deep models, Random Forests, Long Short-Term Memory, Deep Neural Networks, and Evolutionary Trees, as our base models. Based on these experiments, two ensemble models are proposed in which the base models' predictions are combined using Gradient Boosting and Extreme Gradient Boosting (XGB). The proposed ensemble models were tested on a standard dataset of around 500,000 electricity consumption values, measured at periodic intervals over a span of 9 years. Experimental validation shows that the proposed XGB-based ensemble reduces the training time of the ensemble's second layer by a factor of close to 10 compared to the state of the art, while also being more accurate; an average reduction of approximately 39% in root mean square error was observed.
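A compact sketch of the two-layer design using scikit-learn: synthetic data stands in for the consumption series, the LSTM and evolutionary-tree base models are omitted, and a gradient-boosting meta-learner sits in the second layer (xgboost.XGBRegressor could be swapped in for the XGB variant).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Toy stand-in for the consumption series; the real data set has ~500k rows.
X, y = make_regression(n_samples=2000, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base layer: two of the paper's base-model families; second layer: a
# gradient-boosting meta-learner trained on their out-of-fold predictions.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                             random_state=0)),
    ],
    final_estimator=GradientBoostingRegressor(random_state=0),
)
stack.fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, stack.predict(X_te)))
print("test RMSE:", rmse)
```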

19.
Advances and Prospects in AdaBoost Algorithm Research
AdaBoost is one of the best Boosting algorithms, with a solid theoretical foundation and wide, successful application in practice. The algorithm can boost weak classifiers that are only slightly better than random guessing into strong classifiers with high accuracy, providing new ideas and methods for the design of learning algorithms. This paper first reviews how the Boosting conjecture was posed and then proved, and on that basis introduces the origin and original design ideas of AdaBoost. It then presents methods for analyzing AdaBoost's training error and generalization error, explaining why the algorithm improves learning accuracy; analyzes different theoretical models of AdaBoost and the variant algorithms derived from them; and describes the extension of AdaBoost from binary to multi-class classification, along with applications of AdaBoost and its variants to practical problems. Centered on AdaBoost and its variants, the paper surveys the Boosting theory that occupies an important position in ensemble learning, discusses the development of Boosting research and its future directions, and offers useful leads for researchers. Finally, it outlines open problems worth further study: deriving tighter generalization error bounds, weak-classifier conditions for multi-class problems, loss functions better suited to multi-class problems, more precise stopping criteria for the iterations, improving robustness to noise, and optimizing AdaBoost from the perspective of sub-classifier diversity.
