Similar Literature
20 similar documents found (search time: 15 ms)
1.
In contrast to plain ensemble learning, ensemble pruning searches the pool of classifiers for an optimal subset, improving generalization and simplifying the ensemble. Pareto ensemble pruning considers both classifier accuracy and ensemble size, treating the two as optimization objectives. However, it accounts only for accuracy and size, ignoring the diversity among classifiers, so the selected classifiers tend to be highly similar to one another. This paper proposes a diversity-aware Pareto ensemble pruning algorithm that fuses classifier diversity and accuracy into the first optimization objective and takes ensemble size as the second, thereby performing multi-objective optimization. Experiments show that, at comparable ensemble sizes, the improved algorithm outperforms the original Pareto ensemble pruning algorithm thanks to the incorporation of diversity.
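The bi-objective idea described above can be sketched in pure Python. This is a minimal illustrative version, not the paper's implementation: the mixing factor `alpha`, the disagreement-based diversity measure, and the brute-force subset enumeration (feasible only for small classifier pools) are our own assumptions.

```python
from itertools import combinations

def pairwise_diversity(preds_a, preds_b):
    """Disagreement: fraction of samples on which two classifiers differ."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def subset_objectives(subset, preds, y, alpha=0.5):
    """Objective 1 blends majority-vote error with (lack of) diversity; objective 2 is size."""
    votes = [max(set(col), key=col.count) for col in zip(*(preds[i] for i in subset))]
    error = sum(v != t for v, t in zip(votes, y)) / len(y)
    pairs = list(combinations(subset, 2))
    div = (sum(pairwise_diversity(preds[i], preds[j]) for i, j in pairs) / len(pairs)
           if pairs else 0.0)
    return alpha * error + (1 - alpha) * (1 - div), len(subset)

def pareto_front(preds, y, alpha=0.5):
    """Enumerate non-empty subsets and keep the non-dominated ones (minimise both objectives)."""
    n = len(preds)
    cands = [(s, subset_objectives(s, preds, y, alpha))
             for k in range(1, n + 1) for s in combinations(range(n), k)]
    return [c for c in cands
            if not any(o[0] <= c[1][0] and o[1] <= c[1][1] and o != c[1]
                       for _, o in cands)]
```

A real implementation would replace the enumeration with an evolutionary bi-objective search, as Pareto ensemble pruning does.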

2.
AdaBoost is a highly effective ensemble learning method that combines several weak learners to produce a strong committee with higher accuracy. However, like other ensemble methods, AdaBoost uses a large number of base learners to produce the final outcome when addressing high-dimensional data, which poses a critical challenge in the form of high memory consumption. Feature selection methods can significantly reduce dimensionality in regression and have been established to be applicable to ensemble pruning. By pruning the ensemble, it is possible to generate a simpler ensemble with fewer base learners but higher accuracy. In this article, we propose using the minimax concave penalty (MCP) function to prune an AdaBoost ensemble, simplifying the model and improving its accuracy simultaneously. The MCP function is compared with the LASSO and SCAD penalties in terms of pruning performance. Experiments on real datasets demonstrate that MCP-pruning outperforms the other two methods: it reduces the ensemble size effectively and generates marginally more accurate predictions than the unpruned AdaBoost model.
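The MCP penalty and its proximal map (firm thresholding) can be sketched as follows. This is a generic illustration of MCP shrinkage applied to combination weights, not the paper's fitting procedure; `lam` and `gamma` are illustrative hyperparameters.

```python
def mcp_penalty(w, lam=0.5, gamma=3.0):
    """Minimax concave penalty value for a single weight (gamma > 1)."""
    aw = abs(w)
    if aw <= gamma * lam:
        return lam * aw - w * w / (2 * gamma)
    return gamma * lam * lam / 2          # penalty saturates for large weights

def mcp_threshold(z, lam=0.5, gamma=3.0):
    """Firm-thresholding operator, the proximal map of the MCP."""
    az = abs(z)
    if az <= lam:
        return 0.0                        # weight zeroed: base learner pruned
    if az <= gamma * lam:
        return (az - lam) / (1 - 1 / gamma) * (1 if z > 0 else -1)
    return z                              # large weights left unshrunk (unlike LASSO)

def prune_weights(weights, lam=0.5, gamma=3.0):
    """Apply MCP thresholding to AdaBoost combination weights; zeros mark pruned learners."""
    return [mcp_threshold(w, lam, gamma) for w in weights]
```

The key contrast with LASSO is visible in the last branch: MCP leaves large weights unbiased, which is why it can prune aggressively without degrading the strong base learners.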

3.
The main goal is to find a fast pruning method for Bagging that reduces storage, speeds up computation, and has the potential to improve classification accuracy. Traditional selective ensemble research focuses on the diversity among base learners; this work instead studies the problem from the perspective of homogeneity and proposes an entirely new approach to selective ensembles. By repeatedly identifying the worst base learner in the pool, it performs fast hierarchical pruning of a Bagging ensemble, yielding a new algorithm whose training speed is close to Bagging's while improving on Bagging's performance. The new algorithm trains markedly faster than GASEN with comparable performance, and it retains the same parallel processing capability as Bagging.
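The worst-learner removal loop described above can be sketched directly. This is a minimal sketch under our own assumptions (individual validation accuracy as the "worst" criterion; predictions precomputed per learner), not the paper's exact procedure.

```python
def prune_worst(learner_preds, y, keep):
    """Hierarchically drop the individually worst learner until `keep` remain.

    learner_preds: dict mapping learner name -> list of predictions on a
    validation set; y: the true labels of that set.
    """
    def accuracy(p):
        return sum(a == b for a, b in zip(p, y)) / len(y)

    kept = dict(learner_preds)            # work on a copy
    while len(kept) > keep:
        worst = min(kept, key=lambda k: accuracy(kept[k]))
        del kept[worst]                   # remove the weakest member
    return kept
```

Because each step needs only one pass over the pool, the whole pruning is linear in the number of removals, which is why the approach stays close to plain Bagging in training time.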

4.
To improve the performance of classifier ensembles, this paper proposes an ensemble method that combines clustering with ordered pruning. First, the confusion matrix is used as a tool to quantify the diversity between base classifiers, and clustering partitions the classifiers into several subsets. Then an ordered pruning algorithm is proposed: starting from the classifier closest to each cluster center, the diversity matrix is dynamically weighted according to classifier distances, and the classifiers in each subset are pruned proportionally, with the weighted diversity as the ranking criterion. Finally, the selected base classifiers are combined by voting. Comparisons with several ensemble methods on 10 datasets from the UCI repository show that the clustering-and-ordered-pruning selection method effectively improves the classification ability of the ensemble system.
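A toy version of the cluster-then-prune pipeline is sketched below. It simplifies the paper's design: plain disagreement stands in for the confusion-matrix diversity, greedy farthest-point seeding stands in for the clustering step, and `keep_ratio` is our own illustrative parameter.

```python
def disagreement(p, q):
    """Fraction of samples on which two classifiers disagree."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def cluster_and_prune(preds, k, keep_ratio=0.5):
    """Seed k clusters with mutually diverse classifiers, assign the rest to
    the nearest seed, then keep a proportion of each cluster ordered by
    closeness to the seed."""
    n = len(preds)
    D = [[disagreement(preds[i], preds[j]) for j in range(n)] for i in range(n)]
    seeds = [0]
    while len(seeds) < k:                 # farthest-point seeding
        seeds.append(max(range(n), key=lambda i: min(D[i][s] for s in seeds)))
    clusters = {s: [] for s in seeds}
    for i in range(n):
        clusters[min(seeds, key=lambda s: D[i][s])].append(i)
    kept = []
    for s, members in clusters.items():
        members.sort(key=lambda i: D[i][s])          # closest to the seed first
        kept.extend(members[:max(1, int(len(members) * keep_ratio))])
    return sorted(kept)
```

With two duplicated pairs of classifiers and `k=2`, the sketch keeps one representative per cluster, which is the intended effect of the proportional pruning.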

5.
Ensemble pruning deals with the reduction of base classifiers prior to combination in order to improve generalization and prediction efficiency. Existing ensemble pruning algorithms require considerable pruning time. This paper presents a fast pruning approach: pattern mining based ensemble pruning (PMEP). In this algorithm, the prediction results of all base classifiers are organized as a transaction database, and an FP-Tree structure is used to compact the prediction results. A greedy pattern mining method is then used to find the ensemble of size k; after obtaining the ensembles of all possible sizes, the one with the best accuracy is output. Experimental results show that, compared with Bagging, GASEN, and Forward Selection, PMEP achieves the best prediction accuracy while keeping the final ensemble small; more importantly, its pruning time is much lower than that of the other ensemble pruning algorithms.

6.
The burn-through point (BTP) is a crucial parameter of the sintering process and directly determines the quality of the final sinter. Since BTP is difficult to measure online directly, building an intelligent learning model for online BTP prediction, and adjusting operating parameters on that basis, is of great significance for improving sinter quality. For this practical engineering problem, a genetic-optimization-based Wrapper feature selection method is first proposed, which selects the feature combination that optimizes the subsequent predictive modeling. On this basis, to address the tendency of a single learner to overfit, a sparse-representation-pruning ensemble modeling algorithm based on random weight neural networks (RVFLNs), called SRP-ERVFLNs, is proposed. The algorithm uses RVFLNs, which train quickly and generalize well, as individual base learners, and increases the diversity among ensemble sub-models by perturbing the basis functions and the number of hidden nodes of the base learners. To further improve the generalization performance and computational efficiency of the ensemble model, a sparse representation pruning algorithm is introduced to prune the ensemble efficiently. Finally, the algorithm is applied to BTP predictive modeling of the sintering process. Experiments on industrial data show that the proposed method achieves better prediction accuracy, generalization performance, and computational efficiency than the other methods compared.

7.
In ensemble learning, using averaging or voting as the combination strategy cannot fully exploit the information carried by the base classifiers, and setting base-classifier weights according to volatility is imprecise and inappropriate. These problems degrade ensemble performance. To improve it further, this paper proposes using the evidential reasoning (ER) rule as the combination strategy and setting base-classifier weights by a diversity-based weighting method. First, the basic ensemble structure is built with several deep learning models as base classifiers and the ER rule as the combination strategy. Then the diversity of each base classifier relative to the others is computed with a diversity measure. Finally, the diversities are normalized to obtain the base-classifier weights. Classification experiments on several image datasets show that the proposed method is more accurate and more stable than the other methods selected for comparison, demonstrating that it fully exploits the information of the base classifiers and that the diversity-based weighting is more precise.
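The diversity-based weighting step can be sketched on its own. This is a minimal sketch under our own assumptions (pairwise disagreement as the diversity measure); the ER rule itself involves belief distributions and reliability factors and is not reproduced here.

```python
def diversity_weights(preds):
    """Weight each base classifier by its mean disagreement with the others,
    then normalise the weights to sum to one."""
    n = len(preds)

    def dis(p, q):
        return sum(a != b for a, b in zip(p, q)) / len(p)

    raw = [sum(dis(preds[i], preds[j]) for j in range(n) if j != i) / (n - 1)
           for i in range(n)]
    total = sum(raw)
    # if all classifiers agree everywhere, fall back to equal weights
    return [r / total for r in raw] if total else [1.0 / n] * n
```

Normalising the diversities gives weights directly usable by any weighted combination rule, the ER rule included.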

8.
The One-vs-One strategy is among the most widely used techniques for dealing with multi-class problems in Machine Learning. Any binary classifier can thereby address the original problem, since one classifier is learned for each possible pair of classes. As in every ensemble method, classifier combination becomes a vital step in the classification process. Even though many combination models have been developed in the literature, none of them has considered reducing the number of generated classifiers after the training phase, i.e., ensemble pruning, since every classifier is assumed to be necessary. On this account, our objective in this paper is two-fold: (1) we propose a transformation of the aggregation step, which leads us to a new combination strategy where instances are classified on the basis of the similarities among score-matrices; (2) this allows us to reduce the number of binary classifiers without affecting the final accuracy. We show that around 50% of the classifiers can be removed (depending on the base learner and the specific problem) and that the confidence degrees produced by these base classifiers strongly influence the improvement in final accuracy. A thorough experimental study is carried out to compare the behavior of the proposed approach with state-of-the-art combination models for the One-vs-One strategy. Classifiers from various Machine Learning paradigms are considered as base classifiers, and the results are contrasted with the proper statistical analysis.

9.
During the last few years, hybrid and ensemble systems have received marked attention, having proved able to be more accurate than single-classifier models. However, among the hybrid and ensemble models developed in the literature, little consideration has been given to: 1) combining data filtering and feature selection methods; 2) combining classifiers of different algorithms; and 3) exploring classifier output combination techniques other than the traditional ones. In this paper, the aim is to improve predictive performance by presenting a new hybrid ensemble credit scoring model that combines two data pre-processing methods, Gabriel Neighbourhood Graph editing (GNG) and Multivariate Adaptive Regression Splines (MARS), in the hybrid modelling phase. In addition, a new classifier combination rule based on the consensus approach (ConsA) of different classification algorithms is proposed for the ensemble modelling phase. Several comparisons are carried out: 1) individual base classifiers with GNG and MARS applied separately and combined, in order to choose the best results for the ensemble modelling phase; 2) the proposed approach against all the base classifiers and against ensemble classifiers with traditional combination methods; and 3) the proposed approach against recent related studies in the literature. Five well-known base classifiers are used: neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT), and naïve Bayes (NB). The experimental results, analysis, and statistical tests demonstrate that the proposed approach improves prediction performance over all the base classifiers, hybrid models, and traditional combination methods in terms of average accuracy, area under the curve (AUC), H-measure, and Brier score. The model was validated on seven real-world credit datasets.

10.
An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members that constitutes a small ensemble, saving computational resources while performing as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that chooses a set of "sparse" combination weights, most of which are zeros, to prune the ensemble. To obtain sparse combination weights and satisfy their non-negativity constraint, a left-truncated, non-negative Gaussian prior is adopted over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior estimate of the weight vector. The leave-one-out (LOO) error is obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error; it is therefore used together with the Bayesian evidence for model selection. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners but performs as well as, or better than, the unpruned ensemble. Our results are very competitive with other ensemble pruning algorithms.

11.
Learning from imperfect (noisy) information sources is a challenging, real-world issue for many data mining applications. Common practices include enhancing data quality with preprocessing techniques or employing robust learning algorithms to avoid overly complicated structures that overfit the noise. The essential goal is to reduce the impact of noise and ultimately improve the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling, and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling and robust learning algorithms in two respects. On one hand, the diverse base learners constituting the C2 ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction, and data cleansing to reduce the impact of noise. Being corrective, the classifier ensemble is built from data preprocessed and corrected by the cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than a learner built from the original noisy sources, but also more reliable than Bagging [4] or the aggressive classifier ensemble (ACE) [56], which are two degenerate components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e., noisy training and/or test data), C2 delivers more accurate and reliable prediction models than its peers.

12.
In ensemble classification, how to dynamically update the base classifiers and assign them appropriate weights has long been a research focus. Addressing these two points, the BIE and BIWE algorithms are proposed. BIE uses the accuracy of the most recently trained base classifier to decide whether the ensemble needs to replace poorly performing base classifiers and how many to replace, achieving dynamic iterative updates of the ensemble. Building on this, BIWE adds a weighting function that obtains the optimal base-classifier weights for data streams with different parameter characteristics, thereby improving the overall performance of the ensemble. Experimental results show that, with accuracy on par with or slightly above the baselines, BIE reduces the number of leaves, the number of nodes, and the depth of the generated trees; BIWE not only achieves higher accuracy than the baselines but also greatly reduces the size of the generated trees.

13.
Ensemble classification combines several weak classifiers according to some rule and can effectively improve classification performance. In the combination, the base classifiers usually contribute to the final result to different degrees. The extreme learning machine (ELM) is a recently proposed algorithm for training single-hidden-layer feedforward neural networks. Taking ELMs as base classifiers, this paper proposes a weighted ELM ensemble method based on differential evolution, which uses a differential evolution algorithm to optimize the weight of each base classifier in the ensemble. Experimental results show that, compared with simple-voting and AdaBoost-based ensembles, the method achieves higher classification accuracy and better generalization.
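Optimizing combination weights with differential evolution can be sketched as follows. This is a generic DE/rand/1/bin sketch over non-negative weights with a weighted-score voting error as fitness; the population size, generations, `F`, and `CR` values are illustrative, and the base learners are abstracted as precomputed score vectors rather than ELMs.

```python
import random

def ensemble_error(weights, scores, y):
    """Binary weighted-vote error; scores[i][s] is classifier i's class-1 score on sample s."""
    total = sum(weights)
    errors = 0
    for s, t in enumerate(y):
        agg = sum(w * scores[i][s] for i, w in enumerate(weights))
        pred = 1 if agg >= 0.5 * total else 0
        errors += pred != t
    return errors / len(y)

def de_optimise(scores, y, pop=20, gens=30, F=0.5, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin search over non-negative combination weights."""
    rng = random.Random(seed)
    n = len(scores)
    P = [[rng.random() for _ in range(n)] for _ in range(pop)]
    fit = [ensemble_error(x, scores, y) for x in P]
    for _ in range(gens):
        for i in range(pop):
            a, b, c = rng.sample([j for j in range(pop) if j != i], 3)
            jrand = rng.randrange(n)          # guarantee one mutated dimension
            trial = [max(0.0, P[a][d] + F * (P[b][d] - P[c][d]))
                     if d == jrand or rng.random() < CR else P[i][d]
                     for d in range(n)]
            f = ensemble_error(trial, scores, y)
            if f <= fit[i]:                   # greedy selection
                P[i], fit[i] = trial, f
    best = min(range(pop), key=lambda i: fit[i])
    return P[best], fit[best]
```

The `max(0.0, ...)` clamp keeps weights non-negative, a common constraint for classifier combination.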

14.
Zhang Ning, Chen Qin. 《计算机应用》 (Journal of Computer Applications), 2019, 39(4): 935-939
To address information asymmetry in lending and to integrate different data sources and loan-default prediction models more effectively, a training method for ensemble learning is proposed that measures learner accuracy and diversity with the AUC (Area Under Curve) value and the Q statistic, and a training algorithm based on AUC and the Q statistic (TABAQ) is implemented. An empirical analysis on person-to-person (P2P) loan data finds that the performance of ensemble learning is closely related to the accuracy and diversity of the base learners but only weakly correlated with the number of base learners, and that among the various ensemble learning methods, statistical combination performs best. The experiments also show that fusing information from both the borrower side and the investor side effectively reduces the information asymmetry in loan-default prediction. TABAQ effectively exploits both data-source fusion and learner ensembling: while steadily improving prediction accuracy, it further reduces the number of type-I prediction errors by 4.85%.

15.
Theory and experiments show that an ensemble classifier with a larger margin distribution on the training set has stronger generalization ability. This paper introduces the margin concept into ensemble pruning and uses it to guide the design of pruning methods. On this basis, a margin-based metric (MBM) is constructed to evaluate the importance of a base classifier relative to the ensemble, and a greedy ensemble selection method (MBMEP) is proposed to reduce ensemble size while improving classification accuracy. Experiments on 30 randomly selected UCI datasets show that, compared with other advanced greedy ensemble selection algorithms, the sub-ensembles selected by MBMEP generalize better.
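A margin-guided greedy selection can be sketched as below. This is our own simplified stand-in for the MBM criterion: the per-sample voting margin is (votes for the true class minus the best wrong-class votes) divided by the sub-ensemble size, and forward selection greedily maximizes the minimum margin.

```python
def margins(subset, preds, y):
    """Per-sample voting margin of a sub-ensemble."""
    m = []
    for s, t in enumerate(y):
        votes = {}
        for i in subset:
            votes[preds[i][s]] = votes.get(preds[i][s], 0) + 1
        right = votes.get(t, 0)
        wrong = max((v for c, v in votes.items() if c != t), default=0)
        m.append((right - wrong) / len(subset))
    return m

def greedy_margin_select(preds, y, size):
    """Forward selection: repeatedly add the classifier that most raises the minimum margin."""
    chosen, remaining = [], list(range(len(preds)))
    while len(chosen) < size:
        best = max(remaining, key=lambda i: min(margins(chosen + [i], preds, y)))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Maximizing the minimum margin directly targets the margin-distribution argument cited in the abstract: the hardest training sample is what bounds generalization.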

16.
In this work, we formalise and evaluate an ensemble of classifiers designed for the resolution of multi-class problems. To achieve a good accuracy rate, the base learners are built with pairwise-coupled binary and multi-class classifiers. Moreover, to reduce the computational cost of the ensemble and to improve its performance, these classifiers are trained on specific attribute subsets. This proposal captures the advantages provided by binary decomposition methods, by attribute partitioning methods, and by the cooperative characteristics of a combination of redundant base learners. To analyse the quality of this architecture, its performance has been tested on different domains, and the results have been compared with those of other well-known classification methods. This experimental evaluation indicates that our model is, in most cases, as accurate as these methods, but much more efficient.

17.
In dynamic data streams, instability and concept drift require ensemble classification models to adapt to new environments in a timely manner. Current approaches typically use supervised information to update the weights of the base classifiers, assigning higher weights to those that fit the current environment; however, supervised information is not immediately available in real data-stream settings. To solve this problem, this paper proposes a data-stream ensemble classification algorithm that updates base-classifier weights using information entropy. First, it uses …

18.
This paper presents a method for improved ensemble learning that treats the optimization of an ensemble of classifiers as a compressed sensing problem. Ensemble learning methods improve the performance of a learned predictor by integrating a weighted combination of multiple predictive models. Ideally, the number of models in the ensemble should be minimized while the weights associated with each included model are optimized. We solve this problem by treating it as an instance of compressed sensing, in which a sparse solution must be reconstructed from an under-determined linear system; compressed sensing techniques are then employed to find an ensemble that is both small and effective. An additional contribution of this paper is a new performance evaluation method (a new pairwise diversity measurement) called the roulette-wheel kappa-error. It takes into account the different weightings of the classifiers and reduces the number of classifier pairs needed in the kappa-error diagram by selecting pairs through roulette-wheel selection according to the classifier weightings. This greatly improves the clarity and informativeness of the kappa-error diagram, especially when the number of classifiers in the ensemble is large. We use 25 public data sets to evaluate and compare compressed sensing ensembles built with four different sparse reconstruction algorithms, combined with two classifier learning algorithms and two training-data manipulation techniques. We also compare our method against five other state-of-the-art pruning methods. These experiments show that our method produces comparable or better accuracy while being significantly faster than the compared methods.
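The sparse-recovery step can be illustrated with matching pursuit, a classic greedy sparse reconstruction routine (simpler than the four algorithms the paper evaluates, and chosen here only for brevity). Columns of the prediction matrix play the role of the sensing dictionary; the labels are the target signal, and non-zero recovered weights mark the kept classifiers.

```python
def matching_pursuit(columns, target, k):
    """Greedy sparse recovery: pick at most k base learners (columns of the
    prediction matrix) whose weighted sum approximates the target labels."""
    residual = list(target)
    weights = [0.0] * len(columns)
    for _ in range(k):
        # column most correlated with the current residual
        best, best_corr = None, 0.0
        for j, col in enumerate(columns):
            corr = sum(c * r for c, r in zip(col, residual))
            if abs(corr) > abs(best_corr):
                best, best_corr = j, corr
        if best is None:                  # residual orthogonal to all columns
            break
        norm2 = sum(c * c for c in columns[best])
        step = best_corr / norm2
        weights[best] += step
        residual = [r - step * c for r, c in zip(residual, columns[best])]
    return weights
```

The sparsity budget `k` plays the same role as the ensemble-size constraint in the paper: only learners with non-zero weight survive the pruning.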

19.
Financial distress prediction methods based on combination classifiers have become a rising trend in this field. This paper applies the Choquet integral to the ensembling of single classifiers and proposes a Choquet-integral-based combination classifier for financial distress early warning. Also, since the conditions during training and pattern recognition cannot be completely consistent, this paper proposes an adaptive fuzzy measure that uses the dynamic information in the single-classifier recognition results, which is more reasonable than a static prior fuzzy density. Finally, a comparative analysis based on real data from Chinese listed companies is conducted to verify the prediction accuracy and stability of the combination classifier. The experimental results indicate that financial distress prediction using the Choquet-integral-based combination classifier has higher average accuracy and stability than single classifiers.
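The discrete Choquet integral used for combination can be sketched directly. This is the standard textbook formula, not the paper's adaptive fuzzy measure: scores are sorted ascending and each increment is weighted by the measure of the coalition of classifiers whose score is at least that large.

```python
def choquet_integral(scores, mu):
    """Discrete Choquet integral of classifier scores w.r.t. a fuzzy measure.

    mu maps frozensets of classifier indices to measure values, with
    mu[frozenset()] = 0, mu[all indices] = 1, and mu monotone w.r.t. inclusion.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        coalition = frozenset(order[pos:])     # classifiers scoring >= current
        total += (scores[i] - prev) * mu[coalition]
        prev = scores[i]
    return total
```

When the measure is additive, the Choquet integral reduces to a plain weighted average; non-additive measures are what let it model interaction between classifiers.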

20.
Prediction of wind speed can provide a reference for the reliable utilization of wind energy. This study focuses on 1-hour, 1-step-ahead deterministic wind speed prediction with only wind speed as input. To account for the time-varying characteristics of wind speed series, a dynamic ensemble wind speed prediction model based on deep reinforcement learning is proposed, combining ensemble learning, multi-objective optimization, and deep reinforcement learning. In part A, a deep echo state network enhanced by real-time wavelet packet decomposition is used to construct base models with different vanishing moments; the variety of vanishing moments naturally guarantees the diversity of the base models. In part B, multi-objective optimization determines the combination weights of the base models; the bias and variance of the ensemble model are minimized simultaneously to improve generalization ability. In part C, the non-dominated solutions for the combination weights are embedded into a deep reinforcement learning environment to achieve dynamic selection: with a suitably designed environment, a non-dominated solution is selected dynamically at each prediction step according to the time-varying characteristics of the wind speed. Four actual wind speed series are used to validate the proposed dynamic ensemble model. The results show that: (a) the proposed dynamic ensemble model is competitive for wind speed prediction, significantly outperforming five classic intelligent prediction models and six ensemble methods; and (b) every part of the proposed model is indispensable to the prediction accuracy.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号