Similar Literature
20 similar documents retrieved.
1.
Ensemble learning is one of the most widely used approaches to the class-imbalance problem. In the conventional ensemble learning approach, the base classifiers are trained on the unbalanced training set, and although researchers have examined resampling strategies that balance the training set, there is no way to pick the most suitable resampling method or base classifier for a given training set. To address these issues, a multi-armed bandit heterogeneous ensemble framework was developed. The framework employs the multi-armed bandit technique to pick the best base classifier and resampling technique for building a heterogeneous ensemble model. We first employ bagging to obtain training sets, then use the instances in the out-of-bag set as the validation set. The base classifier and resampling combination with the highest validation-set score is considered the best model on that bagging subset and is added to the pool of models. The classification performance of the multi-armed bandit heterogeneous ensemble model is then assessed on 30 real-world imbalanced data sets gathered from UCI, KEEL, and HDDT. The experimental results demonstrate that, under the two assessment metrics of AUC and Kappa, the proposed heterogeneous ensemble model performs competitively against nine other state-of-the-art ensemble learning methods, and these findings are confirmed by the Friedman test and Holm's post-hoc test.
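A minimal sketch of the selection loop this abstract describes: an epsilon-greedy bandit chooses among (resampling method, base classifier) arms, each arm is rewarded by its out-of-bag AUC, and the winning model of each round joins the pool. The arm set, the epsilon-greedy rule, the `undersample` helper, and binary 0/1 labels are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def undersample(X, y, rng):
    """Randomly drop majority-class samples until both classes are equal in size."""
    minority = np.where(y == 1)[0] if (y == 1).sum() < (y == 0).sum() else np.where(y == 0)[0]
    majority = np.setdiff1d(np.arange(len(y)), minority)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([keep, minority])
    return X[idx], y[idx]

# arms = (resampling method, base classifier) combinations
ARMS = [(None, DecisionTreeClassifier), (undersample, DecisionTreeClassifier),
        (None, GaussianNB), (undersample, GaussianNB)]

def bandit_heterogeneous_ensemble(X, y, n_rounds=20, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    value, count, pool = np.zeros(len(ARMS)), np.zeros(len(ARMS)), []
    for _ in range(n_rounds):
        boot = rng.integers(0, len(y), size=len(y))          # bagging subset
        oob = np.setdiff1d(np.arange(len(y)), boot)          # out-of-bag validation set
        if len(np.unique(y[oob])) < 2:
            continue                                         # AUC needs both classes
        arm = rng.integers(len(ARMS)) if rng.random() < eps else int(np.argmax(value))
        sampler, Clf = ARMS[arm]
        Xb, yb = (X[boot], y[boot]) if sampler is None else sampler(X[boot], y[boot], rng)
        clf = Clf().fit(Xb, yb)
        reward = roc_auc_score(y[oob], clf.predict_proba(X[oob])[:, 1])
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]     # incremental mean reward
        pool.append(clf)
    return pool
```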

2.
Bagging and boosting negatively correlated neural networks.
In this paper, we propose two cooperative ensemble learning algorithms, NegBagg and NegBoost, for designing neural network (NN) ensembles. The proposed algorithms incrementally train the individual NNs in an ensemble using the negative correlation learning algorithm. Bagging and boosting are used in NegBagg and NegBoost, respectively, to create different training sets for the different NNs in the ensemble. The idea behind combining negative correlation learning with bagging/boosting is to facilitate interaction and cooperation among the NNs during training. Both NegBagg and NegBoost use a constructive approach to automatically determine the number of hidden neurons in each NN; NegBoost also uses this approach to automatically determine the number of NNs in the ensemble. The two algorithms have been tested on a number of benchmark problems in machine learning and NNs, including the Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, satellite, soybean, and waveform problems. The experimental results show that NegBagg and NegBoost require a small number of training epochs to produce compact NN ensembles with good generalization.
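The coupling between the networks comes from the negative correlation penalty. A minimal numpy sketch, assuming the standard Liu-Yao form of the penalty, p_i = -(F_i - Fbar)^2, added to each network's squared error; the paper's exact weighting of the penalty may differ.

```python
import numpy as np

def ncl_loss_and_grads(outputs, target, lam=0.5):
    """outputs: length-M array with each network's prediction for one sample.

    Each network i minimizes  e_i = 0.5*(F_i - d)^2 + lam * p_i  with the
    penalty p_i = -(F_i - Fbar)^2, which pushes members away from the mean.
    """
    M = len(outputs)
    f_bar = outputs.mean()
    loss = 0.5 * (outputs - target) ** 2 - lam * (outputs - f_bar) ** 2
    # d p_i / d F_i = -2 * (F_i - Fbar) * (1 - 1/M)
    grads = (outputs - target) - 2 * lam * (outputs - f_bar) * (1 - 1 / M)
    return loss, grads
```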

3.
In the class-imbalanced learning scenario, traditional machine learning algorithms that optimize overall accuracy tend to achieve poor classification performance, especially on the minority class in which we are most interested. Many effective approaches have been proposed to solve this problem. Among them, bagging ensembles integrated with under-sampling techniques have demonstrated better performance than several alternatives, including bagging ensembles integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measure to ensure individual classification performance, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS) has been successfully applied to search for the best majority-class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, this paper introduces it into the under-sampling bagging framework and proposes an EUS-based bagging ensemble method (EUS-Bag), with a new fitness function that considers three factors to make EUS better suited to the framework. With this fitness function, EUS-Bag generates a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean, and AUC all demonstrate its superior performance.
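The abstract does not spell out the three factors, but a hedged sketch of such a fitness function, assumed here to balance individual accuracy, diversity against the current pool, and the degree of under-sampling (the weights and factor definitions are illustrative, not the paper's), might look like this:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def fitness(clf, X_val, y_val, pool_preds, n_kept, n_majority,
            w_acc=0.7, w_div=0.2, w_red=0.1):
    """Score one candidate majority-class subset via the classifier it trains."""
    pred = clf.predict(X_val)
    tn, fp, fn, tp = confusion_matrix(y_val, pred, labels=[0, 1]).ravel()
    g_mean = np.sqrt((tp / max(tp + fn, 1)) * (tn / max(tn + fp, 1)))  # accuracy factor
    # diversity factor: average disagreement with classifiers already in the pool
    diversity = np.mean([np.mean(pred != p) for p in pool_preds]) if pool_preds else 0.0
    reduction = 1.0 - n_kept / n_majority            # prefer stronger under-sampling
    return w_acc * g_mean + w_div * diversity + w_red * reduction
```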

4.
Observational Learning Algorithm for an Ensemble of Neural Networks
We propose the Observational Learning Algorithm (OLA), an ensemble learning algorithm that alternates between a training step (T-step) and an observation step (O-step). In the T-step, an ensemble of networks is trained on a training data set. In the O-step, 'virtual' data are generated: each target pattern is determined by observing the member networks' outputs for the input pattern. These virtual data are added to the training data, and the two steps are executed repeatedly. The virtual data were found to play the role of a regularisation term as well as that of temporary hints carrying auxiliary information about the target function extracted from the ensemble. Numerical experiments on both regression and classification problems showed that OLA provides better generalisation performance than simple committee, boosting, and bagging approaches when the training data are insufficient and noisy. We examined the characteristics of OLA in terms of ensemble diversity and robustness to noise variance: OLA was found to balance ensemble diversity against the average error of the individual networks, and to be robust to the variance of the noise distribution. OLA was also applied to five real-world problems from the UCI repository, where its performance was compared with bagging and boosting methods.
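A minimal sketch of one T-step/O-step cycle for the regression case; how OLA actually samples virtual inputs is not stated here, so the Gaussian jitter around the training points is an assumption, and in practice the virtual pool would be capped rather than grown every cycle.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def ola_cycle(members, X, y, rng, jitter=0.05):
    # T-step: train every member network on the current (real + virtual) data
    for net in members:
        net.fit(X, y)
    # O-step: create virtual inputs near the training data and label each one
    # by "observing" the ensemble, i.e. averaging the members' outputs
    X_virtual = X + rng.normal(scale=jitter, size=X.shape)
    y_virtual = np.mean([net.predict(X_virtual) for net in members], axis=0)
    return np.vstack([X, X_virtual]), np.concatenate([y, y_virtual])

# usage: members = [MLPRegressor(hidden_layer_sizes=(10,)) for _ in range(5)]
# then repeat: X_aug, y_aug = ola_cycle(members, X_aug, y_aug, rng)
```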

5.
Classifier Ensembles with a Random Linear Oracle
We propose a combined fusion-selection approach to classifier ensemble design. Each classifier in the ensemble is replaced by a mini-ensemble of two subclassifiers, with a random linear oracle choosing between them. It is argued that this approach encourages extra diversity in the ensemble while allowing high accuracy of the individual ensemble members. Experiments were carried out with 35 data sets from UCI and 11 ensemble models, each examined with and without the oracle. The results showed that all ensemble methods benefited from the new approach, most markedly random subspace and bagging. A further experiment with seven real medical data sets demonstrates the validity of these findings outside the UCI data collection.
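A minimal sketch of a single oracle member, assuming the common construction of the random hyperplane through the midpoint of two randomly drawn training points; the base learner choice and the handling of degenerate splits are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomLinearOracleMember:
    """One ensemble member: a random hyperplane routes to two subclassifiers."""

    def fit(self, X, y, rng):
        a, b = X[rng.choice(len(X), size=2, replace=False)]
        self.w = a - b                                   # hyperplane normal
        self.b = -self.w @ (a + b) / 2                   # passes through the midpoint
        side = (X @ self.w + self.b) > 0
        self.clfs = []
        for s in (False, True):
            idx = side == s
            # guard against a degenerate split leaving one side empty
            Xi, yi = (X[idx], y[idx]) if idx.any() else (X, y)
            self.clfs.append(DecisionTreeClassifier().fit(Xi, yi))
        return self

    def predict(self, X):
        side = ((X @ self.w + self.b) > 0).astype(int)
        preds = np.array([clf.predict(X) for clf in self.clfs])
        return preds[side, np.arange(len(X))]            # per-sample subclassifier
```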

6.
Constructing support vector machine ensemble
Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim, Sung Yang Bang. Pattern Recognition, 2003, 36(12): 2757-2767
Although the support vector machine (SVM) has been shown to provide good generalization performance, the classification result of a practically implemented SVM is often far from the theoretically expected level, because implementations rely on approximation algorithms to cope with the high time and space complexity. To improve the limited classification performance of a real SVM, we propose using an SVM ensemble with bagging (bootstrap aggregating) or boosting. In bagging, each individual SVM is trained independently on training samples randomly chosen via a bootstrap technique. In boosting, each individual SVM is trained on samples chosen according to a probability distribution that is updated in proportion to the error on each sample. In both bagging and boosting, the trained individual SVMs are aggregated to make a collective decision in several ways, such as majority voting, least-squares-estimation-based weighting, and double-layer hierarchical combining. Various simulation results on IRIS data classification, hand-written digit recognition, and fraud detection show that the proposed SVM ensemble with bagging or boosting greatly outperforms a single SVM in terms of classification accuracy.
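A minimal runnable sketch of the bagging variant, using scikit-learn as a stand-in for the authors' implementation: independently bootstrapped SVMs aggregated by voting, compared against a single SVM on the IRIS data the abstract mentions (the hyperparameters are defaults, not the paper's).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
single = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
# `estimator=` is `base_estimator=` in scikit-learn versions before 1.2
bagged = cross_val_score(
    BaggingClassifier(estimator=SVC(kernel="rbf"), n_estimators=10, random_state=0),
    X, y, cv=5).mean()
print(f"single SVM: {single:.3f}   bagged SVM ensemble: {bagged:.3f}")
```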

7.
A comparison of decision tree ensemble creation techniques
We experimentally evaluate bagging and seven other randomization-based approaches to creating an ensemble of decision tree classifiers. Statistical tests were performed on experimental results from 57 publicly available data sets. When cross-validation comparisons were tested for statistical significance, the best method was statistically more accurate than bagging on only eight of the 57 data sets. Alternatively, examining the average ranks of the algorithms across the group of data sets, we find that boosting, random forests, and randomized trees are statistically significantly better than bagging. Because our results suggest that using an appropriate ensemble size is important, we introduce an algorithm that decides when a sufficient number of classifiers has been created for an ensemble. Our algorithm uses the out-of-bag error estimate and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
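A hedged sketch of such a stopping rule: grow the ensemble while tracking the out-of-bag error of the aggregate, and stop once the error curve flattens. The window and tolerance are assumptions, not the paper's criterion, and class labels are assumed to be 0..k-1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_until_oob_stable(X, y, max_trees=200, window=10, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y))
    votes = np.zeros((len(y), n_classes))            # OOB vote tallies per sample
    trees, oob_error = [], []
    for t in range(max_trees):
        boot = rng.integers(0, len(y), size=len(y))
        oob = np.setdiff1d(np.arange(len(y)), boot)
        tree = DecisionTreeClassifier(random_state=t).fit(X[boot], y[boot])
        votes[oob] += tree.predict(X[oob])[:, None] == np.arange(n_classes)
        seen = votes.sum(axis=1) > 0                 # samples with at least one OOB vote
        oob_error.append(np.mean(votes[seen].argmax(axis=1) != y[seen]))
        trees.append(tree)
        if t >= window and abs(oob_error[-1] - oob_error[-1 - window]) < tol:
            break                                    # error curve has flattened
    return trees, oob_error
```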

8.
The ensemble method is a powerful data mining paradigm that builds a classification model by integrating multiple diversified component learners. Bagging, one of the most successful ensemble methods, aggregates classifiers trained on bootstrap replicates of the training set. However, in bagging, the bootstrapped training sets become more and more similar as redundancy increases. Besides redundancy, any training set is usually subject to noise, and it may also be imbalanced; thus, each training instance has a different impact on the learning process. This paper explores properties of the ensemble margin and its use in improving the performance of bagging. We introduce a new approach, based on margin theory, to measure the importance of training data in learning, and then propose a new bagging method concentrating on critical instances. This method is more accurate than bagging and more robust than boosting; compared to bagging, it reduces the bias while generally keeping the same variance. Our findings suggest that (a) examples with low margins tend to be more critical for classifier performance; (b) examples with higher margins tend to be more redundant; and (c) misclassified examples with high margins tend to be noisy examples. Our experimental results on 15 data sets show that the generalization error of bagging can be reduced by up to 2.5%, and its resilience to noise strengthened, by iteratively removing both typical and noisy training instances, reducing the training set size by up to 75%.
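A minimal sketch of computing ensemble margins from base-classifier votes and of the instance categories (a)-(c) above; the margin definition (votes for the true class minus votes against, normalized) and the thresholds are assumptions.

```python
import numpy as np

def margin_categories(vote_matrix, y, hi=0.8, lo=0.2):
    """vote_matrix: (n_samples, n_classes) counts of base-classifier votes."""
    total = vote_matrix.sum(axis=1)
    v_true = vote_matrix[np.arange(len(y)), y]
    margin = (2 * v_true - total) / total            # signed margin in [-1, 1]
    correct = vote_matrix.argmax(axis=1) == y
    critical = correct & (margin < lo)               # low margin: shapes the boundary
    redundant = correct & (margin > hi)              # confidently right: safe to drop
    noisy = ~correct & (margin < -hi)                # confidently wrong: likely label noise
    return margin, critical, redundant, noisy
```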

9.
Ensemble methods have proven highly effective in improving the performance of base learners under most circumstances. In this paper, we propose a new algorithm, called bacing, that combines the merits of some existing techniques, namely bagging, arcing, and stacking. The basic structure of the algorithm resembles bagging, but the misclassification cost of each training point is repeatedly adjusted according to its observed out-of-bag vote margin. In this way, the method gains the advantage of arcing (building the classifiers the ensemble needs) without fixating on potentially noisy points. Computational experiments show that this algorithm performs consistently better than bagging and arcing with both linear and nonlinear base classifiers. In view of the characteristics of bacing, a hybrid ensemble learning strategy that combines bagging and different versions of bacing is proposed and studied empirically.
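A hedged sketch of the cost-update step the abstract describes: each point's sampling cost rises when its out-of-bag vote margin is low or negative. The exponential update and the learning rate are assumptions standing in for the paper's rule.

```python
import numpy as np

def update_costs(costs, oob_correct, oob_total, lr=0.5):
    """costs: per-instance sampling weights; oob_*: OOB vote counts per instance."""
    seen = oob_total > 0
    margin = np.zeros_like(costs)
    # signed OOB vote margin in [-1, 1]: fraction correct minus fraction wrong
    margin[seen] = (2 * oob_correct[seen] - oob_total[seen]) / oob_total[seen]
    costs = costs * np.exp(-lr * margin)             # low margin -> higher cost
    return costs / costs.sum()                       # renormalize to a distribution
```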

10.
Ensemble learning is widely used to improve classification accuracy, and recent studies have shown that building ensemble classifiers with a multimodal perturbation strategy can further improve classification performance. This paper proposes an ensemble pruning algorithm based on approximate reducts and optimal sampling (EPA_AO). In EPA_AO, we design a multimodal perturbation strategy to construct different individual classifiers; the strategy perturbs the attribute space and the training set simultaneously, which increases the diversity of the individual classifiers. We use the evidential K-nearest-neighbor (KNN) algorithm to train the individual classifiers and compare EPA_AO with existing algorithms of the same type on multiple UCI data sets. Experimental results show that EPA_AO is an effective ensemble learning method.
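A minimal sketch of the multimodal perturbation idea: each member sees both a random feature subset (attribute-space perturbation) and a resampled training set. Plain KNN stands in for the evidential KNN, and EPA_AO's approximate-reduct and optimal-sampling steps are simplified away; labels are assumed to be 0..k-1.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def build_perturbed_members(X, y, n_members=10, feat_frac=0.7, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        feats = rng.choice(X.shape[1], size=max(1, int(feat_frac * X.shape[1])),
                           replace=False)            # perturb the attribute space
        rows = rng.integers(0, len(y), size=len(y))  # perturb the training set
        clf = KNeighborsClassifier().fit(X[rows][:, feats], y[rows])
        members.append((feats, clf))
    return members

def predict_majority(members, X):
    preds = np.array([clf.predict(X[:, feats]) for feats, clf in members])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
```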

11.
Ensemble learning strategies, especially boosting and bagging decision trees, have demonstrated impressive capacities to improve the prediction accuracy of base learning algorithms. Further gains have been demonstrated by strategies that combine simple ensemble formation approaches. We investigate the hypothesis that the improvement in accuracy of multistrategy approaches to ensemble learning is due to an increase in the diversity of the ensemble members that are formed. In addition, guided by this hypothesis, we develop three new multistrategy ensemble learning techniques. Experimental results in a wide variety of natural domains suggest that these multistrategy ensemble learning techniques are, on average, more accurate than their component ensemble learning techniques.

12.
The classification performance of an ensemble method can be deciphered by studying the bias and variance contributions to its classification error. Statistically, the bias and variance of a single classifier are controlled by the size of the training set and the complexity of the classifier. It has been established, both theoretically and empirically, that the classification performance (hence bias and variance) of a single classifier can be partially improved by using a suitable ensemble method and resampling the original training set. In this paper, we empirically examine the bias-variance decomposition of three different types of ensemble methods with training sample sizes ranging from 10% to at most 63% of the original training sample: bagging; a boosting-type ensemble, AdaBoost; and a bagging-type hybrid ensemble method called bundling. All ensembles are trained on samples constructed with small subsampling ratios (SSR) of 0.10, 0.20, 0.30, 0.40, and 0.50, as well as with bootstrapping. The experiments, conducted on 20 data sets from the UCI Machine Learning Repository, are designed to find the optimal training sample size (smaller than the original training sample) for each ensemble, and then to find the optimal ensemble on these smaller training sets with respect to bias-variance performance. The bias-variance decomposition shows that bundling with small subsamples has significantly lower bias and variance than the subsampled and bootstrapped versions of bagging and AdaBoost.
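A hedged sketch of estimating bias and variance for a subsampled bagging run, using a Domingos-style 0-1 decomposition; the paper's exact protocol may differ, and labels are assumed to be 0..k-1.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def bias_variance(X_tr, y_tr, X_te, y_te, ssr=0.3, n_repeats=20, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_repeats):
        # draw a small subsample of the training set (the SSR of the abstract)
        idx = rng.choice(len(y_tr), size=int(ssr * len(y_tr)), replace=False)
        model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                                  random_state=int(rng.integers(1 << 31)))
        preds.append(model.fit(X_tr[idx], y_tr[idx]).predict(X_te))
    preds = np.array(preds)
    # main prediction = per-test-point majority vote over the repeated runs
    main = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
    bias = np.mean(main != y_te)                 # main prediction is wrong
    variance = np.mean(preds != main)            # runs disagree with the main prediction
    return bias, variance
```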

13.
An extension of cellular genetic programming for data classification (CGPC) to induce an ensemble of predictors is presented. Two algorithms implementing the bagging and boosting techniques are described and compared with CGPC. The approach can handle large data sets that do not fit in main memory, since each classifier is trained on a subset of the overall training data; the predictors are then combined to classify new tuples. Experiments on several data sets show that, by using a training set of reduced size, better classification accuracy can be obtained, and at a much lower computational cost.

14.
A theoretical analysis of bagging as a linear combination of classifiers
We apply an analytical framework for the analysis of linearly combined classifiers to ensembles generated by bagging. This provides an analytical model of bagging's misclassification probability as a function of ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions, allowing us to derive a novel and theoretically grounded guideline for choosing bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow us to compare bagging's misclassification probability with that of an individual classifier trained on the original training set, we discuss how the considered theoretical framework could be exploited to this aim.

15.
Transfer learning aims to enhance performance in a target domain by exploiting useful information from auxiliary or source domains when labeled data in the target domain are insufficient or difficult to acquire. In some real-world applications, the source-domain data are provided in advance, but the target-domain data arrive in a streaming fashion; this kind of problem is known as online transfer learning. In practice, there may be several source domains related to the target domain. The performance of online transfer learning is highly dependent on the selected source domains, and simply combining them may lead to unsatisfactory performance. In this paper, we seek to improve classification performance in a target domain by leveraging labeled data from multiple source domains in an online setting. To achieve this, we propose a new online transfer learning algorithm that merges and leverages the classifiers of the source and target domains with an ensemble method. The mistake bound of the proposed algorithm is analyzed, and comprehensive experiments on three real-world data sets illustrate that our algorithm outperforms the compared baseline algorithms.
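A hedged sketch of the ensemble idea for the binary case: pre-trained source classifiers and a target classifier vote with weights that are multiplicatively demoted on mistakes. The update rule is an assumption in the style of online expert algorithms, not the paper's exact formulation, and the online training of the target classifier itself is omitted.

```python
import numpy as np

class OnlineTransferEnsemble:
    """Weighted vote over pre-trained source classifiers and a target classifier."""

    def __init__(self, source_clfs, target_clf, beta=0.8):
        self.clfs = list(source_clfs) + [target_clf]
        self.weights = np.ones(len(self.clfs))
        self.beta = beta                              # demotion factor on a mistake

    def predict(self, x):
        votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in self.clfs])
        score = np.average(votes, weights=self.weights)   # labels assumed to be 0/1
        return int(score >= 0.5)

    def update(self, x, y_true):
        for i, clf in enumerate(self.clfs):
            if clf.predict(x.reshape(1, -1))[0] != y_true:
                self.weights[i] *= self.beta
        self.weights /= self.weights.sum()
```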

16.
The problem of model selection for composing a heterogeneous bagging ensemble is addressed in this paper. To solve it, three self-adapting genetic algorithms are proposed, with the control parameters of mutation, crossover, and selection adjusted during execution. The algorithms were applied to create heterogeneous ensembles of regression fuzzy models to aid real estate appraisal. The results of the experiments revealed that the self-adaptive algorithms converged faster than classic genetic algorithms, and the heterogeneous ensembles created by the self-adapting methods showed very good predictive accuracy compared with the homogeneous ensembles obtained in earlier research.

17.
Software defect prediction can optimize the allocation of testing resources and improve software product quality by identifying potentially defective program modules in advance. This paper studies the cross-project defect prediction problem in depth. For instance selection in the source project, three different instance-similarity measures are considered, and their defect prediction results are found to be diverse. We therefore propose BCEL, an ensemble cross-project defect prediction method based on the Box-Cox transformation. Specifically, different training sets are first selected from the candidate set using the different instance-similarity measures; a targeted Box-Cox transformation is then applied to each data set, and a base classifier is built for each with a specific classification method; finally, the three base classifiers are combined. Experiments on data sets from real projects verify the effectiveness of BCEL and analyze how the factors inside BCEL affect defect prediction performance.
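A minimal sketch of the per-training-set Box-Cox step using scipy; the similarity-based training-set selection and the final combination are simplified away, and the feature shift that makes every column strictly positive is an implementation assumption.

```python
import numpy as np
from scipy.special import boxcox as boxcox_apply
from scipy.stats import boxcox
from sklearn.naive_bayes import GaussianNB

def fit_boxcox_classifier(X_tr, y_tr):
    """Fit one base classifier on Box-Cox-transformed features."""
    shift = X_tr.min(axis=0) - 1e-6                  # Box-Cox needs positive inputs
    cols = [boxcox(X_tr[:, j] - shift[j]) for j in range(X_tr.shape[1])]
    Xt = np.column_stack([c[0] for c in cols])       # transformed columns
    lambdas = np.array([c[1] for c in cols])         # fitted lambda per column
    return GaussianNB().fit(Xt, y_tr), shift, lambdas

def transform(X, shift, lambdas):
    """Apply the stored per-column shifts and lambdas to new data."""
    return np.column_stack([boxcox_apply(X[:, j] - shift[j], lambdas[j])
                            for j in range(X.shape[1])])
```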

18.
Ensembles of classifiers trained on different parts of the input space generally provide good results. As a popular boosting technique, AdaBoost is an iterative, gradient-based deterministic method for this purpose, minimizing an exponential loss function. Bagging is a random-search-based ensemble creation technique in which the training set of each classifier is arbitrarily selected. In this paper, a genetic-algorithm-based ensemble creation approach is proposed in which both the resampled training sets and the classifier prototypes evolve so as to maximize the combined accuracy. The objective-function-based random search of the resulting system, guided by both ensemble accuracy and diversity, can be considered to share the basic properties of bagging and boosting. Experimental results have shown that the proposed approach provides better combined accuracy using fewer classifiers than AdaBoost.

19.
Web spam detection is one of the important challenges for search engines. This paper proposes an ensemble learning method based on genetic programming (GPENL) to detect web spam. The method first uses under-sampling to draw t different training sets from the original training set; then c different classification algorithms are applied to the t training sets to obtain t*c base classifiers; finally, genetic programming is used to derive the combination of the t*c base classifiers. The new method not only fuses under-sampling and ensemble learning to improve classification performance on imbalanced data sets, but also conveniently integrates base classifiers of different types. Experiments on the WEBSPAM-UK2006 data set show that GPENL improves classification performance for both homogeneous and heterogeneous ensembles, with heterogeneous integration more effective than homogeneous integration, and that GPENL achieves a higher F-measure than AdaBoost, Bagging, RandomForest, majority-vote ensembles, the EDKC algorithm, and the method based on Prediction Spamicity.
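A minimal sketch of building the t*c base-classifier pool; the genetic-programming combiner is replaced here by plain majority voting, and binary labels 0/1 (1 = spam, assumed to be the minority class) are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

ALGOS = [DecisionTreeClassifier, GaussianNB]         # the c classification algorithms

def build_pool(X, y, t=3, seed=0):
    rng = np.random.default_rng(seed)
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    pool = []
    for _ in range(t):                               # t under-sampled training sets
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([keep, minority])
        pool += [Algo().fit(X[idx], y[idx]) for Algo in ALGOS]
    return pool                                      # t * c base classifiers

def majority_vote(pool, X):
    return (np.mean([clf.predict(X) for clf in pool], axis=0) >= 0.5).astype(int)
```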

20.
An ensemble of evolving neural networks, employing neural networks and genetic algorithms, is developed for classification problems in data mining. The network meets data mining requirements such as a smart architecture, user interaction, and performance. The evolving neural network has a smart architecture in that it is able to select inputs from the environment and control its own topology, and a built-in objective function offers user interaction for customized classification. The bagging technique, which trains multiple networks on portions of the training set, is applied to the ensemble of evolving neural networks in order to improve classification performance. The ensemble of evolving neural networks is tested on various data sets and produces better performance than both classical neural networks and simple ensemble methods.
