Similar Literature
20 similar documents found (search time: 15 ms)
1.
    
Two new methods for tree ensemble construction are presented: G-Forest and GAR-Forest. As in Random Forest, the tree construction process entails a degree of randomness. The same strategy used in the GRASP metaheuristic for generating random and adaptive solutions is used at each node of the trees, and the randomness of the GRASP solution-generation method is the ensemble's source of diversity. A further key feature of the tree construction method for GAR-Forest is a decreasing level of randomness during the construction of the tree: maximum randomness at the root and minimum randomness at the leaves. The method is therefore named "GAR", GRASP with annealed randomness. The results demonstrate that G-Forest and GAR-Forest outperform Bagging, AdaBoost, MultiBoost, Random Forest and Random Subspaces, and are even more convincing in the presence of noise, demonstrating the robustness of the method. The relationship between base classifier accuracy and diversity is analysed using kappa-error diagrams and a variant called kappa-error relative movement diagrams.
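A minimal sketch of the annealed-randomness idea at a single node, assuming a `gain` score is available for each candidate attribute; the linear annealing schedule and the names `depth` and `max_depth` are illustrative choices, not the paper's exact formulation:

```python
import random

def choose_split_attribute(attributes, gain, depth, max_depth):
    """GRASP-style split selection with annealed randomness (sketch).

    At the root (depth 0) the restricted candidate list (RCL) covers all
    attributes, so the choice is highly random; near the leaves it shrinks
    towards a single greedy choice, i.e. the randomness is annealed away.
    """
    ranked = sorted(attributes, key=gain, reverse=True)
    fraction = max(0.0, 1.0 - depth / max_depth)   # one plausible schedule
    rcl_size = max(1, round(fraction * len(ranked)))
    return random.choice(ranked[:rcl_size])
```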

2.
This paper proposes random feature weights (RFW), a method for constructing ensembles of decision trees. Like Random Forest, it introduces randomness into the construction of the decision trees, but whereas Random Forest considers only a random subset of attributes at each node, RFW considers all of them. Its source of randomness is a weight associated with each attribute. All the nodes in a tree use the same set of random weights, but each tree draws a set different from those of the other trees; the importance given to the attributes therefore differs from tree to tree, which differentiates their construction. The method is compared to Bagging, Random Forest, Random Subspaces, AdaBoost and MultiBoost, with favourable results for the proposed method, especially on noisy data sets. RFW can also be combined with these methods, and the combination generally produces better results than the combined methods alone. Kappa-error diagrams and kappa-error movement diagrams are used to analyse the relationship between the accuracy of the base classifiers and their diversity.
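A sketch of how per-tree random weights could modulate the split criterion, assuming a `gain` function is available; the particular weight distribution (`random.random() ** power`) is an illustrative assumption, not taken from the paper:

```python
import random

def make_tree_weights(attributes, power=2.0):
    # One set of weights per tree; every node of that tree reuses it,
    # while other trees draw their own, which is the source of diversity.
    return {a: random.random() ** power for a in attributes}

def weighted_merit(attribute, gain, weights):
    # Unlike Random Forest, all attributes remain candidates at a node;
    # their merit is merely scaled by the tree-specific random weight.
    return weights[attribute] * gain(attribute)
```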

3.
We present a new classifier fusion method for combining soft-level classifiers that can be considered a generalization of the decision templates method. Previous decision-template combiners employ a single prototype for each class, but this global view often fails to represent the decision space properly. The drawback severely degrades the classification rate when training samples are scarce, when the decision space has an island-shaped distribution, or when classes overlap heavily in decision space. To represent the decision space better, we use a prototype selection method to obtain a set of local decision prototypes for each class. To determine the class of a test pattern, its decision profile is computed and compared to all decision prototypes: for each class, the more decision prototypes that lie near the pattern's decision profile, the higher the chance of that class. The efficiency of the proposed method is evaluated on several well-known classification datasets, suggesting its superiority over other proposed techniques.
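A rough sketch of the test-time step, assuming scikit-learn-style soft classifiers and that local prototypes per class were already selected beforehand (the prototype-selection step is not shown); scoring by the single nearest prototype is one simple reading of the neighbourhood idea:

```python
import numpy as np

def decision_profile(classifiers, x):
    # Stack each classifier's class-posterior vector into one profile.
    return np.concatenate([clf.predict_proba([x])[0] for clf in classifiers])

def classify(profile, prototypes_per_class):
    # Assign the class whose local decision prototypes lie closest.
    best_label, best_dist = None, np.inf
    for label, protos in prototypes_per_class.items():
        dist = np.linalg.norm(np.asarray(protos) - profile, axis=1).min()
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```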

4.
The Naive Bayesian classifier is an effective text classification method, but because it is a highly stable learner, its performance is difficult to improve through the Boosting mechanism. The biggest problem to solve when using a Naive Bayesian classifier as the base classifier for Boosting is therefore how to break its stability. Three methods for destabilizing the Naive Bayesian learner are proposed: the first modifies the training set samples, the second uses randomly selected attribute groups, and the third builds a different feature-word set in each Boosting iteration by using different text feature extraction methods. Experiments show that each method has its own advantages and disadvantages, but all are more accurate and efficient than the original method.
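As an illustration of the second strategy, a hedged sketch of one boosting round that restricts Naive Bayes to a random attribute subset; `MultinomialNB`, the `keep` ratio and the column-subset mechanics are illustrative assumptions:

```python
import random
from sklearn.naive_bayes import MultinomialNB

def destabilized_nb_round(X, y, n_features, keep=0.6):
    """Train one Naive Bayes member on a random feature subset (sketch).

    Restricting each round to a random attribute subset makes the
    otherwise stable learner vary between rounds, giving boosting
    some diversity to exploit.
    """
    cols = sorted(random.sample(range(n_features), int(keep * n_features)))
    model = MultinomialNB().fit(X[:, cols], y)
    return model, cols   # keep cols so the member can be applied at test time
```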

5.
Ensembles of classifiers trained on different parts of the input space generally provide good results. AdaBoost, a popular boosting technique, is an iterative, gradient-based deterministic method for this purpose that minimizes an exponential loss function; Bagging is a random-search-based ensemble creation technique in which the training set of each classifier is selected at random. In this paper, a genetic-algorithm-based ensemble creation approach is proposed in which both the resampled training sets and the classifier prototypes evolve so as to maximize the combined accuracy. The resulting objective-function-guided random search, driven by both ensemble accuracy and diversity, can be considered to share the basic properties of bagging and boosting. Experimental results show that the proposed approach provides better combined accuracies using fewer classifiers than AdaBoost.
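A compact sketch of the evolutionary loop under several simplifying assumptions: decision trees stand in for the classifier prototypes, individuals encode only the resampled index sets, fitness is plain majority-vote accuracy (no diversity term), and mutation-only reproduction replaces the full GA machinery:

```python
import random
from sklearn.tree import DecisionTreeClassifier

def ensemble_accuracy(members, X_val, y_val):
    correct = 0
    for x, y_true in zip(X_val, y_val):
        preds = [clf.predict([x])[0] for clf in members]
        correct += max(set(preds), key=preds.count) == y_true
    return correct / len(y_val)

def evolve_ensemble(X, y, X_val, y_val, size=5, pop=10, gens=20, mut=0.05):
    n = len(X)
    population = [[[random.randrange(n) for _ in range(n)]   # bootstrap indices
                   for _ in range(size)] for _ in range(pop)]
    def score(individual):
        members = [DecisionTreeClassifier().fit(
            [X[i] for i in s], [y[i] for i in s]) for s in individual]
        return ensemble_accuracy(members, X_val, y_val)
    for _ in range(gens):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop // 2]                # keep the fitter half
        children = [[[i if random.random() > mut else random.randrange(n)
                      for i in s] for s in parent] for parent in parents]
        population = parents + children
    return max(population, key=score)
```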

6.
Ensemble design techniques based on training-set resampling are successfully used to reduce the classification errors of the base classifiers. Boosting is one such technique, in which each training set is obtained by drawing samples with replacement from the available training set according to a weighted distribution that is modified for each new classifier added to the ensemble. The weighted resampling results in a set of classifiers, each accurate in different parts of the input space, mainly specified by the sample weights. In this study, a dynamic integration of boosting-based ensembles is proposed to take the heterogeneity of the input sets into account. An evidence-theoretic framework is developed that accounts for the weights and distances of the neighbouring training samples when both training and testing boosting-based ensembles. The effectiveness of the proposed technique is compared to the AdaBoost algorithm using three different base classifiers.
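A simplified stand-in for the evidence-theoretic integration, assuming numeric features: each boosted member is weighted by its accuracy on the test point's nearest training neighbours, so heterogeneous regions get different combiners. The distance metric, `k`, and accuracy-as-evidence are all illustrative choices:

```python
import numpy as np

def dynamic_weights(x, members, X_train, y_train, k=5):
    """Weight each ensemble member by its local accuracy around x (sketch)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]                    # k nearest training samples
    weights = np.array([np.mean(m.predict(X_train[nn]) == y_train[nn])
                        for m in members])
    total = weights.sum()
    if total == 0:                                # fall back to uniform weights
        return np.full(len(members), 1.0 / len(members))
    return weights / total
```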

7.
Handwritten text recognition is one of the most difficult problems in the field of pattern recognition. Recently, a number of classifier creation and combination methods, known as ensemble methods, have been proposed in the field of machine learning. They have shown improved recognition performance over single classifiers. In this paper the application of some of those ensemble methods in the domain of offline cursive handwritten word recognition is described. The basic word recognizers are given by hidden Markov models (HMMs). It is demonstrated through experiments that ensemble methods can also improve recognition accuracy in the domain of handwriting recognition.

8.
Several researchers have shown that substantial improvements can be achieved in difficult pattern recognition problems by combining the outputs of multiple neural networks. In this work, we present and test a pattern classification multi-net system based on both supervised and unsupervised learning. Following the ‘divide-and-conquer’ framework, the input space is partitioned into overlapping subspaces and neural networks are subsequently used to solve the respective classification subtasks. Finally, the outputs of individual classifiers are appropriately combined to obtain the final classification decision. Two clustering methods have been applied for input space partitioning and two schemes have been considered for combining the outputs of the multiple classifiers. Experiments on well-known data sets indicate that the multi-net classification system exhibits promising performance compared with the case of single network training, both in terms of error rates and in terms of training speed (especially if the training of the classifiers is done in parallel).

9.
Generalized rules for combination and joint training of classifiers
Classifier combination has repeatedly been shown to provide significant improvements in performance for a wide range of classification tasks. In this paper, we focus on the problem of combining probability distributions generated by different classifiers. Specifically, we present a set of new combination rules that generalize the most commonly used combination functions, such as the mean, product, min, and max operations. These new rules have continuous and differentiable forms, and can thus not only be used for combination of independently trained classifiers, but also as objective functions in a joint classifier training scheme. We evaluate both of these schemes by applying them to the combination of phone classifiers in a speech recognition system. We find a significant performance improvement over previously used combination schemes when jointly training and combining multiple systems using a generalization of the product rule.
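One standard family with exactly this behaviour is the generalized (power) mean; the sketch below is illustrative and need not match the paper's exact rules, but it shows how the mean, a product-like rule, min and max all emerge from a single continuous, differentiable form:

```python
import numpy as np

def generalized_mean_combine(probs, alpha):
    """Combine posteriors of shape (n_classifiers, n_classes) (sketch).

    alpha = 1      -> mean rule
    alpha -> 0     -> geometric mean (a normalized product rule)
    alpha -> -inf  -> min rule;  alpha -> +inf -> max rule
    Being differentiable in its inputs, the form can also serve inside
    an objective for jointly training the combined classifiers.
    """
    p = np.asarray(probs, dtype=float) + 1e-12     # avoid log(0) / 0**neg
    if abs(alpha) < 1e-8:                          # limiting case alpha -> 0
        combined = np.exp(np.log(p).mean(axis=0))
    else:
        combined = np.power(p, alpha).mean(axis=0) ** (1.0 / alpha)
    return combined / combined.sum()               # renormalize over classes
```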

10.
Incremental construction of classifier and discriminant ensembles
We discuss approaches to incrementally constructing an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set; the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of the discriminants and is simpler, interpretable, and accurate. An incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost with fewer classifiers.
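A sketch of the forward-selection flavour of incremental construction; `evaluate` is assumed to return the validation accuracy of a combined ensemble (voting or a linear combiner, as in the paper), and stopping when no candidate improves the score is one simple instantiation of the "significant improvement" criterion:

```python
def incremental_ensemble(candidates, evaluate, max_size=10):
    """Greedily add the classifier that most improves the ensemble (sketch)."""
    ensemble, best_score = [], float("-inf")
    pool = list(candidates)
    while pool and len(ensemble) < max_size:
        score, choice = max(((evaluate(ensemble + [c]), c) for c in pool),
                            key=lambda pair: pair[0])
        if score <= best_score:
            break                     # no candidate improves the ensemble
        ensemble.append(choice)
        pool.remove(choice)
        best_score = score
    return ensemble
```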

11.
In this work, a new method for the creation of classifier ensembles is introduced. The patterns are partitioned into clusters that group similar patterns together, and a training set is built from the patterns belonging to each cluster. Each of the new sets is used to train a classifier. We show that the approach presented here, called FuzzyBagging, obtains better performance than Bagging.
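A sketch of the partitioning step only, with hard k-means clusters standing in for the fuzzy memberships that give FuzzyBagging its name; decision trees and `n_clusters` are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_trained_ensemble(X, y, n_clusters=5):
    """Train one classifier per cluster of similar patterns (sketch)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    ensemble = []
    for c in range(n_clusters):
        mask = labels == c            # training set = patterns of one cluster
        ensemble.append(DecisionTreeClassifier().fit(X[mask], y[mask]))
    return ensemble
```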

12.
Multi-Class Learning by Smoothed Boosting
AdaBoost.OC has been shown to be an effective method for boosting “weak” binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks of the AdaBoost.OC algorithm is that it is sensitive to noisy examples and tends to overfit them. In this paper, we propose a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address this overfitting problem. We prove bounds for both the empirical training error and the marginal training error of the proposed algorithm. Empirical studies with seven UCI datasets and one real-world application indicate that the proposed algorithm is more robust and effective than AdaBoost.OC for multi-class learning.
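The exact smoothing used by MSmoothBoost is not reproduced here; as a hedged illustration of the general mechanism, the sketch below mixes the boosting distribution with the uniform one, which caps how strongly any (possibly noisy) example can dominate a round:

```python
import numpy as np

def smooth_distribution(weights, eps=0.1):
    """Blend the boosting weights with the uniform distribution (sketch)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize the boosting weights
    uniform = np.full_like(w, 1.0 / len(w))
    return (1.0 - eps) * w + eps * uniform   # eps controls the smoothing
```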

13.
There are several possible approaches to combining ridge regression with boosting techniques. In the simple or naive approach, the ridge estimator is fitted iteratively to the current residuals, yielding an alternative to the usual ridge estimator. In partial boosting, only part of the regression parameters is re-estimated in each step of the iterative procedure. The technique makes it possible to distinguish between mandatory variables, which are always included in the analysis, and optional variables, which are chosen only if relevant. The resulting procedure selects optional variables in a similar way to the Lasso, yielding a reduced set of influential variables while allowing regularized estimation of the mandatory parameters. The suggested procedures are investigated in the classical framework of continuous response variables as well as for generalized linear models. The performance in terms of prediction and the identification of relevant variables is compared with several competitors such as the Lasso and the more recently proposed elastic net. Pseudo-ROC curves are introduced to evaluate the identification of relevant variables.
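A minimal sketch of the naive approach, assuming scikit-learn's `Ridge`; the shrinkage factor, fixed penalty and step count are illustrative, and the partial variant (re-estimating only the optional coefficients in each step) is omitted:

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_boost(X, y, n_steps=50, shrinkage=0.1, penalty=1.0):
    """Iteratively ridge-fit the current residuals (sketch)."""
    coef = np.zeros(X.shape[1])
    intercept = float(np.mean(y))
    residual = y - intercept
    for _ in range(n_steps):
        step = Ridge(alpha=penalty).fit(X, residual)
        coef += shrinkage * step.coef_           # damped update of the fit
        intercept += shrinkage * step.intercept_
        residual = y - (X @ coef + intercept)    # refit what is still unexplained
    return coef, intercept
```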

14.
In this paper we present a new approach to boosting methods for the construction of classifier ensembles. The approach uses the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the instance weights to train the next classifier. With this method we construct ensembles that achieve a better generalization error and are more robust to the presence of noise. The AdaBoost method has been proved to improve the margin of the instances achieved by the ensemble, and its practical success has been partially explained by this margin-maximization property. However, in noisy problems, which are likely to occur in real-world applications, maximizing the margin of wrong instances or outliers can lead to poor generalization. We propose an alternative approach in which the distribution of weights given by the boosting algorithm is used to obtain a supervised projection; the supervised projection is then used to train the next classifier using a uniform distribution of the training instances. The proposed approach is compared with three boosting techniques, namely AdaBoost, GentleBoost and MadaBoost, showing improved performance on a large set of 55 problems from the UCI Machine Learning Repository and lower sensitivity to noise in the class labels. The behaviour of the proposed algorithm in terms of margin distribution and bias-variance decomposition is also studied.

15.
Choosing the regularization parameter appropriately according to the sample size can make multi-class Boosting algorithms consistent. By analysing the influence of the regularization parameter on the generalization ability of multi-class Boosting, a connection is established between the regularization parameter and the consistency of the algorithm. From this, a sufficient condition for the consistency of Boosting algorithms is obtained. For a fixed sample set, this condition can serve as a basis for choosing the regularization parameter of a multi-class Boosting algorithm.

16.
Boosting is an effective means of improving the learning of base classifiers, but research has shown that its effect on Naive Bayes is limited. This paper proposes a new boosting algorithm, ActiveBoost, which combines active learning to mine the information in samples without class labels and introduces instability into the construction of the Naive Bayes classifier. Experimental results on datasets from the UCI machine learning repository demonstrate the effectiveness of the algorithm.

17.
We propose an algorithm that approximates each class region by a small number of approximated convex hulls and uses these for classification. The classifier is a non-kernel maximum-margin classifier: it keeps the maximum margin in the original feature space, unlike support vector machines with a kernel. Constructing an exact convex hull requires time exponential in the dimension, so we instead find an approximate convex hull (a polyhedron), which is constructed in time linear in the dimension. We also propose a model selection procedure that controls the number of faces of the convex hulls to avoid over-fitting, in which a fast procedure is adopted to calculate an upper bound on the leave-one-out error. In comparison with support vector machines, the proposed approach is shown to be comparable in performance but more natural in its extension to multi-class problems.
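A sketch of the classification rule, assuming each class's approximate hull is represented by a set of vertex points; the distance is computed with a small QP over the simplex via SciPy's SLSQP solver, an illustrative stand-in for the paper's polyhedral machinery:

```python
import numpy as np
from scipy.optimize import minimize

def distance_to_hull(point, vertices):
    """Distance from `point` to the convex hull of `vertices` (sketch)."""
    n = len(vertices)
    objective = lambda w: np.sum((vertices.T @ w - point) ** 2)
    constraint = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(objective, np.full(n, 1.0 / n),     # start at the centroid
                   bounds=[(0.0, 1.0)] * n, constraints=constraint)
    return float(np.sqrt(res.fun))

def classify(point, hulls_by_class):
    # Assign the class whose (approximate) hull lies nearest to the point.
    return min(hulls_by_class,
               key=lambda c: distance_to_hull(point, np.asarray(hulls_by_class[c])))
```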

18.
Intrusion detection based on the Boosting algorithm
An intrusion detection method based on the Boosting algorithm is proposed. A neural network is first used to obtain an initial intrusion detection function; on this basis, the Boosting method is used to construct a sequence of neural-network-based intrusion detection functions, which are then combined in a prescribed way into a strengthened overall detection function used to perform intrusion detection. Experimental results show that this method clearly improves detection performance.
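A hedged sketch of the boosted-network idea for binary labels in {-1, +1}; since scikit-learn's `MLPClassifier` does not take per-sample weights, each round resamples from the current distribution, and the combination weight follows the usual AdaBoost formula (the paper's exact combination scheme may differ):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def boost_networks(X, y, rounds=5):
    """AdaBoost-style sequence of neural detection functions (sketch)."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    nets, alphas = [], []
    for _ in range(rounds):
        idx = np.random.choice(n, size=n, p=w)      # weighted resampling
        net = MLPClassifier(max_iter=500).fit(X[idx], y[idx])
        pred = net.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        w = w * np.exp(-alpha * y * pred)           # y, pred in {-1, +1}
        w = w / w.sum()
        nets.append(net)
        alphas.append(alpha)
    # The strengthened overall detector is sign(sum_k alpha_k * net_k(x)).
    return nets, alphas
```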

19.
杜晓旭, 钱沄涛. 《计算机工程》, 2005, 31(22): 164-166
In many applications, combining multiple classifiers can reduce the classification error rate. Based on this idea, this paper proposes a new face recognition algorithm, the boosted probabilistic reasoning model. In this algorithm, the classification task is divided among multiple sub-classifiers, each concentrating on some hard-to-classify samples, and these sub-classifiers are then combined to form a strong classifier. Experimental results show that the recognition rate of the algorithm is 1.8% higher than that of the original probabilistic reasoning model.

20.
A transfer learning algorithm incorporating a semi-supervised Boosting method
Transfer learning is a research direction in data mining that tries to reuse data samples from related domains, "transferring" knowledge from those domains into a new domain to help with training. Current instance-based transfer learning algorithms are prone to overfitting and cannot make full use of the useful data in related domains. To avoid this problem, a new transfer learning algorithm is proposed that brings unlabelled samples from the target domain into the training process and makes use of a semi-supervised Boosting method, so that it is able to …
