Similar Documents
 20 similar documents found (search time: 0 ms)
1.
New Applications of Ensembles of Classifiers
Combination (ensembles) of classifiers is now a well-established research line. It has been observed that the predictive accuracy of a combination of independent classifiers can exceed that of the single best classifier. While ensembles of classifiers have mostly been employed to achieve higher recognition accuracy, this paper focuses on the use of combinations of individual classifiers for handling several practical problems in the machine learning, pattern recognition and data mining domains. In particular, the study concentrates on managing the imbalanced training sample problem, scaling up some preprocessing algorithms, and filtering the training set. All these situations are examined mainly in connection with the nearest neighbour classifier. Experimental results show the potential of multiple classifier systems when applied to those situations.
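As a minimal illustration of why a combination of independent classifiers can beat its best member, the following stdlib sketch combines three fabricated base classifiers (their prediction lists are hand-made, not learned) by plurality vote; because each member errs on a different instance, the vote corrects every mistake:

```python
from collections import Counter

def majority_vote(predictions_per_clf):
    """Combine per-classifier prediction lists by plurality vote."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_clf)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Three fabricated base classifiers that err on *different* instances.
truth = [0, 1, 0, 1, 0, 1]
clf_a = [1, 1, 0, 1, 0, 1]  # wrong on instance 0 only
clf_b = [0, 1, 1, 1, 0, 1]  # wrong on instance 2 only
clf_c = [0, 1, 0, 1, 1, 1]  # wrong on instance 4 only

ensemble = majority_vote([clf_a, clf_b, clf_c])
print(accuracy(clf_a, truth), accuracy(ensemble, truth))  # each member 5/6, vote 6/6
```

The gain disappears if the members make correlated errors, which is why independence (or diversity) of the base classifiers matters.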

2.
Cost Complexity-Based Pruning of Ensemble Classifiers
In this paper we study methods that combine multiple classification models learned over separate data sets. Numerous studies posit that such approaches provide the means to efficiently scale learning to large data sets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources. The final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory resources while also slowing down classification throughput. Here, we describe an algorithm for pruning the ensemble meta-classifier (i.e., discarding a subset of the available base classifiers) as a means to reduce its size while preserving its accuracy, and we present a technique for measuring the trade-off between predictive performance and available run-time system resources. The algorithm is independent of the method used initially when computing the meta-classifier. It is based on decision tree pruning methods and relies on the mapping of an arbitrary ensemble meta-classifier to a decision tree model. Through an extensive empirical study on meta-classifiers computed over two real data sets, we show our pruning algorithm to be a robust and competitive approach to discarding classification models without degrading the overall predictive performance of the smaller ensemble computed over those that remain after pruning. Received 30 August 2000 / Revised 7 March 2001 / Accepted in revised form 21 May 2001
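The paper's algorithm maps the meta-classifier to a decision tree and prunes that; as a much simpler stand-in for the general idea of discarding base classifiers while preserving accuracy, the sketch below orders base classifiers by individual validation accuracy and keeps the smallest odd-sized prefix whose voted accuracy matches the full ensemble (toy hand-made predictions; not the authors' method):

```python
from collections import Counter

def vote(preds):
    return [Counter(v).most_common(1)[0][0] for v in zip(*preds)]

def acc(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def prune_by_ordering(preds, truth):
    """Rank base classifiers by individual validation accuracy, then keep the
    smallest odd-sized prefix whose voted accuracy matches the full ensemble."""
    full = acc(vote(preds), truth)
    order = sorted(range(len(preds)),
                   key=lambda i: acc(preds[i], truth), reverse=True)
    for k in range(1, len(preds) + 1, 2):
        if acc(vote([preds[i] for i in order[:k]]), truth) >= full:
            return order[:k]
    return order

truth = [0, 1, 0, 1]
preds = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1],  # three accurate members
         [1, 0, 1, 0], [1, 1, 0, 0]]                # two noisy members
print(prune_by_ordering(preds, truth))  # [0]: a single member already matches the vote
```

Here the two noisy members add memory and scoring cost without improving the vote, so pruning removes them at no loss in validation accuracy.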

3.
Programming and Computer Software - A method of stance detection in text is proposed. This method is based on the machine learning of ensembles of classifiers. It is known that ensembles have...

4.
Local Averaging of Ensembles of LVQ-Based Nearest Neighbor Classifiers
Ensemble learning is a well-established method for improving the generalization performance of learning machines. The idea is to combine a number of learning systems that have been trained on the same task. However, since all the members of the ensemble operate at the same time, large amounts of memory and long execution times are needed, limiting its practical application. This paper presents a new method (called local averaging) in the context of nearest neighbor (NN) classifiers that generates, from the ensemble, a classifier with the same complexity as the individual members. Once a collection of prototypes has been generated from different learning sessions using Kohonen's LVQ algorithm, a single set of prototypes is computed by applying a clustering algorithm (such as K-means) to this collection. Local averaging can be viewed either as a technique to reduce the variance of the prototypes or as the result of averaging a series of particular bootstrap replicates. Experimental results on several classification problems confirm the utility of the method and show that local averaging can compute a single classifier that achieves similar (or even better) accuracy than ensembles generated with voting.
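A rough sketch of the local-averaging step, assuming the LVQ training has already happened (the pooled prototypes below are fabricated stand-ins for the output of three LVQ sessions): each class's pooled prototypes are reduced back to a single prototype set with a tiny K-means, and classification is nearest-prototype:

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on tuples; returns k centroids."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda j: dist2(p, cents[j]))].append(p)
        cents = [mean(b) if b else cents[j] for j, b in enumerate(buckets)]
    return cents

# Prototypes pooled from three (pretend) LVQ sessions, grouped per class.
pooled = {0: [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1)],
          1: [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8)]}
# Local averaging: cluster each class's pool down to one prototype per class.
prototypes = [(c, label) for label, pts in pooled.items() for c in kmeans(pts, 1)]

def predict(x):
    """Nearest-prototype (1-NN over the averaged prototypes) classification."""
    return min(prototypes, key=lambda p: dist2(p[0], x))[1]
```

The resulting classifier stores as many prototypes as a single ensemble member, which is the memory/time saving the abstract describes.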

5.
6.
7.
Feature selection helps increase the random diversity among the members of an ensemble classifier and thereby improves generalization accuracy. This paper studies two feature-selection-based algorithms for constructing ensemble classifiers, Random Subspace and Rotation Forest, and analyzes the relationship between the way each algorithm selects features and the degree of random diversity it induces. By injecting noise into UCI data sets, the two algorithms' classification accuracies are compared in noisy environments. The experimental results show that, as noise increases and feature correlation decreases, both the base learning algorithm and the noise level affect the ensemble's performance; once the noise grows beyond a certain level, the ensemble's performance converges to that of a single classifier.
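The Random Subspace idea can be sketched in a few lines: each ensemble member classifies with 1-NN restricted to a random subset of the features, and the members vote (toy hand-made data; a sketch of the subspace idea only, not the Rotation Forest variant):

```python
import random

def knn1(train, x, feats):
    """1-NN prediction using only the feature indices in `feats`."""
    def sqdist(sample):
        return sum((sample[0][f] - x[f]) ** 2 for f in feats)
    return min(train, key=sqdist)[1]

def random_subspace_predict(train, x, n_members=5, subspace=2, seed=1):
    """Random Subspace ensemble: each member votes via 1-NN on a random feature subset."""
    rng = random.Random(seed)
    n_feats = len(train[0][0])
    votes = [knn1(train, x, rng.sample(range(n_feats), subspace))
             for _ in range(n_members)]
    return max(set(votes), key=votes.count)

# Toy 4-feature data: class 0 near the origin, class 1 near (1,1,1,1).
train = [((0.0, 0.0, 0.0, 0.0), 0), ((0.2, 0.1, 0.0, 0.1), 0),
         ((1.0, 1.0, 1.0, 1.0), 1), ((0.9, 1.1, 1.0, 0.8), 1)]
print(random_subspace_predict(train, (0.1, 0.0, 0.1, 0.0)))  # 0
```

Because each member sees a different feature subset, their errors decorrelate, which is exactly the diversity the abstract attributes to feature selection.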

8.
Statistical machine translation has made great strides since its inception and has become the mainstream approach to machine translation. However, the translation model, one of its core components, grows rapidly with the size of the training corpus. To make statistical machine translation more practical, reducing the translation model has long been an active research topic. This paper surveys the state of research on translation model reduction in statistical machine translation; existing methods fall into three main categories: statistical analysis of the decoding process, statistical analysis of the training corpus, and analysis of the intrinsic characteristics of the phrase pairs in the translation model. Building on this analysis, future directions for this line of work are also discussed.

9.
Advances in Instance Selection for Instance-Based Learning Algorithms
The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out over the past 30 years, we review the principal approaches to solving these problems. By deleting instances, both problems can be alleviated, but the deletion criterion is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of such mechanisms and propose some initial measures that could be useful for the data miner.
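A classic instance-selection scheme of the kind this survey covers is Hart's Condensed Nearest Neighbour, which keeps only the instances that the current store misclassifies; a stdlib sketch on toy data (this is the classic algorithm, not the paper's new one):

```python
def nn_label(store, x):
    """Label of the stored instance nearest to x (squared Euclidean distance)."""
    return min(store, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

def condense(train):
    """Hart's Condensed NN: grow a store from the instances it currently misclassifies,
    repeating passes until a full pass adds nothing."""
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, y in train:
            if nn_label(store, x) != y:
                store.append((x, y))
                changed = True
    return store

# Two well-separated toy clusters; condensing keeps one instance per cluster.
train = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
         ((10, 10), 1), ((9, 10), 1), ((10, 9), 1)]
store = condense(train)
print(len(store))  # 2
```

This addresses the storage problem the abstract opens with; handling the noise problem needs editing rules (deleting misleading instances) rather than condensing.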

10.
In this report two programs for statistical analysis of concordance lines are described. The programs have been developed for analyzing the lexical context of a given word. It is shown how different parameter settings influence the outcome of collocational analysis, and how the concept of collocation can be extended to allow the extraction of lines typical of a word from a set of concordance lines. Even though all the examples are for English, the software is completely language independent and requires only minimal linguistic resources. This revised version was published online in July 2006 with corrections to the Cover Date.
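A minimal sketch of the kind of collocational analysis described: count the context words that occur within a fixed window around a node word across a set of concordance lines (the lines and window size here are invented, and real collocation tools typically add an association score on top of raw counts):

```python
from collections import Counter

def collocates(lines, node, window=2):
    """Count context words within +/-window tokens of each occurrence of `node`."""
    counts = Counter()
    for line in lines:
        toks = line.lower().split()
        for i, t in enumerate(toks):
            if t == node:
                lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                counts.update(toks[lo:i] + toks[i + 1:hi])
    return counts

lines = ["the strong tea was hot",
         "she drinks strong tea daily",
         "strong tea and toast"]
print(collocates(lines, "tea").most_common(1))  # [('strong', 3)]
```

Changing the `window` parameter is exactly the kind of setting whose effect on the outcome the report investigates.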

11.
An adaptive parallel task-load allocation algorithm based on instance-based learning can estimate a program's computational load from its static features and select a good load allocation scheme, so that its multithreaded parallel execution approaches or even reaches the optimum, at low cost and with high efficiency. Through a series of experiments, we analyze how the choice of training instances affects the effectiveness of instance-based-learning optimization, and summarize some useful lessons for further improving the algorithm's performance.

12.
Minimum classification error learning realized via generalized probabilistic descent, usually referred to as MCE/GPD, is a very popular and powerful framework for building classifiers. This paper first presents a theoretical analysis of MCE/GPD. The focus is on a simple classification problem: estimating the means of two Gaussian classes. For this simple setting, we derive difference equations for the class means and the decision threshold during learning, and develop closed-form expressions for the evolution of both the smoothed and the true error. In addition, we show that the decision threshold converges to its optimal value, and provide an estimate of the number of iterations needed to approach convergence. After convergence, the class means drift towards increasing their mutual distance to infinity without contributing to the decrease of the classification error. This behavior, referred to as mean drift, is then related to the increase of the variance of the classifier. The theoretical results agree perfectly with simulations carried out for a two-class Gaussian classification problem. In addition to the theoretical results, we experimentally verify, in speech recognition experiments, that MCE/GPD learning of Gaussian mixture hidden Markov models qualitatively follows the pattern suggested by the theoretical analysis. We also discuss links between MCE/GPD learning and both batch gradient descent and extended Baum-Welch re-estimation, two approaches known to be popular in large-scale implementations of discriminative training. Hence, the proposed analysis can be used, at least as a rough guideline, for better understanding the properties of discriminative training algorithms for speech recognition.
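A small simulation in the spirit of the analysis: GPD-style stochastic gradient descent on a sigmoid-smoothed error for a single decision threshold between two 1-D Gaussian classes, where the threshold drifts toward the optimal midpoint. The misclassification measure, smoothing constant, and learning rate below are assumptions for illustration, not the paper's exact setup:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mce_gpd_threshold(data, theta=0.0, gamma=2.0, lr=0.05, epochs=200):
    """GPD updates: gradient descent on a sigmoid-smoothed 0/1 loss of a
    1-D decision threshold. Class 0 should fall below theta, class 1 above it."""
    for _ in range(epochs):
        for x, y in data:
            d = (x - theta) if y == 0 else (theta - x)  # d > 0 means misclassified
            l = sigmoid(gamma * d)                      # smoothed per-sample error
            grad = gamma * l * (1.0 - l) * (-1.0 if y == 0 else 1.0)
            theta -= lr * grad
    return theta

rng = random.Random(0)
data = ([(rng.gauss(0, 1), 0) for _ in range(50)]
        + [(rng.gauss(4, 1), 1) for _ in range(50)])
theta = mce_gpd_threshold(data)  # drifts toward the optimal midpoint near 2
```

The `l * (1 - l)` factor means only samples near the boundary push the threshold, which is what makes the converged threshold stable while the class means (not modeled here) can keep drifting apart.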

13.
Ensembles of relational classifiers
Relational classification aims at including relations among entities in the classification process, for example taking into account relations among documents such as common authors or citations. Considering more than one relation can further improve classification accuracy. Here we introduce a new approach that uses several relations, as well as both relations and local attributes, for classification with ensemble methods. To accomplish this, we present a generic relational ensemble model that can use different relational and local classifiers as components. Furthermore, we discuss solutions to several problems concerning relational data, such as heterogeneity, sparsity, and multiple relations. The sparsity problem in particular is discussed in more detail, and we introduce a new method, called PRNMultiHop, that tries to handle it. We also categorize relational methods in a systematic way. Finally, we provide empirical evidence that our relational ensemble methods outperform existing relational classification methods, even rather complex models such as relational probability trees (RPTs), relational dependency networks (RDNs) and relational Bayesian classifiers (RBCs).

14.
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: Naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than Naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare Naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
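A toy sketch of the scoring task expressed in SQL, run through Python's stdlib sqlite3: the Naive Bayes model lives in tables of log-probabilities (hand-filled here, not learned), and scoring a record is a join plus a `SUM` of log-probabilities per class. The schema and query are illustrative, not the authors' layouts or optimizations:

```python
import sqlite3, math

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE prior(class TEXT, logp REAL);
CREATE TABLE model(class TEXT, feat TEXT, val TEXT, logp REAL);
CREATE TABLE test(id INT, feat TEXT, val TEXT);
""")
# Hand-built illustrative log-probabilities for two classes, 'play' and 'stay'.
cur.executemany("INSERT INTO prior VALUES(?,?)",
                [("play", math.log(0.6)), ("stay", math.log(0.4))])
cur.executemany("INSERT INTO model VALUES(?,?,?,?)", [
    ("play", "outlook", "sunny", math.log(0.7)),
    ("play", "wind", "weak", math.log(0.8)),
    ("stay", "outlook", "sunny", math.log(0.2)),
    ("stay", "wind", "weak", math.log(0.5)),
])
cur.executemany("INSERT INTO test VALUES(?,?,?)",
                [(1, "outlook", "sunny"), (1, "wind", "weak")])
# Score record 1 against every class: prior + sum of matching feature log-probs.
row = cur.execute("""
    SELECT m.class, p.logp + SUM(m.logp) AS score
    FROM test t
    JOIN model m ON m.feat = t.feat AND m.val = t.val
    JOIN prior p ON p.class = m.class
    WHERE t.id = 1
    GROUP BY m.class
    ORDER BY score DESC
""").fetchone()
print(row[0])  # play
```

Working in log space turns the Naive Bayes product into the `SUM` aggregate, which is what makes the equations-to-SQL translation straightforward.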

15.
To recognize speech, handwriting or sign language, many hybrid approaches have been proposed that combine Dynamic Time Warping (DTW) or Hidden Markov Models (HMM) with discriminative classifiers. However, all these methods rely directly on the likelihood models of DTW/HMM. We hypothesize that time warping and classification should be separated because of conflicting likelihood modelling demands. To overcome these restrictions, we propose to use Statistical DTW (SDTW) only for time warping, while classifying the warped features with a different method. Two novel statistical classifiers are proposed (CDFD and Q-DFFM), both using a selection of discriminative features (DF), and both are shown to outperform HMM and SDTW. However, we have found that combining likelihoods of multiple models in a second classification stage degrades the performance of the proposed classifiers, while improving performance with HMM and SDTW. A proof-of-concept experiment, combining DFFM mappings of multiple SDTW models with SDTW likelihoods, shows that hybrid classification can provide a significant improvement over SDTW for model combining as well. Although recognition is mainly based on 3D hand motion features, these results can be expected to generalize to recognition with more detailed measurements such as hand/body pose and facial expression.

16.
Fuzzy Combination of Multiple Classifiers
This paper proposes a fuzzy combination method for multiple classifiers. It exploits two kinds of information provided by the participating classifiers: (1) at the measurement level, the classification information for the unknown pattern; and (2) at the symbolic level, the distribution of misclassifications on the training samples. These two kinds of information are fused by fuzzy integration, and the combined classifier outputs a possibility measure that the unknown pattern belongs to each class. Applied to handwritten Chinese character recognition, experimental results show that the method is more reliable than several other approaches.
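One simple fuzzy scheme in this spirit (not necessarily the paper's exact fuzzy integral) is max-min composition: each classifier's measurement-level memberships are capped by its per-class reliability, estimated from its training misclassifications, and the best capped value per class is taken as the possibility. All numbers below are fabricated:

```python
def fuzzy_combine(memberships, reliability):
    """Max-min fuzzy composition: the possibility of each class is the best
    min(classifier membership, classifier reliability) over the ensemble."""
    classes = memberships[0].keys()
    return {c: max(min(m[c], r[c]) for m, r in zip(memberships, reliability))
            for c in classes}

# Measurement-level memberships of one unknown pattern, from two classifiers.
memberships = [{"A": 0.9, "B": 0.4}, {"A": 0.3, "B": 0.8}]
# Per-class reliabilities, here invented, derived in practice from each
# classifier's misclassification distribution on the training samples.
reliability = [{"A": 0.95, "B": 0.5}, {"A": 0.6, "B": 0.7}]
poss = fuzzy_combine(memberships, reliability)
print(max(poss, key=poss.get))  # A
```

Capping by reliability means a confident vote from a classifier that is often wrong on a class cannot dominate the combination.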

17.
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble of N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where a single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two jittering ensembling methods we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single-model association rule classifier. A bias-variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective at reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up with respect to multiple-model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency.

18.
This paper introduces a new parameterization of diffeomorphic deformations for the characterization of the variability in image ensembles. Dense diffeomorphic deformations are built by interpolating the motion of a finite set of control points that forms a Hamiltonian flow of self-interacting particles. The proposed approach estimates a template image representative of a given image set, an optimal set of control points that focuses on the most variable parts of the image, and template-to-image registrations that quantify the variability within the image set. The method automatically selects the most relevant control points for the characterization of the image variability and estimates their optimal positions in the template domain. The optimization in position is done during the estimation of the deformations without adding any computational cost at each step of the gradient descent. The selection of the control points is done by adding an L1 prior to the objective function, which is optimized using the FISTA algorithm.

19.
To identify reservoir lithology accurately using pattern recognition, and exploiting the property that a combination of classifiers of different natures can describe a pattern more comprehensively, thereby reducing recognition errors and improving robustness, membership degrees from fuzzy mathematics are used to realize a quantitative combination of multiple classifiers. Several different classifiers are combined and validated on a lithology identification task from an oil field, and compared against the individual classifiers; under the same conditions, the combined classifier performs much better than any single classifier. This demonstrates the feasibility and effectiveness of combining classifiers of different natures for classification.

20.
A Study of Diversity Measures in Clustering Ensembles
The diversity of an ensemble is considered a key factor in ensemble learning. Many diversity measures have been proposed for classifier ensembles, but little work has addressed how to measure the diversity of a clustering ensemble. The authors study seven diversity measures for clustering ensembles and experimentally examine the relationship between these measures and the performance of various clustering ensemble algorithms under different average member-clustering accuracies, different ensemble sizes, and different data distributions. The experiments show that there is no monotonic relationship between these diversity measures and clustering ensemble performance; however, when the average member accuracy is high, the ensemble size is moderate, and the data contain evenly distributed clusters, their correlation with ensemble performance is fairly high. Finally, some practical suggestions are given for using diversity measures to guide the generation of clustering ensembles.
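One concrete way to measure the diversity of a clustering ensemble, in the spirit of the measures studied here, is average pairwise disagreement under the Rand index, which compares partitions pair-by-pair and needs no label correspondence between member clusterings; a stdlib sketch on invented partitions:

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree
    (same cluster in both, or different cluster in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def ensemble_diversity(partitions):
    """Average pairwise disagreement (1 - Rand index) across ensemble members."""
    ps = list(combinations(partitions, 2))
    return sum(1 - rand_index(x, y) for x, y in ps) / len(ps)

p1 = [0, 0, 1, 1]
p2 = [0, 0, 1, 1]  # identical to p1: contributes zero diversity
p3 = [0, 1, 0, 1]  # disagrees with p1/p2 on most point pairs
diversity = ensemble_diversity([p1, p2, p3])
```

Measures like this are candidates for the guidance role the abstract suggests: generating member clusterings until the ensemble reaches a target diversity level.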
