Similar literature
1.
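Effective multiclass probabilistic modeling methods are currently lacking in pattern recognition. To address this, error-correcting output codes (ECOC) are proposed as a framework for multiclass probabilistic modeling. For binary ECOC, the probability-output problem is recast as solving an overdetermined system of linear equations, which is solved by linear least squares to obtain the multiclass posterior probabilities. For ternary ECOC, whose equivalent system is an overdetermined set of nonlinear equations, an iterative rule is proposed for computing the multiclass probability outputs. Experimental comparison with three classical methods shows that the probability outputs of the new method have a better distribution and that the method achieves good classification performance.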
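The least-squares formulation for binary ECOC described in this abstract can be illustrated with a minimal sketch. It assumes that each dichotomizer outputs an estimate of the probability that the sample belongs to its positive superclass, so that this output should equal the sum of the posteriors of the classes coded +1. The function name `ecoc_posteriors`, the extra normalization equation, and the final clipping step are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ecoc_posteriors(code_matrix, r):
    """Estimate class posteriors from binary ECOC dichotomizer outputs.

    code_matrix: (n_classes, n_dichotomizers) array with entries +1 / -1.
    r: length n_dichotomizers; r[j] is assumed to estimate
       P(sample is in the positive superclass of dichotomizer j).
    """
    M = np.asarray(code_matrix)
    r = np.asarray(r, dtype=float)
    n_classes = M.shape[0]
    # Row j of A marks the classes in the positive superclass of dichotomizer j,
    # so that A @ p should reproduce r.
    A = (M.T == 1).astype(float)
    # Assumption: add sum(p) = 1 as one more equation of the overdetermined system.
    A = np.vstack([A, np.ones(n_classes)])
    b = np.append(r, 1.0)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Least squares does not enforce p >= 0, so clip and renormalize.
    p = np.clip(p, 0.0, None)
    total = p.sum()
    return p / total if total > 0 else np.full(n_classes, 1.0 / n_classes)
```

With a one-versus-all code matrix and consistent outputs, the system has an exact solution and the sketch simply recovers the dichotomizer outputs as the posteriors; with longer codes the least-squares fit averages out disagreement among dichotomizers.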

2.
Hierarchical error-correcting output codes algorithm based on the KNN model   Cited by: 2 (self-citations: 0, other citations: 2)
辛轶  郭躬德  陈黎飞  黄杰 《计算机应用》2009,29(11):3051-3055
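Error-correcting output codes (ECOC) are an effective way to solve multiclass classification problems, but the coding matrix encodes only the classes and takes a uniform, pre-constructed form, so it adapts poorly. A novel hierarchical ECOC algorithm is therefore proposed. In the training stage, the algorithm first uses the KNN model algorithm to build multiple same-class clusters on the data set, selects the most representative clusters of each class to form a hierarchical coding matrix, and then trains the individual classifiers according to that matrix. In the testing stage, the algorithm uses model fusion to further exploit the respective strengths of the KNN model and ECOC. Experimental results on public UCI data sets show that the new method outperforms both the KNN model algorithm and the ECOC algorithm.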

3.
Ternary Error-Correcting Output Codes (ECOC), which can unify most state-of-the-art decomposition frameworks such as one-versus-one, one-versus-all, sparse coding, and dense coding, are considered more flexible than Binary ECOC for modeling multiclass classification problems. Many decoding strategies have been proposed for Ternary ECOC in the earlier literature. However, few of them work with posterior probabilities, which can be regarded as a Bayes decision rule and hence usually yield better performance. Passerini et al. (2004) [16] proposed a decoding strategy based on posterior probabilities. However, according to the analysis in this paper, the method of Passerini et al. (2004) [16] suffers from some defects that result in bias. To overcome this, we propose a variation of it that refines the decomposition of the probability to obtain smoother estimates. Our bias–variance analysis shows that the decrease in error achieved by our variant is due to a decrease in variance. In addition, we extend to Ternary ECOC an efficient linear-rule-based method for obtaining posterior probabilities in the decoding process of Binary ECOC. On ten benchmark datasets, the two posterior-probability-based decoding strategies presented in this paper perform better than those in earlier references.

4.
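Starting from the probability density function, the Parzen window method is used to estimate the probability density distribution of the overall sample. Kernel methods and the Parzen window method are introduced into the maximum a posteriori (MAP) approach, yielding a high-performance multiclass classification method based on MAP with Parzen kernel estimation. The method does not need to consider the specific distribution of the sample data, provides a confidence level for each classification, and gives a basis for the uncertainty of the inference. Experimental results on three international standard UCI data sets and three face data sets show that the method classifies well.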
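A rough sketch of how a Parzen-window density estimate can feed a MAP decision follows. It is a plain Gaussian-window version written only for illustration; the kernel, the bandwidth `h`, and the function name `parzen_map_predict` are assumptions, and the paper's specific kernel-based formulation may differ.

```python
import numpy as np

def parzen_map_predict(X_train, y_train, x, h=1.0):
    """Return (predicted label, posterior per class) for a single sample x.

    Class priors are taken as class frequencies, and each class-conditional
    density is a Gaussian Parzen-window estimate with bandwidth h (assumed).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    n, d = X_train.shape
    classes = np.unique(y_train)
    # Gaussian window value contributed by every training sample.
    sq_dist = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-sq_dist / (2.0 * h * h)) / ((2.0 * np.pi * h * h) ** (d / 2.0))
    # prior * density = (n_c / n) * (1 / n_c) * sum_c(k) = sum_c(k) / n
    joint = np.array([k[y_train == c].sum() / n for c in classes])
    total = joint.sum()
    if total > 0:
        post = joint / total
    else:
        post = np.full(len(classes), 1.0 / len(classes))
    label = classes[int(np.argmax(post))]
    return label, dict(zip(classes.tolist(), post.tolist()))
```

The normalized posteriors returned alongside the label are what allow a confidence value to be attached to each decision, which is the property the abstract emphasizes.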

5.
A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework to deal with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a “do not care” symbol that lets a given classifier ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI Machine Learning Repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
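To make the effect of the zero symbol concrete, the sketch below contrasts plain Hamming decoding on a ternary code matrix with a variant that skips zero positions and normalizes by the number of positions each class actually uses. The second function is only an illustrative correction under that assumption; it is not claimed to be the decoding measure or either of the strategies proposed in the paper, and both function names are hypothetical.

```python
import numpy as np

def hamming_decode(code_matrix, outputs):
    """Plain Hamming decoding; every zero entry adds a constant 0.5."""
    M = np.asarray(code_matrix, dtype=float)
    x = np.sign(np.asarray(outputs, dtype=float))
    # (1 - sign(x_j * M_ij)) / 2 is 0 on agreement, 1 on disagreement,
    # and 0.5 wherever M_ij == 0, regardless of the classifier output.
    dist = np.sum((1.0 - np.sign(M * x)) / 2.0, axis=1)
    return int(np.argmin(dist)), dist

def masked_hamming_decode(code_matrix, outputs):
    """Illustrative variant: ignore zero positions and normalize per class."""
    M = np.asarray(code_matrix, dtype=float)
    x = np.sign(np.asarray(outputs, dtype=float))
    mask = M != 0
    disagree = (M * x < 0) & mask
    used = np.maximum(mask.sum(axis=1), 1)  # avoid division by zero
    dist = disagree.sum(axis=1) / used
    return int(np.argmin(dist)), dist
```

In plain Hamming decoding a codeword with many zeros accumulates distance from positions its class never took part in, so codewords with different numbers of zeros are not compared on the same scale; the masked variant removes that particular offset.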

6.
ECOC is a widely used and successful technique, which implements a multi-class classification system by decomposing the original problem into several two-class problems. In this paper, we study the possibility of providing ECOC systems with a tailored reject option carried out through different schemes that can be grouped under two different categories: an external and an internal approach. The first one is based on the reliability of the entire system output and does not require any change in its structure. The second scheme, instead, estimates the reliability of the internal dichotomizers and implies a slight modification in the decoding stage. Experimental results on popular benchmark data sets are reported to show the behavior of the different schemes.

7.
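The class imbalance problem is widespread in real-world applications. Most traditional classifiers assume a balanced class distribution or equal misclassification costs, so class-imbalanced data seriously degrades their classification performance. To classify imbalanced data sets, a probability threshold Bagging classification method, PT Bagging, is proposed. It combines threshold moving with the Bagging ensemble algorithm: the training stage uses training sets with the original distribution, and the prediction stage introduces decision threshold moving, using calibrated posterior probability estimates to maximize the performance measure for imbalanced data classification. Experimental results show that the PT Bagging algorithm is better at classifying imbalanced data.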
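The threshold-moving step can be sketched as follows, assuming the Bagging members have already been trained and calibrated elsewhere and that each supplies a positive-class probability per validation sample. The choice of F1 as the performance measure, the threshold grid, and the names `f1_binary` and `pick_threshold` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 score for labels in {0, 1}, with 1 as the minority (positive) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def pick_threshold(member_probs, y_val, grid=None):
    """Average member probabilities, then pick the threshold with the best F1.

    member_probs: (n_members, n_samples) positive-class probabilities.
    Returns (threshold, F1 at that threshold); ties go to the lowest threshold.
    """
    avg = np.mean(np.asarray(member_probs, dtype=float), axis=0)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed candidate thresholds
    scores = [f1_binary(y_val, (avg >= t).astype(int)) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])
```

At prediction time the averaged probability of a new sample would be compared against the selected threshold instead of the default 0.5, leaving the trained ensemble itself unchanged.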

8.
An approach that aims to enhance error resilience in pattern classification problems is proposed. The new approach combines the spread spectrum technique, specifically its selectivity and sensitivity, with error-correcting output codes (ECOC) for pattern classification. This approach combines both the coding gain of ECOC and the spreading gain of the spread spectrum technique to improve error resilience. ECOC is a well-established technique for general purpose pattern classification, which reduces the multi-class learning problem to an ensemble of two-class problems and uses special codewords to improve the error resilience of pattern classification. The direct sequence code division multiple access (DS-CDMA) technique is a spread spectrum technique that provides high user selectivity and high signal detection sensitivity, resulting in a reliable connection through a noisy radio communication channel shared by multiple users. Using DS-CDMA to spread the codeword, assigned to each pattern class by the ECOC technique, gives codes with coding properties that enable better correction of classification errors than ECOC alone. Results of performance assessment experiments show that the use of DS-CDMA alongside ECOC boosts error-resilience significantly, by yielding better classification accuracy than ECOC by itself.

9.
In learning naive Bayes, estimating probabilities from a given set of training samples is crucial. However, when the training samples are not adequate, probability estimation methods inevitably suffer from the zero-frequency problem. To avoid this problem, Laplace-estimate and M-estimate are the two main methods used to estimate probabilities. The estimation of two important parameters m (integer variable) and p (probability variable) in these methods has a direct impact on the underlying experimental results. In this paper, we study the existing probability estimation methods and carry out a parameter cross-test by experimentally analyzing the performance of M-estimate with different settings for the two parameters m and p. These experimental results show that the optimal parameter values vary across data sets. Motivated by these analysis results, we propose an estimation model based on self-adaptive differential evolution. Then we propose an approach to calculate the optimal m and p value for each conditional probability to avoid the zero-frequency problem. We experimentally test our approach in terms of classification accuracy using the 36 benchmark machine learning repository data sets, and compare it to a naive Bayes with Laplace-estimate and M-estimate with a variety of parameter settings from the literature and the possibly optimal settings found via our experimental analysis. The experimental results show that the estimation model is efficient and our proposed approach significantly outperforms the traditional probability estimation approaches, especially for large data sets (large number of instances and attributes).
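For reference, the two estimators named in this abstract can be written in their usual textbook form, where `n_c` is the count of the attribute value within the class, `n` is the class count, and `k` is the number of distinct values of the attribute. The function names are illustrative; the self-adaptive differential evolution search for m and p described in the abstract is not reproduced here.

```python
def laplace_estimate(n_c, n, k):
    """Laplace estimate: add one pseudo-count for each of the k possible values."""
    return (n_c + 1) / (n + k)

def m_estimate(n_c, n, m, p):
    """M-estimate: add m virtual samples distributed according to the prior p."""
    return (n_c + m * p) / (n + m)
```

Setting `m = k` and `p = 1 / k` makes the M-estimate coincide with the Laplace estimate, and `m = 0` falls back to the raw frequency `n_c / n`, which is zero for unseen values and is exactly the zero-frequency problem the abstract refers to.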

10.
In volume visualization, the definition of the regions of interest is inherently an iterative trial‐and‐error process of finding the best parameters to classify and render the final image. Generally, the user requires a lot of expertise to analyze and edit these parameters through multi‐dimensional transfer functions. In this paper, we present a framework of intelligent methods to label on‐demand multiple regions of interest. These methods can be split into a two‐level GPU‐based labelling algorithm that computes, at rendering time, a set of labelled structures using the Machine Learning Error‐Correcting Output Codes (ECOC) framework. In a pre‐processing step, ECOC trains a set of Adaboost binary classifiers from a reduced pre‐labelled data set. Then, at the testing stage, each classifier is independently applied on the features of a set of unlabelled samples and combined to perform multi‐class labelling. We also propose an alternative representation of these classifiers that allows the testing stage to be highly parallelized. To exploit that parallelism we implemented the testing stage in GPU‐OpenCL. The empirical results on different data sets for several volume structures show high computational performance and classification accuracy.

11.
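Information on the web is vast and heterogeneous, and extracting the most useful information from it is a major goal of information processing; text classification is a powerful means of organizing and managing data. Because the maximum entropy model can integrate all kinds of observed probabilistic knowledge, whether correlated or not, and handles many problems well, it is introduced into Chinese text classification, and a feature aggregation algorithm is used to make feature selection more effective. Experiments show that, compared with Bayes, KNN, and SVM, three well-performing algorithms, the maximum-entropy-based text classification algorithm achieves better classification accuracy.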

12.
Error Correcting Output Coding (ECOC) methods for multiclass classification present several open problems ranging from the trade-off between their error recovering capabilities and the learnability of the induced dichotomies to the selection of proper base learners and to the design of well-separated codes for a given multiclass problem. We experimentally analyse some of the main factors affecting the effectiveness of ECOC methods. We show that the architecture of ECOC learning machines influences the accuracy of the ECOC classifier, highlighting that ensembles of parallel and independent dichotomic Multi-Layer Perceptrons are well-suited to implement ECOC methods. We quantitatively evaluate the dependence among codeword bit errors using mutual information based measures, experimentally showing that a low dependence enhances the generalisation capabilities of ECOC. Moreover we show that the proper selection of the base learner and the decoding function of the reconstruction stage significantly affects the performance of the ECOC ensemble. The analysis of the relationships between the error recovering power, the accuracy of the base learners, and the dependence among codeword bits shows that all these factors contribute to the effectiveness of ECOC methods in a way that is not straightforward and very likely depends on the distribution and complexity of the data. An erratum to this article can be found at

13.
It is commonly held that the AdaBoost classifier performs excellently on classification problems but cannot produce good probability estimates. In this paper we put forward a theoretical analysis of a probability estimation model and present some verification experiments, which indicate that AdaBoost can be used for probability estimation. Based on the theory, we suggest some useful measures for using AdaBoost algorithms properly. We then derive a probability estimation model for multi-class classification by pairwise coupling. Unlike previous approximate methods, we provide an analytical solution instead of a special iterative procedure. Moreover, we raise a new problem: how to obtain a robust prediction from classifier scores. Experiments show that the traditional prediction framework, which chooses the class with the highest score as the prediction, is not always good, while our model performs well.

14.
A search coding method and its application in supervised classification   Cited by: 3 (self-citations: 0, other citations: 3)
蒋艳凰  赵强利  杨学军 《软件学报》2005,16(6):1081-1089
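Error-correcting output codes, a new research direction in supervised classification, are an effective way to improve the generalization ability of classifiers, but no general deterministic coding method exists yet. After analyzing the properties of existing error-correcting output codes, a search coding method is proposed. By searching the integer space sequentially, it obtains output codes that satisfy any required number of classes and minimum Hamming distance. Supervised classification based on search coding is then discussed. Experiments with naive Bayes and BP neural network algorithms show that search coding can serve as a general coding method for improving the generalization ability of supervised classifiers.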
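One way to read "searching the integer space sequentially" is a greedy scan that accepts each integer whose bit pattern is far enough from every codeword accepted so far. The sketch below follows that reading; it is an assumption, since the abstract does not give the paper's exact procedure, and the function name `search_code` and the fixed `n_bits` parameter are hypothetical.

```python
def search_code(n_classes, min_distance, n_bits):
    """Greedy sequential search for codewords with a minimum Hamming distance.

    Scans the integers 0 .. 2**n_bits - 1 in order and keeps a candidate when
    its Hamming distance to every kept codeword is at least min_distance.
    Returns the codewords as bit strings, or None if n_bits is too small.
    """
    codewords = []
    for candidate in range(2 ** n_bits):
        # XOR leaves a 1 in every bit position where two codewords differ.
        if all(bin(candidate ^ c).count("1") >= min_distance for c in codewords):
            codewords.append(candidate)
            if len(codewords) == n_classes:
                return [format(c, f"0{n_bits}b") for c in codewords]
    return None
```

A caller could start from a small `n_bits` and increase it until the function stops returning `None`, which gives a deterministic code for any requested number of classes and minimum distance.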

15.
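Solving multiclass classification problems with error-correcting output codes based on evidence theory   Cited by: 1 (self-citations: 0, other citations: 1)
For multiclass classification, error-correcting output codes are used as the decomposition framework to turn a multiclass problem into several two-class problems. A decoding strategy based on evidence theory is also proposed: the output of each dichotomizer is treated as one piece of evidence and fused, and different fusion strategies are discussed for the two types of coding (binary and ternary coding matrices). Experiments on UCI data sets and three kinds of one-dimensional range profile data sets, with comparison against several classical decoding methods, verify that the proposed method effectively improves the classification accuracy of error-correcting output codes, especially with ternary coding matrices.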

16.
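Real classification data is often imbalanced. Most traditional classifiers favor the majority class and neglect the minority class, which degrades classification performance. To address this, an adaptive synthetic sampling method for imbalanced data is proposed, based on a variational Bayesian-optimized optimal Gaussian mixture model (VBoGMM). VBoGMM automatically decays to the true number of Gaussian components, giving an optimal estimate of the distribution of arbitrary data. Based on the estimated distribution, minority-class samples are adaptively oversampled, and the Tomek-link pair criterion is used to clean the sampled data, producing a relatively balanced data set for subsequent classifier learning. Extensive validation and comparison experiments on several public imbalanced data sets show that the proposed method balances the samples while preserving the spatial distribution of the majority and minority classes, and therefore effectively improves the performance of traditional classification models on imbalanced data sets.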

17.
《Pattern recognition》2014,47(2):865-884
Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly take into account the properties of the ECOC matrix. As a result the considered search space is unnecessarily large. In this paper, a novel Genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, an analysis of the results in terms of performance, code length and number of Support Vectors shows that the optimization process is able to find very efficient codes, in terms of the trade-off between classification performance and the number of classifiers. Finally, results on classification performance per dichotomizer show that the novel proposal is able to obtain similar or even better results while defining a smaller number of dichotomies and SVs compared to state-of-the-art approaches.

18.
This paper presents a new study on a method of designing a multi-class classifier: Data-driven Error Correcting Output Coding (DECOC). DECOC is based on the principle of Error Correcting Output Coding (ECOC), which uses a code matrix to decompose a multi-class problem into multiple binary problems. ECOC for multi-class classification hinges on the design of the code matrix. We propose to explore the distribution of data classes and optimize both the composition and the number of base learners to design an effective and compact code matrix. Two real world applications are studied: (1) the holistic recognition (i.e., recognition without segmentation) of touching handwritten numeral pairs and (2) the classification of cancer tissue types based on microarray gene expression data. The results show that the proposed DECOC is able to deliver competitive accuracy compared with other ECOC methods, while using fewer base learners than the pairwise coupling (one-vs-one) decomposition scheme. With a rejection scheme defined by a simple robustness measure, high reliabilities of around 98% are achieved in both applications.

19.
A prototype reduction algorithm is proposed, which simultaneously trains both a reduced set of prototypes and a suitable local metric for these prototypes. Starting with an initial selection of a small number of prototypes, it iteratively adjusts both the position (features) of these prototypes and the corresponding local-metric weights. The resulting prototypes/metric combination minimizes a suitable estimation of the classification error probability. Good performance of this algorithm is assessed through experiments with a number of benchmark data sets and with a real task consisting in the verification of images of human faces.

20.
Online error correcting output codes   Cited by: 1 (self-citations: 0, other citations: 1)
This article proposes a general extension of the error correcting output codes framework to the online learning scenario. As a result, the final classifier handles the addition of new classes independently of the base classifier used. In particular, this extension supports the use of both online example incremental and batch classifiers as base learners. The extension of the traditional problem-independent codings one-versus-all and one-versus-one is introduced. Furthermore, two new codings are proposed, unbalanced online ECOC and a problem-dependent online ECOC. This last online coding technique takes advantage of the problem data for minimizing the number of dichotomizers used in the ECOC framework while preserving a high accuracy. These techniques are validated on an online setting of 11 data sets from the UCI database and applied to two real machine vision applications: traffic sign recognition and face recognition. As a result, the online ECOC techniques proposed provide a feasible and robust way for handling new classes using any base classifier.
