Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
This paper presents a new study on a method of designing a multi-class classifier: Data-driven Error Correcting Output Coding (DECOC). DECOC is based on the principle of Error Correcting Output Coding (ECOC), which uses a code matrix to decompose a multi-class problem into multiple binary problems. ECOC for multi-class classification hinges on the design of the code matrix. We propose to explore the distribution of data classes and optimize both the composition and the number of base learners to design an effective and compact code matrix. Two real-world applications are studied: (1) the holistic recognition (i.e., recognition without segmentation) of touching handwritten numeral pairs and (2) the classification of cancer tissue types based on microarray gene expression data. The results show that the proposed DECOC delivers accuracy competitive with other ECOC methods while using fewer base learners than the pairwise coupling (one-vs-one) decomposition scheme. With a rejection scheme defined by a simple robustness measure, high reliabilities of around 98% are achieved in both applications.
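As a minimal sketch of the ECOC principle this abstract builds on (not the DECOC design procedure itself), the snippet below decodes the outputs of binary base learners by nearest Hamming distance. The 4-class, 7-column code matrix and the names `CODE_MATRIX`, `hamming`, `min_row_distance`, and `decode` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

# Hypothetical code matrix: one row (codeword) per class, one column per
# binary base learner. This is the exhaustive 7-column code for 4 classes.
CODE_MATRIX = {
    0: (1, 1, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 0, 1, 1, 1),
    2: (0, 0, 1, 1, 0, 0, 1),
    3: (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(matrix):
    """Smallest Hamming distance between any two class codewords."""
    return min(hamming(matrix[p], matrix[q]) for p, q in combinations(matrix, 2))


def decode(outputs, matrix):
    """Return the class whose codeword is closest to the base-learner outputs."""
    return min(matrix, key=lambda label: hamming(outputs, matrix[label]))


# Codeword of class 2 with the first base learner's output flipped.
noisy = (1, 0, 1, 1, 0, 0, 1)
print(min_row_distance(CODE_MATRIX))
print(decode(noisy, CODE_MATRIX))
```

With a minimum row distance of d, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong base learners, which is one error for this particular matrix. A more compact matrix, as the abstract aims for, trades columns against that margin.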

2.
Error-correcting output coding (ECOC) is a strategy for creating classifier ensembles that reduces a multi-class problem to several binary sub-problems. A key issue in designing any ECOC classifier is defining an optimal code matrix with maximum discrimination power and a minimum number of columns. This paper proposes a heuristic method for application-dependent design of an optimal ECOC matrix based on a thinning algorithm. The main idea of the proposed Thinned-ECOC method is to successively remove redundant and unnecessary columns from an initial code matrix based on a metric defined for each column. As a result, the computational cost of the ensemble is reduced while its accuracy is preserved. The proposed method has been validated on data sets from the UCI machine learning repository and further applied to two real-world pattern recognition problems (face recognition and gene-expression-based cancer classification). Experimental results demonstrate the robustness of Thinned-ECOC in comparison with existing state-of-the-art code generation methods.

3.
Physical activity recognition using wearable sensors has gained significant interest from researchers working in the fields of ambient intelligence and human behavior analysis. Multi-class classification is an important issue in applications that naturally have more than two classes. A well-known strategy for converting a multi-class classification problem into binary sub-problems is the error-correcting output coding (ECOC) method. Since existing methods use a single classifier with ECOC without considering the dependency among multiple classifiers, they often fail to generalize their performance and parameters to real-life applications, where different numbers of devices, sensors and sampling rates are used. To address this problem, we propose a hierarchical classification model based on the combination of two base binary classifiers, using selective learning of a slacked hierarchy and integrating the training of the binary classifiers into a unified objective function. Our method maps the multi-class classification problem to multi-level classification. A multi-tier voting scheme is introduced to provide a final classification label at each level of the model. The proposed method is evaluated on two publicly available datasets and compared with independent base classifiers. Furthermore, it has also been tested on real-life sensor readings from 3 different subjects to recognize four activities, i.e., Walking, Standing, Jogging and Sitting. The presented method uses the same hierarchical levels and parameters to achieve better performance on all three datasets, which have different numbers of devices, sensors and sampling rates. The average accuracies on the publicly available datasets and the real-life sensor readings were 95% and 85%, respectively. The experimental results validate the effectiveness and generality of the proposed method in terms of performance and parameters.

4.
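Hierarchical error-correcting output coding algorithm based on the KNN model   Cited by: 2 (self-citations: 0, citations by others: 2)
辛轶, 郭躬德, 陈黎飞, 黄杰. 《计算机应用》, 2009, 29(11): 3051-3055
Error-correcting output coding (ECOC) is an effective approach to multi-class classification, but its code matrix encodes only the classes and always takes a uniform, pre-constructed form, so it adapts poorly. To address this, a novel hierarchical ECOC algorithm is proposed. In the training phase, the algorithm first uses the KNN model algorithm to build multiple same-class clusters on the data set, selects the most representative clusters of each class to form a hierarchical code matrix, and then trains the individual classifiers according to that code matrix. In the testing phase, the algorithm uses model fusion to further exploit the respective strengths of the KNN model and ECOC. Experimental results on UCI public data sets show that the new method outperforms both the KNN model algorithm and the ECOC algorithm.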

5.
The best-known decomposition schemes for multiclass learning problems are one-per-class coding (OPC) and error-correcting output coding (ECOC). Both methods perform an a priori decomposition, that is, one fixed before training of the classifier takes place. The impact of the output codes on the inferred decision rules can be observed only after learning. Therefore, we present a novel algorithm for the code design of multiclass learning problems. This algorithm applies a maximum-likelihood objective function in conjunction with the expectation-maximization (EM) algorithm. Minimizing the augmented objective function yields the optimal decomposition of the multiclass learning problem into two-class problems. Experimental results show the potential gain of the optimized output codes over OPC or ECOC methods.

6.
The One-vs-One strategy is one of the most commonly used decomposition techniques for multi-class classification problems: a multi-class problem is divided into easier-to-solve binary classification problems, each considering a pair of classes from the original problem, which are then learned by independent base classifiers. This way of performing the division produces the so-called non-competence problem. It occurs whenever an instance is classified, since the instance is submitted to all the base classifiers although the outputs of some of them are not meaningful (they were not trained using instances from the class of the instance to be classified). This issue may lead to erroneous classifications because, in spite of their incompetence, all classifiers' decisions are usually considered in the aggregation phase. In this paper, we propose a dynamic classifier selection strategy for the One-vs-One scheme that tries to avoid the non-competent classifiers when their output is probably not of interest. We consider the neighborhood of each instance to decide whether a classifier may be competent or not. To verify the validity of the proposed method, we carry out a thorough experimental study considering different base classifiers and comparing our proposal with the best-performing state-of-the-art aggregation for each base classifier from the five Machine Learning paradigms selected. The findings drawn from the empirical analysis are supported by the appropriate statistical analysis.
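The non-competence problem and a neighborhood-based filter can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the selection rule, so the rule used here (keep a pairwise classifier only if both of its classes occur among the instance's nearest neighbors) and the names `ovo_vote`, `pair_winner`, and `competent` are hypothetical.

```python
def ovo_vote(pair_winner, classes, competent=None):
    """Majority vote over pairwise classifiers.

    pair_winner maps a class pair (a, b) to the class that pairwise
    classifier predicted. If `competent` is given, classifiers involving a
    class outside that set are skipped (assumed selection rule).
    """
    votes = {c: 0 for c in classes}
    for (a, b), winner in pair_winner.items():
        if competent is not None and (a not in competent or b not in competent):
            continue  # treated as non-competent for this instance
        votes[winner] += 1
    return max(votes, key=votes.get)


classes = [0, 1, 2, 3]
# Made-up outputs for one instance whose true class is 0.
pair_winner = {
    (0, 1): 0, (0, 2): 2, (0, 3): 0,
    (1, 2): 2, (1, 3): 1, (2, 3): 2,
}
neighbor_classes = {0, 1, 3}  # classes seen among the instance's neighbors

print(ovo_vote(pair_winner, classes))
print(ovo_vote(pair_winner, classes, competent=neighbor_classes))
```

In this made-up case plain voting picks class 2, helped by votes from classifiers that never saw class 0, while the filtered vote picks class 0.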

7.
《Information Fusion》2003,4(1):11-21
It is known that the error correcting output code (ECOC) technique, when applied to multi-class learning problems, can improve generalisation performance. One reason for the improvement is its ability to decompose the original problem into complementary two-class problems. Binary classifiers trained on the sub-problems are diverse and can benefit from being combined using a simple distance-based strategy. However, there is some discussion about why ECOC performs as well as it does, particularly with respect to the significance of the coding/decoding strategy. In this paper we consider the binary (0,1) code matrix conditions necessary for reduction of error in the ECOC framework, and demonstrate the desirability of equidistant codes. It is shown that equidistant codes can be generated by using properties related to the number of 1’s in each row and between any pair of rows. Experimental results on synthetic data and a few popular benchmark problems show how performance deteriorates as code length is reduced for six decoding strategies.

8.
《Information Fusion》2001,2(2):103-112
Two binary labelling techniques for decision-level fusion are considered for reducing correlation in the context of multiple classifier systems. First, we describe a method based on error correcting coding that uses binary code words to decompose a multi-class problem into a set of complementary two-class problems. We look at the conditions necessary for reduction of error and introduce a modified version that is less sensitive to code word selection. Second, we describe a partitioning method for two-class problems that transforms each training pattern into a vertex of the binary hypercube. A constructive algorithm for binary-to-binary mappings identifies a set of inconsistently classified patterns, random subsets of which are used to perturb base classifier training sets. Experimental results on artificial and real data, using a combination of simple neural network classifiers, demonstrate improvement in performance for these techniques, the first suitable for k-class problems, k>2 and the second for k=2.

9.
A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework for dealing with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a “do not care” symbol that allows a given classifier to ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies on a set of UCI Machine Learning Repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
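To make the role of the zero symbol concrete, here is a small sketch of one simple way to decode a ternary code matrix: ignore the positions where a codeword is 0 and normalize by the number of positions actually used. It is not the decoding measure proposed in the paper; the matrix `TERNARY` and the functions `masked_distance` and `decode_ternary` are assumptions made for illustration.

```python
# Hypothetical ternary code matrix (a one-vs-one design for 3 classes):
# +1 / -1 are the two sides of a binary problem, 0 means the class was not
# used to train that classifier ("do not care").
TERNARY = {
    "a": (+1, +1, 0),
    "b": (-1, 0, +1),
    "c": (0, -1, -1),
}


def masked_distance(outputs, codeword):
    """Fraction of mismatches over the positions where the codeword is non-zero.

    Assumes every codeword has at least one non-zero position.
    """
    used = [(o, c) for o, c in zip(outputs, codeword) if c != 0]
    return sum(o != c for o, c in used) / len(used)


def decode_ternary(outputs, matrix):
    """Return the class with the smallest masked, normalized distance."""
    return min(matrix, key=lambda label: masked_distance(outputs, matrix[label]))


outputs = (+1, +1, -1)  # made-up binary classifier decisions
print({label: masked_distance(outputs, cw) for label, cw in TERNARY.items()})
print(decode_ternary(outputs, TERNARY))
```

Without the normalization, codewords with different numbers of zeros would be compared on different scales, which is one way a bias of the kind the abstract mentions can arise.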

10.
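杨鹤标, 王健. 《计算机工程》, 2010, 36(20): 52-54
A classification model is proposed for multi-relational, multi-class imbalanced data. In the preprocessing stage, error-correcting output codes (ECOC) for the target classes and virtual joins between the target relation and the background relations are established, attribute aggregation is performed, and the data are then split into a training set and a validation set. In the training stage, multiple sub-classifiers are constructed following the one-vs-rest partitioning idea in combination with the CrossMine algorithm, and each sub-classifier is evaluated and validated using the AUC method. In the validation stage, the Hamming distance between each target class's ECOC and the word formed by concatenating the sub-classifiers' outputs is computed, and the target class with the minimum Hamming distance is chosen as the final classification. Experiments on synthetic and real data verify the model's effectiveness and classification performance.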

11.
Several supervised machine learning applications are commonly represented as multi-class problems, but it is harder to distinguish several classes than just two. In contrast to the one-against-all and all-pairs approaches, which transform a multi-class problem into a set of binary problems, Dichotomy Transformation (DT) converts a multi-class problem into a different problem where the goal is to verify whether a pair of documents belongs to the same class or not. To perform this task, DT generates a dichotomy set by combining pairs of documents; each pair belongs either to the positive class (the documents in the pair have the same class) or to the negative class (the documents in the pair come from different classes). The definition of this dichotomy set plays an important role in the overall accuracy of the system. An alternative to searching for the best dichotomy set is to use a multiple classifier system: instead of having only one dichotomy set, we can have many different sets, each used to train one binary classifier. Herein we propose Combined Dichotomy Transformations (CoDiT), a Text Categorization system that combines binary classifiers trained with different dichotomy sets using DT. By using DT, the number of training examples increases exponentially when compared with the original training set. This is a desirable property because each classifier can be trained with different data without reducing the number of examples or features. Therefore, it is possible to compose an ensemble of diverse and strong classifiers. Experiments using 14 databases show that CoDiT achieves statistically better results in comparison to SVM, Bagging, Random Subspace, BoosTexter, and Random Forest.

12.
Multi-Class Learning by Smoothed Boosting   Cited by: 1 (self-citations: 0, citations by others: 1)
AdaBoost.OC has been shown to be an effective method for boosting “weak” binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks of the AdaBoost.OC algorithm is that it is sensitive to noisy examples and tends to overfit the training examples when they are noisy. In this paper, we propose a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem of AdaBoost.OC. We prove bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application indicate that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning. Editor: Nicolo Cesa-Bianchi

13.
A multi-class classifier based on the Bradley-Terry model predicts the multi-class label of an input by combining the outputs from multiple binary classifiers, where the combination should be a priori designed as a code word matrix. The code word matrix was originally designed to consist of +1 and −1 codes, and was later extended to deal with the ternary code {+1, 0, −1}, that is, to allow 0 codes. This extension has seemed to work effectively but, in fact, contains a problem: a binary classifier forcibly categorizes examples with 0 codes into either +1 or −1, and this forcible decision makes the prediction of the multi-class label obscure. In this article, we propose a Boosting algorithm that deals with three categories by allowing a “don’t care” category corresponding to 0 codes, and present a modified decoding method called a “ternary” Bradley-Terry model. In addition, we propose a couple of fast decoding schemes that reduce the heavy computation required by the existing Bradley-Terry model-based decoding.

14.
Multi-class classification is one of the major challenges in real-world applications. Classification algorithms are generally binary in nature and must be extended for multi-class problems. Therefore, in this paper, we propose an enhanced Genetically Optimized Neural Network (GONN) algorithm for solving multi-class classification problems. We use a multi-tree GONN representation that integrates multiple GONN trees; each individual is a single GONN classifier. The enhanced classifier is thus an integrated version of the individual GONN classifiers for all classes. The integrated classifier is evolved genetically to optimize its architecture for multi-class classification. To demonstrate our results, we took seven datasets from the UCI Machine Learning repository and compared the classification accuracy and training time of enhanced GONN with the classical Koza’s model and the classical back-propagation model. Our algorithm improves classification accuracy by almost 5% and 8% over Koza’s model and the back-propagation model, respectively, even on complex real multi-class data, and in less time. The enhanced GONN algorithm produces better results than popular classification algorithms such as Genetic Algorithm, Support Vector Machine and Neural Network, which makes it a good alternative to the well-known machine learning methods for solving multi-class classification problems. Even for datasets containing noise and complex features, the results produced by enhanced GONN are much better than those of other machine learning algorithms. The proposed enhanced GONN can be applied in expert and intelligent systems for effectively classifying large, complex and noisy real-time multi-class data.

15.
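Error-correcting output coding based on evidence theory for multi-class classification   Cited by: 1 (self-citations: 0, citations by others: 1)
For multi-class classification, error-correcting output coding is used as the decomposition framework to convert a multi-class problem into multiple two-class problems. A decoding strategy based on evidence theory is also proposed, in which the output of each binary classifier is treated as one piece of evidence to be fused, and different evidence fusion strategies are discussed for two types of coding (binary and ternary code matrices). Experiments on UCI data sets and on three one-dimensional range profile data sets, with comparisons against several classical decoding methods, verify that the proposed method effectively improves the classification accuracy of error-correcting output coding, particularly with ternary code matrices.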

16.
In many remote-sensing projects, one is usually interested in a small number of land-cover classes present in a study area and not in all the land-cover classes that make up the landscape. Previous studies in supervised classification of satellite images have tackled the specific class mapping problem by isolating the classes of interest, combining all other classes into one large class, usually called others, and developing a binary classifier to discriminate the class of interest from the others. Here, this approach is called the focused approach. The strength of the focused approach is that it decomposes the original multi-class supervised classification problem into a binary classification problem, focusing the process on the discrimination of the class of interest. Previous studies have shown that this method is able to discriminate the classes of interest more accurately than the standard multi-class supervised approach. However, it may be susceptible to data imbalance problems in the training data set, since the classes of interest are often a small part of the training set. As a result, the classification may be biased towards the largest classes and thus be sub-optimal for the discrimination of the classes of interest. This study presents a way to minimize the effects of data imbalance problems in specific class mapping using cost-sensitive learning. In this approach, errors committed in the minority class are treated as being costlier than errors committed in the majority class. Cost-sensitive approaches are typically implemented by weighting training data points according to their importance to the analysis. By changing the weight of individual data points, it is possible to shift the weight from the larger classes to the smaller ones, balancing the data set.
To illustrate the use of the cost-sensitive approach to map specific classes of interest, a series of experiments with a weighted support vector machine classifier and Landsat Thematic Mapper data were conducted to discriminate two types of mangrove forest (high-mangrove and low-mangrove) in the Saloum estuary, Senegal, a United Nations Educational, Scientific and Cultural Organisation World Heritage site. Results suggest an increase in overall classification accuracy with the use of the cost-sensitive method (97.3%) over the standard multi-class (94.3%) and the focused approach (91.0%). In particular, the cost-sensitive method yielded higher sensitivity and specificity values for the discrimination of the classes of interest when compared with the standard multi-class and focused approaches.
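The weighting idea described above can be sketched with inverse-frequency class weights. The abstract does not state which weighting scheme the study used, so the formula below (the common "balanced" heuristic, weight = n_samples / (n_classes × class count)), the function name `balanced_class_weights`, and the toy label counts are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in the training labels."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Toy imbalanced training set: the class of interest is 10% of the samples.
labels = ["others"] * 90 + ["mangrove"] * 10
weights = balanced_class_weights(labels)

# Per-sample weights: an error on a minority sample now costs more.
sample_weights = [weights[y] for y in labels]
print(weights)
```

Per-sample weights of this kind are what a weighted SVM consumes, for instance through the `sample_weight` argument of scikit-learn's `SVC.fit`. With these weights each class contributes the same total weight (50 each in the toy example), which is the "balancing" effect the abstract refers to.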

17.
New results on error correcting output codes of kernel machines   Cited by: 1 (self-citations: 0, citations by others: 1)
We study the problem of multiclass classification within the framework of error correcting output codes (ECOC) using margin-based binary classifiers. Specifically, we address two important open problems in this context: decoding and model selection. The decoding problem concerns how to map the outputs of the classifiers into class codewords. In this paper we introduce a new decoding function that combines the margins through an estimate of their class conditional probabilities. Concerning model selection, we present new theoretical results bounding the leave-one-out (LOO) error of ECOC of kernel machines, which can be used to tune kernel hyperparameters. We report experiments using support vector machines as the base binary classifiers, showing the advantage of the proposed decoding function over other functions of the margin commonly used in practice. Moreover, our empirical evaluations on model selection indicate that the bound leads to good estimates of kernel parameters.

18.
The One-vs-One strategy is among the most used techniques for dealing with multi-class problems in Machine Learning. With it, any binary classifier can be used to address the original problem, since one classifier is learned for each possible pair of classes. As in every ensemble method, classifier combination becomes a vital step in the classification process. Even though many combination models have been developed in the literature, none of them has dealt with the possibility of reducing the number of generated classifiers after the training phase, i.e., ensemble pruning, since every classifier is supposed to be necessary. On this account, our objective in this paper is two-fold: (1) We propose a transformation of the aggregation step, which leads us to a new combination strategy where instances are classified on the basis of the similarities among score-matrices. (2) This fact allows us to introduce the possibility of reducing the number of binary classifiers without affecting the final accuracy. We will show that around 50% of the classifiers can be removed (depending on the base learner and the specific problem) and that the confidence degrees obtained by these base classifiers have a strong influence on the improvement in the final accuracy. A thorough experimental study is carried out in order to show the behavior of the proposed approach in comparison with the state-of-the-art combination models in the One-vs-One strategy. Different classifiers from various Machine Learning paradigms are considered as base classifiers, and the results obtained are contrasted with the proper statistical analysis.

19.
Context: Numerous software design patterns have been introduced and cataloged either as a canonical or a variant solution to a design problem. The existing automatic techniques for design pattern selection help novice software developers select the more appropriate design pattern(s) from the list of applicable patterns to solve a design problem in the design phase of the software development life cycle. Goal: However, the existing automatic techniques are limited by their reliance on semi-formal specification, the multi-class problem, the need for an adequate sample size to make precise learning possible, and individual classifier training in order to determine a candidate design pattern class and suggest more appropriate pattern(s). Method: To address these issues, we exploit a text-categorization-based approach via Fuzzy c-means (an unsupervised learning technique) that aims to present a systematic way to group similar design patterns and suggest to developers the appropriate design pattern(s) related to the specification of a given design problem. We also propose an evaluation model to assess the effectiveness of the proposed approach in the context of several real design problems and design pattern collections. Subsequently, we also propose a new feature selection method, Ensemble-IG, to overcome the multi-class problem and improve the classification performance of the proposed approach. Results: The promising experimental results suggest the applicability of the proposed approach in the domain of classification and selection of appropriate design patterns. We also observed a significant improvement in the learning precision of the proposed approach through Ensemble-IG. Conclusion: The proposed approach has four advantages compared to previous work. First, the semi-formal specification of design patterns is not required as a prerequisite; second, ground-truth class label assignment is not mandatory; third, no classifier needs to be trained for each design pattern class; and fourth, an adequate sample size is not required to make precise learning possible.

20.
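Support vector machines (SVM), based on the structural risk minimization principle, generalize well when learning from small samples. However, because the conventional SVM algorithm is derived for two-class problems, it runs into difficulty on fault diagnosis, a typical multi-class classification problem. A fault-priority-dependent, SVM-based binary-tree multi-level classifier (2PTMC) method is therefore proposed; it is simple and intuitive and requires few repeated training samples. Applying it to fault diagnosis from diesel engine vibration signals gave satisfactory results.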
