Similar Documents
20 similar documents found (search time: 210 ms)
1.
Prototype classifiers are a type of pattern classifier in which a number of prototypes are designed for each class so that they act as representatives of the patterns of that class. Prototype classifiers are considered among the simplest and best-performing approaches to classification problems. However, they need careful positioning of prototypes to capture the distribution of each class region and/or to define the class boundaries. Standard methods, such as learning vector quantization (LVQ), are sensitive to the initial choice of the number and locations of the prototypes and to the learning rate. In this article, a new prototype classification method is proposed, namely self-generating prototypes (SGP). The main advantage of this method is that both the number of prototypes and their locations are learned from the training set without much human intervention. The proposed method is compared with other prototype classifiers such as LVQ, the self-generating neural tree (SGNT) and K-nearest neighbor (K-NN), as well as Gaussian mixture model (GMM) classifiers. In our experiments, SGP achieved the best performance on many measures, such as training speed and test (classification) speed. In terms of the number of prototypes and test classification accuracy, it was considerably better than the other methods and about equal on average to the GMM classifiers. We also evaluated the SGP method on the well-known STATLOG benchmark, where it beat all 21 other methods (prototype and non-prototype alike) in classification accuracy.
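As an illustration of the decision rule that SGP shares with LVQ-style methods, here is a minimal Python/NumPy sketch; the toy data and the choice of one class-mean prototype per class are assumptions for demonstration only, and SGP's actual contribution (growing and placing the prototypes automatically) is not shown.

```python
import numpy as np

def nearest_prototype_predict(X, prototypes, proto_labels):
    """Label every row of X with the class of its nearest prototype."""
    # Squared Euclidean distance from each sample to each prototype.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return proto_labels[d2.argmin(axis=1)]

# Toy usage with one prototype per class (the class mean); SGP would
# instead grow and place the prototypes automatically during training.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.repeat([0, 1], 50)
prototypes = np.vstack([X[y == c].mean(axis=0) for c in (0, 1)])
pred = nearest_prototype_predict(X, prototypes, np.array([0, 1]))
print("training accuracy:", (pred == y).mean())
```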

2.
A conventional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. A more efficient, and sometimes more accurate, solution is offered by other dissimilarity-based classifiers. They construct a decision rule based on the entire training set, but they need only a small set of prototypes, the so-called representation set, as a reference for classifying new objects. Such alternative approaches may be especially advantageous for non-Euclidean or even non-metric dissimilarities. The choice of a proper representation set for dissimilarity-based classifiers has not yet been fully investigated; it appears that random selection may work well. In this paper, a number of experiments have been conducted on various metric and non-metric dissimilarity representations and prototype selection methods. Several procedures, such as traditional feature selection methods (here effectively searching for prototypes), mode seeking and linear programming, are compared to random selection. In general, we find that systematic approaches lead to better results than random selection, especially for a small number of prototypes. Although there is no single winner, as it depends on the data characteristics, the k-centres procedure generally works well. For two-class problems, an important observation is that our dissimilarity-based discrimination functions, relying on significantly reduced prototype sets (3-10% of the training objects), offer similar or much better classification accuracy than the best k-NN rule on the entire training set. This can be reached for multi-class data as well, although such problems are more difficult.
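A hedged sketch of the basic dissimilarity-space construction: objects are re-represented by their dissimilarities to a randomly chosen representation set, on which a simple linear decision rule (least squares here, purely for self-containment) is trained. All data and sizes are illustrative assumptions.

```python
import numpy as np

def dissimilarity_features(X, R):
    """Represent each object by its vector of dissimilarities to the
    representation set R; any (even non-metric) dissimilarity could be
    substituted for the Euclidean distance used here."""
    return np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(2, 1, (60, 5))])
y = np.repeat([-1.0, 1.0], 60)
R = X[rng.choice(len(X), size=6, replace=False)]   # random prototypes, ~5%

D = np.hstack([dissimilarity_features(X, R), np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(D, y, rcond=None)          # linear rule on the D-space
print("training accuracy:", (np.sign(D @ w) == y).mean())
```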

3.
We present a new classifier fusion method that combines soft-level classifiers with a new approach, which can be considered a generalized decision templates method. Previous combining methods based on decision templates employ a single prototype for each class, but this global point of view often fails to represent the decision space properly. This drawback severely degrades the classification rate in cases such as an insufficient number of training samples, island-shaped decision-space distributions, and classes with highly overlapping decision spaces. To better represent the decision space, we use a prototype selection method to obtain a set of local decision prototypes for each class. Afterward, to determine the class of a test pattern, its decision profile is computed and compared with all decision prototypes. In other words, for each class, the more decision prototypes that lie near the decision profile of a given pattern, the higher the chance for that class. The efficiency of the proposed method is evaluated on several well-known classification datasets, and the results suggest its superiority over previously proposed techniques.
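For orientation, a sketch of the classic single-template variant the paper generalizes; the toy decision profiles are fabricated, and the paper's method would keep several local templates per class instead of the one class-mean template used here.

```python
import numpy as np

def fit_templates(profiles, y, n_classes):
    """Classic decision templates: one averaged decision profile per
    class.  The paper replaces this single global template with several
    local templates per class, obtained via prototype selection."""
    return np.stack([profiles[y == c].mean(axis=0) for c in range(n_classes)])

def predict(profile, templates):
    # Nearest template (Euclidean over the whole profile) wins.
    d = np.linalg.norm(templates - profile[None], axis=(1, 2))
    return int(d.argmin())

# Toy decision profiles: 80 samples x 3 base classifiers x 2 class supports.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 40)
profiles = rng.random((80, 3, 2))
profiles[y == 1, :, 1] += 0.5          # class 1 gets higher class-1 support
T = fit_templates(profiles, y, 2)
print(predict(profiles[0], T), "true:", y[0])
```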

4.
A novel neural-network-based method of constructing optimized prototypes for nearest-neighbor classifiers is proposed. Based on an effective classification-oriented error function containing class-classification and class-separation components, the corresponding prototype and feature-weight update rules are derived. The proposed method has several distinctive properties. First, not only prototypes but also feature weights are constructed during the optimization process. Second, when an input sample x is classified incorrectly, several prototypes (rather than just one) not belonging to the genuine class of x are updated. Third, the method intrinsically distinguishes the learning contributions of different training samples, which enables substantial learning from constructive samples and limited learning from outliers. Experiments have shown the superiority of this method over LVQ2 and other previous works.
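The paper's exact update rules are derived from its error function; as a rough stand-in, the following hedged sketch shows an LVQ2.1-flavoured step under a per-feature weighted distance (the feature weights enter the distance but, unlike in the paper, are not updated here, and only one wrong-class prototype is moved).

```python
import numpy as np

def lvq_step(x, y, protos, proto_y, feat_w, lr=0.05):
    """One simplified prototype update; assumes every class has at
    least one prototype.  Not the paper's derived rules."""
    d2 = ((feat_w * (protos - x)) ** 2).sum(axis=1)   # weighted distances
    same = proto_y == y
    j_in = np.flatnonzero(same)[d2[same].argmin()]    # nearest genuine-class
    j_out = np.flatnonzero(~same)[d2[~same].argmin()] # nearest wrong-class
    protos[j_in] += lr * (x - protos[j_in])           # attract
    protos[j_out] -= lr * (x - protos[j_out])         # repel
```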

5.
This paper presents a new approach to particle swarm optimization, called Michigan Approach PSO (MPSO), and its application to continuous classification problems as a nearest prototype (NP) classifier. In nearest prototype classifiers, a collection of prototypes has to be found that accurately represents the input patterns, and the classifier then assigns classes based on the nearest prototype in this collection. The MPSO algorithm processes the training data to find those prototypes. In MPSO, each particle in the swarm represents a single prototype in the solution, and the algorithm uses modified movement rules, with particle competition and cooperation, that ensure particle diversity. The proposed method is tested on both artificial and real benchmark problems and compared with several algorithms of the same family. Results show that the particles are able to recognize clusters, find decision boundaries, and reach stable situations that still retain adaptation potential. The MPSO algorithm improves the accuracy of 1-NN classifiers, obtains results comparable to the best of the other classifiers, and improves on the accuracy reported in the literature for one of the problems.
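A much-reduced sketch of the Michigan idea, under the assumption that a plain personal-best attraction stands in for MPSO's competition/cooperation movement rules: each particle is one prototype (position plus fixed class label) and the swarm as a whole is the 1-NP classifier.

```python
import numpy as np

rng = np.random.default_rng(3)

def mpso_sketch(X, y, n_particles=10, iters=50):
    """Michigan-style PSO stand-in: every particle IS one prototype and
    its fitness is its accuracy on the samples it attracts."""
    pos = X[rng.choice(len(X), n_particles)].astype(float)
    cls = y[rng.choice(len(X), n_particles)]
    vel = np.zeros_like(pos)
    best_pos, best_fit = pos.copy(), np.full(n_particles, -1.0)

    def fitness(positions):
        # Each sample is owned by its nearest particle; a particle's
        # fitness is its accuracy on the samples it owns.
        owner = np.linalg.norm(X[:, None] - positions[None], axis=2).argmin(axis=1)
        return np.array([(y[owner == j] == cls[j]).mean() if (owner == j).any()
                         else 0.0 for j in range(len(positions))])

    for _ in range(iters):
        fit = fitness(pos)
        better = fit > best_fit
        best_fit[better], best_pos[better] = fit[better], pos[better]
        # Plain attraction to the personal best; MPSO's rules are richer.
        vel = 0.7 * vel + 1.4 * rng.random((n_particles, 1)) * (best_pos - pos)
        pos = pos + vel
    return best_pos, cls
```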

6.
Prototype classifiers have been studied for many years, but few methods support incremental learning. Moreover, most prototype classifiers require the user to predetermine the number of prototypes, and an improper choice can undermine classification performance. To deal with these issues, we propose an online supervised algorithm named Incremental Learning Vector Quantization (ILVQ) for classification tasks. The proposed method makes three contributions. (1) Through a designed insertion policy, ILVQ incrementally learns new prototypes, covering both between-class and within-class incremental learning. (2) Through an adaptive threshold scheme, ILVQ automatically and dynamically learns the number of prototypes needed for each class according to the distribution of the training data; unlike most current prototype classifiers, it therefore needs no prior knowledge of the number of prototypes or their initial values. (3) A technique for removing useless prototypes eliminates noise introduced into the input data. Experimental results show that ILVQ accommodates incremental data environments and provides good recognition performance and storage efficiency.
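A toy sketch of threshold-driven prototype insertion in the spirit of contribution (1); the fixed global threshold and learning rate are simplifying assumptions, whereas ILVQ adapts its thresholds per prototype and also removes noisy prototypes.

```python
import numpy as np

class IncrementalPrototypes:
    """Simplified threshold-driven insertion, not the full ILVQ."""

    def __init__(self, threshold=1.5, lr=0.1):
        self.threshold, self.lr = threshold, lr
        self.protos, self.labels = [], []

    def partial_fit(self, x, y):
        if not self.protos:
            self.protos.append(x.astype(float)); self.labels.append(y); return
        d = np.linalg.norm(np.asarray(self.protos) - x, axis=1)
        j = int(d.argmin())
        if self.labels[j] != y or d[j] > self.threshold:
            # Wrong-class or far-away sample: insert a new prototype.
            self.protos.append(x.astype(float)); self.labels.append(y)
        else:
            # Close same-class sample: refine the nearest prototype.
            self.protos[j] += self.lr * (x - self.protos[j])
```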

7.
The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency is an obstacle in real-world scenarios because classification requires computing the distance to every single prototype of the training set. Prototype selection (PS) is a typical approach to alleviating this problem; it reduces the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristic, these methods order the prototypes according to their relevance to the classification task and then select the most relevant ones. This work presents a significant improvement of existing rank methods through two extensions: (i) greater robustness against label noise, by considering the parameter k of the classifier in the selection process; and (ii) a new parameter-free rule for selecting the prototypes once they have been ordered. Experiments performed on different scenarios and datasets demonstrate the effectiveness of these extensions, and it is shown empirically that the full approach is competitive with existing PS algorithms.
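A hedged sketch of one possible voting-style rank (not the paper's exact heuristic): each sample votes for its nearest same-class neighbour, i.e. the prototype that would classify it correctly under 1-NN, and prototypes are returned in decreasing order of votes.

```python
import numpy as np

def rank_prototypes(X, y):
    """Order training samples by a simple usefulness vote; the paper's
    k-aware ranks and parameter-free stopping rule are more elaborate.
    Assumes every class has at least two samples."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    votes = np.zeros(len(X))
    for i in range(len(X)):
        friends = np.flatnonzero(y == y[i])
        friends = friends[friends != i]
        votes[friends[d[i, friends].argmin()]] += 1
    return np.argsort(-votes)            # most useful prototypes first
```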

8.
In this paper, we propose a prototype classification method that employs a learning process to determine both the number and the locations of prototypes. This learning process decides whether to stop adding prototypes according to a certain termination condition, and adjusts the locations of prototypes using either the K-means (KM) or the fuzzy c-means (FCM) clustering algorithm. When the prototype classifier is applied, a support vector machine (SVM) can be used to post-process the top-ranked candidates obtained during the prototype learning or matching process. We apply this hybrid solution to handwriting recognition, analyze the convergence behavior and runtime of the prototype construction process, and discuss how to combine our prototype classifier with SVM classifiers to form an effective hybrid classifier.
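A sketch of the hybrid decision flow under stated assumptions: `pairwise_clfs` is a hypothetical dict of fitted binary classifiers (SVMs in the paper) keyed by sorted class pairs and exposing `.predict()`, and the KM/FCM prototype construction step is not shown.

```python
import numpy as np

def hybrid_predict(x, protos, proto_y, pairwise_clfs, top=2):
    """Prototype stage shortlists the `top` nearest classes; a binary
    classifier trained on just those classes makes the final call."""
    d = np.linalg.norm(protos - x, axis=1)
    classes = np.unique(proto_y)
    nearest = np.array([d[proto_y == c].min() for c in classes])
    pair = tuple(int(c) for c in sorted(classes[np.argsort(nearest)[:top]]))
    if pair in pairwise_clfs:
        return pairwise_clfs[pair].predict(x[None])[0]
    return pair[0]                  # fall back to the best prototype class
```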

9.
The k-nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in pattern recognition because of its simplicity and good performance. To decide the class of a new object, the k-NN classifier performs an exhaustive comparison between the object to classify and the prototypes in the training set T. When T is large, however, this exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed; some of them are based on a tree structure, created during a preprocessing phase from the prototypes in T and traversed in a search phase to find the nearest neighbor. The speedup comes from avoiding the exploration of some parts of the tree, using pruning rules that are usually based on the triangle inequality. However, in soft sciences such as medicine, geology and sociology, the prototypes are usually described by both numerical and categorical attributes (mixed data), and sometimes the comparison function used to compute the similarity between prototypes does not satisfy the metric properties. In this work, therefore, an approximate fast k-most-similar-neighbor classifier based on a tree structure (Tree k-MSN), suitable for mixed data and for similarity functions that do not satisfy the metric properties, is proposed. Experiments with synthetic and real data are presented.
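The pruning mechanism the abstract refers to, in sketch form: with a metric, a node whose pivot-plus-radius ball cannot contain anything closer than the current best may be skipped; with the non-metric similarities targeted by Tree k-MSN, the same traversal only yields an approximate answer. The flat node list stands in for a real tree.

```python
import numpy as np

def nn_with_pruning(x, nodes, dist):
    """`nodes` is a list of (pivot, covering_radius, points) triples,
    e.g. built by clustering the training set in preprocessing."""
    best_d, best_p = np.inf, None
    for pivot, radius, points in nodes:
        if dist(x, pivot) - radius >= best_d:   # triangle-inequality prune
            continue
        for p in points:                        # otherwise scan the node
            d = dist(x, p)
            if d < best_d:
                best_d, best_p = d, p
    return best_p, best_d
```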

10.
This paper presents new approaches for computing graph prototypes in the context of designing a structural nearest prototype classifier. Four kinds of prototypes are investigated and compared: set median graphs, generalized median graphs, set discriminative graphs and generalized discriminative graphs. They differ according to (i) the graph space in which they are searched for and (ii) the objective function used for their computation. The first criterion distinguishes set prototypes, which are selected from the initial graph training set, from generalized prototypes, which are generated in an infinite set of graphs. The second criterion distinguishes median graphs, which minimize the sum of distances to all input graphs of a given class, from discriminative graphs, which are computed using classification performance as the criterion, taking the inter-class distribution into account. For each kind of prototype, the proposed approach can identify one or many prototypes per class, in order to manage the trade-off between classification accuracy and classification time. Each graph prototype generation/selection is performed through a genetic algorithm, which can be specialized to each case by setting the appropriate encoding scheme, fitness function and genetic operators. An experimental study performed on several graph databases shows the superiority of the generation approach over the selection one; moreover, discriminative prototypes outperform the median-based ones. We also show that classification rates improve as the number of prototypes increases, and that discriminative prototypes give better results than the median-graph-based classifier.
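For the simplest of the four prototype kinds, a generic set-median sketch; plugging in a graph edit distance gives the set median graph, while generalized and discriminative prototypes require the paper's genetic search. The toy string distance below is purely illustrative.

```python
def set_median(items, dist):
    """Set median: the member of `items` minimizing the total distance
    to all members of the set."""
    return min(items, key=lambda a: sum(dist(a, b) for b in items))

# Toy usage with strings and a crude length-difference "distance".
words = ["graph", "graphs", "graphic", "graf"]
print(set_median(words, lambda a, b: abs(len(a) - len(b))))
```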

11.
The prototype selection problem is to select from the learning sample a subset of minimum cardinality that optimizes a given learning quality functional. This article considers two-class classification with the nearest neighbor rule and three quality functionals: the error frequency on the entire sample, cross-validation with one held-out object (leave-one-out), and complete cross-validation with k held-out objects. It is shown that the prototype selection problem is NP-complete in all three cases, which justifies the use of well-known heuristic methods for the prototype search.
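The complexity result justifies heuristics such as Hart's condensed nearest neighbour, sketched below: a sample is added to the prototype set only when the current set misclassifies it under 1-NN.

```python
import numpy as np

def condense(X, y):
    """Hart's condensed nearest neighbour, one classic heuristic for
    the (NP-complete) prototype selection problem."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            P = np.asarray(keep)
            j = P[np.linalg.norm(X[P] - X[i], axis=1).argmin()]
            if y[j] != y[i]:          # current prototypes get it wrong
                keep.append(i)
                changed = True
    return np.unique(keep)
```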

12.
In this paper, an efficient (unsupervised) K-medians clustering algorithm for prototype selection and a supervised K-medians (SKM) classification technique for protein sequences are presented. For sequence data sets, a median string/sequence can be used as the cluster/group representative. In the K-medians clustering technique, a desired number of clusters, K, each represented by a median string/sequence, is generated, and these median sequences are used as prototypes for classifying new/test sequences; in the SKM classification technique, the median sequence of each group/class of labelled protein sequences is determined, and the set of median sequences is used as prototypes for classification. We find that the K-medians clustering technique outperforms the leader-based technique, and that the SKM classification technique performs better than the motif-based approach on the data sets used. We further use a simple technique to reduce the time and space requirements of protein sequence clustering and classification: during the training and testing phases, the similarity score between a pair of sequences is computed over a selected portion of each sequence instead of the entire sequence, akin to selecting a subset of features for sequence data sets. Experimental results with the K-medians, SKM and nearest neighbour classifier (NNC) techniques show that the classification accuracy (CA) obtained with the generated prototypes does not degrade much, while the training and testing times are reduced significantly. Thus, the similarity score need not be computed over the entire length of the sequences to achieve good CA, and the space requirements of both training and classification are also reduced.
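A hedged sketch of the SKM prototypes, with a plain Levenshtein distance standing in for the paper's alignment-based similarity score (which is computed over only a portion of each sequence).

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def class_median(seqs):
    """Set median sequence of one labelled group, the SKM prototype."""
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))

def skm_predict(seq, medians):
    """Assign the class whose median sequence is nearest; `medians`
    maps class label -> median sequence."""
    return min(medians, key=lambda c: edit_distance(seq, medians[c]))
```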

13.
This paper proposes a non-parametric method for the classification of thin-layer chromatography (TLC) images from patterns represented in a dissimilarity space. Each pattern corresponds to a mixture-of-Gaussians approximation of the intensity profile. The methodology comprises several phases, including image processing and analysis steps to extract the chromatographic profiles, and a classification phase to discriminate between two groups, one corresponding to normal cases and the other to three pathological classes. We present an extensive study of several dissimilarity-based approaches, analysing the influence of the dissimilarity measure and the prototype selection method on classification performance. The main conclusions are that the Match and Profile-difference dissimilarity measures give the best results, and that a new prototype selection methodology achieves performance similar to, or even better than, conventional methods. Furthermore, we conclude that the simplest classifiers, such as k-NN and linear discriminant classifiers (LDCs), perform well, with an overall classification error below 10% for the four-class problem.

14.
A simple yet effective learning algorithm, k locally constrained lines (k-LCL), is presented for pattern classification. In k-LCL, any two prototypes of the same class are extended to a constrained line (CL), which greatly improves the representational capacity of the training set. Because each CL is adjustable in length, k-LCL largely avoids the "intersecting" of training subspaces that afflicts most traditional feature-line classifiers. Moreover, to speed up the computation, k-LCL classifies an unknown sample using only its local CLs in each class. Experimental results on both synthetic and real-world benchmark data sets show that the proposed method achieves better accuracy and efficiency than most existing feature-line methods.
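The core geometric step, sketched: the distance from a query to a constrained line is a point-to-segment distance, with the projection parameter clipped (k-LCL makes the admissible extension adjustable; fixed clipping to the segment is shown here).

```python
import numpy as np

def cl_distance(q, a, b):
    """Distance from query q to the constrained line through prototypes
    a and b; clipping t prevents the line from extrapolating far past
    its endpoints and cutting through other classes' regions."""
    ab = b - a
    t = np.clip(np.dot(q - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(q - (a + t * ab))
```

A k-LCL-style classifier would evaluate this distance only over the CLs formed from the query's nearest prototypes within each class, and assign the class of the closest CL.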

15.
In solving pattern recognition problems, many classification methods, such as the nearest-neighbor (NN) rule, need to determine prototypes from a training set. To help such classifiers find an efficient set of prototypes, this paper introduces a training-sample sequence planning method. In particular, by estimating the relative nearness of the training samples to the decision boundary, the proposed approach incrementally increases the number of prototypes until the desired classification accuracy is reached. The approach has been tested with an NN classification method and a neural network training approach. Studies on both artificial and real data demonstrate that higher classification accuracy can be achieved with fewer prototypes.
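One common surrogate for nearness to the decision boundary, sketched below; the paper's own estimate may differ. Prototypes would then be added in this order until the target accuracy is met.

```python
import numpy as np

def boundary_order(X, y):
    """Rank samples by the ratio of nearest-enemy to nearest-friend
    distance: a small ratio means the sample sits close to the
    decision boundary."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    same = y[:, None] == y[None, :]
    friend = np.where(same, d, np.inf).min(axis=1)
    enemy = np.where(~same, d, np.inf).min(axis=1)
    return np.argsort(enemy / friend)    # boundary-near samples first
```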

16.
Active learning is any form of learning in which the learning algorithm has some control over the input samples through a specific sample selection process on which it builds its model. In this paper, we propose a novel active learning strategy for data-driven classifiers, based on an unsupervised criterion during the off-line training phase followed by a supervised, certainty-based criterion during incremental on-line training; we therefore call the new strategy hybrid active learning. Sample selection in the first phase is conducted from scratch (i.e., no initial labels or learners are needed) based on purely unsupervised criteria obtained from clusters: samples lying near cluster centers and near the borders of clusters are expected to be the most informative regarding the distribution characteristics of the classes. In the second phase, the task is to update the already trained classifiers during on-line mode with the most important samples, in order to dynamically guide the classifier toward more predictive power. Both strategies are essential for reducing the annotation and supervision effort of operators in off-line and on-line classification systems, since operators only have to label a select subset of the off-line training data and to give feedback only on specific occasions during the on-line phase. The new active learning strategy is evaluated on real-world data sets from the UCI repository and on data collected from on-line quality control systems. The results show that an active-learning-based selection of training samples (1) does not weaken classification accuracy compared with using all samples in the training process, and (2) can outperform classifiers built on randomly selected data samples.
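A sketch of the off-line, purely unsupervised phase, with a naive k-means standing in for whatever clustering the deployed system uses: the samples nominated for labelling are those nearest each cluster centre plus those on the cluster borders. The certainty-driven on-line phase is not shown.

```python
import numpy as np

def offline_selection(X, k=5, per_cluster=4):
    """Return indices of unlabeled samples worth annotating: the ones
    closest to and farthest from their cluster centre."""
    rng = np.random.default_rng(4)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(20):                            # naive k-means
        owner = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.stack([X[owner == j].mean(axis=0) if (owner == j).any() else C[j]
                      for j in range(k)])
    d = np.linalg.norm(X - C[owner], axis=1)
    picked = []
    for j in range(k):
        order = np.flatnonzero(owner == j)[np.argsort(d[owner == j])]
        picked += list(order[:per_cluster // 2])      # near the centre
        picked += list(order[-(per_cluster // 2):])   # near the border
    return np.unique(picked)
```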

17.
Soft nearest prototype classification
We propose a new method for the construction of nearest prototype classifiers that is based on a Gaussian mixture ansatz and can be interpreted as an annealed version of learning vector quantization (LVQ). The algorithm performs gradient descent on a cost function that minimizes the classification error on the training set. We investigate the properties of the algorithm and assess its performance on several toy data sets and on an optical letter classification task. Results show (1) that annealing in the dispersion parameter of the Gaussian kernels improves classification accuracy; (2) that classification results are better than those obtained with standard learning vector quantization (LVQ 2.1, LVQ 3) for equal numbers of prototypes; and (3) that annealing of the width parameter improves classification capability. Additionally, the principled approach provides an explanation for a number of features of the (heuristic) LVQ methods.
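The soft assignment at the heart of such a method, sketched below; in the full algorithm these Gaussian responsibilities enter a differentiable misclassification cost minimized by gradient descent, and annealing shrinks sigma so the rule hardens toward crisp nearest-prototype classification.

```python
import numpy as np

def soft_assignments(x, protos, sigma):
    """Gaussian responsibilities of the prototypes for sample x; as
    sigma -> 0 the distribution concentrates on the nearest prototype."""
    d2 = ((protos - x) ** 2).sum(axis=1)
    p = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))  # shifted for stability
    return p / p.sum()
```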

18.
A method is proposed for selecting prototype samples from a training set for a K-nearest-neighbor (KNN) classifier, based on minimization of the complete cross-validation functional. The optimization simultaneously reduces the training set to the minimum sufficient number of prototypes, removes (censors) noise samples, and improves generalization ability.
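For the shape of the criterion, a sketch of its cheapest special case: the leave-one-out 1-NN error of a candidate prototype subset. The paper minimizes the much larger complete cross-validation functional over such subsets.

```python
import numpy as np

def loo_error(X, y, subset):
    """Leave-one-out 1-NN error of a candidate prototype subset
    (assumed to contain at least two indices)."""
    S = np.asarray(subset)
    errors = 0
    for i in range(len(X)):
        cand = S[S != i]              # a prototype may not vote for itself
        j = cand[np.linalg.norm(X[cand] - X[i], axis=1).argmin()]
        errors += y[j] != y[i]
    return errors / len(X)
```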

19.
In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, many instances are stored in the training set, but some of them are not useful for classification; acceptable classification rates can therefore be obtained while ignoring such non-useful cases. This process is known as instance selection. Through instance selection, the training set is reduced, which lowers the runtime of the classification and/or training stages of classifiers. This work presents a survey of the main instance selection methods reported in the literature.

20.

In dynamic ensemble selection (DES) techniques, only the most competent classifiers for the classification of a specific test sample are selected to predict the sample's class label. The key issue in DES techniques is estimating the competence of the base classifiers for the classification of each specific test sample. The classifiers' competence is usually estimated according to a given criterion computed over the neighborhood of the test sample defined on the validation data, called the region of competence. A problem arises when there is a high degree of noise in the validation data, causing the samples belonging to the region of competence not to represent the query sample. In such cases, the dynamic selection technique might select a base classifier that overfitted the local region rather than the one with the best generalization performance. In this paper, we propose two modifications to improve the generalization performance of any DES technique. First, a prototype selection technique is applied to the validation data to reduce the amount of overlap between the classes, producing smoother decision borders. Second, during generalization, a locally adaptive K-nearest neighbor algorithm is used to minimize the influence of noisy samples in the region of competence. DES techniques can thereby better estimate the classifiers' competence. Experiments are conducted with 10 state-of-the-art DES techniques on 30 classification problems. The results demonstrate that the proposed scheme significantly improves the classification accuracy of dynamic selection techniques.
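A minimal sketch of the DES step the paper improves; base classifiers are assumed to be callables mapping a batch of samples to predicted labels, and the paper's prototype selection and adaptive k-NN would act on the validation set exactly where the region of competence is formed here.

```python
import numpy as np

def des_select(x_query, X_val, y_val, classifiers, k=7):
    """Pick the base classifier most accurate on the query's region of
    competence (its k nearest validation samples)."""
    region = np.argsort(np.linalg.norm(X_val - x_query, axis=1))[:k]
    acc = [(clf(X_val[region]) == y_val[region]).mean() for clf in classifiers]
    return classifiers[int(np.argmax(acc))]
```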

