Similar Articles: 20 results found.
1.
The concept of multiobjective optimization (MOO) has been integrated with variable-length chromosomes to develop a nonparametric genetic classifier which can overcome problems faced by single-objective classifiers, such as overfitting/overlearning and ignoring smaller classes. The classifier can efficiently approximate any kind of linear and/or nonlinear class boundaries of a data set using an appropriate number of hyperplanes. While designing the classifier, the aim is to simultaneously minimize the number of misclassified training points and the number of hyperplanes, and to maximize the product of class-wise recognition scores. The concepts of a validation set (in addition to training and test sets) and a validation functional are introduced in the multiobjective classifier for selecting a solution from the set of nondominated solutions provided by the MOO algorithm. This genetic classifier incorporates elitism and some domain-specific constraints in the search process, and is called the CEMOGA-Classifier (constrained elitist multiobjective genetic algorithm based classifier). Two new quantitative indices, namely purity and minimal spacing, are developed for evaluating the performance of different MOO techniques. These are used, along with classification accuracy, the required number of hyperplanes and the computation time, to compare the CEMOGA-Classifier with other related ones.
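The selection step above relies on Pareto dominance: a solution is kept only if no other solution is at least as good on every objective and strictly better on at least one. A minimal sketch of nondominated filtering follows, with made-up objective triples (misclassified points, hyperplane count, negated product of recognition scores — all to be minimized) standing in for the classifier's three objectives:

```python
def dominates(a, b):
    # Minimization: a dominates b if a is no worse on every objective
    # and strictly better on at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    # Keep only points that no other point dominates (the Pareto front).
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical candidate classifiers:
# (misclassified, #hyperplanes, -product of class-wise recognition scores)
cands = [(5, 3, -0.90), (5, 2, -0.90), (7, 2, -0.95), (4, 4, -0.80)]
front = nondominated(cands)
print(front)   # (5, 3, -0.90) is dominated by (5, 2, -0.90) and drops out
```

A validation functional, as in the paper, would then pick one solution from this front.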

2.
Genetic algorithms for generation of class boundaries.
A method is described for finding decision boundaries, approximated by piecewise linear segments, for classifying patterns in R^N, N ≥ 2, using an elitist model of genetic algorithms. It involves the generation and placement of a set of hyperplanes (represented by strings) in the feature space that yields minimum misclassification. A scheme for the automatic deletion of redundant hyperplanes is also developed in case the algorithm starts with an initial conservative estimate of the number of hyperplanes required for modeling the decision boundary. The effectiveness of the classification methodology, along with the generalization ability of the decision boundary, is demonstrated for different parameter values on both artificial data and real-life data sets having nonlinear/overlapping class boundaries. Results are compared extensively with those of the Bayes classifier, the k-NN rule and the multilayer perceptron.
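The string-encoded hyperplane search can be illustrated with a stripped-down elitist GA on toy data. The chromosome layout (a single hyperplane (w1, w2, b) per string), population size, mutation scale and generation count are all illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data in R^2; one hyperplane nearly separates the classes.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def misclassified(chrom, X, y):
    # A chromosome encodes one hyperplane (w1, w2, b); the better of the
    # two side-to-class assignments is counted, as in region labeling.
    pred = (X @ chrom[:2] + chrom[2] > 0).astype(int)
    errs = int(np.sum(pred != y))
    return min(errs, len(y) - errs)

# Elitist GA: keep the best strings, refill with mutated copies of the elite.
pop = rng.normal(size=(30, 3))
for _ in range(40):
    pop = pop[np.argsort([misclassified(c, X, y) for c in pop])]
    elite = pop[:10]
    children = elite[rng.integers(0, 10, 20)] + rng.normal(0, 0.3, (20, 3))
    pop = np.vstack([elite, children])

best = min(pop, key=lambda c: misclassified(c, X, y))
print("errors:", misclassified(best, X, y))
```

A full implementation would evolve several hyperplanes per string and delete redundant ones, as the abstract describes.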

3.
On piecewise-linear classification
The authors use a real data set containing 9-D measurements of fine needle aspirates of a patient's breast to classify tumor malignancy, a task for which early stopping in the generation of the separating hyperplanes is not appropriate. They compare a piecewise-linear classification method with classification based on a single linear separator. A precise methodology for comparing the relative efficacy of two classification methods on a particular task is described and applied, on the breast cancer data, to the relative performances of the two versions of the piecewise-linear classifier and the classification based on an optimal linear separator. It is found that for this data set, the piecewise-linear classifier that uses all the hyperplanes needed to separate the training set outperforms the other two methods, and that these differences in performance are significant at the 0.001 level. There is no statistically significant difference between the performance of the other two methods. The authors discuss the relevance of these results for this and other applications.

4.
This paper introduces a method for building a compact fuzzy classification system directly from training data with an evolutionary approach, without any prior knowledge of the data distribution. The VISIT algorithm is used to obtain each individual fuzzy system, and a genetic algorithm then searches among them for the optimal fuzzy system. Rules and membership functions are created and optimized automatically during the evolutionary process. To evaluate the accuracy and compactness of the system simultaneously and effectively, a fuzzy expert system serves as the fitness function. Experimental results on two benchmark classification problems demonstrate the effectiveness of the new method.

5.
In this paper, imprecise data classification is considered using a new version of the Fisher discriminator, namely the interval Fisher. In the conventional formulation of Fisher, the elements of the within-class scatter matrix (related to the covariance matrices of the clusters) and the between-class scatter matrix (related to the covariance matrix of the cluster centers) have single values; in the interval Fisher, the elements of these matrices are intervals and can vary within a range. The particle swarm optimization search method is used to solve the constrained optimization problem of the interval Fisher discriminator. Unlike the conventional Fisher with one optimal hyperplane, the interval Fisher gives two optimal hyperplanes, and thus three decision regions are obtained. Two classes with respect to the imprecise scatter matrices are derived by decision making using these optimal hyperplanes, and the fuzzy region supports fuzzy decisions over input test samples. Unlike a support vector classifier with two parallel hyperplanes, the interval Fisher generally gives two nonparallel hyperplanes. Experimental results show the suitability of this idea. © 2011 Wiley Periodicals, Inc.
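For contrast with the interval variant, the conventional Fisher discriminant with single-valued scatter matrices can be sketched in a few lines. The toy data and the midpoint threshold below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
X0 = rng.normal([0, 0], 1.0, (100, 2))
X1 = rng.normal([3, 1], 1.0, (100, 2))

# Conventional Fisher direction: w = S_w^{-1} (m1 - m0),
# where S_w is the (single-valued) within-class scatter matrix.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
w = np.linalg.solve(Sw, m1 - m0)

# One hyperplane w·x = t; the interval Fisher would yield two of these.
t = w @ (m0 + m1) / 2
pred = (np.vstack([X0, X1]) @ w > t).astype(int)
acc = (pred == np.array([0] * 100 + [1] * 100)).mean()
print(f"accuracy: {acc:.2f}")
```

The interval version replaces the fixed entries of Sw with ranges and optimizes over them, producing the two nonparallel hyperplanes described above.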

6.
LESS: a model-based classifier for sparse subspaces
In this paper, we specifically focus on high-dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems have some interesting challenges. The first challenge is to find, from all hyperplanes that separate the classes, a separating hyperplane which generalizes well for future data. A second important task is to determine which features are required to distinguish the classes. To attack these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier that efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high-dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter, the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments, we show how LESS performs on several high-dimensional data sets and compare its performance to related state-of-the-art classifiers like, among others, linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.

7.
Empirical characterization of random forest variable importance measures
Microarray studies yield data sets consisting of a large number of candidate predictors (genes) on a small number of observations (samples). When interest lies in predicting phenotypic class using gene expression data, often the goals are both to produce an accurate classifier and to uncover the predictive structure of the problem. Most machine learning methods, such as k-nearest neighbors, support vector machines, and neural networks, are useful for classification. However, these methods provide no insight regarding the covariates that best contribute to the predictive structure. Other methods, such as linear discriminant analysis, require the predictor space be substantially reduced prior to deriving the classifier. A recently developed method, random forests (RF), does not require reduction of the predictor space prior to classification. Additionally, RF yield variable importance measures for each candidate predictor. This study examined the effectiveness of RF variable importance measures in identifying the true predictor among a large number of candidate predictors. An extensive simulation study was conducted using 20 levels of correlation among the predictor variables and 7 levels of association between the true predictor and the dichotomous response. We conclude that the RF methodology is attractive for use in classification problems when the goals of the study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables. Such goals are common among microarray studies, and therefore application of the RF methodology for the purpose of obtaining variable importance measures is demonstrated on a microarray data set.
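The simulation idea — one true predictor hidden among many noise candidates, recovered by the importance ranking — can be sketched with scikit-learn's random forest. The sample sizes, signal strength and parameters below are illustrative, not the study's design:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n, p = 120, 30                      # few samples, many candidate predictors
X = rng.normal(size=(n, p))
# Feature 7 is the sole true predictor of the dichotomous response.
y = (X[:, 7] + 0.5 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top = int(np.argmax(rf.feature_importances_))
print("most important feature:", top)
```

With a strong signal like this, the impurity-based importance of the true predictor should clearly dominate the noise features.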

8.
In this article the effectiveness of some recently developed genetic algorithm-based pattern classifiers was investigated in the domain of satellite imagery, which usually has complex and overlapping class boundaries. Landsat data, a SPOT image and an IRS image are considered as input. The superiority of these classifiers over the k-NN rule, Bayes' maximum likelihood classifier and the multilayer perceptron (MLP) for partitioning different landcover types is established. Results based on producer's accuracy (percentage recognition score), user's accuracy and kappa values are provided. Incorporation of the concepts of variable-length chromosomes and chromosome discrimination led to superior performance in terms of automatic evolution of the number of hyperplanes for modelling the class boundaries, and in convergence time. This non-parametric classifier requires very little a priori information, unlike the k-NN rule and MLP (where the performance depends heavily on the value of k and the architecture, respectively), and Bayes' maximum likelihood classifier (where assumptions regarding the class distribution functions need to be made).

9.
A simple and fast multi-class piecewise linear classifier is proposed and implemented. For a pair of classes, the piecewise linear boundary is a collection of segments of hyperplanes created as perpendicular bisectors of line segments linking centroids of the classes or parts of classes. For a multi-class problem, a binary partition tree is initially created which represents a hierarchical division of given pattern classes into groups, with each non-leaf node corresponding to some group. After that, a piecewise linear boundary is constructed for each non-leaf node of the partition tree as for a two-class problem. The resulting piecewise linear boundary is a set of boundaries corresponding to all non-leaf nodes of the tree. The basic data structures of algorithms of synthesis of a piecewise linear classifier and classification of unknown patterns are described. The proposed classifier is compared with a number of known pattern classifiers by benchmarking with the use of real-world data sets.
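The two-class building block — a hyperplane as the perpendicular bisector of the segment joining two centroids — takes only a few lines; the multi-class partition tree is omitted from this sketch:

```python
import numpy as np

def bisector(c0, c1):
    # Perpendicular bisector of the segment c0–c1: the hyperplane
    # w·x + b = 0 with w = c1 - c0 and b chosen so the midpoint lies on it.
    w = c1 - c0
    b = -w @ ((c0 + c1) / 2)
    return w, b

def classify(x, w, b):
    # Which side of the bisector x falls on decides the class (0 or 1),
    # i.e. which centroid x is nearer to.
    return int(w @ x + b > 0)

c0, c1 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
w, b = bisector(c0, c1)
print(classify(np.array([1.0, 3.0]), w, b))   # → 0 (nearer to c0)
```

The full method builds such bisectors for centroids of classes or parts of classes at every non-leaf node of the tree.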

10.
An online kernel-based classification algorithm, OnSVM, is proposed whose solution is very close to that of ν-SVM, an efficient modification of the support vector machine. The algorithm is faster than batch implementations of ν-SVM and yields a smaller number of support vectors. The approach maximizes the margin between a pair of hyperplanes in feature space and can be used in an online setup. A ternary classifier for the 2-class problem, with an “unknown” decision, is constructed using these hyperplanes.

11.
When choosing a classification rule, it is important to take into account the amount of sample data available. This paper examines the performances of classifiers of differing complexities in relation to the complexity of feature-label distributions in the case of small samples. We define the distributional complexity of a feature-label distribution to be the minimal number of hyperplanes necessary to achieve the Bayes classifier if the Bayes classifier is achievable by a finite number of hyperplanes, and infinity otherwise. Our approach is to choose a model and compare classifier efficiencies for various sample sizes and distributional complexities. Simulation results are obtained by generating data based on the model and the distributional complexities. A linear support vector machine (SVM) is considered, along with several nonlinear classifiers. For the most part, we see that there is little improvement when one uses a complex classifier instead of a linear SVM. For higher levels of distributional complexity, the linear classifier degrades, but so do the more complex classifiers owing to insufficient training data. Hence, if one were to obtain a good result with a more complex classifier, it is most likely that the distributional complexity is low and there is no gain over using a linear classifier. Hence, under the model, it is generally impossible to claim that use of the nonlinear classifier is beneficial. In essence, the sample sizes are too small to take advantage of the added complexity. An exception to this observation is the behavior of the three-nearest-neighbor (3NN) classifier in the case of two variables (but not three) when there is very little overlap between the label distributions and the sample size is not too small. With a sample size of 60, the 3NN classifier performs close to the Bayes classifier, even for high levels of distributional complexity. Consequently, if one uses the 3NN classifier with two variables and obtains a low error, then the distributional complexity might be large and, if such is the case, there is a significant gain over using a linear classifier.
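The qualitative comparison can be reproduced in miniature: an XOR-style layout has distributional complexity 2 (two hyperplanes are needed for the Bayes classifier), so a linear SVM fails while 3NN copes once overlap is small and the sample is not too tiny. The data layout and sample size below are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# XOR layout: opposite corners share a label -> distributional complexity 2.
n = 60
centers = np.array([[-2, -2], [2, 2], [-2, 2], [2, -2]])
X = np.vstack([rng.normal(c, 0.7, (n // 4, 2)) for c in centers])
y = np.array([0] * (n // 2) + [1] * (n // 2))

svm_acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()
knn_acc = cross_val_score(KNeighborsClassifier(3), X, y, cv=5).mean()
print(f"linear SVM: {svm_acc:.2f}, 3NN: {knn_acc:.2f}")
```

With little overlap, 3NN approaches the Bayes error here while no single hyperplane can do better than chance.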

12.
As the number of documents has been increasing rapidly in recent years, automatic text categorization is becoming a more important and fundamental task in information retrieval and text mining. Accuracy and interpretability are two important aspects of a text classifier. While the accuracy of a classifier measures the ability to correctly classify unseen data, interpretability is the ability of the classifier to be understood by humans and to provide reasons why each data instance is assigned to a label. This paper proposes an interpretable classification method by exploiting the Dirichlet process mixture model of von Mises–Fisher distributions for directional data. By using the labeled information of the training data explicitly and determining the number of topics for each class automatically, the learned topics are coherent, relevant and discriminative. They help interpret as well as distinguish classes. Our experimental results showed the advantages of our approach in terms of separability, interpretability and effectiveness in classification tasks on datasets with high dimension and complex distribution. Our method is highly competitive with state-of-the-art approaches.

13.
Fang Min, Wang Baoshu. Computer Science (《计算机科学》), 2003, 30(10): 52-54
The fuzzy associative classifier is investigated in this paper. Design methods for a fuzzy associative classifier trained with a genetic algorithm are presented. The method trains the weights and rule consequents to obtain classification rules automatically. Radar emitter signals are classified using this algorithm, and the simulation results show that the method has higher identification precision than existing fuzzy classifiers.

14.
Real-world classification data are often imbalanced. Most traditional classifiers favor the majority class and ignore the minority class, degrading classification performance. To address this problem, an adaptive synthetic sampling method for imbalanced data is proposed based on a variational Bayesian-optimized optimal Gaussian mixture model (VBoGMM). VBoGMM automatically shrinks to the true number of Gaussian components, achieving an optimal distribution estimate for arbitrary data; the minority-class samples are then adaptively and synthetically oversampled according to the estimated distribution, and the sampled data are cleaned using the Tomek-link pair criterion to obtain a relatively balanced data set for subsequent classifier learning. Extensive validation and comparison experiments on multiple public imbalanced data sets show that the proposed method balances the samples while preserving the spatial distribution characteristics of the majority and minority classes, and thus effectively improves the performance of traditional classification models on imbalanced data sets.
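A simplified sketch of the oversampling idea — not the authors' VBoGMM, and with the Tomek-link cleaning step omitted — using scikit-learn's variational Bayesian Gaussian mixture, which likewise prunes unused components; the toy data are assumptions:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Imbalanced toy data: 200 majority vs 20 minority samples.
X_maj = rng.normal(0, 1, (200, 2))
X_min = rng.normal(3, 0.5, (20, 2))

# Variational Bayesian GMM drives the weights of unneeded components
# toward zero, approximating the minority class's true distribution.
gmm = BayesianGaussianMixture(n_components=5, random_state=0).fit(X_min)

# Draw synthetic minority samples from the fitted mixture to balance classes.
X_new, _ = gmm.sample(180)
X_min_bal = np.vstack([X_min, X_new])
print(len(X_min_bal), len(X_maj))
```

The full method would follow this with Tomek-link pair removal to clean boundary samples before training the classifier.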

15.
High-dimensional data encountered in genomic and proteomic studies are often limited in sample size but have a large number of predictor variables. Selecting the most relevant variables, those correlated with the outcome variable, is therefore a crucial step. This paper describes an approach for selecting a set of optimal variables to achieve a classification model with high predictive accuracy. The work is described using a biological classifier published elsewhere, but it can be generalized to any application.

16.
Support vector machines (SVMs), initially proposed for two-class classification problems, have been very successful in pattern recognition problems. For multi-class classification problems, the standard hyperplane-based SVMs are made by constructing and combining several maximal-margin hyperplanes, and each class of data is confined to a certain area constructed by those hyperplanes. Instead of using hyperplanes, hyperspheres that tightly enclose the data of each class can be used. Since the class-specific hyperspheres are constructed for each class separately, the spherical-structured SVMs can deal with the multi-class classification problem easily. In addition, the center and radius of the class-specific hypersphere characterize the distribution of examples from that class, and may be useful for dealing with imbalance problems. In this paper, we incorporate the concept of maximal margin into the spherical-structured SVMs. Besides, the proposed approach has the advantage of using a new parameter for controlling the number of support vectors. Experimental results show that the proposed method performs well on both artificial and benchmark datasets.
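A crude sketch of the sphere-per-class idea — centroid plus covering radius, with assignment by signed distance to the sphere surface. The paper instead solves a maximal-margin optimization with a parameter controlling support vectors, so this only conveys the geometry:

```python
import numpy as np

rng = np.random.default_rng(5)
classes = {
    "a": rng.normal([0, 0], 0.8, (40, 2)),
    "b": rng.normal([5, 0], 0.8, (40, 2)),
}

def fit_sphere(X):
    # Class-specific sphere: centroid plus the radius covering all points.
    c = X.mean(axis=0)
    r = np.linalg.norm(X - c, axis=1).max()
    return c, r

spheres = {k: fit_sphere(X) for k, X in classes.items()}

def classify(x):
    # Assign to the class whose sphere surface is nearest
    # (distance to center minus radius, i.e. signed distance).
    return min(spheres, key=lambda k: np.linalg.norm(x - spheres[k][0]) - spheres[k][1])

print(classify(np.array([0.2, 0.1])))   # → "a"
```

Because each sphere is fitted per class, adding a class never requires refitting the others, which is what makes the spherical structure convenient for multi-class problems.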

17.
The H/α-Wishart method is a widely used and fairly effective classification method for polarimetric SAR (PolSAR) imagery, but its classification accuracy still leaves room for improvement. This paper studies a genetic algorithm-based classification method for PolSAR imagery: based on the eigenvalues of the Cloude decomposition of the PolSAR image, an initial classification is performed on the H/α plane, and a genetic algorithm then refines the classification iteratively. To address the genetic algorithm's premature convergence and slow convergence speed, the mutation operator is improved using the H/α plane, so that the polarimetric scattering mechanism narrows the mutation range and accelerates convergence. Experiments on PolSAR data from NASA-JPL and on high-resolution X-band prototype PolSAR data from the 38th Research Institute of China Electronics Technology Group Corporation show that the method achieves higher PolSAR classification accuracy than the H/α-Wishart method.

18.
In this paper, a curve fitting space (CFS) is presented to map non-linearly separable data to linearly separable data. A linear or quadratic transformation maps data into a new space for better classification, if the transformation is properly chosen. The CFS can be of high or low dimensionality, but the number of dimensions is generally low, equal to the number of classes. The CFS method is based on fitting a hyperplane or curve to the training data or enclosing them in a hypersurface. In the proposed method, these hyperplanes, curves, or surfaces become the axes of the new space, in which a linear multi-class support vector machine classifier is applied to the training data.

19.
Evolutionary design of a fuzzy classifier from data
Genetic algorithms show powerful capabilities for automatically designing fuzzy systems from data, but many proposed methods must be subjected to some minimal structure assumptions, such as rule base size. In this paper, we also address the design of fuzzy systems from data. A new evolutionary approach is proposed for deriving a compact fuzzy classification system directly from data without any a priori knowledge or assumptions on the distribution of the data. At the beginning of the algorithm, the fuzzy classifier is empty with no rules in the rule base and no membership functions assigned to fuzzy variables. Then, rules and membership functions are automatically created and optimized in an evolutionary process. To accomplish this, parameters of the variable input spread inference training (VISIT) algorithm are used to code fuzzy systems on the training data set. Therefore, we can derive each individual fuzzy system via the VISIT algorithm, and then search the best one via genetic operations. To evaluate the fuzzy classifier, a fuzzy expert system acts as the fitness function. This fuzzy expert system can effectively evaluate the accuracy and compactness at the same time. In the application section, we consider four benchmark classification problems: the iris data, wine data, Wisconsin breast cancer data, and Pima Indian diabetes data. Comparisons of our method with others in the literature show the effectiveness of the proposed method.

20.
This paper proposes a probabilistic variant of the SOM-kMER (Self Organising Map-kernel-based Maximum Entropy learning Rule) model for data classification. The classifier, known as pSOM-kMER (probabilistic SOM-kMER), is able to operate in a probabilistic environment and to implement the principles of statistical decision theory in undertaking classification problems. A distinctive feature of pSOM-kMER is its ability in revealing the underlying structure of data. In addition, the Receptive Field (RF) regions generated can be used for variable kernel and non-parametric density estimation. Empirical evaluation using benchmark datasets shows that pSOM-kMER is able to achieve good performance as compared with those from a number of machine learning systems. The applicability of the proposed model as a useful data classifier is also demonstrated with a real-world medical data classification problem.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号