Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
A method for pattern classification using genetic algorithms (GAs) was recently described in Pal, Bandyopadhyay and Murthy (1998), where the class boundaries of a data set are approximated by a fixed number H of hyperplanes. Because H was fixed a priori, the classifier could overfit (or underfit) the training data, with an associated loss of generalization capability. In this paper, we propose a scheme for evolving the value of H automatically using the concept of variable-length strings/chromosomes. The crossover and mutation operators are newly defined in order to handle variable string lengths. The fitness function primarily ensures the minimization of the number of misclassified samples, and also the reduction of the number of hyperplanes. Based on an analogy between the classification principles of the genetic classifier and the multilayer perceptron (with hard-limiting neurons), a method for automatically determining the architecture and the connection weights of the latter is described.
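The two-part fitness described above (errors first, hyperplane count second) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the region-majority labelling, the lexicographic sort key, and the names `region_code`, `misclassified` and `fitness_key` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def region_code(point, hyperplanes):
    """Tuple of booleans: which side of each hyperplane (w, b) the point lies on."""
    return tuple(
        sum(wi * xi for wi, xi in zip(w, point)) + b >= 0
        for w, b in hyperplanes
    )

def misclassified(points, labels, hyperplanes):
    """Count points that disagree with the majority label of their region (assumed rule)."""
    regions = defaultdict(Counter)
    for x, y in zip(points, labels):
        regions[region_code(x, hyperplanes)][y] += 1
    return sum(sum(c.values()) - max(c.values()) for c in regions.values())

def fitness_key(points, labels, hyperplanes):
    """Key to minimise: misclassifications first, number of hyperplanes as tie-breaker."""
    return (misclassified(points, labels, hyperplanes), len(hyperplanes))

# XOR-like toy data: one hyperplane cannot separate it, two can.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [0, 0, 1, 1]
one_plane = [((1.0, 0.0), -0.5)]                         # x = 0.5
two_planes = [((1.0, 0.0), -0.5), ((0.0, 1.0), -0.5)]    # x = 0.5 and y = 0.5

best = min([one_plane, two_planes], key=lambda h: fitness_key(points, labels, h))
```

Here the chromosome with two hyperplanes wins because it removes all training errors; among chromosomes with equal error counts, the shorter one would be preferred.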

2.
Genetic algorithms for generation of class boundaries (total citations: 3; self-citations: 0; citations by others: 3)
A method is described for finding decision boundaries, approximated by piecewise linear segments, for classifying patterns in R^N, N ≥ 2, using an elitist model of genetic algorithms. It involves generation and placement of a set of hyperplanes (represented by strings) in the feature space that yields minimum misclassification. A scheme for the automatic deletion of redundant hyperplanes is also developed for the case where the algorithm starts with an initial conservative estimate of the number of hyperplanes required for modeling the decision boundary. The effectiveness of the classification methodology, along with the generalization ability of the decision boundary, is demonstrated for different parameter values on both artificial data and real-life data sets having nonlinear/overlapping class boundaries. Results are compared extensively with those of the Bayes classifier, k-NN rule and multilayer perceptron.

3.
In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set. A newly developed multiobjective simulated annealing based optimization technique, namely archived multiobjective simulated annealing (AMOSA), is used as the background strategy for optimization. Here features and cluster centers are encoded in the form of a string. As the objective functions, two internal cluster validity indices measuring the goodness of the obtained partitioning using Euclidean distance and point symmetry based distance, respectively, and a count of the number of features are utilized. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, the appropriate number of clusters, and the appropriate partitioning. Points are allocated to different clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a solution from the given set. The effectiveness of the proposed FeaClusMOO is shown for seven higher dimensional real-life data sets in comparison with other clustering techniques: its Euclidean distance based version, where Euclidean distance is used for cluster assignment; a genetic algorithm based automatic clustering technique (VGAPS-clustering) using point symmetry based distance with all the features; and the K-means clustering technique with all features.
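The idea of a "set of solutions on the final Pareto front" can be illustrated with a small non-dominated filter. This is a generic sketch, not AMOSA itself; it assumes every objective is to be minimised, and the function names and the toy objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Toy two-objective vectors; (3, 3) is dominated by (2, 2).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Every vector left in `front` is a valid trade-off, which is why a separate step is still needed to pick one solution from the set.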

4.
In this paper the problem of automatically clustering a data set is posed as solving a multiobjective optimization (MOO) problem, optimizing a set of cluster validity indices simultaneously. The proposed multiobjective clustering technique utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Here a variable number of cluster centers is encoded in the string. The number of clusters present in different strings varies over a range. The points are assigned to different clusters based on the newly developed point symmetry based distance rather than the existing Euclidean distance. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. Thus the proposed clustering technique is able to detect both the proper number of clusters and the appropriate partitioning from data sets having either hyperspherical clusters or point symmetric clusters. A new semi-supervised method is also proposed in the present paper to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, and two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.

5.
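杨鹤标, 王健. 《计算机工程》 (Computer Engineering), 2010, 36(20): 52-54
A classification model is proposed for imbalanced multi-relational, multi-class data. In the preprocessing stage, error-correcting output codes (ECOC) are built for the target classes, virtual joins between the target relation and the background relations are established, attribute aggregation is performed, and the data are then split into a training set and a validation set. In the training stage, following the one-versus-all partitioning idea, multiple sub-classifiers are constructed with the CrossMine algorithm, and each sub-classifier is evaluated and validated using the AUC method. In the validation stage, the Hamming distance between each target class's ECOC and the codeword formed by concatenating the sub-classifiers' outputs is computed, and the target class with the minimum Hamming distance is selected as the final classification. Experiments on synthetic and real data verify the model's effectiveness and classification performance.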
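The minimum-Hamming-distance decoding step can be sketched in a few lines. The one-versus-all codebook, the class names, and the function names below are assumptions chosen for illustration; the abstract does not give the actual code words.

```python
def hamming(a, b):
    """Number of positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(codebook, outputs):
    """Return the class whose code word is closest to the sub-classifier outputs."""
    return min(codebook, key=lambda cls: hamming(codebook[cls], outputs))

# Hypothetical one-versus-all code words for three target classes.
codebook = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (0, 0, 1)}

# Concatenated outputs of the three sub-classifiers for one sample.
predicted = decode(codebook, (0, 1, 0))
```

With a pure one-versus-all codebook, ties are possible when two sub-classifiers fire at once; `min` then returns the first class in dictionary order, so a real system would need an explicit tie-breaking rule.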

6.
In this paper, we propose a simulated annealing (SA) based multiobjective optimization (MOO) approach for classifier ensemble. Several different versions of the objective functions are exploited. We hypothesize that the reliability of prediction of each classifier differs among the various output classes. Thus, in an ensemble system, it is necessary to find out the appropriate weight of vote for each output class in each classifier. Diverse classification methods such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) are used to build different models depending upon the various representations of the available features. One of the most important characteristics of our system is that the features are selected and developed mostly without using any deep domain knowledge and/or language dependent resources. The proposed technique is evaluated for Named Entity Recognition (NER) in three resource-poor Indian languages, namely Bengali, Hindi and Telugu. Evaluation results yield recall, precision and F-measure values of 93.95%, 95.15% and 94.55%, respectively, for Bengali; 93.35%, 92.25% and 92.80%, respectively, for Hindi; and 84.02%, 96.56% and 89.85%, respectively, for Telugu. Experiments also suggest that the classifier ensemble identified by the proposed MOO based approach optimizing the F-measure values of named entity (NE) boundary detection outperforms all the individual models, two conventional baseline models and three other MOO based ensembles.
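The reported F-measure values are consistent with the balanced F-measure, the harmonic mean of recall and precision. The short check below assumes that definition (the abstract does not state which F-measure variant was used) and recomputes the three figures from the quoted recall and precision.

```python
def f_measure(recall, precision):
    """Balanced F-measure: harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

# (recall, precision) pairs in percent, as quoted in the abstract.
reported = {
    "Bengali": (93.95, 95.15),
    "Hindi": (93.35, 92.25),
    "Telugu": (84.02, 96.56),
}

recomputed = {lang: round(f_measure(r, p), 2) for lang, (r, p) in reported.items()}
```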

7.
In this paper, we propose a new support vector machine (SVM) called the dual margin Lagrangian support vector machine (DMLSVM). Unlike other SVMs, which use only support vectors to determine the separating hyperplanes, DMLSVM utilizes all the available training data for training the classifier, thus producing robust performance. The training data are weighted differently depending on whether they are in a marginal region or a surplus region. For fast training, DMLSVM borrows its training algorithm from the Lagrangian SVM (LSVM) and tailors the algorithm to its formulation. The convergence of our training method is rigorously proven and its validity is tested on a synthetic test set and a UCI dataset. The proposed method can be used in a variety of applications such as recommender systems for web contents of IPTV services.

8.
In this paper, the concept of finding an appropriate classifier ensemble for named entity recognition is posed as a multiobjective optimization (MOO) problem. Our underlying assumption is that instead of searching for the best-fitting feature set for a particular classifier, ensembling several classifiers that are trained using different feature representations could be a more fruitful approach, but it is crucial to determine the appropriate subset of classifiers that are most suitable for the ensemble. We use three heterogeneous classifiers, namely maximum entropy, conditional random field, and support vector machine, in order to build a number of models depending upon the various representations of the available features. The proposed MOO-based ensemble technique is evaluated for three resource-constrained languages, namely Bengali, Hindi, and Telugu. Evaluation results yield recall, precision, and F-measure values of 92.21, 92.72, and 92.46%, respectively, for Bengali; 97.07, 89.63, and 93.20%, respectively, for Hindi; and 80.79, 93.18, and 86.54%, respectively, for Telugu. We also evaluate our proposed technique with the CoNLL-2003 shared task English data sets, which yield recall, precision, and F-measure values of 89.72, 89.84, and 89.78%, respectively. Experimental results show that the classifier ensemble identified by our proposed MOO-based approach outperforms all the individual classifiers, two different conventional baseline ensembles, and the classifier ensemble identified by a single-objective-based approach. In a part of the paper, we formulate the problem of feature selection in any classifier under the MOO framework and show that our proposed classifier ensemble attains superior performance to it.

9.
A genetic algorithm-based rule extraction system (total citations: 1; self-citations: 0; citations by others: 1)
Individual classifiers predict unknown objects. However, they are usually domain specific and lack the ability to scale up prediction when handling data sets of huge size, high dimensionality, or imbalanced class distribution. This article introduces an accuracy-based learning system called DTGA (decision tree and genetic algorithm) that aims to improve prediction accuracy over any classification problem irrespective of domain, size, dimensionality and class distribution. More specifically, the proposed system consists of two rule-inducing phases. In the first phase, a base classifier, C4.5 (a decision tree based rule inducer), is used to produce rules from the training data set, whereas in the next phase a GA (genetic algorithm) refines them with the aim of providing more accurate and high-performance rules for prediction. The system has been compared with competent non-GA based systems: neural network, Naïve Bayes, a rule-based classifier using rough set theory, and C4.5 (i.e., the base classifier of DTGA), on a number of benchmark datasets collected from the UCI (University of California at Irvine) machine learning repository. Empirical results demonstrate that the proposed hybrid approach provides marked improvement in a number of cases.

10.
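An efficient pruning algorithm for least squares support vector machine classifiers (total citations: 2; self-citations: 0; citations by others: 2)
To address the loss of sparseness in least squares support vector machines, an efficient pruning algorithm is proposed. To avoid solving the initial system of linear algebraic equations, a bottom-up strategy is adopted. During training, block incremental learning and decremental (inverse) learning alternate according to specific pruning conditions, so that a small set of support vectors forms automatically. This set is used to construct the final classifier. To test its effectiveness, the new algorithm is applied to five UCI data sets. The experimental results show that, with the new pruning algorithm and an incremental block size of 2, sparse solutions can be obtained with almost no loss of accuracy. In addition, the new algorithm is faster than the SMO algorithm. The new algorithm applies not only to least squares support vector machine classifiers but can also be extended to least squares support vector regression machines.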

11.
Part-of-Speech (PoS) tagging is an important pipelined module for almost all Natural Language Processing (NLP) application areas. In this paper we formulate PoS tagging within the frameworks of single and multi-objective optimization techniques. As a first step we propose a classifier ensemble technique for PoS tagging using the concept of single objective optimization (SOO) that exploits the search capability of simulated annealing (SA). Thereafter we devise a method based on multiobjective optimization (MOO) to solve the same problem, and for this a recently developed multiobjective simulated annealing based technique, AMOSA, is used. The characteristic features of AMOSA are its concepts of the amount of domination and archive in simulated annealing, and situation specific acceptance probabilities. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) as the underlying classification methods, which make use of a diverse set of features, mostly based on local contexts and orthographic constructs. We evaluate our proposed approaches for two Indian languages, namely Bengali and Hindi. Evaluation results of the single objective version show an overall accuracy of 88.92% for Bengali and 87.67% for Hindi. The MOO based ensemble yields overall accuracies of 90.45% and 89.88% for Bengali and Hindi, respectively.

12.
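A Bayesian classifier based on the "3σ" rule (total citations: 1; self-citations: 0; citations by others: 1)
In soft-sensor modeling, the original data set usually needs to be partitioned into classes in order to build multiple sub-models and thereby improve the model's estimation accuracy. Exploiting the simplicity and efficiency of the naive Bayes classifier for data classification, the continuous class variable is first divided into class ranges, and the continuous attribute variables are then discretized using the 3σ rule from probability theory. To eliminate the influence of noisy data in the training samples, a genetic algorithm is used to select samples from the training set. Discretization of the continuous variables and sample selection serve as data preprocessing, and the preprocessed training samples are used to build the Bayesian classifier. Simulation experiments on UCI data sets and on an online monitoring data set from a bisphenol A production process show that the 3σ-rule naive Bayes classification method with genetic-algorithm sample selection achieves higher classification accuracy than other methods.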
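One plausible reading of 3σ-rule discretization is to bin each continuous attribute by its distance from the mean in units of the standard deviation. The sketch below follows that reading; the six unit-width bins, the treatment of values beyond μ ± 3σ, and the name `sigma_bin` are assumptions, since the abstract does not specify the exact scheme.

```python
import math

def sigma_bin(x, mu, sigma):
    """Map x to an integer bin in -3..2 by z-score; None marks a value outside mu +/- 3*sigma."""
    z = (x - mu) / sigma
    if abs(z) > 3:
        return None                # outside the 3-sigma band (assumed to be treated separately)
    return min(math.floor(z), 2)   # z == 3 exactly is folded into the top bin

# Hypothetical attribute with mean 10 and standard deviation 1.
bins = [sigma_bin(x, 10.0, 1.0) for x in (7.2, 10.5, 13.0, 14.0)]
```

The resulting integer bins can then be used as categorical attribute values when estimating the conditional probabilities of a naive Bayes classifier.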

13.
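A multi-feature fusion method based on support vector machine and k-nearest neighbor classifiers (total citations: 1; self-citations: 0; citations by others: 1)
陈丽, 陈静. 《计算机应用》 (Journal of Computer Applications), 2009, 29(3): 833-835
Traditional classification methods that rely on a single classifier are one-sided and have limited accuracy, and support vector machines tend to misclassify points near the separating hyperplane. To address these problems, a multi-feature fusion method based on the support vector machine (SVM) and k-nearest neighbor (KNN) is proposed. In this algorithm, the features of the sample set are assumed to be divisible into L groups. First, the SVM algorithm is used to construct a separating hyperplane from each group of feature data in the training set, giving L hyperplanes in total. Next, the SVM-KNN method is applied to the test set, producing a decision profile matrix composed of L groups of posterior probabilities. Finally, multi-feature fusion is performed on this matrix to output the final classification result. Numerical experiments on the Iris plant data show that the SVM-KNN-based multi-feature fusion method improves average prediction accuracy by 28.7% and 1.9% over using a single SVM or the SVM-KNN method alone, respectively.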

14.
This study reports the design and implementation of a pattern recognition algorithm aimed at classifying electroencephalographic (EEG) signals based on a class of dynamic neural networks (NN) described by time delay differential equations (TDNN). This kind of NN introduces the signal windowing process used in different pattern classification methods. The development of the classifier included a new set of learning laws that considered the impact of delayed information on the classifier structure. Both the training and the validation processes were completely designed and evaluated in this study. The training method for this kind of NN was obtained by applying Lyapunov stability analysis. The accuracy of the training process was characterized in terms of the number of delays. A parallel structure (similar to an associative memory) with fixed weights (obtained after training) was used to execute the validation stage. Two methods were considered to validate the pattern classification method: a generalization-regularization process and k-fold cross validation (k = 5). Two different classes were considered: normal EEG and patients with a previously confirmed neurological diagnosis. The first contains the EEG signals from 100 healthy patients, while the second contains information on epileptic seizures from the same number of patients. The pattern classification algorithm achieved a correct classification percentage of 92.12% using the information of the entire database. In comparison with similar pattern classification methods that considered the same database, the proposed CNN proved to achieve the same or even better correct classification results without pre-treating the raw EEG signal. This new type of classifier, working in continuous time but using the delayed information of the input, seems to be a reliable option for developing an accurate classification of windowed EEG signals.
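The k-fold cross validation mentioned above (k = 5) partitions the records so that each one is held out exactly once. The sketch below shows only that index bookkeeping; the interleaved assignment of records to folds and the figure of 200 records (one per subject) are assumptions, as the abstract does not say how the folds were formed.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; fold i holds out every k-th sample starting at i."""
    indices = list(range(n_samples))
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test

# Assumed: 100 healthy + 100 epileptic subjects, one record each.
folds = list(k_fold_indices(200, k=5))
```

Each fold trains on 160 records and validates on the remaining 40, and the reported accuracy would be averaged over the five folds.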

15.
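A method is proposed for classifying unlabeled text documents when no training set is available. Class-associated words are words or phrases that are related to a class topic and reflect it. The prior information provided by the class-associated words is used to form prior probabilities for document classification; the naive Bayes classifier and the EM iterative algorithm are then combined, classification constraints are added to the semi-supervised learning process, and the class-associated words supervise the construction of a classifier, which classifies documents whose classes are completely unlabeled. Experimental results show that the method achieves fairly high accuracy for text classification without a training set, and that classification accuracy under the class-associated-word constraints is higher than without the constraints.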

16.
MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition (total citations: 10; self-citations: 0; citations by others: 10)
Decomposition is a basic strategy in traditional multiobjective optimization. However, it has not yet been widely used in multiobjective evolutionary optimization. This paper proposes a multiobjective evolutionary algorithm based on decomposition (MOEA/D). It decomposes a multiobjective optimization problem into a number of scalar optimization subproblems and optimizes them simultaneously. Each subproblem is optimized using only information from its several neighboring subproblems, which gives MOEA/D lower computational complexity at each generation than MOGLS and the nondominated sorting genetic algorithm II (NSGA-II). Experimental results have demonstrated that MOEA/D with simple decomposition methods outperforms or performs similarly to MOGLS and NSGA-II on multiobjective 0-1 knapsack problems and continuous multiobjective optimization problems. It has been shown that MOEA/D using objective normalization can deal with disparately scaled objectives, and MOEA/D with an advanced decomposition method can generate a set of very evenly distributed solutions for 3-objective test instances. The ability of MOEA/D with a small population, and the scalability and sensitivity of MOEA/D, have also been experimentally investigated in this paper.
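Decomposition into scalar subproblems can be illustrated with two common scalarizing functions, the weighted sum and the Tchebycheff function, together with a neighbourhood defined by the closest weight vectors. This is a minimal sketch for two minimised objectives; the helper names, the evenly spaced weights, and the neighbourhood size are assumptions for illustration, and the evolutionary loop itself is omitted.

```python
import math

def weighted_sum(f, lam):
    """Scalarise objective vector f with weight vector lam."""
    return sum(l * fi for l, fi in zip(lam, f))

def tchebycheff(f, lam, z_star):
    """Largest weighted deviation of f from the reference point z_star."""
    return max(l * abs(fi - zi) for l, fi, zi in zip(lam, f, z_star))

def uniform_weights(n):
    """n evenly spaced weight vectors for two objectives."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]

def neighbours(weights, i, t):
    """Indices of the t subproblems whose weight vectors are closest to weights[i]."""
    order = sorted(range(len(weights)), key=lambda j: math.dist(weights[i], weights[j]))
    return order[:t]

weights = uniform_weights(5)          # one scalar subproblem per weight vector
hood = neighbours(weights, 2, 3)      # subproblem 2 exchanges information with these
```

Restricting each subproblem to its neighbourhood is what keeps the per-generation cost low: a new solution is compared only against the few subproblems in `hood`, not against the whole population.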

17.
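Imbalanced distribution of sample data is common in fault diagnosis of chemical processes. When imbalanced samples are used as the training set to build fault diagnosis classifiers, the classifier's recognition rate tends to be biased toward the majority class, so that the normal state is easily recognized while the fault states, which are of greater concern, are hard to diagnose. To address this problem, this paper proposes a fault diagnosis algorithm based on the Easy Ensemble idea: Easy Ensemble based principal component analysis-support vector machine (EEPS). Subsets of majority-class samples are drawn by undersampling to form multiple new balanced sample sets; principal component analysis (PCA) is used for feature extraction and the support vector machine (SVM) algorithm for training, yielding multiple SVM-based fault diagnosis classifiers; the Adaboost algorithm is then used to integrate them into the final classification, thereby improving fault diagnosis accuracy. The proposed method is applied to the TE (Tennessee Eastman) chemical process, and the experimental results show that the EEPS algorithm effectively improves the diagnostic performance and prediction capability of classifiers on imbalanced data sets.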
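The undersampling step, which builds several balanced training sets from one imbalanced set, can be sketched as follows. Only the resampling is shown; the PCA, SVM and Adaboost stages are omitted, and the function name, the toy sample labels, and the fixed seed are assumptions for the example.

```python
import random

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Pair every minority sample with a fresh random majority subset of equal size."""
    rng = random.Random(seed)
    return [
        rng.sample(majority, len(minority)) + list(minority)
        for _ in range(n_subsets)
    ]

# Toy data: 100 normal-state samples versus 10 fault samples.
normal = [("normal", i) for i in range(100)]
faults = [("fault", i) for i in range(10)]

subsets = balanced_subsets(normal, faults, n_subsets=4)
```

Each subset would train one base classifier, so no fault sample is discarded while the majority class is covered collectively across the subsets.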

18.
A method for identifying the structure of nonlinear polynomial dynamic models is presented. This approach uses an evolutionary algorithm, genetic programming, in a multiobjective fashion to generate global models which describe the dynamic behavior of the nonlinear system under investigation. The validation stage of system identification is simultaneously evaluated using the multiobjective tool, in order to direct the identification process to a set of global models of the system.

19.
On piecewise-linear classification (total citations: 2; self-citations: 0; citations by others: 2)
The authors make use of a real data set containing 9-D measurements of fine needle aspirates of a patient's breast for the purpose of classifying a tumor's malignancy, a task for which early stopping in the generation of the separating hyperplanes is not appropriate. They compare a piecewise-linear classification method with classification based on a single linear separator. A precise methodology for comparing the relative efficacy of two classification methods for a particular task is described and is applied to the comparison, on the breast cancer data, of the relative performances of the two versions of the piecewise-linear classifier and the classification based on an optimal linear separator. It is found that for this data set, the piecewise-linear classifier that uses all the hyperplanes needed to separate the training set outperforms the other two methods and that these differences in performance are significant at the 0.001 level. There is no statistically significant difference between the performance of the other two methods. The authors discuss the relevance of these results for this and other applications.

20.
An algorithm, OnSVM, for kernel-based classification is proposed whose solution is very close to that of -SVM, an efficient modification of the support vector machine. The algorithm is faster than batch implementations of -SVM and has a smaller resulting number of support vectors. The approach developed maximizes a margin between a pair of hyperplanes in feature space and can be used in an online setting. A ternary classifier for the two-class problem with an “unknown” decision is constructed using these hyperplanes.
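A ternary decision built from a pair of hyperplanes can be sketched as a thresholded score: points beyond one hyperplane get one class, points beyond the other get the second class, and points in between are left undecided. The linear score, the thresholds of ±1, and the function names are assumptions for illustration; the abstract's method works in a kernel feature space and does not give these details.

```python
def linear_score(w, b, x):
    """Signed score of x for a linear decision function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def ternary_decision(score, lower=-1.0, upper=1.0):
    """Return +1 or -1 outside the band between the two hyperplanes, else 'unknown'."""
    if score >= upper:
        return 1
    if score <= lower:
        return -1
    return "unknown"

# Hypothetical weights: only the first coordinate matters.
w, b = (1.0, 0.0), 0.0
decisions = [ternary_decision(linear_score(w, b, x)) for x in [(2.0, 5.0), (0.3, 1.0), (-4.0, 0.0)]]
```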

