Similar articles
1.
2.
In real-world classification problems, different types of misclassification errors often have asymmetric costs, which calls for cost-sensitive learning methods that minimize average misclassification cost rather than plain error rate. Instance weighting and post hoc threshold adjusting are two major approaches to cost-sensitive classifier learning. This paper compares the effects of these two approaches on several standard, off-the-shelf classification methods. The comparison indicates that the two approaches lead to similar results for some classification methods, such as Naïve Bayes, logistic regression, and backpropagation neural networks, but very different results for others, such as decision tree, decision table, and decision rule learners. The findings have important implications for the selection of a cost-sensitive classifier learning approach as well as for the interpretation of a recently published finding about the relative performance of Naïve Bayes and decision trees.
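The post hoc threshold adjusting mentioned in this abstract can be illustrated with a minimal sketch. It assumes a binary problem, a classifier that outputs a positive-class probability, and zero cost for correct decisions; the function names and cost parameters are hypothetical and are not taken from the paper.

```python
def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold at which predicting positive becomes the cheaper action.

    Predicting positive costs (1 - p) * cost_fp in expectation and predicting
    negative costs p * cost_fn, so positive is preferred when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict(p_positive: float, cost_fp: float, cost_fn: float) -> int:
    """Return 1 (positive) or 0 (negative) by comparing p to the adjusted threshold."""
    return 1 if p_positive >= cost_sensitive_threshold(cost_fp, cost_fn) else 0
```

With equal costs the threshold is the usual 0.5; making false negatives four times as costly as false positives lowers it to 0.2, so a sample with probability 0.3 is then labeled positive.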

3.
Security is recognized as an important problem in the planning, design, and operation stages of electric power systems. Power system security assessment deals with the system's ability to continue to provide service in the event of an unforeseen contingency. This paper proposes a particle swarm optimization (PSO) based classifier for static security evaluation in power systems. A straightforward and quick procedure is used to select a small number of variables as features from the large set of variables normally available in power systems. A simple first-order security function is designed using the selected features for classification. The weights of the classifier function (security function) are trained with the PSO technique, which minimizes the classification error rate. The procedure for determining the security function (classifier) is discussed. The performance of the algorithm is tested on the IEEE 14 Bus, IEEE 57 Bus, and IEEE 118 Bus systems. Simulation results show that the PSO classifier gives fairly high classification accuracy and a lower misclassification rate.

4.
Pattern Recognition Letters, 2002, 23(1-3): 227-233
The problem studied is the behavior of a discrete classifier on a finite learning sample. With the naive Bayes approach, the misclassification probability is represented as a random function, for which the first two moments are derived analytically. For arbitrary distributions, this allows evaluating the learning sample size sufficient for classification with a given admissible misclassification probability and confidence level. A comparison with statistical learning theory shows that the suggested approach frequently recommends a significantly smaller learning sample size.

5.
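The assumption implicit in traditional classification algorithms (that the classification result of every sample is trusted and accepted) does not hold in fields such as medical or fault diagnosis and fraud or intrusion detection. To address this, a binary classification problem with embedded asymmetric rejection costs is formulated and simplified. On this basis, a cost-sensitive classification algorithm based on the support vector machine (SVM), called CSVM-CRC, is designed. The algorithm consists of four steps: training the SVM classifier, computing posterior probabilities, estimating classification reliability, and determining the optimal rejection threshold. Experiments on 10 benchmark data sets show that CSVM-CRC effectively reduces the average cost.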
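The idea of a reject option with asymmetric costs can be sketched as an expected-cost comparison over three actions. This is a generic illustration and not the CSVM-CRC algorithm itself: it assumes a posterior probability is already available, and the cost parameter names are hypothetical.

```python
def decide(p_pos, cost_fp, cost_fn, cost_rej_pos, cost_rej_neg):
    """Pick the action with the lowest expected cost for one sample.

    p_pos:         estimated posterior probability of the positive class
    cost_fp:       cost of labeling a negative sample as positive
    cost_fn:       cost of labeling a positive sample as negative
    cost_rej_pos:  cost of rejecting a sample that is actually positive
    cost_rej_neg:  cost of rejecting a sample that is actually negative
    """
    expected = {
        "positive": (1.0 - p_pos) * cost_fp,
        "negative": p_pos * cost_fn,
        "reject": p_pos * cost_rej_pos + (1.0 - p_pos) * cost_rej_neg,
    }
    # min over dict keys, ordered by expected cost
    return min(expected, key=expected.get)
```

Confident posteriors lead to a class decision, while posteriors near the middle make rejection the cheapest action whenever the rejection costs are small relative to the misclassification costs.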

6.
Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of "yes" or "no" decision is an important drawback in the decision-making process and, if not precise, may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified reliably. Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for expert analysis or even for another classifier using extra domain knowledge. Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with a reject option that makes it possible to postpone the final decision on non-classified modules, the rejected ones, to a later moment. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with a reject option on imbalanced datasets. Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to some state-of-the-art classifiers with a reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric. Conclusion: rejoELM is a valid alternative for classification-with-reject-option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with a reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.

7.
In document recognition, it is often important to obtain high accuracy or reliability and to reject patterns that cannot be classified with high confidence. This is the case for applications such as the processing of financial documents, in which errors can be very costly and therefore far less tolerable than rejections. This paper presents a new approach based on Linear Discriminant Analysis (LDA) to reject less reliable classifier outputs. To implement the rejection, which can be considered a two-class problem of accepting the classification result or not, an LDA-based measurement is used to determine a new rejection threshold. This measurement (LDAM) is designed to take into consideration the confidence values of the classifier outputs and the relations between them, and it represents a more comprehensive measurement than traditional rejection measurements such as the First Rank Measurement and the First Two Ranks Measurement. Experiments are conducted on the CENPARMI database of numerals, the CENPARMI Arabic Isolated Numerals Database, and the numerals in the NIST Special Database 19. The results show that LDAM is more effective and can achieve higher reliability while maintaining a high recognition rate on these databases of very different origins and sizes.
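The two traditional rejection measurements named in this abstract can be sketched as follows. The definitions used here are an assumption based on the common reading of the terms (the top confidence value, and the gap between the top two confidence values); the abstract does not define them, and LDAM itself is not implemented. All function names are hypothetical.

```python
def first_rank(confidences):
    """Assumed First Rank Measurement: the highest confidence value."""
    return max(confidences)


def first_two_ranks(confidences):
    """Assumed First Two Ranks Measurement: gap between the two highest confidences."""
    top = sorted(confidences, reverse=True)
    return top[0] - top[1]


def accept(confidences, measure, threshold):
    """Accept the classifier output only if the chosen measurement reaches the threshold."""
    return measure(confidences) >= threshold
```

For outputs such as `[0.48, 0.47, 0.05]`, a threshold on the top confidence alone may accept the result, whereas the gap between the top two classes shows that the decision is ambiguous.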

8.
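To address the shortcomings of current shallow classification methods, such as the excessively large number of training samples required and a weak ability to fit complex functions, an improved pedestrian detection method based on a deep belief network classification algorithm is proposed. First, the input of the deep belief network is improved by building a restricted Boltzmann machine input stage whose visible-layer nodes use a T-distribution function, so that the extracted pedestrian feature information is converted by the visible layer into a Bernoulli-distributed form that the classifier can recognize. Second, an intermediate structure of restricted Boltzmann machines with multiple hidden layers is built to pass data between the hidden layers while retaining key information. Finally, a BP neural network is used as the output stage of the classification structure, back-propagating the classification error and fine-tuning the parameters of the classification structure to continually optimize the classifier. Experiments show that the improved deep belief network pedestrian detection algorithm outperforms classical shallow classification algorithms, and that its detection speed also meets practical requirements.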

9.
This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. To enhance the significance of the small and specific region belonging to the positive class in the decision region, SMOTE is applied to generate synthetic instances of the positive class and so balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. Experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of the proposed algorithm.
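The over-sampling step can be illustrated with a simplified sketch of SMOTE-style interpolation. It assumes each synthetic point is drawn on the line segment between a minority sample and its single nearest minority neighbour (standard SMOTE picks among k nearest neighbours); the function names are hypothetical and the PSO-aided RBF classifier is not shown.

```python
import random


def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and its neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


def oversample(minority, n_new, seed=0):
    """Generate n_new synthetic minority points (needs at least two minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        # Nearest minority neighbour by squared Euclidean distance (k = 1 simplification).
        neighbor = min(others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        synthetic.append(smote_sample(x, neighbor, rng))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, the new instances stay inside the region already occupied by the positive class.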

10.
11.
The recognition of connected handwritten digit strings is a challenging task due mainly to two problems: poor character segmentation and unreliable isolated character recognition. The authors first present a rational B-spline representation of digit templates based on Pixel-to-Boundary Distance (PBD) maps. They then present a neural network approach to extract B-spline PBD templates and an evolutionary algorithm to optimize these templates. In total, 1000 templates (100 templates for each of 10 classes) were extracted from and optimized on 10426 training samples from the NIST Special Database 3. Using these templates, a nearest neighbor classifier can successfully reject 90.7 percent of nondigit patterns while achieving 96.4 percent correct classification of isolated test digits. When the classifier is applied to the recognition of 4958 connected handwritten digit strings (4555 2-digit, 355 3-digit, and 48 4-digit strings) from the NIST Special Database 3 with a dynamic programming approach, it has a correct classification rate of 82.4 percent with a rejection rate as low as 0.85 percent. The classifier compares favorably, in terms of correct classification rate and robustness, with the other classifiers tested.

12.
The generalization error bounds found by current error models using the number of effective parameters of a classifier and the number of training samples are usually very loose. These bounds are intended for the entire input space. However, the support vector machine (SVM), radial basis function neural network (RBFNN), and multilayer perceptron neural network (MLPNN) are local learning machines that treat unseen samples near the training samples as more important. In this paper, we propose a localized generalization error model that bounds from above the generalization error within a neighborhood of the training samples using a stochastic sensitivity measure. It is then used to develop an architecture selection technique for a classifier with maximal coverage of unseen samples by specifying a generalization error threshold. Experiments using 17 University of California at Irvine (UCI) data sets show that, in comparison with cross validation (CV), sequential learning, and two other ad hoc methods, our technique consistently yields the best testing classification accuracy with fewer hidden neurons and less training time.

13.
S. N. P. Neurocomputing, 2008, 71(7-9): 1345-1358
This paper presents a new sequential multi-category classifier using a radial basis function (SMC-RBF) network for real-world classification problems. The classification algorithm processes the training data one by one and builds the RBF network starting with zero hidden neurons. The growth criterion uses the misclassification error, the approximation error to the true decision boundary, and a distance measure between the current sample and the nearest neuron belonging to the same class. SMC-RBF uses the hinge loss function (instead of the mean square loss function) for a more accurate estimate of the posterior probability. For network parameter updates, a decoupled extended Kalman filter is used to reduce the computational overhead. The performance of the proposed algorithm is evaluated on three benchmark problems from the UCI machine learning repository: image segmentation, vehicle, and glass. In addition, a performance comparison is made on two real-world problems in the areas of remote sensing and bio-informatics. The proposed SMC-RBF classifier is also compared with other RBF sequential learning algorithms such as MRAN, GAP-RBFN, and OS-ELM, and with the well-known batch classification algorithm SVM. The results indicate that SMC-RBF produces higher classification accuracy with a more compact network. The study also indicates that using a function approximation algorithm for classification problems may not work well when the classes are not well separated and the training data is not uniformly distributed among the classes.

14.
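A pattern classification system based on fuzzy neural networks   Total citations: 9, self-citations: 0, citations by others: 9
Classification systems based on neural networks are being applied ever more widely in many fields. However, most such systems use an offline adaptation mechanism: when the neural network needs to learn new classification knowledge, it must be retrained, which greatly increases training time. For overlapping classes, a Bayesian classifier is generally constructed. However, constructing a Bayesian classifier requires prior knowledge of the probability density functions of the data to be classified, and this knowledge is often difficult to obtain before pattern classification. To solve these problems, this paper, drawing on fuzzy set theory, proposes a fuzzy-neural-network-based …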

15.
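Texture classification based on support vector machines and distance measures   Total citations: 9, self-citations: 1, citations by others: 9
For the problem of image texture classification, a method is proposed that combines a support vector machine with a distance measure to form a two-stage combined classifier. The front stage classifies with the distance measure: based on the statistical texture features of the images, Euclidean distance is used to measure the similarity between images. If the acceptance condition is met, the classification result is output; otherwise the sample is rejected and passed to the back-stage classifier, which uses a new pattern classification method, the support vector machine. This combined approach not only exploits the high recognition rate of the support vector machine and the speed of the distance measure, but also uses the results of the distance measure to guide the training and testing of the support vector machine. Experiments on texture image classification show that the algorithm achieves high efficiency and recognition accuracy, and that it is also of positive significance for promoting the practical application of the support vector machine as a new pattern classification method.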

16.
In statistical image classification it is usually assumed that feature observations given class labels are independently distributed. Even when the training sample is formed by dependent feature observations, the feature observations to be classified are usually assumed to be independent of the training sample. In this paper we propose an original method for incorporating spatial information into per-pixel classifiers. Our approach drops the independence assumption and proposes a stationary Gaussian random field (GRF) model for the features. The conditional distribution of the class label of the observation to be classified is assumed to depend on its spatial adjacency within the spatial framework of the training sample. For a given training sample, a plug-in version of the Bayes discriminant function (PBDF) is proposed for classification. The performance of the proposed PBDF is tested and compared with that of classifiers that ignore the dependence between the feature observations to be classified and the training sample. For illustration, an image of a figure corrupted by an additive GRF is analyzed. In the first example, the advantage of the proposed classifier over the competing one is shown visually and numerically. In the second example, three spatial sampling designs for training data are compared on the basis of the actual error rates of the proposed PBDF. For a remotely sensed image, the advantage of the proposed classification method over a popular unsupervised classification method is shown in terms of visual evaluation and empirical misclassification errors.

17.
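Co-S3OM: a cooperative semi-supervised classification algorithm   Total citations: 1, self-citations: 0, citations by others: 1
To improve the effectiveness of semi-supervised classification, a semi-supervised classification algorithm based on the SOM neural network and co-training, Co-S3OM (coordination semi-supervised SOM), is proposed. The limited labeled samples are divided into three equal, non-overlapping training sets, and an improved supervised SOM algorithm (SSOM) is used to train three single classifiers. Joint voting by the three single classifiers is used to mine the information hidden in the unlabeled samples and enlarge the set of labeled samples; the training set of each single classifier is expanded in turn, and the final classifier is generated. Experiments on UCI data sets show that Co-S3OM achieves a high labeling rate and classification rate.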
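The joint-voting step can be sketched as follows. This is a hedged illustration only: it assumes an unlabeled sample receives a label when enough of the three classifiers agree, and it treats each classifier as a plain callable rather than an SSOM network. The function name and the `min_votes` parameter are hypothetical and not taken from the paper.

```python
from collections import Counter


def pseudo_label(classifiers, unlabeled, min_votes=3):
    """Label each unlabeled sample on which at least min_votes classifiers agree.

    classifiers: callables mapping a sample to a predicted label
    unlabeled:   iterable of samples
    Returns a list of (sample, label) pairs that can be added to the training sets.
    """
    labeled = []
    for x in unlabeled:
        votes = Counter(clf(x) for clf in classifiers)
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            labeled.append((x, label))
    return labeled
```

Requiring unanimous agreement (`min_votes=3`) adds fewer but more reliable pseudo-labels, while a majority rule (`min_votes=2`) labels more samples at a higher risk of propagating errors.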

18.
A theoretical analysis of bagging as a linear combination of classifiers   Total citations: 1, self-citations: 0, citations by others: 1
We apply an analytical framework for the analysis of linearly combined classifiers to ensembles generated by bagging. This provides an analytical model of bagging misclassification probability as a function of the ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions. This allows us to derive a novel and theoretically grounded guideline for choosing the bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow a comparison of the bagging misclassification probability with that of an individual classifier trained on the original training set, we discuss how the theoretical framework considered could be exploited for this purpose.

19.
The area under the ROC curve (AUC) provides a good scalar measure of ranking performance without requiring a specific threshold for performance comparison among classifiers. AUC is useful in imprecise environments since it is independent of class distributions and misclassification costs. Direct optimization of the AUC criterion thus becomes a natural choice for binary classifier design. However, a direct formulation based on the AUC criterion incurs a high computational cost because the number of input pair features grows drastically. In this paper, we propose an online learning algorithm to circumvent this computational problem for binary classification. Unlike conventional recursive formulations, the proposed formulation involves a pairwise cost function that pairs a newly arrived data point with the stored data points of the opposite class. Moreover, by incorporating sparse learning into the online formulation, the computational effort can be significantly reduced. Our empirical results on public databases of three different scales show promising potential in terms of classification AUC, accuracy, and computational efficiency.
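The pairwise nature of the AUC criterion, and the reason its direct computation grows quickly with the data size, can be seen in a minimal sketch. It computes AUC as the fraction of positive-negative pairs that are ranked correctly, which is the standard rank-based definition; it is not the online algorithm proposed in the paper, and the function name is hypothetical.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The double loop visits
    len(scores_pos) * len(scores_neg) pairs, which is the source of the
    computational cost of pairwise formulations.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(scores_pos) * len(scores_neg))
```

For example, with positive scores `[0.9, 0.3]` and negative scores `[0.5, 0.1]`, three of the four pairs are ordered correctly, so the function returns 0.75.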

20.
A multichannel approach to fingerprint classification   Total citations: 29, self-citations: 0, citations by others: 29
Fingerprint classification provides an important indexing mechanism in a fingerprint database. An accurate and consistent classification can greatly reduce fingerprint matching time for a large database. We present a fingerprint classification algorithm that achieves an accuracy better than previously reported in the literature. We classify fingerprints into five categories: whorl, right loop, left loop, arch, and tented arch. The algorithm uses a novel representation (FingerCode) and is based on a two-stage classifier. It has been tested on 4000 images in the NIST-4 database. For the five-class problem, a classification accuracy of 90 percent is achieved (with 1.8 percent rejection during the feature extraction phase). For the four-class problem (arch and tented arch combined into one class), we achieve a classification accuracy of 94.8 percent (with 1.8 percent rejection). By incorporating a reject option at the classifier, the classification accuracy can be increased to 96 percent for the five-class classification task and to 97.8 percent for the four-class classification task after a total of 32.5 percent of the images are rejected.
