Similar Documents
20 similar documents found.
1.
On the use of ROC analysis for the optimization of abstaining classifiers
Classifiers that refrain from classification in certain cases can significantly reduce the misclassification cost. However, the parameters of such abstaining classifiers are often set in a rather ad hoc manner. We propose a method that uses ROC analysis to optimally build a specific type of abstaining binary classifier. These classifiers are built based on optimization criteria in three models: cost-based, bounded-abstention, and bounded-improvement. We show that selecting the optimal classifier in the first model is similar to using known iso-performance lines and uses only the slopes of ROC curves, whereas selecting the optimal classifier in the remaining two models is not straightforward. We investigate the properties of the convex-down ROCCH (ROC convex hull) and present a simple and efficient algorithm for finding the optimal classifier in the bounded-abstention and bounded-improvement models. We demonstrate how these models can effectively reduce the misclassification cost in real-life classification systems. The method has been validated with an ROC-building algorithm and cross-validation on 15 UCI KDD datasets. An early version of this paper was published at ICML 2005. Action Editor: Johannes Fürnkranz.
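The cost-based model can be illustrated with a brute-force sketch. It assumes a scoring classifier, a two-threshold rule (predict negative below `t_lo`, positive at or above `t_hi`, abstain in between), and hypothetical per-error costs `c_fp`, `c_fn` and per-abstention cost `c_ab`. Rather than the ROC-slope method the abstract describes, it simply tries every pair of observed scores as thresholds, which is only practical for small samples.

```python
from itertools import combinations_with_replacement


def abstain_cost(scores, labels, t_lo, t_hi, c_fp, c_fn, c_ab):
    """Total cost of a two-threshold abstaining classifier (t_lo <= t_hi).

    score >= t_hi -> predict positive; score < t_lo -> predict negative;
    anything in between is left unclassified (abstain).
    """
    cost = 0.0
    for s, y in zip(scores, labels):
        if s >= t_hi:
            cost += c_fp if y == 0 else 0.0   # false positive
        elif s < t_lo:
            cost += c_fn if y == 1 else 0.0   # false negative
        else:
            cost += c_ab                      # abstention
    return cost


def best_thresholds(scores, labels, c_fp, c_fn, c_ab):
    """Exhaustive search over threshold pairs taken from the observed scores.

    Returns (cost, t_lo, t_hi). O(n^3), so only suitable for small samples.
    """
    # Every observed score is a candidate cut; +inf allows "no positives".
    cands = sorted(set(scores)) + [float("inf")]
    best = None
    # Pairs come out with t_lo <= t_hi because cands is sorted.
    for t_lo, t_hi in combinations_with_replacement(cands, 2):
        c = abstain_cost(scores, labels, t_lo, t_hi, c_fp, c_fn, c_ab)
        if best is None or c < best[0]:
            best = (c, t_lo, t_hi)
    return best


scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(best_thresholds(scores, labels, c_fp=1.0, c_fn=1.0, c_ab=0.2))  # (0.4, 0.4, 0.8)
```

On this toy data, abstaining on the two ambiguous scores costs 0.4, whereas the best single threshold (no abstention) costs 1.0.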

2.
Qin Feng, Yang Bo, Cheng Zekai. Microcomputer Development (微机发展), 2006, 16(10): 85-88
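In data mining, models built by different classifiers differ in performance, and evaluating classifier performance is the basis for selecting a good classifier. To evaluate classifier performance better, this paper studies classifier performance evaluation criteria. It analyzes some problems that arise when traditional evaluation criteria are applied, focuses on the ROC (Receiver Operating Characteristic) curve and AUC (area under the ROC curve) evaluation methods, and examines their strengths and weaknesses. The comparative analysis shows that, although the ROC curve and AUC have certain shortcomings, the attractive properties they exhibit in classifier evaluation give them broad application prospects.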
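As an illustration of the two evaluation tools discussed above, the sketch below builds ROC points by sweeping the decision threshold over the classifier scores and computes the AUC with the trapezoidal rule. It cross-checks the result against the equivalent rank (Mann-Whitney) formulation, in which AUC is the probability that a random positive is scored above a random negative, with ties counted as one half. The function names are illustrative, and labels are assumed to be 0/1.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the threshold step by step.

    Examples with tied scores are added together, so ties give a diagonal
    segment rather than an order-dependent staircase.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:  # consume one tie group
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```

Both functions give 8/9 here: 8 of the 9 positive-negative pairs are ranked correctly.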

3.
The simultaneous use of multiple classifiers has been shown to improve performance in classification problems. The selection of an optimal set of classifiers is an important part of multiple classifier systems, and independence of the classifier outputs is generally considered an advantage for obtaining better multiple classifier systems. In this paper, the need for classifier independence is examined from the point of view of classification performance. The performance achieved with classifiers having independent joint distributions is compared with that of classifiers defined to have the best and worst joint distributions. These distributions are obtained by formulating the combination operation as an optimization problem. The analysis revealed several important observations about classifier selection, which are then used to analyze the problem of selecting an additional classifier to add to an existing multiple classifier system.

4.
Ni Huangjing, Wang Wei. Computer Engineering (计算机工程), 2011, 37(10): 160-161
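Different base classifiers vary considerably in how well they adapt to multi-class imbalanced data with different distribution types. To address the problem of choosing a classifier, this paper analyzes and compares the accuracy (ACC) and area under the curve (AUC) evaluation criteria and adopts an AUC-based classifier evaluation method. Support vector machines, decision trees, and Bayesian classifiers are applied to standard datasets and the results are evaluated with AUC, leading to the following conclusions: on multi-class imbalanced data, the Bayesian classifier is the best base classifier, and the SVM classifier leaves some room for improvement.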
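The abstract does not say which multi-class extension of AUC was used. One common choice, assumed here purely for illustration, is to compute a one-vs-rest AUC for each class and take the unweighted mean (macro averaging). `prob[i][k]` is a hypothetical per-class score of sample `i` for class `classes[k]`.

```python
def binary_auc(scores, is_pos):
    """Rank-based AUC; ties between a positive and a negative count 1/2."""
    pos = [s for s, p in zip(scores, is_pos) if p]
    neg = [s for s, p in zip(scores, is_pos) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob, y, classes):
    """Unweighted mean of one-vs-rest AUCs over all classes.

    prob[i][k] is the classifier's score of sample i for classes[k];
    every class must occur at least once and not be the only class.
    """
    aucs = []
    for k, c in enumerate(classes):
        scores = [row[k] for row in prob]
        is_pos = [yi == c for yi in y]
        aucs.append(binary_auc(scores, is_pos))
    return sum(aucs) / len(aucs)


prob = [[0.8, 0.1, 0.1],
        [0.1, 0.7, 0.2],
        [0.2, 0.2, 0.15],
        [0.6, 0.3, 0.1]]
y = [0, 1, 2, 0]
print(macro_ovr_auc(prob, y, classes=[0, 1, 2]))  # (1 + 1 + 2/3) / 3
```

Because each class contributes equally to the mean, a minority class that is ranked poorly pulls the score down as much as a majority class would, which is one reason AUC-style measures are preferred over plain accuracy on imbalanced data.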

5.
Epileptic seizures are manifestations of epilepsy. Careful analysis of electroencephalogram (EEG) records can provide valuable insight into, and improved understanding of, the mechanisms causing epileptic disorders. The detection of epileptiform discharges in the EEG is an important component of the diagnosis of epilepsy. Because EEG signals are non-stationary, conventional frequency analysis is not highly successful for diagnostic classification. This paper presents a novel method for analyzing EEG signals using the wavelet transform and classifying them using an artificial neural network (ANN) and logistic regression (LR). The wavelet transform is particularly effective for representing various aspects of non-stationary signals, such as trends, discontinuities, and repeated patterns, where other signal processing approaches fail or are less effective. Through wavelet decomposition of the EEG records, transient features are accurately captured and localized in both time and frequency. For epileptic seizure classification, we used the lifting-based discrete wavelet transform (LBDWT) as a preprocessing method to increase computational speed. The proposed algorithm reduces the computational load of algorithms based on the classical wavelet transform (CWT). In this study, we introduce two fundamentally different approaches for designing classification models (classifiers): the traditional statistical method based on logistic regression and the emerging, computationally powerful techniques based on ANNs. Logistic regression and multilayer perceptron neural network (MLPNN) based classifiers were developed and compared in terms of their accuracy in classifying EEG signals. In both methods, we used the LBDWT coefficients of EEG signals as input to a classification system with two discrete outputs: epileptic seizure or non-epileptic seizure.
By identifying features in the signal, we aim to provide an automatic system that supports physicians in the diagnostic process. By combining LBDWT with an MLPNN, we obtained a novel and reliable classifier architecture. The comparisons between the developed classifiers were based primarily on analysis of receiver operating characteristic (ROC) curves as well as a number of scalar performance measures pertaining to the classification. The MLPNN-based classifier outperformed its LR-based counterpart and was the more accurate of the two.
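The lifting idea behind LBDWT can be sketched with the simplest wavelet, Haar; the abstract does not say which wavelet the authors used, so Haar is an assumption here. Each level splits the signal into even and odd samples, predicts the odd samples from the even ones (the residual becomes the detail coefficients), and updates the even samples into a running average (the approximation coefficients). Each step is simple and exactly invertible, which is what makes lifting attractive compared with a convolution-based filter-bank implementation.

```python
def haar_lift(x):
    """One level of the (unnormalised) Haar DWT computed by lifting.

    Returns (approximation, detail); the input length must be even.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]                          # split
    detail = [o - e for e, o in zip(even, odd)]           # predict
    approx = [e + d / 2 for e, d in zip(even, detail)]    # update
    return approx, detail


def haar_unlift(approx, detail):
    """Exact inverse: undo the update, undo the predict, then merge."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


def wavedec(x, levels):
    """Multi-level decomposition ordered as [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_lift(approx)
        details.append(d)
    return [approx] + details[::-1]


signal = [4, 6, 10, 12, 8, 6, 5, 5]
print(wavedec(signal, 2))  # [[8.0, 6.0], [6.0, -2.0], [2, 2, -2, 0]]
```

In a pipeline like the one described, coefficients such as these (or statistics computed from each subband) would form the input vector of the classifier; the exact feature set used in the paper is not given in the abstract.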

6.
We exploit an evolutionary three-objective optimization algorithm to produce a Pareto front approximation composed of fuzzy rule-based classifiers (FRBCs) with different trade-offs between accuracy (expressed in terms of sensitivity and specificity) and complexity (computed as the sum of the conditions in the antecedents of the classifier rules). Then, we use the ROC convex hull method to select the potentially optimal classifiers in the projection of the Pareto front approximation onto the ROC plane. Our method was tested on 13 highly imbalanced datasets and compared with two two-objective evolutionary approaches and one heuristic approach to FRBC generation, as well as with three well-known classifiers. Using the Wilcoxon signed-rank test, we show that our three-objective optimization approach outperforms all the other techniques, except for one classifier, in terms of the area under the ROC convex hull, an accuracy measure used to globally compare different classification approaches. Further, all the FRBCs in the ROC convex hull are characterized by low complexity. Finally, we discuss how, once the misclassification costs and the class distributions are fixed, we can select the most suitable classifier for the specific application. We show that, in two specific medical applications, the FRBC selected from the convex hull produced by our three-objective optimization approach achieves the lowest classification cost among the techniques used for comparison.
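Selecting a classifier from the ROC convex hull for fixed costs and class priors can be sketched as follows: build the upper convex hull of the (FPR, TPR) points, then pick the hull vertex with the smallest expected cost p(neg) * FPR * c_FP + p(pos) * (1 - TPR) * c_FN. The classifier points, priors and costs below are assumptions for illustration, not values from the study.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points (FPR, TPR), anchored at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it does not make a clockwise turn,
        # i.e. while it lies on or below the segment to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def min_cost_point(hull, p_pos, c_fp, c_fn):
    """Hull vertex with the lowest expected misclassification cost."""
    p_neg = 1.0 - p_pos

    def expected_cost(pt):
        fpr, tpr = pt
        return p_neg * fpr * c_fp + p_pos * (1.0 - tpr) * c_fn

    return min(hull, key=expected_cost)


classifiers = [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9), (0.7, 0.97)]
hull = roc_convex_hull(classifiers)
print(hull)                                  # (0.3, 0.6) is not on the hull
print(min_cost_point(hull, 0.5, 1.0, 1.0))   # equal error costs
print(min_cost_point(hull, 0.5, 1.0, 5.0))   # false negatives 5x as costly
```

Raising the cost of false negatives moves the optimum to a hull vertex further right (higher TPR at the price of higher FPR), which is the iso-performance-line argument in discrete form.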

7.
The area under the ROC curve (AUC) provides a good scalar measure of ranking performance that does not require a specific threshold for comparing classifiers. AUC is useful in imprecise environments because it does not depend on class distributions or misclassification costs. Direct optimization of the AUC criterion is thus a natural choice for binary classifier design. However, a direct formulation based on the AUC criterion incurs a high computational cost because the number of input pairs grows rapidly (as the product of the numbers of positive and negative examples). In this paper, we propose an online learning algorithm that circumvents this computational problem for binary classification. Unlike conventional recursive formulations, the proposed formulation uses a pairwise cost function that pairs each newly arrived data point with the stored data points of the opposite class. Moreover, by incorporating sparse learning into the online formulation, the computational effort can be significantly reduced. Our empirical results on public databases of three different scales show promising potential in terms of classification AUC, accuracy, and computational efficiency.
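A simplified sketch of the pairwise online idea: a linear scorer whose weights are updated with a pairwise hinge loss each time a point arrives, pairing it with every stored point of the opposite class. Unlike the paper, the buffer here grows without bound and no sparse learning is applied; the learning rate `lr` and regularization strength `reg` are hypothetical parameters.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlinePairwiseAUC:
    """Linear scorer trained online with a pairwise hinge loss (a sketch).

    Each arriving example is paired with every stored example of the
    opposite class; pairs with w.(x_pos - x_neg) < 1 push w towards ranking
    the positive higher. The buffer is unbounded (no sparsification).
    """

    def __init__(self, dim, lr=0.1, reg=0.0):
        self.w = [0.0] * dim
        self.lr, self.reg = lr, reg
        self.buffer = {0: [], 1: []}

    def update(self, x, y):
        others = self.buffer[1 - y]
        grad = [self.reg * wi for wi in self.w]    # L2 regularisation term
        for z in others:
            pos, neg = (x, z) if y == 1 else (z, x)
            diff = [p - n for p, n in zip(pos, neg)]
            if dot(self.w, diff) < 1.0:             # margin violated
                for j, dj in enumerate(diff):
                    grad[j] -= dj / len(others)     # mean hinge subgradient
        self.w = [wi - self.lr * g for wi, g in zip(self.w, grad)]
        self.buffer[y].append(list(x))

    def score(self, x):
        return dot(self.w, x)


stream = [([1.0, 0.3], 1), ([-1.0, 0.2], 0), ([2.0, -0.5], 1), ([-2.0, -0.4], 0)]
model = OnlinePairwiseAUC(dim=2)
for x, y in stream:
    model.update(x, y)
print(model.w)
```

The per-update cost is proportional to the number of stored opposite-class points, which is exactly the growth that the paper's sparse learning component is meant to curb.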

8.
Current artificial immune system (AIS) classifiers have two major problems: 1) their populations of B-cells can grow to huge proportions, and 2) optimizing one B-cell (part of the classifier) at a time does not necessarily guarantee that the B-cell pool (the whole classifier) will be optimized. In this paper, the design of a new AIS algorithm and classifier system called simple AIS is described. It differs from traditional AIS classifiers in that it uses only one B-cell, instead of a B-cell pool, to represent the classifier. This approach ensures global optimization of the whole system, and in addition, no population control mechanism is needed. The classifier was tested on seven benchmark data sets using different classification techniques and was found to be very competitive with other classifiers.

9.
This work uses multiobjective particle swarm optimization (PSO) to adapt an optimal neural network topology for the classification of multispectral satellite images. The classification is per-pixel and supervised, using the spectral bands (the original feature space). This paper also presents a thorough experimental analysis of the behavior of the neural network classifier on this problem. Based on 1050 experiments, we conclude that the following two critical issues need to be addressed: (1) selection of the most discriminative spectral bands and (2) determination of the optimal number of nodes in the hidden layer. We propose a new methodology based on the multiobjective particle swarm optimization (MOPSO) technique to determine the discriminative spectral bands and the number of hidden-layer nodes simultaneously. The accuracy of the neural network structure thus obtained is compared with that of traditional classifiers such as MLC and the Euclidean classifier. The performance of the proposed classifier is evaluated quantitatively using the Xie-Beni and β indexes. The results show the superiority of the proposed method over the conventional ones.

10.
Non-parametric classification procedures based on a certainty measure and the nearest neighbour rule were explored for motor unit potential (MUP) classification during electromyographic (EMG) signal decomposition. A diversity-based classifier fusion approach is developed and evaluated to achieve improved classification performance. The developed system allows the construction of a set of non-parametric base classifiers and then automatically chooses, from the pool of base classifiers, subsets of classifiers to form candidate classifier ensembles. The system selects the classifier ensemble members by exploiting a diversity measure for selecting classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between base classifier outputs, i.e., to measure the degree of decision similarity between base classifiers. The pool of base classifiers consists of two kinds of classifiers, adaptive certainty-based classifiers (ACCs) and adaptive fuzzy k-NN classifiers (AFNNCs), which use different types of features. Once the patterns have been assigned to their classes by the classifier fusion system, firing pattern consistency statistics for each class are calculated to detect classification errors in an adaptive fashion. The performance of the developed system was evaluated using real and simulated EMG signals and compared with the performance of the constituent base classifiers and of the fixed ensemble containing the full set of base classifiers. Across the EMG signal data sets used, the diversity-based classifier fusion approach had better overall average classification performance, especially in terms of reducing classification errors.
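Cohen's kappa between two classifiers' outputs measures agreement beyond what their label frequencies would produce by chance, so a lower kappa means a more diverse pair. The sketch below computes kappa and, as one simple hypothetical selection rule, picks the team with the lowest mean pairwise kappa; the paper's actual team-selection procedure may also take accuracy into account.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / (n * n)      # chance agreement
    if p_exp == 1.0:
        # Both outputs are the same constant label: kappa is undefined,
        # so treat it as full agreement (no diversity).
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def most_diverse_team(outputs, size):
    """Indices of the `size` classifiers with the lowest mean pairwise kappa."""
    def mean_kappa(team):
        pairs = list(combinations(team, 2))
        return sum(cohen_kappa(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)
    return min(combinations(range(len(outputs)), size), key=mean_kappa)


out_a = [0, 0, 1, 1, 2, 2]
out_b = [0, 0, 1, 2, 2, 2]
print(cohen_kappa(out_a, out_b))                    # (5/6 - 1/3) / (1 - 1/3)
print(most_diverse_team([out_a, out_a, out_b], 2))  # (0, 2)
```

Two identical classifiers have kappa 1 and add nothing to a team, which is why the duplicated output is never paired with its copy.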

11.
Pattern Recognition, 2002, 35(11): 2397-2412
This paper introduces a multinomial selection problem (MSP) procedure as an alternative to classification accuracy and receiver operating characteristic analysis for evaluating competing pattern recognition algorithms. This new application of MSP demonstrates increased differentiation power over traditional classifier evaluation methods when applied to three “toy” problems of varying difficulty. The MSP procedure is also used to compare the performance of statistical classifiers and artificial neural networks on three real-world classification problems. The results provide confidence in the MSP procedure as a useful tool in distinguishing between competing classifiers and providing insights on the strength of conviction of a classifier.

12.
Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that use only labeled data (i.e., data whose true class is known) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms, hereafter referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which use unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are treated as optimization constraints. The introduced test samples cause the learned decision boundary in a multi-dimensional feature space to adapt not only to the distribution of labeled training data but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 randomly selected datasets (the number determined by power analysis) from the University of California, Irvine machine learning repository. Wilcoxon signed-rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

13.
The receiver operating characteristic (ROC) formulation of the two-class signal detection problem is well known, and its present theory is based on decision theory and psychophysics. Statistical procedures developed for analyzing these human-observer detection experiments can be extended to analyzing pattern recognition experiments with computer-based classification schemes. This article presents an introduction to statistical estimation and hypothesis testing methodology that can be employed in analyzing the performance of various classifiers. The methodology is illustrated by analyzing the performance of two classifiers in a breast cancer detection task.
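The abstract does not name specific formulas. One widely used approximation for the standard error of an AUC estimate is that of Hanley and McNeil (1982), sketched below together with a z test for two AUCs measured on independent case samples. Comparing two classifiers on the same cases additionally requires a correlation term, which this sketch omits; treat it as an illustration, not the article's method.

```python
import math


def auc_se_hanley_mcneil(auc, n_pos, n_neg):
    """Approximate standard error of an AUC estimate (Hanley & McNeil, 1982).

    q1 and q2 use Hanley & McNeil's exponential-model approximations.
    """
    q1 = auc / (2.0 - auc)              # P(two random positives both outrank a negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # P(a positive outranks two random negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def z_test_independent(auc1, se1, auc2, se2):
    """z statistic and two-sided p-value for AUCs from independent samples."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p


se_a = auc_se_hanley_mcneil(0.85, 50, 50)
se_b = auc_se_hanley_mcneil(0.78, 50, 50)
print(se_a, se_b, z_test_independent(0.85, se_a, 0.78, se_b))
```

The standard error shrinks as either class sample grows, so the same AUC difference becomes easier to declare significant with more cases.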

14.
Recent research in fault classification has shown the importance of accurately selecting the features to be used as inputs to the diagnostic model. In this work, a multi-objective genetic algorithm (MOGA) is used for the feature selection phase. Then, two different techniques for using the selected features to develop the fault classification model are compared: a single classifier based on the feature subset with the best classification performance, and an ensemble of classifiers working on different feature subsets. The motivation for developing ensembles of classifiers is that they can achieve higher accuracies than single classifiers. An important requirement for an ensemble to be effective is diversity in the predictions of its base classifiers, i.e., their capability of erring on different sub-regions of the pattern space. To show the benefits of having diverse base classifiers in the ensemble, two different ensembles have been developed: in the first, the base classifiers are constructed on feature subsets found by MOGAs aimed at maximizing fault classification performance and minimizing the number of features in the subsets; in the second, diversity among classifiers is added to the MOGA search as a third objective function to maximize. In both cases, a voting technique is used to combine the predictions of the base classifiers into the ensemble output. For verification, numerical experiments are conducted on a case of multiple-fault classification in rotating machinery, and the results achieved by the two ensembles are compared with those obtained by a single optimal classifier.
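The voting step can be sketched as a plurality vote in which each base classifier sees only its own feature subset. The base classifiers here are hypothetical threshold rules standing in for classifiers trained on MOGA-selected subsets; the abstract does not specify the exact voting variant.

```python
from collections import Counter


def majority_vote(predictions):
    """Plurality vote; ties go to the label seen first (Counter keeps
    first-encountered order among equal counts)."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base, x):
    """base: list of (feature_indices, classifier); each classifier sees
    only its own feature subset of the pattern x."""
    votes = [clf([x[i] for i in feats]) for feats, clf in base]
    return majority_vote(votes)


# Hypothetical base classifiers standing in for models trained on
# MOGA-selected feature subsets.
base = [
    ([0], lambda v: "fault" if v[0] > 0.5 else "ok"),
    ([1], lambda v: "fault" if v[0] > 0.5 else "ok"),
    ([1, 2], lambda v: "fault" if sum(v) > 1.0 else "ok"),
]
print(ensemble_predict(base, [0.2, 0.9, 0.7]))  # votes: ok, fault, fault
```

The vote only helps when the base classifiers err on different patterns, which is the motivation for adding diversity as a third MOGA objective.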

15.
Automatic text classification is one of the most important tools in information retrieval. This paper presents a novel text classifier that learns from positive and unlabeled (PU) examples. The primary challenge of this problem, compared with classical text classification, is that no labeled negative documents are available in the training set. First, we identify many more reliable negative documents with an improved 1-DNF algorithm that has a very low error rate. Second, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Third, unlike previous PU-oriented text classification work, we construct the final classifier from a weighted vote of all classifiers generated during the iterations instead of choosing one of them as the final classifier. Finally, we discuss an approach based on PSO (Particle Swarm Optimization) that determines the voting weights by searching for the best combination of weights. In addition, we built a focused crawler based on link contexts guided by different classifiers to evaluate our method. Several comprehensive experiments were conducted using the Reuters data set and thousands of web pages. Experimental results show that our method improves performance (F1-measure) over PEBL, and that a focused web crawler guided by our PSO-based classifier outperforms several other classifiers in both harvest rate and target recall.

16.
In this paper, a new medical image classification scheme is proposed using a self-organizing map (SOM) combined with a multiscale technique. It addresses the problem of handling edge pixels in traditional multiscale SOM classifiers. First, to avoid the difficulty of manually selecting edge pixels, a multiscale edge detection algorithm based on the wavelet transform is proposed. The detected edge pixels are then added to the training set as a new class, and a multiscale SOM classifier is trained on this training set. In this new scheme, the SOM classifier can perform classification of the entire image and edge detection simultaneously. In addition, the misclassification of the traditional multiscale SOM classifier in regions near edges is greatly reduced, and correct classification is improved at the same time.

17.
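Ensemble learning is a data mining method that can effectively improve the performance of classification systems. In this work, a dynamic classifier ensemble selection algorithm is used to intelligently evaluate the sensory quality of cigarettes. First, a classifier pool containing multiple base classifiers is generated; next, classifiers that meet the requirements are selected according to how the base classifiers perform in the neighborhood of the test sample; finally, the selected classifiers produce the final prediction. To verify the effectiveness of the method, comparative experiments were conducted on a historical dataset of cigarette sensory evaluations provided by a domestic tobacco company. The experimental results show that this method achieves clearly better results than the other methods compared.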
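The three steps (pool, local competence in the neighborhood of the test sample, combination) can be sketched as follows. The neighborhood is taken as the `k` nearest validation samples under Euclidean distance, and "meeting the requirements" is assumed to mean a local accuracy of at least `min_acc`; both are hypothetical choices, as the abstract does not specify them.

```python
import math
from collections import Counter


def k_nearest(x, X_val, k):
    """Indices of the k validation samples closest to x (Euclidean)."""
    ranked = sorted((math.dist(x, v), i) for i, v in enumerate(X_val))
    return [i for _, i in ranked[:k]]


def des_predict(x, pool, X_val, y_val, k=5, min_acc=0.8):
    """Dynamic ensemble selection by local accuracy (hypothetical rule)."""
    nbrs = k_nearest(x, X_val, k)
    # Competence of each base classifier in the neighborhood of x.
    local_acc = [sum(clf(X_val[i]) == y_val[i] for i in nbrs) / len(nbrs)
                 for clf in pool]
    chosen = [clf for clf, acc in zip(pool, local_acc) if acc >= min_acc]
    if not chosen:  # nobody qualifies: fall back to the locally best one
        chosen = [pool[max(range(len(pool)), key=local_acc.__getitem__)]]
    return Counter(clf(x) for clf in chosen).most_common(1)[0][0]


X_val = [[0], [1], [2], [8], [9], [10]]
y_val = [0, 0, 0, 1, 1, 1]
pool = [lambda v: int(v[0] > 5),   # accurate everywhere
        lambda v: 0,               # only right in the low region
        lambda v: 1]               # only right in the high region
print(des_predict([9], pool, X_val, y_val, k=3))
print(des_predict([1], pool, X_val, y_val, k=3))
```

The constant classifiers are useless globally but locally perfect in one region, so the selection step lets each one vote only where it is competent.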

18.
PAV and the ROC convex hull
Classifier calibration is the process of converting classifier scores into reliable probability estimates. Recently, a calibration technique based on isotonic regression has gained attention within machine learning as a flexible and effective way to calibrate classifiers. We show that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method. Editor: Johannes Fürnkranz.
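A minimal PAV sketch for calibration: sort examples by score, then fit a non-decreasing step function to the 0/1 labels by repeatedly pooling adjacent blocks that violate monotonicity. This simplified version does not pool tied scores, which a full implementation should do. The abstract states that the resulting calibration is equivalent to the ROC convex hull method.

```python
def pav(y, w=None):
    """Pool Adjacent Violators: weighted least-squares non-decreasing fit to y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def calibrate(scores, labels):
    """(score, calibrated probability) pairs, ordered by increasing score.

    Simplification: tied scores are not pooled before running PAV.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = pav([labels[i] for i in order])
    return [(scores[i], p) for i, p in zip(order, fitted)]


scores = [0.1, 0.3, 0.35, 0.5, 0.7, 0.9]
labels = [0, 1, 0, 0, 1, 1]
for s, p in calibrate(scores, labels):
    print(s, round(p, 3))
```

Here the violating run of labels 1, 0, 0 is pooled into one block with probability 1/3, giving the step function 0, 1/3, 1/3, 1/3, 1, 1.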

19.
Predicting future stock index price movements has always been a fascinating research area, both for investors who wish to profit by trading stocks and for researchers who attempt to expose the information buried in complex stock market time series data. This prediction problem can be addressed as a binary classification problem with two class labels, one for upward movement and the other for downward movement. In the literature, a wide range of classifiers has been tested for this application. Because the performance of individual classifiers varies across datasets and performance measures, it is impractical to declare any specific classifier the best. Hence, designing an efficient classifier ensemble instead of relying on an individual classifier is attracting increasing attention from researchers. Moreover, selecting base classifiers and deciding their preferences in the ensemble with respect to a variety of performance criteria can be treated as a Multi-Criteria Decision Making (MCDM) problem. In this paper, an integrated TOPSIS and Crow Search based weighted voting classifier ensemble is proposed for stock index price movement prediction. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), one of the popular MCDM techniques, is used to rank and select a set of base classifiers for the ensemble, whereas the weights of the classifiers in the ensemble are tuned by the Crow Search method. The proposed ensemble model is validated by predicting stock index prices over the historical prices of the BSE SENSEX, S&P 500, and NIFTY 50 stock indices. The model showed better performance than individual classifiers and other ensemble models, such as majority voting, weighted voting, and differential evolution and particle swarm optimization based classifier ensembles.
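TOPSIS ranks alternatives (here, base classifiers) by their relative closeness to an ideal solution: normalize each criterion column, apply the criterion weights, find the best and worst value per criterion, and score each alternative as d_worst / (d_best + d_worst), where d_best and d_worst are its Euclidean distances to the ideal and anti-ideal points. The criteria, weights, and the weighted-vote helper below are illustrative assumptions; in the paper the voting weights are tuned by Crow Search, which is not reproduced here.

```python
import math


def topsis(matrix, weights, benefit):
    """TOPSIS closeness coefficients (higher is better).

    matrix[i][j]: value of alternative i on criterion j.
    benefit[j]: True if larger is better, False for a cost criterion.
    """
    n_crit = len(matrix[0])
    # 1) vector-normalise each column, 2) apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 3) ideal (best) and anti-ideal (worst) value per criterion
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 4) relative closeness to the ideal solution
    closeness = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.5)  # 0.5 if all identical
    return closeness


def weighted_vote(predictions, weights):
    """Label with the largest summed weight among the classifiers' votes."""
    tally = {}
    for label, wt in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + wt
    return max(tally, key=tally.get)


# Rows: hypothetical base classifiers; columns: accuracy, F1, training time (s).
matrix = [[0.62, 0.60, 12.0],
          [0.58, 0.57, 3.0],
          [0.60, 0.61, 5.0]]
scores = topsis(matrix, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
ranking = sorted(range(len(matrix)), key=lambda i: -scores[i])
print(ranking, [round(s, 3) for s in scores])
print(weighted_vote(["up", "down", "down"], [0.6, 0.3, 0.2]))  # "up": 0.6 > 0.5
```

Marking training time as a cost criterion (`benefit=False`) makes slower classifiers less attractive, so the ranking trades accuracy against cost rather than maximizing a single metric.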

20.
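Entity resolution often uses a classifier to label record pairs as match, non-match, or possible match based on the field-similarity vector of each pair, so the accuracy of the classifier directly determines the accuracy of entity resolution. To improve classification accuracy, this paper builds a multiple classifier system based on resampling and ensemble selection techniques. Making full use of the characteristics of entity resolution, samples that are hard to classify are identified before classification, and the resampling ratio is varied within an interval to generate a set of resampled samples. Classifiers are then trained on the resampled samples to build a parallel multiple classifier system. Emphasizing the diversity and sparsity among the classifiers, an optimal subset of classifiers, i.e., the optimal combination of resampling ratios, is selected from this system, and the ensemble selection model is solved with nonlinear programming and with an extremum method, respectively. Experimental results show that the method achieves higher accuracy than existing multiple classifier systems.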
