Similar literature
 20 similar documents found (search time: 0 ms)
1.
In this note we use examples from the literature to illustrate some poor practices in assessing the performance of supervised classification rules, and we suggest guidelines for better methodology. We also describe a new assessment criterion that is suitable for the needs of many practical problems.

2.
Toward optimal classifier system performance in non-Markov environments   Total citations: 2, self-citations: 0, citations by others: 2
Wilson's (1994) bit-register memory scheme was incorporated into the XCS classifier system and investigated in a series of non-Markov environments. Two extensions to the scheme were important in obtaining near-optimal performance in the harder environments. The first was an exploration strategy in which exploration of external actions was probabilistic as in Markov environments, but internal "actions" (register settings) were selected deterministically. The second was use of a register having more bit-positions than were strictly necessary to resolve environmental aliasing. The origins and effects of the two extensions are discussed.

3.
4.
The identification of significant attributes is of major importance to the performance of a variety of Learning Classifier Systems, including the recently introduced Bioinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) algorithm. However, BioHEL fails on a set of synthetic datasets, consisting of checkerboard data mixed with Gaussian noise, because the significant attributes are not successfully recognised. To address this issue, a univariate Estimation of Distribution Algorithm (EDA) technique is introduced into BioHEL; it builds a probabilistic model primarily from the outcome of the generalization and specialization operations. The probabilistic model, which estimates the significance of each attribute, guides the exploration of the problem space. Experimental evaluations showed that the proposed BioHEL systems achieved performance comparable to the conventional one on a number of small-scale real-world datasets. Research effort was also devoted to finding the optimal parameters for the traditional and proposed BioHEL systems.

5.
On-line learning systems which use incoming batches of training examples to induce rules for a classification task, such as credit card fraud detection, may have to deal with concept drift, whereby some of the underlying class definitions change over time. Identifying drift against a background of noise and maintaining the accuracy of the learned rules are challenging tasks. We propose a methodology for handling these problems based on the assessment of the relevance of a time-stamp attribute (TSAR). In place of the time-windowing of examples that tends to be used in current approaches, we employ a new purging mechanism to remove examples that are no longer valid but retain valid examples regardless of age. This allows the example base to grow, thus facilitating good classification. We describe one particular TSAR algorithm, CD3, which utilises ID3 with post-pruning. We report on trials that show CD3 can cope very well in a variety of batch-drift scenarios.

6.
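Qin Feng (秦锋), Luo Hui (罗慧), Cheng Zekai (程泽凯), Ren Shiliu (任诗流), Chen Li (陈莉). Computer Engineering and Design (《计算机工程与设计》), 2007, 28(24): 5919-5920, 5972
Classifiers are generally evaluated by accuracy. It has been shown theoretically that AUC-based evaluation is superior to accuracy-based evaluation, but the AUC method is limited to two-class problems. A new method is proposed that extends it from two-class to multi-class problems: error-correcting output codes are used to obtain a transformation matrix, the matrix converts the multi-class problem into two-class problems, and the average over the two-class problems is computed to evaluate classifier performance. The new method is implemented on the MBNC experimental platform and used to evaluate the performance of Bayesian classifiers. The experimental results show that the method is effective.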
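The averaging idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the +1/-1 code-matrix layout, and the assumption that one score per sample is available for each binary problem are all hypothetical choices made for the example. The AUC of each binary problem is computed as the fraction of positive/negative pairs that are ranked correctly.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y != 1]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def ecoc_average_auc(y_true, column_scores, code_matrix):
    """Average the AUC over the binary problems defined by the code-matrix columns.

    code_matrix[c][j] is +1 or -1: the side of binary problem j that class c falls on
    (hypothetical layout). column_scores[j][i] is the score of sample i for problem j.
    """
    aucs = []
    for j, scores in enumerate(column_scores):
        # Relabel every sample according to column j of the code matrix.
        binary_labels = [1 if code_matrix[c][j] == 1 else 0 for c in y_true]
        aucs.append(binary_auc(binary_labels, scores))
    return sum(aucs) / len(aucs)


# Illustrative data: three classes with a one-vs-rest code matrix.
code_matrix = {0: [1, -1, -1], 1: [-1, 1, -1], 2: [-1, -1, 1]}
y_true = [0, 1, 2, 0]
column_scores = [
    [0.9, 0.2, 0.1, 0.8],
    [0.1, 0.7, 0.3, 0.4],
    [0.2, 0.6, 0.5, 0.1],
]
print(ecoc_average_auc(y_true, column_scores, code_matrix))
```

A one-vs-rest matrix is used here only because it is the simplest code matrix; any matrix whose columns split the classes into two non-empty groups works with the same function.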

7.
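The effect of feature weights on the text classification performance of Bayesian classifiers   Total citations: 1, self-citations: 0, citations by others: 1
Gao Xiumei (高秀梅), Chen Fang (陈芳), Song Fengxi (宋枫溪), Jin Zhong (金忠). Journal of Computer Applications (《计算机应用》), 2008, 28(12): 3080-3083
In text classification research, feature weights are often expected to improve classification results. Taking the optimal classifier, the Bayes classifier, as the baseline, this paper studies the possible effect of feature weights on text classification performance. Theoretical derivation shows that, for the optimal classifier, feature weights cannot effectively improve text classification results.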

8.
Cost curves: An improved method for visualizing classifier performance   Total citations: 10, self-citations: 0, citations by others: 10
This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier's performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors. Editor: Tom Fawcett
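As a rough sketch of what a cost curve plots, the snippet below uses the standard cost-curve formulation as it is usually stated: the x-axis is the probability-cost value PC(+), which combines the positive class prior with the two misclassification costs, and the y-axis is the normalized expected cost, FNR * PC(+) + FPR * (1 - PC(+)). The abstract itself does not spell out these formulas, and the function names are illustrative.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis of a cost curve: the positive prior weighted by the misclassification costs."""
    weighted_pos = p_pos * cost_fn
    weighted_neg = (1.0 - p_pos) * cost_fp
    return weighted_pos / (weighted_pos + weighted_neg)


def normalized_expected_cost(fpr, fnr, pc_pos):
    """y-axis of a cost curve: a straight line from (0, FPR) to (1, FNR)."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


# One classifier (a single ROC point) becomes one line in cost space.
fpr, fnr = 0.2, 0.4
for pc in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pc, normalized_expected_cost(fpr, fnr, pc))
```

With equal costs, PC(+) reduces to the positive class prior and the y-value reduces to the error rate, which is why a single plot can cover both error rate and expected cost.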

9.
Text data mining is a process of exploratory data analysis. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. This paper describes a proposed k-Nearest Neighbor classifier that performs comparative cross-validation against the existing k-Nearest Neighbor classifier. The feasibility and the benefits of the proposed approach are demonstrated by means of a data mining problem: direct marketing. Direct marketing has become an important application field of data mining. Comparative cross-validation involves estimation of accuracy by either stratified k-fold cross-validation or equivalent repeated random subsampling. While the proposed method may have a high bias, its performance (accuracy estimation in our case) may be poor due to a high variance. Thus the accuracy with the proposed k-Nearest Neighbor classifier was less than that with the existing k-Nearest Neighbor classifier, and the smaller the improvement in runtime the larger the improvement in precision and recall. In our proposed method we have determined the classification accuracy and prediction accuracy, where the prediction accuracy is comparatively high.

10.
This paper presents a simulation comparing various resampling procedures for estimating classification error rate. The simulations were done for small sample sizes, for the two-class and three-class problems.

11.
Pattern Recognition Letters, 2002, 23(1-3): 227-233
The problem studied is the behavior of a discrete classifier on a finite learning sample. With naive Bayes approach, the value of misclassification probability is represented as a random function, for which the first two moments are analytically derived. For arbitrary distributions, this allows evaluating learning sample size sufficient for the classification with given admissible misclassification probability and confidence level. The comparison with statistical learning theory shows that the suggested approach frequently recommends significantly smaller learning sample size.

12.
This study presents an approach to predict the performance of sales agents of a call center dedicated exclusively to sales and telemarketing activities. This approach is based on a naive Bayesian classifier. The objective is to know what levels of the attributes are indicative of individuals who perform well. A sample of 1037 sales agents was taken during the period between March and September of 2009 on campaigns related to insurance sales and service pre-paid phone services, to build the naive Bayes network. It has been shown that socio-demographic attributes are not suitable for predicting performance. Alternatively, operational records were used to predict the production of sales agents, achieving satisfactory results. In this case, the classifier training and testing is done through a stratified tenfold cross-validation. It classified the instances correctly 80.60% of the time, with a proportion of false positives of 18.1% for the class no (does not achieve minimum) and 20.8% for the class yes (achieves equal or above minimum acceptable). These results suggest that socio-demographic attributes have no predictive power on performance, while the operational information on the activities of the sales agent can predict the future performance of the agent.
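The stratified tenfold cross-validation mentioned above can be illustrated with a small fold-assignment sketch. It is a generic illustration, not the study's procedure: the function name is hypothetical, the toy labels are invented, and the assignment is deterministic (a practical version would usually shuffle the indices within each class first).

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds that each roughly keep the overall class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    # Deal the indices of each class round-robin; the running position keeps fold sizes balanced.
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


# Toy labels in the spirit of the abstract's two classes.
labels = ["yes"] * 6 + ["no"] * 4
folds = stratified_folds(labels, k=2)
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    print(sorted(test_fold), sorted(train))
```

Each fold serves once as the test set while the remaining folds form the training set, so every instance is tested exactly once.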

13.
This paper introduces different classification systems based on artificial neural networks for the automatic detection of epileptic spikes in electroencephalogram records. Different multilayer perceptron networks are constructed and trained with different algorithms. The inputs of the networks consist of either raw data or extracted features. To improve the generalization performance of the classifiers, the “training with noise” method is used, whereby new training data are constructed by adding uncorrelated Gaussian noise to real data. The performances of the constructed classifiers are examined and compared, both with each other and with other similar systems found in the literature, based on sensitivity, specificity and selectivity measures.
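The "training with noise" idea, building extra training samples by adding uncorrelated Gaussian noise to real ones, can be sketched in a few lines. The function name and the parameters `copies`, `sigma` and `seed` are assumptions for illustration; the abstract does not state how many noisy copies or what noise level the authors used.

```python
import random


def augment_with_noise(samples, copies, sigma, seed=0):
    """Return the original samples followed by `copies` noisy versions of each one.

    Every feature receives independent zero-mean Gaussian noise with standard
    deviation `sigma`, so the noise is uncorrelated across features and samples.
    """
    rng = random.Random(seed)
    augmented = [list(sample) for sample in samples]
    for sample in samples:
        for _ in range(copies):
            augmented.append([value + rng.gauss(0.0, sigma) for value in sample])
    return augmented


# Hypothetical feature vectors standing in for EEG segments.
real = [[0.1, 0.5, -0.2], [1.2, 0.9, 0.4]]
train = augment_with_noise(real, copies=3, sigma=0.05)
print(len(train))
```

The labels of the noisy copies are taken to be those of the samples they were derived from, which is why only the feature vectors are handled here.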

14.
Brain-computer interface performance is estimated using a model based on the detection of steady-state visual evoked potentials (SSVEPs). It is established that the most significant parameters determining if the SSVEP-based brain-computer interfaces can be used in principle are the ratio of the number of samples in the analyzed signal to the sampling rate and the frequency range in which SSVEPs are detected. If it is necessary to identify the factors that limit the performance of the interface, then the ratio of the frequency range to the number of possible frequencies of the SSVEPs and the ratio of the number of samples in the analyzed signal to the sampling rate are significant predictors. The results presented in this paper make it possible to simulate parameters of brain-computer interfaces on the basis of requirements for a particular device and its capabilities. This makes it possible to design simpler hardware and software for specific tasks and reduce debugging time.

15.
The development of classification techniques is a foundation for the evolution of machine learning, which has become a major part of current mainstream Artificial Intelligence research. However, the computational cost associated with these techniques limits their use in resource-constrained embedded platforms. As the classification task is often combined with other computationally expensive functions, efficient performance of the main modules is a fundamental requirement for achieving hard real-time speed for the whole system. Graph-based machine learning techniques offer a powerful framework for building classifiers. Optimum-Path Forest (OPF) is a graph-based classifier with the interesting ability to provide nonlinear class separation surfaces. This work proposes a SoC/FPGA-based design and implementation of an architecture for embedded applications, presenting a hardware-converted algorithm for an OPF classifier. Comparison of the achieved results with an embedded processor software implementation shows accelerations of the OPF classification from 2.18 to 9 times, which suggests that real-time performance is achievable in embedded applications.

16.
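An improved minimum distance classifier: the weighted minimum distance classifier   Total citations: 12, self-citations: 0, citations by others: 12
Ren Jing (任靖), Li Chunping (李春平). Journal of Computer Applications (《计算机应用》), 2005, 25(5): 992-994
The minimum distance classifier is a simple and effective classification method. The main way to improve its classification performance is to choose a more effective distance measure. By analysing the classification principles of the multiple-restriction classifier and the decision tree classifier, a weighted minimum distance classifier based on the standardized Euclidean distance is proposed. By defining weighted distances for nominal and string attributes and by adding range constraints on attribute values, the classifier widens the applicability of the minimum standardized Euclidean distance classifier and also improves its classification accuracy. Experimental results show that the weighted minimum distance classifier achieves high classification accuracy.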
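For orientation, a minimum distance classifier with a standardized Euclidean distance can be sketched as below. This covers only the numeric baseline that the abstract builds on. The optional `weights` argument is a hypothetical per-attribute weight and does not reproduce the paper's weighted distance for nominal and string attributes or its range constraints. The sketch also assumes that the variance of each attribute is estimated over the whole training set and is non-zero.

```python
import math


def fit_centroids(samples, labels):
    """Compute one mean vector per class plus the per-attribute variance of all samples."""
    n_features = len(samples[0])
    overall_mean = [sum(s[j] for s in samples) / len(samples) for j in range(n_features)]
    variances = [
        sum((s[j] - overall_mean[j]) ** 2 for s in samples) / len(samples)
        for j in range(n_features)
    ]
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    centroids = {
        label: [sum(row[j] for row in rows) / len(rows) for j in range(n_features)]
        for label, rows in groups.items()
    }
    return centroids, variances


def predict(x, centroids, variances, weights=None):
    """Assign x to the class whose centroid is nearest in standardized Euclidean distance."""
    if weights is None:
        weights = [1.0] * len(x)  # hypothetical weights; equal by default

    def distance(centroid):
        return math.sqrt(
            sum(w * (a - b) ** 2 / v for w, a, b, v in zip(weights, x, centroid, variances))
        )

    return min(centroids, key=lambda label: distance(centroids[label]))


# Two attributes on very different scales.
samples = [[1.0, 100.0], [2.0, 300.0], [8.0, 200.0], [9.0, 400.0]]
labels = ["a", "a", "b", "b"]
centroids, variances = fit_centroids(samples, labels)
print(predict([2.0, 390.0], centroids, variances))
```

Dividing each squared difference by the attribute's variance keeps the large-scale second attribute from dominating. With a plain Euclidean distance the query point in this example would be closer to the centroid of class "b".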

17.
Ping, Tien D., Ching Y. Pattern Recognition, 2007, 40(12): 3415-3429
This paper presents a novel cascade ensemble classifier system for the recognition of handwritten digits. This new system aims at attaining a very high recognition rate and a very high reliability at the same time, in other words, achieving an excellent recognition performance of handwritten digits. The trade-offs among recognition, error, and rejection rates of the new recognition system are analyzed. Three solutions are proposed: (i) extracting more discriminative features to attain a high recognition rate, (ii) using ensemble classifiers to suppress the error rate and (iii) employing a novel cascade system to enhance the recognition rate and to reduce the rejection rate. Based on these strategies, seven sets of discriminative features and three sets of random hybrid features are extracted and used in the different layers of the cascade recognition system. Novel gating networks (GNs) are used to combine the confidence values of three parallel artificial neural network (ANN) classifiers. The weights of the GNs are trained by genetic algorithms (GAs) to achieve the overall optimal performance. Experiments conducted on the MNIST handwritten numeral database show encouraging results: a high reliability of 99.96% with minimal rejection, or a 99.59% correct recognition rate without rejection in the last cascade layer.

18.
A classifier ensemble combines the predictions of a set of individual classifiers to produce more accurate results than any single classifier system. However, a classifier ensemble with too many classifiers may consume a large amount of computational time. This paper proposes a new ensemble subset evaluation method that integrates classifier diversity measures into a novel classifier ensemble reduction framework. The framework converts the ensemble reduction into an optimization problem and uses the harmony search algorithm to find the optimized classifier ensemble. Both pairwise and non-pairwise diversity measure algorithms are applied by the subset evaluation method. For the pairwise diversity measure, three conventional diversity algorithms and one new diversity measure method are used to calculate the diversity's merits. For the non-pairwise diversity measure, three classical algorithms are used. The proposed subset evaluation methods are demonstrated on experimental data. In comparison with other classifier ensemble methods, the method implemented by the measurement of the interrater agreement exhibits a high accuracy prediction rate against the current ensembles' performance. In addition, the framework with the new diversity measure achieves relatively good performance with less computational time.

19.
The accuracy-based XCS classifier system has been shown to solve typical data mining problems in a machine-learning competitive way. However, successful applications in multistep problems, modeled by a Markov decision process, were restricted to very small problems. Until now, the temporal difference learning technique in XCS was based on deterministic updates. However, since a prediction is actually generated by a set of rules in XCS and Learning Classifier Systems in general, gradient-based update methods are applicable. The extension of XCS to gradient-based update methods results in a classifier system that is more robust and more parameter independent, solving large and difficult maze problems reliably. Additionally, the extension to gradient methods highlights the relation of XCS to other function approximation methods in reinforcement learning.

20.