Similar literature
20 similar documents found (search time: 15 ms)
1.
This paper presents a novel reject rule for support vector classifiers, based on the receiver operating characteristic (ROC) curve. The rule minimises the expected classification cost, defined in terms of the classification and error costs for the particular application at hand. The rationale of the proposed approach is that the ROC curve of the SVM contains all of the information needed to find the optimal threshold values that minimise the expected classification cost. To evaluate the effectiveness of the proposed reject rule, a large number of tests have been performed on several data sets and with different kernels. A comparison technique, based on the Wilcoxon rank sum test, has been defined and employed to provide the results at an adequate significance level. The experiments confirmed the effectiveness of the proposed reject rule.

2.
We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature’s use in the dual formulation of support vector machines (SVM). This approach, called kernel-penalized SVM (KP-SVM), optimizes the shape of an anisotropic RBF kernel, eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition, avoiding the elimination of features that would negatively affect the classifier’s performance. We performed experiments on four real-world benchmark problems, comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and consistently selected fewer relevant features.

3.
Texture classification using the support vector machines   Cited by: 12 (self-citations: 0, citations by others: 12)
Shutao  James T.  Hailong  Yaonan 《Pattern recognition》2003,36(12):2883-2893
In recent years, support vector machines (SVMs) have demonstrated excellent performance in a variety of pattern recognition problems. In this paper, we apply SVMs to texture classification, using translation-invariant features generated from the discrete wavelet frame transform. To alleviate the problem of selecting the right kernel parameter in the SVM, we use a fusion scheme based on multiple SVMs, each with a different setting of the kernel parameter. Compared to the traditional Bayes classifier and the learning vector quantization algorithm, SVMs, and in particular the fused output from multiple SVMs, produce more accurate classification results on the Brodatz texture album.

4.
Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process and, if not precise, may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way. Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for expert analysis, or even for another classifier using extra domain knowledge. Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods were built upon the weighted extreme learning machine (ELM) with a reject option that makes it possible to postpone the final decision on non-classified modules, the rejected ones, to a later moment. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option on imbalanced datasets. Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as a performance metric. Conclusion: It is concluded that rejoELM is a valid alternative for classification with reject option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.

5.
Support vector machines (SVMs) are an effective tool for building good credit scoring models. However, the performance of the model depends on its parameter settings. In this study, we use a direct search method to optimize the SVM-based credit scoring model and compare it with three other parameter optimization methods: grid search, a method based on design of experiments (DOE), and a genetic algorithm (GA). Two real-world credit datasets are selected to demonstrate the effectiveness and feasibility of the method. The results show that the direct search method can find an effective model with high classification accuracy and good robustness, and that it depends less on the initial search space or starting point.
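
As a rough illustration of derivative-free parameter tuning, the sketch below implements compass (pattern) search, one member of the direct search family. The abstract does not say which direct search variant the authors used, so this choice is an assumption. The objective `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an SVM as a function of (log2 C, log2 gamma); in practice it would train and validate a model.

```python
def compass_search(objective, start, step=1.0, min_step=1e-3, max_iter=1000):
    """Derivative-free compass (pattern) search that minimises `objective`."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        # Probe +step and -step along each coordinate axis in turn.
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    point, best = trial, value
                    improved = True
        if not improved:
            step /= 2.0  # shrink the pattern when no neighbour improves
    return point, best


def toy_validation_error(params):
    """Hypothetical stand-in for cross-validated error over (log2 C, log2 gamma)."""
    log_c, log_gamma = params
    return (log_c - 3.0) ** 2 + 0.5 * (log_gamma + 5.0) ** 2 + 0.1
```

Unlike grid search, the number of objective evaluations here depends on the path taken rather than on the size of a predefined grid, which is one reason such methods can be less sensitive to how the initial search space is specified.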

6.
We propose new support vector machines (SVMs) that incorporate the geometric distribution of an input data set by associating each data point with a possibilistic membership, which measures the relative strength of the self class membership. By using a possibilistic distance measure based on the possibilistic membership, we reformulate conventional SVMs in three ways. The proposed methods are shown to have better classification performance than conventional SVMs in various tests.

7.
Early detection of ventricular fibrillation (VF) is crucial for the success of defibrillation therapy in automatic devices. A large number of detectors have been proposed based on temporal, spectral, and time-frequency parameters extracted from the surface electrocardiogram (ECG), always with limited performance. The combination of ECG parameters from different domains (time, frequency, and time-frequency) using machine learning algorithms has been used to improve detection efficiency. However, the potential use of a large number of parameters in machine learning schemes has raised the need for efficient feature selection (FS) procedures. In this study, we propose a novel FS algorithm based on support vector machine (SVM) classifiers and bootstrap resampling (BR) techniques. We define a backward FS procedure that relies on evaluating changes in SVM performance when removing features from the input space. This evaluation is carried out using a nonparametric statistic based on BR. After simulation studies, we benchmark the performance of our FS algorithm on the AHA and MIT-BIH ECG databases. Our results show that the proposed FS algorithm outperforms the recursive feature elimination method in synthetic examples, and that the VF detector performance improves with the reduced feature set.

8.
In this paper we formulate a least squares version of the recently proposed twin support vector machine (TSVM) for binary classification. This formulation leads to an extremely simple and fast algorithm for generating binary classifiers based on two non-parallel hyperplanes. Here we attempt to solve two modified primal problems of TSVM, instead of the two dual problems usually solved. We show that the solution of the two modified primal problems reduces to solving just two systems of linear equations, as opposed to solving two quadratic programming problems along with two systems of linear equations in TSVM. Classification using a nonlinear kernel also leads to systems of linear equations. Our experiments on publicly available datasets indicate that the proposed least squares TSVM has classification accuracy comparable to that of TSVM but with considerably less computation time. Since the linear least squares TSVM can easily handle large datasets, we further investigated its efficiency for text categorization applications. Computational results demonstrate the effectiveness of the proposed method over linear proximal SVM on all the text corpora considered.
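
The abstract does not write out the two linear systems. The following is a sketch of how they arise in the linear case under a common least squares formulation, in which the inequality constraints of TSVM are replaced by equalities and the slack is penalised quadratically. All notation is assumed here rather than taken from the abstract.

```latex
% Assumed notation (not in the abstract): rows of A are class +1 samples,
% rows of B are class -1 samples, e is a vector of ones of matching length,
% E = [A  e], F = [B  e], and c_1, c_2 > 0 are penalty parameters.
\begin{gather*}
\min_{w_1, b_1}\; \tfrac{1}{2}\lVert A w_1 + e b_1 \rVert^2
  + \tfrac{c_1}{2}\lVert B w_1 + e b_1 + e \rVert^2
\;\Longrightarrow\;
\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}
  = -\Bigl(F^\top F + \tfrac{1}{c_1} E^\top E\Bigr)^{-1} F^\top e \\[6pt]
\min_{w_2, b_2}\; \tfrac{1}{2}\lVert B w_2 + e b_2 \rVert^2
  + \tfrac{c_2}{2}\lVert e - (A w_2 + e b_2) \rVert^2
\;\Longrightarrow\;
\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}
  = \Bigl(E^\top E + \tfrac{1}{c_2} F^\top F\Bigr)^{-1} E^\top e
\end{gather*}
```

Each solution follows from setting the gradient of the corresponding unconstrained quadratic objective to zero, which is why no quadratic programming solver is needed.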

9.
Fuzzy functions with support vector machines   Cited by: 1 (self-citations: 0, citations by others: 1)
A new fuzzy system modeling (FSM) approach that identifies the fuzzy functions using support vector machines (SVM) is proposed. This new approach is structurally different from the fuzzy rule base approaches and fuzzy regression methods. It is a new alternative version of the earlier FSM with fuzzy functions approaches. SVM is applied to determine the support vectors for each fuzzy cluster obtained by the fuzzy c-means (FCM) clustering algorithm. The original input variables, together with the membership values obtained from FCM and their transformations, form a new augmented set of input variables. The performance of the proposed system modeling approach is compared to previous fuzzy functions approaches, standard SVM, and LSE methods using an artificial sparse dataset and a real-life non-sparse dataset. The results indicate that the proposed fuzzy functions with support vector machines approach is a feasible and stable method for regression problems and yields higher performance than the classical statistical methods.

10.
Support vector machines (SVMs) are among the most powerful, popular, and accurate classification techniques. In this work, we evaluate whether the accuracy of SVMs can be further improved using training set selection (TSS), where only a subset of training instances is used to build the SVM model. In contrast to existing approaches, we focus on wrapper TSS techniques, where candidate subsets of training instances are evaluated using the SVM training accuracy. We consider five wrapper TSS strategies and show that those based on evolutionary approaches can significantly improve the accuracy of SVMs.

11.
Support vector machines (SVMs) are effective machine-learning methods based on the structural risk minimization (SRM) principle, which seeks to minimize an upper bound on the risk functional related to generalization performance. Parameter selection is an important factor affecting the performance of SVMs. Evolution Strategy with Covariance Matrix Adaptation (CMA-ES) is an evolutionary optimization strategy, which is used in this paper to optimize the parameters of SVMs. Compared with traditional SVMs, the SVMs optimized using CMA-ES are more accurate in predicting the Lorenz signal. An industrial case illustrates that the proposed method is very successful in forecasting short-term faults of large machinery.

12.
We present a two-step method to speed up object detection systems in computer vision that use support vector machines as classifiers. In the first step we build a hierarchy of classifiers. On the bottom level, a simple and fast linear classifier analyzes the whole image and rejects large parts of the background. On the top level, a slower but more accurate classifier performs the final detection. We propose a new method for automatically building and training a hierarchy of classifiers. In the second step we apply feature reduction to the top-level classifier by choosing relevant image features according to a measure derived from statistical learning theory. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 335 with similar classification performance.

13.
Support vector machines (SVMs) have shown high prediction accuracy in many applications. However, as with any statistical learning algorithm, an SVM's accuracy drops if some of the training points are contaminated by an unknown source of noise. The choice of clean training points is critical to avoid overfitting, which generally occurs when the model is excessively complex and is reflected by a high accuracy on the training set and a low accuracy on the testing set (unseen points). In this paper we present a new multi-level SVM architecture that splits the training set into points labeled ‘easily classifiable’, which do not increase the model complexity, and ‘non-easily classifiable’, which are responsible for increasing the complexity. This method is used to create an SVM architecture that yields, on average, a higher accuracy than a traditional soft margin SVM trained with the same training set. The architecture is tested on the well-known US postal handwritten digit recognition problem, the Wisconsin breast cancer dataset, and the agitation detection dataset. The results show an increase in the overall accuracy for the three datasets. Throughout this paper the word confidence is used to denote the confidence in the decision, as commonly used in the literature.

14.
Predicting defect-prone software modules using support vector machines   Cited by: 2 (self-citations: 0, citations by others: 2)
Effective prediction of defect-prone software modules can enable software developers to focus quality assurance activities and allocate effort and resources more efficiently. Support vector machines (SVM) have been successfully applied to both classification and regression problems in many applications. This paper evaluates the capability of SVM in predicting defect-prone software modules and compares its prediction performance against eight statistical and machine learning models in the context of four NASA datasets. The results indicate that the prediction performance of SVM is generally better than, or at least competitive with, the compared models.

15.
Selecting relevant features for support vector machine (SVM) classifiers is important for a variety of reasons such as generalization performance, computational efficiency, and feature interpretability. Traditional SVM approaches to feature selection typically extract features and learn SVM parameters independently. Independently performing these two steps might result in a loss of information related to the classification process. This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels. Experiments on various databases show significant reduction of features used while maintaining classification performance.

16.
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of “overfitting”. Feature selection addresses the dimensionality reduction problem by determining a subset of the available features that is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). In comparison with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce the computation time. First, it dynamically maintains a subset of samples for the training of the SVM. Because not all the available samples participate in the training process, the computational cost of obtaining a single SVM classifier is decreased. Second, a new criterion, which takes into consideration both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out nonessential features. As a result, the total number of training runs is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.
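
For orientation, the conventional wrapper strategy that FS_SFS is compared against can be sketched as a plain sequential forward search. This sketch does not include the sample filtering or the correlation-aware criterion described in the abstract. The function `toy_score` is a hypothetical stand-in for the validation accuracy of an SVM trained on a given feature subset.

```python
def sequential_forward_search(n_features, score, max_features=None):
    """Greedy SFS: repeatedly add the feature that most improves `score`."""
    selected = []
    best_score = float("-inf")
    limit = n_features if max_features is None else max_features
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # In a real wrapper, each call to `score` trains one classifier.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no remaining feature improves the current subset
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


def toy_score(subset):
    """Hypothetical stand-in for SVM validation accuracy on a feature subset."""
    useful = {0: 0.20, 2: 0.15, 3: 0.05}   # features 1 and 4 carry no signal
    gain = sum(useful.get(f, 0.0) for f in subset)
    return 0.5 + gain - 0.02 * len(subset)  # small penalty per extra feature
```

Each round trains one classifier per remaining candidate, so the number of training runs grows quadratically with the number of features. That cost is what motivates filtering out candidates before they are evaluated.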

17.
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which will further explore the parameter space. In this proposal, we envisioned that the initial solutions provided by meta-learning are located in good regions of the search space (i.e. they are closer to optimum solutions). Hence, the search algorithm would need to evaluate a lower number of candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates when compared to their components applied in isolation.

18.
We present new fingerprint classification algorithms based on two machine learning approaches: support vector machines (SVMs) and recursive neural networks (RNNs). RNNs are trained on a structured representation of the fingerprint image. They are also used to extract a set of distributed features of the fingerprint which can be integrated in the SVM. SVMs are combined with a new error-correcting code scheme. This approach has two main advantages: (a) it can tolerate the presence of ambiguous fingerprint images in the training set and (b) it can effectively identify the most difficult fingerprint images in the test set. By rejecting these images, the accuracy of the system improves significantly. We report experiments on the fingerprint database NIST-4. Our best classification accuracy is 95.6 percent at a 20 percent rejection rate and is obtained by training SVMs on both FingerCode and RNN-extracted features. This result indicates the benefit of integrating global and structured representations and suggests that SVMs are a promising approach for fingerprint classification.

19.
A method of document clustering based on locality preserving indexing (LPI) and support vector machines (SVM) is presented. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. In this paper, by using LPI, the documents are projected into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. Then, by using SVM, the vectors in the semantic space are mapped by means of a Gaussian kernel to a high-dimensional feature space in which the minimal enclosing sphere is sought. The sphere, when mapped back to the semantic space, can separate into several independent components defined by the support vectors, each enclosing a separate cluster of documents. Combining LPI and SVM yields higher clustering accuracy in an unsupervised way, as well as better generalization properties. Extensive experiments are performed on the Reuters-21578 and TDT2 data sets. This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.

20.
Support vector machines (SVMs), based on statistical learning theory, are currently among the most popular and efficient approaches to pattern recognition problems because of their remarkable prediction accuracy. It is, however, necessary to choose a proper normalization method for the input vectors in order to improve system performance. Various normalization methods for SVMs have been studied in this research, and the results show that the normalization method can affect prediction performance. The results could be useful for determining a proper normalization method to achieve the best performance in SVMs.
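
The abstract does not name the normalization methods that were compared. As an assumption, the sketch below shows two widely used choices, min-max scaling and z-score standardization, applied to a single feature column. The function names are illustrative only.

```python
from statistics import mean, pstdev


def min_max_scale(column, low=0.0, high=1.0):
    """Rescale values linearly to [low, high]; constant columns map to `low`."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [low for _ in column]
    return [low + (x - lo) * (high - low) / (hi - lo) for x in column]


def z_score(column):
    """Centre to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

Kernel functions such as the Gaussian kernel depend on distances between input vectors, so features with large numeric ranges can dominate unless they are rescaled. In practice the scaling statistics would be computed on the training set and then reused unchanged on the test set.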
