Similar Documents
20 similar documents found (search time: 15 ms)
1.
Selecting relevant features for support vector machine (SVM) classifiers is important for a variety of reasons, such as generalization performance, computational efficiency, and feature interpretability. Traditional SVM approaches to feature selection typically extract features and learn SVM parameters independently. Performing these two steps independently might result in a loss of information related to the classification process. This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels. Experiments on various databases show a significant reduction in the number of features used while classification performance is maintained.
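
The abstract does not give its energy function, so the following is not the paper's method. It is a minimal sketch of a different, well-known way to make training itself select features: a linear SVM with an L1 penalty, trained by proximal subgradient descent. The soft-thresholding step drives the weights of uninformative features to exactly zero. All names (`train_l1_svm`, `lam`, `lr`) are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero, returning exactly 0.0 when |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train_l1_svm(X, y, lam=0.1, lr=0.1, epochs=200):
    """Proximal subgradient descent on mean hinge loss + lam * ||w||_1.

    X: list of feature vectors; y: labels in {-1, +1}.
    Returns (w, b); a feature whose weight is exactly 0.0 is treated as deselected.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        # gradient step on the hinge term, then the L1 proximal step (the bias is not penalized)
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def selected_features(w):
    return [j for j, wj in enumerate(w) if wj != 0.0]
```

On a toy set where only feature 0 carries the label, feature 1 keeps a weight of exactly zero and is dropped.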

2.
We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature's use in the dual formulation of support vector machines (SVM). This approach, called kernel-penalized SVM (KP-SVM), optimizes the shape of an anisotropic RBF kernel, eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition that avoids eliminating features whose removal would negatively affect the classifier's performance. We performed experiments on four real-world benchmark problems, comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and consistently identified fewer relevant features.
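
An anisotropic RBF kernel gives each feature its own scale, which is what lets a feature be switched off. The sketch assumes the form K(x, z) = exp(-0.5 * sum_j (s_j (x_j - z_j))^2); the exact parameterization and the penalty used by KP-SVM are not given in the abstract, and the function names are hypothetical.

```python
import math


def anisotropic_rbf(x, z, scales):
    """K(x, z) = exp(-0.5 * sum_j (scales[j] * (x[j] - z[j]))**2).

    A scale of 0.0 removes feature j from the kernel entirely.
    """
    s = sum((sj * (xj - zj)) ** 2 for xj, zj, sj in zip(x, z, scales))
    return math.exp(-0.5 * s)


def active_features(scales, tol=0.0):
    """Indices of features whose scale has not been driven to (near) zero."""
    return [j for j, sj in enumerate(scales) if abs(sj) > tol]
```

With scales `[1.0, 0.0]`, any difference in feature 1 has no effect on the kernel value, so that feature is effectively eliminated.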

3.
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of “overfitting”. Feature selection addresses the dimensionality reduction problem by determining the subset of available features that is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). Compared with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce computation time. First, it dynamically maintains a subset of samples for training the SVM. Because not all available samples participate in the training process, the computational cost of obtaining a single SVM classifier is decreased. Second, a new criterion, which takes into account both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out nonessential features. As a result, the total number of training runs is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.
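
The abstract names the ingredients of the FS_SFS filter criterion (per-feature discriminant ability and inter-feature correlation) but not its formula. The sketch below is a hypothetical stand-in: a greedy forward search that scores each candidate as its two-class Fisher score times (1 minus its mean absolute correlation with the features already selected). The SVM training and sample-subset maintenance are not shown.

```python
import math
import statistics


def fisher_score(col, y):
    """Two-class Fisher score: (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    pos = [v for v, t in zip(col, y) if t == 1]
    neg = [v for v, t in zip(col, y) if t != 1]
    num = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    den = statistics.pvariance(pos) + statistics.pvariance(neg)
    return num / (den + 1e-12)  # epsilon guards against zero within-class variance


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def forward_select(columns, y, k):
    """Greedy SFS: score = relevance * (1 - mean |corr| with already selected features)."""
    relevance = [fisher_score(c, y) for c in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = statistics.fmean(
                [abs(pearson(columns[j], columns[s])) for s in selected])
            return relevance[j] * (1.0 - redundancy)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The multiplicative form keeps the criterion scale-free: an exact duplicate of a selected feature scores about zero, however discriminative it is on its own.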

4.
5.
This paper proposes a new classifier called density-induced margin support vector machines (DMSVMs). DMSVMs belong to a family of SVM-like classifiers and thus inherit good properties from support vector machines (SVMs), e.g., a unique and global solution and a sparse representation of the decision function. For a given data set, DMSVMs require extracting relative density degrees for all training data points. These density degrees can be taken as the relative margins of the corresponding training data points. Moreover, we propose a method for estimating relative density degrees using the K-nearest-neighbor method. We also derive and prove an upper bound on the leave-one-out error of DMSVMs for binary classification problems. Promising results are obtained on toy as well as real-world data sets.
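
The abstract says relative density degrees are estimated with K nearest neighbors but does not give the estimator. As a hypothetical illustration, the sketch takes the density of a point as the inverse of its mean distance to its k nearest neighbors, then divides by the largest density so that values fall in (0, 1].

```python
import math


def relative_density(X, k=2):
    """Hypothetical kNN density estimate, normalized so the densest point gets 1.0."""
    dens = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        mean_k = sum(dists[:k]) / k
        dens.append(1.0 / (mean_k + 1e-12))  # epsilon avoids division by zero for duplicates
    top = max(dens)
    return [d / top for d in dens]
```

An isolated point receives a small relative density, so under this assumption it would be assigned a small relative margin.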

6.
The efficiency of intrusion detection depends mainly on the dimensionality of the data features. Using a gradual feature-removal method, 19 critical features are chosen to represent the various kinds of network visits. By combining a clustering method, an ant colony algorithm, and a support vector machine (SVM), an efficient and reliable classifier is developed to judge whether a network visit is normal. The accuracy reaches 98.6249% in 10-fold cross-validation, and the average Matthews correlation coefficient (MCC) reaches 0.861161.
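
The MCC reported above is computed from a binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The abstract does not give the underlying counts, so this sketch only shows the formula.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal count is zero."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

MCC is +1 for perfect prediction, 0 for chance-level prediction, and −1 for total disagreement. Unlike accuracy, it stays informative when normal traffic heavily outnumbers attacks.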

7.
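Feature subset selection and the optimization of training parameters have long been two important aspects of SVM research. Choosing appropriate features and reasonable training parameters can improve the performance of an SVM classifier, but previous studies solved the two problems separately. With the application of natural computing techniques such as genetic optimization in artificial intelligence, research on optimizing feature selection and parameters simultaneously has begun to appear. This study uses an immune genetic algorithm (IGA) to optimize feature selection and SVM parameters simultaneously, and proposes an IGA-SVM algorithm. Experiments show that the method can find a suitable feature subset and SVM parameters and achieves good classification results, demonstrating the algorithm's effectiveness.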
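
Optimizing features and parameters simultaneously usually means putting both into one chromosome. The layout below is a hypothetical example, not the IGA-SVM encoding: a feature mask followed by two fixed-width binary fields that map linearly onto assumed ranges for log2 C and log2 gamma. The fitness evaluation (training an SVM) and the immune operators are not shown.

```python
def decode(bits, n_features, n_param_bits=8,
           log2_c_range=(-5.0, 15.0), log2_g_range=(-15.0, 3.0)):
    """Split a binary chromosome into (selected feature indices, C, gamma).

    Layout (hypothetical): [feature mask | log2 C field | log2 gamma field].
    """
    if len(bits) != n_features + 2 * n_param_bits:
        raise ValueError("chromosome length does not match the layout")

    def to_value(chunk, lo, hi):
        # interpret the chunk as an unsigned integer and map it linearly onto [lo, hi]
        v = int("".join(str(b) for b in chunk), 2)
        return lo + (hi - lo) * v / (2 ** len(chunk) - 1)

    mask = bits[:n_features]
    c_bits = bits[n_features:n_features + n_param_bits]
    g_bits = bits[n_features + n_param_bits:]
    selected = [j for j, b in enumerate(mask) if b == 1]
    C = 2.0 ** to_value(c_bits, *log2_c_range)
    gamma = 2.0 ** to_value(g_bits, *log2_g_range)
    return selected, C, gamma
```

Encoding the parameters on a log2 scale lets a fixed number of bits cover several orders of magnitude of C and gamma.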

8.
Early detection of ventricular fibrillation (VF) is crucial for the success of defibrillation therapy in automatic devices. A large number of detectors have been proposed based on temporal, spectral, and time-frequency parameters extracted from the surface electrocardiogram (ECG), all showing limited performance. Combining ECG parameters from different domains (time, frequency, and time-frequency) using machine learning algorithms has been used to improve detection efficiency. However, the potential use of a large number of parameters in machine learning schemes has raised the need for efficient feature selection (FS) procedures. In this study, we propose a novel FS algorithm based on support vector machine (SVM) classifiers and bootstrap resampling (BR) techniques. We define a backward FS procedure that relies on evaluating changes in SVM performance when features are removed from the input space. This evaluation is based on a nonparametric statistic derived from BR. After simulation studies, we benchmark the performance of our FS algorithm on the AHA and MIT-BIH ECG databases. Our results show that the proposed FS algorithm outperforms the recursive feature elimination method in synthetic examples, and that VF detector performance improves with the reduced feature set.
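
The abstract does not specify its bootstrap statistic. One plausible form of the removal test is a percentile bootstrap on paired per-sample correctness, sketched below: a feature is treated as removable unless the lower confidence bound on the accuracy drop is positive. The function names and the decision rule are assumptions.

```python
import random


def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def can_remove(correct_full, correct_reduced, **kw):
    """Hypothetical rule: keep the feature only if its removal significantly lowers accuracy."""
    diffs = [int(a) - int(b) for a, b in zip(correct_full, correct_reduced)]
    lo, _ = bootstrap_ci(diffs, **kw)
    return lo <= 0.0
```

In a backward procedure this check would be repeated for each remaining feature, with the SVM retrained without that feature to obtain `correct_reduced`.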

9.
Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI Machine Learning Repository are used to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional parameter-setting method, and with various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. SA-SVM is thus useful for parameter determination and feature selection in the SVM.
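
A minimal simulated annealing loop over a joint state (log2 C, log2 gamma, feature mask), using the Metropolis acceptance rule. In SA-SVM the fitness would be SVM classification accuracy. Here `fitness` is any caller-supplied function, and the neighbor move, cooling schedule, and bounds are assumptions.

```python
import math
import random


def simulated_annealing(fitness, x0, mask0, iters=2000, t0=1.0, cooling=0.995,
                        step=0.5, bounds=((-5.0, 15.0), (-15.0, 3.0)), seed=0):
    """Maximize fitness(x, mask); x holds (log2 C, log2 gamma), mask is a 0/1 list."""
    rng = random.Random(seed)
    cur_x, cur_m = list(x0), list(mask0)
    cur_f = fitness(cur_x, cur_m)
    best = (cur_f, list(cur_x), list(cur_m))
    t = t0
    for _ in range(iters):
        x, m = list(cur_x), list(cur_m)
        if rng.random() < 0.5:
            # perturb the continuous parameters, clipped to their bounds
            x = [min(hi, max(lo, v + rng.gauss(0.0, step))) for v, (lo, hi) in zip(x, bounds)]
        else:
            j = rng.randrange(len(m))
            m[j] = 1 - m[j]  # flip one feature in or out
        f = fitness(x, m)
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(delta / t)
        if f >= cur_f or rng.random() < math.exp((f - cur_f) / t):
            cur_x, cur_m, cur_f = x, m, f
            if f > best[0]:
                best = (f, list(x), list(m))
        t = max(t * cooling, 1e-8)
    return best
```

Because the best state is tracked separately from the current one, accepting occasional worse moves (which helps escape local optima) never loses a good solution already found.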

10.
As a promising method for pattern recognition and function estimation, least squares support vector machines (LS-SVM) express training in terms of solving a linear system instead of a quadratic programming problem, as in conventional support vector machines (SVM). In this paper, using the information provided by the equality constraint, we transform the minimization problem with a single equality constraint in LS-SVM into an unconstrained minimization problem, and then propose reduced formulations for LS-SVM. With this transformation, the number of conjugate gradient (CG) runs, a very time-consuming step in obtaining the numerical solution, is reduced to one instead of the two proposed by Suykens et al. (1999). A comparison of the computational speed of our method with the CG method proposed by Suykens et al. and with the first-order and second-order SMO methods on several benchmark data sets shows a reduction in training time of up to 44%.
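
The linear system behind LS-SVM function estimation is [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y]. The sketch builds and solves this full system directly with dense Gaussian elimination. It is not the paper's reduced formulation or a CG solver, and is only meant for small examples.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a dense n x n system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, kernel, gamma=1e4):
    """Build and solve the LS-SVM system; returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [kernel(X[i], X[j]) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(x, X, b, alpha, kernel):
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X)) + b
```

The first row enforces sum(α) = 0, the single equality constraint that the paper eliminates to obtain its reduced formulation.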

11.
The well-known sequential minimal optimization (SMO) algorithm is the most commonly used algorithm for the numerical solution of support vector learning problems. At each iteration, the traditional SMO algorithm (called the 2PSMO algorithm in this paper) jointly optimizes only two chosen parameters. The two parameters are selected either heuristically or randomly, while the optimization with respect to them is performed analytically. In this paper, the 2PSMO algorithm is naturally generalized to the three-parameter sequential minimal optimization (3PSMO) algorithm, which jointly optimizes three chosen parameters at each iteration. As in the 2PSMO algorithm, the three parameters are selected either heuristically or randomly, while the optimization with respect to them is performed analytically. Consequently, the main difference between the two algorithms is that at each iteration 2PSMO optimizes over a line segment, whereas 3PSMO optimizes over a two-dimensional region consisting of infinitely many line segments. This implies that the maximum can be attained more efficiently by the 3PSMO algorithm. The main updating formulae of both algorithms for each support vector learning problem are presented. To assess the efficiency of the 3PSMO algorithm compared with the 2PSMO algorithm, 14 benchmark datasets (7 for classification and 7 for regression) are tested and their numerical performances compared. Simulation results demonstrate that 3PSMO significantly outperforms 2PSMO in both execution time and computational complexity.
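
For reference, here is the standard analytic two-parameter (2PSMO) step for SVM classification, in the form popularized by Platt. The pair is optimized along the line y1·α1 + y2·α2 = const and clipped to the box [0, C]. The 3PSMO update from the paper is not reproduced here. Ei denotes f(xi) − yi.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """One analytic SMO step on (alpha1, alpha2); returns the updated pair."""
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12  # curvature of the objective along the constraint line
    if L >= H or eta <= 0.0:
        return a1, a2  # this simplified sketch skips degenerate pairs
    a2_new = min(H, max(L, a2 + y2 * (E1 - E2) / eta))  # unconstrained optimum, then clip
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps y1*a1 + y2*a2 unchanged
    return a1_new, a2_new
```

For two points x = 1 (label +1) and x = −1 (label −1) with a linear kernel, one step from α = 0 reaches the optimum α1 = α2 = 0.5. A small C clips both values to the box.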

12.
This paper presents a hybrid filter-wrapper feature subset selection algorithm based on particle swarm optimization (PSO) for support vector machine (SVM) classification. The filter model is based on mutual information and is a composite measure of feature relevance and redundancy with respect to the selected feature subset. The wrapper model is a modified discrete PSO algorithm. This hybrid algorithm, called maximum relevance minimum redundancy PSO (mr2PSO), is novel in that it uses the mutual information available from the filter model to weight the bit selection probabilities in the discrete PSO. Hence, mr2PSO uniquely brings together the efficiency of filters and the greater accuracy of wrappers. The proposed algorithm is tested on several well-known benchmark datasets. Its performance is also compared with a recent hybrid filter-wrapper algorithm based on a genetic algorithm and with a wrapper algorithm based on PSO. The results show that the mr2PSO algorithm is competitive in terms of both classification accuracy and computational performance.
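
The filter side of mr2PSO rests on mutual information. The sketch computes empirical mutual information (in nats) for discrete variables and a max-relevance / min-redundancy score. How mr2PSO turns this score into bit selection probabilities is not given in the abstract and is not shown. `mrmr_score` is a hypothetical name.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in nats for discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_score(candidate, labels, selected):
    """Relevance to the labels minus mean redundancy with already selected features."""
    relevance = mutual_information(candidate, labels)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    return relevance - redundancy
```

A feature that duplicates one already selected gains nothing: its relevance is cancelled by its redundancy.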

13.
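Model selection for support vector machines based on grid-pattern search   Total citations: 2, self-citations: 0, other citations: 2
The model selection problem for support vector machines consists of tuning the kernel parameters and the penalty factor C for a given kernel function. This paper analyzes the grid search algorithm and the pattern search algorithm and, by combining the strengths of both, proposes a grid-pattern search algorithm. Its core idea is to first use the grid algorithm to search quickly over the global range and locate the smallest interval containing the optimum, and then to use pattern search within that interval to find the optimum. Experiments show that grid-pattern search offers both high learning accuracy and high speed.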
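
The two-stage idea can be sketched as a coarse grid over (log2 C, log2 gamma) followed by a compass-style pattern search that starts at the best grid point with half the grid spacing and halves its step whenever no axis move improves. In practice `obj` would be cross-validated SVM accuracy. The ranges, grid spacing, and compass variant are assumptions, not the paper's exact settings.

```python
def frange(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive, avoiding float accumulation."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def grid_pattern_search(obj, c_range=(-5.0, 15.0), g_range=(-15.0, 3.0),
                        grid_step=2.0, tol=1e-3):
    """Maximize obj(log2_c, log2_g): coarse grid first, then local pattern search."""
    # Stage 1: coarse global grid
    fx, best_pt = max((obj(c, g), (c, g))
                      for c in frange(*c_range, grid_step)
                      for g in frange(*g_range, grid_step))
    # Stage 2: compass search around the best grid point
    x, step = list(best_pt), grid_step / 2.0
    while step >= tol:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = (x[0] + dx, x[1] + dy)
            val = obj(*cand)
            if val > fx:
                x, fx, improved = list(cand), val, True
                break
        if not improved:
            step /= 2.0  # no axis move helped: refine the step
    return tuple(x), fx
```

The grid stage keeps the search global, while the pattern stage refines within roughly one grid cell at a much lower cost than a finer grid would require.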

14.
Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching use the vector space model to represent a document; however, this requires a high-dimensional space to represent the document and does not take into account the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing method can overcome this problem by using statistically derived conceptual indices in place of individual terms. With the aim of improving the accuracy and efficiency of categorization, in this paper we propose a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of terms; then we construct a new semantic space between terms based on the latent semantic indexing method. In applications involving spam database categorization, we find that our two-stage feature selection method performs better.

15.
We present a two-step method to speed up object detection systems in computer vision that use support vector machines as classifiers. In the first step, we build a hierarchy of classifiers. On the bottom level, a simple and fast linear classifier analyzes the whole image and rejects large parts of the background. On the top level, a slower but more accurate classifier performs the final detection. We propose a new method for automatically building and training a hierarchy of classifiers. In the second step, we apply feature reduction to the top-level classifier by choosing relevant image features according to a measure derived from statistical learning theory. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 335 with similar classification performance.
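
The speed-up from the hierarchy comes from calling the expensive classifier only on windows that the cheap one does not reject. The sketch uses hypothetical names and a fixed rejection threshold. How the paper builds and trains each level is not shown.

```python
def cascade_predict(x, fast_score, slow_classify, reject_threshold=0.0):
    """Two-level cascade: a cheap linear score rejects background before the slow classifier runs."""
    if fast_score(x) < reject_threshold:
        return -1  # rejected as background by the bottom-level classifier
    return slow_classify(x)  # only surviving candidates reach the top-level classifier
```

The overall cost is roughly the cheap cost on every window plus the expensive cost on the fraction that survives, so a high rejection rate at the bottom level dominates the speed-up.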

16.
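Automatic SVM model selection based on a heuristic genetic algorithm   Total citations: 6, self-citations: 0, other citations: 6
Automatic model selection for support vector machines (SVMs) is key to their practical application. The commonly used leave-one-out (LOO) method based on exhaustive search is cumbersome and very inefficient, and so far most algorithms cannot effectively perform automatic model selection. This paper uses a real-coded heuristic genetic algorithm to automatically select the model of an SVM with a Gaussian kernel function. Based on an analysis of how the SVM hyperparameters affect its performance and of two SVM performance estimates, an appropriate fitness function for the genetic algorithm is determined. Simulation results on synthetic and real data demonstrate the feasibility and efficiency of the proposed method.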

17.
Support vector machines (SVMs) are among the most powerful, popular and accurate classification techniques. In this work, we evaluate whether the accuracy of SVMs can be further improved using training set selection (TSS), where only a subset of the training instances is used to build the SVM model. In contrast to existing approaches, we focus on wrapper TSS techniques, in which candidate subsets of training instances are evaluated using the SVM training accuracy. We consider five wrapper TSS strategies and show that those based on evolutionary approaches can significantly improve the accuracy of SVMs.

18.
Support vector machines (SVMs) are a popular class of classification algorithms owing to their high generalization ability. However, training SVMs on a large set of learning samples is time-consuming, so improving learning efficiency is one of the most important research tasks for SVMs. It is known that although some learning tasks have many candidate training samples, only the samples near the decision boundary, called support vectors, affect the optimal separating hyperplane. Finding these samples and training SVMs on them alone greatly decreases training time and space complexity. Based on this observation, we introduce a neighborhood-based rough set model to search for boundary samples. Using the model, we first divide the sample space into three subsets: positive region, boundary, and noise. Furthermore, we partition the input features into four subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. We then train SVMs only on the boundary samples in the relevant and indispensable feature subspaces, so that feature selection and sample selection are conducted simultaneously by the proposed model. Experimental results show that the model selects very few features and samples for training, while the classification performance is preserved or even improved.
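
A sketch of the sample partition using δ-neighborhoods. A sample whose neighbors all share its label goes to the positive region, and a sample with mixed-label neighbors goes to the boundary. Labeling a sample as noise when all of its neighbors carry a different label is a hypothetical rule, since the abstract does not define the noise subset. The feature partition is not shown.

```python
import math


def partition_samples(X, y, delta):
    """Split sample indices into (positive region, boundary, noise) via delta-neighborhoods."""
    positive, boundary, noise = [], [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neigh = [y[j] for j, xj in enumerate(X) if j != i and math.dist(xi, xj) <= delta]
        if not neigh or all(t == yi for t in neigh):
            positive.append(i)  # neighborhood is label-consistent
        elif all(t != yi for t in neigh):
            noise.append(i)  # hypothetical rule: surrounded only by the other class
        else:
            boundary.append(i)  # mixed labels: candidate support vector
    return positive, boundary, noise
```

Under this scheme, only the `boundary` indices would be passed on to SVM training.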

19.
Model selection for support vector machines via uniform design   Total citations: 2, self-citations: 0, other citations: 2
The problem of choosing a good parameter setting for better generalization performance in a learning task is the so-called model selection problem. A nested uniform design (UD) methodology is proposed for efficient, robust and automatic model selection for support vector machines (SVMs). The proposed method is applied to select the candidate set of parameter combinations, and k-fold cross-validation is carried out to evaluate the generalization performance of each parameter combination. In contrast to conventional exhaustive grid search, this method can be treated as a deterministic analog of random search. It can dramatically cut down the number of parameter trials and also provides the flexibility to adjust the candidate set size under computational time constraints. The key theoretical advantage of UD model selection over grid search is that the UD points are “far more uniform” and “far more space filling” than lattice grid points. The better uniformity and space-filling properties make the UD selection scheme more efficient by avoiding wasteful function evaluations of close-by patterns. The proposed method is evaluated on different learning tasks and data sets, as well as with different SVM algorithms.
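
One standard way to build a uniform design table is the good-lattice-point construction: q_ij = (i·h_j) mod n (with 0 replaced by n), mapped to [lo, hi] via (2q − 1)/(2n). The sketch uses it to place n candidate (log2 C, log2 gamma) pairs. The generator vector and ranges are assumptions, and the paper's nested UD tables may be constructed differently.

```python
import math


def glp_levels(n, generator):
    """Good-lattice-point design: row i has levels (i * h) mod n, with 0 mapped to n."""
    for h in generator:
        if math.gcd(h, n) != 1:
            raise ValueError("each generator entry must be coprime with n")
    return [[(i * h) % n or n for h in generator] for i in range(1, n + 1)]


def scale(levels, n, bounds):
    """Map integer levels 1..n to cell midpoints inside each (lo, hi) range."""
    return [[lo + (hi - lo) * (2 * q - 1) / (2 * n) for q, (lo, hi) in zip(row, bounds)]
            for row in levels]
```

Each column uses every level exactly once, so with n = 7 and generator (1, 3) the seven trial points cover both parameter axes evenly, unlike a 7-point coarse lattice.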

20.
In the areas of investment research and applications, feasible quantitative models include methodologies stemming from soft computing for the prediction of financial time series, multi-objective optimization of investment return and risk reduction, and the selection of investment instruments for portfolio management based on asset ranking using a variety of input variables and historical data. Among these, stock selection has long been identified as a challenging and important task. This line of research is highly contingent upon reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining are creating significant opportunities to solve these problems more effectively. In this study, we aim to develop a methodology for effective stock selection using support vector regression (SVR) and genetic algorithms (GAs). We first employ the SVR method to generate surrogates for actual stock returns, which in turn provide reliable rankings of stocks. Top-ranked stocks can then be selected to form a portfolio. On top of this model, the GA is employed to optimize model parameters and to perform feature selection, acquiring optimal subsets of input variables for the SVR model. We show that the investment returns provided by our proposed methodology significantly outperform the benchmark. Based on these promising results, we expect this hybrid GA-SVR methodology to advance research in soft computing for finance and to provide an effective solution to stock selection in practice.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号