Similar Documents
20 similar documents found.
1.
Relationship Between Support Vector Set and Kernel Functions in SVM
Based on a constructive learning approach, the covering algorithms, we investigate the relationship between support vector sets and kernel functions in support vector machines (SVM). An interesting result is obtained: in the linearly non-separable case, any sample of a given sample set K can become a support vector under a certain kernel function. This result shows that when the sample set K is linearly non-separable, although the chosen kernel function satisfies Mercer's condition, its corresponding support vector set is not necessarily the subset of K that plays the crucial role in classifying K. For a given sample set, what is the subset that plays the crucial role in classification? To explore this problem, a new concept, the boundary (or boundary points), is defined and its properties are discussed. Given a sample set K, we show that the decision functions for classifying the boundary points of K are the same as those for classifying K itself, and that the boundary points of K depend only on K and the structure of the space in which K is located, independent of the approach chosen for finding the boundary. Therefore, the boundary point set may serve as the subset of K that plays the crucial role in classification. These results are important for understanding the principle of the support vector machine (SVM) and for developing new learning algorithms.
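As a hedged illustration of the headline claim (this is not the paper's covering algorithm; the data, labels, and gamma values are invented for the sketch), shrinking the RBF bandwidth on a linearly non-separable toy set K pushes ever more samples of K into the support vector set:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # toy sample set K, linearly non-separable
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like labels

for gamma in (0.1, 1.0, 100.0):          # shrinking bandwidth = growing gamma
    clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)
    print(f"gamma={gamma:>6}: {clf.n_support_.sum()} of {len(X)} support vectors")
```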

2.
The kernel function method in support vector machines (SVM) is an excellent tool for nonlinear classification. Designing a kernel function is difficult for an SVM nonlinear classification problem, even for the polynomial kernel. In this paper, we propose a new kind of polynomial kernel function, called the semi-tensor product kernel (STP-kernel), for SVM nonlinear classification problems, based on the theory of the semi-tensor product of matrices (STP). We show the existence of the STP-kernel function and verify that it is indeed a polynomial kernel. In addition, we show the existence of the reproducing kernel Hilbert space (RKHS) associated with the STP-kernel function. Compared to existing methods, it is much easier to construct the nonlinear feature mapping for an SVM nonlinear classification problem via an STP operator.
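For concreteness, here is a minimal sketch of the semi-tensor product of matrices that the STP-kernel builds on; the kernel construction itself is the paper's, and only the standard left STP definition is shown, with hypothetical example matrices:

```python
import numpy as np
from math import lcm

def stp(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Left semi-tensor product of A (m x n) and B (p x q):
    A |x| B = (A kron I_{t/n}) @ (B kron I_{t/p}), t = lcm(n, p)."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

A = np.arange(6.0).reshape(2, 3)   # 2 x 3
B = np.arange(12.0).reshape(6, 2)  # 6 x 2: A @ B is undefined, stp(A, B) is not
print(stp(A, B).shape)             # -> (4, 2)
```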

3.
Recent machine learning techniques have demonstrated their capability for identifying image categories from image features. Among these techniques, Support Vector Machines (SVM) obtain good results, for example in the PASCAL VOC challenge 2011 [8], particularly when they are associated with a kernel function [28, 35]. However, the image categorization task is nowadays very challenging owing to the sizes of benchmark datasets and the number of categories to be classified. In such a context, much effort has to be put into the design of the kernel functions and the underlying semantic features. In the remainder of the paper, we call semantic features the features describing the (semantic) content of an image. In this paper, we propose a framework to learn an effective kernel function using the Boosting paradigm to linearly combine weak kernels, and then use an SVM with this kernel to categorize image databases. More specifically, the method creates embedding functions that map images into a Hilbert space where they are better classified. Furthermore, our algorithm benefits from the boosting process to learn this kernel with complexity linear in the size of the training set. Experiments are carried out on popular benchmarks and databases to show the properties and behavior of the proposed method. On the PASCAL VOC2006 database, we compare our method to simple early fusion, and on the Oxford Flowers databases we show that our method outperforms the best Multiple Kernel Learning (MKL) techniques in the literature.
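A hedged sketch of the general idea, not the paper's exact algorithm: greedily accumulate weak base kernels into one combined kernel and train an SVM on the resulting Gram matrix. Selecting each round's kernel by kernel-target alignment, and the unit weights, are assumptions of this sketch:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def alignment(K, y):
    """Kernel-target alignment between Gram matrix K and labels y."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = np.sign(X[:, 0] + X[:, 1] ** 2 - 0.5)

# a pool of weak base kernels
bases = [rbf_kernel(X, gamma=g) for g in (0.01, 0.1, 1.0)]
bases += [polynomial_kernel(X, degree=d) for d in (2, 3)]

K = np.zeros((len(X), len(X)))
for _ in range(3):                                        # greedy, boosting-flavoured rounds
    K += max(bases, key=lambda Kb: alignment(K + Kb, y))  # unit weights for brevity

clf = SVC(kernel="precomputed").fit(K, y)
print("train accuracy:", clf.score(K, y))
```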

4.
This paper describes the use of kernel methods to classify tissue samples using near-infrared spectra, in order to discriminate between samples with and without elastane. The aim of this real-world study is to identify an alternative method for classifying textile products using near-infrared (NIR) spectroscopy, in order to improve quality control and to aid in the detection of counterfeit garments. The principles behind support vector machines (SVMs), whose main idea is to separate data linearly, are recalled progressively in order to demonstrate that the decision function obtained is a global optimal solution of a quadratic programming problem. Generally, this solution is found after embedding the data in another space F of higher dimension by means of a specific non-linear function, the kernel. For a selected kernel, one of the most important and difficult issues concerning SVMs is the determination of the tuning parameters. Usually, different combinations of these parameters are tested in order to obtain a machine with adequate classification ability; with the kernel alignment method used in this paper, the most appropriate kernel parameters are identified rapidly. Since in many cases the data are embedded in F, a linear principal component analysis (PCA) can be considered and studied there; the main properties and the algorithm of kernel PCA (k-PCA) are described here. This paper compares the prediction results obtained for a linear classifier built in the initial space with the PCs from a PCA, and those obtained in F with non-linear PCs from a k-PCA. In the present study, even though potentially discriminating wavelengths can be seen on the NIR spectra, linear discriminant analysis and soft independent modelling of class analogy results show that these wavelengths are not sufficient to build a machine with correct generalisation ability. The use of a non-linear method, such as SVM and its corollary methods, kernel alignment and k-PCA, is then justified.
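The two tools the abstract leans on can be sketched as follows: kernel-target alignment to pick the RBF width quickly, then kernel PCA in the induced space F. The data below are random placeholders, not the NIR spectra of the study, and the gamma grid is an assumption:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

def target_alignment(K, y):
    Y = np.outer(y, y)                      # the "ideal" kernel y y^T
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 30))              # stand-in for NIR spectra
y = np.sign(X[:, :5].sum(axis=1))           # stand-in for elastane / no elastane

# pick the RBF width by maximizing alignment instead of a slow grid search
gammas = 10.0 ** np.arange(-4, 2)
best_gamma = max(gammas, key=lambda g: target_alignment(rbf_kernel(X, gamma=g), y))

# non-linear principal components in the induced space F
Z = KernelPCA(n_components=5, kernel="rbf", gamma=best_gamma).fit_transform(X)
print("selected gamma:", best_gamma, "| k-PCA scores shape:", Z.shape)
```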

5.
Domain adaptation learning (DAL) methods have shown promising results by utilizing labeled samples from the source (or auxiliary) domain(s) to learn a robust classifier for a target domain that has few or even no labeled samples. However, several key issues remain to be addressed in state-of-the-art DAL methods, such as sufficient and effective distribution-discrepancy metric learning, effective kernel space learning, and transfer learning from multiple source domains. Aiming at these issues, in this paper we propose a unified kernel learning framework for domain adaptation learning, together with an effective extension based on the multiple kernel learning (MKL) schema, regularized by a new minimum distribution distance metric criterion that minimizes both the distribution mean discrepancy and the distribution scatter discrepancy between the source and target domains; many existing kernel methods (such as support vector machine (SVM), ν-SVM, and least-squares SVM) can be readily incorporated into it. Our framework, referred to as kernel learning for domain adaptation learning (KLDAL), simultaneously learns an optimal kernel space and a robust classifier by minimizing both the structural risk functional and the distribution discrepancy between different domains. Moreover, we extend the KLDAL framework to a multiple kernel learning framework referred to as MKLDAL. Under the KLDAL or MKLDAL framework, we also propose three effective formulations: KLDAL-SVM or MKLDAL-SVM for SVM, its variant ν-KLDAL-SVM or ν-MKLDAL-SVM for ν-SVM, and KLDAL-LSSVM or MKLDAL-LSSVM for the least-squares SVM. Comprehensive experiments on real-world data sets verify that the proposed frameworks outperform or are comparable to existing methods.
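A minimal sketch of the distribution mean discrepancy ingredient of the criterion, computed as the squared RKHS mean distance (MMD) between source and target samples under a fixed RBF kernel; the scatter term and the MKL extension from the paper are omitted, and all data below are synthetic:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2(Xs, Xt, gamma=1.0):
    """Squared maximum mean discrepancy between two samples in the RKHS."""
    Kss = rbf_kernel(Xs, Xs, gamma=gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma=gamma)
    Kst = rbf_kernel(Xs, Xt, gamma=gamma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()

rng = np.random.default_rng(3)
Xs = rng.normal(0.0, 1.0, size=(150, 4))        # source domain
Xt = rng.normal(0.5, 1.2, size=(150, 4))        # shifted target domain
print("MMD^2(source, target) =", mmd2(Xs, Xt))
```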

6.
We present a mechanism to train support vector machines (SVMs) with a hybrid kernel and minimal Vapnik-Chervonenkis (VC) dimension. After describing the VC dimension of sets of separating hyperplanes in a high-dimensional feature space produced by a kernel-related mapping from the input space, we propose an optimization criterion for designing SVMs that minimizes an upper bound on the VC dimension. This method realizes structural risk minimization and uses a flexible kernel function so that superior generalization over test data can be obtained. To obtain a flexible kernel function, we develop a hybrid kernel function and a sufficient condition for it to be an admissible Mercer kernel, built from common Mercer kernels (polynomial, radial basis function, two-layer neural network, etc.). The nonnegative combination coefficients and parameters of the hybrid kernel are determined subject to the minimal upper bound on the VC dimension of the learning machine. The use of the hybrid kernel results in better performance than that of a single common kernel. Experimental results are discussed to illustrate the proposed method and show that the SVM with the hybrid kernel outperforms an SVM with a single common kernel in terms of generalization power.
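As a sketch: any nonnegative combination of Mercer kernels is again a Mercer kernel, so a hybrid kernel can be passed to an SVM directly. The coefficients below are fixed by hand, whereas the paper tunes them by minimizing a VC-dimension bound:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def hybrid_kernel(X, Y, a=0.7, b=0.3, gamma=0.5, degree=2):
    """Nonnegative combination of two Mercer kernels; coefficients are
    illustrative, not the optimized values of the paper."""
    assert a >= 0 and b >= 0        # nonnegativity preserves Mercer's condition
    return a * rbf_kernel(X, Y, gamma=gamma) + b * polynomial_kernel(X, Y, degree=degree)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = np.sign(X[:, 0] - X[:, 1] ** 2)
clf = SVC(kernel=hybrid_kernel).fit(X, y)
print("train accuracy:", clf.score(X, y))
```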

7.
The kernel strategy and its use for the study of combinatory logic
Barendregt defines combinatory logic as an equational system satisfying the combinators S and K with ((Sx)y)z = (xz)(yz) and (Kx)y = x; the set consisting of S and K provides a basis for all of combinatory logic. Rather than studying all of the logic, logicians often focus on fragments of the logic: subsets whose basis is obtained by replacing S or K or both by other combinators. In this article, we present a powerful new strategy, called the kernel strategy, for studying fragments in the context of questions concerned with fixed point properties. Interest in such properties rests in part with their relation to normal forms and paradoxes. We show how the kernel strategy was used to answer a number of open questions, offering abundant evidence that the availability of the kernel strategy marks a singular advance for automated reasoning. In all of our experiments with this strategy applied by an automated reasoning program, the rate of success has been impressively high and the CPU time to obtain the desired information startlingly small. For each fragment we study, we use the kernel strategy to attempt to determine whether the strong or the weak fixed point property holds. Where A is a given fragment with basis B, the strong fixed point property holds for A if and only if there exists a combinator y such that, for all combinators x, yx = x(yx), where y is expressed purely in terms of elements of B. The weak fixed point property holds for A if and only if for all combinators x there exists a combinator y such that y = xy, where y is expressed purely in terms of the elements of B and the combinator x. Because the use of the kernel strategy is so effective in addressing questions focusing on either fixed point property, its formulation marks an important advance for combinatory logic. Perhaps of especial interest to logicians is an infinite class of infinite sets of tightly coupled fixed point combinators (presented here), whose unexpected discovery resulted directly from the application of the strategy by an automated reasoning program. We also offer various open questions for possible research and focus on an automated reasoning program and input files that may prove useful for such research. This work was supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
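To make the two rewrite rules concrete, here is a toy S/K reducer; it is an illustration only, since the kernel strategy itself is a proof-search heuristic inside an automated reasoner and is not reproduced here:

```python
# Terms are 'S', 'K', free variables like 'x', or application pairs (f, a).
def reduce_once(t):
    """One leftmost reduction step: K x y -> x, S x y z -> (x z)(y z)."""
    if isinstance(t, tuple):
        f, a = t
        if isinstance(f, tuple) and f[0] == 'K':               # (K x) y -> x
            return f[1], True
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
            x, y, z = f[0][1], f[1], a                         # ((S x) y) z
            return ((x, z), (y, z)), True
        f2, changed = reduce_once(f)
        if changed:
            return (f2, a), True
        a2, changed = reduce_once(a)
        return (f2, a2), changed
    return t, False

def normalize(t, steps=50):
    for _ in range(steps):
        t, changed = reduce_once(t)
        if not changed:
            break
    return t

I = (('S', 'K'), 'K')          # S K K behaves as the identity combinator
print(normalize((I, 'x')))     # -> 'x'
```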

8.
In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature-space environment, which leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mix norm. We also relate this problem, in a special case, to regularization. Editors: Olivier Bousquet and Andre Elisseeff. This work was supported by NSF Grant ITR-0312113, EPSRC Grant GR/T18707/01 and by the IST Programme of the European Community, under the PASCAL Network of Excellence IST-2002-506778.

9.
In the machine learning literature, deep learning methods have been moving to greater heights by giving due importance to both data representation and classification methods. The recently developed multilayered arc-cosine kernel leverages the possibility of extending deep learning features into kernel machines. Even though this kernel has been widely used in conjunction with support vector machines (SVM) on small datasets, it does not seem to be a feasible solution for modern real-world applications that involve very large datasets. There are many avenues in which the scalability of deep kernel machines in handling large datasets needs to be evaluated. In the machine learning literature, the core vector machine (CVM) is used as a scaling-up mechanism for traditional SVMs. In CVM, the quadratic programming problem involved in SVM is reformulated as an equivalent minimum enclosing ball problem and then solved using a subset of the training samples (the core set) obtained by a faster \((1+\epsilon)\) approximation algorithm. This paper explores the possibility of using the principles of core vector machines as a scaling-up mechanism for deep support vector machines with the arc-cosine kernel. Experiments on different datasets show that the proposed system gives high classification accuracy with reasonable training time compared to traditional core vector machines, deep support vector machines with the arc-cosine kernel, and deep convolutional neural networks.
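A sketch of the \((1+\epsilon)\) minimum-enclosing-ball core-set idea that CVM rests on, in the Badoiu-Clarkson style; this is the plain Euclidean version on synthetic data, and kernelizing the distance computations is what turns it into a core vector machine:

```python
import numpy as np

def meb_coreset(X, eps=0.05):
    """(1+eps)-approximate minimum enclosing ball via a core set."""
    c = X[0].astype(float).copy()
    core = {0}
    for i in range(1, int(np.ceil(1.0 / eps ** 2))):
        far = int(np.linalg.norm(X - c, axis=1).argmax())
        core.add(far)                        # farthest point joins the core set
        c += (X[far] - c) / (i + 1)          # pull the center toward it
    r = np.linalg.norm(X - c, axis=1).max()
    return c, r, sorted(core)

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 10))
c, r, core = meb_coreset(X)
print(f"core set: {len(core)} of {len(X)} points, radius ~ {r:.3f}")
```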

10.
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimating the parameters of GMMs include the expectation-maximization method, which is a non-discriminative learning method. Discriminative-classifier approaches to speaker recognition include support vector machine (SVM) classifiers using dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel, the GMM-UBM mean interval kernel (GUMI) and the intermediate matching kernel. Recently, the pyramid match kernel (PMK), using grids in the feature space as histogram bins, and the vocabulary-guided PMK (VGPMK), using clusters in the feature space as histogram bins, have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid, and the kernel between a pair of examples is computed by comparing their pyramids using a weighted histogram intersection function at each level of the pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is the construction of the pyramid of histograms. We first propose to form hard clusters, using the k-means clustering method, with an increasing number of clusters at successive levels of the pyramid, to design the codebook-based PMK (CBPMK). We then propose the GMM-based PMK (GMMPMK), which uses soft clustering. We compare the performance of the GMM-based approaches, and of the PMK and other dynamic-kernel SVM-based approaches, on speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used to evaluate the different approaches. Our results show that the dynamic-kernel SVM-based approaches give significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM performs better than SVMs using many other dynamic kernels and comparably to SVMs using the state-of-the-art GUMI kernel, while its storage requirements are lower than those of SVMs using any other dynamic kernel.
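A minimal sketch of the pyramid match kernel for one-dimensional local features, with uniform grid bins as in the original PMK; the paper's CBPMK and GMMPMK replace these grids with k-means or GMM clusters, which is not shown here:

```python
import numpy as np

def pyramid_histograms(S, levels, lo=-4.0, hi=4.0):
    """One histogram per level; bin width doubles at each coarser level."""
    return [np.histogram(S, bins=2 ** (levels - l), range=(lo, hi))[0]
            for l in range(levels + 1)]

def pmk(S1, S2, levels=4):
    """Weighted intersection of new matches across the pyramid levels."""
    H1, H2 = pyramid_histograms(S1, levels), pyramid_histograms(S2, levels)
    inter = [np.minimum(h1, h2).sum() for h1, h2 in zip(H1, H2)]
    k, prev = 0.0, 0.0
    for l, I in enumerate(inter):           # new matches at level l get weight 1/2^l
        k += (I - prev) / (2.0 ** l)
        prev = I
    return k

rng = np.random.default_rng(6)
a, b = rng.normal(size=50), rng.normal(size=60)  # two sets of local features
print("PMK(a, b) =", pmk(a, b))
```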

11.
In this paper, we present a general technique for solving a class of linear/nonlinear optimal control problems. An analytical solution of the state variable is represented as a series in a reproducing kernel Hilbert space, and with the aid of this series form the optimal control variable can sometimes also be presented as a series. An iterative method is given to obtain the approximate optimal control and state variables, and the cost functional is obtained numerically. Convergence analysis of the method is also provided. Several numerical examples are tested to demonstrate the applicability and efficiency of the method.

12.
The main goal of this paper is to prove inequalities on the reconstruction error for kernel principal component analysis. With respect to previous work on this topic, our contribution is twofold: (1) we give bounds that explicitly take into account the empirical centering step in this algorithm, and (2) we show that a "localized" approach allows us to obtain more accurate bounds. In particular, we show faster rates of convergence towards the minimum reconstruction error; more precisely, we prove that the convergence rate can typically be faster than n^{-1/2}. We also obtain a new relative bound on the error. A secondary goal, for which we present similar contributions, is to obtain convergence bounds for the partial sums of the largest or smallest eigenvalues of the kernel Gram matrix towards the eigenvalues of the corresponding kernel operator. These quantities are naturally linked to the KPCA procedure; furthermore, these results can have applications to the study of various other kernel algorithms. The results are presented in a functional analytic framework, which is suited to dealing rigorously with reproducing kernel Hilbert spaces of infinite dimension. Editor: Nicolo Cesa-Bianchi.
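The quantity the bounds concern is easy to compute empirically: the KPCA reconstruction error with d retained components equals the tail sum of eigenvalues of the centered, normalized kernel Gram matrix. A sketch on synthetic data, where the kernel and gamma are arbitrary choices:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 8))
K = rbf_kernel(X, gamma=0.2)

n = len(X)
H = np.eye(n) - np.ones((n, n)) / n          # the empirical centering step
evals = np.linalg.eigvalsh(H @ K @ H / n)[::-1]  # descending eigenvalues

d = 10                                       # number of retained components
print("empirical reconstruction error with d components:", evals[d:].sum())
```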

13.
We propose an ensemble method for kernel machines. The training data is randomly split into a number of mutually exclusive partitions defined by a row and a column parameter. Each partition forms an input space and is transformed by an automatically selected kernel function into a kernel matrix K. Subsequently, each K is used as training data for a base binary classifier (Random Forest), which results in a number of predictions equal to the number of partitions. A weighted average combines the predictions into one final prediction, and a genetic algorithm is used to optimize the weights. This approach has the advantage of simultaneously promoting (1) diversity, (2) accuracy, and (3) computational speed: (1) diversity is fostered because the individual K's are based on a subset of features and observations, (2) accuracy is sought through automatic kernel selection and the genetic algorithm, and (3) computational speed is obtained because the computation of each K can be parallelized. Using five times twofold cross-validation, we benchmark the classification performance of Kernel Factory against Random Forest and Kernel-Induced Random Forest (KIRF). We find that Kernel Factory performs significantly better than Kernel-Induced Random Forest and, when the right kernel is selected, also significantly better than Random Forest. In addition, an open-source R software package of the algorithm (kernelFactory) is available from CRAN.
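A stripped-down sketch of the pipeline for a single partition: a random row/column block is turned into a kernel matrix that serves as the feature table for a Random Forest. Automatic kernel selection and the GA-weighted combination of partitions are left out, and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)

rows = rng.permutation(len(X))[:100]         # one row partition
cols = rng.permutation(X.shape[1])[:3]       # one column partition
Xp = X[np.ix_(rows, cols)]

K = rbf_kernel(Xp, Xp)                       # the partition's kernel matrix
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(K, y[rows])
print("partition train accuracy:", rf.score(K, y[rows]))
```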

14.
This paper focuses on how data representation influences the generalization error of kernel-based learning machines such as support vector machines (SVM) for classification. Frame theory provides a well-founded mathematical framework for representing data in many different ways. We analyze the effects of sparse and dense data representations on the generalization error of such learning machines, measured by the leave-one-out error given a finite amount of training data. We show that, in the case of sparse data representations, the generalization error of an SVM trained using polynomial or Gaussian kernel functions is equal to that of a linear SVM. This is equivalent to saying that the capacity for separating points of functions belonging to hypothesis spaces induced by polynomial or Gaussian kernel functions reduces to the capacity of a separating hyperplane in the input space. Moreover, we show that, in general, sparse data representations increase or leave unchanged the generalization error of kernel-based methods, whereas dense data representations reduce it in the case of very large frames. We use two different schemes for representing data in overcomplete systems of Haar and Gabor functions, and measure the SVM generalization error on benchmark data sets.

15.
Linear-kernel Support Vector Machine Recursive Feature Elimination (SVM-RFE) is known as an excellent feature selection algorithm. A nonlinear SVM, however, is a black-box classifier for which the mapping function Φ is not known explicitly, so the weight vector w cannot be computed directly. In this paper, we propose a feature selection algorithm based on an SVM with the RBF kernel and Recursive Feature Elimination (SVM-RBF-RFE), which expands the nonlinear RBF kernel into its Maclaurin series and then computes the weight vector w from the series according to the contribution each feature makes to the classification hyperplane. Using w_i^2 as the ranking criterion, SVM-RBF-RFE starts with all the features and eliminates the feature with the least squared weight at each step until all the features are ranked. We use SVM and KNN classifiers to evaluate the nested subsets of features selected by SVM-RBF-RFE. Experimental results on 3 UCI and 3 microarray datasets show that SVM-RBF-RFE generally performs better than information gain and SVM-RFE.
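A sketch of the RFE loop with the squared-weight ranking criterion. A linear SVM is used here so that w is explicit; the paper's contribution, recovering an analogous w for the RBF kernel via the Maclaurin expansion, is not reproduced in this sketch:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X = rng.normal(size=(150, 8))
y = np.sign(2 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=150))

remaining, eliminated = list(range(X.shape[1])), []
while remaining:
    w = SVC(kernel="linear", C=1.0).fit(X[:, remaining], y).coef_[0]
    worst = int(np.argmin(w ** 2))            # least squared weight goes first
    eliminated.append(remaining.pop(worst))

feature_rank = eliminated[::-1]               # reverse: most useful feature first
print("features ranked, most to least useful:", feature_rank)
```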

16.
Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph-cut problem in which an appropriate objective function has to be optimized. Since it has been proven that these two seemingly different approaches have the same mathematical foundation, an explicit proof that the two paradigms share the same objective is reported. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
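A compact sketch of kernel K-means, the kernelized classical algorithm the survey covers: squared distances to cluster means in feature space are computed purely from the Gram matrix, using ||φ(x) − m_c||² = K_xx − (2/|C|) Σ_j K_xj + (1/|C|²) Σ_{j,l} K_jl. Data and kernel choice below are illustrative:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_kmeans(K, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(K))
    for _ in range(iters):
        D = np.empty((len(K), k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                D[:, c] = np.inf
                continue
            # distance to the implicit feature-space mean of cluster c
            D[:, c] = (np.diag(K) - 2 * K[:, idx].mean(axis=1)
                       + K[np.ix_(idx, idx)].mean())
        labels = D.argmin(axis=1)
    return labels

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
print("cluster sizes:", np.bincount(kernel_kmeans(rbf_kernel(X, gamma=1.0))))
```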

17.
A classifier ensemble is a set of classifiers whose individual decisions are combined to classify new examples. Classifiers that can represent complex decision boundaries are accurate, and kernel functions can also represent complex decision boundaries. In this paper, we study the usefulness of kernel features for decision tree ensembles, as they can improve the representational power of the individual classifiers. We first propose decision tree ensembles based on kernel features and find that the performance of these ensembles depends strongly on the kernel parameters: the selected kernel and the dimension of the kernel feature space. To overcome this problem, we present another approach to creating ensembles that combines existing ensemble methods with the kernel machine philosophy: kernel features are created and concatenated with the original features, and the classifiers of the ensemble are trained on these extended feature spaces. Experimental results suggest that this approach is quite robust to the selection of parameters. Experiments also show that different ensemble methods (Random Subspace, Bagging, Adaboost.M1 and Random Forests) can be improved by using this approach.
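A sketch of the second approach: empirical kernel features K(x, L) against a set of landmark points are concatenated with the original features before training a standard ensemble. Choosing the landmarks as a random subset of the training data is an assumption of this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 5))
y = (np.sin(X[:, 0]) + X[:, 1] > 0).astype(int)

L = X[rng.permutation(len(X))[:30]]                    # landmark points
X_ext = np.hstack([X, rbf_kernel(X, L, gamma=0.5)])    # original + kernel features

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_ext, y)
print("train accuracy on extended features:", rf.score(X_ext, y))
```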

18.
Emilio G., Sancho, Ángel M., Jose A. Neurocomputing, 2009, 72(16-18): 3683
The selection of hyper-parameters in support vector machines (SVM) is a key point in the training process of these models when they are applied to regression problems. Unfortunately, no exact method to obtain the optimal set of SVM hyper-parameters is known, and search algorithms are usually applied to obtain the best possible set. In general these search algorithms are implemented as grid searches, which are time-consuming, so the computational cost of the SVM training process increases considerably. This paper presents a novel study of the effect of reducing the search ranges of the SVM hyper-parameters in order to reduce the SVM training time with the minimum possible impact on performance. The paper presents a reduction of the range of the parameter C, obtained by considering its relation with the other SVM hyper-parameters (γ and ε) through an approximation of the SVM model. In addition, we use some characteristics of the Gaussian kernel function together with a previous result in the literature to obtain novel bounds for the γ and ε hyper-parameters. The proposed search-space reductions are evaluated on different regression problems from the UCI and StatLib databases. All the experiments, carried out with the popular LIBSVM solver, show that our approach reduces the SVM training time while keeping the SVM performance similar to that obtained when the complete range of SVM hyper-parameters is considered.
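An illustrative sketch of the range-reduction idea: bound C from the data before running the grid search. The specific bound below is the Cherkassky-Ma heuristic, used here only as a stand-in for the paper's own C-γ-ε relation; the data and grids are invented:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(12)
X = rng.normal(size=(200, 4))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)

# data-driven upper bound on C (Cherkassky-Ma heuristic, a stand-in here)
C_max = max(abs(y.mean() + 3 * y.std()), abs(y.mean() - 3 * y.std()))

grid = {"C": np.linspace(0.1 * C_max, C_max, 4),   # reduced C range
        "gamma": [0.01, 0.1, 1.0],
        "epsilon": [0.01, 0.1]}

search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3).fit(X, y)
print("best parameters:", search.best_params_)
```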

19.
This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating the translation problem as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are explored for comparison, of which the former is shown to perform better on this task. The experimental results demonstrate its advantage of handling very high-dimensional features implicitly and flexibly; however, it shares the common drawback of kernel methods, namely the lack of scalability. For real-world application, a more practical solution based on a locally linear regression hyperplane approximation is proposed, using online subsetting of relevant training examples. In addition, we introduce a novel way to integrate language models into this machine translation framework, using the language model as a penalty term in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.
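A bare-bones sketch of the regression core: kernel ridge regression with vector-valued outputs, mapping source-side chunk features to target-side chunk features. The chunk feature extraction and the decoding step are the paper's machinery and are not shown; all data below are random placeholders:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(13)
Xs = rng.normal(size=(100, 20))              # source-side chunk features
Yt = rng.normal(size=(100, 15))              # target-side chunk features

lam = 1e-2                                   # ridge regularization strength
K = rbf_kernel(Xs, gamma=0.05)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), Yt)   # dual coefficients

x_new = rng.normal(size=(1, 20))
y_pred = rbf_kernel(x_new, Xs, gamma=0.05) @ alpha      # predicted target vector
print("predicted target feature vector shape:", y_pred.shape)
```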

20.
International Journal of Computer Mathematics, 2012, 89(14): 3196-3198
In [Y.L. Wang, T. Chaolu, Z. Chen, Using reproducing kernel for solving a class of singular weakly nonlinear boundary value problems, Int. J. Comput. Math. 87(2) (2010), pp. 367-380], we presented three algorithms for solving a class of boundary value problems for ordinary differential equations in a reproducing kernel space. It is worth noting that these methods can also yield solutions of partial integro-differential equations. In this note, we use method 2 to solve a class of partial integro-differential equations [M. Dehghan, Solution of a partial integro-differential equation arising from viscoelasticity, Int. J. Comput. Math. 83(1) (2006), pp. 123-129] in a reproducing kernel space. A numerical example shows that our method is effective and has high accuracy.
