期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A Tutorial on Support Vector Machines for Pattern Recognition 总被引：733，自引：4，他引：733

Christopher J.C. Burges 《Data mining and knowledge discovery》1998,2(2):121-167

The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light. 相似文献

2.

Improved Generalization Through Explicit Optimization of Margins 总被引：2，自引：0，他引：2

Mason Llew Bartlett Peter L. Baxter Jonathan 《Machine Learning》2000,38(3):243-255

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn( – margin). We then present a new algorithm, DOOM, for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interests of reducing the new cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generalization performance. 相似文献

3.

Explanation-Based Generalization: A Unifying View 总被引：36，自引：25，他引：11

Mitchell Tom M. Keller Richard M. Kedar-Cabelli Smadar T. 《Machine Learning》1986,1(1):47-80

The problem of formulating general concepts from specific training examples has long been a major focus of machine learning research. While most previous research has focused on empirical methods for generalizing from a large number of training examples using no domain-specific knowledge, in the past few years new methods have been developed for applying domain-specific knowledge to for-mulate valid generalizations from single training examples. The characteristic common to these methods is that their ability to generalize from a single example follows from their ability to explain why the training example is a member of the concept being learned. This paper proposes a general, domain-independent mechanism, called EBG, that unifies previous approaches to explanation-based generalization. The EBG method is illustrated in the context of several example problems, and used to contrast several existing systems for explanation-based generalization. The perspective on explanation-based generalization afforded by this general method is also used to identify open research problems in this area. 相似文献

4.

一种基于边界的贪心组合剪枝方法

郭华平范明职为梅《模式识别与人工智能》2013,26(2):136-143

理论及实验表明,在训练集上具有较大边界分布的组合分类器泛化能力较强。文中将边界概念引入到组合剪枝中,并用它指导组合剪枝方法的设计。基于此,构造一个度量标准(MBM)用于评估基分类器相对于组合分类器的重要性,进而提出一种贪心组合选择方法(MBMEP)以降低组合分类器规模并提高它的分类准确率。在随机选择的30个UCI数据集上的实验表明,与其它一些高级的贪心组合选择算法相比,MBMEP选择出的子组合分类器具有更好的泛化能力。相似文献

5.

Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates

Bylander Tom 《Machine Learning》2002,48(1-3):287-297

For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-bag votes. 相似文献

6.

Generalization and PAC learning: some new results for the class ofgeneralized single-layer networks

Holden S.B. Rayner P.J.W. 《Neural Networks, IEEE Transactions on》1995,6(2):368-380

The ability of connectionist networks to generalize is often cited as one of their most important properties. We analyze the generalization ability of the class of generalized single-layer networks (GSLNs), which includes Volterra networks, radial basis function networks, regularization networks, and the modified Kanerva model, using techniques based on the theory of probably approximately correct (PAC) learning which have previously been used to analyze the generalization ability of feedforward networks of linear threshold elements (LTEs). An introduction to the relevant computational learning theory is included. We derive necessary and sufficient conditions on the number of training examples required by a GSLN to guarantee a particular generalization performance. We compare our results to those given previously for feedforward networks of LTEs and show that, on the basis of the currently available bounds, the sufficient number of training examples for GSLNs is typically considerably less than for feedforward networks of LTEs with the same number of weights. We show that the use of self-structuring techniques for GSLNs may reduce the number of training examples sufficient to guarantee good generalization performance, and we provide an explanation for the fact that GSLNs can require a relatively large number of weights. 相似文献

7.

Subspace information criterion for model selection 总被引：7，自引：0，他引：7

Sugiyama M Ogawa H 《Neural computation》2001,13(8):1863-1889

The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this article, we propose a new criterion for model selection, the subspace information criterion (SIC), which is a generalization of Mallows's C(L). It is assumed that the learning target function belongs to a specified functional Hilbert space and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least-mean-squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small. 相似文献

8.

The Relaxed Online Maximum Margin Algorithm

Yi Li Philip M. Long 《Machine Learning》2002,46(1-3):361-387

We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed threshold SVM hypothesis. 相似文献

9.

一种新的支持向量机主动学习策略及其在文本分类中的应用

刘宏屠轶清黄上腾《计算机科学》2003,30(6):110-112

There are two well-known characteristics about text classification.One is that the dimension of the sample space is very high,while the number of examples available usually is very small.The other is that the example vectors are sparse.Meanwhile,we find existing support vector machines active learning approaches are subject to the influence of outliers.Based on these observations,this paper presents a new hybrid active learning approach.In this approach,to select the unlabelled example(s) to query,the learner takes into account both sparseness and high-di-mension characteristics of examples as well as its uncertainty about the examples‘‘ categorization.This way, the active learner needs less labeled examples,but still can get a good generalization performance more quickly than competing methods.Our empirical results indicate that this new approach is effective. 相似文献

10.

Theory and practice of vector quantizers trained on small trainingsets

Cohn D. Riskin E.A. Ladner R. 《IEEE transactions on pattern analysis and machine intelligence》1994,16(1):54-65

Examines how the performance of a memoryless vector quantizer changes as a function of its training set size. Specifically, the authors study how well the training set distortion predicts test distortion when the training set is a randomly drawn subset of blocks from the test or training image(s). Using the Vapnik-Chervonenkis (VC) dimension, the authors derive formal bounds for the difference of test and training distortion of vector quantizer codebooks. The authors then describe extensive empirical simulations that test these bounds for a variety of codebook sizes and vector dimensions, and give practical suggestions for determining the training set size necessary to achieve good generalization from a codebook. The authors conclude that, by using training sets comprising only a small fraction of the available data, one can produce results that are close to the results obtainable when all available data are used 相似文献