Similar Literature
20 similar documents found (search time: 371 ms)
1.
We establish some general results concerning PAC learning: we find a characterization of the property that any consistent algorithm is PAC; we show that the shrinking width property is equivalent to PUAC learnability; by counterexample, PAC and PUAC learning are shown to be distinct concepts; and we find conditions ensuring that any nearly consistent algorithm is PAC or PUAC, respectively. The VC dimension of recurrent neural networks and folding networks is infinite; for restricted inputs, however, bounds exist, and these bounds are transferred to folding networks. We find conditions on the probability distribution over the input space ensuring polynomial learnability: the probability of sequences or trees has to converge to zero sufficiently fast with increasing length or height. Finally, we give an example of a concept class that requires exponentially growing sample sizes for accurate generalization. Date received: September 5, 1997. Date revised: May 29, 1998.
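For reference, the PAC criterion that this characterization builds on can be written as follows (a textbook formulation, not notation taken from the paper):

```latex
% An algorithm A is PAC if for every accuracy/confidence pair
% (\varepsilon, \delta) there is a sample size m_0(\varepsilon, \delta) with
\Pr_{S \sim P^m}\!\left[\operatorname{err}_P\big(A(S)\big) > \varepsilon\right] < \delta
\qquad \text{for all } m \ge m_0(\varepsilon, \delta).
```

PUAC ("probably uniformly approximately correct") strengthens this by requiring the probability statement to hold uniformly over the target concepts, which is why the two notions can be separated by counterexample.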

2.
Parameter convergence and learning curves for neural networks   (Cited: 1; self-citations: 0; citations by others: 1)
We revisit the oft-studied asymptotic (in sample size) behavior of the parameter or weight estimate returned by any member of a large family of neural network training algorithms. By properly accounting for the characteristic property of neural networks that their empirical and generalization errors possess multiple minima, we rigorously establish conditions under which the parameter estimate converges strongly into the set of minima of the generalization error. Convergence of the parameter estimate to a particular value cannot be guaranteed under our assumptions. We then evaluate the asymptotic distribution of the distance between the parameter estimate and its nearest neighbor among the set of minima of the generalization error. Results on this question have appeared numerous times and generally assert asymptotic normality, the conclusion expected from familiar statistical arguments concerned with maximum likelihood estimators. These conclusions are usually reached on the basis of somewhat informal calculations, although we shall see that the situation is somewhat delicate. The preceding results then provide a derivation of learning curves for generalization and empirical errors that leads to bounds on rates of convergence.
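In symbols, the convergence mode described here is distance-to-set convergence (a standard formulation in our notation, not the paper's, with \(\Theta^\ast\) denoting the set of minima of the generalization error):

```latex
d(\hat\theta_n, \Theta^\ast) \;=\; \inf_{\theta \in \Theta^\ast} \lVert \hat\theta_n - \theta \rVert
\;\xrightarrow{\ \text{a.s.}\ }\; 0 \quad (n \to \infty),
```

while convergence of \(\hat\theta_n\) to any single point of \(\Theta^\ast\) need not hold.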

3.
A tight bound on concept learning   (Cited: 1; self-citations: 0; citations by others: 1)
A tight bound on the generalization performance of concept learning is shown by a novel approach. Unlike existing theories, the new approach uses no large-sample assumption as in the Bayesian approach and does not require uniform learnability as in VC dimension analysis. We analyze the generalization performance of a particular learning algorithm that is not necessarily well behaved, in the hope that once the learning curves or sample complexity of this algorithm are obtained, they are applicable to real learning situations. The result is expressed in a dimension called the Boolean interpolation dimension, and is tight in the sense that it meets the lower bound requirement of Baum and Haussler (1989). The Boolean interpolation dimension is not greater than the number of modifiable system parameters, and is definable for almost all real-world networks, such as backpropagation networks and linear threshold multilayer networks. It is shown that the generalization error follows a beta distribution with parameters m, the number of training examples, and d, the Boolean interpolation dimension. This implies that for large d, the learning results tend to the average-case result, known as the self-averaging property of learning. The bound is shown to be applicable to practical learning algorithms that can be modeled by the Gibbs algorithm with a uniform prior. The result is also extended to the case of inconsistent learning.
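One plausible reading of the beta-distribution claim (our assumed parameterization, not a formula quoted from the paper) is that for a consistent learner the generalization error behaves as

```latex
\varepsilon \sim \mathrm{Beta}(d,\; m - d + 1), \qquad
\mathbb{E}[\varepsilon] = \frac{d}{m + 1},
```

which concentrates around its mean as d grows, consistent with the self-averaging property mentioned above.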

4.
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar” to a training sample, then the testing error is close to the training error. This provides a novel approach, different from complexity or stability arguments, to study generalization of learning algorithms. One advantage of the robustness approach, compared to previous methods, is the geometric intuition it conveys. Consequently, robustness-based analysis is easy to extend to learning in non-standard setups such as Markovian samples or quantile loss. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property that is required for learning algorithms to work.
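The robustness notion and the resulting bound can be sketched as follows, following Xu and Mannor's formulation up to constants (treat the exact constants as indicative rather than quoted):

```latex
% A is (K, \epsilon(\cdot))-robust if the sample space partitions into
% C_1, \dots, C_K such that whenever a training sample s and a test
% point z fall in the same C_i, |\ell(A_S, s) - \ell(A_S, z)| \le \epsilon(S).
% Then, for a loss bounded by M, with probability at least 1 - \delta
% over n training samples,
\big| L(A_S) - \hat{L}_{\mathrm{emp}}(A_S) \big|
\;\le\; \epsilon(S) + M \sqrt{\frac{2K \ln 2 + 2 \ln(1/\delta)}{n}}.
```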

5.
M. Ishii, I. Kumazawa. Neural Computation, 2001, 13(12): 2851-2863
In this article, we present a technique to improve the generalization ability of multilayer neural networks. The proposed method introduces linear constraints on the weight representation based on the invariance properties of the training targets. We propose a learning method that introduces effective linear constraints into an error function as a penalty term. Furthermore, introducing such constraints reduces the VC dimension of the neural networks, and we show bounds on the VC dimension of neural networks with such constraints. Finally, we demonstrate the effectiveness of the proposed method through experiments.
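A minimal sketch of the penalty-term idea (our illustration; the constraint matrix C, the weight lam, and the predict function are hypothetical stand-ins, not the authors' setup):

```python
import numpy as np

def constrained_loss(w, X, y, C, lam, predict):
    """Empirical squared error plus a penalty that softly enforces the
    linear weight constraints C @ w = 0 (e.g. derived from an invariance
    of the training targets)."""
    err = np.mean((predict(w, X) - y) ** 2)   # empirical error term
    penalty = np.sum((C @ w) ** 2)            # violation of linear constraints
    return err + lam * penalty
```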

6.
We analyze the relationships among the generalization error of a neural network ensemble, the generalization errors of the individual networks, and the diversity among the individual networks, and propose an active learning method for the individual networks. The individual networks are trained interactively and simultaneously, which satisfies both the accuracy requirement and the diversity requirement for the individual networks. In addition, we present a selective ensemble method in which a bias term is added to each individual network, increasing the number of candidate networks and reducing the generalization error of the ensemble. Theoretical analysis and experimental results show that this training method and selective ensemble method can be used to build effective neural network ensemble systems.
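A minimal sketch of the selective-ensemble step (our illustration with hypothetical names; the paper's bias-augmented candidates and interactive training are more involved):

```python
import numpy as np

def selective_ensemble(preds, y_val):
    """Greedily keep individual networks only while they reduce the
    ensemble's validation error. preds: (n_nets, n_val) array of
    validation predictions; y_val: validation targets."""
    order = np.argsort([np.mean((p - y_val) ** 2) for p in preds])
    chosen, best = [], np.inf
    for i in order:                       # try nets from best to worst
        trial = chosen + [i]
        err = np.mean((preds[trial].mean(axis=0) - y_val) ** 2)
        if err < best:                    # keep the net only if it helps
            chosen, best = trial, err
    return chosen, best
```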

7.
The ability of connectionist networks to generalize is often cited as one of their most important properties. We analyze the generalization ability of the class of generalized single-layer networks (GSLNs), which includes Volterra networks, radial basis function networks, regularization networks, and the modified Kanerva model, using techniques based on the theory of probably approximately correct (PAC) learning which have previously been used to analyze the generalization ability of feedforward networks of linear threshold elements (LTEs). An introduction to the relevant computational learning theory is included. We derive necessary and sufficient conditions on the number of training examples required by a GSLN to guarantee a particular generalization performance. We compare our results to those given previously for feedforward networks of LTEs and show that, on the basis of the currently available bounds, the sufficient number of training examples for GSLNs is typically considerably less than for feedforward networks of LTEs with the same number of weights. We show that the use of self-structuring techniques for GSLNs may reduce the number of training examples sufficient to guarantee good generalization performance, and we provide an explanation for the fact that GSLNs can require a relatively large number of weights.
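The conditions take the familiar VC form; as a point of reference, the generic PAC bound (a standard result, not the paper's exact constants) says that

```latex
m(\varepsilon, \delta) \;=\;
O\!\left( \frac{1}{\varepsilon}
\left( d \ln\frac{1}{\varepsilon} + \ln\frac{1}{\delta} \right) \right)
```

training examples suffice for generalization error \(\varepsilon\) with confidence \(1-\delta\), where d is the VC dimension of the hypothesis class; the GSLN-versus-LTE comparison then turns, roughly, on the respective values of d.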

8.
We study the leave-one-out and generalization errors of voting combinations of learning machines. A special case considered is a variant of bagging. We analyze in detail combinations of kernel machines, such as support vector machines, and present theoretical estimates of their leave-one-out error. We also derive novel bounds on the stability of combinations of any classifiers. These bounds can be used to formally show that, for example, bagging increases the stability of unstable learning machines. We report experiments supporting the theoretical findings.
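One concrete instance of a leave-one-out-style estimate for a bagged voter is the out-of-bag error; a minimal sketch for binary 0/1 labels (our illustration, not the paper's estimator):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oob_error(X, y, n_bags=50, seed=0):
    """Train a bag of classifiers on bootstrap samples and score each
    point only with the voters that did not see it (out-of-bag).
    Assumes binary labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = np.zeros((n, 2))
    for _ in range(n_bags):
        idx = rng.integers(0, n, n)                  # bootstrap sample
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        oob = np.setdiff1d(np.arange(n), idx)        # points left out
        votes[oob, clf.predict(X[oob]).astype(int)] += 1
    return np.mean(votes.argmax(axis=1) != y)        # majority-vote OOB error
```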

9.
We consider the generalization error of concept learning when using a fixed Boolean function of the outputs of a number of different classifiers. Here, we take into account the ‘margins’ of each of the constituent classifiers. A special case is that in which the constituent classifiers are linear threshold functions (or perceptrons) and the fixed Boolean function is the majority function. This corresponds to a ‘committee of perceptrons,’ an artificial neural network (or circuit) consisting of a single layer of perceptrons (or linear threshold units) in which the output of the network is defined to be the majority output of the perceptrons. Recent work of Auer et al. studied the computational properties of such networks (where they were called ‘parallel perceptrons’), proposed an incremental learning algorithm for them, and demonstrated empirically that the learning rule is effective. As a corollary of the results presented here, generalization error bounds are derived for this special case that provide further motivation for the use of this learning rule.
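The committee's output is simply a majority vote over linear threshold units; a minimal forward pass (our sketch, with the incremental p-delta training rule of Auer et al. omitted):

```python
import numpy as np

def committee_predict(W, X):
    """Majority vote of a single layer of perceptrons.
    W: (n_perceptrons, dim) weight rows; X: (n_samples, dim).
    Use an odd number of perceptrons to avoid ties.
    Returns a +1/-1 label per sample."""
    votes = np.sign(X @ W.T)            # each perceptron's +/-1 output
    return np.sign(votes.sum(axis=1))   # majority output of the committee
```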

10.
This paper concerns learning binary-valued functions defined on R, and investigates how a particular type of ‘regularity’ of hypotheses can be used to obtain better generalization error bounds. We derive error bounds that depend on the sample width (a notion analogous to that of sample margin for real-valued functions). This motivates learning algorithms that seek to maximize sample width.

11.
We consider the problem of calculating learning curves (i.e., average generalization performance) of gaussian processes used for regression. On the basis of a simple expression for the generalization error, in terms of the eigenvalue decomposition of the covariance function, we derive a number of approximation schemes. We identify where these become exact and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth. We also study possible improvements to our approximations. Finally, we use a simple exactly solvable learning scenario to show that there are limits of principle on the quality of approximations and bounds expressible solely in terms of the eigenvalue spectrum of the covariance function.
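A representative scheme from this family (stated here from memory of the Gaussian-process learning-curve literature, so take the exact form as indicative rather than quoted) is the self-consistent approximation

```latex
\varepsilon(n) \;=\; \sum_i \frac{\lambda_i}{1 + n \lambda_i / \big(\sigma^2 + \varepsilon(n)\big)},
```

where the \(\lambda_i\) are the eigenvalues of the covariance function and \(\sigma^2\) is the noise variance; \(\varepsilon(n)\) is obtained by solving the fixed-point equation at each sample size n.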

12.
The generalization error bounds found by current error models using the number of effective parameters of a classifier and the number of training samples are usually very loose. These bounds are intended for the entire input space. However, support vector machines (SVMs), radial basis function neural networks (RBFNNs), and multilayer perceptron neural networks (MLPNNs) are local learning machines and treat unseen samples near the training samples as more important. In this paper, we propose a localized generalization error model which bounds from above the generalization error within a neighborhood of the training samples using a stochastic sensitivity measure. It is then used to develop an architecture selection technique for a classifier with maximal coverage of unseen samples, by specifying a generalization error threshold. Experiments using 17 University of California at Irvine (UCI) data sets show that, in comparison with cross validation (CV), sequential learning, and two other ad hoc methods, our technique consistently yields the best testing classification accuracy with fewer hidden neurons and less training time.
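The stochastic sensitivity measure can be estimated by perturbing training inputs within a neighborhood of radius Q; a minimal Monte Carlo sketch (our illustration with hypothetical names, assuming a vectorized model f):

```python
import numpy as np

def stochastic_sensitivity(f, X, Q, n_mc=100, seed=0):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2] for perturbations
    dx drawn uniformly from the Q-neighborhood of each training sample."""
    rng = np.random.default_rng(seed)
    base = f(X)                                   # unperturbed outputs
    diffs = []
    for _ in range(n_mc):
        dx = rng.uniform(-Q, Q, size=X.shape)     # perturbation in the Q-box
        diffs.append((f(X + dx) - base) ** 2)
    return np.mean(diffs)
```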

13.
The approach of learning multiple “related” tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point for previous work on multiple task learning has been that the tasks to be learned jointly are somehow “algorithmically related”, in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We offer an alternative approach, defining relatedness of tasks on the basis of similarity between the example-generating distributions that underlie these tasks. We provide a formal framework for this notion of task relatedness, which captures a sub-domain of the wide scope of issues in which one may apply a multiple task learning approach. Our notion of task similarity is relevant to a variety of real-life multitask learning scenarios and allows the formal derivation of generalization bounds that are strictly stronger than the previously known bounds for both the learning-to-learn and the multitask learning scenarios. We give precise conditions under which our bounds guarantee generalization on the basis of smaller sample sizes than the standard single-task approach. Editors: Daniel Silver, Kristin Bennett, Richard Caruana. A preliminary version of this paper appears in the proceedings of COLT’03 (Ben-David and Schuller 2003).

14.
In recent years, variational Bayesian learning has been used as an approximation of Bayesian learning. In spite of the computational tractability and good generalization in many applications, its statistical properties have yet to be clarified. In this paper, we focus on variational Bayesian learning of Bayesian networks which are widely used in information processing and uncertain artificial intelligence. We derive upper bounds for asymptotic variational free energy or stochastic complexities of bipartite Bayesian networks with discrete hidden variables. Our result theoretically supports the effectiveness of variational Bayesian learning as an approximation of Bayesian learning.

15.
M. Aoyagi, K. Nagata. Neural Computation, 2012, 24(6): 1569-1610
The term algebraic statistics arises from the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry (Sturmfels, 2009). The purpose of our study is to consider the generalization error and stochastic complexity in learning theory by using the log-canonical threshold in algebraic geometry. Such thresholds correspond to the main term of the generalization error in Bayesian estimation, which is called a learning coefficient (Watanabe, 2001a, 2001b). The learning coefficient serves to measure the learning efficiencies in hierarchical learning models. In this letter, we consider learning coefficients for Vandermonde matrix-type singularities, by using a new approach: focusing on the generators of the ideal, which defines singularities. We give tight new bound values of learning coefficients for the Vandermonde matrix-type singularities and the explicit values with certain conditions. By applying our results, we can show the learning coefficients of three-layered neural networks and normal mixture models.
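For context, in Watanabe's framework the learning coefficient \(\lambda\) (the log-canonical threshold) and its multiplicity m appear as the leading terms of the stochastic complexity; schematically,

```latex
F(n) \;=\; \lambda \log n \;-\; (m - 1)\log\log n \;+\; O(1),
```

so tighter bounds on \(\lambda\) translate directly into tighter asymptotics for the Bayesian generalization error, whose leading term is \(\lambda / n\).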

16.
J. Langford, A. Blum. Machine Learning, 2003, 51(2): 165-179
A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency.
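Microchoice bounds follow the Occam template; as a reference point, the standard Occam bound (with the microchoice refinement described informally) says that with probability at least \(1-\delta\), every hypothesis h reachable by the algorithm's choice sequence satisfies

```latex
\mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h)
+ \sqrt{\frac{\ln\frac{1}{p(h)} + \ln\frac{1}{\delta}}{2m}},
```

where \(\ln(1/p(h))\) decomposes over the sequence of choices as \(\sum_i \ln|C_i|\), the sizes of the successive choice sets.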

17.
In this letter, we study online learning in neural networks (NNs) obtained by approximating Bayesian learning. The approach is applied to Gibbs learning with the Rosenblatt potential in a nonstationary environment. The online scheme is obtained by the minimization (maximization) of the Kullback-Leibler divergence (cross entropy) between the true posterior distribution and the parameterized one. The complexity of the learning algorithm is further decreased by projecting the posterior onto a Gaussian distribution and imposing a spherical covariance matrix. We study in detail the particular case of learning linearly separable rules. In the case of a fixed rule, we observe an asymptotic generalization error e_g ∝ α⁻¹ for both the spherical and the full covariance matrix approximations. However, in the case of a drifting rule, only the full covariance matrix algorithm shows good performance. This good performance is indeed a surprise, since the algorithm is obtained by projecting without the benefit of the extra information on the drift.

18.
Accurate prediction of the generalization ability of a learning algorithm is an important problem in computational learning theory. The classical Vapnik-Chervonenkis (VC) generalization bounds are too general and therefore overestimate the expected error. Recently obtained data-dependent bounds are still overestimated. To find out why the bounds are loose, we reject the uniform convergence principle and apply a purely combinatorial approach that is free of any probabilistic assumptions, makes no approximations, and provides an empirical control of looseness. We introduce new data-dependent complexity measures: a local shatter coefficient and a nonscalar local shatter profile, which can give much tighter bounds than the classical VC shatter coefficient. An experiment on real datasets shows that the effective local measures may take very small values; thus, the effective local VC dimension takes values in [0, 1] and is therefore not related to the dimension of the space. Konstantin Vorontsov. Born 1971. Graduated from the Faculty of Control and Applied Mathematics, Moscow Institute of Physics and Technology, in 1994. Received candidate's degree in 1999. Currently with the Dorodnicyn Computing Centre, Russian Academy of Sciences; deputy director for research of the Forecsys company. Scientific interests: computational learning theory, machine learning, data mining, probability theory, and combinatorics. Author of 40 papers.

19.
The lower bounds for the a posteriori prediction error of a nonlinear predictor realized as a neural network are provided. These are obtained for a priori adaptation and a posteriori error networks with sigmoid nonlinearities trained by gradient-descent learning algorithms. A contractivity condition is imposed on the nonlinear activation function of a neuron so that the a posteriori prediction error is smaller in magnitude than the corresponding a priori one. Furthermore, an upper bound is imposed on the learning rate η so that the approach is feasible. The analysis is undertaken for both feedforward and recurrent nonlinear predictors realized as neural networks.
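The contractivity condition in question can be sketched in generic notation (the exact learning-rate bound depends on the architecture, so take this as indicative): the activation \(\Phi\) must satisfy

```latex
|\Phi(a) - \Phi(b)| \;\le\; \gamma\,|a - b|, \qquad 0 < \gamma < 1.
```

For the logistic sigmoid \(\Phi(x) = 1/(1+e^{-\beta x})\) this holds whenever \(\beta < 4\), since \(\sup_x |\Phi'(x)| = \beta/4\); the learning rate \(\eta\) is then bounded above so that the weight update preserves the contraction.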

20.
The PAC-learning model is distribution-independent in the sense that the learner must reach a learning goal with a limited number of labeled random examples without any prior knowledge of the underlying domain distribution. In order to achieve this, one needs generalization error bounds that are valid uniformly for every domain distribution. These bounds are (almost) tight in the sense that there is a domain distribution which does not admit a generalization error significantly smaller than the general bound. Note, however, that this leaves open the possibility of achieving the learning goal faster if the underlying distribution is “simple”. Informally speaking, we say a PAC-learner L is “smart” if, for a “vast majority” of domain distributions D, L does not require significantly more examples to reach the “learning goal” than the best learner whose strategy is specialized to D. In this paper, focusing on sample complexity and ignoring computational issues, we show that smart learners do exist. This implies (at least from an information-theoretical perspective) that full prior knowledge of the domain distribution (or access to a huge collection of unlabeled examples) does not, for a vast majority of domain distributions, significantly reduce the number of labeled examples required to achieve the learning goal.

