Similar Documents
20 similar documents found (search time: 46 ms)
1.
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, it remained open how to appropriately determine the degree of regularization in SIC. In this article, we derive an unbiased estimator of the expected squared error between SIC and the expected generalization error, and we propose determining the degree of regularization of SIC so that this estimator is minimized. Computer simulations with artificial and real data sets illustrate that the proposed method effectively improves the precision of SIC, especially in high-noise cases. We furthermore compare the proposed method with the original SIC, cross-validation, and an empirical Bayesian method for ridge parameter selection, with good results.
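The paper's regularized SIC depends on RKHS-norm quantities not reproduced here. As a loose, minimal sketch of the underlying idea — choosing the ridge parameter by minimizing an unbiased estimate of the generalization error — the following uses a classical Cp/SURE-style criterion for kernel ridge regression instead of SIC, with an assumed known noise variance; all names and settings are ours, not the paper's.

```python
import numpy as np

def ridge_risk_estimate(K, y, lam, sigma2):
    """Cp/SURE-style unbiased estimate of the in-sample prediction risk of
    kernel ridge regression with smoother matrix H = K (K + lam*I)^(-1)."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    resid = y - H @ y
    # ||y - Hy||^2/n + 2*sigma2*tr(H)/n - sigma2 is unbiased for the risk
    return resid @ resid / n + 2 * sigma2 * np.trace(H) / n - sigma2

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
sigma2 = 0.09                                          # assumed known noise level
y = np.sinc(x) + rng.normal(0, np.sqrt(sigma2), size=50)
K = np.exp(-(x[:, None] - x[None, :])**2)              # Gaussian kernel matrix
lams = np.logspace(-4, 1, 30)
best = min(lams, key=lambda lam: ridge_risk_estimate(K, y, lam, sigma2))
print(f"ridge parameter selected by the unbiased risk estimate: {best:.4g}")
```

Regularizing such a criterion — accepting a small bias to cut its variance — is precisely the trade-off the abstract describes.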

2.
3.
Generalization and selection of examples in feedforward neural networks
Franco L, Cannas SA. Neural Computation, 2000, 12(10): 2405-2426
In this work, we study how the selection of examples affects the learning procedure in a boolean neural network, and its relationship with the complexity of the function under study and the network architecture. We analyze the generalization capacity for different target functions with particular architectures through an analytical calculation of the minimum number of examples needed to obtain full generalization (i.e., zero generalization error). The analysis of the training sets associated with this parameter leads us to propose a general, architecture-independent criterion for the selection of training examples. The criterion was checked through numerical simulations for various particular target functions with particular architectures, as well as for random target functions in a nonoverlapping receptive field perceptron. In all cases, the selective sampling criterion led to an improvement in generalization capacity compared with pure random sampling. We also show that for the parity problem, one of the problems most used for testing learning algorithms, only the use of the whole set of examples ensures global learning in a depth-two architecture. We show that this difficulty can be overcome by considering a tree-structured network of depth 2log2(N)-1.
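As a quick empirical probe of the parity observation — not the paper's boolean network or its analytical calculation — the sketch below trains scikit-learn's MLPClassifier on growing fractions of the complete parity example set and scores it on all 2^N inputs. Hyperparameters are arbitrary, and results will vary with the seed; parity is notoriously hard for gradient training, which is part of the point.

```python
import numpy as np
from itertools import product
from sklearn.neural_network import MLPClassifier

N = 6
X = np.array(list(product([0, 1], repeat=N)))
y = X.sum(axis=1) % 2                         # parity target function

rng = np.random.default_rng(0)
for frac in (0.5, 0.75, 1.0):
    idx = rng.permutation(len(X))[: int(frac * len(X))]
    net = MLPClassifier(hidden_layer_sizes=(2 * N,), max_iter=5000,
                        random_state=0).fit(X[idx], y[idx])
    # score on the complete input space, not just the training subset
    print(f"train fraction {frac:.2f}: accuracy on all 2^N inputs "
          f"= {net.score(X, y):.3f}")
```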

4.
Selecting concise training sets from clean data
The authors derive a method for selecting exemplars for training a multilayer feedforward network architecture to estimate an unknown (deterministic) mapping from clean data, i.e., data measured either without error or with negligible error. The objective is to minimize the data requirement of learning. The authors choose a criterion for selecting training examples that works well in conjunction with the criterion used for learning, here least squares. They proceed sequentially, selecting the example that, when added to the previous set of training examples and learned, maximizes the decrement of network squared error over the input space. When dealing with clean data and deterministic relationships, concise training sets that minimize the integrated squared bias (ISB) are desired. The ISB is used to derive a selection criterion for evaluating individual training examples, the DISB, which is maximized to select new exemplars. They conclude with graphical illustrations of the method and demonstrate its use during network training. Experimental results indicate that training on exemplars selected in this fashion can also save computation in general-purpose use.
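The paper's DISB criterion scores candidate exemplars without access to the true mapping; the sketch below keeps only the greedy skeleton — add the example whose inclusion most decreases squared error over the input space — and, for simplicity, cheats by scoring against a known synthetic target, which is only possible in a toy setting. All functions and constants here are ours.

```python
import numpy as np

def fit(xs, ys, grid, deg=3):
    """Least-squares polynomial fit; returns predictions on a dense grid."""
    coef = np.polyfit(xs, ys, min(deg, len(xs) - 1))
    return np.polyval(coef, grid)

f = lambda x: np.sin(2 * x)                    # the "unknown" clean mapping
grid = np.linspace(-2, 2, 200)                 # where squared error is scored
pool = np.linspace(-2, 2, 40)                  # candidate exemplars
chosen = [pool[0], pool[-1]]                   # seed with the interval ends

for _ in range(6):
    base_err = np.mean((fit(np.array(chosen), f(np.array(chosen)), grid)
                        - f(grid))**2)
    # greedy step: candidate whose addition most decreases error on the grid
    def decrement(c):
        s = np.array(chosen + [c])
        return base_err - np.mean((fit(s, f(s), grid) - f(grid))**2)
    nxt = max((c for c in pool if c not in chosen), key=decrement)
    chosen.append(nxt)
print("selected exemplars:", np.round(sorted(chosen), 2))
```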

5.
Hagiwara K. Neural Computation, 2002, 14(8): 1979-2002
In considering statistical model selection of neural networks and radial basis functions in the overrealizable case, the problem of unidentifiability emerges. Because a model selection criterion is an unbiased estimator of the generalization error based on the training error, this article analyzes the expected training error and the expected generalization error of neural networks and radial basis functions in overrealizable cases and clarifies the difference from regular models, for which identifiability holds. As a special case of an overrealizable scenario, we assume a gaussian noise sequence as training data. For least-squares estimation under this assumption, we first formulate the problem, in which the calculation of the expected errors of unidentifiable networks reduces to the calculation of the expectation of the supremum of a chi-squared process. Under this formulation, we give an upper bound on the expected training error and a lower bound on the expected generalization error, where generalization is measured at a set of training inputs. Furthermore, we give stochastic bounds on the training error and the generalization error. The obtained upper bound on the expected training error is smaller than in regular models, and the lower bound on the expected generalization error is larger than in regular models. This tells us that the degree of overfitting in neural networks and radial basis functions is higher than in regular models and, correspondingly, that the generalization capability is worse than in the case of regular models. The article suffices to show a difference between neural networks and regular models in the context of least-squares estimation in a simple situation, as a first step toward constructing a model selection criterion for the overrealizable case. Further important problems in this direction are also discussed.
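For reference, the regular (identifiable) baseline the abstract contrasts with is the classical least-squares fact below. This is textbook material stated for contrast, not the paper's bound; the paper shows overrealizable networks deviate from it in the overfitting direction.

```latex
% Identifiable linear model with k parameters, n samples, and noise
% variance \sigma^2 (classical least-squares facts, not the paper's bounds):
\mathbb{E}[\text{training error}] = \sigma^2\Bigl(1-\frac{k}{n}\Bigr),
\qquad
\mathbb{E}[\text{generalization error at the training inputs}]
  = \sigma^2\Bigl(1+\frac{k}{n}\Bigr).
```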

6.
Based on quantale-enriched categories, we consider algebras with compatible quantale-enriched structures, which can be viewed as a fuzzification of ordered algebraic structures. We mainly study groupoids and semigroups with compatible quantale-enriched structures from this viewpoint. Some basic concepts, such as ideals, homomorphisms, and residuated quantale-enriched groupoids, are developed, and examples of them are given. Our approach complements the approach initiated by Rosenfeld for studying fuzzy abstract algebra, and the two approaches are combined in the present paper to study fuzzy aspects of abstract algebraic structures.

7.
Kernel selection is one of the key issues in both recent research on and application of kernel methods. It is usually done by minimizing either an estimate of the generalization error or some other related performance measure. Using notions of stability to estimate the generalization error has attracted much attention in recent years. Unfortunately, the existing notions of stability, proposed to derive theoretical generalization error bounds, are difficult to use for kernel selection in practice. It is well known that the kernel matrix contains most of the information needed by kernel methods, and its eigenvalues play an important role. We therefore introduce a new notion of stability, called spectral perturbation stability, to study the kernel selection problem. This stability quantifies the spectral perturbation of the kernel matrix with respect to changes in the training set. We establish the connection between spectral perturbation stability and the generalization error. By minimizing the derived generalization error bound, we propose a new kernel selection criterion that can guarantee good generalization properties. In our criterion, the perturbation of the eigenvalues of the kernel matrix is efficiently computed by solving for the derivative of a newly defined generalized kernel matrix. Both theoretical analysis and experimental results demonstrate that our criterion is sound and effective.
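The paper computes eigenvalue perturbations via the derivative of a generalized kernel matrix; the brute-force leave-one-out probe below only illustrates the quantity being stabilized — how much the kernel spectrum moves when the training set changes — for a Gaussian kernel at several bandwidths. Pairing eigenvalues of the full and reduced matrices leans on Cauchy interlacing; the whole setup is our illustration, not the paper's criterion.

```python
import numpy as np

def spectral_perturbation(X, gamma):
    """Average eigenvalue shift of the kernel matrix when one point is removed."""
    def eigs(Z):
        K = np.exp(-gamma * np.sum((Z[:, None] - Z[None, :])**2, axis=-1))
        return np.sort(np.linalg.eigvalsh(K))[::-1]
    full = eigs(X)
    shifts = []
    for i in range(len(X)):
        loo = eigs(np.delete(X, i, axis=0))
        # compare the top len(loo) eigenvalues (interlacing pairs them up)
        shifts.append(np.abs(full[:len(loo)] - loo).sum())
    return np.mean(shifts)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(f"gamma={gamma:<5} mean spectral perturbation: "
          f"{spectral_perturbation(X, gamma):.4f}")
```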

8.
The generalization error bounds found by current error models, which use the number of effective parameters of a classifier and the number of training samples, are usually very loose. These bounds are intended for the entire input space. However, the support vector machine (SVM), radial basis function neural network (RBFNN), and multilayer perceptron neural network (MLPNN) are local learning machines and treat unseen samples near the training samples as more important. In this paper, we propose a localized generalization error model that bounds from above the generalization error within a neighborhood of the training samples, using a stochastic sensitivity measure. It is then used to develop an architecture selection technique that maximizes a classifier's coverage of unseen samples subject to a specified generalization error threshold. Experiments using 17 University of California at Irvine (UCI) data sets show that, in comparison with cross-validation (CV), sequential learning, and two other ad hoc methods, our technique consistently yields the best testing classification accuracy with fewer hidden neurons and less training time.
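L-GEM's bound combines the training error with a stochastic sensitivity term measured over a neighborhood of the training samples. The sketch below Monte-Carlo-estimates such a sensitivity term for scikit-learn MLPs of increasing width; the neighborhood size q, the number of draws, and the use of predict_proba are our choices, not the paper's exact measure.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def stochastic_sensitivity(model, X, q, n_draws=200, rng=None):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2] for dx ~ U(-q, q)^d."""
    rng = rng or np.random.default_rng(0)
    base = model.predict_proba(X)[:, 1]
    diffs = []
    for _ in range(n_draws):
        dx = rng.uniform(-q, q, size=X.shape)   # perturb within the Q-neighborhood
        diffs.append((model.predict_proba(X + dx)[:, 1] - base)**2)
    return float(np.mean(diffs))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)         # XOR-like toy labels
for hidden in (2, 8, 32):
    net = MLPClassifier((hidden,), max_iter=3000, random_state=0).fit(X, y)
    sens = stochastic_sensitivity(net, X, q=0.3)
    print(f"{hidden:>3} hidden neurons: train acc {net.score(X, y):.2f}, "
          f"sensitivity {sens:.4f}")
```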

9.
The choice of the number of data partitions is one of the basic problems in model selection for parallel/distributed machine learning; it directly affects the generalization ability and running efficiency of machine learning algorithms. Existing parallel/distributed machine learning methods usually choose the number of data partitions based on experience or on the number of processors, without an explicit selection criterion. We propose a parallel-efficiency-aware criterion for selecting the number of data partitions in parallel/distributed machine learning, which improves computational efficiency while preserving the test accuracy of the learned model. We first derive the relation between the generalization error of a parallel/distributed machine learning model and the number of partitions. On this basis, we propose a partition-number selection criterion that trades off generalization against parallel efficiency. Finally, within the ADMM framework in a random Fourier feature space, we give an implementation of large-scale support vector machines that adopts the proposed criterion, and we experimentally verify its effectiveness on a high-performance computing cluster with large-scale benchmark datasets.
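The random Fourier feature space mentioned in the abstract is the standard Rahimi–Recht construction; a minimal sketch of it is given below (without the ADMM solver or the partitioning scheme, which are the paper's contribution) to show how the RBF kernel is approximated by an explicit low-dimensional feature map.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Rahimi-Recht features: z(x) = sqrt(2/D) cos(Wx + b), whose inner
    products approximate the RBF kernel exp(-gamma * ||x - x'||^2)."""
    d = X.shape[1]
    W = rng.normal(0, np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :])**2, axis=-1))
for D in (10, 100, 10000):                       # approximation improves with D
    Z = random_fourier_features(X, D, gamma=0.5, rng=rng)
    err = np.max(np.abs(Z @ Z.T - K_exact))
    print(f"D={D:>6}: max |Z Z^T - K| = {err:.3f}")
```

In the explicit feature space, the SVM reduces to a linear problem, which is what makes a distributed ADMM decomposition over data partitions natural.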

10.
The problem of model selection, or determination of the number of hidden units, can be approached statistically by generalizing Akaike's information criterion (AIC) to apply to unfaithful (i.e., unrealizable) models with general loss criteria, including regularization terms. The relation between the training error and the generalization error is studied in terms of the number of training examples and the complexity of the network, which reduces to the number of parameters in the ordinary statistical theory of the AIC. This relation leads to a new network information criterion that is useful for selecting the optimal network model based on a given training set.
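Schematically, the resulting criterion has the AIC-like form below, where the raw parameter count is replaced by a trace of two loss-curvature matrices. This is our paraphrase in the AIC/TIC lineage; the paper's exact definitions of G and Q should be consulted.

```latex
% Schematic NIC-style criterion (our paraphrase, not the paper's notation):
% G is the Hessian of the expected loss, Q the covariance of the loss
% gradient; for a faithful model tr(G^{-1}Q) = k and AIC is recovered.
\mathrm{NIC}(\hat w)
  = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i, f_{\hat w}(x_i)\bigr)
  + \frac{1}{n}\,\mathrm{tr}\bigl(G^{-1}Q\bigr)
```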

11.
Akaho S, Kappen HJ. Neural Computation, 2000, 12(6): 1411-1427
Theories of learning and generalization hold that the generalization bias, defined as the difference between the training error and the generalization error, increases on average with the number of adaptive parameters. This article, however, shows that this general tendency is violated for a gaussian mixture model. For temperatures just below the first symmetry-breaking point, the effective number of adaptive parameters increases and the generalization bias decreases. We compute the dependence of the neural information criterion on temperature around the symmetry breaking. Our results are confirmed by numerical cross-validation experiments.

12.
Gradient-based optimization of hyperparameters
Bengio Y. Neural Computation, 2000, 12(8): 1889-1900
Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article, we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
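For the quadratic case the abstract mentions, the hypergradient has a closed form that a short script can verify: differentiating the ridge normal equations gives dw/dlambda = -(X^T X + lambda*I)^(-1) w, and the chain rule turns a validation loss into a gradient with respect to lambda. This is a minimal illustration of the idea, not Bengio's Cholesky-backpropagation implementation; the step size and data are arbitrary.

```python
import numpy as np

def ridge_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    """d(validation MSE)/d(lambda) for ridge regression via the closed form
    dw/dlam = -(X^T X + lam*I)^(-1) w, a special case of the implicit
    function theorem applied to a quadratic training criterion."""
    d = X_tr.shape[1]
    A = X_tr.T @ X_tr + lam * np.eye(d)
    w = np.linalg.solve(A, X_tr.T @ y_tr)           # ridge solution w(lam)
    dw = -np.linalg.solve(A, w)                     # dw/dlam
    grad_w = 2 * X_val.T @ (X_val @ w - y_val) / len(y_val)   # dL_val/dw
    return grad_w @ dw                              # chain rule

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5)); w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(0, 0.5, size=120)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

lam = 1.0
for _ in range(30):                                 # gradient descent on lambda
    lam = max(1e-6, lam - 5.0 * ridge_hypergradient(X_tr, y_tr, X_val, y_val, lam))
print(f"lambda after hypergradient descent: {lam:.4f}")
```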

13.
In classification tasks, active learning is often used to select a set of informative examples from a large unlabeled dataset. The objective is to learn a classification pattern that can accurately predict the labels of new examples by using a selection result that is expected to contain as few examples as possible. The selection of informative examples also reduces the manual effort for labeling, data complexity, and data redundancy, thus improving learning efficiency. In this paper, a new active learning strategy with pool-based settings, called inconsistency-based active learning, is proposed. This strategy is built up under the guidance of two classical works: (1) the learning philosophy of the query-by-committee (QBC) algorithm and (2) the structure of the traditional concept learning model, the from-general-to-specific (GS) ordering. By constructing two extreme hypotheses of the current version space, the strategy evaluates unlabeled examples by a new sample selection criterion, the inconsistency value, and the whole learning process can be implemented without any additional knowledge. Moreover, since active learning is favorably applied to support vector machines (SVMs) and related applications, the strategy is further specialized to an algorithm called inconsistency-based active learning for SVM (I-ALSVM). By building up a GS structure, the sample selection process in our strategy searches through the initial version space. We compare the proposed I-ALSVM with several other pool-based methods for SVMs on selected datasets. The experimental results show that, in terms of generalization capability, our model exhibits good feasibility and competitiveness.
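A rough, runnable caricature of the idea — not the paper's GS-ordered version-space construction — is to form two "extreme" hypotheses with SVMs at extreme regularization settings and query the pool point on which they disagree most. Everything here (the inconsistency score, the C values, the data) is our simplification.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 2))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)     # hidden oracle labels

pos = np.flatnonzero(y_pool == 1)[:5]
neg = np.flatnonzero(y_pool == 0)[:5]
labeled = list(pos) + list(neg)                            # balanced seed set

for _ in range(20):
    Xl, yl = X_pool[labeled], y_pool[labeled]
    # stand-ins for the two extreme hypotheses of the version space
    h_spec = SVC(C=100.0, kernel="linear").fit(Xl, yl)     # "most specific"
    h_gen = SVC(C=0.01, kernel="linear").fit(Xl, yl)       # "most general"
    rest = [i for i in range(len(X_pool)) if i not in set(labeled)]
    s = h_spec.decision_function(X_pool[rest])
    g = h_gen.decision_function(X_pool[rest])
    # inconsistency: margin-weighted disagreement between the two hypotheses
    score = np.where(np.sign(s) != np.sign(g), np.abs(s - g), 0.0)
    labeled.append(rest[int(np.argmax(score))])            # query the oracle

final = SVC(C=1.0, kernel="linear").fit(X_pool[labeled], y_pool[labeled])
print(f"{len(labeled)} labels used; pool accuracy {final.score(X_pool, y_pool):.3f}")
```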

14.
This work concerns the effect of noise on the performance of feedforward neural nets. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that the best overall performance is achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function that can be viewed as an anticipatory agent aiding convergence. This term appears to find promising regions of weight space in the early stages of training, when the training error is large, and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations in which the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it has on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and simulations on learning a randomly generated six-state grammar using the predicted best noise model.
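To make the recommended noise model concrete, the sketch below shows where "additive noise at each time step" enters a plain tanh recurrent cell. It is a forward pass only — the training loop, grammars, and analysis are the paper's — and all dimensions and constants are arbitrary.

```python
import numpy as np

def rnn_forward(x_seq, Wx, Wh, noise_std=0.0, rng=None):
    """Simple tanh RNN forward pass; during training, additive gaussian
    noise is injected into the pre-activation at every time step."""
    rng = rng or np.random.default_rng(0)
    h = np.zeros(Wh.shape[0])
    for x in x_seq:
        pre = Wx @ x + Wh @ h
        if noise_std > 0:                       # per-step synaptic noise
            pre = pre + rng.normal(0, noise_std, size=pre.shape)
        h = np.tanh(pre)
    return h

rng = np.random.default_rng(0)
Wx = rng.normal(size=(8, 2))
Wh = rng.normal(size=(8, 8)) * 0.5
seq = rng.normal(size=(20, 2))
clean = rnn_forward(seq, Wx, Wh)
noisy = rnn_forward(seq, Wx, Wh, noise_std=0.1, rng=rng)
print("final-state shift under noise:", np.linalg.norm(clean - noisy).round(4))
```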

15.
Nonquadratic regularizers, in particular the l1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the generalized subspace information criterion (GSIC), which allows one to predict the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the l1-norm regularizer, as we compare it with the network information criterion (NIC) and cross-validation in relatively large-sample cases. In the small-sample case, however, GSIC tends to fail to capture the optimal model because of its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
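GSIC itself is derived in the paper; as a smaller cousin in the same spirit — an (approximately) unbiased risk estimate driving l1 model selection — the sketch below uses a Cp-style criterion with the known lasso degrees-of-freedom result (the number of nonzero coefficients, due to Zou, Hastie, and Tibshirani). The noise variance is assumed known, and the criterion here is the standard one, not GSIC.

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_cp(X, y, alphas, sigma2):
    """Cp-style criterion for the lasso: RSS/n + 2*sigma2*df/n, where df is
    (approximately) the number of nonzero coefficients."""
    n = len(y)
    scored = []
    for a in alphas:
        m = Lasso(alpha=a, max_iter=50000).fit(X, y)
        df = np.count_nonzero(m.coef_)
        rss = np.sum((y - m.predict(X))**2)
        scored.append((rss / n + 2 * sigma2 * df / n, a))
    return min(scored)[1]                      # alpha with the smallest criterion

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta = np.zeros(20); beta[:3] = (2.0, -1.5, 1.0)          # sparse ground truth
y = X @ beta + rng.normal(0, 1.0, size=100)
best = l1_cp(X, y, np.logspace(-3, 0, 25), sigma2=1.0)
print(f"alpha selected by the Cp-style criterion: {best:.4g}")
```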

16.
International Journal of Computer Mathematics, 2012, 89(7): 1471-1483
This paper studies the regularized learning algorithm associated with the least-squares loss and a reproducing kernel Hilbert space. The target is the error analysis of the regression problem in learning theory. The upper and lower bounds of the error are estimated simultaneously, which yields the optimal learning rate. The upper bound depends on the covering number and the approximation property of the reproducing kernel Hilbert space; the lower bound depends on the entropy number of the set that contains the regression function. Moreover, the rate is independent of the choice of the index q of the regularization term.
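For concreteness, the algorithm under study is the standard regularized least-squares scheme over an RKHS, shown below in its usual q = 2 form (the abstract's index q presumably generalizes the exponent of the penalty; we state only the classical case):

```latex
% Regularized least squares over an RKHS H_K (standard formulation):
f_{\lambda} \;=\; \arg\min_{f \in \mathcal{H}_K}\;
  \frac{1}{n}\sum_{i=1}^{n} \bigl(f(x_i)-y_i\bigr)^2 \;+\; \lambda\,\|f\|_K^2 .
```

By the representer theorem the minimizer is a finite kernel expansion, f_lambda(x) = sum_i alpha_i K(x, x_i), which is what makes the covering-number analysis of the hypothesis space tractable.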

17.
A new approach for estimating classification errors is presented. In the model, there are two types of classification error: empirical error and generalization error. The first is the error observed over the training samples, and the second is the discrepancy between the error probability and the empirical error. In this research, the Vapnik-Chervonenkis dimension (VCdim) is used as a measure of classifier complexity. Based on this complexity measure, an estimate of the generalization error is developed. An optimal classifier design criterion, the generalized minimum empirical error criterion (GMEE), is used. The GMEE criterion consists of two terms: the empirical error and the estimate of the generalization error. As an application, the criterion is used to design the optimal neural network classifier. A corollary to the Γ optimality of neural-network-based classifiers is proven. Thus, the approach provides a theoretical foundation for the connectionist approach to optimal classifier design. Experimental results are presented to validate this approach.
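The generalization term of such a model is typically a VC-type deviation bound; one common textbook form (not necessarily the exact estimate used in the paper) reads:

```latex
% With probability at least 1 - \eta over n i.i.d. samples, for a
% classifier class of VC dimension d (classical Vapnik-style bound):
R(h) \;\le\; R_{\mathrm{emp}}(h) \;+\;
  \sqrt{\frac{d\bigl(\ln\frac{2n}{d} + 1\bigr) + \ln\frac{4}{\eta}}{n}} .
```

A GMEE-style criterion then trades the empirical term against this complexity term when comparing classifier architectures.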

18.
We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three respects. First, it uses a new heuristic function to guide the search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to the selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe the features of ELEM2, its rule induction algorithm, and its classification procedure, and we report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.

19.
This paper presents a method to identify the structure of generalized adaptive neuro-fuzzy inference systems (GANFISs). The structure of a GANFIS consists of a number of generalized radial basis function (GRBF) units. The radial basis functions are irregularly distributed in the form of hyper-patches in the input-output space. The minimum number of GRBF units is selected based on a heuristic using the fuzzy curve. For structure identification, a new criterion called the structure identification criterion (SIC) is proposed. SIC deals with a trade-off between the performance and the computational complexity of the GANFIS model. The computational complexity of gradient-descent learning is formulated based on a simulation study. Three methods of initializing GANFIS, viz. the fuzzy curve, fuzzy C-means in the x×y space, and modified mountain clustering, have been compared in terms of a cluster validity measure, Akaike's information criterion (AIC), and the proposed SIC.

20.
Training a classifier with good generalization capability is a major issue in pattern classification problems. A novel training objective function for radial basis function (RBF) networks using a localized generalization error model (L-GEM) is proposed in this paper. The localized generalization error model provides a generalization error bound for unseen samples located within a neighborhood that contains all training samples. The assumption of the same width for all dimensions of a hidden neuron in L-GEM is relaxed in this work. The parameters of the RBF network are selected via minimization of the proposed objective function, thereby minimizing the localized generalization error bound. The characteristics of the proposed objective function are compared with those of regularization methods. For weight selection, RBF networks trained by minimizing the proposed objective function consistently outperform RBF networks trained by minimizing the training error, Tikhonov regularization, weight decay, or locality regularization. The proposed objective function is also applied to select the centers, widths, and weights of an RBF network simultaneously. RBF networks trained by minimizing the proposed objective function yield better testing accuracies than those that minimize training error only.
