Similar Documents
 10 similar documents found (search time: 93 ms)
1.
Model Selection and Error Estimation   (Total citations: 6; self-citations: 0; citations by others: 6)
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
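To make the last point concrete, here is a minimal sketch of the flipped-label computation for binary ±1 classification over a toy decision-stump class (so that exact 0-1-loss minimization is feasible); `erm_stump` and `max_discrepancy_penalty` are illustrative names, an even split of the training set is assumed, and this is not code from the paper.

```python
import numpy as np

def erm_stump(X, y):
    """Exact 0-1-loss empirical risk minimization over decision stumps
    sign(s * (x - t)) on a single 1-D feature; returns the minimum error rate."""
    best = 1.0
    for t in np.unique(X):
        for s in (-1, 1):
            pred = np.where(s * (X - t) >= 0, 1, -1)
            best = min(best, float(np.mean(pred != y)))
    return best

def max_discrepancy_penalty(X, y):
    """Maximal-discrepancy penalty via the label-flipping identity:
    max_f [err_half1(f) - err_half2(f)] = 1 - 2 * min_f err(f; flipped data),
    where the labels on the first half of the sample are flipped."""
    y_flip = y.copy()
    y_flip[: len(y) // 2] = -y_flip[: len(y) // 2]  # flip first-half labels
    return 1.0 - 2.0 * erm_stump(X, y_flip)
```

The appeal is that any learner capable of (approximate) empirical risk minimization can compute its own penalty by retraining once on the partially flipped sample.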

2.
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), which is an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, it remained open how to appropriately determine the degree of regularization in SIC. In this article, we derive an unbiased estimator of the expected squared error between SIC and the expected generalization error, and propose determining the degree of regularization of SIC such that this estimator is minimized. Computer simulations with artificial and real data sets illustrate that the proposed method effectively improves the precision of SIC, especially in high-noise-level cases. We furthermore compare the proposed method with the original SIC, cross-validation, and an empirical Bayesian method for ridge parameter selection, with good results.
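The cross-validation baseline mentioned at the end has a convenient closed form for ridge regression; a sketch of that baseline follows (not of the regularized SIC itself, whose squared-error estimator is derived in the paper). The function name is illustrative.

```python
import numpy as np

def ridge_loocv_error(X, y, lam):
    """Closed-form leave-one-out CV error for ridge regression:
    loo_resid_i = resid_i / (1 - H_ii),  H = X (X'X + lam*I)^{-1} X'."""
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = y - H @ y
    return float(np.mean((resid / (1.0 - np.diag(H))) ** 2))

# Ridge parameter selection over a grid:
# lam_best = min(np.logspace(-6, 2, 50), key=lambda l: ridge_loocv_error(X, y, l))
```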

3.
Robust model selection procedures control the undue influence that outliers can have on the selection criteria by using both robust point estimators and a bounded loss function when measuring either the goodness-of-fit or the expected prediction error of each model. Furthermore, to avoid favoring over-fitting models, these two measures can be combined with a penalty term for the size of the model. The expected prediction error conditional on the observed data may be estimated using the bootstrap. However, bootstrapping robust estimators becomes extremely time consuming on moderate- to high-dimensional data sets. It is shown that the expected prediction error can be estimated using a very fast and robust bootstrap method, and that this approach yields a consistent model selection method that is computationally feasible even for a relatively large number of covariates. Moreover, as opposed to other bootstrap methods, this proposal avoids the numerical problems associated with the small bootstrap samples required to obtain consistent model selection criteria. The finite-sample performance of the fast and robust bootstrap model selection method is investigated through a simulation study, while its feasibility and good performance on moderately large regression models are illustrated on several real data examples.
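For orientation, here is a sketch of the plain (slow) bootstrap estimate of expected prediction error with a bounded-influence loss; the paper's contribution is precisely a fast approximation that avoids refitting the robust estimator on every resample, which this sketch does not implement. The Huber loss is a stand-in for the paper's bounded loss, and all names are illustrative.

```python
import numpy as np

def huber_loss(r, c=1.345):
    """Bounded-influence Huber loss (a stand-in for the paper's bounded loss)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r ** 2, c * a - 0.5 * c ** 2)

def bootstrap_prediction_error(X, y, robust_fit, B=200, seed=0):
    """Plain-bootstrap estimate of the expected prediction error:
    refit the robust estimator on each resample and average the loss
    of its predictions on the original data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(B):
        idx = rng.integers(0, n, n)           # bootstrap resample
        beta = robust_fit(X[idx], y[idx])     # full robust refit (the slow step)
        errs.append(np.mean(huber_loss(y - X @ beta)))
    return float(np.mean(errs))
```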

4.
Model complexity control for regression using VC generalization bounds   (Total citations: 8; self-citations: 0; citations by others: 8)
It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs to have some provisions for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (aka constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (aka model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC) theory have been proposed by Vapnik. This paper describes the application of VC bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC bounds can be rigorously applied, i.e., linear models and penalized linear models, where the VC dimension can be accurately estimated and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC bounds and classical methods are performed for various noise levels, sample sizes, target functions, and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
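One commonly cited practical form of the VC penalization factor for regression with squared loss (popularized in Cherkassky and Mulier's work; the exact constants used in the paper may differ) can be sketched as follows, with illustrative names:

```python
import numpy as np

def vc_penalized_risk(emp_risk, h, n):
    """Penalized risk estimate for regression with squared loss, using one
    practical form of the VC bound (constants vary across references):
    R <= R_emp / max(0, 1 - sqrt(p - p*ln(p) + ln(n)/(2n))),  p = h/n."""
    p = h / n
    denom = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2.0 * n))
    return np.inf if denom <= 0 else emp_risk / denom

# Model selection: choose the complexity h minimizing
# vc_penalized_risk(emp_risk_of_model_h, h, n).
```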

5.
Hagiwara K. Neural Computation, 2002, 14(8): 1979-2002.
In considering statistical model selection of neural networks and radial basis functions under an overrealizable case, the problem of unidentifiability emerges. Because the model selection criterion is an unbiased estimator of the generalization error based on the training error, this article analyzes the expected training error and the expected generalization error of neural networks and radial basis functions in overrealizable cases and clarifies the difference from regular models, for which identifiability holds. As a special case of an overrealizable scenario, we assumed a Gaussian noise sequence as training data. In the least-squares estimation under this assumption, we first formulated the problem, in which the calculation of the expected errors of unidentifiable networks is reduced to the calculation of the expectation of the supremum of a chi-squared process. Under this formulation, we gave an upper bound on the expected training error and a lower bound on the expected generalization error, where the generalization is measured at a set of training inputs. Furthermore, we gave stochastic bounds on the training error and the generalization error. The obtained upper bound on the expected training error is smaller than in regular models, and the lower bound on the expected generalization error is larger than in regular models. The result tells us that the degree of overfitting in neural networks and radial basis functions is higher than in regular models and, correspondingly, that their generalization capability is worse than that of regular models. The article demonstrates this difference between neural networks and regular models in the context of least-squares estimation in a simple situation, as a first step toward constructing a model selection criterion for the overrealizable case. Further important problems in this direction are also discussed.
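For contrast, the regular-model baselines against which such bounds are compared are standard least-squares facts (stated here for orientation, not results of the paper): for a model with d identifiable parameters fit by least squares under Gaussian noise of variance σ²,

```latex
\mathbb{E}[T_n] \;=\; \sigma^2\left(1 - \frac{d}{n}\right),
\qquad
\mathbb{E}[G_n] \;=\; \sigma^2\left(1 + \frac{d}{n}\right),
```

where T_n is the training error and G_n the generalization error against fresh noisy outputs at the training inputs (against the noise-free target the corresponding quantity is σ²d/n). The paper's bounds place overrealizable networks below the first baseline and above the second.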

6.
Partially adaptive estimation based on an assumed error distribution has emerged as a popular approach for estimating a regression model with non-normal errors. In this approach, if the assumed distribution is flexible enough to accommodate the shape of the true underlying error distribution, the efficiency of the partially adaptive estimator is expected to be close to the efficiency of the maximum likelihood estimator based on knowledge of the true error distribution. In this context, maximum entropy (MaxEnt) distributions have attracted interest, since such distributions have a very flexible functional form and nest most of the common statistical distributions. Therefore, several flexible MaxEnt distributions under certain moment constraints are determined for use within the partially adaptive estimation procedure, and their performances are evaluated relative to well-known estimators. The simulation results indicate that the resulting partially adaptive estimators perform well for non-normal error distributions. In particular, some of them are useful in dealing with small sample sizes. In addition, various linear regression applications with non-normal errors are provided.
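A hedged sketch of the general recipe: jointly maximize the likelihood over the regression coefficients and the shape of a flexible error density. The exponential-power family used here is one MaxEnt density (it maximizes entropy under a fixed E|e|^p moment constraint); the paper's specific MaxEnt families and moment constraints are not reproduced, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik(params, X, y):
    """Negative log-likelihood of a linear model with exponential-power
    (generalized Gaussian) errors: f(e) = p / (2 s Gamma(1/p)) exp(-|e/s|^p).
    params = (beta..., log_sigma, log_p)."""
    d = X.shape[1]
    beta, log_s, log_p = params[:d], params[d], params[d + 1]
    s, p = np.exp(log_s), np.exp(log_p)
    e = (y - X @ beta) / s
    return -np.sum(np.log(p) - np.log(2 * s) - gammaln(1.0 / p)
                   - np.abs(e) ** p)

def partially_adaptive_fit(X, y):
    """Start from OLS and a Gaussian shape (p = 2), then adapt jointly."""
    d = X.shape[1]
    x0 = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0],
                         [0.0, np.log(2.0)]])
    return minimize(neg_loglik, x0, args=(X, y), method="Nelder-Mead").x[:d]
```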

7.
The normal Bayesian linear model is extended by assigning a flat prior to the δth power of the variance components of the regression coefficients in order to improve prediction accuracy. In the case of orthonormal regressors, easy-to-compute analytic expressions are derived for the posterior distribution of the shrinkage and regression coefficients. The expected shrinkage is a sigmoid function of the squared value of the least-squares estimate divided by its standard error. This gives a small amount of shrinkage for large values and, provided δ is small, heavy shrinkage for small values. When δ is close to 0, the limit behavior for both small and large values approaches that of the ideal coordinatewise shrinker in terms of the expected squared error of prediction. In a simulation study of wavelet denoising, the proposed Bayesian shrinkage model yielded a lower mean squared error than soft thresholding (lasso) and was competitive with two recent wavelet shrinkage methods based on mixture prior distributions.
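To illustrate the two shapes being compared: soft thresholding is the standard lasso rule, while the second function below is a qualitative stand-in for a sigmoid-of-squared-t shrinker (the paper's actual posterior shrinkage expressions are not reproduced; `c` and `a` are illustrative tuning constants, not posterior quantities).

```python
import numpy as np

def soft_threshold(w, lam):
    """Soft thresholding (the lasso rule the paper compares against)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def sigmoid_shrink(w, se, c=2.0, a=1.0):
    """Qualitative stand-in: multiply each coefficient by a sigmoid of
    z = (w / se)^2, so large |w| passes nearly unshrunk while small |w|
    is shrunk heavily, mimicking the behavior described in the abstract."""
    z = (w / se) ** 2
    return w / (1.0 + np.exp(-a * (z - c)))
```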

8.
This paper proposes a new feature selection methodology. The methodology is based on the stepwise variable selection procedure, but, instead of using the traditional discriminant metrics such as Wilks' Lambda, it uses an estimation of the misclassification error as the figure of merit to evaluate the introduction of new features. The expected misclassification error rate (MER) is obtained by using the densities of a constructed function of random variables, which is the stochastic representation of the conditional distribution of the quadratic discriminant function estimate. The application of the proposed methodology results in significant savings of computational time in the estimation of classification error over the traditional simulation and cross-validation methods. One of the main advantages of the proposed method is that it provides a direct estimation of the expected misclassification error at the time of feature selection, which provides an immediate assessment of the benefits of introducing an additional feature into an inspection/classification algorithm.
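A sketch of the selection loop, with cross-validation standing in for the paper's analytic expected-MER estimate (the paper's point is precisely that its analytic estimate avoids this resampling cost); the quadratic discriminant mirrors the discriminant function in the abstract, and the function name is illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def stepwise_by_error(X, y, max_feats=10):
    """Forward stepwise selection scored by estimated misclassification
    error of a quadratic discriminant; stop when no candidate feature
    reduces the estimated error."""
    selected, remaining, best_err = [], list(range(X.shape[1])), 1.0
    while remaining and len(selected) < max_feats:
        scores = {j: 1.0 - cross_val_score(
                      QuadraticDiscriminantAnalysis(),
                      X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        j, err = min(scores.items(), key=lambda kv: kv[1])
        if err >= best_err:
            break                     # no error reduction: stop
        selected.append(j); remaining.remove(j); best_err = err
    return selected
```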

9.
In this paper, we consider the initial data recovery and the solution update based on local measured data that are acquired during simulations. Each time new data are obtained, the initial condition, which is a representation of the solution at a previous time step, is updated. The update is performed using the least-squares approach. The objective function is set up based on both a measurement error term and a penalization term that depends on prior knowledge about the solution at previous time steps (or initial data). Various numerical examples are considered, in which the penalization term is varied during the simulations. The numerical examples demonstrate that the predictions are more accurate if the initial data are updated during the simulations.
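A minimal sketch of one such update, under the simplifying assumption of a linear measurement operator F so that the penalized least-squares problem has a closed-form solution; the Tikhonov-style penalty weight alpha (which may be varied between updates, as in the paper's experiments) and all names are illustrative.

```python
import numpy as np

def update_initial_data(F, d_obs, u_prior, alpha):
    """Least-squares update of the initial data u0:
    minimize ||F u0 - d_obs||^2 + alpha * ||u0 - u_prior||^2.
    The normal equations give (F'F + alpha*I) u0 = F'd_obs + alpha*u_prior."""
    n = F.shape[1]
    A = F.T @ F + alpha * np.eye(n)
    b = F.T @ d_obs + alpha * u_prior
    return np.linalg.solve(A, b)
```

Larger alpha keeps the update close to the prior reconstruction; smaller alpha lets the new measurements dominate.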

10.
Computers & Structures, 2001, 79(22-25): 2197-2208.
In this paper, a posteriori error indicators for frictionless contact problems are presented. In detail, error indicators relying on superconvergence properties and error estimators based on duality principles are investigated. The applications concern 3D solids under the assumption of non-linear elastic material behaviour associated with finite deformations. A penalization technique is applied to enforce the multilateral boundary conditions due to contact. The approximate solution of the problem is obtained using the finite element method. Several numerical results are reported to show the applicability of the adaptive algorithm to the considered problems.
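To illustrate the penalization idea in the simplest possible setting (small-strain linear elasticity with nodal gap constraints u ≤ g, rather than the paper's finite-deformation formulation), a hedged sketch with illustrative names:

```python
import numpy as np
from scipy.optimize import minimize

def penalized_contact_energy(u, K, f, g, eps):
    """Total potential energy of a discretized elastic body plus a penalty
    punishing penetration of a rigid obstacle: the multilateral condition
    u <= g is enforced approximately via (1/(2*eps)) * ||(u - g)_+||^2."""
    elastic = 0.5 * u @ K @ u - f @ u
    pen = np.maximum(u - g, 0.0)      # penetration (u - g)_+
    return elastic + 0.5 / eps * pen @ pen

# u_sol = minimize(penalized_contact_energy, u0, args=(K, f, g, 1e-6)).x
```

Smaller eps enforces the contact condition more strictly but makes the discrete problem stiffer, which is the usual trade-off of penalty methods.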
