Similar Literature
20 similar documents found.
1.
In many applications of model selection there is a large number of explanatory variables and thus a large set of candidate models. Selecting one single model for further inference ignores model selection uncertainty. Often several models fit the data equally well; however, these models may differ in terms of the variables included and might lead to different predictions. To account for model selection uncertainty, model averaging procedures have been proposed. Recently, an extended two-step bootstrap model averaging approach has been proposed. The first step of this approach is a screening step that aims to eliminate variables with a negligible effect on the outcome. In the second step, the remaining variables are considered in bootstrap model averaging. A large simulation study is performed to compare the MSE and coverage rate of models derived with bootstrap model averaging, the full model, backward elimination using the Akaike and Bayes information criteria, and the model with the highest selection probability in bootstrap samples. In a data example, these approaches are also compared with Bayesian model averaging. Finally, some recommendations for the development of predictive models are given.
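A minimal sketch of the two-step idea described above, assuming a linear outcome and using scikit-learn; the lasso-based screening, the inclusion-frequency threshold `pi_min`, and simple coefficient averaging are illustrative assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch: step 1 screens out variables rarely selected
# across bootstrap samples; step 2 averages coefficient estimates over
# models refit on the surviving variables in fresh bootstrap samples.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

B, pi_min = 100, 0.3                      # replicates and screening threshold (assumed)
inclusion = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)           # bootstrap sample
    lasso = LassoCV(cv=5).fit(X[idx], y[idx])
    inclusion += lasso.coef_ != 0         # count how often each variable survives
keep = inclusion / B >= pi_min            # step 1: screening

coefs = np.zeros((B, int(keep.sum())))    # step 2: bootstrap model averaging
for b in range(B):
    idx = rng.integers(0, n, n)
    coefs[b] = LinearRegression().fit(X[idx][:, keep], y[idx]).coef_
print("averaged coefficients:", coefs.mean(axis=0))
```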

2.
The recently proposed ‘weighted average least squares’ (WALS) estimator is a Bayesian combination of frequentist estimators. It has been shown that the WALS estimator possesses major advantages over standard Bayesian model averaging (BMA) estimators: the WALS estimator has bounded risk, allows a coherent treatment of ignorance, and its computational effort is negligible. However, the sampling properties of the WALS estimator, as compared to BMA estimators, have not previously been examined. The WALS theory is further extended to allow for nonspherical disturbances, and the estimator is illustrated with data from the Hong Kong real estate market. Monte Carlo evidence shows that the WALS estimator performs significantly better than standard BMA and pretest alternatives.

3.
Model averaging, or combining, is often considered an alternative to model selection. Frequentist model averaging (FMA) is considered extensively, and strategies for applying FMA methods in the presence of missing data, based on two distinct approaches, are presented. The first approach combines estimates from a set of appropriate models, weighted by scores of a missing-data-adjusted criterion developed in the recent model selection literature. The second approach averages over the estimates of a set of models with weights based on conventional model selection criteria, but with the missing data replaced by imputed values prior to estimating the models. For this purpose, three easy-to-use imputation methods available in current statistical software are considered, and a simple recursive algorithm is adapted to implement generalized regression imputation so that the missing values are predicted successively. The latter algorithm is particularly useful when two or more values are missing simultaneously in a given row of observations. Focusing on a binary logistic regression model, the properties of the FMA estimators resulting from these strategies are explored by means of a Monte Carlo study. The results show that in many situations, averaging after imputation is preferred to averaging with weights that adjust for the missing data, and model average estimators often provide better estimates than those resulting from any single model. As an illustration, the proposed methods are applied to a dataset from a study of Duchenne muscular dystrophy detection.
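A minimal sketch of the second strategy (impute, then average with information-criterion weights), assuming mean imputation and a nested candidate set for a logistic model; smoothed-AIC weights proportional to exp(-AIC/2) are one conventional choice, not necessarily the criterion used in the paper.

```python
# Hypothetical sketch: impute missing covariates, fit a set of candidate
# logistic models, and average a common coefficient with smoothed-AIC
# weights w_m proportional to exp(-AIC_m / 2).
import numpy as np
import statsmodels.api as sm
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))
X[rng.random(size=(n, 3)) < 0.1] = np.nan          # inject missingness

X_imp = SimpleImputer(strategy="mean").fit_transform(X)   # imputation step
candidates = [[0], [0, 1], [0, 1, 2]]                     # assumed candidate set
fits = [sm.Logit(y, sm.add_constant(X_imp[:, c])).fit(disp=0) for c in candidates]

aic = np.array([f.aic for f in fits])
w = np.exp(-(aic - aic.min()) / 2)
w /= w.sum()                                       # smoothed-AIC weights
beta0 = sum(wi * f.params[1] for wi, f in zip(w, fits))   # X0 is in every model
print("model-averaged coefficient of X0:", beta0)
```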

4.
Model selection and model averaging are two important techniques for obtaining practical and useful models in applied research. However, it is now well known that many complex issues arise, especially in the context of model selection, when the stochastic nature of the selection process is ignored and estimates, standard errors, and confidence intervals are calculated as if the selected model had been known a priori. While model averaging aims to incorporate the uncertainty associated with the model selection process by combining estimates over a set of models, there is still some debate over appropriate interpretation and confidence interval construction. These problems become even more complex in the presence of missing data, and it is currently not entirely clear how to proceed. To deal with such situations, a framework for model selection and model averaging in the context of missing data is proposed. The focus lies on multiple imputation as a strategy to deal with the missingness: its combination with model averaging aims to incorporate both the uncertainty associated with model selection and that associated with the imputation process. Furthermore, the performance of bootstrapping as a flexible extension to our framework is evaluated. Monte Carlo simulations are used to reveal the nature of the proposed estimators in the context of the linear regression model. The practical implications of our approach are illustrated by means of a recent survival study on sputum culture conversion in pulmonary tuberculosis.

5.
Wavelet shrinkage estimation has become an attractive and efficient method for signal denoising and compression. Despite the ample variety of methods that have been used in the wavelet denoising context, it has proven elusive to construct threshold estimators with good adaptive properties. Recently, empirical Bayes selection criteria have been proposed to derive adaptive shrinkage estimators. We consider the application of empirical Bayes variable selection criteria to each level of the wavelet transform to obtain adaptive threshold estimates. A set of level-dependent hyperparameters has to be estimated to derive nonlinear data-dependent thresholding rules. We propose the use of an evolutionary algorithm to calibrate the multilevel parameters, in order to automate parameter selection and enhance the adaptivity of the threshold estimators. Comparative simulations on a set of standard model functions show good performance. Applications to data drawn from various fields are used to explore the practical performance of the proposed approach.
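A minimal sketch of level-dependent thresholding with an evolutionary algorithm calibrating one threshold per detail level, using PyWavelets and SciPy's differential evolution; for this demo the objective is MSE against a known clean signal, whereas the paper's empirical Bayes criterion would be used in practice.

```python
# Hypothetical sketch: soft-threshold each wavelet detail level with its
# own threshold and let differential evolution (an evolutionary
# algorithm) calibrate the per-level thresholds.
import numpy as np
import pywt
from scipy.optimize import differential_evolution

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 1024)
clean = np.sin(8 * np.pi * t) * (t > 0.3)           # toy test signal
noisy = clean + 0.3 * rng.normal(size=t.size)

wavelet, level = "db4", 4
coeffs = pywt.wavedec(noisy, wavelet, level=level)

def denoise(thresholds):
    # keep the approximation, soft-threshold each detail level
    new = [coeffs[0]] + [
        pywt.threshold(c, th, mode="soft")
        for c, th in zip(coeffs[1:], thresholds)
    ]
    return pywt.waverec(new, wavelet)

def objective(thresholds):
    return np.mean((denoise(thresholds) - clean) ** 2)

result = differential_evolution(objective, bounds=[(0.0, 2.0)] * level,
                                maxiter=50, seed=2)
print("per-level thresholds:", result.x)
```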

6.
The development of instrument-sharing platforms has increased the utilization of equipment across universities, but fault detection for this equipment has not improved accordingly. To address this problem, this paper collects data on medical imaging equipment, performs feature selection with a PSO_RF two-way feature selection method, and then builds a fault detection model based on LightGBM (Light Gradient Boosting Machine), which is applied to fault detection for medical imaging equipment. Through the establishment of a standard evaluation framework and comparisons of fault diagnosis results across models, the proposed model outperforms traditional machine learning algorithms on metrics such as precision, recall, and F1 score, helping to speed up the localization of instrument faults and to improve instrument utilization.
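A minimal sketch of the pipeline on synthetic data: random forest importances stand in for the PSO_RF two-way feature selection (the particle swarm search is omitted), followed by a LightGBM classifier evaluated with precision, recall, and F1.

```python
# Hypothetical sketch: RF-importance feature selection followed by a
# LightGBM fault classifier; data, label rule and the number of kept
# features are illustrative assumptions.
import numpy as np
import lightgbm as lgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)            # synthetic fault label

# feature selection: keep the 8 features with highest RF importance
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
keep = np.argsort(rf.feature_importances_)[-8:]

X_tr, X_te, y_tr, y_te = train_test_split(X[:, keep], y, test_size=0.3,
                                          random_state=0)
clf = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))   # precision/recall/F1
```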

7.
Robust model selection procedures control the undue influence that outliers can have on the selection criteria by using both robust point estimators and a bounded loss function when measuring either the goodness-of-fit or the expected prediction error of each model. Furthermore, to avoid favoring over-fitting models, these two measures can be combined with a penalty term for the size of the model. The expected prediction error conditional on the observed data may be estimated using the bootstrap. However, bootstrapping robust estimators becomes extremely time-consuming on moderate- to high-dimensional data sets. It is shown that the expected prediction error can be estimated using a very fast and robust bootstrap method, and that this approach yields a consistent model selection method that is computationally feasible even for a relatively large number of covariates. Moreover, as opposed to other bootstrap methods, this proposal avoids the numerical problems associated with the small bootstrap samples required to obtain consistent model selection criteria. The finite-sample performance of the fast and robust bootstrap model selection method is investigated through a simulation study, while its feasibility and good performance on moderately large regression models are illustrated on several real data examples.
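A minimal sketch of the underlying criterion: estimate expected prediction error by bootstrap with a robust fit and a bounded (Huber) loss. This is the plain bootstrap, not the fast and robust bootstrap proposed in the paper; HuberRegressor and the cutoff c = 1.345 are illustrative choices.

```python
# Hypothetical sketch: bootstrap estimate of expected prediction error
# using a robust point estimator and a bounded loss on out-of-bootstrap
# points.
import numpy as np
from sklearn.linear_model import HuberRegressor

def huber_loss(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r ** 2, c * a - 0.5 * c ** 2)

rng = np.random.default_rng(4)
n = 150
X = rng.normal(size=(n, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.standard_t(df=3, size=n)  # heavy tails

B, errors = 200, []
for _ in range(B):
    idx = rng.integers(0, n, n)
    fit = HuberRegressor().fit(X[idx], y[idx])     # robust point estimator
    oob = np.setdiff1d(np.arange(n), idx)          # out-of-bootstrap points
    errors.append(huber_loss(y[oob] - fit.predict(X[oob])).mean())
print("bootstrap estimate of expected prediction error:", np.mean(errors))
```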

8.
To address the situation in universities and other large research platforms, where precision instruments are numerous in type, scattered in location, specific in operating procedure, and operated by highly mobile personnel, an instrument management system based on monitoring is designed. It aims to prompt laboratory staff to operate instruments according to sound procedures and to provide a reference for equipment replacement and maintenance. The system consists of a terminal and a server. The terminal judges instrument status by measuring the current draw; users must identify themselves with a valid IC card, upon which the terminal gives audible and visual prompts and takes a photograph as a record. Status information, such as the user's valid card number and alarm photographs, can be transmitted over the network to the server, and the client can obtain personnel information updates and other data from the server. Monitoring tests on some of the instruments in our engineering center show that the system improves operators' handling of the instruments and raises the level of instrument management.

9.
Chao Sima, Pattern Recognition, 2006, 39(9): 1763-1780
A cross-validation error estimator is obtained by repeatedly leaving out some data points, deriving classifiers on the remaining points, computing errors for these classifiers on the left-out points, and then averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of this “zero bootstrap” error with the resubstitution error for the designed classifier. This gives a convex combination of the low-biased resubstitution and the high-biased zero bootstrap. Another convex error estimator suggested in the literature is the unweighted average of resubstitution and cross-validation. This paper treats the following question: given a feature-label distribution and classification rule, what is the optimal convex combination of two error estimators, i.e., what are the optimal weights for the convex combination? This problem is considered by finding the weights that minimize the MSE of a convex estimator. Optimality is also considered under the constraint that the resulting estimator be unbiased. Owing to the large number of results coming from the various feature-label models and error estimators, a portion of the results are presented herein and the main body of results appears on a companion website. In the tabulated results, each table treats the classification rules considered for the model, various Bayes errors, and various sample sizes. Each table includes the optimal weights, mean errors and standard deviations for the relevant error measures, and the MSE and MAE for the optimal convex estimator. Many observations can be made by considering the full set of experiments, and some general trends are outlined in the paper. The general conclusion is that optimizing the weights of a convex estimator can provide substantial improvement, depending on the classification rule, data model, sample size and component estimators. Optimal convex bootstrap estimators are applied to feature-set ranking to illustrate their potential advantage over non-optimized convex estimators.
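A minimal sketch of the optimization: for a known two-class Gaussian model, simulate many samples, record the resubstitution error, the cross-validation error, and a large-sample proxy for the true error of an LDA rule, then grid-search the convex weight that minimizes the MSE. The model, sample size, and grid are illustrative assumptions.

```python
# Hypothetical sketch: Monte Carlo search for the MSE-optimal convex
# weight a in a*resubstitution + (1-a)*cross-validation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)

def sample(n):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + y[:, None]       # class means 0 and 1
    return X, y

n, reps, records = 40, 200, []
for _ in range(reps):
    X, y = sample(n)
    clf = LinearDiscriminantAnalysis().fit(X, y)
    resub = 1 - clf.score(X, y)                    # low-biased resubstitution
    cv = 1 - cross_val_score(clf, X, y, cv=5).mean()   # cross-validation
    Xt, yt = sample(20000)                         # proxy for the true error
    records.append((resub, cv, 1 - clf.score(Xt, yt)))

resub, cv, true = np.array(records).T
grid = np.linspace(0, 1, 101)
mse = [np.mean((a * resub + (1 - a) * cv - true) ** 2) for a in grid]
print("optimal weight on resubstitution:", grid[int(np.argmin(mse))])
```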

10.
Ranking and selection (R&S) procedures have been considered an effective tool for solving simulation optimization problems with a discrete and finite decision space. The control variate (CV) method is a variance reduction technique that requires no intervention in the way the simulation experiment is performed, and the least-squares regression package needed to implement CV is readily available. In this paper we propose two provably valid selection procedures that employ weighted CV estimators in different ways. Both procedures are guaranteed to select the best system with a prespecified confidence level. Empirical results and simple analyses are performed to compare the efficiency of our new procedures with some existing procedures.
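A minimal sketch of a control variate estimator of a simulation output mean: regress the output on a control with known mean and correct the sample mean accordingly. The toy output and control are assumptions, and the weighted CV estimators used inside the selection procedures add machinery not shown here.

```python
# Hypothetical sketch of a control variate (CV) estimator: the
# least-squares coefficient of the control C (known mean 2.0) is used
# to adjust the crude sample mean of the output Y.
import numpy as np

rng = np.random.default_rng(6)
n = 1000
C = rng.exponential(scale=2.0, size=n)             # control, known E[C] = 2
Y = 3.0 + 1.5 * (C - 2.0) + rng.normal(size=n)     # correlated simulation output

beta = np.cov(Y, C)[0, 1] / np.var(C, ddof=1)      # least-squares coefficient
cv_est = Y.mean() - beta * (C.mean() - 2.0)        # CV point estimator
print("crude mean:", Y.mean(), " CV estimate:", cv_est)
```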

11.
The spatial distribution of sponge species richness (SSR) and its relationship with the environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear models (GLM) and their hybrid methods with geostatistical techniques to SSR data, addressing relevant issues in variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models, and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) the effects of model averaging are method-dependent. This study depicted the non-linear relationships between SSR and its predictors, generated the spatial distribution of SSR with high accuracy, and revealed the association of high SSR with hard seabed features.
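A minimal sketch of one hybrid of the kind described: a random forest captures the non-linear trend in environmental predictors, and a Gaussian process on the residuals stands in for the geostatistical (kriging) step. The coordinates, covariates, and kernel are toy assumptions.

```python
# Hypothetical sketch: RF trend + GP-interpolated residuals as a stand-in
# for RF combined with residual kriging.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)
coords = rng.uniform(0, 10, size=(200, 2))         # spatial locations
env = rng.normal(size=(200, 3))                    # environmental predictors
richness = (np.sin(coords[:, 0]) + env[:, 0] ** 2
            + rng.normal(scale=0.3, size=200))     # toy species richness

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(env, richness)
resid = richness - rf.predict(env)                 # spatially structured residual
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0)).fit(coords, resid)

pred = rf.predict(env) + gp.predict(coords)        # hybrid = trend + residual field
print("in-sample RMSE:", np.sqrt(np.mean((pred - richness) ** 2)))
```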

12.
The problem of consistent estimation in measurement error models with a linear relationship and not necessarily normally distributed measurement errors is considered. Three possible estimators, constructed as different combinations of the estimators arising from direct and inverse regression, are considered. The efficiency properties of these three estimators are derived and the effect of non-normally distributed measurement errors is analyzed. A Monte Carlo experiment is conducted to study the performance of these estimators in finite samples.
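A minimal sketch of the two building blocks and one possible combination: the direct-regression slope (attenuated towards zero under measurement error), the inverse-regression slope (inflated away from zero), and their geometric mean. The geometric-mean rule is one illustrative combination, not necessarily among the three studied in the paper.

```python
# Hypothetical sketch: direct and inverse regression slopes under
# non-normal (Laplace) measurement error, plus a geometric-mean combination.
import numpy as np

rng = np.random.default_rng(8)
n, beta = 500, 1.5
xi = rng.normal(size=n)                            # true covariate
x = xi + rng.laplace(scale=0.5, size=n)            # non-normal measurement error
y = beta * xi + rng.laplace(scale=0.5, size=n)

sxx = np.var(x, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
b_direct = sxy / sxx                               # attenuated towards 0
b_inverse = syy / sxy                              # inflated away from 0
b_combined = np.sign(sxy) * np.sqrt(b_direct * b_inverse)
print(b_direct, b_inverse, b_combined)
```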

13.
The main objective of this work is to demonstrate that sharp a posteriori error estimators can be employed as appropriate monitor functions for moving mesh methods. We illustrate the main ideas by considering elliptic obstacle problems. Some important issues, such as how to derive the sharp estimators and how to smooth the monitor functions, are addressed. The numerical schemes are applied to a number of test problems in two dimensions. It is shown that the moving mesh methods with the proposed monitor functions can effectively capture the free boundaries of the elliptic obstacle problems and reduce the numerical errors arising from the free boundaries.
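A minimal 1D sketch of the moving mesh ingredient: mesh points are placed by equidistributing a monitor function, here a toy indicator peaked near a presumed free boundary at x = 0.5; the paper's sharp a posteriori error estimators would supply the monitor in practice.

```python
# Hypothetical sketch: equidistribute mesh nodes so that each cell
# carries an equal share of the integral of the monitor function.
import numpy as np

def equidistribute(monitor, a=0.0, b=1.0, n_cells=40, n_fine=2000):
    x = np.linspace(a, b, n_fine)
    m = monitor(x)
    # cumulative integral of the monitor (trapezoid rule)
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(x))))
    targets = np.linspace(0, cum[-1], n_cells + 1)
    return np.interp(targets, cum, x)              # invert the cumulative monitor

monitor = lambda x: 1.0 + 50.0 * np.exp(-200 * (x - 0.5) ** 2)
mesh = equidistribute(monitor)                     # nodes cluster near x = 0.5
print("smallest cell:", np.diff(mesh).min(), "largest:", np.diff(mesh).max())
```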

14.
Two bootstrap-corrected variants of the Akaike information criterion are proposed for the purpose of small-sample mixed model selection. These two variants are asymptotically equivalent, and provide asymptotically unbiased estimators of the expected Kullback-Leibler discrepancy between the true model and a fitted candidate model. The performance of the criteria is investigated in a simulation study where the random effects and the errors for the true model are generated from a Gaussian distribution. The parametric bootstrap is employed. The simulation results suggest that both criteria provide effective tools for choosing a mixed model with an appropriate mean and covariance structure. A theoretical asymptotic justification for the variants is presented in the Appendix.

15.
A common problem, arising in many different applied contexts, consists in estimating the number of exponentially damped sinusoids whose weighted sum best fits a finite set of noisy data, and in estimating their parameters. Many different methods exist for this purpose. The best of them are based on approximate maximum likelihood estimators that assume the number of damped sinusoids is known; this number can then be estimated by an order selection procedure. As the problem can be severely ill posed, a stochastic perturbation method is proposed that provides better results than maximum-likelihood-based methods when the signal-to-noise ratio is low. The method depends on some hyperparameters which turn out to be essentially independent of the application; they can therefore be fixed once and for all, giving rise to a black-box method.
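A minimal sketch of the baseline approach the paper improves upon: fit sums of damped sinusoids by nonlinear least squares for several candidate orders and select the order by AIC. The stochastic perturbation method itself is not shown, and the parameterization and grid of candidate orders are assumptions.

```python
# Hypothetical sketch: nonlinear least-squares fits of K damped
# sinusoids for K = 1..3, with AIC-based order selection.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)
t = np.linspace(0, 1, 200)
y = np.exp(-2 * t) * np.cos(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=t.size)

def model(t, *p):                                  # sum of len(p)//4 damped sinusoids
    out = np.zeros_like(t)
    for a, d, f, ph in np.reshape(p, (-1, 4)):
        out += a * np.exp(-d * t) * np.cos(2 * np.pi * f * t + ph)
    return out

best = None
for k in (1, 2, 3):                                # candidate model orders
    p0 = np.tile([1.0, 1.0, 4.0, 0.0], k) + 0.01 * rng.normal(size=4 * k)
    try:
        popt, _ = curve_fit(model, t, y, p0=p0, maxfev=20000)
    except RuntimeError:                           # fit failed to converge
        continue
    rss = np.sum((y - model(t, *popt)) ** 2)
    aic = t.size * np.log(rss / t.size) + 2 * 4 * k
    if best is None or aic < best[0]:
        best = (aic, k)
print("selected number of damped sinusoids:", best[1])
```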

16.
Instrument recognition is one of the research topics in music information retrieval. This task becomes very challenging when several instruments are played simultaneously, because of their varying physical characteristics: inharmonic attack noise, energy development during the attack–decay–sustain–release envelope, or overtone distribution. In our framework, we treat instrument detection as a machine-learning task based on a large amount of preprocessed audio features, with the goal of building classification models. Since classification algorithms are very sensitive to the feature input and the optimal feature set differs from instrument to instrument, we propose to run a multi-objective feature selection procedure before building the classification models. Two objectives are considered for evaluation: classification mean-squared error and feature rate (a smaller number of features stands for reduced costs and a decreased risk of overfitting). The analysis of the extensive experimental study confirms that an evolutionary multi-objective algorithm is a good choice for optimizing feature selection for music instrument identification.
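A minimal sketch of the two objectives on toy data: for each candidate feature subset, record (classification error, feature rate) and keep the Pareto-optimal subsets. Exhaustive enumeration over a small feature pool stands in for the paper's evolutionary multi-objective search, and the SVM classifier is an assumption.

```python
# Hypothetical sketch: enumerate small feature subsets, score the two
# objectives, and keep the Pareto front.
import numpy as np
from itertools import combinations
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 8))                      # toy audio features
y = (X[:, 0] + X[:, 2] > 0).astype(int)            # toy instrument label

points = []
for k in range(1, 5):
    for subset in combinations(range(8), k):
        err = 1 - cross_val_score(SVC(), X[:, list(subset)], y, cv=3).mean()
        points.append((err, k / 8, subset))        # (error, feature rate)

def dominated(p, q):                               # q strictly dominates p
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])

pareto = [p for p in points if not any(dominated(p, q) for q in points)]
print([(round(e, 3), fr, s) for e, fr, s in sorted(pareto)])
```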

17.
Elia, Michel, Francesco, Amaury, Neurocomputing, 2009, 72(16-18): 3692
The problem of residual variance estimation consists of estimating the best possible generalization error obtainable by any model based on a finite sample of data. Even though it is a natural generalization of linear correlation, residual variance estimation in its general form has attracted relatively little attention in machine learning. In this paper, we examine four different residual variance estimators and analyze their properties both theoretically and experimentally to better understand their applicability in machine learning problems. The theoretical treatment differs from previous work by being based on a general formulation of the problem that also covers heteroscedastic noise, in contrast to previous work, which concentrates on homoscedastic and additive noise. In the second part of the paper, we demonstrate practical applications in input and model structure selection. The experimental results show that using residual variance estimators in these tasks gives good results, often with a reduced computational complexity, while the nearest neighbor estimators are simple and easy to implement.
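A minimal sketch of a nearest-neighbour residual variance estimator of the kind alluded to: half the mean squared difference between each output and the output of its nearest neighbour in input space converges to the noise variance under smoothness assumptions. The toy regression function and noise level are assumptions.

```python
# Hypothetical sketch: 1-NN residual variance estimator; the true noise
# variance here is 0.2**2 = 0.04.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(11)
n = 2000
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(X[:, 0]) * X[:, 1] + 0.2 * rng.normal(size=n)

nn = NearestNeighbors(n_neighbors=2).fit(X)        # neighbour 0 is the point itself
_, idx = nn.kneighbors(X)
sigma2_hat = 0.5 * np.mean((y[idx[:, 1]] - y) ** 2)
print("estimated residual variance:", sigma2_hat)  # approximately 0.04
```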

18.
This paper considers the estimation of the error variance after a pre-test of an interval restriction on the coefficients. We derive the exact finite sample risks of the interval restricted and pre-test estimators of the error variance, and examine the risk properties of the estimators under model misspecification through the omission of relevant regressors. It is found that the pre-test estimator performs better than the interval restricted estimator in terms of risk in a large region of the parameter space; moreover, its risk performance is more robust with respect to the degree of model misspecification. Furthermore, we propose a bootstrap procedure for estimating the risks of the estimators, to overcome the difficulty of computing the exact risks.

19.
Beibei was one of the first three national instrument bases established in China. It is home to the Silian Group (四联集团), the largest surviving state-owned instrument enterprise in the country, as well as nearly 300 small, medium, and micro instrument enterprises, and it enjoys many advantages in talent, technology, product variety, and supporting industries. Instruments are a calling card of Beibei and one of its pillar industries, yet over the past decade or so Beibei's instrument industry has been steadily overtaken by later-developing instrument regions in China. This article analyzes in depth the current situation, history, problems, and causes of Beibei's instrument industry, and puts forward three policy recommendations.

20.
The article proposes two novel and relatively simple unsupervised procedures for selecting informative small subsets of spectral bands in hyperspectral images. To ensure the informativeness of the subsets, bands featuring higher entropy are included, and the correlation of band images is restricted to avoid redundancy within the subsets. The entropy multiple correlation ratio procedure employs the entropy-correlation ratio for the selection of spectral bands. The entropy-based correlated band grouping (ECBG) procedure divides the spectrum into groups of bands with highly correlated images. The subsets obtained were characterized by the performance of classifiers using only data from the included bands. The ECBG procedure provided better results than the alternatives when the number of selected bands was low. Another advantage of this procedure is the possibility of averaging the images obtained for spectral bands within the groups found; it is shown that classification results improve significantly when such averaging is used. In data acquisition practice, it can be used for purposeful merging of spectral bands in the configuration of hyperspectral imagers, which reduces the amount of data to be saved in real time and thus helps improve the achievable spatial resolution.
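A minimal sketch of the entropy/correlation ingredients behind a procedure like ECBG: estimate each band's entropy from its histogram, greedily group adjacent bands with highly correlated images, and pick the highest-entropy band in each group. The synthetic cube, the 0.9 correlation threshold, and the greedy adjacent grouping rule are assumptions.

```python
# Hypothetical sketch: per-band entropy, grouping of adjacent correlated
# bands, and selection of the highest-entropy band per group.
import numpy as np

rng = np.random.default_rng(12)
n_pix, n_bands = 5000, 30
bands = np.linspace(0, 1, n_bands)
sig1 = np.exp(-((bands - 0.3) ** 2) / 0.02)        # smooth spectral signatures
sig2 = np.exp(-((bands - 0.7) ** 2) / 0.02)
abund = rng.uniform(size=(n_pix, 2))               # per-pixel abundances
cube = abund @ np.vstack([sig1, sig2]) + 0.05 * rng.normal(size=(n_pix, n_bands))

def entropy(band, bins=64):
    p, _ = np.histogram(band, bins=bins)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

H = np.array([entropy(cube[:, b]) for b in range(n_bands)])
corr = np.corrcoef(cube.T)

groups, current = [], [0]
for b in range(1, n_bands):                        # group adjacent correlated bands
    if corr[b, current[-1]] > 0.9:
        current.append(b)
    else:
        groups.append(current)
        current = [b]
groups.append(current)

selected = [g[int(np.argmax(H[g]))] for g in groups]   # best band per group
print("selected bands:", selected)
```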
