Similar documents
20 similar documents found (search time: 593 ms).
1.
Model selection and model averaging are two important techniques to obtain practical and useful models in applied research. However, it is now well-known that many complex issues arise, especially in the context of model selection, when the stochastic nature of the selection process is ignored and estimates, standard errors, and confidence intervals are calculated as if the selected model had been known a priori. While model averaging aims to incorporate the uncertainty associated with the model selection process by combining estimates over a set of models, there is still some debate over appropriate interpretation and confidence interval construction. These problems become even more complex in the presence of missing data, and it is currently not entirely clear how to proceed. To deal with such situations, a framework for model selection and model averaging in the context of missing data is proposed. The focus lies on multiple imputation as a strategy to deal with the missingness: combining it with model averaging aims to incorporate both the uncertainty associated with the model selection and that associated with the imputation process. Furthermore, the performance of bootstrapping as a flexible extension to our framework is evaluated. Monte Carlo simulations are used to reveal the nature of the proposed estimators in the context of the linear regression model. The practical implications of our approach are illustrated by means of a recent survival study on sputum culture conversion in pulmonary tuberculosis.
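
A minimal, hedged sketch of the general idea of pairing multiple imputation with model averaging for a linear regression: candidate models are averaged with AIC-based weights within each imputed data set, and the point estimates are then pooled across imputations. The imputation model (scikit-learn's IterativeImputer), the candidate set of all covariate subsets, and the AIC weighting are illustrative assumptions, not the authors' exact estimators.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
X_miss = X.copy()
X_miss[rng.random(size=X.shape) < 0.1] = np.nan          # 10% missing at random

M = 20                                                     # number of imputations
candidates = [c for k in range(1, p + 1) for c in combinations(range(p), k)]
pooled = np.zeros(p + 1)                                   # intercept + slopes

for m in range(M):
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    X_imp = imp.fit_transform(X_miss)
    aics, betas = [], []
    for c in candidates:                                   # AIC-weighted averaging per imputation
        Z = sm.add_constant(X_imp[:, list(c)])
        fit = sm.OLS(y, Z).fit()
        b = np.zeros(p + 1)
        b[0] = fit.params[0]
        b[1 + np.array(c)] = fit.params[1:]
        aics.append(fit.aic)
        betas.append(b)
    aics = np.array(aics)
    w = np.exp(-0.5 * (aics - aics.min()))
    w /= w.sum()
    pooled += (w[:, None] * np.array(betas)).sum(axis=0)

pooled /= M                                                # pool point estimates across imputations
print("model-averaged, imputation-pooled coefficients:", pooled.round(2))
```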

2.
Robust model selection procedures control the undue influence that outliers can have on the selection criteria by using both robust point estimators and a bounded loss function when measuring either the goodness-of-fit or the expected prediction error of each model. Furthermore, to avoid favoring over-fitting models, these two measures can be combined with a penalty term for the size of the model. The expected prediction error conditional on the observed data may be estimated using the bootstrap. However, bootstrapping robust estimators becomes extremely time consuming on moderate to high dimensional data sets. It is shown that the expected prediction error can be estimated using a very fast and robust bootstrap method, and that this approach yields a consistent model selection method that is computationally feasible even for a relatively large number of covariates. Moreover, as opposed to other bootstrap methods, this proposal avoids the numerical problems associated with the small bootstrap samples required to obtain consistent model selection criteria. The finite-sample performance of the fast and robust bootstrap model selection method is investigated through a simulation study while its feasibility and good performance on moderately large regression models are illustrated on several real data examples.
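
As a rough illustration of the ingredients named above (and not the fast and robust bootstrap itself), the sketch below scores each candidate submodel by a bootstrap estimate of its prediction error, computed with a robust Huber fit and a bounded (truncated) squared loss, plus a penalty on model size; the data, truncation constant, and penalty are assumptions.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(1)
n, p = 150, 4
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 2] + rng.standard_t(df=2, size=n)   # heavy-tailed noise / outliers

def bounded_loss(res, c=3.0):
    return np.minimum(res ** 2, c ** 2)                     # truncated squared loss

B, penalty, scores = 50, 0.5, {}
for k in range(1, p + 1):
    for cols in combinations(range(p), k):
        cs, errs = list(cols), []
        for _ in range(B):
            idx = rng.integers(0, n, n)                      # ordinary bootstrap sample
            fit = HuberRegressor().fit(X[idx][:, cs], y[idx])
            res = y - fit.predict(X[:, cs])                  # error evaluated on the original data
            errs.append(bounded_loss(res).mean())
        scores[cols] = np.mean(errs) + penalty * k           # prediction error + size penalty
best = min(scores, key=scores.get)
print("selected covariates:", best)
```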

3.
Diffusion magnetic resonance imaging (dMRI) tractography has the unique ability to reconstruct major white matter tracts non-invasively and is, therefore, widely used in neurosurgical planning and neuroscience. In this work, we reduce two sources of uncertainty within the tractography pipeline. The first one is the model uncertainty that arises in crossing fibre tractography, from having to estimate the number of relevant fibre compartments in each voxel. We propose a mathematical framework to estimate model uncertainty, and we reduce this type of uncertainty with a model averaging approach that combines the fibre direction estimates from all candidate models, weighted by the posterior probability of the respective model. The second source of uncertainty is measurement noise. We use bootstrapping to estimate this data uncertainty, and consolidate the fibre direction estimates from all bootstraps into a consensus model. We observe that, in most voxels, a traditional model selection strategy selects different models across bootstraps. In this sense, the bootstrap consensus also reduces model uncertainty. Either approach significantly increases the accuracy of crossing fibre tractography in multiple subjects, and combining them provides an additional benefit. However, model averaging is much more efficient computationally.

4.
Model averaging (MA) estimators in the linear instrumental variables regression framework are considered. Weights for averaging across individual estimates are obtained by directly smoothing selection criteria arising from the estimation stage. This is particularly relevant in applications in which there is a large number of candidate instruments and, therefore, a considerable number of instrument sets arising from different combinations of the available instruments. The asymptotic properties of the estimator are derived under homoskedastic and heteroskedastic errors. A simple Monte Carlo study contrasts the performance of MA procedures with existing instrument selection procedures, showing that MA estimators compare very favorably in many relevant setups. Finally, this method is illustrated with an empirical application to returns to education.
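
A hedged numerical sketch of the averaging step: two-stage least squares is run over every subset of the candidate instruments, and the slope estimates are combined with weights obtained by exponentially smoothing a selection criterion. The criterion used here (a BIC for the first-stage regression) and the simulated data are stand-in assumptions; the abstract's criteria arise from the estimation stage itself.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, L = 300, 4
Z = rng.normal(size=(n, L))                                  # candidate instruments
u = rng.normal(size=n)                                       # endogeneity source
x = Z[:, 0] + 0.5 * Z[:, 1] + u + rng.normal(size=n)         # endogenous regressor
y = 1.5 * x + u + rng.normal(size=n)

def tsls_slope(x, y, Zs):
    Zc = np.column_stack([np.ones(n), Zs])
    x_hat = Zc @ np.linalg.lstsq(Zc, x, rcond=None)[0]       # first-stage fitted values
    X1 = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1]          # second-stage slope on x

subsets = [c for k in range(1, L + 1) for c in combinations(range(L), k)]
betas, crits = [], []
for c in subsets:
    Zs = Z[:, list(c)]
    Zc = np.column_stack([np.ones(n), Zs])
    resid = x - Zc @ np.linalg.lstsq(Zc, x, rcond=None)[0]
    crits.append(n * np.log(resid @ resid / n) + (len(c) + 1) * np.log(n))   # first-stage BIC
    betas.append(tsls_slope(x, y, Zs))
crits = np.array(crits)
w = np.exp(-0.5 * (crits - crits.min()))                      # smoothed criterion weights
w /= w.sum()
print("model-averaged IV slope:", round(float(w @ np.array(betas)), 3))
```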

5.
Considering the uncertainty of hidden neurons, choosing significant hidden nodes, referred to as model selection, has played an important role in the applications of extreme learning machines (ELMs). How to define and measure this uncertainty is a key issue of model selection for ELM. From the information geometry point of view, this paper presents a new model selection method of ELM for regression problems based on the Riemannian metric. First, this paper proves theoretically that the uncertainty can be characterized by a form of Riemannian metric. As a result, a new uncertainty evaluation of ELM is proposed by averaging the Riemannian metric over all hidden neurons. Finally, the hidden nodes are added to the network one by one, and at each step a multi-objective optimization algorithm is used to select optimal input weights by minimizing this uncertainty evaluation and the norm of the output weights simultaneously, in order to obtain better generalization performance. Experiments on five UCI regression data sets and a cylindrical shell vibration data set are conducted, demonstrating that the proposed method can generally obtain lower generalization error than the original ELM, evolutionary ELM, ELM with model selection, and the multi-dimensional support vector machine. Moreover, the proposed algorithm generally needs fewer hidden neurons and less computational time than the traditional approaches, which is very favorable in engineering applications.
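
A hedged sketch of the incremental construction only: an ELM for regression with random sigmoid hidden nodes added one at a time, where a node is kept only if it lowers the validation error. The validation-error criterion stands in for the Riemannian-metric uncertainty measure and the multi-objective search described above; the data and node budget are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300], y[300:]

def hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))                 # sigmoid hidden layer

W, b, best = np.empty((1, 0)), np.empty(0), np.inf
for _ in range(50):                                           # try up to 50 random hidden nodes
    w_new, b_new = rng.normal(size=(1, 1)), rng.normal(size=1)
    W_try, b_try = np.hstack([W, w_new]), np.concatenate([b, b_new])
    H = hidden(X_tr, W_try, b_try)
    beta = np.linalg.lstsq(H, y_tr, rcond=None)[0]            # output weights by least squares
    err = np.mean((hidden(X_va, W_try, b_try) @ beta - y_va) ** 2)
    if err < best:                                            # keep the node only if it helps
        W, b, best = W_try, b_try, err
print(f"selected {W.shape[1]} hidden nodes, validation MSE {best:.4f}")
```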

6.
Thomas Most, Computers & Structures, 2011, 89(17-18): 1664-1672
In this paper several methods for model assessment considering uncertainties are discussed. Sensitivity analysis is performed to quantify the influence of the individual model input parameters. In addition to the well-known analysis of a single model, a new procedure for quantifying the influence of the model choice on the uncertainty of the model prediction is proposed. Furthermore, a procedure is presented which can be used to estimate the model framework uncertainty and which enables the selection of the optimal model with the best compromise between model input and framework uncertainty. Finally, Bayesian methods for model selection are extended to model assessment without measurements, using model averaging as a reference.

7.
In recent years, considerable research has been devoted to developing complex regression models that can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different types. Much less effort, however, has been devoted to model and variable selection. The paper develops a methodology for the simultaneous selection of variables and the degree of smoothness in regression models with a structured additive predictor. These models are quite general, containing additive (mixed) models, geoadditive models and varying coefficient models as special cases. This approach allows one to decide whether a particular covariate enters the model linearly or nonlinearly or is removed from the model. Moreover, it is possible to decide whether a spatial or cluster-specific effect should be incorporated into the model to cope with spatial or cluster-specific heterogeneity. Particular emphasis is also placed on selecting complex interactions between covariates and effects of different types. A new penalty for two-dimensional smoothing is proposed that allows for ANOVA-type decompositions into main effects and an interaction effect without explicitly specifying the main effects. The penalty is an additive combination of other penalties. Fast algorithms and software are developed that make it possible to handle even situations with many covariate effects and observations. The algorithms are related to backfitting and Markov chain Monte Carlo techniques, which divide the problem into smaller pieces in a divide-and-conquer strategy. Confidence intervals taking model uncertainty into account are based on the bootstrap in combination with MCMC techniques.

8.
We present an entropic component analysis for identifying key parameters or variables, and the joint effects of various parameters, that characterize complex systems. This approach identifies key parameters by solving the variable selection problem. It consists of two steps. First, a Bayesian approach is utilized to convert the variable selection problem into a model selection problem. Second, the model selection is achieved solely by evaluating the information difference of models, measured by the relative entropies between these models and a reference model. We study a geological sample classification problem, in which brine samples from Texas and Oklahoma oil fields are considered, to illustrate and examine the proposed approach. The results are consistent with a qualitative analysis of the lithology and a quantitative discriminant function analysis. Furthermore, the proposed approach reveals the joint effects of the parameters, which remain unclear from the discriminant function analysis. The proposed approach is thus promising for a wide range of geological data analyses.
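
A hedged sketch of the second step for a two-class geological classification problem: candidate variable subsets are scored by the relative entropy (KL divergence) between the two class-conditional Gaussian fits on that subset. Using the symmetric KL divergence between the classes as the information measure is a simplification of the abstract's comparison of candidate models against a reference model, and the data are synthetic.

```python
import numpy as np
from itertools import combinations

def kl_gauss(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for multivariate Gaussians."""
    d = len(m0)
    S1_inv = np.linalg.inv(S1)
    dm = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + dm @ S1_inv @ dm - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

rng = np.random.default_rng(4)
n, p = 120, 4
A = rng.normal(loc=[0.0, 0.0, 0.0, 0.0], size=(n, p))         # e.g. samples of one lithology
B = rng.normal(loc=[1.0, 0.2, 0.0, 0.0], size=(n, p))         # samples of another lithology

scores = {}
for k in (1, 2):
    for c in combinations(range(p), k):
        cs = list(c)
        m0, S0 = A[:, cs].mean(0), np.cov(A[:, cs], rowvar=False).reshape(k, k)
        m1, S1 = B[:, cs].mean(0), np.cov(B[:, cs], rowvar=False).reshape(k, k)
        scores[c] = kl_gauss(m0, S0, m1, S1) + kl_gauss(m1, S1, m0, S0)   # symmetric KL
for c, s in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print("variable subset", c, "symmetric KL:", round(s, 2))
```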

9.
Uniform resampling is the easiest to apply and is a general recipe for all problems, but it may require a large replication size B. To save computational effort in uniform resampling, balanced bootstrap resampling is proposed to change the bootstrap resampling plan. This resampling plan is effective for approximating the center of the bootstrap distribution. Therefore, this paper applies it to neural model selection. Numerical experiments indicate that it is possible to considerably reduce the replication size B. Moreover, the efficiency of balanced bootstrap resampling is also discussed in this paper.
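
A minimal sketch of balanced bootstrap resampling: every observation appears exactly B times across the B bootstrap samples, obtained by permuting B concatenated copies of the index set and cutting the result into blocks of length n. The closing example (the bootstrap standard error of a mean) is illustrative only.

```python
import numpy as np

def balanced_bootstrap_indices(n, B, rng):
    idx = np.tile(np.arange(n), B)            # B copies of every index
    rng.shuffle(idx)                           # one global permutation
    return idx.reshape(B, n)                   # row b is the b-th bootstrap sample

rng = np.random.default_rng(5)
n, B = 100, 200
samples = balanced_bootstrap_indices(n, B, rng)
counts = np.bincount(samples.ravel(), minlength=n)
assert np.all(counts == B)                     # exact balance across replications

x = rng.exponential(size=n)
boot_means = x[samples].mean(axis=1)           # bootstrap distribution of the sample mean
print("balanced-bootstrap SE of the mean:", round(float(boot_means.std(ddof=1)), 3))
```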

11.
In the areas of investment research and applications, feasible quantitative models include methodologies stemming from soft computing for the prediction of financial time series, multi-objective optimization of investment return and risk reduction, as well as the selection of investment instruments for portfolio management based on asset ranking using a variety of input variables and historical data. Among all these, stock selection has long been identified as a challenging and important task. This line of research is highly contingent upon reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining are leading to significant opportunities to solve these problems more effectively. In this study, we aim at developing a methodology for effective stock selection using support vector regression (SVR) as well as genetic algorithms (GAs). We first employ the SVR method to generate surrogates for actual stock returns, which in turn serve to provide reliable rankings of stocks. Top-ranked stocks can thus be selected to form a portfolio. On top of this model, the GA is employed for the optimization of model parameters and for feature selection, to acquire optimal subsets of input variables for the SVR model. We show that the investment returns provided by our proposed methodology significantly outperform the benchmark. Based upon these promising results, we expect this hybrid GA-SVR methodology to advance research in soft computing for finance and to provide an effective solution to stock selection in practice.
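
A hedged sketch of the two ingredients on synthetic data: an SVR predicts returns used to rank stocks and pick a top-k portfolio, and a small genetic algorithm searches over binary feature masks for a good subset of input variables (fitness = cross-validated SVR score). The data, fitness definition, and GA settings are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_stocks, n_feat = 300, 10
X = rng.normal(size=(n_stocks, n_feat))                      # factor exposures per stock
ret = X @ np.array([1.0, -0.8, 0.5] + [0.0] * 7) + 0.5 * rng.normal(size=n_stocks)

def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(SVR(C=1.0, epsilon=0.1), X[:, mask.astype(bool)], ret, cv=3).mean()

# Tiny GA over binary feature masks: tournament selection, uniform crossover, bit-flip mutation.
pop = rng.integers(0, 2, size=(20, n_feat))
for gen in range(15):
    fits = np.array([fitness(m) for m in pop])
    new_pop = [pop[fits.argmax()].copy()]                     # elitism
    while len(new_pop) < len(pop):
        i, j = rng.integers(0, len(pop), 2), rng.integers(0, len(pop), 2)
        p1, p2 = pop[i[np.argmax(fits[i])]], pop[j[np.argmax(fits[j])]]   # tournament winners
        child = np.where(rng.random(n_feat) < 0.5, p1, p2)    # uniform crossover
        child = np.where(rng.random(n_feat) < 0.1, 1 - child, child)      # mutation
        new_pop.append(child)
    pop = np.array(new_pop)

best = pop[np.argmax([fitness(m) for m in pop])].astype(bool)
svr = SVR(C=1.0, epsilon=0.1).fit(X[:, best], ret)
portfolio = np.argsort(svr.predict(X[:, best]))[::-1][:20]     # top-20 stocks by predicted return
print("selected features:", np.flatnonzero(best), "| portfolio of", len(portfolio), "stocks")
```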

12.
We consider regression models with a group structure in the explanatory variables. This structure is commonly seen in practice, but it has only recently been realized that taking this information into account in the modeling process may improve both the interpretability and the accuracy of the model. In this paper, we study a new approach to group variable selection using random-effect models. Specific distributional assumptions on random effects pertaining to a given structure lead to a new class of penalties that includes some existing penalties. We also develop an efficient computational algorithm. Numerical studies are provided to demonstrate better sensitivity and specificity properties without sacrificing prediction accuracy. Finally, we present some real-data applications of the proposed approach.
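
A hedged sketch of group-structured selection with a group-lasso-type penalty, one member of the penalty family that such random-effect formulations induce, fitted by proximal gradient descent with group-wise soft thresholding. The penalty level, step size, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 9
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]   # known group structure
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0, 0, 0, 0, 0, 0])        # only the first group is active
y = X @ beta_true + rng.normal(size=n)

lam = 40.0                                                       # group-lasso penalty level
step = 1.0 / np.linalg.norm(X, 2) ** 2                           # 1 / (largest singular value)^2
beta = np.zeros(p)
for _ in range(500):
    grad = X.T @ (X @ beta - y)                                  # gradient of 0.5 * ||y - X beta||^2
    z = beta - step * grad
    for g in groups:                                             # group-wise soft thresholding
        norm_g = np.linalg.norm(z[g])
        z[g] = 0.0 if norm_g <= step * lam else (1 - step * lam / norm_g) * z[g]
    beta = z

for g in groups:
    print("group", g.tolist(), "coefficients:", beta[g].round(2))
```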

13.
On-line sensor monitoring aims at detecting anomalies in sensors and reconstructing their correct signals during operation. The techniques used for signal reconstruction are commonly based on auto-associative regression models. In full-scale implementations, however, the number of sensors to be monitored is often too large to be handled effectively by a single reconstruction model. In this paper we propose to tackle the problem by resorting to a pool (ensemble) of reconstruction models, each one handling an individual group of signals. This approach involves two main technical steps: first, a procedure for constructing the signal groups, and second, a procedure for combining the outputs of the reconstruction models associated with the groups. For the signal grouping step, a wrapper optimization search is proposed to identify the optimal number of groups in the ensemble and the size of the groups. For the model output aggregation step, a simple arithmetic average is adopted. Ensemble accuracy and robustness are achieved by promoting diversity between the signal groups through the use of the Random Feature Selection Ensemble (RFSE) technique in combination with the Bootstrapping AGGregatING (BAGGING) technique for training data selection. The individual reconstruction models are based on Principal Component Analysis (PCA). The proposed approach has been applied to a real case study concerning 215 signals monitored at a Finnish nuclear pressurized water reactor. The results obtained have been compared with those achieved by an equivalent ensemble of models based on a grouping directly optimized by a Multi-Objective Genetic Algorithm (MOGA).
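
A hedged toy version of the ensemble structure described above: signals are assigned to random groups (random feature selection), one PCA reconstruction model is trained per group on a bootstrap sample of the data (bagging), and reconstructions are averaged for signals covered by more than one group. Group count, group size, and the number of principal components are illustrative assumptions, and no wrapper optimization of the grouping is performed.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
n, n_signals = 500, 30
latent = rng.normal(size=(n, 5))
X = latent @ rng.normal(size=(5, n_signals)) + 0.1 * rng.normal(size=(n, n_signals))

n_models, group_size = 10, 12
recon_sum = np.zeros_like(X)
recon_cnt = np.zeros(n_signals)
for _ in range(n_models):
    group = rng.choice(n_signals, size=group_size, replace=False)   # random feature selection
    boot = rng.integers(0, n, n)                                    # bagging of the training data
    pca = PCA(n_components=4).fit(X[boot][:, group])
    recon = pca.inverse_transform(pca.transform(X[:, group]))       # auto-associative reconstruction
    recon_sum[:, group] += recon
    recon_cnt[group] += 1

covered = recon_cnt > 0
X_hat = np.where(covered, recon_sum / np.maximum(recon_cnt, 1), X)  # simple average over models
err = np.abs(X_hat - X)[:, covered].mean()
print("mean reconstruction error on covered signals:", round(float(err), 4))
```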

14.
Model averaging or combining is often considered as an alternative to model selection. Frequentist Model Averaging (FMA) is considered extensively, and strategies for the application of FMA methods in the presence of missing data, based on two distinct approaches, are presented. The first approach combines estimates from a set of appropriate models which are weighted by scores of a missing-data-adjusted criterion developed in the recent model selection literature. The second approach averages over the estimates of a set of models with weights based on conventional model selection criteria, but with the missing data replaced by imputed values prior to estimating the models. For this purpose three easy-to-use imputation methods that have been programmed in currently available statistical software are considered, and a simple recursive algorithm is further adapted to implement a generalized regression imputation in such a way that the missing values are predicted successively. The latter algorithm is found to be quite useful when one is confronted with two or more missing values simultaneously in a given row of observations. Focusing on a binary logistic regression model, the properties of the FMA estimators resulting from these strategies are explored by means of a Monte Carlo study. The results show that in many situations, averaging after imputation is preferred to averaging using weights that adjust for the missing data, and that model average estimators often provide better estimates than those resulting from any single model. As an illustration, the proposed methods are applied to a dataset from a study of Duchenne muscular dystrophy detection.
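
A hedged sketch of the second strategy for a binary logistic regression: missing covariate values are filled in first (here with a k-nearest-neighbour imputer, an assumption), and the candidate models are then averaged with weights built from a conventional criterion (AIC). The candidate set and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations
from sklearn.impute import KNNImputer

rng = np.random.default_rng(9)
n, p = 300, 3
X = rng.normal(size=(n, p))
prob = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 1.2 * X[:, 1])))
y = rng.binomial(1, prob)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.15] = np.nan

X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss)       # impute before model fitting

candidates = [c for k in range(1, p + 1) for c in combinations(range(p), k)]
aics, betas = [], []
for c in candidates:
    Z = sm.add_constant(X_imp[:, list(c)])
    fit = sm.Logit(y, Z).fit(disp=0)
    b = np.zeros(p + 1)
    b[0] = fit.params[0]
    b[1 + np.array(c)] = fit.params[1:]
    aics.append(fit.aic)
    betas.append(b)
aics = np.array(aics)
w = np.exp(-0.5 * (aics - aics.min()))                          # conventional AIC weights
w /= w.sum()
print("FMA logistic coefficients (intercept, slopes):",
      (w[:, None] * np.array(betas)).sum(axis=0).round(2))
```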

15.
Context: A Computation Independent Model (CIM), as a business model, describes the requirements and environment of a business system and guides its design and development; it is therefore key to software success. Although many studies currently focus on model driven development (MDD), those studies, to a large extent, address PIM-level and PSM-level models, and few have dealt with CIM-level modelling for cases in which the requirements are unclear or incomplete. Objective: This paper proposes a CIM-level modelling approach, which applies stepwise refinement to build the CIM-level model, starting from a high-level goal model and moving to a lower-level business process model. A key advantage of our approach is the combination of the requirement model with the business model, which helps software engineers define business models precisely for cases in which the requirements are unclear or incomplete. Method: Based on the model driven approach, this paper proposes a set of models at the CIM level and model transformations to connect these models. Accordingly, the formalisation approach of this paper involves formalising the goal model using category theory, and the scenario model and business process model using Petri nets. Results: We have defined a set of metamodels and transformation rules making it possible to obtain automatically a scenario model from the goal model and a business process model from the scenario model. At the same time, we have defined a mapping rule to formalise these models. Our proposed CIM modelling approach and formalisation approach are implemented with an MDA tool, and they have been empirically validated by a travel agency case study. Conclusion: This study shows how a CIM modelling approach helps to build a complete and consistent model at the CIM level for cases in which the requirements are unclear or incomplete in advance.

16.
A bootstrap aggregated model approach to the estimation of product quality in refineries with varying crudes is proposed in this paper. The varying crudes cause the relationship between process variables and product quality variables to change, which makes product quality estimation by soft sensors a difficult problem. The essential idea in this paper is to build an inferential estimation model for each type of feed oil and to use an on-line feed oil classifier to determine the feed oil type. Bootstrap aggregated neural networks are used in developing the on-line feed oil classifier, and a bootstrap aggregated partial least squares regression model is developed for each data group corresponding to each type of feed crude oil. The amount of training data in crude oil distillation is usually small, and this brings difficulties for classification and estimation modelling. In order to enhance model reliability and robustness, bootstrap aggregated models are developed. The inferential estimation results for kerosene dry point, on both simulated data and industrial data, show that the proposed method can significantly improve the overall inferential estimation performance.
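
A hedged sketch of the overall structure: a classifier decides which type of feed crude an observation belongs to, and a separate bagged partial least squares model estimates the quality variable for each type. A random forest stands in here for the bootstrap aggregated neural network classifier, and the data and model sizes are simplified assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(10)
n, p, n_types, B = 600, 8, 3, 25
crude = rng.integers(0, n_types, n)                           # feed-oil type
X = rng.normal(size=(n, p)) + crude[:, None]                  # process variables shift with type
coefs = rng.normal(size=(n_types, p))                         # type-specific quality relationship
y = np.einsum("ij,ij->i", X, coefs[crude]) + 0.5 * rng.normal(size=n)   # e.g. kerosene dry point

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, crude)

models = {t: [] for t in range(n_types)}                      # bagged PLS models per crude type
for t in range(n_types):
    Xt, yt = X[crude == t], y[crude == t]
    for _ in range(B):
        idx = rng.integers(0, len(yt), len(yt))               # bootstrap replicate
        models[t].append(PLSRegression(n_components=3).fit(Xt[idx], yt[idx]))

def estimate_quality(x_new):
    t = int(clf.predict(x_new.reshape(1, -1))[0])             # classify the feed oil on-line
    preds = [m.predict(x_new.reshape(1, -1)).ravel()[0] for m in models[t]]
    return float(np.mean(preds))                              # aggregate the bootstrap models

print("estimated quality:", round(estimate_quality(X[0]), 2), "| actual:", round(float(y[0]), 2))
```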

17.
In fitting regression models, data analysts are often faced with many predictor variables that may influence the outcome. Several strategies for selecting variables to identify a subset of 'important' predictors have been available for many years. A further issue in model building is how to deal with non-linearity in the relationship between the outcome and a continuous predictor. Traditionally, for such predictors either a linear functional relationship or a step function after grouping is assumed. However, the assumption of linearity may be incorrect, leading to a misspecified final model. For multivariable model building, a systematic approach to investigate possible non-linear functional relationships based on fractional polynomials, combined with backward elimination, was proposed recently. So far a program has been available only in Stata, which has certainly prevented more general application of this useful procedure. The approach will be introduced, advantages will be shown in two examples, a new approach to presenting FP functions will be illustrated, and a macro in SAS will be briefly introduced. Differences from the Stata and R programs are noted.
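
A hedged sketch of the first-degree fractional polynomial (FP1) step only: the continuous predictor is transformed by each power in the conventional FP set, a linear model is fitted for each, and the best-fitting power is kept by maximum likelihood. The full procedure also considers second-degree (FP2) transformations and combines the search with backward elimination over several predictors, which is omitted here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 400
x = rng.uniform(0.2, 5.0, n)                                  # positive continuous predictor
y = 2.0 + 1.5 * np.log(x) + 0.3 * rng.normal(size=n)          # true relationship is logarithmic

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]                       # conventional FP1 power set

def fp1(x, p):
    return np.log(x) if p == 0 else x ** p                     # power 0 denotes log(x) by convention

fits = {p: sm.OLS(y, sm.add_constant(fp1(x, p))).fit() for p in POWERS}
best_p = max(fits, key=lambda p: fits[p].llf)                   # best power by log-likelihood
print("selected FP1 power:", best_p, "| deviance:", round(-2 * fits[best_p].llf, 1))
```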

18.
Complex ecosystem models are often used as a tool for resource managers in the application of ecosystem-based management. The uncertainty associated with these models is a major stumbling block to their acceptance as a management tool. Yet conducting a rigorous uncertainty analysis of complex models is often not feasible. We present an alternative approach to assessing the impact of parameter uncertainty on the outcome of management scenarios for a lake ecosystem. We applied the single-model ensemble approach to the ecosystem model DYRESM–CAEDYM and Lake Kinneret, Israel. We introduced uncertainty into the parameters and conducted an ensemble of simulations for three scenarios. Despite the large degree of uncertainty in the parameter values, the trends in ecosystem response were consistent with those observed with the calibrated parameter values. The variation in results allowed us to estimate the consequences of parameter uncertainty for lake resource management without the need for a comprehensive uncertainty analysis.

19.
A probabilistic construction of model validation
We describe a procedure to assess the predictive accuracy of process models subject to approximation error and uncertainty. The proposed approach is a functional-analysis-based probabilistic approach in which random quantities are represented using polynomial chaos expansions (PCEs). The approach permits the formulation of the uncertainty assessment in validation, a significant component of the process, as a problem of approximation theory. It has two essential parts. First, a statistical procedure is implemented to calibrate the uncertain parameters of the candidate model from experimental or model-based measurements. This calibration technique employs PCEs to represent the inherent uncertainty of the model parameters. Based on the asymptotic behavior of the statistical parameter estimator, the associated PCE coefficients are then characterized as independent random quantities to represent the epistemic uncertainty due to lack of information. Second, a simple hypothesis test is implemented to explore the validation of the computational model assumed for the physics of the problem. The above validation path is implemented for the dynamical system validation challenge exercise.

20.
Nir Friedman, Daphne Koller, Machine Learning, 2003, 50(1-2): 95-125
In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over the network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures and has a much smoother posterior landscape. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
