Similar Documents
20 similar documents found.
1.
Estimation of predictive accuracy in survival analysis using R and S-PLUS
When the purpose of a survival regression model is to predict future outcomes, the predictive accuracy of the model needs to be evaluated before practical application. Various measures of predictive accuracy have been proposed for survival data, but none has been adopted as a standard and they are largely absent from statistical software. We developed the surev library for R and S-PLUS, which includes functions for evaluating the predictive accuracy measures proposed by Schemper and Henderson. The library evaluates the predictive accuracy of parametric regression models and of Cox models. The predictive accuracy of the Cox model can also be obtained when time-dependent covariates are included because of non-proportional hazards, or when Bayesian model averaging is used. The use of the library is illustrated with examples based on a real data set.
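As a rough illustration only (the surev library itself is R/S-PLUS code and is not reproduced here), the sketch below fits a Cox model with the Python lifelines package and computes two simple predictive-accuracy summaries; the crude Brier-type score ignores censoring weights, unlike the measures of Schemper and Henderson.

```python
# Hypothetical sketch: fit a Cox model and compute simple predictive-accuracy
# summaries. This is not the surev library or the Schemper-Henderson measure.
import numpy as np
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from lifelines.utils import concordance_index

df = load_rossi()                      # example data shipped with lifelines
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

# Harrell's concordance index: rank agreement between risk scores and observed times.
c_index = concordance_index(df["week"], -cph.predict_partial_hazard(df), df["arrest"])

# Crude Brier-type score at t0: squared error between predicted survival at t0 and
# the observed status (no censoring weights, unlike the Schemper-Henderson measures).
t0 = 26
surv_t0 = cph.predict_survival_function(df, times=[t0]).loc[t0].values
observed_alive = (df["week"] > t0).astype(float).values
brier_t0 = np.mean((observed_alive - surv_t0) ** 2)
print(f"c-index = {c_index:.3f}, crude Brier({t0}) = {brier_t0:.3f}")
```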

2.
The performance of trauma departments is widely audited by applying predictive models that assess probability of survival, and examining the rate of unexpected survivals and deaths. Although the TRISS methodology, a logistic regression modelling technique, is still the de facto standard, it is known that neural network models perform better. A key issue when applying neural network models is the selection of input variables. This paper proposes a novel form of sensitivity analysis, which is simpler to apply than existing techniques, and can be used for both numeric and nominal input variables. The technique is applied to the audit survival problem, and used to analyse the TRISS variables. The conclusions discuss the implications for the design of further improved scoring schemes and predictive models.
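The paper's own sensitivity-analysis technique is not reproduced here; as a generic stand-in, the sketch below applies standard permutation importance to a small neural network trained on synthetic TRISS-style inputs. The variable names and data are made up.

```python
# Generic permutation-based sensitivity analysis for a survival classifier.
# This is standard permutation importance, not the paper's proposed method.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(7.8, 0.5, n),     # Revised Trauma Score (assumed numeric input)
    rng.normal(9.0, 8.0, n),     # Injury Severity Score
    rng.integers(0, 2, n),       # blunt (0) vs penetrating (1) injury, nominal
])
# Synthetic survival outcome loosely driven by the first two inputs.
y = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * (X[:, 0] - 7.8) - 0.1 * (X[:, 1] - 9))))).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=20, random_state=0)
for name, imp in zip(["RTS", "ISS", "mechanism"], result.importances_mean):
    print(f"{name}: mean accuracy drop when permuted = {imp:.3f}")
```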

3.
According to the American Cancer Society report (1999), cancer surpasses heart disease as the leading cause of death in the United States of America (USA) for people aged under 85, so medical research in cancer is an important public health concern. Understanding how medical improvements are affecting cancer incidence, mortality and survival is critical for effective cancer control. In this paper, we study cancer survival trends using population-level cancer data. In particular, we develop a parametric Bayesian joinpoint regression model based on a Poisson distribution for the relative survival. To avoid having to identify the cause of death, the analysis is based on relative survival only. The method is further extended to semiparametric Bayesian joinpoint regression models in which the parametric distributional assumptions are relaxed by modeling the distribution of regression slopes using Dirichlet process mixtures. We also consider the effect of adding covariates of interest to the joinpoint model. Three model selection criteria, namely the conditional predictive ordinate (CPO), the expected predictive deviance (EPD), and the deviance information criterion (DIC), are used to select the number of joinpoints. We analyze the grouped survival data for distant testicular cancer from the Surveillance, Epidemiology, and End Results (SEER) Program using these Bayesian models.
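A hedged, non-Bayesian sketch of the joinpoint idea is shown below: the log-rate is piecewise linear in calendar year with a slope change at an unknown joinpoint, and the joinpoint is chosen by profiling a Poisson likelihood. The data, years and rates are synthetic, and the paper's Bayesian and Dirichlet-process machinery is not reproduced.

```python
# Single-joinpoint Poisson regression chosen by profile likelihood (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
year = np.arange(1975, 2005)
true_rate = np.exp(-1.0 - 0.02 * (year - 1975) - 0.05 * np.maximum(year - 1990, 0))
exposure = np.full(year.shape, 5000.0)          # person-years at risk (synthetic)
deaths = rng.poisson(true_rate * exposure)

def fit_joinpoint(tau):
    # piecewise-linear trend: baseline slope plus an extra slope after year tau
    X = sm.add_constant(np.column_stack([year - year[0],
                                         np.maximum(year - tau, 0)]))
    model = sm.GLM(deaths, X, family=sm.families.Poisson(), offset=np.log(exposure))
    return model.fit()

candidates = year[3:-3]                          # keep a few points on each side
fits = {tau: fit_joinpoint(tau) for tau in candidates}
best_tau = max(fits, key=lambda t: fits[t].llf)  # profile-likelihood choice
print("estimated joinpoint year:", best_tau)
print(fits[best_tau].params)
```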

4.
Parameter estimation for agent-based and individual-based models (ABMs/IBMs) is often performed by manual tuning, and model uncertainty assessment is often ignored. Bayesian inference can address both issues jointly. However, owing to the high computational requirements of these models and the technical difficulties of applying Bayesian inference to stochastic models, exploration of its application to ABMs/IBMs has only just started. We demonstrate the feasibility of Bayesian inference for ABMs/IBMs with a particle Markov chain Monte Carlo (PMCMC) algorithm developed for state-space models. The algorithm exploits the model's hidden Markov structure by jointly estimating the system states and the marginal likelihood of the parameters from time-series observations. The PMCMC algorithm performed well when tested on a simple predator-prey IBM with artificial observation data, and hence offers a practical route to Bayesian inference for ABMs/IBMs. This can yield additional insights into model behaviour and uncertainty and extend the usefulness of ABMs/IBMs in ecological and environmental research.
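The sketch below is a compact particle marginal Metropolis-Hastings (PMMH) loop, one member of the PMCMC family: a bootstrap particle filter returns an unbiased marginal-likelihood estimate that drives a Metropolis-Hastings update of one parameter. The model is a toy stochastic logistic population model, far simpler than the paper's predator-prey IBM, and all numbers are made up.

```python
# Minimal PMMH: particle-filter likelihood estimate inside a Metropolis-Hastings step.
import numpy as np

rng = np.random.default_rng(2)

def simulate_step(x, r):
    # stochastic logistic growth with environmental noise
    return x * np.exp(r * (1 - x / 100.0) + rng.normal(0, 0.1, size=x.shape))

def log_marginal_likelihood(obs, r, n_particles=500):
    x = np.full(n_particles, 20.0)
    logL = 0.0
    for y in obs:
        x = simulate_step(x, r)
        logw = -0.5 * ((y - x) / 5.0) ** 2             # Gaussian observation error, sd = 5
        m = logw.max()
        w = np.exp(logw - m)
        logL += m + np.log(w.mean())
        x = rng.choice(x, size=n_particles, p=w / w.sum())   # resample particles
    return logL

# Synthetic observations generated with the "true" parameter r = 0.4.
truth, obs = 20.0, []
for _ in range(30):
    truth = float(simulate_step(np.array([truth]), 0.4)[0])
    obs.append(truth + rng.normal(0, 5.0))

r, logL = 0.2, log_marginal_likelihood(obs, 0.2)
samples = []
for _ in range(200):                                    # short chain for illustration
    r_new = r + rng.normal(0, 0.05)
    logL_new = log_marginal_likelihood(obs, r_new)
    if np.log(rng.random()) < logL_new - logL:          # flat prior on r assumed
        r, logL = r_new, logL_new
    samples.append(r)
print("posterior mean of r:", np.mean(samples[50:]))
```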

5.
Constructing an accurate effort prediction model is a challenge in software engineering. This paper presents three Bayesian statistical software effort prediction models for database-oriented software systems developed with a specific 4GL toolsuite. The models combine specification-based software size metrics with a development team productivity metric. They are constructed from the subjective knowledge of a human expert and calibrated using empirical data collected from 17 software systems developed in the target environment. The models' predictive accuracy is evaluated on subsets of the same data that were not used for calibration. The results show that the models achieve very good predictive accuracy in terms of MMRE and pred measures, confirming that Bayesian statistical models can predict effort successfully in the target environment. In comparison with commonly used multiple linear regression models, the predictive accuracy of the Bayesian statistical models is equivalent in general. However, when fewer than five software systems are available for calibration, the predictive accuracy of the best Bayesian statistical models is significantly better than that of the multiple linear regression model. This suggests that Bayesian statistical models are a better choice when software organizations or practitioners do not possess sufficient empirical data for model calibration. The authors expect these findings to encourage more researchers to investigate the use of Bayesian statistical models for predicting software effort.
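The two accuracy measures quoted in the abstract can be computed directly, as the short sketch below shows; the effort figures are made up for illustration.

```python
# MMRE (mean magnitude of relative error) and pred(25): the fraction of projects
# whose relative error is within 25%. Effort values are hypothetical.
import numpy as np

actual    = np.array([120.0, 340.0, 90.0, 560.0, 210.0])   # person-hours (made up)
predicted = np.array([130.0, 300.0, 85.0, 610.0, 260.0])

mre = np.abs(actual - predicted) / actual
mmre = mre.mean()
pred25 = np.mean(mre <= 0.25)
print(f"MMRE = {mmre:.3f}, pred(25) = {pred25:.2f}")
```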

6.
In order to select the best predictive neural-network architecture from a set of candidate networks, we propose a general Bayesian nonlinear regression model comparison procedure based on the maximization of an expected utility criterion. This criterion selects the model under which the training set achieves the highest level of internal consistency, through the predictive probability distribution of each model. The density of this distribution is computed as the model posterior predictive density and is asymptotically approximated from the assumed Gaussian likelihood of the data set and the corresponding conjugate prior density of the parameters. The use of such a conjugate prior allows the analytic calculation of the parameter posterior and posterior predictive densities, in an empirical Bayes-like approach. This Bayesian selection procedure allows us to compare general nonlinear regression models, and feedforward neural networks in particular, in addition to embedded (nested) models as is usual with asymptotic comparison tests.
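A hedged illustration of the underlying idea is sketched below: for linear-in-the-parameters models with a conjugate Gaussian prior and Gaussian likelihood, the marginal likelihood of the training targets has a closed form and can be compared across candidate models. Two polynomial basis expansions stand in for candidate network architectures; all hyperparameters are fixed by hand.

```python
# Closed-form evidence comparison for Bayesian linear regression with a Gaussian prior.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 40)
y = np.sin(2 * x) + rng.normal(0, 0.1, x.size)

def log_evidence(degree, alpha=1.0, noise_sd=0.1):
    Phi = np.vander(x, degree + 1, increasing=True)        # design matrix
    # marginal covariance of y under prior w ~ N(0, alpha^-1 I) and Gaussian noise
    cov = noise_sd ** 2 * np.eye(x.size) + Phi @ Phi.T / alpha
    return multivariate_normal(mean=np.zeros(x.size), cov=cov).logpdf(y)

for d in (1, 3, 9):
    print(f"degree {d}: log evidence = {log_evidence(d):.1f}")
# The marginal likelihood trades goodness of fit against the flexibility added by
# extra basis functions, so the three candidates can be compared directly.
```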

7.
Intrigued by some recent results on impulse response estimation by kernel and nonparametric techniques, we revisit the old problem of transfer function estimation from input–output measurements. We formulate a classical regularization approach, focused on finite impulse response (FIR) models, and find that regularization is necessary to cope with the high-variance problem. This basic regularized least squares approach then becomes a focal point for interpreting other techniques, such as Bayesian inference and Gaussian process regression. The main issue is how to determine a suitable regularization matrix (Bayesian prior or kernel). Several regularization matrices are provided and numerically evaluated on a data bank of test systems and data sets. Our findings based on the data bank are as follows. The classical regularization approach with carefully chosen regularization matrices shows slightly better accuracy, and clearly better robustness, in estimating the impulse response than the standard prediction error method/maximum likelihood (PEM/ML) approach. If the goal is to estimate a model of given order as well as possible, a low-order model is often better estimated by the PEM/ML approach, while a higher-order model is often better estimated by model reduction on a high-order regularized FIR model estimated with careful regularization. Moreover, an optimal regularization matrix that minimizes the mean square error matrix is derived and studied. The importance of this result is that it gives a theoretical upper bound on the accuracy achievable with this classical regularization approach.
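A minimal sketch of the regularized FIR estimate discussed above is given below, using a TC-type (tuned/correlated) regularization matrix with hand-picked hyperparameters; the paper's marginal-likelihood tuning and test data bank are not reproduced.

```python
# Regularized FIR estimation with a TC regularization matrix P[i,j] = c * lam**max(i, j).
import numpy as np

rng = np.random.default_rng(4)
n, order = 300, 50
u = rng.normal(size=n)                                   # input signal
g_true = 0.8 ** np.arange(order)                         # true impulse response
y = np.convolve(u, g_true)[:n] + rng.normal(0, 0.5, n)   # noisy output

# Regressor matrix of lagged inputs: Phi[t, k] = u[t - k].
Phi = np.column_stack([np.concatenate([np.zeros(k), u[: n - k]]) for k in range(order)])

sigma2, c, lam = 0.25, 1.0, 0.9
P = c * lam ** np.maximum.outer(np.arange(order), np.arange(order))   # TC kernel
g_reg = np.linalg.solve(Phi.T @ Phi + sigma2 * np.linalg.inv(P), Phi.T @ y)
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]            # unregularized FIR for contrast

print("MSE regularized:", np.mean((g_reg - g_true) ** 2))
print("MSE least squares:", np.mean((g_ls - g_true) ** 2))
```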

8.
Within the framework of probabilistic graphical models, a Bayesian point-set matching method is proposed that combines regression analysis with cluster analysis: regression is used to estimate the mapping function between two point sets, while clustering is used to establish point-to-point correspondences between them. The point-set matching problem is represented as a multi-layer probabilistic directed graph, and a coarse-to-fine variational approximation algorithm is proposed to estimate the uncertainty of the matching. In addition, Gaussian mixture models are used to estimate the heteroscedastic noise in the mapping-function regression and the distribution of outliers in the scene-point density estimation. Transition variables are introduced to relate the model point set to the scene point set and, together with the outlier mixture model, to estimate the distribution of the scene points. Experimental results show that, compared with other point-set matching algorithms, the proposed method achieves good robustness and matching accuracy.

9.
Jiang W. Neural Computation, 2006, 18(11): 2762-2776
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables may be much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and to form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression functions that consistently estimate the truth, provided the true regression model is sparse in the sense that the aggregated size of the regression coefficients is bounded. The estimated regression functions therefore also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These results provide theoretical justification for some recent empirical successes in microarray data analysis.
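As a frequentist stand-in for the sparse-prior idea (lasso rather than the Bayesian posterior described above), the sketch below fits an L1-penalized logistic regression on a "p much larger than n" problem and keeps only a handful of coefficients non-zero. All data are synthetic.

```python
# L1-penalized logistic regression as a sparse-selection analogue (not the paper's prior).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, p = 80, 2000                                   # e.g. microarray-sized design
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, -1.0, 0.8]            # sparse truth
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ beta)))).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print("variables kept:", selected[:10], "out of", p)
```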

10.
Vehicle type recognition based on a sparse Bayesian classifier
Sparse Bayesian methods generalize well on classification problems while using relatively few kernel functions. This paper describes a real-time vehicle type recognition system. It estimates the background of the video images with a per-pixel Gaussian mixture model of colour information, thereby detecting vehicles; a sparse Bayesian classifier is then used to classify the detected vehicles by type. Experimental results show that the sparse Bayesian classifier matches the performance of a support vector machine while using fewer kernel functions than the SVM, and good classification results were obtained.
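A sketch of the detection stage only is given below: a per-pixel Gaussian-mixture background model (OpenCV's MOG2) separates moving vehicles from the background. The video path is a placeholder, and the sparse Bayesian (RVM-style) classification of the detected vehicles is not shown.

```python
# Gaussian-mixture background subtraction for vehicle detection (detection stage only).
import cv2

cap = cv2.VideoCapture("traffic.avi")                 # hypothetical input video
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                              detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = backsub.apply(frame)                       # foreground (vehicle) mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
    # `boxes` would be cropped and passed to the vehicle-type classifier.
cap.release()
```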

11.
12.
Tapani, Matti. Neurocomputing, 2009, 72(16-18): 3704
This paper studies identification and model predictive control of nonlinear hidden state-space models. Nonlinearities are modelled with neural networks and system identification is performed with variational Bayesian learning. In addition to robust control, the stochastic approach allows for various control schemes, including combinations of direct and indirect control, as well as the use of probabilistic inference for control. We study the noise robustness, speed, and accuracy of three different control schemes, as well as the effect of changing horizon lengths and initialisation methods, using a simulated cart–pole system. The simulations indicate that the proposed method is able to find a representation of the system state that makes control easier, especially under high noise.

13.
Abstract: Although the use of predictive models in rock engineering and engineering geology is an important issue, the simple and multivariate regression techniques traditionally employed in these areas have recently been challenged by fuzzy inference systems and artificial neural networks. The purpose of this study was to construct predictive models for estimating the uniaxial compressive strength of some clay-bearing rocks from their slake durability indices and clay contents. For this purpose, simple and nonlinear multivariable regression techniques and the Mamdani fuzzy algorithm are compared in terms of accuracy. To increase the accuracy of the Mamdani fuzzy inference system, weighted if–then rules are extracted. To compare the predictive performance of the models, the statistical performance indices (root mean square error and variance accounted for) are calculated and the results discussed. The indices reveal that the fuzzy inference system has a slightly higher prediction capacity than the regression models; the basic reason for its higher performance is the flexibility of the fuzzy approach.
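The two performance indices named in the abstract are computed below for hypothetical strength predictions; the values are illustrative only.

```python
# RMSE and VAF (variance accounted for) for hypothetical UCS predictions.
import numpy as np

observed  = np.array([42.0, 55.0, 31.0, 60.0, 48.0])   # UCS, MPa (made up)
predicted = np.array([40.0, 58.0, 33.0, 55.0, 50.0])

rmse = np.sqrt(np.mean((observed - predicted) ** 2))
vaf = (1 - np.var(observed - predicted) / np.var(observed)) * 100
print(f"RMSE = {rmse:.2f} MPa, VAF = {vaf:.1f}%")
```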

14.
A probabilistic model for predicting software development effort   总被引:2,自引:0,他引:2  
Recently, Bayesian probabilistic models have been used for predicting software development effort. One of the reasons for the interest in the use of Bayesian probabilistic models, when compared to traditional point forecast estimation models, is that Bayesian models provide tools for risk estimation and allow decision-makers to combine historical data with subjective expert estimates. In this paper, we use a Bayesian network model and illustrate how a belief updating procedure can be used to incorporate decision-making risks. We develop a causal model from the literature and, using a data set of 33 real-world software projects, we illustrate how decision-making risks can be incorporated in the Bayesian networks. We compare the predictive performance of the Bayesian model with popular nonparametric neural-network and regression tree forecasting models and show that the Bayesian model is a competitive model for forecasting software development effort.
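A toy illustration of belief updating in a two-node discrete Bayesian network is sketched below: a prior over project complexity is combined with P(effort | complexity) and updated after soft evidence about complexity. The structure and numbers are made up; the paper's causal model has many more nodes.

```python
# Belief updating in a minimal discrete Bayesian network (illustrative numbers).
import numpy as np

# Prior over project complexity: low, medium, high.
p_complexity = np.array([0.3, 0.5, 0.2])

# Conditional distribution P(effort | complexity), effort in {low, medium, high}.
p_effort_given_c = np.array([
    [0.70, 0.25, 0.05],   # complexity = low
    [0.20, 0.60, 0.20],   # complexity = medium
    [0.05, 0.25, 0.70],   # complexity = high
])

# Prior predictive over effort (marginalise complexity).
print("prior P(effort):", p_complexity @ p_effort_given_c)

# Soft evidence: an expert believes the project is probably complex.
likelihood = np.array([0.1, 0.3, 0.9])
posterior_c = p_complexity * likelihood
posterior_c /= posterior_c.sum()

# Updated predictive distribution over effort after the belief update.
print("posterior P(effort):", posterior_c @ p_effort_given_c)
```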

15.
A simple parametrization, built from the definition of cubic splines, is shown to facilitate the implementation and interpretation of penalized spline models, whatever configuration of knots is used. The parametrization is termed value-first derivative parametrization. Inference is Bayesian and explores the natural link between quadratic penalties and Gaussian priors. However, a full Bayesian analysis seems feasible only for some penalty functionals. Alternatives include empirical Bayes inference methods involving model selection type criteria. The proposed methodology is illustrated by an application to survival analysis where the usual Cox model is extended to allow for time-varying regression coefficients.
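The quadratic-penalty/Gaussian-prior link mentioned above can be illustrated with a quick penalized-spline fit; the sketch below uses a truncated-power cubic basis with a ridge penalty on the knot coefficients, not the value-first derivative parametrization itself.

```python
# Penalized cubic spline via a quadratic (ridge) penalty on knot coefficients.
import numpy as np

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 1, 120))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

knots = np.linspace(0.05, 0.95, 20)
X = np.column_stack([np.ones_like(x), x, x**2, x**3] +
                    [np.clip(x - k, 0, None) ** 3 for k in knots])

lam = 1.0
D = np.diag([0, 0, 0, 0] + [1] * knots.size)    # penalise only the knot coefficients
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
fit = X @ beta
print("residual sd:", np.std(y - fit).round(3))
```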

16.
General Regression Neural Networks (GRNNs) possess distinctive function approximation capability and predictive power without requiring a prescribed functional form. However, their prediction accuracy relies on uniformly distributed input training data: if the input training data are non-uniformly distributed, considerable bias occurs, and it is especially pronounced where the data points are sparsely distributed. Moreover, GRNN presumes a fixed set of input variables in the regression model, so determining the proper set of input variables remains an open issue. To address these issues, we propose Bayesian Nonparametric General Regression with Adaptive Kernel Bandwidth (BNGR-AKB). First, it determines the bandwidth of the kernels adaptively so as to accommodate non-uniformly distributed input training data. Furthermore, it uses Bayesian inference to determine the input variables to be included in the regression model. To demonstrate the variable selection and regression capacity of the proposed method for non-uniformly distributed input training data, we present three simulated examples and one real-data example using ground motion records of the Wenchuan earthquake.
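A small numpy sketch of two of the ingredients named above is given below: a GRNN in its Nadaraya-Watson form and an adaptive bandwidth set per training point from its distance to the k-th nearest neighbour, so sparse regions get wider kernels. The Bayesian variable-selection part of BNGR-AKB is not reproduced, and the data are synthetic.

```python
# GRNN-style kernel regression with per-point adaptive bandwidths.
import numpy as np

rng = np.random.default_rng(6)
# Non-uniformly distributed inputs: dense near 0, sparse near 3.
x_train = np.sort(np.concatenate([rng.uniform(0, 1, 80), rng.uniform(1, 3, 20)]))
y_train = np.sin(2 * x_train) + rng.normal(0, 0.1, x_train.size)

def adaptive_bandwidths(x, k=5, floor=0.05):
    d = np.abs(x[:, None] - x[None, :])
    d.sort(axis=1)
    return np.maximum(d[:, k], floor)          # distance to k-th nearest neighbour

def grnn_predict(x_query, x, y, h):
    w = np.exp(-0.5 * ((x_query[:, None] - x[None, :]) / h[None, :]) ** 2)
    return (w @ y) / w.sum(axis=1)             # kernel-weighted average of targets

h = adaptive_bandwidths(x_train)
x_query = np.linspace(0, 3, 7)
print(np.round(grnn_predict(x_query, x_train, y_train, h), 3))
```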

17.
18.
Bayesian model averaging (BMA) is a statistical method for post-processing forecast ensembles of atmospheric variables, obtained from multiple runs of numerical weather prediction models, in order to create calibrated predictive probability density functions (PDFs). The BMA predictive PDF of the future weather quantity is the mixture of the individual PDFs corresponding to the ensemble members and the weights and model parameters are estimated using forecast ensembles and validating observations from a given training period. A BMA model for calibrating wind speed forecasts is introduced using truncated normal distributions as conditional PDFs and the method is applied to the ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and to the University of Washington Mesoscale Ensemble. Three parameter estimation methods are proposed and each of the corresponding models outperforms the traditional gamma BMA model both in calibration and in accuracy of predictions.
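The form of the BMA predictive density described above is easy to write down: a weighted mixture of normal distributions truncated at zero, one component per ensemble member. The weights, location/scale links and forecasts in the sketch below are illustrative, not fitted by any of the paper's estimation methods.

```python
# BMA-style predictive density for wind speed: mixture of zero-truncated normals.
import numpy as np
from scipy.stats import truncnorm

member_forecasts = np.array([4.2, 5.1, 3.8])     # ensemble wind-speed forecasts (m/s)
weights = np.array([0.5, 0.3, 0.2])              # BMA weights (would be estimated)
a, b = 1.0, 0.4                                  # assumed location/scale links

def bma_pdf(x):
    total = np.zeros_like(x, dtype=float)
    for w, f in zip(weights, member_forecasts):
        loc, scale = a * f, b * f                # component location and spread
        alpha = (0.0 - loc) / scale              # truncation at zero, standardised
        total += w * truncnorm.pdf(x, alpha, np.inf, loc=loc, scale=scale)
    return total

x = np.linspace(0, 12, 5)
print(np.round(bma_pdf(x), 4))
```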

19.
We study prediction problems in which the conditional distribution of the output given the input varies as a function of task variables which, in our applications, represent space and time. In varying-coefficient models, the coefficients of this conditional distribution are allowed to change smoothly in space and time; the strength of the correlations between neighboring points is determined by the data. This is achieved by placing a Gaussian process (GP) prior on the coefficients. Bayesian inference in varying-coefficient models is generally intractable. We show that with an isotropic GP prior, inference in varying-coefficient models reduces to standard inference for a GP that can be solved efficiently, and MAP inference in this model reduces to multitask learning using task and instance kernels. We clarify the relationship between varying-coefficient models and the hierarchical Bayesian multitask model and show that inference for hierarchical Bayesian multitask models can be carried out efficiently using graph-Laplacian kernels. We explore the model empirically on the problems of predicting rents and real-estate prices, and predicting ground motion during seismic events. We find that varying-coefficient models with GP priors excel at predicting rents and real-estate prices, and the ground-motion model predicts seismic hazards in the State of California more accurately than the previous state of the art.
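A minimal sketch of the product-kernel structure mentioned above is given below: with a GP prior on the coefficients, the covariance of the function values factorizes into a kernel over the task variable (here, location) times a linear kernel over the instance features. All hyperparameters, locations and features are made up and fixed by hand.

```python
# GP regression for a varying-coefficient model via a task-kernel x instance-kernel product.
import numpy as np

rng = np.random.default_rng(7)

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

n = 60
loc = rng.uniform(0, 10, (n, 2))                      # task variable: spatial location
feat = rng.normal(size=(n, 3))                        # instance features
w_true = np.array([1.0, -0.5, 0.3]) * np.sin(loc[:, :1])   # smoothly varying coefficients
y = (feat * w_true).sum(1) + rng.normal(0, 0.1, n)

# cov(f(x, t), f(x', t')) = (x . x') * k_task(t, t') when coefficients have a GP prior.
K = rbf(loc, loc, ls=2.0) * (feat @ feat.T)
alpha = np.linalg.solve(K + 0.01 * np.eye(n), y)

# Predict at a new (location, features) pair with the same product-kernel structure.
loc_new, feat_new = np.array([[5.0, 5.0]]), np.array([[0.5, -0.2, 1.0]])
k_star = rbf(loc_new, loc, ls=2.0) * (feat_new @ feat.T)
print("prediction:", (k_star @ alpha)[0])
```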

20.
We propose a model for a point-referenced spatially correlated ordered categorical response and methodology for inference. Models and methods for spatially correlated continuous response data are widespread, but models for spatially correlated categorical data, and especially ordered multi-category data, are less developed. Bayesian models and methodology have been proposed for the analysis of independent and clustered ordered categorical data, and also for binary and count point-referenced spatial data. We combine and extend these methods to describe a Bayesian model for point-referenced (as opposed to lattice) spatially correlated ordered categorical data. We include simulation results and show that our model offers superior predictive performance as compared to a non-spatial cumulative probit model and a more standard Bayesian generalized linear spatial model. We demonstrate the usefulness of our model in a real-world example to predict ordered categories describing stream health within the state of Maryland.
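The cumulative-probit link underlying such models is shown below without the spatial random effect: category probabilities are differences of normal CDFs evaluated at the cutpoints minus the latent linear predictor. The cutpoints and predictor values are made up.

```python
# Cumulative-probit category probabilities for an ordered response (no spatial term).
import numpy as np
from scipy.stats import norm

cutpoints = np.array([-0.5, 0.5, 1.5])          # 4 ordered stream-health categories
eta = np.array([-1.0, 0.2, 1.8])                # latent predictor for 3 sites
edges = np.concatenate([[-np.inf], cutpoints, [np.inf]])

probs = norm.cdf(edges[None, 1:] - eta[:, None]) - norm.cdf(edges[None, :-1] - eta[:, None])
print(np.round(probs, 3))                       # each row sums to 1
```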

