Similar literature
20 similar documents found (search time: 31 ms)
1.
Local influence diagnostics for repeated measures regression analysis are developed based on estimating equations, which play the role of a gradient vector derived from any fit function. Our proposal generalizes tools used in other studies (Cook, 1986; Cadigan and Farrell, 2002) by considering local influence diagnostics for a statistical model in which estimation involves an estimating equation and the observations are not necessarily independent of each other. The measures of local influence are illustrated with simulated data sets to assess influential observations. Applications using real data are also presented.

2.
Estimation of predictive accuracy in survival analysis using R and S-PLUS   Total citations: 1, self-citations: 0, citations by others: 1
When the purpose of a survival regression model is to predict future outcomes, the predictive accuracy of the model needs to be evaluated before practical application. Various measures of predictive accuracy have been proposed for survival data, none of which has been adopted as a standard, and their inclusion in statistical software has been neglected. We developed the surev library for R and S-PLUS, which includes functions for evaluating the predictive accuracy measures proposed by Schemper and Henderson. The library evaluates the predictive accuracy of parametric regression models and of Cox models. The predictive accuracy of the Cox model can also be obtained when time-dependent covariates are included because of non-proportional hazards, or when Bayesian model averaging is used. The use of the library is illustrated with examples based on a real data set.

3.
The performance of model-based bootstrap methods for constructing point-wise confidence intervals around the survival function with interval-censored data is investigated. It is shown that bootstrapping from the nonparametric maximum likelihood estimator of the survival function is inconsistent for the current status model. A model-based smoothed bootstrap procedure is proposed and proved to be consistent. In fact, a general framework for proving the consistency of any model-based bootstrap scheme in the current status model is established. In addition, simulation studies are conducted to illustrate the (in)consistency of different bootstrap methods in mixed case interval censoring. The conclusions in the interval censoring model would extend more generally to estimators in regression models that exhibit non-standard rates of convergence.

4.
The aim of this paper is to derive diagnostic procedures based on the case-deletion model for symmetrical nonlinear regression models, complementing Galea et al. (2005), who developed local influence diagnostics under some perturbation schemes. This class of models includes all symmetric continuous distributions for the errors, covering both light- and heavy-tailed distributions such as Student-t, logistic-I and -II, power exponential, generalized Student-t, generalized logistic and contaminated normal, among others. Thus, these models can be checked for robustness to outliers in the response variable, and diagnostic methods may be a useful tool for making an appropriate choice. First, an iterative process for parameter estimation and some inferential results are presented. We also present the results of a simulation study in which the characteristics of heavy-tailed models are evaluated in the presence of outliers. Then, we derive diagnostic measures such as the Cook distance, the W-K statistic, the one-step approach and the likelihood displacement, generalizing results obtained for normal nonlinear regression models. We further present simulation studies that illustrate the behavior of the proposed diagnostic measures. Finally, we consider two real data sets previously analyzed under normal nonlinear regression models. The diagnostic analysis indicates that a Student-t nonlinear regression model seems to fit the data better than the normal nonlinear regression model and other symmetrical nonlinear models, in the sense of robustness against extreme observations.

5.
This paper proposes a regression model based on the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes, and we also present some ways to perform global influence analysis. In addition, for different parameter settings, sample sizes and censoring percentages, various simulations are performed, and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and model checking based on the modified deviance residual are performed to select appropriate models.

6.
Methods of regression diagnostics are developed for functional regression models that relate a functional response to predictor variables, which can be multivariate vectors or random functions. For this purpose, a residual process is defined by subtracting the predicted from the observed response functions. This residual process is expanded into functional principal components (FPC), and the corresponding FPC scores are used as natural proxies for the residuals in functional regression models. For the case of a univariate covariate, a randomization test based on these scores is proposed to examine whether the residual process depends on the covariate. If it does, this indicates lack of fit of the model. Graphical methods based on the FPC scores of observed and fitted functions can be used to complement more formal tests. The methods are illustrated with data from a recent study of Drosophila fruit flies regarding life-cycle gene expression trajectories, as well as functional data from a dose-response experiment for Mediterranean fruit flies (Ceratitis capitata).

7.
According to the American Cancer Society report (1999), cancer surpasses heart disease as the leading cause of death in the United States of America (USA) for people under the age of 85. Thus, medical research on cancer is of important public health interest. Understanding how medical improvements are affecting cancer incidence, mortality and survival is critical for effective cancer control. In this paper, we study cancer survival trends using population-level cancer data. In particular, we develop a parametric Bayesian joinpoint regression model based on a Poisson distribution for the relative survival. To avoid having to identify the cause of death, we conduct the analysis based only on the relative survival. The method is further extended to semiparametric Bayesian joinpoint regression models, in which the parametric distributional assumptions of the joinpoint regression models are relaxed by modeling the distribution of regression slopes using Dirichlet process mixtures. We also consider the effect of adding covariates of interest to the joinpoint model. Three model selection criteria, namely the conditional predictive ordinate (CPO), the expected predictive deviance (EPD), and the deviance information criterion (DIC), are used to select the number of joinpoints. We analyze the grouped survival data for distant testicular cancer from the Surveillance, Epidemiology, and End Results (SEER) Program using these Bayesian models.

8.
With parametric cure models, we can express survival parameters (e.g. cured fraction, location and scale parameters) as functions of covariates. These models can measure survival from a specific disease process, either by examining deaths due to the cause under study (cause-specific survival), or by comparing all deaths to those in a matched control population (relative survival). We present a binomial maximum likelihood algorithm to be used for actuarial data, where follow-up times are grouped into specific intervals. Our algorithm provides simultaneous maximum likelihood estimates for all the parameters of a cure model and can be used for cause-specific or relative survival analysis with a variety of survival distributions. Current software does not provide the flexibility of this unified approach.

9.
We consider the problem of selecting grouped variables in linear regression and generalized linear regression models, based on penalized likelihood. A number of penalty functions have been used for this purpose, including the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP). Compared with the widely used Lasso, these penalty functions have attractive theoretical properties such as unbiasedness and selection consistency. Although the model fitting methods using these penalties are well developed for individual variable selection, the extension to grouped variable selection is not straightforward, and the fitting can be unstable due to the nonconvexity of the penalty functions. To this end, we propose the group coordinate descent (GCD) algorithms, which extend the regular coordinate descent algorithms. These GCD algorithms are efficient, in that the computational burden increases only linearly with the number of covariate groups. We also show that, using the GCD algorithm, the estimated parameters converge to a global minimum when the sample size is larger than the dimension of the covariates, and converge to a local minimum otherwise. In addition, we identify the regions of the parameter space in which the objective function is locally convex, even though the penalty is nonconvex. Beyond group selection in the linear model, the GCD algorithms can also be extended to generalized linear regression. We present details of the extension using an example of logistic regression. The efficiency of the proposed algorithms is demonstrated through simulation studies and a real data example, in which the MCP-based and SCAD-based GCD algorithms provide improved group selection results compared with the group Lasso.

10.
In survival analysis applications, the failure rate function may frequently present a unimodal shape. In such cases, the log-normal or log-logistic distributions are used. In this paper, we are concerned only with parametric forms, so a location-scale regression model based on the Burr XII distribution is proposed for modeling data with a unimodal failure rate function as an alternative to the log-logistic regression model. Assuming censored data, we consider a classical analysis, a Bayesian analysis and a jackknife estimator for the parameters of the proposed model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed to compare the performance of the log-logistic and log-Burr XII regression models. In addition, we use sensitivity analysis to detect influential or outlying observations, and residual analysis is used to check the assumptions of the model. Finally, we analyze a real data set under log-Burr XII regression models.

11.
Ordinal regression is a kind of regression analysis used for predicting an ordered response variable. In these problems, the patterns are labelled by a set of ranks with an ordering among the different categories. The most common type of ordinal regression model is the cumulative link model, which relates an unobserved continuous latent variable to the ordered categories through a monotone link function. Logit and probit functions are examples of link functions used in cumulative link models. In this paper, a novel generalized link function based on a generalization of the logistic distribution is proposed. The proposed generalized link function is able to reproduce other link functions by changing two real parameters: \(\alpha\) and \(\lambda\). To test the performance of the proposal, the generalized link function has been included in a cumulative link model in which the latent function is determined by a standard neural network. For this model, a reformulation of the tunable thresholds and distribution parameters was applied to convert the constrained optimization problem into an unconstrained one. Experimental results demonstrate that the proposed approach can achieve competitive generalization performance.
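
As a rough illustration of how a cumulative link model turns a latent value into category probabilities, the sketch below uses the standard logit link, P(Y <= k) = F(threshold_k - latent). It does not implement the paper's generalized link (the abstract does not give its form or how \(\alpha\) and \(\lambda\) enter), and the function names and example thresholds are assumptions made for illustration only.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, i.e. the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(latent, thresholds):
    """Return P(Y = k) for each ordered category under a logit cumulative link.

    `latent` is the latent value f(x) (hypothetical: in the abstract it would
    come from a neural network); `thresholds` must be strictly increasing.
    """
    # Cumulative probabilities P(Y <= k); the last category always reaches 1.
    cumulative = [logistic_cdf(t - latent) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # Each category probability is the jump between adjacent cumulatives.
        probs.append(c - previous)
        previous = c
    return probs
```

For instance, `category_probabilities(0.0, [-1.0, 1.0])` yields three probabilities that sum to one and are symmetric around the middle category, because the latent value sits midway between the two thresholds.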

12.
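Hog price prediction based on a gradient boosting regression model   Total citations: 1, self-citations: 0, citations by others: 1
Fu Lianlian, Wu Jian. Computer Simulation (《计算机仿真》), 2020, 37(1): 347-350
This paper studies the problem of accurately predicting hog prices. Traditional prediction models suffer from slow speed, entrapment in local minima, and difficulty in choosing a kernel function, so their predictions are poor. To address this, the significant factors affecting hog prices are first screened out. Then, using Python data analysis, Bayesian ridge regression, ordinary linear regression, elastic net and support vector machine models are built, and these four regression models are used as the training set of a gradient boosting regression model to predict hog prices. The results show that the integrated gradient boosting regression model has a mean squared error (MSE) of 0.056, a mean absolute error (MAE) of 0.18, and a coefficient of determination of 0.994, outperforming each of the single models. Finally, the gradient boosting regression model is used to predict hog prices from February 2017 to November 2017. The predicted values are close to the true values, with a maximum relative error of 3.495%, indicating that the gradient boosting regression model has high prediction accuracy.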
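
The error measures quoted in this abstract (MSE, MAE, coefficient of determination, maximum relative error) follow standard definitions, which the sketch below spells out in plain Python. The function names are hypothetical and any numbers fed to them here are made up; none of this is the paper's data or code.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def max_relative_error(y_true, y_pred):
    """Largest |error| / |true value| over all predictions (true values nonzero)."""
    return max(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
```

With made-up true prices `[10.0, 20.0, 30.0]` and predictions `[11.0, 19.0, 30.0]`, the residual sum of squares is 2 and the total sum of squares is 200, so the coefficient of determination is 0.99 and the maximum relative error is 10%.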

13.
In this paper, we are mainly interested in inference on the reliability coefficient, R=P(X<Y), in proportional odds ratio models based on the new family of tilted survival functions introduced by Marshall and Olkin [Marshall, A.W., Olkin, I., 1997. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika 84 (3), 641-652]. We also present some results on stochastic comparison between the survival distribution functions. Asymptotic and various bootstrap confidence intervals of R are investigated. The performance of asymptotic and bootstrap confidence intervals is studied through a simulation. A numerical example based on real-life data is presented to illustrate the implementation of the proposed procedure.
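
To make the quantity R = P(X < Y) and the idea of a bootstrap confidence interval concrete, here is a minimal sketch. It swaps in a nonparametric estimator (the fraction of pairs with x < y) and a percentile bootstrap, which is not the Marshall-Olkin model-based inference of the paper; the function names, resample count and seed are assumptions.

```python
import random


def estimate_r(xs, ys):
    """Nonparametric estimate of R = P(X < Y): fraction of (x, y) pairs with x < y."""
    wins = sum(1 for x in xs for y in ys if x < y)
    return wins / (len(xs) * len(ys))


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for R (illustrative only)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each sample independently, with replacement.
        bx = [rng.choice(xs) for _ in xs]
        by = [rng.choice(ys) for _ in ys]
        stats.append(estimate_r(bx, by))
    stats.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled R.
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For `xs = [1, 2, 3]` and `ys = [2.5, 3.5, 4.5]`, eight of the nine pairs satisfy x < y, so `estimate_r` returns 8/9. With samples this small the bootstrap interval is very wide, which is why simulation studies like the one described are used to compare interval methods.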

14.
A parametric regression model for right-censored data with a log-linear median regression function and a transformation of both the response and the regression parts, named the parametric Transform-Both-Sides (TBS) model, is presented. The TBS model has a parameter that handles data asymmetry while allowing various distributions for the error, as long as they are unimodal symmetric distributions centered at zero. The discussion focuses on the estimation procedure with five important error distributions (normal, double-exponential, Student’s t, Cauchy and logistic) and presents properties, associated functions (that is, survival and hazard functions) and estimation methods based on maximum likelihood and on the Bayesian paradigm. These procedures are implemented in TBSSurvival, an open-source, fully documented R package. The use of the package is illustrated and the performance of the model is analyzed using both simulated and real data sets.

15.
In this paper, several diagnostic measures based on the case-deletion model are proposed for log-Birnbaum-Saunders regression models (LBSRM). They may serve as a necessary supplement to the recent work of Galea et al. [2004. Influence diagnostics in log-Birnbaum-Saunders regression models. J. Appl. Statist. 31, 1049-1064], who studied influence diagnostics for LBSRM mainly on the basis of local influence analysis. It is shown that the case-deletion model is equivalent to the mean-shift outlier model in LBSRM, and an outlier test based on the mean-shift outlier model is presented. Furthermore, we investigate a test of homogeneity for the shape parameter in LBSRM, a problem mentioned by both Rieck and Nedelman [1991. A log-linear model for the Birnbaum-Saunders distribution. Technometrics 33, 51-60] and Galea et al. [2004. Influence diagnostics in log-Birnbaum-Saunders regression models. J. Appl. Statist. 31, 1049-1064]. We obtain the likelihood ratio and score statistics for such a test. Finally, a numerical example is given to illustrate our methodology, and the properties of the likelihood ratio and score statistics are investigated through Monte Carlo simulations.

16.
A new method called stepwise local influence analysis is proposed to detect influential observations and to identify masking effects in a dataset. Influential observations are detected step by step, such that any highly influential observations identified in a previous step are removed from the perturbation in the next step. The process iterates until no further influential observations can be found. It is shown that this new method is very effective at identifying influential observations and has the power to uncover masking effects. Additionally, the issues of constraints on perturbation vectors and benchmark determination are discussed. The proposed methodology is illustrated with several examples involving regression models and linear mixed models.

17.
Comparing cost prediction models by resampling techniques   Total citations: 1, self-citations: 0, citations by others: 1
Accurate software cost prediction is a research topic that has attracted much interest from the software engineering community in recent decades. A large part of the research effort involves the development of statistical models based on historical data. Since many models can be fitted to a given data set, a crucial issue is the selection of the most efficient prediction model. Most often this selection is based on comparisons of various accuracy measures that are functions of the model’s relative errors. However, the usual practice is to regard as the most accurate prediction model the one with the best accuracy measure, without testing whether this superiority is in fact statistically significant. This practice can lead to unstable and erroneous conclusions, since a small change in the data can overturn the selection of the best model. On the other hand, the accuracy measures used in practice are statistics with unknown probability distributions, which makes testing any hypothesis by traditional parametric methods problematic. In this paper, the use of statistical simulation tools is proposed in order to test the significance of the difference between the accuracy of two prediction methods: regression and estimation by analogy. The statistical simulation procedures involve permutation tests and bootstrap techniques for the construction of confidence intervals for the difference of measures. Four known datasets are used for experimentation in order to validate the results and to compare the simulation methods with the traditional parametric and non-parametric procedures.
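
One common way to test whether two prediction methods differ in accuracy on the same projects is a paired sign-flip permutation test on their per-project errors. The sketch below is a generic version of that idea under stated assumptions: the abstract does not specify the paper's exact test statistic, so the mean paired difference, the function names, the number of permutations and the seed are all illustrative choices.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def paired_permutation_test(errors_a, errors_b, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test for the mean paired difference.

    `errors_a` and `errors_b` hold an accuracy measure (for example a relative
    error) of two models on the same projects, in the same order.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis the two models are exchangeable on each
        # project, so each paired difference is equally likely to flip sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)
```

A small p-value suggests that the observed accuracy gap is unlikely to arise from chance relabeling of the two models, which is the kind of significance check the abstract argues should accompany any "best model" claim.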

18.
In this paper we discuss log-Birnbaum–Saunders regression models with censored observations. This kind of model has been widely applied to study the lifetime of materials subject to failure or stress. The score functions and the observed Fisher information matrix are given, and the process for estimating the regression coefficients and the shape parameter is discussed. The normal curvatures of local influence are derived under various perturbation schemes, and two deviance-type residuals are proposed to assess departures from the log-Birnbaum–Saunders error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed under log-Birnbaum–Saunders regression models. A diagnostic analysis is performed in order to select an appropriate model.

19.
A new concept and method for imposing imprecise (fuzzy) input and output data upon the conventional linear regression model is proposed. Using fuzzy parameters and fuzzy arithmetic operations (fuzzy addition and multiplication), we propose a fuzzy linear regression model that has a form similar to that of the conventional one. We consider the h-level (conventional) linear regression models of the fuzzy linear regression model so that the statistical techniques of (conventional) linear regression analysis for real-valued data can be invoked. In order to determine the sign (nonnegativity or nonpositivity) of the fuzzy parameters, we perform statistical hypothesis tests and evaluate confidence intervals. Using the least squares estimators obtained from the h-level linear regression models, we can construct the membership functions of the fuzzy least squares estimators via the “Resolution Identity”, which is well known in fuzzy set theory. In order to obtain the membership degree of any given estimate taken from the fuzzy least squares estimator, optimization problems have to be solved. We also provide two computational procedures for dealing with those optimization problems.

20.
In this paper we discuss log-Birnbaum-Saunders regression models with censored observations. This kind of model has been widely applied to study the lifetime of materials subject to failure or stress. The score functions and the observed Fisher information matrix are given, and the process for estimating the regression coefficients and the shape parameter is discussed. The normal curvatures of local influence are derived under various perturbation schemes, and two deviance-type residuals are proposed to assess departures from the log-Birnbaum-Saunders error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed under log-Birnbaum-Saunders regression models. A diagnostic analysis is performed in order to select an appropriate model.
