期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Mixture cure models for multivariate survival data

Binbing Yu Yingwei Peng 《Computational statistics & data analysis》2008,52(3):1524-1532

Mixture cure models (MCMs) have been widely used to analyze survival data with a cure fraction. The MCMs postulate that a fraction of the patients are cured from the disease and that the failure time for the uncured patients follows a proper survival distribution, referred to as latency distribution. The MCMs have been extended to bivariate survival data by modeling the marginal distributions. In this paper, the marginal MCM is extended to multivariate survival data. The new model is applicable to the survival data with varied cluster size and interval censoring. The proposed model allows covariates to be incorporated into both the cure fraction and the latency distribution for the uncured patients. The primary interest is to estimate the marginal parameters in the mean structure, where the correlation structure is treated as nuisance parameters. The marginal parameters are estimated consistently by treating the observations within the cluster as independent. The variances of the parameters are estimated by the one-step jackknife method. The proposed method does not depend on the specification of correlation structure. Simulation studies show that the new method works well when the marginal model is correct. The performance of the MCM is also examined when the clustered survival times share common random effect. The MCM is applied to the data from a smoking cessation study. 相似文献

2.

CANSURV: A Windows program for population-based cancer survival analysis

Yu B Tiwari RC Cronin KA McDonald C Feuer EJ 《Computer methods and programs in biomedicine》2005,80(3):195-203

Patient survival is one of the most important measures of cancer patient care (the diagnosis and treatment of cancer). The optimal method for monitoring the progress of patient care across the full spectrum of provider settings is through the population-based study of cancer patient survival, which is only possible using data collected by population-based cancer registries. The probability of cure, “statistical cure”, is defined for a cohort of cancer patients as the percent of patients whose annual death rate equals the death rate of general cancer-free population. Mixture cure models have been widely used to model failure time data. The models provide simultaneous estimates of the proportion of the patients cured from cancer and the distribution of the failure times for the uncured patients (latency distribution). CANSURV (CAN-cer SURVival) is a Windows software fitting both the standard survival models and the cure models to population-based cancer survival data. CANSURV can analyze both cause-specific survival data and, especially, relative survival data, which is the standard measure of net survival in population-based cancer studies. It can also fit parametric (cure) survival models to the individual data. The program is available at http://srab.cancer.gov/cansurv. The colorectal cancer survival data from the Surveillance, Epidemiology and End Results (SEER) program [Surveillance, Epidemiology and End Results Program, The Portable Survival System/Mainframe Survival System, National Cancer Institute, Bethesda, 1999.] of the National Cancer Institute, NIH is used to demonstrate the use of CANSURV program. 相似文献

3.

Global sensitivity analysis for large-scale socio-hydrological models using Hadoop

《Environmental Modelling & Software》2015

A multi-agent system (MAS) model is coupled with a physically-based groundwater model to understand the declining water table in the heavily irrigated Republican River basin. Each agent in the MAS model is associated with five behavioral parameters, and we estimate their influences on the coupled models using Global Sensitivity Analysis (GSA). This paper utilizes Hadoop-based Cloud Computing techniques and Polynomial Chaos Expansion (PCE) based variance decomposition approach for the improvement of GSA with large-scale socio-hydrological models. With the techniques, running 1000 scenarios of the coupled models can be completed within two hours with Hadoop clusters, a substantial improvement over the 42 days required to run these scenarios sequentially on a desktop machine. Based on the model results, GSA is conducted with the surrogate model derived from using PCE to measure the impacts of the spatio-temporal variations of the behavioral parameters on crop profits and the water table, identifying influential parameters. 相似文献

4.

Semiparametric regression analysis of panel count data with informative observation times

Xingqiu ZhaoXingwei Tong 《Computational statistics & data analysis》2011,55(1):291-300

This paper discusses regression analysis of panel count data that arise naturally when recurrent events are considered. For the analysis of panel count data, most of the existing methods have assumed that observation times are completely independent of recurrent events or given covariates, which may not be true in practice. We propose a joint modeling approach that uses an unobserved random variable and a completely unspecified link function to characterize the correlations between the response variable and the observation times. For inference about regression parameters, estimating equation approaches are developed without involving any estimation for latent variables, and the asymptotic properties of the resulting estimators are established. In addition, a technique is provided for assessing the adequacy of the model. The performance of the proposed estimation procedures are evaluated by means of Monte Carlo simulations, and a data set from a bladder tumor study is analyzed as an illustrative example. 相似文献

5.

Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

Lopaka Lee Dennis Helsel 《Computers & Geosciences》2005,31(10):1133

Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these “less thans” is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data.We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. 相似文献

6.

Extending a global sensitivity analysis technique to models with correlated parameters 总被引：2，自引：0，他引：2

C. Xu 《Computational statistics & data analysis》2007,51(12):5579-5590

The identification and representation of uncertainty is recognized as an essential component in model applications. One important approach in the identification of uncertainty is sensitivity analysis. Sensitivity analysis evaluates how the variations in the model output can be apportioned to variations in model parameters. One of the most popular sensitivity analysis techniques is Fourier amplitude sensitivity test (FAST). The main mechanism of FAST is to assign each parameter with a distinct integer frequency (characteristic frequency) through a periodic sampling function. Then, for a specific parameter, the variance contribution can be singled out of the model output by the characteristic frequency based on a Fourier transformation. One limitation of FAST is that it can only be applied for models with independent parameters. However, in many cases, the parameters are correlated with one another. In this study, we propose to extend FAST to models with correlated parameters. The extension is based on the reordering of the independent sample in the traditional FAST. We apply the improved FAST to linear, nonlinear, nonmonotonic and real application models. The results show that the sensitivity indices derived by FAST are in a good agreement with those from the correlation ratio sensitivity method, which is a nonparametric method for models with correlated parameters. 相似文献

7.

Nonparametric regression models for right-censored data using Bernstein polynomials

Muhtarjan Osman Sujit K. Ghosh 《Computational statistics & data analysis》2012,56(3):559-573

In some applications of survival analysis with covariates, the commonly used semiparametric assumptions (e.g., proportional hazards) may turn out to be stringent and unrealistic, particularly when there is scientific background to believe that survival curves under different covariate combinations will cross during the study period. We present a new nonparametric regression model for the conditional hazard rate using a suitable sieve of Bernstein polynomials. The proposed nonparametric methodology has three key features: (i) the smooth estimator of the conditional hazard rate is shown to be a unique solution of a strictly convex optimization problem for a wide range of applications; making it computationally attractive, (ii) the model is shown to encompass a proportional hazards structure, and (iii) large sample properties including consistency and convergence rates are established under a set of mild regularity conditions. Empirical results based on several simulated data scenarios indicate that the proposed model has reasonably robust performance compared to other semiparametric models particularly when such semiparametric modeling assumptions are violated. The proposed method is further illustrated on the gastric cancer data and the Veterans Administration lung cancer data. 相似文献

8.

Parameter uncertainty and temporal dynamics of sensitivity for hydrologic models: A hybrid sequential data assimilation and probabilistic collocation method

《Environmental Modelling & Software》2016

In this study, a hybrid sequential data assimilation and probabilistic collocation (HSDAPC) approach is proposed for analyzing uncertainty propagation and parameter sensitivity of hydrologic models. In HSDAPC, the posterior probability distributions of model parameters are first estimated through a particle filter method based on streamflow discharge data. A probabilistic collocation method (PCM) is further employed to show uncertainty propagation from model parameters to model outputs. The temporal dynamics of parameter sensitivities are then generated based on the polynomial chaos expansion (PCE) generated by PCM, which can reveal the dominant model components for different catchment conditions. The maximal information coefficient (MIC) is finally employed to characterize the correlation/association between model parameter sensitivity and catchment precipitation, potential evapotranspiration and observed discharge. The proposed method is applied to the Xiangxi River located in the Three Gorges Reservoir area. The results show that: (i) the proposed HSDAPC approach can generate effective 2nd and 3rd PCE models which provide accuracy predictions; (ii) 2^nd-order PCE, which can run nearly ten time faster than the hydrologic model, can capably represent the original hydrological model to show the uncertainty propagation in a hydrologic simulation; (iii) the slow (R_s) and quick flows (R_q) in Hymod show significant sensitivities during the simulation periods but the distribution factor (α) shows a least sensitivity to model performance; (iv) the model parameter sensitivities show significant correlation with the catchment hydro-meteorological conditions, especially during the rainy period with MIC values larger than 0.5. Overall, the results in this paper indicate that uncertainty propagation and temporal sensitivities of parameters can be effectively characterized through the proposed HSDAPC approach. 相似文献

9.

A data envelopment analysis method for optimizing multi-response problem with censored data in the Taguchi method 总被引：1，自引：0，他引：1

Hung-Chang Liao 《Computers & Industrial Engineering》2004,46(4):817-835

Taguchi method is an efficient method used in off-line quality control in that the experimental design is combined with the quality loss. This method including three stages of systems design, parameter design, and tolerance design has been deeply discussed in Phadke [Quality engineering using robust design (1989)]. It is observable that most industrial applications solved by Taguchi method belong to single-response problems. However, in the real world more than one quality characteristic should be considered for most industrial products, i.e. most problems customers concern about are multi-response problems. As a result, Taguchi method is not appropriate to optimize a multi-response problem. At present, it is still necessary to rely on the engineering judgment to optimize the multi-response problem; therefore uncertainty will be increased during the decision-making process. On the other hand, due to some uncontrollable causes occurring, only a portion of experiment can be completed so that the censored data will be produced. Traditional approaches for analysis of censored data are computationally complicated. In order to overcome above two shortages, this article proposes an effective procedure on the basis of the neural network (NN) and the data envelopment analysis (DEA) to optimize the multi-response problems. A case study of improving the quality of hard disk driver in Su and Tong [ Total Quality Management 8 (1997) 409] is resolved by the proposed procedure. The result indicates that it yields a satisfactory solution. 相似文献

10.

A semi-parametric generalization of the Cox proportional hazards regression model: Inference and Applications

Devarajan K Ebrahimi N 《Computational statistics & data analysis》2011,55(1):667-676

The assumption of proportional hazards (PH) fundamental to the Cox PH model sometimes may not hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function taking a form similar to the Cox PH model, with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard and it also includes, for the two sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach can not be applied here to estimate the model parameters. We use the full likelihood approach via a cubic B-spline approximation for the baseline hazard to estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike’s information criterion is developed. We illustrate the applicability of our approach using real-life data. 相似文献

11.

Mixture Poisson regression models for heterogeneous count data based on latent and fuzzy class analysis

Miin-Shen Yang Chien-Yo Lai 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2005,9(7):519-524

In this paper we propose a new approach, called a fuzzy class model for Poisson regression, in the analysis of heterogeneous count data. On the basis of fuzzy set concept and fuzzy classification maximum likelihood (FCML) procedures we create an FCML algorithm for fuzzy class Poisson regression models. Traditionally, the EM algorithm had been used for latent class regression models. Thus, the accuracy and effectiveness of EM and FCML algorithms for estimating the parameters are compared. The results show that the proposed FCML algorithm presents better accuracy and effectiveness and can be used as another good tool to regression analysis for heterogeneous count data.This work was supported in part by the National Science Council of Taiwan under Grant NSC-89-2213-E-033-007. 相似文献

12.

基于纵向-生存联合模型的游戏行会生存预测

下载免费PDF全文

刘贺飞陈小红阮彤《计算机工程与应用》2018,54(14):264-270

行会生存对提高游戏用户的活跃度和留存率有着积极的作用。目前行会生存分析方法是使用分类法,即把行会是否生存看作一个二分类问题来处理,其未能充分利用行会纵向数据,不能及时反映行会的状态变化和生存趋势。采用纵向-生存联合模型,充分利用游戏行会纵向的状态变化特征和成员行为特征,预测行会的生存状态。实验表明,纵向-生存联合模型相比传统的Cox比例风险模型,综合性能提高了56.6%,相比分类算法提高了预测性能,如逻辑回归提高了11.9%。实验中发现：成员权利等级标准差对行会生存呈现了正向的影响,说明了行会内成员权利等级有着良好的分布对行会的生存有着重要作用;成员私聊次数标准差和成员PK次数标准差对行会生存有着积极的影响,说明行会成员行为差异性的重要性;生存时间对行会有着负向的影响,即行业已生存时间越长,越不利于行会的生存。相似文献

13.

一种基于容错的感知数据回归模型研究

左向东王坤邱辉《计算机科学》2016,43(2):140-143

传感器主要用于对外部环境进行监测,然而当传感器发生故障时监测结果会出现误差。为了提高传感器发生故障时系统的容错能力,提出了一种容错的感知数据回归模型。首先,对最小二乘和岭回归两种线性回归模型进行分析,并分析了线性回归模型的相关统计量;然后,分析了部分传感器发生故障时系统的相关统计量,并以此为基础分析了协变量矩阵的上下界;最后,依据协变量矩阵定义了故障指标,并将优化模型转化为同时最小化故障指标和均方误差的问题。实验表明,提出的容错回归模型与传统的最小二乘法和岭回归方法相比具有更小的预测误差,因而当传感器发生故障时所提模型具有更好的健壮性。相似文献

14.

Multiparametric sensitivity analysis of the additive model in data envelopment analysis

Sanjeet Singh 《International Transactions in Operational Research》2010,17(3):365-380

In this paper, we study multiparametric sensitivity analysis of the additive model in data envelopment analysis using the concept of maximum volume in the tolerance region. We construct critical regions for simultaneous and independent perturbations in all inputs/outputs of an efficient decision making unit. Necessary and sufficient conditions are derived to classify the perturbation parameters as “focal” and “nonfocal.” Nonfocal parameters can have unlimited variations because of their low sensitivity in practice and these parameters can be deleted from the final analysis. For focal parameters a maximum volume region is characterized. Theoretical results are illustrated with the help of a numerical example. 相似文献

15.

Semiparametric analysis of clustered interval-censored survival data with a cure fraction

《Computational statistics & data analysis》2014

A generalization of the semiparametric Cox’s proportional hazards model by means of a random effect or frailty approach to accommodate clustered survival data with a cure fraction is considered. The frailty serves as a quantification of the health condition of the subjects under study and may depend on some observed covariates like age. One single individual-specific frailty that acts on the hazard function is adopted to determine the cure status of an individual and the heterogeneity on the time to event if the individual is not cured. Under this formulation, an individual who has a high propensity to be cured would tend to have a longer time to event if he is not cured. Within a cluster, both the cure statuses and the times to event of the individuals would be correlated. In contrast to some models proposed in the literature, the model accommodates the correlations among the observations in a more natural way. A multiple imputation estimation method is proposed for both right-censored and interval-censored data. Simulation studies show that the performance of the proposed estimation method is highly satisfactory. The proposed model and method are applied to the National Aeronautics and Space Administration’s hypobaric decompression sickness data to investigate the factors associated with the occurrence and the time to onset of grade IV venous gas emboli under hypobaric environments. 相似文献

16.

Parameter estimation and sensitivity analysis of fat deposition models in beef steers using acslXtreme

Malcolm McPhee Jim Oltjen James Fadel David Mayer Roberto Sainz 《Mathematics and computers in simulation》2009

The Davis Growth Model (a dynamic steer growth model encompassing 4 fat deposition models) is currently being used by the phenotypic prediction program of the Cooperative Research Centre (CRC) for Beef Genetic Technologies to predict P8 fat (mm) in beef cattle to assist beef producers meet market specifications. The concepts of cellular hyperplasia and hypertrophy are integral components of the Davis Growth Model. The net synthesis of total body fat (kg) is calculated from the net energy available after accounting for energy needs for maintenance and protein synthesis. Total body fat (kg) is then partitioned into 4 fat depots (intermuscular, intramuscular, subcutaneous, and visceral). This paper reports on the parameter estimation and sensitivity analysis of the DNA (deoxyribonucleic acid) logistic growth equations and the fat deposition first-order differential equations in the Davis Growth Model using acslXtreme (Hunstville, AL, USA, Xcellon). The DNA and fat deposition parameter coefficients were found to be important determinants of model function; the DNA parameter coefficients with days on feed >100 days and the fat deposition parameter coefficients for all days on feed. The generalized NL2SOL optimization algorithm had the fastest processing time and the minimum number of objective function evaluations when estimating the 4 fat deposition parameter coefficients with 2 observed values (initial and final fat). The subcutaneous fat parameter coefficient did indicate a metabolic difference for frame sizes. The results look promising and the prototype Davis Growth Model has the potential to assist the beef industry meet market specifications. 相似文献

17.

Fitting marginal accelerated failure time models to clustered survival data with potentially informative cluster size

Jie FanSomnath Datta 《Computational statistics & data analysis》2011,55(12):3295-3303

Methods for analyzing clustered survival data are gaining popularity in biomedical research. Naive attempts to fitting marginal models to such data may lead to biased estimators and misleading inference when the size of a cluster is statistically correlated with some cluster specific latent factors or one or more cluster level covariates. A simple adjustment to correct for potentially informative cluster size is achieved through inverse cluster size reweighting. We give a methodology that incorporates this technique in fitting an accelerated failure time marginal model to clustered survival data. Furthermore, right censoring is handled by inverse probability of censoring reweighting through the use of a flexible model for the censoring hazard. The resulting methodology is examined through a thorough simulation study. Also an illustrative example using a real dataset is provided that examines the effects of age at enrollment and smoking on tooth survival. 相似文献

18.

基于数据挖掘的人口数据预测模型综述 总被引：3，自引：1，他引：2

下载免费PDF全文

师瑞峰周一民《计算机工程与应用》2008,44(9):1-6

论文调查了国内外基于数据挖掘技术的人口数据预测模型。根据预测目的不同对这些模型进行了分类比较,在此基础上综合各模型的优缺点,对今后的研究工作做了进一步展望。相似文献

19.

Estimation of Kendall’s tau from censored data

Jin-Jian Hsieh 《Computational statistics & data analysis》2010,54(6):1613-3043

This paper considers the nonparametric estimation of Kendall’s tau for bivariate censored data. Under censoring, there have been some papers discussing the nonparametric estimation of Kendall’s tau, such as Wang and Wells (2000), Oakes (2008) and Lakhal et al. (2009). In this article, we consider an alternative approach to estimate Kendall’s tau. The main idea is to replace a censored event-time by a proper imputation. Thus, it induces three estimators, say , , and . We also apply the bootstrap method to estimate the variance of , and and to construct the corresponding confidence interval. Furthermore, we analyze two data sets by the suggested approach, and compare these practical estimators of Kendall’s tau in simulation studies. 相似文献

20.

Random network models and sensitivity algorithms for the analysis of ordering time and inventory state in multi-stage supply chains

《Computers & Industrial Engineering》2014

Supply chains in reality face a highly dynamic and uncertain environment, especially the uncertain end-customer demands and orders. Since the condition of product market changes frequently, the tasks of order management, product planning, and inventory management are complex and difficult. It is imperative for companies to develop new ways to manage the randomness and uncertainty in market demands. Based on the graphical evaluation and review technique, this paper provides a simple but integrated stochastic network mathematical model for supply chain ordering time distribution analysis. Then the ordering time analysis model is extended so that the analysis of inventory level distribution characteristics of supply chain members is allowed. Further, to investigate the effects of different end-customer demands on upstream orders and relative inventory levels, model-based sensitivity analysis algorithms for ordering fluctuations and inventory fluctuations are developed. A detailed numerical example is presented to illustrate the application of the proposed models to a multi-stage supply chain system, and the results of which shows the effectiveness and flexibility of the proposed stochastic network models and algorithms in order and inventory management. 相似文献