Similar Articles
20 similar articles found.
1.
Cure fraction models have been widely used to analyze survival data in which a proportion of the individuals is not susceptible to the event of interest. In this article, we introduce a bivariate model for survival data with a cure fraction based on the three-parameter generalized Lindley distribution. The joint distribution of the survival times is obtained using copula functions. We consider three copula families: the Farlie–Gumbel–Morgenstern (FGM), Clayton, and Gumbel–Barnett copulas. The model is implemented under a Bayesian framework, where parameter estimation is based on Markov chain Monte Carlo (MCMC) techniques. To illustrate the utility of the model, we consider an application to a real data set from an invasive cervical cancer study.
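As a rough numerical sketch of this construction (simplified: the one-parameter Lindley survival stands in for the paper's three-parameter generalized Lindley, and all parameter values are hypothetical), the marginals are improper survival functions with a cured fraction, joined here by a Clayton copula:

```python
import numpy as np

def lindley_survival(t, theta):
    """Survival of the one-parameter Lindley distribution
    (a stand-in for the paper's three-parameter generalized Lindley)."""
    return (1.0 + theta * t / (theta + 1.0)) * np.exp(-theta * t)

def cure_survival(t, p0, theta):
    """Improper population survival: cured fraction p0 plus susceptibles."""
    return p0 + (1.0 - p0) * lindley_survival(t, theta)

def clayton_copula(u, v, alpha):
    """Clayton copula C(u, v), alpha > 0."""
    return (u ** (-alpha) + v ** (-alpha) - 1.0) ** (-1.0 / alpha)

# Joint survival of the two lifetimes via the copula on the marginals.
t1, t2 = 2.0, 3.5
S1 = cure_survival(t1, p0=0.3, theta=0.8)   # hypothetical parameter values
S2 = cure_survival(t2, p0=0.4, theta=0.5)
print(clayton_copula(S1, S2, alpha=1.2))    # joint P(T1 > t1, T2 > t2)
```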

2.
In biomedical, genetic and social studies, there may exist a fraction of individuals not experiencing the event of interest, so that the survival curves eventually level off to nonzero proportions. These people are referred to as "cured" or "nonsusceptible" individuals, and models developed to address this issue are known as cure models. The mixture model, which consists of a model for the binary cure status and a survival model for the event times of the noncured individuals, is one of the most widely used cure models. In this paper, we propose a class of semiparametric transformation cure models for multivariate survival data with a surviving fraction by fitting a logistic regression model to the cure status and a semiparametric transformation model to the event times of the noncured individuals. Both models allow covariates to be incorporated and do not require any assumption on the association structure. The statistical inference is based on the marginal approach, constructing a system of estimating equations. The asymptotic properties of the proposed estimators are proved, and the performance of the estimation is demonstrated via simulations. In addition, the approach is illustrated by analyzing the smoking cessation data.
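A minimal sketch of the mixture structure, assuming a logistic incidence model and, for concreteness, the proportional hazards member of the transformation family with a fixed baseline cumulative hazard (all names and values are hypothetical):

```python
import numpy as np

def incidence(x, gamma):
    """Logistic model for the probability of NOT being cured."""
    return 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x)))

def latency_survival(t, x, beta, H0=lambda t: 0.1 * t):
    """Proportional-hazards special case of the transformation family:
    S(t | x) = exp(-H0(t) * exp(beta * x)), with an assumed baseline H0."""
    return np.exp(-H0(t) * np.exp(beta * x))

def population_survival(t, x, gamma, beta):
    """Mixture cure survival: cured + uncured * latency survival."""
    pi = incidence(x, gamma)
    return (1.0 - pi) + pi * latency_survival(t, x, beta)

print(population_survival(t=5.0, x=1.0, gamma=np.array([-0.5, 0.8]), beta=0.3))
```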

3.
Cure models have been developed to analyze failure time data with a cured fraction. For such data, standard survival models are usually not appropriate because they do not account for the possibility of cure. Mixture cure models assume that the studied population is a mixture of susceptible individuals, who may experience the event of interest, and non-susceptible individuals who will never experience it. The aim of this paper is to propose a SAS macro to estimate parametric and semiparametric mixture cure models with covariates. The cure fraction can be modelled by various binary regression models. Parametric and semiparametric models can be used to model the survival of uncured individuals. The likelihood is maximized using SAS PROC NLMIXED for parametric models and through an EM algorithm for the Cox proportional hazards mixture cure model. Indications and limitations of the proposed macro are discussed, and an example in the field of cancer clinical trials is shown.
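The macro itself is SAS; the Python sketch below only illustrates the EM idea behind it in the simplest parametric case (intercept-only cure probability, exponential latency), not the macro's actual logistic/Cox machinery:

```python
import numpy as np

def em_mixture_cure(t, d, n_iter=200):
    """EM for an intercept-only mixture cure model with exponential latency.
    t: follow-up times, d: event indicator (1 = event, 0 = censored)."""
    pi, lam = 0.5, 1.0 / np.mean(t)           # initial values
    for _ in range(n_iter):
        S = np.exp(-lam * t)                  # latency survival at t
        # E-step: posterior probability of being uncured
        w = np.where(d == 1, 1.0, pi * S / (1.0 - pi + pi * S))
        # M-step: update cure and latency parameters
        pi = w.mean()
        lam = d.sum() / (w * t).sum()
    return pi, lam

rng = np.random.default_rng(0)
n, true_pi, true_lam = 500, 0.6, 1.0
uncured = rng.random(n) < true_pi
event_t = rng.exponential(1 / true_lam, n)
cens_t = rng.exponential(2.0, n)              # independent censoring
t = np.where(uncured, np.minimum(event_t, cens_t), cens_t)
d = (uncured & (event_t <= cens_t)).astype(int)
print(em_mixture_cure(t, d))                  # roughly (0.6, 1.0)
```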

4.
Mixture cure models (MCMs) have been widely used to analyze survival data with a cure fraction. The MCMs postulate that a fraction of the patients are cured from the disease and that the failure time for the uncured patients follows a proper survival distribution, referred to as the latency distribution. The MCMs have been extended to bivariate survival data by modeling the marginal distributions. In this paper, the marginal MCM is extended to multivariate survival data. The new model is applicable to survival data with varying cluster size and interval censoring. The proposed model allows covariates to be incorporated into both the cure fraction and the latency distribution for the uncured patients. The primary interest is to estimate the marginal parameters in the mean structure, where the correlation structure is treated as nuisance parameters. The marginal parameters are estimated consistently by treating the observations within a cluster as independent. The variances of the parameters are estimated by the one-step jackknife method. The proposed method does not depend on the specification of the correlation structure. Simulation studies show that the new method works well when the marginal model is correct. The performance of the MCM is also examined when the clustered survival times share a common random effect. The MCM is applied to the data from a smoking cessation study.
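The one-step jackknife avoids full refits; the sketch below instead shows the plain delete-one-cluster jackknife, which is easy to write generically (the estimator and data are placeholders, not the paper's marginal MCM):

```python
import numpy as np

def jackknife_cluster_variance(estimator, clusters):
    """Delete-one-cluster jackknife variance for a scalar estimator.
    `clusters` is a list of per-cluster data arrays; the paper's one-step
    variant avoids the full refits this plain sketch performs."""
    full = estimator(np.concatenate(clusters))
    K = len(clusters)
    leave_one_out = np.array([
        estimator(np.concatenate(clusters[:k] + clusters[k + 1:]))
        for k in range(K)
    ])
    pseudo = K * full - (K - 1) * leave_one_out   # pseudo-values
    return pseudo.var(ddof=1) / K

rng = np.random.default_rng(1)
clusters = [rng.exponential(1.0, size=rng.integers(2, 6)) for _ in range(30)]
print(jackknife_cluster_variance(np.mean, clusters))
```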

5.
With parametric cure models, we can express survival parameters (e.g. cured fraction, location and scale parameters) as functions of covariates. These models can measure survival from a specific disease process, either by examining deaths due to the cause under study (cause-specific survival), or by comparing all deaths to those in a matched control population (relative survival). We present a binomial maximum likelihood algorithm to be used for actuarial data, where follow-up times are grouped into specific intervals. Our algorithm provides simultaneous maximum likelihood estimates for all the parameters of a cure model and can be used for cause-specific or relative survival analysis with a variety of survival distributions. Current software does not provide the flexibility of this unified approach.
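A hedged sketch of the binomial likelihood idea for grouped (actuarial) data, assuming an exponential latency and a logit/log reparameterization for the optimizer; the interval counts are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def fit_grouped_cure(bounds_t, n_at_risk, deaths):
    """Binomial ML for an actuarial (grouped) cure model with exponential
    latency: S_pop(t) = c + (1 - c) * exp(-lam * t).  Each interval
    contributes a binomial term for deaths among those at risk."""
    def nll(params):
        c = 1.0 / (1.0 + np.exp(-params[0]))       # cured fraction in (0, 1)
        lam = np.exp(params[1])                    # rate > 0
        S = c + (1.0 - c) * np.exp(-lam * np.asarray(bounds_t))
        p_die = 1.0 - S[1:] / S[:-1]               # conditional interval risk
        p_die = np.clip(p_die, 1e-12, 1 - 1e-12)
        return -(deaths * np.log(p_die)
                 + (n_at_risk - deaths) * np.log(1.0 - p_die)).sum()
    res = minimize(nll, x0=[0.0, 0.0], method='Nelder-Mead')
    return 1.0 / (1.0 + np.exp(-res.x[0])), np.exp(res.x[1])

# Yearly intervals: boundaries, patients at risk, deaths (hypothetical data).
bounds_t = [0, 1, 2, 3, 4, 5]
n_at_risk = np.array([1000, 620, 450, 380, 350])
deaths = np.array([300, 120, 50, 20, 10])
print(fit_grouped_cure(bounds_t, n_at_risk, deaths))   # (cured c, rate lam)
```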

6.
In this paper, we investigate a survival model with long-term survivors and random effects, based on the promotion time cure rate formulation for models with a surviving fraction. We present Bayesian and classical estimation approaches. The Bayesian approach is implemented using Markov chain Monte Carlo (MCMC) based on the Metropolis-Hastings algorithm. For the classical approach, we use restricted maximum likelihood (REML) estimators. A simulation study is performed to evaluate the accuracy of the applied techniques for the estimates and their standard deviations. An example from an oropharynx cancer study is used to illustrate the model and the estimation approaches considered in the study.
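Setting the random effects aside, the promotion time formulation itself is compact; a sketch with a hypothetical exponential baseline c.d.f. and made-up covariate values:

```python
import numpy as np

def promotion_time_survival(t, x, beta, F=lambda t: 1.0 - np.exp(-t)):
    """Promotion time cure rate model: S_pop(t | x) = exp(-theta(x) * F(t)),
    with theta(x) = exp(x @ beta) and F a proper baseline c.d.f.
    The cure fraction is the limit exp(-theta(x)) as t -> infinity."""
    theta = np.exp(x @ beta)
    return np.exp(-theta * F(t))

x = np.array([1.0, 0.5])              # intercept + one covariate (hypothetical)
beta = np.array([0.2, -0.4])
print(promotion_time_survival(3.0, x, beta))
print(np.exp(-np.exp(x @ beta)))      # implied cure fraction
```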

7.

This paper includes a short introduction to survival analysis and censored data, together with a thorough literature review in the field of cure models. An overview of the most important and recent parametric, semiparametric and nonparametric mixture cure model approaches is also included. The main nonparametric and semiparametric approaches were applied to a real dataset of COVID-19 patients from the first weeks of the epidemic in Galicia (NW Spain), with the aim of modelling the elapsed time from diagnosis to hospital admission. The main conclusions, as well as the limitations of both the cure models and the dataset, are presented, illustrating the usefulness of cure models in this kind of study, where the influence of age and sex on the time to hospital admission is shown.


8.
Mixture analysis is a necessary component for capturing sub-pixel heterogeneity in the characterization of land cover from remotely sensed images. Mixture analysis approaches in remote sensing range from conventional linear mixture models to nonlinear neural network mixture models. Linear mixture models are fairly simple and generally result in poor mixture analysis accuracy. Neural network models can achieve much higher accuracy, but typically lack interpretability. In this paper, we present a mixture discriminant analysis (MDA) model for inferring land cover fractions within forest stands from Landsat Thematic Mapper images. Specifically, individual class distributions are modeled as mixtures of subclasses of Gaussian distributions, and land cover fractions are estimated using the corresponding posterior probabilities. Compared to a benchmark study on the accuracy of mixture models with Plumas National Forest data, this MDA model easily outperforms traditional linear mixture models and parallels the performance of the ARTMAP neural network mixture model. In other words, the MDA model successfully combines the performance characteristics of more complex neural network models (due to the nonlinear nature of its classification rules) with the ease of interpretation associated with linear mixture models (due to its relatively simple structure). MDA models therefore offer an attractive alternative for addressing the mixture modeling problem in remote sensing.
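A small sketch of the MDA idea, assuming (as a stand-in for the paper's setup) one Gaussian mixture per class fitted with scikit-learn and posterior class probabilities read as cover fractions; the six "bands" and all data below are synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mda(X_by_class, n_subclasses=3):
    """Fit one Gaussian mixture per land-cover class (the MDA class model)."""
    return [GaussianMixture(n_components=n_subclasses, random_state=0).fit(X)
            for X in X_by_class]

def cover_fractions(models, priors, X):
    """Posterior class probabilities for each pixel, read as cover fractions."""
    log_lik = np.column_stack([m.score_samples(X) for m in models])
    log_post = log_lik + np.log(np.asarray(priors))
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
# Three synthetic classes in a 6-band feature space (stand-in for TM bands).
X_by_class = [rng.normal(loc=mu, size=(200, 6)) for mu in (0.0, 1.5, 3.0)]
models = fit_mda(X_by_class)
pixels = rng.normal(loc=1.0, size=(5, 6))
print(cover_fractions(models, priors=[1/3, 1/3, 1/3], X=pixels))
```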

9.
Patient survival is one of the most important measures of cancer patient care (the diagnosis and treatment of cancer). The optimal method for monitoring the progress of patient care across the full spectrum of provider settings is the population-based study of cancer patient survival, which is only possible using data collected by population-based cancer registries. The probability of cure, "statistical cure", is defined for a cohort of cancer patients as the percentage of patients whose annual death rate equals the death rate of the general cancer-free population. Mixture cure models have been widely used to model failure time data. These models provide simultaneous estimates of the proportion of patients cured of cancer and the distribution of failure times for the uncured patients (the latency distribution). CANSURV (CANcer SURVival) is a Windows program that fits both standard survival models and cure models to population-based cancer survival data. CANSURV can analyze both cause-specific survival data and, especially, relative survival data, which is the standard measure of net survival in population-based cancer studies. It can also fit parametric (cure) survival models to individual data. The program is available at http://srab.cancer.gov/cansurv. The colorectal cancer survival data from the Surveillance, Epidemiology and End Results (SEER) program [Surveillance, Epidemiology and End Results Program, The Portable Survival System/Mainframe Survival System, National Cancer Institute, Bethesda, 1999.] of the National Cancer Institute, NIH, is used to demonstrate the use of the CANSURV program.

10.
A generalization of the semiparametric Cox proportional hazards model by means of a random effect or frailty approach to accommodate clustered survival data with a cure fraction is considered. The frailty serves as a quantification of the health condition of the subjects under study and may depend on observed covariates such as age. A single individual-specific frailty that acts on the hazard function is adopted to determine the cure status of an individual and the heterogeneity in the time to event if the individual is not cured. Under this formulation, an individual with a high propensity to be cured would tend to have a longer time to event if not cured. Within a cluster, both the cure statuses and the times to event of the individuals are correlated. In contrast to some models proposed in the literature, this model accommodates the correlations among the observations in a more natural way. A multiple imputation estimation method is proposed for both right-censored and interval-censored data. Simulation studies show that the performance of the proposed estimation method is highly satisfactory. The proposed model and method are applied to the National Aeronautics and Space Administration's hypobaric decompression sickness data to investigate the factors associated with the occurrence and time to onset of grade IV venous gas emboli under hypobaric environments.

11.
Model selection and model combination are general problems in many areas. In particular, when we have several different candidate models and have also gathered a new data set, we want to construct a more accurate and precise model to help predict future events. In this paper, we propose a new data-guided model combination method based on decomposition and aggregation. With the aid of influence diagrams, we analyze the dependence among candidate models and apply latent factors to characterize such dependence. After analyzing model structures in this framework, we derive an optimal composite model. Two widely used data analysis tools, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are applied for factor extraction from the class of candidate models. Once the factors are ready, they are sorted and aggregated to produce composite models. During factor aggregation, another important issue, factor selection, is also touched on. Finally, a numerical study shows how this method works, and an application using physical data is also presented.
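One simple reading of the PCA variant (the influence-diagram analysis and the factor selection step are omitted): extract latent factors from the candidate models' predictions, then regress the target on the retained factors to form the composite model. All data below are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Columns = predictions of four candidate models on the new data set.
rng = np.random.default_rng(3)
truth = rng.normal(size=300)
preds = np.column_stack([truth + rng.normal(scale=s, size=300)
                         for s in (0.3, 0.5, 0.5, 1.0)])

# Extract latent factors that capture the dependence among the models ...
pca = PCA(n_components=2).fit(preds)
factors = pca.transform(preds)

# ... then aggregate the retained factors into a composite model.
composite = LinearRegression().fit(factors, truth)
print(composite.score(factors, truth))   # fit of the composite model
```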

12.
This article investigates emission behaviour at frequencies of 18.7, 36.5 and 89 GHz and an incidence angle of 55° over a snow-covered surface at the local-scale observation site in Fraser, CO, USA, using both one-layer and two-layer emission models. The models employ the matrix doubling approach to implement the radiative-transfer equation based on dense media theory and the advanced integral equation model. When compared to Ground-Based Passive Microwave Radiometer (GBMR-7) observations on 21 February 2003, both models simulated the observed brightness temperature well, but the polarization difference between observation and model was smaller for the two-layer emission model than for the one-layer model. In addition, we successfully interpreted the emission magnitude and polarization separation of a snow-removed surface by incorporating a Mie scattering transition layer above the soil medium. We also demonstrated the effect of snow fraction on the brightness temperature difference at 18.7 and 36.5 GHz over a snow-covered surface using the field observations. In conclusion, we demonstrate the impact of snow on the soil surface, under varying snow depth (SD) and snow fraction, through modelling and in situ data.

13.
In the last decade, ranking units in data envelopment analysis (DEA) has attracted the interest of many DEA researchers, and a variety of models have been developed to rank units with multiple inputs and multiple outputs. These performance factors (inputs and outputs) are classified into two groups: desirable and undesirable. Obviously, undesirable factors in the production process should be reduced to improve performance. Also, some of these data may be known only in terms of ordinal relations. While the models developed in the past are interesting and meaningful, they did not consider both undesirable and ordinal factors at the same time. In this research, we develop an evaluation model and a ranking model to overcome some deficiencies of the earlier models. This paper incorporates undesirable and ordinal data in DEA and discusses the efficiency evaluation and ranking of decision making units (DMUs) with undesirable and ordinal data. For this purpose, we transform the ordinal data into definite data, and then treat each undesirable input and output as a desirable output and input, respectively. Finally, an application that shows the capability of the proposed method is illustrated.
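A sketch of the underlying DEA computation, assuming the standard input-oriented CCR envelopment LP; following the paper's device, an undesirable output is simply moved to the input side (the ordinal-to-definite transformation is not shown, and the data are invented):

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of DMU j0 (envelopment form).
    X: inputs (m x n), Y: outputs (s x n).  Following the paper's device,
    an undesirable output can simply be placed among the inputs in X."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                 # minimize theta
    A_in = np.c_[-X[:, [j0]], X]                # sum_j lam_j x_j <= theta x_0
    A_out = np.c_[np.zeros((s, 1)), -Y]         # sum_j lam_j y_j >= y_0
    b_ub = np.r_[np.zeros(m), -Y[:, j0]]
    res = linprog(c, A_ub=np.r_[A_in, A_out], b_ub=b_ub,
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun                              # efficiency score theta*

X = np.array([[4.0, 7.0, 8.0, 4.0],             # a desirable input
              [3.0, 3.0, 1.0, 2.0]])            # an undesirable output, moved here
Y = np.array([[1.0, 1.0, 1.0, 1.0]])            # a desirable output
print([round(ccr_efficiency(X, Y, j), 3) for j in range(4)])
```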

14.
Two-dimensional principal component analysis (2DPCA) is a feature extraction and dimensionality reduction method that operates directly on the image-matrix representation. This paper proposes a probabilistic model based on 2DPCA. First, the principal components (vectors) are obtained by maximum likelihood estimation of the parameters of this generative probabilistic model; then, to handle missing data, the model parameters and principal components are estimated iteratively with the expectation-maximization (EM) algorithm. An application of the mixture probabilistic 2DPCA model to face clustering shows that the probabilistic 2DPCA model can serve as a density estimation tool for image matrices. Reconstruction experiments on face images with missing values demonstrate the effectiveness of the model and the iterative algorithm.
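For orientation, a sketch of classical (non-probabilistic) 2DPCA, the deterministic starting point of the proposed model; the probabilistic/EM extension for missing data is not shown, and the "faces" below are random stand-ins:

```python
import numpy as np

def twod_pca(images, k):
    """Classical 2DPCA: projection directions from the image covariance
    G = E[(A - mean)^T (A - mean)], computed directly on image matrices.
    (The paper builds a probabilistic/EM version on top of this idea.)"""
    A = np.asarray(images, dtype=float)        # shape (n, h, w)
    mean = A.mean(axis=0)
    centered = A - mean
    G = np.einsum('nij,nik->jk', centered, centered) / len(A)
    vals, vecs = np.linalg.eigh(G)             # ascending eigenvalues
    W = vecs[:, -k:]                           # top-k projection directions
    features = centered @ W                    # (n, h, k) feature matrices
    return W, features, mean

rng = np.random.default_rng(4)
faces = rng.normal(size=(100, 32, 32))         # stand-in face images
W, feats, mu = twod_pca(faces, k=5)
recon = mu + feats @ W.T                       # rank-k reconstruction
print(recon.shape)
```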

15.
Remotely sensed vegetation indices are widely used to detect greening and browning trends, especially the globally available normalized difference vegetation index (NDVI) time series, which extends back to 1981. Seasonality and serial autocorrelation in the data have previously been dealt with by integrating the data to annual values; as an alternative to reducing the temporal resolution, we apply harmonic analyses and non-parametric trend tests to the GIMMS NDVI dataset (1981-2006). Using the complete dataset, greening and browning trends were analyzed with a linear model corrected for seasonality by subtracting the seasonal component, and with a seasonal non-parametric model. In a third approach, phenological shifts and variation in the length of the growing season were accounted for by analyzing the time series using vegetation development stages rather than calendar days. Results differed substantially between the models, even though the input data were the same. Prominent regional greening trends identified by several other studies were confirmed, but the models were inconsistent in areas with weak trends. The linear model using data corrected for seasonality showed trend slopes similar to those described in previous work using linear models on yearly mean values. The non-parametric models demonstrated the significant influence of variations in phenology; accounting for these variations should yield more robust trend analyses and a better understanding of vegetation trends.
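A sketch of the non-parametric ingredient, assuming a basic Mann-Kendall trend test (no tie correction) applied after subtracting a mean seasonal cycle; the series below is simulated, with 24 composites per year as in GIMMS:

```python
import numpy as np

def mann_kendall(y):
    """Nonparametric Mann-Kendall trend test: returns the S statistic and
    the normal-approximation z-score (no tie correction, for brevity)."""
    n = len(y)
    s = sum(np.sign(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return s, z

def deseasonalize(ndvi, period=24):
    """Subtract the mean seasonal cycle (e.g., 24 half-month composites per
    year) before testing for a monotonic trend."""
    x = np.asarray(ndvi, dtype=float)
    seasonal = x.reshape(-1, period).mean(axis=0)
    return x - np.tile(seasonal, len(x) // period)

rng = np.random.default_rng(5)
t = np.arange(26 * 24)                               # 1981-2006
series = (0.3 + 0.1 * np.sin(2 * np.pi * t / 24)
          + 0.0005 * t + rng.normal(0, 0.02, len(t)))
print(mann_kendall(deseasonalize(series)))           # positive z => greening
```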

16.
In a recent article, Wang et al. [Wang, N. S., Yi, R. H., & Wang, W. (2008). Evaluating the performances of decision-making units based on interval efficiencies. Journal of Computational and Applied Mathematics, 216, 328–343] proposed a pair of interval data envelopment analysis (DEA) models for measuring the overall performances of decision-making units (DMUs) with crisp data. In this paper, we demonstrate that these interval DEA models face problems in determining the efficiency interval for each DMU when there are zero values for every input. To remedy this drawback, we propose a pair of improved interval DEA models which make it possible to perform a DEA analysis using the concepts of the best and the worst relative efficiencies. Two numerical examples are examined using the improved interval DEA models. One of them is a real-world application involving 42 educational departments in one of the branches of the Islamic Azad University in Iran, which shows the advantages and applicability of the improved approach in real-life situations.

17.
The flow characteristics in open channel junctions are of great interest in hydraulic and environmental engineering. This study investigates the capacity of artificial neural network (ANN) models for representing and modelling the velocity distributions of combined open channel flows. ANN models are constructed and tested using data derived from computational fluid dynamics models. The orthogonal sampling method is used to select representative data; the ANN models trained and validated on representative data generally outperform those trained on random data. Sobol' sensitivity analysis is performed to investigate the contributions of different uncertainty sources to model performance. Results indicate that the major uncertainty source is ANN model parameter initialization. Hence, an ANN model training strategy is designed to reduce this main uncertainty source: models are trained over many runs with random parameter initializations, and the model with the best performance is adopted.
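A generic recreation of that training strategy (not the paper's network or data): train scikit-learn MLPs over many random initializations on synthetic stand-in data and keep the best validation score:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.uniform(size=(500, 3))                 # e.g., junction geometry + position
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]        # stand-in for CFD-derived velocity

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Many runs with random parameter initializations; keep the best one,
# since initialization was found to be the dominant uncertainty source.
best_model, best_score = None, -np.inf
for seed in range(20):
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                     random_state=seed).fit(X_tr, y_tr)
    score = m.score(X_val, y_val)
    if score > best_score:
        best_model, best_score = m, score
print(round(best_score, 4))
```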

18.
International Journal of Computer Mathematics, 2012, 89(17): 3709-3749
Subdivision schemes are multi-resolution methods used in computer-aided geometric design to generate smooth curves or surfaces. In this paper, we are interested in both smooth and non-smooth subdivision schemes. We propose two models that generalize the subdivision operation and can yield both smooth and non-smooth schemes in a controllable way:
  • (1) The ‘varying-resolution’ model allows a structured access to the various resolutions of the refined data, yielding certain patterns. This model generalizes the standard subdivision iterative operation and has interesting interpretations in the geometrical space and also in creativity-oriented domains, such as music. As an infrastructure for this model, we propose representing a subdivision scheme by two dual rules trees. The dual tree is a permuted rules tree that gives a new operator-oriented view on the subdivision process, from which we derive an ‘adjoint scheme’.

  • (2) The ‘generalized perturbed schemes’ model can be viewed as a special multi-resolution representation that allows a more flexible control on adding the details. For this model, we define the terms ‘template mask’ and ‘tension vector parameter’.

The non-smooth schemes are created by the permutations of the 'varying-resolution' model or by certain choices within the 'generalized perturbed schemes' model. We then present procedures that integrate and demonstrate these models, together with some enhancements that have special meaning in creative contexts such as music, imaging and texture. We describe two new applications of our models: (a) data and music analysis and synthesis, which also demonstrates the usefulness of the non-smooth schemes and the proposed approximations, and (b) acceleration of convergence and smoothness analysis, using the 'dual rules tree'.
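For readers unfamiliar with subdivision, the sketch below shows the standard iterative refinement operation that the 'varying-resolution' model generalizes, using Chaikin's corner-cutting scheme as a minimal concrete example (not one of the paper's schemes):

```python
import numpy as np

def chaikin_step(points):
    """One refinement step of Chaikin's corner-cutting scheme: each edge
    (p, q) is replaced by the points 3/4 p + 1/4 q and 1/4 p + 3/4 q."""
    p, q = points[:-1], points[1:]
    refined = np.empty((2 * len(p), points.shape[1]))
    refined[0::2] = 0.75 * p + 0.25 * q
    refined[1::2] = 0.25 * p + 0.75 * q
    return refined

poly = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.5], [4.0, 0.0]])
for _ in range(4):                      # repeated refinement -> smooth curve
    poly = chaikin_step(poly)
print(len(poly), poly[:2])
```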

19.
Data mining, as a timely and reliable analysis method, is playing an increasingly important role; its ability to quickly mine patterns and discover regularities in large-scale data is gradually replacing manual work. The large distributed systems now prevalent across computing (e.g., Hadoop and Spark) produce millions of system log entries every day; the sheer volume and tangled relationships of these logs have greatly reduced the efficiency of manual system monitoring and raised the cost of training new programmers. Combining data mining with system analysis is therefore a natural trend, and machine learning models are increasingly proposed in industry for system log analysis. In most cases, however, the log entries reporting a "critical" system state are a small minority, and it is precisely this minority that programmers most need to attend to; most machine learning models used for log analysis assume balanced training data, so they tend to be biased toward the majority class when issuing log alerts, with unsatisfactory results. This paper approaches the problem from a deep learning perspective and investigates the applicability of CNN-text (CT) to system log analysis. We compare CT with the mainstream machine learning models for log analysis, SVM and decision trees, to examine its advantages over these algorithms; we compare CT with CNN-RNN-text (CRT) and analyze how CT handles features, confirming its superiority among deep models for processing log-like text; finally, we apply all models to two different log text datasets to demonstrate CT's generality. In the comparison with mainstream machine learning models, CT improves recall by nearly 15% over the best of them; compared with the more sophisticated CRT, CT's accuracy is about 20% higher, its recall about 80% higher, and its precision about 60% higher; in the generality experiment, applying all models to our experimental dataset logstash and the public dataset WC85_1, with accuracy tied at 100% with the other top-performing models, CT's recall is nearly 14% higher than that of the model with the next-best recall (DT-Bi). These results show that, compared with mainstream log analysis models such as support vector machines, decision trees and naive Bayes, CNN-text has superior local feature extraction and nonlinear fitting ability. Moreover, whereas CNN-RNN-text, a fellow member of the CNN family of deep models, devotes much of its capacity to the sequential features of logs, CNN-text pays them less attention and actually performs better on logs with irregular sequence structure. We conclude that CNN-text is, among the methods considered in this paper, the one best suited to software system anomaly detection.
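A minimal PyTorch sketch of a CNN-text classifier of the kind the paper evaluates (architecture details, vocabulary size and data here are invented): parallel 1-D convolutions extract local features from embedded log tokens, and max-pooling over time discards most sequence-order information, which matches the paper's explanation of why CT copes well with irregular log sequences:

```python
import torch
import torch.nn as nn

class CNNText(nn.Module):
    """Minimal CNN-text classifier in the spirit of the paper's CT model:
    embed log tokens, apply parallel 1-D convolutions of several widths
    (local feature extraction), max-pool over time, and classify."""
    def __init__(self, vocab_size=1000, embed_dim=64, n_classes=2,
                 kernel_sizes=(2, 3, 4), n_filters=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)    # (batch, embed, seq)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits

model = CNNText()
logs = torch.randint(0, 1000, (8, 50))            # 8 tokenized log lines
print(model(logs).shape)                          # torch.Size([8, 2])
```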

20.
Clustering problems are central to many knowledge discovery and data mining tasks. However, most existing clustering methods can only work with fixed-dimensional representations of data patterns. In this paper, we study the clustering of data patterns that are represented as sequences or time series, possibly of different lengths. We propose a model-based approach to this problem using mixtures of autoregressive moving average (ARMA) models. We derive an expectation-maximization (EM) algorithm for learning the mixing coefficients as well as the parameters of the component models. To address the model selection problem, we use the Bayesian information criterion (BIC) to determine the number of clusters in the data. Experiments are conducted on a number of simulated and real datasets. Results from the experiments show that our method compares favorably with methods proposed previously for similar time series clustering tasks.
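A compact sketch of the EM idea, simplified from ARMA to AR(1) components so that weighted least squares gives closed-form M-steps (a stand-in, not the paper's full algorithm); BIC is reported for the fitted K, and the data are simulated:

```python
import numpy as np

def em_ar1_mixture(series, K, n_iter=100, seed=0):
    """EM for a mixture of AR(1) models, a simplified stand-in for the
    paper's ARMA mixtures; the series may have different lengths."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-0.5, 0.5, K)
    sig2, pi = np.ones(K), np.full(K, 1.0 / K)
    pairs = [(y[:-1], y[1:]) for y in series]          # (lagged, current)
    for _ in range(n_iter):
        # E-step: per-series log-likelihood under each component.
        ll = np.array([[(-0.5 * np.log(2 * np.pi * sig2[k])
                         - (yc - phi[k] * yl) ** 2 / (2 * sig2[k])).sum()
                        for k in range(K)] for yl, yc in pairs]) + np.log(pi)
        r = np.exp(ll - ll.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)              # responsibilities
        # M-step: weighted least squares per component.
        for k in range(K):
            w = r[:, k]
            phi[k] = (sum(wi * (yl @ yc) for wi, (yl, yc) in zip(w, pairs))
                      / sum(wi * (yl @ yl) for wi, (yl, _) in zip(w, pairs)))
            sig2[k] = (sum(wi * ((yc - phi[k] * yl) ** 2).sum()
                           for wi, (yl, yc) in zip(w, pairs))
                       / sum(wi * len(yl) for wi, (yl, _) in zip(w, pairs)))
        pi = r.mean(axis=0)
    loglik = np.logaddexp.reduce(ll, axis=1).sum()
    bic = -2 * loglik + (3 * K - 1) * np.log(len(series))
    return phi, r.argmax(axis=1), bic

rng = np.random.default_rng(7)
def ar1(phi, T):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + rng.normal()
    return y

series = ([ar1(0.8, int(rng.integers(80, 120))) for _ in range(15)]
          + [ar1(-0.5, int(rng.integers(80, 120))) for _ in range(15)])
phi, labels, bic = em_ar1_mixture(series, K=2)
print(phi, bic)        # phi should be near {0.8, -0.5} (order may differ)
```

In practice one would rerun this for several values of K and keep the K with the smallest BIC, which is the model selection step the paper describes.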

