Similar Documents
20 similar documents retrieved (search time: 156 ms)
1.
A two-component parametric mixture is proposed to model survival after an invasive treatment, when patients may experience different hazard regimes: a risk of early mortality directly related to the treatment and/or the treated condition, and a risk of late death influenced by several exogenous factors. The parametric mixture is based on Weibull distributions for both components. Different sets of covariates can affect the Weibull scale parameters and the probability of belonging to one of the two latent classes. A logarithmic function links the explanatory variables to the scale parameters, while a logistic link is assumed for the probability of the latent classes. Inference about the unknown parameters is developed in a Bayesian framework: point and interval estimates are based on posterior distributions, whereas the Schwarz criterion is used for testing hypotheses. The advantages of the approach are illustrated by analyzing data from an aorta aneurysm study.
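A minimal sketch of the mixture survival function this abstract describes: two Weibull components, a log link for the scale parameters, and a logistic link for the mixing probability. All variable names, covariates, and parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def weibull_sf(t, shape, scale):
    # Weibull survival function S(t) = exp(-(t/scale)^shape)
    return np.exp(-(t / scale) ** shape)

def mixture_survival(t, x, z, beta, gamma, shapes):
    # Probability of the "early mortality" class via a logistic link on z
    p_early = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * z)))
    # Scale parameters via a logarithmic link on x (different covariate sets allowed)
    scale_early = np.exp(beta[0] + beta[1] * x)
    scale_late = np.exp(beta[2] + beta[3] * x)
    return (p_early * weibull_sf(t, shapes[0], scale_early)
            + (1 - p_early) * weibull_sf(t, shapes[1], scale_late))

s = mixture_survival(t=1.0, x=0.5, z=1.0,
                     beta=[0.0, 0.2, 1.5, 0.1],
                     gamma=[-1.0, 0.5], shapes=[0.8, 2.0])
print(round(s, 4))
```

A decreasing shape (< 1) for the early component gives the high initial hazard typical of post-treatment mortality, while the late component's shape > 1 gives a hazard that rises with age.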

2.
Land use changes have a pronounced impact on hydrology. Vice versa, hydrologic changes affect land use patterns. The objective of this study is to test whether hydrologic variables can explain land use change. We employ a set of spatially distributed hydrologic variables and compare it against a set of commonly used explanatory variables for land use change. The explanatory power of these variables is assessed by using a logistic regression approach to model the spatial distribution of land use changes in a meso-scale Indian catchment. When hydrologic variables are additionally included, the accuracies of the logistic regression models improve, which is indicated by a change in the relative operating characteristic statistic (ROC) by up to 11%. This is mostly due to the complementarity of the two datasets that is reflected in the use of 44% commonly used variables and 56% hydrologic variables in the best models for land use change.
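The comparison the abstract reports can be sketched as fitting logistic regressions with and without an extra predictor and comparing their ROC statistics. This is a self-contained numpy illustration on synthetic data; the "hydrologic" variable and effect sizes are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.1, steps=2000):
    # Plain gradient ascent on the logistic log-likelihood; X includes an intercept column.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

def auc(scores, y):
    # Rank-based AUC, the same quantity as the ROC statistic used for model comparison.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = y.sum(); n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

# Synthetic data: land use change driven partly by a "hydrologic" variable h
n = 500
x = rng.normal(size=n)          # commonly used predictor
h = rng.normal(size=n)          # hydrologic predictor
y = (rng.random(n) < 1 / (1 + np.exp(-(x + 1.5 * h)))).astype(float)

X_base = np.column_stack([np.ones(n), x])
X_full = np.column_stack([np.ones(n), x, h])
auc_base = auc(X_base @ fit_logistic(X_base, y), y)
auc_full = auc(X_full @ fit_logistic(X_full, y), y)
print(auc_base < auc_full)
```

The model including the hydrologic predictor scores a higher AUC, mirroring the up-to-11% ROC improvement reported in the study.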

3.
Ma Jianghong, Zhang Wenxiu, Liang Yi. 《计算机学报》 (Chinese Journal of Computers), 2003, 26(12): 1652-1659
Complex, massive data sets often appear as mixtures of multiple structural features, and mixture-of-regressions models describe such mixtures. Based on the finite mixture distribution theory of statistics and related identifiability results, this paper discusses the identifiability of general mixture-of-regression-classes models for three settings of the regression variables: (1) fixed explanatory variables, (2) random explanatory variables, and (3) fixed explanatory variables with specified class parameters, and gives sufficient conditions for the identifiability of same-family mixture-of-regression-classes models. A common feature of these conditions is that they all involve a special class of explanatory-variable sets, uniquely determined by the same-family regression functions and regression parameters; the elements of such a set make different regression parameters yield the same value of the regression function. In particular, when the regression function is linear, these sets are hyperplanes in the explanatory-variable space.

4.
A novel logistic multi-class supervised classification model based on multi-fractal spectrum parameters is proposed, both to avoid the error caused by the difference between the real data distribution and a hypothetical Gaussian distribution, and to avoid the computational burden of applying logistic regression classification directly to hyperspectral data. First, the multi-fractal spectra and parameters are calculated from training samples along the spectral dimension of the hyperspectral data. Second, logistic regression is employed because it is a distribution-free nonlinear model based on conditional probability, requiring no Gaussian assumption on the random variables; the obtained multi-fractal parameters are used to establish the multi-class logistic regression classification model. Finally, the Newton–Raphson method is applied to estimate the model parameters via maximum likelihood. The classification results of the proposed model are compared with a logistic regression classification model based on an adaptive band selection method, using Airborne Visible/Infrared Imaging Spectrometer and airborne Push Hyperspectral Imager data. The results show that the proposed approach achieves better accuracy at a lower computational cost.

5.
To model fuzzy binary observations, a new model named "Fuzzy Logistic Regression" is proposed and discussed in this study. Because of the vague nature of binary observations, no probability distribution can be assumed for these data, so ordinary logistic regression may not be appropriate. This study constructs a fuzzy model based on the possibility of success, with possibilities defined by linguistic terms such as low, medium, and high. Then, using the extension principle, the logarithmic transformation of the "possibilistic odds" is modeled on a set of crisp explanatory-variable observations. To estimate the parameters of the proposed model, the least squares method of fuzzy linear regression is used. To evaluate the model, a criterion named the "capability index" is calculated. Finally, because logistic regression is widely applied in clinical studies and vague observations abound in clinical diagnosis, cases of suspected Systemic Lupus Erythematosus (SLE) are modeled on several significant risk factors to demonstrate the application of the model. The results show that the proposed model can be a rational substitute for the ordinary one when modeling vague clinical status.

6.
Practitioners use Trauma and Injury Severity Score (TRISS) models for predicting the survival probability of an injured patient. The accuracy of TRISS predictions is acceptable for patients with up to three typical injuries, but unacceptable for patients with a larger number of injuries or with atypical injuries. Based on a regression model, the TRISS methodology does not provide the predictive density required for accurate assessment of risk. Moreover, the regression model is difficult to interpret. We therefore consider Bayesian inference for estimating the predictive distribution of survival. The inference is based on decision tree models which recursively split data along explanatory variables, and so practitioners can understand these models. We propose the Bayesian method for estimating the predictive density and show that it outperforms the TRISS method in terms of both goodness-of-fit and classification accuracy. The developed method has been made available for evaluation purposes as a stand-alone application.
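For context, the regression-based TRISS baseline the paper improves on is a single logistic formula. A hedged sketch of its form is below; the coefficient values here are placeholders for illustration only, not the published MTOS coefficients, and `rts`, `iss`, `age_index` stand for the Revised Trauma Score, Injury Severity Score, and an age indicator.

```python
import math

def triss_survival(rts, iss, age_index, b=(-0.4499, 0.8085, -0.0835, -1.7430)):
    # TRISS-style logit: b0 + b1*RTS + b2*ISS + b3*AgeIndex.
    # These coefficients are illustrative placeholders.
    logit = b[0] + b[1] * rts + b[2] * iss + b[3] * age_index
    return 1 / (1 + math.exp(-logit))

# A mildly injured adult: high RTS, low ISS -> survival probability near 1
print(round(triss_survival(rts=7.84, iss=9, age_index=0), 3))
```

Note that this form outputs only a point probability, which is exactly the limitation the abstract raises: it provides no predictive density, motivating the Bayesian decision-tree alternative.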

7.
This study focuses on one of the most critical issues in modeling under severe uncertainty: determining the relative importance (weight) of the explanatory variables. The ability to determine the relative importance of explanatory variables, and the reliability of that determination, are of utmost importance to decision makers who use such models as components of decision support or decision making. We compare the reliability of the traditional method, multiple linear regression, with fuzzy logic-based soft regression. We provide a case study (a cross-national model of background factors facilitating economic growth) to illustrate the performance of both methods. We conclude that soft regression is the more reliable and consistent tool for determining the relative importance of explanatory variables.

8.
A flexible Bayesian approach to a generalized linear model is proposed to describe the dependence of binary data on explanatory variables. The inverse of the exponential power cumulative distribution function is used as the link to the binary regression model. The exponential power family provides distributions with both lighter and heavier tails compared to the normal distribution, and includes the normal and an approximation to the logistic distribution as particular cases. The idea of using a data augmentation framework and a mixture representation of the exponential power distribution is exploited to derive efficient Gibbs sampling algorithms for both informative and noninformative settings. Some examples are given to illustrate the performance of the proposed approach when compared with other competing models.
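The exponential power family is available in SciPy as the generalized normal distribution (`scipy.stats.gennorm`), whose shape parameter controls the tail weight: shape 2 is the Gaussian case and shape 1 the heavy-tailed Laplace case. A small sketch of using its CDF as the link function, as the abstract describes (the function name and arguments are mine):

```python
from scipy.stats import gennorm

def ep_link_prob(eta, beta_shape):
    # P(y = 1 | eta) via the exponential power CDF as the (inverse) link.
    # beta_shape = 2 recovers a probit-style (normal) link up to scale;
    # smaller beta_shape gives heavier tails, so extreme linear predictors
    # are pulled less sharply toward 0 or 1.
    return gennorm.cdf(eta, beta_shape)

# Symmetry: probability 0.5 at eta = 0 for any shape
print(ep_link_prob(0.0, 1.5))
```

Comparing `ep_link_prob(-3.0, 1.0)` with `ep_link_prob(-3.0, 2.0)` shows the heavier-tailed link assigning noticeably more probability to the rare outcome, which is the robustness property the paper exploits.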

9.
As biometric authentication systems become more prevalent, it is becoming increasingly important to evaluate their performance. This paper introduces a novel statistical method of performance evaluation for these systems. Given a database of authentication results from an existing system, the method uses a hierarchical random effects model, along with Bayesian inference techniques yielding posterior predictive distributions, to predict performance in terms of error rates using various explanatory variables. By incorporating explanatory variables as well as random effects, the method allows for prediction of error rates when the authentication system is applied to potentially larger and/or different groups of subjects than those originally documented in the database. We also extend the model to allow for prediction of the probability of a false alarm on a "watch-list" as a function of the list size. We consider application of our methodology to three different face authentication systems: a filter-based system, a Gaussian mixture model (GMM)-based system, and a system based on frequency domain representation of facial asymmetry.

10.
In this paper a FORTRAN program is presented for multivariate survival or life-table regression analysis in a competing-risks situation. The relevant failure rate (for example, a particular disease or mortality rate) is modelled as a log-linear function of a vector of (possibly time-dependent) explanatory variables. The explanatory variables may also include time itself, which is useful for parameterizing piecewise exponential time-to-failure distributions in a Gompertz-like or Weibull-like way, as a more efficient alternative to Cox's proportional hazards model. Maximum likelihood estimates of the coefficients of the log-linear relationship are obtained by the iterative Newton-Raphson method. The program runs on a personal computer under DOS; running time is quite acceptable, even for large samples.

11.
The purpose of this study was to develop methods for exceedance probability estimation in the case of highly scattered measurement sets. This situation may occur when product quality is verified with several test samples, so that traditional point-prediction-based modelling methods are not sufficient. Density forecasting methods are needed when not only the mean but also the deviance and the distribution shape of the response depend on the explanatory variables. Furthermore, with probability predictors, the ranking methods for model selection should be chosen carefully when models trained with different methods are compared. In this article, the impact toughness of steel products was modelled. The rejection probability in the Charpy-V quality test was predicted with mean and deviation models, a distribution shape model, and a quantile regression model. The proposed methods were employed in two steel manufacturing applications with different distributional properties.

12.
Variable selection for Poisson regression when the response variable is potentially underreported is considered. A logistic regression model is used to model the latent underreporting probabilities. An efficient MCMC sampling scheme is designed, incorporating uncertainty about which explanatory variables affect the dependent variable and which affect the underreporting probabilities. Validation data is required in order to identify and estimate all parameters. A simulation study illustrates favorable results both in terms of variable selection and parameter estimation. Finally, the procedure is applied to a real data example concerning deaths from cervical cancer.
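The core likelihood behind this setup follows from Poisson thinning: if true counts are Poisson with mean lam and each event is reported with probability p, the observed counts are Poisson with mean lam * p. A minimal numpy sketch of that likelihood on synthetic data (covariate names and values are mine; as the abstract notes, in practice validation data is needed to separate lam from p):

```python
import numpy as np

rng = np.random.default_rng(3)

def underreported_loglik(y, X, V, beta, gamma):
    # True counts: Poisson with log link, lam = exp(X @ beta).
    # Reporting probability: logistic in covariates V.
    # A thinned Poisson stays Poisson, with mean lam * p.
    lam = np.exp(X @ beta)
    p = 1 / (1 + np.exp(-(V @ gamma)))
    mu = lam * p
    return np.sum(y * np.log(mu) - mu)    # log(y!) constant dropped

# Synthetic check: the likelihood prefers the generating parameters
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, gamma_true = np.array([1.0, 0.5]), np.array([0.5, -1.0])
mu = np.exp(X @ beta_true) / (1 + np.exp(-(V @ gamma_true)))
y = rng.poisson(mu)
ll_true = underreported_loglik(y, X, V, beta_true, gamma_true)
ll_off = underreported_loglik(y, X, V, beta_true + 1.0, gamma_true)
print(ll_true > ll_off)
```

The paper's MCMC scheme adds variable-selection indicators on top of this likelihood; the sketch only shows the observation model itself.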

13.
Operator reliability in complex systems is influenced by various performance shaping factors (PSFs). Time is a particularly important PSF; however, empirical studies in human reliability analysis (HRA) rarely focus on modeling the effect of the time PSF on human error probability (HEP). This study contributes to the HRA literature by investigating the empirical relationship between time margin and HEP. Time margin is defined as the difference between the time available to complete a task and the time required to successfully complete the task, divided by the required time. We investigate and compare two models (logistic and linear) to explain HEP based on time margin. The empirical HEP data for model testing were extracted from a microworld simulator (Study 1) and a full-scope simulator (Study 2) in two existing studies relevant to procedural tasks in nuclear power plants. For Study 1, both models exhibited an acceptable, equivalent explanatory power; for Study 2, although both models exhibited an acceptable explanatory ability, the logistic model explained more variance in HEP. Our findings indicate the potential of the logistic model in explaining and predicting HEP based on time margin in time-critical tasks.
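The two candidate forms compared in the study can be sketched directly; parameter values here are illustrative, not fitted to the simulator data. The logistic form keeps HEP inside [0, 1] by construction, whereas the linear form must be clipped.

```python
import numpy as np

def hep_logistic(margin, a, b):
    # HEP falls toward 0 as the time margin grows, rises toward 1 as it shrinks.
    return 1 / (1 + np.exp(a + b * margin))

def hep_linear(margin, c, d):
    # Linear alternative, clipped to the [0, 1] probability range.
    return np.clip(c - d * margin, 0.0, 1.0)

# margin = (available - required) / required; negative means not enough time
m = np.array([-0.5, 0.0, 0.5, 1.0, 2.0])
print(hep_logistic(m, a=0.0, b=3.0).round(3))
```

The built-in saturation of the logistic curve at extreme margins is one plausible reason it explained more HEP variance in the full-scope simulator data.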

14.
Methods for estimating the parameters of the logistic regression model when the data are collected using a case-control (retrospective) scheme are compared. The regression coefficients are estimated by maximum likelihood; this leaves the constant-term parameter to be estimated. Four methods for estimating this parameter are proposed. The comparison of the four estimators is in two parts. First, they are compared for large samples, via the asymptotic distribution of the estimators. Second, the estimators are compared for small samples, via simulation using 11 logistic models. The main focus of this paper is estimation of the posterior probability that the response variable is a success (Px), as given by the logistic regression model, when the constant parameter is estimated by each of the four proposed methods. A third concern is the comparison of logistic discriminant procedures when each of the four methods of estimating the constant parameter is used; the linear discriminant function procedure is also included. This comparison is executed only for small samples, via simulation. When estimating Px, method 1 (which is essentially the MLE) minimizes the expected mean square error. The results were not as clear when the parameter of interest was the constant term itself. The classification comparisons implied that when the logistic model contains mostly (or all) binary regression variables, the logistic discriminant procedure using method 1 to estimate the constant term gives the minimum expected error rate; otherwise the linear discriminant function gives the minimum expected error rate. In the latter case the logistic discriminant procedure (with the method 1 estimator of the constant term) is approximately as good.

15.
This paper presents an enrollment management model based on artificial neural networks (ANNs). The aim of the research presented here is to show that ANNs predict more successfully than the classical statistical method of regression analysis (logistic regression). Both predictive models, whether based on ANNs or on logistic regression, offer satisfactory predictive results, and both can support the decision-making process. However, the model based on neural networks shows certain advantages. ANNs do not demand an understanding of the functional connection between independent and dependent variables in order to evaluate the model. They also adapt easily to correlated independent variables, without suffering from the problem of multicollinearity. In contrast to logistic regression, neural networks can recognize nonlinearity and interactions in the input data and respond to them.

16.
The relationship between time evolution of stress and flares in Systemic Lupus Erythematosus patients has recently been studied. Daily stress data can be considered as observations of a single variable for a subject, carried out repeatedly at different time points (functional data). In this study, we propose a functional logistic regression model with the aim of predicting the probability of lupus flare (binary response variable) from a functional predictor variable (stress level). This method differs from the classical approach, in which longitudinal data are considered as observations of different correlated variables. The estimation of this functional model may be inaccurate due to multicollinearity, and so a principal component based solution is proposed. In addition, a new interpretation is made of the parameter function of the model, which enables the relationship between the response and the predictor variables to be evaluated. Finally, the results provided by different logit approaches (functional and longitudinal) are compared, using a sample of Lupus patients.
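The principal-component route described here amounts to: center the observed curves, extract leading functional principal component scores, and run an ordinary logistic regression on those scores. A self-contained numpy sketch on synthetic "stress curves" (the curve shapes, sample sizes, and effect sizes are all invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stress curves: n subjects observed on a common time grid
n, T = 120, 30
t = np.linspace(0, 1, T)
scores_true = rng.normal(size=n)
curves = scores_true[:, None] * np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(n, T))
y = (rng.random(n) < 1 / (1 + np.exp(-2 * scores_true))).astype(float)

# Functional PCA: SVD of the centered curves; keep leading score components
C = curves - curves.mean(axis=0)
U, S, Vt = np.linalg.svd(C, full_matrices=False)
pc_scores = U[:, :2] * S[:2]          # principal component scores per subject

# Logistic regression on the PC scores (gradient ascent on the log-likelihood)
X = np.column_stack([np.ones(n), pc_scores])
w = np.zeros(X.shape[1])
for _ in range(3000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / n
acc = ((X @ w > 0) == (y == 1)).mean()
print(round(acc, 2))
```

Replacing the highly collinear pointwise curve values with a few orthogonal scores is exactly what stabilizes the estimation, which is the multicollinearity fix the abstract refers to.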

17.
A fuzzy regression model is developed to construct the relationship between the response and explanatory variables in fuzzy environments. To enhance explanatory power and take into account the uncertainty of the formulated model and parameters, a new operator, called the fuzzy product core (FPC), is proposed for the formulation processes to establish fuzzy regression models with fuzzy parameters using fuzzy observations that include fuzzy response and explanatory variables. In addition, the sign of the parameters can be determined during the model-building processes. Compared to existing approaches, the proposed approach reduces the amount of unnecessary or unimportant information arising from fuzzy observations and determines the sign of the parameters in the models, increasing model performance. This addresses a weakness of the related approaches, in which the parameters in the models are fuzzy and must be predetermined in the formulation processes. The proposed approach outperforms existing models in terms of distance, mean similarity, and credibility measures, even when crisp explanatory variables are used.

18.
The relationship between time evolution of stress and flares in Systemic Lupus Erythematosus patients has recently been studied. Daily stress data can be considered as observations of a single variable for a subject, carried out repeatedly at different time points (functional data). In this study, we propose a functional logistic regression model with the aim of predicting the probability of lupus flare (binary response variable) from a functional predictor variable (stress level). This method differs from the classical approach, in which longitudinal data are considered as observations of different correlated variables. The estimation of this functional model may be inaccurate due to multicollinearity, and so a principal component based solution is proposed. In addition, a new interpretation is made of the parameter function of the model, which enables the relationship between the response and the predictor variables to be evaluated. Finally, the results provided by different logit approaches (functional and longitudinal) are compared, using a sample of Lupus patients.

19.
Classical nonlinear expectile regression has two shortcomings: it is difficult to choose a suitable nonlinear function, and it does not consider interaction effects among explanatory variables. We therefore combine the random forest model with the expectile regression method to propose a new nonparametric expectile regression model: the expectile regression forest (ERF). The major novelty of the ERF model is that it uses the bagging method to build multiple decision trees, calculates the conditional expectile at each leaf node of each decision tree, and derives final results by aggregating the decision-tree results via a simple averaging approach. At the same time, to compensate for the black-box problem in interpreting the ERF model, measures of explanatory-variable importance and partial dependence are defined to evaluate the magnitude and direction of the influence of each explanatory variable on the response variable. The advantage of the ERF model is illustrated by Monte Carlo simulation studies. The numerical simulation results show that the estimation and prediction ability of the ERF model is significantly better than that of alternative approaches. We also apply the ERF model to analyse real data. From the nonparametric expectile regression analysis of these data sets, we draw several conclusions that are consistent with the results of the numerical simulation.
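The leaf-node computation at the heart of ERF is the tau-expectile of the leaf's response values: the asymmetric-least-squares analogue of a quantile. It can be computed by a simple fixed-point iteration (a generic sketch of the expectile itself, not the authors' forest code):

```python
import numpy as np

def expectile(values, tau, tol=1e-9):
    # Fixed-point iteration for the tau-expectile m, which solves
    #   tau * sum_{v > m} (v - m) = (1 - tau) * sum_{v <= m} (m - v).
    v = np.asarray(values, dtype=float)
    m = v.mean()
    while True:
        w = np.where(v > m, tau, 1 - tau)   # asymmetric squared-loss weights
        m_new = np.sum(w * v) / np.sum(w)   # weighted mean update
        if abs(m_new - m) < tol:
            return m_new
        m = m_new

vals = [1.0, 2.0, 3.0, 10.0]
print(expectile(vals, 0.5))   # tau = 0.5 recovers the ordinary mean
```

In an ERF, each tree stores this statistic per leaf and the forest prediction averages the per-tree leaf expectiles; raising tau above 0.5 shifts the estimate toward the upper tail of the leaf's values.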

20.
In this paper a robust linear regression method with variable selection is proposed for predicting desirable end-of-line quality variables in complex industrial processes. The development of such prediction models is challenging because there is usually a large pool of candidate explanatory variables, limited sample data, and multicollinearity among the explanatory variables. The proposed method is named enumerative partial least squares based nonnegative garrote regression. It employs partial least squares regression in an enumerative manner to generate initial model coefficients, and then uses a nonnegative garrote method to shrink the original coefficients so that irrelevant variables are eliminated implicitly. An analysis of the advantages of the proposed method is provided, compared to existing state-of-the-art model construction methods. Two simulation examples as well as an industrial application in a local semiconductor factory unit are used to validate the proposed method. These examples demonstrate substantial improvement in accuracy and in robustness of variable selection compared to existing methods. Specifically, for the industrial case the improvement in root mean squared error is up to 24.3% compared with the previous work.
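The nonnegative garrote step can be sketched on its own: given initial coefficients (here plain OLS standing in for the paper's enumerative PLS), find nonnegative shrinkage factors c minimizing the penalized squared error, so that factors for irrelevant variables are driven exactly to zero. This is a simplified numpy sketch using projected gradient descent with a sum penalty, not the paper's exact formulation; all data and names are synthetic.

```python
import numpy as np

def nn_garrote(X, y, beta_init, lam, steps=5000):
    # Nonnegative garrote: shrink initial coefficients beta_init by factors
    # c >= 0, minimizing ||y - X(c * beta_init)||^2 + lam * sum(c),
    # via projected gradient descent as a simple stand-in for a QP solver.
    Z = X * beta_init                               # column j scaled by beta_init[j]
    lr = 1.0 / (2 * np.linalg.norm(Z.T @ Z, 2))     # step below the Lipschitz bound
    c = np.ones(X.shape[1])
    for _ in range(steps):
        grad = -2 * Z.T @ (y - Z @ c) + lam
        c = np.maximum(c - lr * grad, 0.0)          # project onto c >= 0
    return c * beta_init                            # shrunken coefficients

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + 0.1 * rng.normal(size=n)
beta_init = np.linalg.lstsq(X, y, rcond=None)[0]    # initial full fit
beta_ng = nn_garrote(X, y, beta_init, lam=50.0)
print(np.round(beta_ng, 2))
```

The two truly irrelevant coefficients come out exactly zero while the informative ones are only mildly shrunk, which is the implicit variable elimination the abstract describes.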


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号