Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Regression models are proposed for the joint analysis of Poisson and continuous longitudinal data with nonignorable missing values under a fully parametric framework. Our primary interest is to evaluate the influence of the covariates on both the Poisson and the continuous responses. First, we form the full likelihood with complete data using the multivariate Poisson model and a conditional multivariate normal distribution, and then construct an ECM algorithm to find the maximum likelihood estimates of the model parameters. Then, under the assumption that the missingness mechanisms for the two responses are independent but nonignorable, namely, dependent on both the observed and the missing data of the two responses, we choose the logit model for the missingness mechanisms and a selection model for the full likelihood. We also build two implementations of the Monte Carlo EM algorithm for estimating the parameters in the model. A Wald test is employed to test the significance of the covariates. Finally, we present the results of a Monte Carlo simulation to evaluate the performance of the proposed methodology, and an application to the Interstitial Cystitis Data Base (ICDB) cohort study. To the best of our knowledge, our model is the first parametric model for the joint analysis of Poisson and continuous longitudinal data with nonignorable missing values.

2.
Nonlinear mixed-effects (NLME) models are widely used for longitudinal data analyses. Time-dependent covariates are often introduced to partially explain inter-individual variation. These covariates often have missing data, and the missingness may be nonignorable. Likelihood inference for NLME models with nonignorable missing data in time-varying covariates can be computationally very intensive and may even run into computational difficulties such as nonconvergence. We propose a computationally very efficient method for approximate likelihood inference. The method is illustrated using a real data example.

3.
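Research on Bayesian network structure learning with missing data   Cited by: 40 (self-citations: 0, by others: 40)
Wang Shuangcheng, Yuan Senmiao. Journal of Software (《软件学报》), 2004, 15(7): 1042-1048
At present, Bayesian network structure learning with missing data relies mainly on the EM algorithm and score-and-search methods, which are inefficient and prone to getting trapped in locally optimal structures. To address these problems, a new method for learning Bayesian network structure from data with missing values is developed. First, the unobserved data are randomly initialized to obtain a complete data set, from which a maximum likelihood tree is built as the initial Bayesian network structure; iterative learning then follows. In each iteration, the unobserved data are revised by combining the current Bayesian network structure with Gibbs sampling and, on the basis of the new complete data set, the network structure is adjusted according to the basic dependency relationships among the variables and the idea of dependency analysis, until the structure stabilizes. The method not only resolves the standard Gi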

4.
An Approximate Bayesian Bootstrap (ABB) offers advantages in incorporating appropriate uncertainty when imputing missing data, but most implementations of the ABB have lacked the ability to handle nonignorable missing data where the probability of missingness depends on unobserved values. This paper outlines a strategy for using an ABB to multiply impute nonignorable missing data. The method allows the user to draw inferences and perform sensitivity analyses when the missing data mechanism cannot automatically be assumed to be ignorable. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented to demonstrate the method’s performance. We show that a procedure that uses a different type of ABB for each imputed data set accounts for appropriate uncertainty and provides nominal coverage.
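
As a rough illustration of the bootstrap step this abstract builds on, the sketch below shows the standard ABB for the ignorable case only: resample the observed values with replacement to form a donor pool, then draw the imputations from that pool. It does not implement the paper's nonignorable extension, and the function names `approximate_bayesian_bootstrap` and `multiply_impute`, along with the use of `None` to mark missing values, are hypothetical choices made for this example.

```python
import random


def approximate_bayesian_bootstrap(observed, n_missing, rng):
    """One ABB draw (ignorable case): returns n_missing imputed values."""
    # Step 1: resample the observed values with replacement to build a donor pool.
    donor_pool = [rng.choice(observed) for _ in observed]
    # Step 2: draw the imputations with replacement from that donor pool.
    return [rng.choice(donor_pool) for _ in range(n_missing)]


def multiply_impute(values, m=5, seed=0):
    """Return m completed copies of `values`, where None marks a missing entry."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    completed = []
    for _ in range(m):
        draws = iter(approximate_bayesian_bootstrap(observed, n_missing, rng))
        completed.append([v if v is not None else next(draws) for v in values])
    return completed
```

The extra resampling in step 1 is what lets the spread across the `m` completed data sets reflect uncertainty about the donor distribution itself, rather than only the randomness of the final draws.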

5.
An Approximate Bayesian Bootstrap (ABB) offers advantages in incorporating appropriate uncertainty when imputing missing data, but most implementations of the ABB have lacked the ability to handle nonignorable missing data where the probability of missingness depends on unobserved values. This paper outlines a strategy for using an ABB to multiply impute nonignorable missing data. The method allows the user to draw inferences and perform sensitivity analyses when the missing data mechanism cannot automatically be assumed to be ignorable. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented to demonstrate the method’s performance. We show that a procedure that uses a different type of ABB for each imputed data set accounts for appropriate uncertainty and provides nominal coverage.

6.
Generalized linear mixed models (GLMM) form a very general class of random effects models for discrete and continuous responses in the exponential family. They are useful in a variety of applications. The traditional likelihood approach for GLMM usually involves high-dimensional integrations which are computationally intensive. In this work, we investigate the case of binary outcomes analyzed under a two-stage probit normal model with random effects. First, it is shown how ML estimates of the fixed effects and variance components can be computed using a stochastic approximation of the EM algorithm (SAEM). The SAEM algorithm can be applied directly, or in conjunction with a parameter expansion version of EM to speed up the convergence. A procedure is also proposed to obtain REML estimates of variance components and REML-based estimates of fixed effects. Finally, an application to a real data set involving a clinical trial is presented, in which these techniques are compared to other procedures (penalized quasi-likelihood, maximum likelihood, Bayesian inference) already available in classical software (SAS Glimmix, SAS Nlmixed, WinBUGS), as well as to a Monte Carlo EM (MCEM) algorithm.

7.
A general procedure for fitting growth curves is proposed that can be applied to longitudinal data even if observations are missing or irregularly spaced. Maximum likelihood estimates for mean growths are obtained from an EM algorithm. Estimates for standard errors, percentiles, and growth velocities are also produced. The techniques are demonstrated through the use of growth data from a longitudinal study of sickle cell disease.

8.
This paper deals with maximum likelihood (ML) parameter estimation of continuous-time nonlinear partially observed stochastic systems, via the expectation maximization (EM) algorithm. It is shown that the EM algorithm can be executed efficiently, provided the unnormalized conditional density of nonlinear filtering is either explicitly solvable or numerically implemented. The methodology exploits the relationships between incomplete and complete data, log-likelihood and its gradient

9.
A broad range of studies of preventive measures in infectious diseases gives rise to incidence data from close contact groups. Parameters of common interest in such studies include transmission probabilities and efficacies of preventive or therapeutic interventions. We estimate these parameters using discrete-time likelihood models. We augment the data with unobserved pairwise transmission outcomes and fit the model using the EM algorithm. A linear model derived from the likelihood based on the augmented data and fitted with the iteratively reweighted least squares method is also discussed. Using simulations, we demonstrate the comparable accuracy and lower sensitivity to initial estimates of the proposed methods with data augmentation relative to the likelihood model based solely on the observed data. Two randomized household-based trials of zanamivir, an influenza antiviral agent, are analyzed using the proposed methods.

10.
A broad range of studies of preventive measures in infectious diseases gives rise to incidence data from close contact groups. Parameters of common interest in such studies include transmission probabilities and efficacies of preventive or therapeutic interventions. We estimate these parameters using discrete-time likelihood models. We augment the data with unobserved pairwise transmission outcomes and fit the model using the EM algorithm. A linear model derived from the likelihood based on the augmented data and fitted with the iteratively reweighted least squares method is also discussed. Using simulations, we demonstrate the comparable accuracy and lower sensitivity to initial estimates of the proposed methods with data augmentation relative to the likelihood model based solely on the observed data. Two randomized household-based trials of zanamivir, an influenza antiviral agent, are analyzed using the proposed methods.

11.
Many real-world clustering problems are plagued by incomplete data characterized by missing or absent features for some or all of the data instances. Traditional clustering methods cannot be directly applied to such data without preprocessing by imputation or marginalization techniques. In this article, we overcome this drawback by utilizing a penalized dissimilarity measure which we refer to as the feature weighted penalty based dissimilarity (FWPD). Using the FWPD measure, we modify the traditional k-means clustering algorithm and the standard hierarchical agglomerative clustering algorithms so as to make them directly applicable to datasets with missing features. We present time complexity analyses for these new techniques and also undertake a detailed theoretical analysis showing that the new FWPD based k-means algorithm converges to a local optimum within a finite number of iterations. We also present a detailed method for simulating random as well as feature dependent missingness. We report extensive experiments on various benchmark datasets for different types of missingness showing that the proposed clustering techniques have generally better results compared to some of the most well-known imputation methods which are commonly used to handle such incomplete data. We append a possible extension of the proposed dissimilarity measure to the case of absent features (where the unobserved features are known to be undefined).

12.
Multi-level nonlinear mixed effects (ML-NLME) models have received a great deal of attention in recent years because of the flexibility they offer in handling the repeated-measures data arising from various disciplines. In this study, we propose both maximum likelihood and restricted maximum likelihood estimations of ML-NLME models with two-level random effects, using first order conditional expansion (FOCE) and the expectation–maximization (EM) algorithm. The FOCE–EM algorithm was compared with the most popular Lindstrom and Bates (LB) method in terms of computational and statistical properties. Basal area growth series data measured from Chinese fir (Cunninghamia lanceolata) experimental stands and simulated data were used for evaluation. The FOCE–EM and LB algorithms gave the same parameter estimates and fit statistics for the models on which both converged. However, FOCE–EM converged for all the models, while LB did not, especially for the models in which two-level random effects are simultaneously considered in several base parameters to account for between-group variation. We recommend the use of FOCE–EM in ML-NLME models, particularly when convergence is a concern in model selection.

13.
In this paper, we consider the recurrent failures of several repairable units, which can only be observed at periodic inspection times. A unit does not age over the period between a failure and its detection. The failure times are interval censored by the periodic assessment times. The observed data consist of the censoring intervals of the failure times, and the unobserved data are the actual ages of the units at the failure times. We formulate the likelihood function and use several iterative algorithms to find the maximum likelihood estimate (MLE) of the parameters. The complete Expectation–Maximization (EM) algorithm, the EM gradient, full Newton–Raphson (NR), and the Simplex method are used. We derive recursive equations to calculate the expected values required in the algorithms. We estimate the parameters for four failure datasets, assuming that the failures follow a non-homogeneous Poisson process (NHPP). Three datasets are obtained from a hospital for the components of a general infusion pump, and the fourth dataset is simulated. Since the estimation could take a long time, we compare the performance of the algorithms in terms of the required number of iterations to converge, the total execution time, and the precision of the estimated parameters. We also use Monte Carlo and Quasi-Monte Carlo simulation as substitutes for the recursive procedures in the Expectation step of the EM gradient and compare the results.

14.
Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
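
To make the k-NN imputation step concrete, here is a minimal sketch under stated assumptions: numeric features, `None` marking a missing value, distance computed as the root mean squared difference over the features two rows both have observed, and the imputed value taken as the mean over the `k` nearest donors. The abstract does not say which k-NN variant the study used, so the function name `knn_impute` and all of these choices are illustrative only.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest donor rows."""

    def distance(a, b):
        # Compare only the features that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that have feature j observed
            # and share at least one observed feature with this row.
            donors = [
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # stays None when no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

Distances are computed from the original rows rather than from values imputed earlier in the loop, so the result does not depend on the order in which missing cells are visited.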

15.
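For data missing not at random in clustering problems, this paper uses the Gaussian mixture clustering model to analyze the effectiveness of the expectation-maximization algorithm for censored data and reveals the mechanism by which the censored-data likelihood function acts on the model algorithm. The proposed method is compared with the standard expectation-maximization algorithm in terms of indicators such as the Akaike information criterion and information divergence. Through the partitioning of the censored data and the use of indicator variables, the posterior probabilities and likelihood function of the clustering model parameters are derived, and the first- and second-order estimators of the truncated normal function are adjusted. Following the theory of the efficiency of estimation algorithms, the optimal parameters estimated by the algorithm are obtained from equations for the expectation of the score vector. For the same censored data set, the proposed clustering algorithm estimates the cluster centers more accurately. Experimental results confirm that, in Gaussian mixture clustering, the proposed algorithm outperforms the standard expectation-maximization algorithm for data missing at random.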

16.
The current computational power and some recently developed algorithms allow a new automatic spectral analysis method for randomly missing data. Accurate spectra and autocorrelation functions are computed from the estimated parameters of time series models, without user interaction. If only a few data are missing, the accuracy is almost the same as when all observations were available. For larger missing fractions, low-order time series models can still be estimated with a good accuracy if the total observation time is long enough. Autoregressive models are best estimated with the maximum likelihood method if data are missing. Maximum likelihood estimates of moving average and of autoregressive moving average models are not very useful with missing data. Those models are found most accurately if they are derived from the estimated parameters of an intermediate autoregressive model. With statistical criteria for the selection of model order and model type, a completely automatic and numerically reliable algorithm is developed that estimates the spectrum and the autocorrelation function in randomly missing data problems. The accuracy was better than what can be obtained with other methods, including the famous expectation–maximization (EM) algorithm.

17.
On classification with incomplete data   Cited by: 4 (self-citations: 0, by others: 4)
We address the incomplete-data problem in which feature vectors to be classified are missing data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both expectation-maximization (EM) and variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm utilizes all available data, both incomplete and complete, as well as labeled and unlabeled. Experimental results of the proposed classification algorithms are shown

18.
A technique is presented for analyzing the statistical characteristics of turbulent fluctuations, based on volatility decomposition using EM-type algorithms. Changes in these characteristics under varying conditions of microwave field excitation are analyzed. Good consistency is demonstrated between the results obtained using the EM algorithm and its stochastic modification (the SEM algorithm) and the processes actually observed in a turbulent plasma.

19.
Nonlinear structural equation models with nonignorable missing outcomes from reproductive dispersion models are proposed to identify the relationship between manifest variables and latent variables in modern educational, medical, social and psychological studies. The nonignorable missing mechanism is specified by a logistic regression model. An EM algorithm is developed to obtain the maximum likelihood estimates of the structural parameters and parameters in the logistic regression model. Assessment of local influence is investigated in nonlinear structural equation models with nonignorable missing outcomes from reproductive dispersion models on the basis of the conditional expectation of the complete-data log-likelihood function. Some local influence diagnostics are obtained via observations of missing data and latent variables that are generated by the Gibbs sampler and Metropolis-Hastings algorithm on the basis of the conformal normal curvature. A simulation study and a real example are used to illustrate the application of the proposed methodologies.

20.
The presence of rounded zeros results in an important drawback for the statistical analysis of compositional data. Data analysis methodology based on log-ratios cannot be applied under these conditions. In this paper rounded zeros are considered as a special kind of missing data. Thus, an EM-type computational algorithm for replacing them is provided. The procedure is based on the additive logistic transformation and assumes an additive logistic normal model for the data. First, the alr transformation moves data from the constrained simplex space to the unconstrained real space. Next, missing transformed data are imputed by using modified EM steps. Last, imputed data are transformed back into the simplex space to obtain a compositional data set free of rounded zeros. Additionally, a sequential strategy is proposed for the case of rounded zeros in all the components of a composition. This work focuses on the algorithm's properties and on computational implementation details. Also, its effectiveness on simulated data sets with a range of detection limits is analyzed. Special attention is paid to the effects on the covariance structure of a compositional data set. Results confirm the good behavior of our proposal. Finally, MATLAB routines implementing the algorithm are made available to the reader.
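
The first and last steps described in this abstract, moving a composition out of the simplex and back, can be sketched as follows. This is only an illustration of the additive log-ratio (alr) transformation and its inverse, with the last component used as the reference part; it does not reproduce the paper's modified EM imputation or its MATLAB routines, and the function names `alr` and `alr_inverse` are hypothetical.

```python
import math


def alr(composition):
    """Additive log-ratio transform, using the last part as the reference.

    Maps a D-part composition with strictly positive parts to D-1 real coordinates.
    """
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def alr_inverse(coords):
    """Map D-1 real alr coordinates back to a D-part composition summing to 1."""
    exps = [math.exp(c) for c in coords]
    total = 1.0 + sum(exps)
    return [e / total for e in exps] + [1.0 / total]
```

Because `alr` takes the logarithm of each ratio, a zero in any non-reference part makes `math.log` raise a `ValueError`, and a zero reference part causes a division by zero. This is the practical reason rounded zeros have to be replaced before log-ratio methods can be applied.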
