Similar Documents
1.
Binary outcomes are very common in medical studies. Logistic regression is typically used to analyze independent binary outcomes, while generalized estimating equations (GEE) regression methods are often used to analyze correlated binary data. Several goodness-of-fit (GoF) statistics for GEE methods have been developed recently. The objective of this study is to compare the power and Type I error rates of existing GEE GoF statistics using data simulated under different conditions. The number of clusters was varied in each condition. The models tested included discrete, continuous, observation-specific and/or cluster-specific covariates. Two or three observations per cluster were generated, with various correlations between observations. No single GEE GoF statistic performed best across all conditions. Generally, the larger the number of clusters, the more powerful the GEE GoF statistics. GEE GoF statistics computed with correctly specified working correlation matrices tended to have robust Type I error rates and greater power. For data with two observations per cluster, both Evans and Pan's statistics [1998. Goodness of fit in two models for clustered binary data. Ph.D. Dissertation, University of Massachusetts; 2002a. Goodness-of-fit tests for GEE with correlated binary data. Scand. J. Stat. 29(1), 101–110.] and Barnhart–Williamson's statistics [1998. Goodness-of-fit tests for GEE modeling with binary data. Biometrics 54, 720–729.] performed well for detecting the effect of an omitted interaction between two binary covariates. Barnhart–Williamson's statistics were generally the most powerful for detecting other types of interactions in models with at least one continuous covariate. For data with three observations per cluster, Evans and Pan's statistics performed best.

2.
Longitudinal studies involving categorical responses are widely used in many fields of research, and such data are often fitted by the generalized estimating equations (GEE) approach or by generalized linear mixed models (GLMMs). Assessing model fit is an important part of model-based inference. The purpose of this article is to extend Pan's (2002a) goodness-of-fit tests for GEE models with longitudinal binary data to logistic proportional odds models with longitudinal ordinal data. Two methods, based on the Pearson chi-squared test and on the unweighted sum of squared residuals, are developed, and the approximate expectations and variances of the test statistics are easily computed. Four major working correlation structures (independent, AR(1), exchangeable and unspecified) are considered for estimating the variances of the proposed test statistics. Simulation studies of the type I error rates and power of the proposed tests are presented for various sample sizes. Furthermore, the approaches are illustrated with two real data sets.
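
In the binary case, the two raw statistics named above can be computed directly from the fitted marginal probabilities of a GEE fit. The sketch below assumes those fitted probabilities are already available; the function name `pan_gof_statistics` is hypothetical, and neither the ordinal extension nor the approximate expectations and variances needed to standardize G and U into test statistics are implemented here.

```python
def pan_gof_statistics(y, mu):
    """Return (G, U) for binary outcomes y and fitted probabilities mu.

    y and mu are lists of clusters, each a list of per-observation values.
    G: Pearson chi-square statistic, sum of (y - mu)^2 / (mu * (1 - mu)).
    U: unweighted sum of squared residuals, sum of (y - mu)^2.
    """
    g = 0.0
    u = 0.0
    for y_i, mu_i in zip(y, mu):
        if len(y_i) != len(mu_i):
            raise ValueError("cluster sizes of y and mu differ")
        for y_ij, mu_ij in zip(y_i, mu_i):
            if not 0.0 < mu_ij < 1.0:
                raise ValueError("fitted probabilities must lie strictly in (0, 1)")
            r2 = (y_ij - mu_ij) ** 2
            g += r2 / (mu_ij * (1.0 - mu_ij))  # variance-weighted residual
            u += r2                            # unweighted residual
    return g, u
```

For example, `pan_gof_statistics([[1, 0], [1, 1]], [[0.5, 0.5], [0.8, 0.6]])` gives G ≈ 2.917 and U ≈ 0.70, up to floating-point rounding.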

3.
Many studies in biomedical fields evaluate the agreement among multiple ratings using diagnoses reported by different raters. The most popular agreement indices are kappa measures, including Cohen's kappa and weighted kappa for binary and ordinal outcomes, respectively. However, when raters assess the same observation on two or more occasions, the ratings are dependent, so the correlation between kappa estimates must be taken into account when making inferences. In this paper, we focus on testing the equality of correlated kappa coefficients using the generalized estimating equation (GEE) approach, applying the quasi-likelihood under the independence model criterion (QIC) for model selection. Simulation studies are conducted to compare the performance of GEE with and without QIC measures, weighted least squares (WLS) and independence approaches for binary and ordinal data. Two applications are illustrated: a comparison of two methods for assessing cervical ectopy, and the similarity in myopic status in monozygous and dizygous twins. We conclude that performing the QIC model-selection procedure in GEE models while taking the correlation between kappa measures into account yields type I error rates close to the nominal level and greater power.
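
For reference, the single-pair kappa estimates that such comparisons start from can be computed as below. This is a minimal sketch of Cohen's unweighted and weighted kappa, written as one minus the ratio of observed to chance-expected disagreement; the function name and the linear/quadratic weight options are illustrative assumptions, and the paper's GEE-based test for the equality of correlated kappas is not reproduced.

```python
from collections import Counter


def cohen_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa for two raters rating the same items.

    categories: all possible categories, listed in their natural order.
    weights: None for unweighted kappa, or "linear" / "quadratic" for
    weighted kappa on ordinal categories.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    joint = Counter((index[a], index[b]) for a, b in zip(r1, r2))
    m1 = Counter(index[a] for a in r1)  # rater-1 marginal counts
    m2 = Counter(index[b] for b in r2)  # rater-2 marginal counts

    def disagreement(i, j):
        # 0 on the diagonal; 1 off it (unweighted) or a distance-based weight.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * joint[(i, j)] for i, j in cells) / n
    expected = sum(disagreement(i, j) * m1[i] * m2[j] for i, j in cells) / (n * n)
    if expected == 0.0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected
```

With unweighted agreement, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0], [0, 1])` returns 0.5: observed agreement is 0.75 and chance agreement is 0.5.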

4.
The estimation of correlation parameters has received attention both in its own right and because it can improve the efficiency of mean-parameter estimation in the generalized estimating equations (GEE) approach. Many well-established methods for estimating correlation parameters can be constructed within the GEE framework, which is, however, sensitive to outliers. In this paper, we consider two ways of constructing robust estimating equations to achieve robust estimation of the correlation parameters. Furthermore, the correlation-parameter estimators from the robustified GEE may still be biased, because the expectation of the estimating equation deviates from zero when the underlying distribution is not symmetric. Therefore, bias-corrected robust estimators of the correlation parameters are proposed. The performance of the proposed methods is investigated by simulation. The results show that the proposed robust and bias-corrected robust estimators successfully reduce the bias. Two real data sets are analyzed for illustration.

5.
Semi-parametrically specified models for multivariate, longitudinal, clustered, multi-level, and other hierarchical data, particularly for non-Gaussian outcomes, are ubiquitous because their parameters can usually be estimated conveniently using the important class of generalized estimating equations (GEE). The focus here is on marginal models, understood as models that condition neither on random effects nor on other outcomes, but only on fixed covariates. In spite of their well-deserved popularity, one may ask whether such models can always be viewed as a partially specified version of a model with full distributional assumptions, or whether such a parent model simply does not exist. Using hybrid marginal–conditional models, it is shown that the answer to the first question is affirmative. For conventional GEE with a working correlation structure, the Bahadur model is sometimes considered the natural parent candidate, but we show that this is a misconception. The result presented here, which is conceptual in nature, is valid whenever the exponential family is used for the semi-parametric specification, or when a straightforward transformation to an exponential family member is possible, implying validity for broad classes of binary, ordinal, nominal, and count data. The result is illustrated in the context of trivariate binary data. Further, as an illustration, many of the models considered are applied to data from a developmental toxicity study.

6.
The generalized estimating equations (GEE) approach has been widely used to analyze repeated measures data. However, because likelihood ratio tests are not available, model diagnostic tools are not well established for the GEE approach, whereas they are for likelihood-based approaches. Diagnostic tools are essential for assessing a model's goodness of fit, especially for non-normal data. In this paper, we propose simple residual plots for investigating the goodness of fit of GEE-based models for discrete data. The proposed residual plots are based on quantile–quantile (QQ) plots against a χ² distribution and are particularly useful for comparing several models simultaneously.
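
One simple way to build a χ²-based QQ plot, assuming observation-level squared Pearson residuals are compared with χ² quantiles on 1 degree of freedom, is sketched below. The paper's exact residual construction may differ (for example, by aggregating within clusters), and for binary outcomes the χ²(1) reference is only a rough approximation; the function names and the Poisson-style `variance` argument in the usage note are assumptions for illustration.

```python
from statistics import NormalDist


def chi2_1_quantile(p):
    """p-quantile of the chi-square distribution with 1 degree of freedom.

    Uses P(Z^2 <= q) = p  <=>  q = Phi^{-1}((1 + p) / 2)^2 for standard normal Z.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def chi2_qq_points(y, mu, variance):
    """(theoretical, observed) pairs for a chi-square(1) QQ plot.

    Observed values are sorted squared Pearson residuals
    (y - mu)^2 / variance(mu); plotting positions are (i + 0.5) / n.
    """
    resid2 = sorted((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    n = len(resid2)
    theoretical = [chi2_1_quantile((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, resid2))
```

For Poisson-type counts, pass `variance=lambda m: m`; plotting the returned pairs for several candidate models on the same axes gives a side-by-side comparison of their fit.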

7.
Many different robust estimation approaches for the covariance or shape matrix of multivariate data have been established. Tyler's M-estimator has been recognized as the 'most robust' M-estimator for the shape matrix of elliptically symmetrically distributed data. Here, Tyler's M-estimators for location and shape are generalized to account for incomplete data. It is shown that the shape matrix estimator remains distribution-free under the class of generalized elliptical distributions. Its asymptotic distribution is also derived, and a fast algorithm that works well even for high-dimensional data is presented. A simulation study with clean and contaminated data covers both the complete-data and the incomplete-data case, where the missing data are assumed to be MCAR, MAR, or NMAR.
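
To make the fixed-point idea behind Tyler's shape estimator concrete, here is a minimal complete-data sketch with the location assumed known and equal to zero; it does not implement the paper's incomplete-data generalization or its fast algorithm. Each step replaces Σ by the sum of x_i x_iᵀ / (x_iᵀ Σ⁻¹ x_i) and rescales it to trace p, one common normalization since shape is only defined up to scale. All function names are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tyler_step(x, sigma):
    """One fixed-point update: sum_i x_i x_i^T / (x_i^T sigma^{-1} x_i),
    rescaled to trace p (the usual p/n factor is absorbed by the rescaling)."""
    p = len(sigma)
    new = [[0.0] * p for _ in range(p)]
    for xi in x:
        z = _solve(sigma, xi)                  # z = sigma^{-1} x_i
        d = sum(a * b for a, b in zip(xi, z))  # x_i^T sigma^{-1} x_i
        for r in range(p):
            for c in range(p):
                new[r][c] += xi[r] * xi[c] / d
    tr = sum(new[r][r] for r in range(p))
    return [[v * p / tr for v in row] for row in new]


def tyler_shape(x, tol=1e-10, max_iter=1000):
    """Tyler's shape matrix for complete data centred at a known location (0)."""
    p = len(x[0])
    sigma = [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    for _ in range(max_iter):
        new = tyler_step(x, sigma)
        diff = max(abs(new[r][c] - sigma[r][c])
                   for r in range(p) for c in range(p))
        sigma = new
        if diff < tol:
            break
    return sigma
```

Because each observation enters only through its direction, multiplying any x_i by a positive constant leaves the estimate unchanged, which is one way to see the estimator's robustness to heavy tails.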

8.
Previous decision tree algorithms have used the Mahalanobis distance for multiple continuous longitudinal responses or a generalized entropy index for multiple binary responses. However, these methods are limited to either continuous or binary responses. In this paper, we propose a new tree-based method that can analyze any type of multiple responses by using generalized estimating equations (GEE). The value of this new technique is demonstrated with an application to a web-usage survey. This work was supported by grant No. R05-2003-000-11281-0 from the Basic Research Program of the Korea Science & Engineering Foundation.

9.
Longitudinal or otherwise correlated categorical variables are typically related to some covariates, and the observed variables exhibit nonignorable correlations. A further complication is that entries are often missing. For analyzing such data, it is proposed to create an extra missing category and to employ latent class analysis, which, with respect to missing data, can be shown to belong to the family of nonmissing-at-random models. Treating the complete and the incomplete cases jointly makes it possible to estimate the parameters of interest along with additional parameters characterizing the missing-data mechanism. Data from the Muscatine Coronary Risk Factor Study, in which each child was classified as obese or not obese on three occasions, serve as an illustrative example. Previous analyses found a significant interaction of age and sex for the complete data (N=460), and a linear increase in the logit of the obesity rate over time, with no effect of the covariate sex, for the incomplete data (N=1014). Reanalyses employing latent class models do not support these findings. The finally accepted two-class model for the complete data assumes a linear effect of age that is the same for boys and girls. The incomplete data were treated as three-categorical (not obese, obese, missing) and resulted in a more complex model that only partly supports the linear age hypothesis.

10.
Zero-inflated count data models are commonly used in applications where the number of zero counts exceeds that predicted by a traditional count data model such as the Poisson or negative binomial. When count data with inflated zeros are correlated among subjects, a natural approach is to fit a marginal model using generalized estimating equations (GEE), which can incorporate subject-to-subject correlations. A GEE-based zero-inflated negative binomial (ZINB) model is proposed to fit clustered counts with excess zeros. However, the corresponding sandwich variance estimator appears to underestimate the true variance. The theoretical reasons for this failure are explained, and a correction under additional modeling assumptions is offered. In addition, a clustered resampling (bootstrap) procedure is proposed to estimate the variance, and it is shown that the bootstrap procedure captures the correct variance without additional model assumptions. The utility of this marginal GEE-based ZINB model relative to two competing models is assessed in a thorough simulation study. The resulting inference procedure is applied to study the association between dental caries and fluoride exposure using a dataset extracted from the Iowa Fluoride Study. A number of clinically significant risk factors are reliably identified using the proposed model.
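
The clustered bootstrap idea can be sketched generically: resample whole clusters with replacement, recompute the estimator on each replicate, and take the variance of the replicates. To stay self-contained, the sketch uses a hypothetical stand-in estimator (the pooled fraction of zero counts) rather than the GEE-based ZINB fit from the paper; any function from a list of clusters to a number can be plugged in. Function names and defaults are illustrative assumptions.

```python
import random
from statistics import variance


def cluster_bootstrap_variance(clusters, estimator, n_boot=1000, seed=0):
    """Bootstrap variance of `estimator`, resampling whole clusters.

    clusters:  list of clusters, each a list of observations.
    estimator: function mapping a list of clusters to a float.
    """
    if len(clusters) < 2 or n_boot < 2:
        raise ValueError("need at least two clusters and two replicates")
    rng = random.Random(seed)
    k = len(clusters)
    replicates = []
    for _ in range(n_boot):
        # Draw k clusters with replacement; observations inside a cluster stay
        # together, so the within-cluster correlation is preserved.
        sample = [clusters[rng.randrange(k)] for _ in range(k)]
        replicates.append(estimator(sample))
    return variance(replicates)


def zero_fraction(clusters):
    """Hypothetical stand-in estimator: pooled proportion of zero counts."""
    counts = [y for cluster in clusters for y in cluster]
    return sum(1 for y in counts if y == 0) / len(counts)
```

Resampling at the cluster level rather than the observation level is the key design choice: it avoids assuming any particular within-cluster correlation structure, which is why no extra modeling assumptions are needed.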

11.
In this study, we use the click-through rate as a measure of advertising effectiveness to examine the effects of design factors on animated online advertisements. A factorial experiment with repeated measures was designed to collect a set of serially correlated click-through data. Ad type, position, animation length, and exposure time were considered as independent factors. The generalized estimating equations (GEE) approach is applied to logistic regression models with correlated binary data. The quasi-likelihood information criterion (QIC), a goodness-of-fit statistic for correlated-data models, is used to evaluate the GEE-constructed models. The results show that, in the logistic regression model, the order effect and the two-factor interactions of ad type with ad position and of ad position with animation length are statistically significant. In addition, the GEE model with an AR(1) working correlation structure was well supported by the data.
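
For reference, the QIC is commonly written as follows, where β̂(R) is the GEE estimate under working correlation structure R, Q(·; I) is the quasi-likelihood evaluated under the independence working model, Ω̂_I is the inverse of the model-based variance estimator under independence, and V̂_R is the robust (sandwich) variance estimator under R. For click/no-click outcomes with fitted probabilities μ_ij, Q takes the Bernoulli form shown in the second expression; smaller QIC values indicate a preferred model or working correlation structure. This is the standard textbook form, not necessarily the exact variant used in the study.

```latex
\mathrm{QIC}(R) = -2\,Q\bigl(\hat{\beta}(R); I\bigr) + 2\,\operatorname{tr}\bigl(\hat{\Omega}_I \hat{V}_R\bigr),
\qquad
Q(\beta; I) = \sum_{i}\sum_{j}\bigl[\,y_{ij}\log\mu_{ij} + (1 - y_{ij})\log(1 - \mu_{ij})\,\bigr]
```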

12.
The widely used proportional odds model is developed for correlated repeated ordinal score data, using a modified version of the generalized estimating equation (GEE) method to fit the model under a range of working correlation models. The algorithm estimates the correlation parameter by minimizing the generalized variance of the regression parameters at each step of the fitting algorithm. Parameter estimation methods are described for the widely used uniform and first-order autoregressive correlation models, for data potentially recorded at irregularly spaced time intervals. A full implementation of the algorithm (repolr) in the R statistical software package, which both tests the proportional odds assumption and accommodates missing data, is described and applied to a clinical trial of post-operative treatment after rupture of the Achilles tendon and to a study of patient pain response after hip joint resurfacing.
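
As a reminder of what the fitted model describes, the sketch below turns cut-points and a linear predictor into category probabilities for one observation. It assumes the parametrization logit P(Y ≤ k) = θ_k − η with η = xᵀβ shared across cut-points; repolr's own sign convention may differ, and the correlation-parameter estimation described above is not shown. The function name is hypothetical.

```python
import math


def proportional_odds_probs(thresholds, eta):
    """Category probabilities P(Y = 1), ..., P(Y = K) for one observation.

    Assumes logit P(Y <= k) = theta_k - eta, k = 1..K-1, where eta = x'beta
    is the same for every cut-point (the proportional odds assumption).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= K) = 1
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs
```

Under this convention, increasing η shifts probability mass toward higher categories by the same amount on the logit scale at every cut-point.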

13.
Generalized linear mixed models are popular for regressing a discrete response when the data are clustered, e.g. in longitudinal studies or hierarchical data structures. It is standard to assume that the random effects follow a normal distribution. Recently, it has been examined whether wrongly assuming normality for the random effects matters for the estimation of the fixed-effects parameters. While misspecifying the random-effects distribution has been shown to have a minor effect in linear mixed models, the conclusion for generalized linear mixed models is less clear. Some studies report a minor impact, while others report that the normality assumption really matters, especially when the random-effect variance is relatively high. Since it is unclear whether the normality assumption holds in practice, it is important to have generalized linear mixed models that relax it. It is proposed to replace the normal distribution with a mixture of Gaussian distributions specified on a grid, in which only the weights of the mixture components are estimated, using a penalized approach that ensures a smooth random-effects distribution. The model parameters are estimated in a Bayesian framework using MCMC techniques. The usefulness of the approach is illustrated on two longitudinal studies using R functions.

14.
Intelligent Data Analysis, 1998, 2(1-4): 139-160
Current methods for learning Bayesian networks from incomplete databases share the common assumption that the unreported data are missing at random. This paper describes a method, called Bound and Collapse (BC), for learning Bayesian networks from incomplete databases that allows the analyst to efficiently integrate information provided by the observed data with exogenous knowledge about the pattern of missing data. BC starts by bounding the set of estimates consistent with the available information and then collapses the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data. Experiments comparing BC with Gibbs sampling are provided.
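
A minimal sketch of the bound-then-collapse step for a single discrete variable with no parents follows. Each extreme point assigns every missing entry to one state; weighting these extreme points by an assumed missing-data pattern phi collapses the interval to the point estimate (n_s + phi_s · m) / n. In a Bayesian network the same step would be applied to each conditional distribution; prior pseudo-counts are omitted, and the function and argument names are hypothetical.

```python
from collections import Counter


def bound_and_collapse(values, states, phi):
    """Bound-and-collapse estimate of P(X = s) for one discrete variable.

    values: observed entries, with None marking a missing entry.
    states: the possible states of X.
    phi:    assumed probability that a missing entry is in each state
            (the exogenous missing-data pattern); must sum to 1.
    Returns (lower, upper, point) dictionaries keyed by state.
    """
    if abs(sum(phi[s] for s in states) - 1.0) > 1e-9:
        raise ValueError("phi must sum to 1")
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    m = n - sum(counts.values())  # number of missing entries
    # Bound: the extreme points assign all missing entries to a single state.
    lower = {s: counts[s] / n for s in states}
    upper = {s: (counts[s] + m) / n for s in states}
    # Collapse: convex combination of the extreme points weighted by phi,
    # which simplifies to (n_s + phi_s * m) / n.
    point = {s: (counts[s] + phi[s] * m) / n for s in states}
    return lower, upper, point
```

With phi concentrated on one state, the point estimate for that state reaches its upper bound, which shows how the analyst's assumption about the missing data moves the estimate within the bounds.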

15.
Multivariate extensions of well-known linear mixed-effects models have been increasingly used for inference by multiple imputation in the analysis of incomplete multilevel data. The normality assumption for the underlying error terms and random effects plays a crucial role in simulating the posterior predictive distribution from which the multiple imputations are drawn. The plausibility of this normality assumption for the subject-specific random effects is assessed. Specifically, the performance of multiple imputation under a multivariate linear mixed-effects model is investigated on a diverse set of incomplete data sets simulated under varying distributional characteristics. Under moderate amounts of missing data, the simulation study confirms that the underlying model leads to a well-calibrated procedure, with negligible bias and actual coverage rates close to nominal rates for the regression coefficient estimates. However, the estimation quality of the random-effect variance and association measures is negatively affected by both the misspecification of the random-effect distribution and the number of incompletely observed variables. The adverse impacts include lower coverage rates and increased bias.

16.
Researchers and practitioners who use databases often find knowledge discovery and application development cumbersome because of missing data. Although some approaches can tolerate a certain rate of incomplete data, many of them require high-quality, complete data. Therefore, a great number of strategies have been designed to handle missingness, particularly through imputation. Single imputation methods initially succeeded in predicting missing values for specific types of distributions. Multiple imputation algorithms, however, have remained prevalent because they further improve validity by iteratively minimizing bias and require less prior knowledge about the distributions. This article reviews the state of the art and proposes a hybrid missing-data completion method named Multiple Imputation using Gray-system-theory and Entropy based on Clustering (MIGEC). First, the non-missing data instances are partitioned into several clusters. Then, for each incomplete instance, the imputed value is obtained through multiple calculations using the information entropy of the proximal category, in terms of a similarity metric based on Gray System Theory (GST). Experimental results on University of California Irvine (UCI) datasets show that MIGEC is more accurate than other current methods for both numeric and categorical attributes under different missing mechanisms. Further experiments on real aerospace datasets show that MIGEC is also applicable in that specific domain, with more precise inference and faster convergence than other multiple imputation methods in general.

17.
Artificial Intelligence, 2001, 125(1-2): 209-226
Naive Bayes classifiers provide an efficient and scalable approach to supervised classification problems. When some entries in the training set are missing, methods exist to learn these classifiers under some assumptions about the pattern of missing data. Unfortunately, reliable information about the pattern of missing data may not be readily available, and recent experimental results show that enforcing an incorrect assumption about the pattern of missing data dramatically decreases the accuracy of the classifier. This paper introduces a Robust Bayes Classifier (rbc) able to handle incomplete databases with no assumption about the pattern of missing data. To avoid such assumptions, the rbc bounds all possible probability estimates within intervals using a specialized estimation method. These intervals are then used to classify new cases by computing intervals on the posterior probability distributions over the classes given a new case, and by ranking the intervals according to some criteria. We provide two scoring methods to rank intervals and a decision-theoretic approach to trade off the risk of an erroneous classification against the choice of not classifying a case unequivocally. This decision-theoretic approach can also be used to assess whether adopting assumptions about the pattern of missing data is worthwhile. The proposed approach is evaluated on twenty publicly available databases.

18.
In the statistics literature, a number of procedures have been proposed for testing the equality of several groups' covariance matrices when data are complete, but this problem has not been considered for incomplete data in a general setting. This paper proposes statistical tests for the equality of covariance matrices when data are missing. A Wald test (denoted T1) and a likelihood ratio test (LRT, denoted R), both based on the assumption of normal populations, are developed. It is well known that, for complete data, the classic LRT and the Wald test constructed under the normality assumption perform poorly when the data are not from multivariate normal distributions. As expected, the same holds for incomplete data, which led us to construct a robust Wald test (denoted T2) that performs well for both normal and non-normal data. A re-scaled LRT (denoted R*) is also proposed. A simulation study is carried out to assess the performance of T1, T2, R, and R* in terms of the closeness of their observed significance levels to the nominal significance level, as well as their power. T2 is found to perform very well for both normal and non-normal data in both small and large samples. In addition to its usual applications, we discuss the application of the proposed tests to testing whether a set of data are missing completely at random (MCAR).

19.
As more and more real-time spatio-temporal datasets become available at increasing spatial and temporal resolutions, providing high-quality predictive information about spatio-temporal processes becomes an increasingly feasible goal. However, many sensor networks that collect spatio-temporal information are prone to failure, resulting in missing data. To complicate matters, the missing data are often not missing at random and are characterised by long periods in which no data are observed. The performance of traditional univariate forecasting methods such as ARIMA models decreases with the length of the missing-data period, because these methods do not have access to local temporal information. However, if spatio-temporal autocorrelation is present in a space–time series, spatio-temporal approaches have the potential to offer better forecasts. In this paper, a non-parametric spatio-temporal kernel regression model is developed to forecast future unit journey times on road links in central London, UK, under the assumption of sensor malfunction. Only the current traffic patterns of the upstream and downstream neighbouring links are used to inform the forecasts. The model's performance is compared with that of another form of non-parametric regression, K-nearest neighbours, which is also effective for forecasting under missing data. Both methods show promising forecasting performance, particularly in periods of high congestion.
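
The two non-parametric regressors compared above can be sketched as follows. This is a generic Nadaraya–Watson estimator with an isotropic Gaussian kernel and a single bandwidth, plus a plain k-nearest-neighbour average; the paper's spatio-temporal kernel is not specified in the abstract, so the kernel form, the bandwidth, and the idea of building each feature vector from current journey times on neighbouring upstream and downstream links are assumptions. Function names are hypothetical.

```python
import math


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Gaussian-kernel regression estimate of E[y | x = x0].

    x_train: feature vectors (assumed here: current journey times on the
             upstream and downstream neighbouring links).
    y_train: matching targets (e.g. the link's later journey time).
    """
    weights = []
    for xi in x_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x0))
        weights.append(math.exp(-0.5 * d2 / bandwidth ** 2))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total


def knn_regression(x_train, y_train, x0, k):
    """Average target of the k training points nearest to x0 (Euclidean)."""
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")
    order = sorted(range(len(x_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x_train[i], x0)))
    return sum(y_train[i] for i in order[:k]) / k
```

In a missing-data scenario, `x0` would hold the neighbouring links' current values at the time a forecast is needed, so no recent observations from the failed sensor are required.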

20.
We state and prove a theorem that asserts the asymptotic stability of an equilibrium point of the swing equations and provides an estimate of its region of attraction. A version of this theorem was originally introduced by Willems; however, his justifications are sketchy and, in our opinion, require additional analysis. All the Liapunov-method analyses used in the transient stability problem of power systems have been based on the assumed validity of this theorem. The shortcoming of the other proofs proposed in the literature is that they rely on available Liapunov theorems containing an assumption that cannot be verified. In contrast, the techniques used in the proof here avoid the restrictions of the available Liapunov theorems; moreover, they can be extended to a broader class of systems.
