1.
Generalized linear mixed models (GLMMs) are useful for modelling longitudinal and clustered data, but parameter estimation is very challenging because the likelihood may involve high-dimensional integrals that are analytically intractable. Gauss-Hermite quadrature (GHQ) approximation can be applied but is only suitable for low-dimensional random effects. Based on the Quasi-Monte Carlo (QMC) approximation, a heuristic approach is proposed to calculate the maximum likelihood estimates of parameters in the GLMM. The QMC points scattered uniformly on the high-dimensional integration domain are generated to replace the GHQ nodes. Compared to the GHQ approximation, the proposed method has many advantages such as its affordable computation, good approximation and fast convergence. Comparisons to the penalized quasi-likelihood estimation and Gibbs sampling are made using a real dataset and a simulation study. The real dataset is the salamander mating dataset whose modelling involves six 20-dimensional intractable integrals in the likelihood.
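The idea of replacing quadrature nodes with uniformly scattered QMC points can be illustrated with a minimal one-dimensional sketch. It is not the authors' algorithm: it assumes a random-intercept logistic model for a single cluster, uses a base-2 van der Corput sequence as the low-discrepancy point set, and maps the points to normal draws through the inverse normal CDF. The function names `van_der_corput` and `cluster_likelihood_qmc`, and the inputs `y`, `eta_fixed` and `sigma`, are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def van_der_corput(n, base=2):
    """Return the first n points of the van der Corput sequence, all in (0, 1)."""
    points = []
    for i in range(1, n + 1):
        x, denom, k = 0.0, 1.0, i
        while k > 0:
            denom *= base
            k, digit = divmod(k, base)
            x += digit / denom  # mirror the digits of i around the radix point
        points.append(x)
    return points


def cluster_likelihood_qmc(y, eta_fixed, sigma, n_points=1024):
    """Approximate the marginal likelihood of one cluster.

    Assumed model (illustrative only): y_j ~ Bernoulli(p_j) with
    logit(p_j) = eta_fixed[j] + b and random intercept b ~ N(0, sigma^2).
    The integral over b is replaced by an average over QMC points.
    """
    std_normal = NormalDist()
    total = 0.0
    for u in van_der_corput(n_points):
        b = sigma * std_normal.inv_cdf(u)  # uniform point -> normal draw
        lik = 1.0
        for y_j, eta in zip(y, eta_fixed):
            p = 1.0 / (1.0 + math.exp(-(eta + b)))
            lik *= p if y_j == 1 else 1.0 - p
        total += lik
    return total / n_points


if __name__ == "__main__":
    print(cluster_likelihood_qmc([1, 0, 1], [0.2, -0.5, 1.0], sigma=1.5))
```

In higher dimensions, such as the 20-dimensional integrals mentioned in the abstract, a multivariate low-discrepancy point set would be needed in place of the one-dimensional sequence used here.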
2.
Searching for an effective dimension reduction space is an important problem in regression, especially for high-dimensional data such as microarray data. A major characteristic of microarray data is the small number of observations n and the very large number of genes p. This “large p, small n” paradigm makes discriminant analysis for classification difficult. One way to offset this dimensionality problem is to reduce the dimension. Supervised classification is understood as a regression problem with a small number of observations and a large number of covariates. A new approach for dimension reduction is proposed. This is based on a semi-parametric approach which uses local likelihood estimates for single-index generalized linear models. The asymptotic properties of this procedure are considered and its asymptotic performance is illustrated by simulations. Applications of this method to binary and multiclass classification of three real data sets (Colon, Leukemia and SRBCT) are presented.
3.
Generalized linear mixed models (GLMMs) have wide applications in practice. Similar to other data analyses, the identification of influential observations that may be potential outliers is an important step beyond estimation in GLMMs. Since the pioneering work of Cook in 1977, deletion measures have been applied to many statistical models for identifying influential observations. However, as this well-known approach is based on the observed-data likelihood, it is very difficult to apply it to developing diagnostic measures for GLMMs due to the complexity of the observed-data likelihood that involves multidimensional integrals. The objective of this article is to develop diagnostic measures for identifying influential observations. Deletion measures are developed on the basis of the conditional expectation of the complete-data log-likelihood at the E-step of a stochastic approximation Markov chain Monte Carlo algorithm. Making use of by-products of the estimation to compute building blocks of the proposed diagnostic measures and activating appropriate approximations, the proposed methods require little additional computation. The performance of the methods is illustrated by an artificial example, a real example, and some simulation studies.
4.
Consider stratified data in which $Y_{i1},\ldots,Y_{in_i}$ denote real-valued response variables corresponding to the observations from stratum $i$, $i=1,\ldots,m$, and suppose that $Y_{ij}$ follows an exponential family distribution with canonical parameter of the form $\theta_{ij}=x_{ij}\beta+\gamma_i$. In analyzing data of this type, the stratum-specific parameters are often modeled as random effects; a commonly used approach is to assume that $\gamma_1,\ldots,\gamma_m$ are independent, identically distributed random variables. The purpose of this paper is to consider an alternative approach to defining the random effects, in which the stratum means of the response variable are assumed to be independent and identically distributed, with a distribution not depending on $\beta$. It will be shown that inferences about $\beta$ based on this formulation of the generalized linear mixed model have many desirable properties. For instance, inferences regarding $\beta$ are less sensitive to the choice of random effects distribution, are less subject to bias from omitted stratum-level covariates and are less affected by separate between- and within-cluster covariate effects.
5.
In this study, a model identification instrument to determine the variance component structure for generalized linear mixed models (GLMMs) is developed based on the conditional Akaike information (cai). In particular, an asymptotically unbiased estimator of the cai (denoted as caicc) is derived as the model selection criterion which takes the estimation uncertainty in the variance component parameters into consideration. The relationship between bias correction and generalized degree of freedom for GLMMs is also explored. Simulation results show that the estimator performs well. The proposed criterion demonstrates a high proportion of correct model identification for GLMMs. Two sets of real data (epilepsy seizure count data and polio incidence data) are used to illustrate the proposed model identification method.
6.
Luz Marina Rondon, Luis Hernando Vanegas, Cristiano Ferraz 《Computational statistics & data analysis》2012,56(3):680-697
Finite population estimation is the overall goal of sample surveys. When information regarding auxiliary variables is available, one may take advantage of general regression estimators (GREG) to improve the precision of sample estimates. GREG estimators may be derived when the relationship between interest and auxiliary variables is represented by a normal linear model. However, in some cases, such as when estimating class frequencies or the means of counting processes, Bernoulli or Poisson models are more suitable than linear normal ones. This paper focuses on building regression type estimators under a model-assisted approach, for the general case in which the relationship between interest and auxiliary variables may be suitably described by a generalized linear model. The finite population distribution of the variable of interest is viewed as if generated by a member of the exponential family, which includes Bernoulli, Poisson, gamma and inverse Gaussian distributions, among others. The resulting estimator is a generalized linear model regression estimator (GEREG). Its general form and basic statistical properties are presented and studied analytically and empirically, using Monte Carlo simulation experiments. Three applications are presented in which the GEREG estimator shows better performance than the GREG one.
7.
Estimation in generalized linear mixed models for non-Gaussian longitudinal data is often based on maximum likelihood theory, which assumes that the underlying probability model is correctly specified. It is known that the results obtained from these models are not always robust against misspecification of the random-effects structure. Therefore, diagnostic tools for the detection of this misspecification are of the utmost importance. Three diagnostic tests, based on the eigenvalues of the variance-covariance matrices for the fixed-effects parameter estimates, are proposed in the present work. The power and type I error rate of these tests are studied via simulations. A very acceptable performance was observed in many cases, especially for those misspecifications that can have a big impact on the maximum likelihood estimators.
8.
Inference in generalized linear mixed models with multivariate random effects is often made cumbersome by the high-dimensional intractable integrals involved in the marginal likelihood. An inferential methodology based on the marginal pairwise likelihood approach is proposed. This method, which belongs to the broad class of composite likelihoods, involves marginal pairwise probabilities of the responses. These have an analytical expression for the probit version of the model, from which those of the logit version are derived. The results are illustrated with a simulation study and with an analysis of real data on health-related quality of life.
9.
Rositsa B. Dimova, Marianthi Markatou, Andrew H. Talal 《Computational statistics & data analysis》2011,55(9):2677-2697
In this paper, we derive a small sample Akaike information criterion, based on the maximized loglikelihood, and a small sample information criterion based on the maximized restricted loglikelihood in the linear mixed effects model when the covariance matrix of the random effects is known. Small sample corrected information criteria are proposed for a special case of linear mixed effects models, the balanced random-coefficient model, without assuming the random coefficients covariance matrix to be known. A simulation study comparing the derived criteria and several others for model selection in the linear mixed effects models is presented. We illustrate the behavior of the studied information criteria on real data from a study of subjects coinfected with HIV and Hepatitis C virus. Robustness of the criteria, in terms of the error distributed as a mixture of normal distributions, is also studied. Special attention is given to the behavior of the conditional AIC by Vaida and Blanchard (2005). Among the studied criteria, GIC performs best, while cAIC exhibits poor performance. Because of its inferior performance, as demonstrated in this work, we do not recommend its use for model selection in linear mixed effects models.
10.
Collaborative filtering (CF) is a data analysis task appearing in many challenging applications, in particular data mining in Internet and e-commerce. CF can often be formulated as identifying patterns in a large and mostly empty rating matrix. In this paper, we focus on predicting unobserved ratings. This task is often a part of a recommendation procedure. We propose a new CF approach called interlaced generalized linear models (GLM); it is based on a factorization of the rating matrix and uses probabilistic modeling to represent uncertainty in the ratings. The advantage of this approach is that different configurations, encoding different intuitions about the rating process, can easily be tested while keeping the same learning procedure. The GLM formulation is the keystone to derive an efficient learning procedure, applicable to large datasets. We illustrate the technique on three public domain datasets.
11.
The generalized linear mixed model (GLIMMIX) provides a powerful technique to model correlated outcomes with different types of distributions. The model can now be easily implemented with SAS PROC GLIMMIX in version 9.1. For binary outcomes, linearization methods of penalized quasi-likelihood (PQL) or marginal quasi-likelihood (MQL) provide relatively accurate variance estimates for fixed effects. Using GLIMMIX based on these linearization methods, we derived formulas for power and sample size calculations for longitudinal designs with attrition over time. We found that the power and sample size estimates depend on the within-subject correlation and the size of random effects. In this article, we present tables of minimum sample sizes commonly used to test hypotheses for longitudinal studies. A simulation study was used to compare the results. We also provide a Web link to the SAS macro that we developed to compute power and sample sizes for correlated binary outcomes.
12.
The behaviour of single input, single output, continuous time systems sampled with various types of jitter in both measurement and control action is investigated. It is shown that the effects of jitter can be modelled by approximations for the plant dynamics, and additive noise constructed by the modulation of plant signals with the jitter. These approximations give useful insights for digital controller design.
13.
Antony M. Overstall, Jonathan J. Forster 《Computational statistics & data analysis》2010,54(12):3269-3288
A default strategy for fully Bayesian model determination for generalised linear mixed models (GLMMs) is considered which addresses the two key issues of default prior specification and computation. In particular, the concept of unit-information priors is extended to the parameters of a GLMM. A combination of Markov chain Monte Carlo (MCMC) and Laplace approximations is used to compute approximations to the posterior model probabilities to find a subset of models with high posterior model probability. Bridge sampling is then used on the models in this subset to approximate the posterior model probabilities more accurately. The strategy is applied to four examples.
14.
Lihua An, Sévérien Nkurunziza, Daniel Krewski 《Computational statistics & data analysis》2009,53(7):2537-2549
We propose a James-Stein-type shrinkage estimator for the parameter vector in a general linear model when it is suspected that some of the parameters may be restricted to a subspace. The James-Stein estimator is shown to demonstrate asymptotically superior risk performance relative to the conventional least squares estimator under quadratic loss. An extensive simulation study based on a multiple linear regression model and a logistic regression model further demonstrates the improved performance of this James-Stein estimator in finite samples. The application of this new estimator is illustrated using Ontario newborn infants data spanning four fiscal years.
15.
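A crude oil purchasing plan model for refining enterprises is built using the multi-period mixed-integer programming (MIP) technique in the PIMS software, so that the optimization results agree with actual purchasing practice. A virtual-period method is used to solve the problem of controlling the quality of end-of-period crude oil inventory, and a rolling procedure is used to link the refinery's monthly and quarterly crude oil selection plans and to optimize them globally. The application of the multi-period MIP modelling technique in a refinery and a comparison of the benefits of different schemes are also presented.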
16.
In many situations, data follow a generalized partly linear model in which the mean of the responses is modeled, through a link function, linearly on some covariates and nonparametrically on the remaining ones. A new class of robust estimates for the smooth function η, associated to the nonparametric component, and for the regression parameter, related to the linear one, is defined. The robust estimators are based on a three-step procedure, where large values of the deviance or Pearson residuals are bounded through a score function. These estimators allow us to make easier inferences on the regression parameter and also improve computationally those based on a robust profile likelihood approach. The resulting estimates of the regression parameter turn out to be root-n consistent and asymptotically normally distributed. Besides, the empirical influence function allows us to study the sensitivity of the estimators to anomalous observations. A robust Wald test for the regression parameter is also provided. Through a Monte Carlo study, the performance of the robust estimators and the robust Wald test is compared with that of the classical ones.
17.
Hossein Baghishani, Mohsen Mohammadzadeh 《Computational statistics & data analysis》2011,55(4):1748-1759
Non-Gaussian spatial data are common in many sciences such as environmental sciences, biology and epidemiology. Spatial generalized linear mixed models (SGLMMs) are flexible models for modeling these types of data. Maximum likelihood estimation in SGLMMs is usually made cumbersome due to the high-dimensional intractable integrals involved in the likelihood function and therefore the most commonly used approach for estimating SGLMMs is based on the Bayesian approach. This paper proposes a computationally efficient strategy to fit SGLMMs based on the data cloning (DC) method suggested by Lele et al. (2007). This method uses Markov chain Monte Carlo simulations from an artificially constructed distribution to calculate the maximum likelihood estimates and their standard errors. In this paper, the DC method is adapted and generalized to estimate SGLMMs and some of its asymptotic properties are explored. Performance of the method is illustrated by a set of simulated binary and Poisson count data and also data about car accidents in Mashhad, Iran. The focus is inference in SGLMMs for small and medium data sets.
18.
A Bayesian approach to variable selection which is based on the expected Kullback-Leibler divergence between the full model and its projection onto a submodel has recently been suggested in the literature. For generalized linear models an extension of this idea is proposed by considering projections onto subspaces defined via some form of L1 constraint on the parameter in the full model. This leads to Bayesian model selection approaches related to the lasso. In the posterior distribution of the projection there is positive probability that some components are exactly zero and the posterior distribution on the model space induced by the projection allows exploration of model uncertainty. Use of the approach in structured variable selection problems such as ANOVA models is also considered, where it is desired to incorporate main effects in the presence of interactions. Projections related to the non-negative garotte are able to respect the hierarchical constraints. A consistency result is given concerning the posterior distribution on the model induced by the projection, showing that for some projections related to the adaptive lasso and non-negative garotte the posterior distribution concentrates on the true model asymptotically.
19.
20.
The problem of parameter estimation in linear discrete-time systems with random coefficients is discussed. In particular, the maximum-likelihood estimators and their consistency for the defined structure of the model are derived. The estimators have a structure similar to that of the least-squares estimators for the linear discrete-time system with constant coefficients.