Similar literature
 20 similar documents found (search time: 15 ms)
1.
Our work examines the performance of proposed local influence diagnostics applied to multivariate normal longitudinal data with drop-outs: these diagnostics prove to be ambiguous as they are sensitive not only to the presence of anomalous records, as intended, but also, unfortunately, to the misspecification of the longitudinal covariance structure of the response. We suggest an unambiguous index for detecting covariance misspecification, and recommend that an analyst use this index first to confirm that the covariance structure is well specified before attempting to interpret the influence diagnostics.

2.
Mis-specification of the covariance structure in longitudinal data can result in loss of regression estimation efficiency and in misleading influence diagnostics. Therefore, a rule-of-thumb, even one that is rough, for detecting covariance mis-specification would prove valuable to data analysts. In this paper, we examine two indices for detecting the mis-specification of the covariance structure of longitudinal normal, Poisson or binary responses. Our work shows that the suggested indices prove to be worthwhile when there are no missing time observations; they, however, should be used with caution when there are MAR drop-outs.

3.
Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with an estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in-sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data.

4.
This paper presents a proposal based on an evolutionary algorithm to impute missing observations in multivariate data. A genetic algorithm based on the minimization of an error function derived from their covariance matrix and vector of means is presented. All methodological aspects of the genetic structure are presented. An extended explanation of the design of the fitness function is provided. An application example is solved by the proposed method.

5.
Estimation in generalized linear mixed models for non-Gaussian longitudinal data is often based on maximum likelihood theory, which assumes that the underlying probability model is correctly specified. It is known that the results obtained from these models are not always robust against misspecification of the random-effects structure. Therefore, diagnostic tools for the detection of this misspecification are of the utmost importance. Three diagnostic tests, based on the eigenvalues of the variance-covariance matrices for the fixed-effects parameter estimates, are proposed in the present work. The power and type I error rate of these tests are studied via simulations. A very acceptable performance was observed in many cases, especially for those misspecifications that can have a big impact on the maximum likelihood estimators.

6.
Semiparametric models are becoming increasingly attractive for longitudinal data analysis. Often there is a lack of knowledge of the covariance structure of the response variable. Although it is still possible to obtain consistent estimators for both parametric and nonparametric components of a semiparametric model by assuming an identity structure for the covariance matrix, the resulting estimators may not be efficient. We conducted extensive simulation studies to investigate the impact of an unknown covariance structure on estimators in semiparametric models for longitudinal data. In some situations the loss of efficiency could be substantial. A two-step estimator is thus proposed to improve the efficiency. Our study was motivated by a population-based data analysis to examine the temporal relationship between systolic blood pressure and urinary albumin excretion.

7.
The perturbation theory of an eigenvalue problem provides a useful tool for the sensitivity analysis in principal component analysis (PCA). However, single-perturbation diagnostics can suffer from masking effects. In this paper, we develop the pair-perturbation influence functions for the eigenvalues and eigenvectors of covariance matrices utilized in PCA to uncover the masked influential points. The relationship between the empirical pair-perturbation influence function and local influence in pairs is also investigated. Moreover, we propose an approach for determining cut points for influence function values in PCA, which has not been addressed yet. A simulation study and a specific data example are provided to illustrate the application of these approaches.
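
For orientation, the single-perturbation diagnostic that this abstract contrasts with can be sketched as follows. This is a minimal illustration of the standard empirical influence function of a covariance eigenvalue, (v_k'(x - mean))^2 - lambda_k, and not the pair-perturbation method the paper develops; the function name `eigenvalue_influence` and the simulated data are assumptions made for the example.

```python
import numpy as np

def eigenvalue_influence(X, k=0):
    """Empirical influence of each row of X on the k-th largest covariance eigenvalue."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / X.shape[0]   # covariance with 1/n scaling
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    lam = eigvals[order[k]]
    v = eigvecs[:, order[k]]
    scores = centered @ v                      # projections onto the k-th eigenvector
    return scores**2 - lam

# Simulated data with one planted outlier in row 0 (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, 8.0, 8.0]

influence = eigenvalue_influence(X, k=0)
most_influential = int(np.argmax(np.abs(influence)))
```

With the 1/n covariance the influence values average to zero over the sample, so a single large value stands out against the rest. Two outliers that offset each other can hide from this per-observation measure, which is the masking problem the abstract refers to.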

8.
The perturbation theory of an eigenvalue problem provides a useful tool for the sensitivity analysis in principal component analysis (PCA). However, single-perturbation diagnostics can suffer from masking effects. In this paper, we develop the pair-perturbation influence functions for the eigenvalues and eigenvectors of covariance matrices utilized in PCA to uncover the masked influential points. The relationship between the empirical pair-perturbation influence function and local influence in pairs is also investigated. Moreover, we propose an approach for determining cut points for influence function values in PCA, which has not been addressed yet. A simulation study and a specific data example are provided to illustrate the application of these approaches.

9.
Semiparametric methods for longitudinal data with dependence within subjects have recently received considerable attention. Existing approaches that focus on modeling the mean structure require a correct specification of the covariance structure as misspecified covariance structures may lead to inefficient or biased mean parameter estimates. Besides, computation and estimation problems arise when the repeated measurements are taken at irregular and possibly subject-specific time points, the dimension of the covariance matrix is large, and the positive definiteness of the covariance matrix is required. In this article, we propose a profile kernel approach based on semiparametric partially linear regression models for the mean and model covariance structures simultaneously, motivated by the modified Cholesky decomposition. We also study the large-sample properties of the parameter estimates. The proposed method is evaluated through simulation and applied to a real dataset. Both theoretical and empirical results indicate that properly taking into account the within-subject correlation among the responses using our method can substantially improve efficiency.
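
The modified Cholesky decomposition mentioned above writes a covariance matrix Sigma as T Sigma T' = D, with T unit lower triangular and D diagonal, so the unconstrained entries of T and the logarithms of D can be modeled by regression while positive definiteness is kept automatically. The sketch below only illustrates the decomposition itself on an assumed AR(1) correlation matrix; it is not the profile kernel estimator of the paper, and `modified_cholesky` is a hypothetical helper name.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular and T @ sigma @ T.T == diag(D)."""
    sigma = np.asarray(sigma, dtype=float)
    L = np.linalg.cholesky(sigma)   # sigma = L @ L.T, L lower triangular
    d_sqrt = np.diag(L)             # square roots of the innovation variances
    C = L / d_sqrt                  # divide column j by d_sqrt[j] -> unit diagonal
    T = np.linalg.inv(C)            # T = C^{-1} is also unit lower triangular
    D = d_sqrt**2
    return T, D

# Assumed example: AR(1) correlation with rho = 0.6 over 4 time points.
rho = 0.6
times = np.arange(4)
sigma = rho ** np.abs(np.subtract.outer(times, times))

T, D = modified_cholesky(sigma)
```

For this AR(1) example the sub-diagonal entries of T are -rho and all other below-diagonal entries vanish, while D holds the innovation variances 1 and 1 - rho^2. The negated below-diagonal entries of T are the coefficients for regressing each response on its predecessors.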

10.
Random effects in generalized linear mixed models (GLMM) are used to explain the serial correlation of the longitudinal categorical data. Because the covariance matrix is high dimensional and should be positive definite, its structure is assumed to be constant over subjects and to be restricted to a form such as an AR(1) structure. However, these assumptions are too strong and can result in biased estimates of the fixed effects. In this paper we propose a Bayesian model for the GLMM with regression models for parameters of the random effects covariance matrix using a moving average Cholesky decomposition which factors the covariance matrix into moving average (MA) parameters and innovation variances (IVs). We analyze lung cancer data using our proposed model.

11.
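To monitor the performance of model predictive control (MPC) in real time using process data, a performance monitoring method based on covariance prediction residuals is proposed. First, based on an analysis of the optimization function and control structure of the model predictive controller, a set of monitored variables comprising the prediction error, the control input and the process output is constructed; a sliding time window is then used to build a covariance-based real-time performance assessment index. Because the covariance index lacks a control limit, a time-series model of the real-time covariance index is built, and degradation of MPC performance is detected from the prediction residuals of the covariance index. A performance diagnosis method based on dataset similarity is then used to identify the source of the performance deterioration. Finally, a simulation study on the Wood-Berry binary distillation column verifies the effectiveness of the proposed method.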

12.
A general linear least-squares estimation problem is considered. It is shown how the optimal filters for filtering and smoothing can be recursively and efficiently calculated under certain structural assumptions about the covariance functions involved. This structure is related to an index known as the displacement rank, which is a measure of non-Toeplitzness of a covariance kernel. When a state space type structure is added, it is shown how the Chandrasekhar equations for determining the gain of the Kalman-Bucy filter can be derived directly from the covariance function information; thus we are able to imbed this class of state-space problems into a general input-output framework.

13.
Principal Component Analysis (PCA) is an important tool in multivariate analysis, in particular when faced with high dimensional data. There has been much done with regard to sensitivity analysis and the development of influence diagnostics for the eigenvector estimators that define the sample principal components. However, little, if anything, has been done in this setting with regard to the sample principal components themselves. In this paper we develop a sensitivity measure for principal components associated with the covariance matrix that is very much related to the influence function (Hampel, 1974). This influence measure is based on the average squared canonical correlation and differs from the existing measures in that it assesses the influence of certain observational types on the sample principal components. We use this measure to derive an influence diagnostic that satisfies two key criteria: (i) it detects influential observations with respect to subsets of sample principal components and (ii) it is efficient to calculate even in high dimensions. We use several microarray datasets to show that our measure satisfies both criteria.

14.
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy, with the results confirming the presence of nonlinear relationships in implied volatility innovations.
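
As background for the statistic this abstract generalizes, the classical Durbin-Watson statistic for a residual series e_1, ..., e_n is d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2. The sketch below computes only this classical version; the paper's generalization for neural regression and its distribution theory are not reproduced here, and the simulated residuals are an assumption for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Classical Durbin-Watson statistic of a one-dimensional residual series."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Perfectly alternating residuals (strong negative autocorrelation).
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Constant residuals (extreme positive autocorrelation).
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])

# Simulated white-noise residuals: no serial correlation by construction.
rng = np.random.default_rng(1)
d_white = durbin_watson(rng.normal(size=2000))
```

The statistic is approximately 2(1 - r), where r is the lag-one sample autocorrelation of the residuals, so values near 2 indicate no first-order serial correlation, values toward 0 indicate positive correlation, and values toward 4 indicate negative correlation.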

15.
Recursive algorithms for the solution of linear least-squares estimation problems have been based mainly on state-space models. It has been known, however, that recursive Levinson-Whittle-Wiggins-Robinson (LWR) algorithms exist for stationary time-series, using only input-output information (i.e., covariance matrices). By introducing a way of classifying stochastic processes in terms of an "index of nonstationarity" we derive extended LWR algorithms for nonstationary processes. We show also how adding state-space structure to the covariance matrix allows us to specialize these general results to state-space type estimation algorithms. In particular, the Chandrasekhar equations are shown to be natural descendants of the extended LWR algorithm.
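
The stationary case that the extended algorithms build on can be illustrated with the classical scalar Levinson-Durbin recursion, which solves the Toeplitz normal equations for linear prediction coefficients from autocovariances alone. This is a minimal sketch of that textbook recursion, not the extended LWR algorithm for nonstationary processes described in the abstract; the function name and the example autocovariances are assumptions.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve sum_j a_j r[|i-j|] = r[i], i = 1..order, for the coefficients a_1..a_order.

    r holds autocovariances r[0], ..., r[order] of a stationary scalar series.
    Returns the coefficients and the final one-step prediction error variance.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)          # a[1..m] hold the current coefficients; a[0] unused
    err = r[0]                       # prediction error variance for order 0
    for m in range(1, order + 1):
        # Reflection coefficient from the part of r[m] not yet explained.
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / err
        new_a = a.copy()
        new_a[m] = k
        new_a[1:m] = a[1:m] - k * a[m - 1:0:-1]
        a = new_a
        err *= 1.0 - k ** 2
    return a[1:], err

# Assumed example 1: AR(1) autocovariances r[k] = 0.5**k.
coeffs_ar1, err_ar1 = levinson_durbin([1.0, 0.5, 0.25, 0.125], order=3)

# Assumed example 2: compare against a direct solve of the Toeplitz system.
r = np.array([2.0, 1.2, 0.5, 0.1])
coeffs, err = levinson_durbin(r, order=3)
idx = np.arange(3)
R = r[np.abs(np.subtract.outer(idx, idx))]   # 3x3 Toeplitz matrix of r[0..2]
direct = np.linalg.solve(R, r[1:4])
```

Each step of the recursion costs O(m) operations, so a system of order p is solved in O(p^2) instead of the O(p^3) of a general solver, using only covariance information and no state-space model.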

16.
Missingness frequently complicates the analysis of longitudinal data. A popular solution for dealing with incomplete longitudinal data is the use of likelihood-based methods, when, for example, linear, generalized linear, or non-linear mixed models are considered, due to their validity under the assumption of missing at random (MAR). Semi-parametric methods such as generalized estimating equations (GEEs) offer another attractive approach but require the assumption of missing completely at random (MCAR). Weighted GEE (WGEE) has been proposed as an elegant way to ensure validity under MAR. Alternatively, multiple imputation (MI) can be used to pre-process incomplete data, after which GEE is applied (MI-GEE). Focusing on incomplete binary repeated measures, both methods are compared using so-called asymptotic as well as small-sample simulations, in a variety of correctly specified as well as incorrectly specified models. In spite of the asymptotic unbiasedness of WGEE, results provide striking evidence that MI-GEE is both less biased and more accurate in the small to moderate sample sizes which typically arise in clinical trials.

17.
The interpretation of generative, discriminative and hybrid approaches to classification is discussed, in particular for the generative–discriminative tradeoff (GDT), a hybrid approach. The asymptotic efficiency of the GDT, relative to that of its generative or discriminative counterpart, is presented theoretically and, by using linear normal discrimination as an example, numerically. On real and simulated datasets, the classification performance of the GDT is compared with those of normal-based linear discriminant analysis (LDA) and linear logistic regression (LLR). Four arguments are made as follows. First, the GDT is a generative model integrating both discriminative and generative learning. It is therefore subject to model misspecification of the data-generating process and hindered by complex optimisation. Secondly, among the three approaches being compared, the asymptotic efficiency of the GDT is higher than that of the discriminative approach but lower than that of the generative approach, when no model misspecification occurs. Thirdly, without model misspecification, LDA performs the best; with model misspecification, LLR or the GDT with an optimal, large weight on its discriminative component may perform the best. Finally, LLR is affected by the imbalance between groups of data.

18.
Two approaches are presented to perform principal component analysis (PCA) on data which contain both outlying cases and missing elements. At first an eigendecomposition of a covariance matrix which can deal with such data is proposed, but this approach is not fit for data where the number of variables exceeds the number of cases. Alternatively, an expectation robust (ER) algorithm is proposed so as to adapt the existing methodology for robust PCA to data containing missing elements. According to an extensive simulation study, the ER approach performs well for all data sizes concerned. Using simulations and an example, it is shown that by virtue of the ER algorithm, the properties of the existing methods for robust PCA carry through to data with missing elements.

19.
A modeling paradigm is proposed for covariate, variance and working correlation structure selection for longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria find a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite sample properties is reported. Two longitudinal studies are used for illustration.

20.
This article describes new software for modeling correlated binary data based on orthogonalized residuals, a recently developed estimating equations approach that includes, as a special case, alternating logistic regressions. The software is flexible with respect to fitting in that the user can choose estimating equations for association models based on alternating logistic regressions or orthogonalized residuals, the latter choice providing a non-diagonal working covariance matrix for second moment parameters and hence potentially greater efficiency. Regression diagnostics based on this method are also implemented in the software. The mathematical background is briefly reviewed and the software is applied to medical data sets.
