Similar literature
20 similar articles found (search time: 471 ms)
1.
Different conditional independence specifications for ordinal categorical data are compared by calculating a posterior distribution over classes of graphical models. The approach is based on the multivariate ordinal probit model where the data are considered to have arisen as truncated multivariate normal random vectors. By parameterising the precision matrix of the associated multivariate normal in Cholesky form, ordinal data models corresponding to directed acyclic conditional independence graphs for the latent variables can be specified and conveniently computed. Where one or more of the variables are binary this parameterisation is particularly compelling, as necessary constraints on the latent variable distribution can be imposed in such a way that a standard, fully normalised, prior can still be adopted. For comparing different directed graphical models a reversible jump Markov chain Monte Carlo (MCMC) approach is proposed. Where interest is focussed on undirected graphical models, this approach is augmented to allow switches in the orderings of variables of associated directed graphs, hence allowing the posterior distribution over decomposable undirected graphical models to be computed. The approach is illustrated with several examples, involving both binary and ordinal variables, and directed and undirected graphical model classes.

2.
3.
A spatial latent class analysis model is introduced that extends the classic latent class analysis model by adding spatial structure to the latent class distribution through the use of the multinomial probit model. Linear combinations of independent Gaussian spatial processes are used to develop the multivariate spatial processes that underlie the categorical latent classes. This allows the latent class membership to be correlated across spatially distributed sites, and it allows correlation between the probabilities of particular types of classes at any one site. The number of latent classes is assumed to be fixed but is chosen by model comparison via cross-validation. An application of the spatial latent class analysis model is shown using soil pollution samples where 8 heavy metals were measured to be above or below government pollution limits across a 25 square kilometer region. Estimation is performed within a Bayesian framework using MCMC and is implemented using the OpenBUGS software.

4.
Motivated by the stochastic representation of the univariate zero-inflated Poisson (ZIP) random variable, the authors propose a multivariate ZIP distribution, called the Type I multivariate ZIP distribution, to model correlated multivariate count data with extra zeros. The distributional theory and associated properties are developed. Maximum likelihood estimates for the parameters of interest are obtained by Fisher’s scoring algorithm and by the expectation–maximization (EM) algorithm. Asymptotic and bootstrap confidence intervals for the parameters are provided. A likelihood ratio test and a score test are derived and compared via simulation studies. Bayesian methods are also presented for the case where prior information on the parameters is available. Two real data sets are used to illustrate the proposed methods. Under both AIC and BIC, the analysis of the two data sets supports the Type I multivariate zero-inflated Poisson model as a feasible and much less complex alternative to the existing multivariate ZIP models proposed by Li et al. (Technometrics, 29–38, Vol 41, 1999).
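As a small illustration of the univariate stochastic representation mentioned in this abstract, the sketch below assumes the usual form Y = Z * X with Z ~ Bernoulli(1 - phi) independent of X ~ Poisson(lam). The function name `zip_pmf` and the parameter values are illustrative assumptions, and the sketch covers only the univariate case, not the Type I multivariate distribution proposed in the paper.

```python
import math


def zip_pmf(k: int, phi: float, lam: float) -> float:
    """P(Y = k) for Y = Z * X, Z ~ Bernoulli(1 - phi), X ~ Poisson(lam), independent.

    phi is the extra-zero probability (hypothetical parameter name).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either from the extra-zero state or from the Poisson part.
        return phi + (1.0 - phi) * poisson
    return (1.0 - phi) * poisson


phi, lam = 0.3, 2.0  # illustrative values, not taken from the paper
probs = [zip_pmf(k, phi, lam) for k in range(60)]
total = sum(probs)                               # should be close to 1
mean = sum(k * p for k, p in enumerate(probs))   # should be close to (1 - phi) * lam
print(total, mean)
```

The zero probability exceeds the plain Poisson value exp(-lam), which is the "extra zeros" feature the abstract refers to.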

5.
DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on the data, neither method always dominates the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.

6.
A procedure is presented for finding a discrete approximation to a continuous multivariate density function. It is based on a previously developed algorithm [2] for determining the L1 optimal discrete approximation to a univariate density. Results of approximating continuous bivariate density functions, which represent distributions of the parameters of a pharmacokinetic model, show good agreement between the mean and covariance matrix of the approximated and approximating densities. The distribution of a predicted drug concentration was also calculated using a continuous density and discrete approximations with both 25 and 81 points. The expected values of the predicted concentration, as well as selected percentile points, obtained using each density are in close agreement.

7.
We introduce a new multivariate statistical process control chart for fault detection using robust statistics and principal component analysis. The proposed approach consists of two main steps. In the first step, a robust covariance matrix is determined using the minimum covariance determinant algorithm. In the second step, an eigen-analysis of the robust correlation matrix is performed to derive the robust control limits of the proposed multivariate chart. Our experimental results illustrate the much better fault detection performance of the proposed method in comparison with existing statistical monitoring and process control charts.

8.
Wan Xiaoji, Li Hailin, Zhang Liping, Wu Yenchun Jim. The Journal of Supercomputing, 2022, 78(7): 9862-9878

A multivariate time series is one of the most important objects of research in data mining. Time and variables are two of its distinctive characteristics, and both add to the complexity of the algorithms applied in data mining. Reducing the dimensionality is often regarded as an effective way to address these issues. In this paper, we propose a method based on principal component analysis (PCA) to effectively reduce the dimensionality. We call it “piecewise representation based on PCA” (PPCA), which segments a multivariate time series into several sequences, calculates the covariance matrix for each of them in terms of the variables, and employs PCA to obtain the principal components of an average covariance matrix. The results of the experiments, including retained information analysis, classification, and a comparison of the central processing unit time consumption, demonstrate that the PPCA method used to reduce the dimensionality of multivariate time series is superior to the prior methods.
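The three steps named in this abstract (segment, per-segment covariance, PCA on the average covariance) can be sketched roughly as follows. This is only a reading of the abstract, not the authors' implementation: the function name `ppca_reduce`, the equal-length segmentation via `np.array_split`, and the final projection of the centered series onto the leading eigenvectors are all assumptions.

```python
import numpy as np


def ppca_reduce(series: np.ndarray, n_segments: int, n_components: int) -> np.ndarray:
    """Hypothetical sketch: reduce a (T, d) series to (T, n_components)."""
    # Step 1: split the series along the time axis into roughly equal segments.
    segments = np.array_split(series, n_segments, axis=0)
    # Step 2: covariance matrix of the variables within each segment (d x d each).
    covs = [np.cov(seg, rowvar=False) for seg in segments]
    # Step 3: average the segment covariances and eigen-decompose the result.
    avg_cov = np.mean(covs, axis=0)
    eigvals, eigvecs = np.linalg.eigh(avg_cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Assumed final step: project the centered series onto the leading components.
    centered = series - series.mean(axis=0)
    return centered @ components


rng = np.random.default_rng(0)
series = rng.normal(size=(120, 5))  # synthetic data: 120 time points, 5 variables
reduced = ppca_reduce(series, n_segments=4, n_components=2)
print(reduced.shape)
```

Each segment needs at least two time points for its covariance matrix to be defined.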


9.
We consider the problem of parameter and covariance estimation for multivariate stochastic systems described by regression models with specially structured perturbations of unknown covariance. Sufficient conditions for uniformly optimal estimates of the system parameters and covariances are obtained. The family of distributions of the observation vector is factorized, and the full sufficient statistic is found under those conditions. Equations for uniformly optimal unbiased estimates of the covariance parameters are obtained.

10.
Fuzzy time-series models have been widely applied due to their ability to handle nonlinear data directly and because no rigid assumptions for the data are needed. In addition, many such models have been shown to provide better forecasting results than their conventional counterparts. However, since most of these models require complicated matrix computations, this paper proposes the adoption of a multivariate heuristic function that can be integrated with univariate fuzzy time-series models into multivariate models. Such a multivariate heuristic function can easily be extended and integrated with various univariate models. Furthermore, the integrated model can handle multiple variables to improve forecasting results and, at the same time, avoid complicated computations due to the inclusion of multiple variables.

11.
Recent developments in multivariate volatility modeling suggest that the conditional correlation matrix can be described by a time series recursion, where the total number of parameters grows by the power-of-two of the dimension of financial returns. The power of two computational requirement makes high-dimensional multivariate volatility modeling very time consuming. In this paper, we propose two simplified specifications in a multivariate autoregressive conditional heteroscedasticity model. The first specification computes an unconditional correlation matrix from standardized residuals of the model. The second specification restricts the sum of the weights in a time-varying conditional correlation equation to be one. Applying a Bayesian sampling scheme allows the number of parameters to be reduced from the power of two of the dimension to the linear order of the dimension only and simultaneously provides us with a framework for model comparison. We test our simplified specifications using simulated and real data from three sectoral indices in Hong Kong, three market indices and four exchange rates. The results suggest that our simplified specifications are more effective than the original formulation.

12.
Considering latent heterogeneity is of special importance in nonlinear models in order to gauge correctly the effect of explanatory variables on the dependent variable. A stratified model-based clustering approach is adapted for modeling latent heterogeneity in binary panel probit models. Within a Bayesian framework an estimation algorithm dealing with the inherent label switching problem is provided. Determination of the number of clusters is based on the marginal likelihood and a cross-validation approach. A simulation study is conducted to assess the ability of both approaches to determine the correct number of clusters, indicating high accuracy for the marginal likelihood criterion, with the cross-validation approach performing similarly well in most circumstances. Different concepts of marginal effects incorporating latent heterogeneity to different degrees arise within the considered model setup and are directly at hand within Bayesian estimation via MCMC methodology. An empirical illustration of the methodology developed indicates that consideration of latent heterogeneity via latent clusters provides the preferred model specification over a pooled and a random coefficient specification.

13.
In semiparametric regression models, penalized splines can be used to describe complex, non-linear relationships between the mean response and covariates. In some applications it is desirable to restrict the shape of the splines so as to enforce properties such as monotonicity or convexity on regression functions. We describe a method for imposing such shape constraints on penalized splines within a linear mixed model framework. We employ Markov chain Monte Carlo (MCMC) methods for model fitting, using a truncated prior distribution to impose the requisite shape restrictions. We develop a computationally efficient MCMC sampler by using a correspondingly truncated multivariate normal proposal distribution, which is a restricted version of the approximate sampling distribution of the model parameters in an unconstrained version of the model. We also describe a cheap approximation to this methodology that can be applied for shape-constrained scatterplot smoothing. Our methods are illustrated through two applications, the first involving the length of dugongs and the second concerned with growth curves for sitka spruce trees.

14.
The well-known latent variable representation of the Bayesian probit regression model due to Albert and Chib (1993) allows model fitting to be performed using a simple Gibbs sampler. In addition, various types of dependence among categorical outcomes not explained by covariate information can be accommodated in a straightforward manner as a result of this latent variable representation of the model. One example of this is the spatial probit regression model for spatially-referenced categorical outcomes. In this setting, commonly used covariance structures for describing residual spatial dependence in the normal linear model setting can be embedded into the probit regression model. Capturing spatial dependence in this way, however, can negatively impact the performance of MCMC model-fitting algorithms, particularly in terms of mixing and sensitivity to starting values. To address these computational issues, we demonstrate how the non-identifiable spatial variance parameter can be used to create data augmentation MCMC algorithms. We compare the performance of several non-collapsed and partially collapsed data augmentation MCMC algorithms through a simulation study and an analysis of land cover data.
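The latent variable Gibbs sampler mentioned at the start of this abstract can be illustrated in its simplest non-spatial form. The sketch below assumes a single covariate, no intercept, and a N(0, prior_var) prior on the coefficient; the function names, the synthetic data, and all tuning values are illustrative assumptions, and none of the spatial or collapsed variants from the paper are implemented.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0, by inverse CDF."""
    p0 = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard: inv_cdf requires 0 < u < 1
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(x, y, n_iter=1000, burn_in=200, prior_var=100.0, seed=1):
    """Gibbs sampler for P(y = 1) = Phi(beta * x) with a scalar beta (assumed setup)."""
    rng = random.Random(seed)
    post_var = 1.0 / (sum(xi * xi for xi in x) + 1.0 / prior_var)
    beta, draws = 0.0, []
    for it in range(n_iter):
        # Step 1: latent z_i | beta, y_i is N(beta * x_i, 1) truncated by the sign of y_i.
        z = [sample_truncated_normal(xi * beta, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: beta | z is normal, as in a linear regression of z on x.
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta = rng.gauss(post_mean, post_var ** 0.5)
        if it >= burn_in:
            draws.append(beta)
    return draws


data_rng = random.Random(0)
x = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
y = [1 if xi * 1.0 + data_rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]  # true beta = 1
draws = gibbs_probit(x, y)
posterior_mean = sum(draws) / len(draws)
print(posterior_mean)
```

Both conditional draws are from standard distributions, which is what makes the sampler "simple"; the mixing problems discussed in the abstract arise once spatially correlated random effects are added.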

15.
Restricted regression estimation in measurement error models (total citations: 1; self-citations: 0; citations by others: 1)
The problem of consistent estimation of the regression coefficients when some prior information about the regression coefficients is available is considered. Such prior information is expressed in the form of exact linear restrictions. Knowledge of the covariance matrix of the measurement errors associated with the explanatory variables is used to construct the consistent estimators. Some consistent estimators are suggested that also satisfy the exact linear restrictions. Their asymptotic properties are derived and analytically analyzed under a multivariate ultrastructural model with not necessarily normally distributed measurement errors. The finite sample properties of the estimators are studied through a Monte-Carlo simulation experiment.

16.
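Li Hailin. 《控制与决策》 (Control and Decision), 2015, 30(3): 441-447
To address the effect of high dimensionality on the process and results of multivariate time series data mining, and the limitations of traditional principal component analysis in representing the features of multivariate time series data, a feature representation method for multivariate time series based on variable correlation is proposed. A covariance matrix describes the distribution characteristics and the correlations among variables of each multivariate time series, and principal component analysis is applied to the aggregated covariance matrix, thereby achieving dimensionality reduction and feature representation of the multivariate time series. Experimental results show that the proposed method not only improves the quality of multivariate time series data mining but also enables fast and effective mining of multivariate time series of unequal length.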

17.
A random effects model is presented to estimate multivariate data of mixed data types. Such data typically appear in studies where different response variables are measured repeatedly for one subject. Our joint model makes it possible to relate normal, binary, multinomial and count data. Further flexibility with respect to model specification is obtained by including modern variable selection techniques. Auxiliary mixture sampling leads to a Gibbs sampling type scheme which is easy to implement since no additional tuning is needed. The method is illustrated by transaction data of a customer cohort acquired by an apparel retailer.

18.
This paper presents a proposal based on an evolutionary algorithm to impute missing observations in multivariate data. A genetic algorithm based on the minimization of an error function derived from their covariance matrix and vector of means is presented. All methodological aspects of the genetic structure are presented. An extended explanation of the design of the fitness function is provided. An application example is solved by the proposed method.

19.
Consistency conditions are given for the estimates of AR-parameters of multivariate ARMA models, obtained by using generalized Yule-Walker equations. The related problem of selecting the model orders by testing the rank of the output covariance matrix is also discussed.
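As a scalar illustration of generalized Yule-Walker estimation, consider an ARMA(1,1) process x_t = a x_{t-1} + e_t + b e_{t-1}. Its autocovariances satisfy r(k) = a r(k-1) for k >= 2, so the AR parameter can be estimated as r(2) / r(1) without estimating b. The sketch below uses a simulated series; the function names and parameter values are assumptions, and the multivariate case treated in the paper is not covered.

```python
import random


def autocov(x, lag):
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n)) / n


def simulate_arma11(a, b, n, seed=0):
    """Simulate x_t = a * x_{t-1} + e_t + b * e_{t-1} with standard normal noise."""
    rng = random.Random(seed)
    x_prev, e_prev, out = 0.0, 0.0, []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        x = a * x_prev + e + b * e_prev
        out.append(x)
        x_prev, e_prev = x, e
    return out


x = simulate_arma11(a=0.7, b=0.4, n=20000)  # illustrative parameter values
# Generalized Yule-Walker for ARMA(1,1): use lags beyond the MA order (q = 1).
a_hat = autocov(x, 2) / autocov(x, 1)
print(a_hat)
```

Using r(1) / r(0) instead, as in the ordinary Yule-Walker equation for a pure AR(1) model, would be biased here because the MA term contributes to r(1).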

20.
An R package mixAK is introduced which implements routines for semiparametric density estimation through normal mixtures using the Markov chain Monte Carlo (MCMC) methodology. Besides producing the MCMC output, the package computes posterior summary statistics for important characteristics of the fitted distribution or computes and visualizes the posterior predictive density. For the estimated models, the penalized expected deviance (PED) and the deviance information criterion (DIC) are directly computed, which allows for a selection of the number of mixture components. Additionally, multivariate right-, left- and interval-censored observations are allowed. For univariate problems, the reversible jump MCMC algorithm has been implemented and can be used for a joint estimation of the mixture parameters and the number of mixture components. The core MCMC routines have been implemented in C++ and linked to R to ensure a reasonable computational speed. We briefly review the implemented algorithms and illustrate the use of the package on three real examples of different complexity.
