Similar Articles
20 similar articles found (search time: 0 ms)
1.
Different conditional independence specifications for ordinal categorical data are compared by calculating a posterior distribution over classes of graphical models. The approach is based on the multivariate ordinal probit model where the data are considered to have arisen as truncated multivariate normal random vectors. By parameterising the precision matrix of the associated multivariate normal in Cholesky form, ordinal data models corresponding to directed acyclic conditional independence graphs for the latent variables can be specified and conveniently computed. Where one or more of the variables are binary this parameterisation is particularly compelling, as necessary constraints on the latent variable distribution can be imposed in such a way that a standard, fully normalised, prior can still be adopted. For comparing different directed graphical models a reversible jump Markov chain Monte Carlo (MCMC) approach is proposed. Where interest is focussed on undirected graphical models, this approach is augmented to allow switches in the orderings of variables of associated directed graphs, hence allowing the posterior distribution over decomposable undirected graphical models to be computed. The approach is illustrated with several examples, involving both binary and ordinal variables, and directed and undirected graphical model classes.
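The latent construction behind this model can be sketched in a few lines. This is not the authors' code, just a minimal illustration with invented names: a lower-triangular Cholesky factor maps independent standard normals to correlated latent variables (the paper parameterises the precision matrix in Cholesky form; for simplicity of sampling, the covariance factor is used here), and per-variable cutpoints turn each latent coordinate into an ordered category, with a single cutpoint at zero yielding a binary variable.

```python
import random

def sample_ordinal(n, chol_factor, cutpoints, seed=0):
    """Draw n ordinal vectors from the latent-normal construction:
    z = L @ e with e standard normal, then the observed category is the
    number of cutpoints lying below the latent coordinate."""
    rng = random.Random(seed)
    p = len(chol_factor)
    data = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(chol_factor[j][k] * e[k] for k in range(j + 1))
             for j in range(p)]
        data.append([sum(c < zj for c in cutpoints[j])
                     for j, zj in enumerate(z)])
    return data

# Hypothetical example: the off-diagonal entry encodes the directed
# dependence of the second latent variable on the first.
L = [[1.0, 0.0],
     [0.8, 0.6]]
cuts = [[0.0], [-0.5, 0.5]]   # variable 1 binary, variable 2 three-level
sample = sample_ordinal(5, L, cuts)
```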

2.
We propose a model for a point-referenced spatially correlated ordered categorical response and methodology for inference. Models and methods for spatially correlated continuous response data are widespread, but models for spatially correlated categorical data, and especially ordered multi-category data, are less developed. Bayesian models and methodology have been proposed for the analysis of independent and clustered ordered categorical data, and also for binary and count point-referenced spatial data. We combine and extend these methods to describe a Bayesian model for point-referenced (as opposed to lattice) spatially correlated ordered categorical data. We include simulation results and show that our model offers superior predictive performance as compared to a non-spatial cumulative probit model and a more standard Bayesian generalized linear spatial model. We demonstrate the usefulness of our model in a real-world example to predict ordered categories describing stream health within the state of Maryland.

3.
When we have data with missing values, the assumption that data are missing at random is very convenient. It is, however, sometimes questionable because some of the missing values could be strongly related to the underlying true values. We introduce methods for nonignorable multivariate missing data, which assume that missingness is related to the variables in question, and to the additional covariates, through a latent variable measured by the missingness indicators. The methodology developed here is useful for investigating the sensitivity of one’s estimates to untestable assumptions about the missing-data mechanism. A simulation study and data analysis are conducted to evaluate the performance of the proposed method and to compare it with that of MAR-based alternatives.

4.
We present a general framework for data analysis and visualization by means of topographic organization and clustering. Imposing distributional assumptions on the assumed underlying latent factors makes the proposed model suitable for both visualization and clustering. The system noise is modeled in parametric form, as a member of the exponential family of distributions, which allows us to deal with different (continuous or discrete) types of observables in a unified framework. In this paper, we focus on discrete formulations which, contrary to self-organizing methods for continuous data, imply variants of Bregman divergences as measures of dissimilarity between data and reference points, and also define the matching nonlinear relation between latent and observable variables. Therefore, the trait variant of the model can be seen as a data-driven noisy nonlinear independent component analysis, which is capable of revealing meaningful structure in the multivariate observable data and visualizing it in two dimensions. The class variant of the model (which performs the clustering) performs data-driven parametric mixture modeling. The combined (trait and class) model, along with the associated estimation procedures, allows us to interpret the visualization result in the sense of a topographic ordering. One important application of this work is the discovery of underlying semantic structure in text-based documents. Experimental results on various subsets of the 20 Newsgroups text corpus and binary-coded digits data are given by way of demonstration.
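The Bregman-divergence matching rule for binary observables can be sketched as follows (hypothetical names, not the authors' implementation): for Bernoulli observables, the divergence induced by the exponential-family log-partition reduces to the binary KL divergence between an observation and a reference mean, and each data vector is matched to the reference point with the smallest summed coordinate-wise divergence.

```python
import math

def bernoulli_kl(x, m, eps=1e-12):
    """Bregman divergence induced by the Bernoulli log-partition:
    for x in {0, 1} (or a probability) and reference mean m this is the
    binary KL divergence; eps guards the logs at the boundary."""
    x = min(max(x, eps), 1.0 - eps)
    m = min(max(m, eps), 1.0 - eps)
    return x * math.log(x / m) + (1.0 - x) * math.log((1.0 - x) / (1.0 - m))

def nearest_reference(x_vec, refs):
    """Assign a binary vector to the reference point (e.g. a node of the
    topographic map) with the smallest summed coordinate-wise divergence."""
    return min(range(len(refs)),
               key=lambda i: sum(bernoulli_kl(x, m)
                                 for x, m in zip(x_vec, refs[i])))

refs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]]
best = nearest_reference([1, 0, 0], refs)   # matches the first reference
```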

5.
In unfolding for two-way two-mode preference ratings data, categorizing the set of individuals while representing the categories in a low-dimensional space can facilitate interpretation. In addition to groups of individuals with a similar preference pattern, homogeneous groups of objects are also considered, such that each group clusters objects perceived to have similar attributes. A dual latent class model is proposed for a matrix of preference ratings data that partitions the individuals and the objects into classes and simultaneously represents the cluster centers in a low-dimensional space, while individuals and objects retain their preference relationship. Both the categories obtained and the unfolding configuration are estimated to be simultaneously optimal by means of a conditional maximum likelihood estimation procedure, within a simulated annealing framework that enables a statistical decision about the parameters of the model. The adjusted BIC statistic is employed to test the number of mixture components and the dimensionality of the representation. Real and artificial data sets are analyzed to illustrate the model’s performance.

6.
Computational aspects concerning a model for clustered binary panel data are analyzed. The model is based on the representation of the behavior of a subject (individual panel member) in a given cluster by means of a latent process. This latent process is decomposed into a cluster-specific component and an individual-specific component. The first component follows a first-order Markov chain, whereas the second is time-invariant and is represented by a discrete random variable. An algorithm for computing the joint distribution of the response variables is introduced. The algorithm may be used even in the presence of a large number of subjects in the same cluster. An Expectation-Maximization (EM) scheme for the maximum likelihood estimation of the model is also described together with the estimation of the Fisher information matrix on the basis of the numerical derivative of the score vector. The estimate of this matrix is used to obtain standard errors for the parameter estimates and to check the identifiability of the model and the convergence of the EM algorithm. The approach is illustrated by means of an application to a data set concerning Italian employees’ illness benefits.

7.
Many important science and engineering applications, such as regulating the temperature distribution over a semiconductor wafer and controlling the noise from a photocopy machine, require interpreting distributed data and designing decentralized controllers for spatially distributed systems. Developing effective computational techniques for representing and reasoning about these systems, which are usually modeled with partial differential equations (PDEs), is one of the major challenge problems for qualitative and spatial reasoning research.

This paper introduces a novel approach to decentralized control design, influence-based model decomposition, and applies it in the context of thermal regulation. Influence-based model decomposition uses a decentralized model, called an influence graph, as a key data abstraction representing influences of controls on distributed physical fields. It serves as the basis for novel algorithms for control placement and parameter design for distributed systems with large numbers of coupled variables. These algorithms exploit physical knowledge of locality, linear superposability, and continuity, encapsulated in influence graphs representing dependencies of field nodes on control nodes. The control placement design algorithms utilize influence graphs to decompose a problem domain so as to decouple the resulting regions. The decentralized control parameter optimization algorithms utilize influence graphs to efficiently evaluate thermal fields and to explicitly trade off computation, communication, and control quality. By leveraging the physical knowledge encapsulated in influence graphs, these control design algorithms are more efficient than standard techniques, and produce designs explainable in terms of problem structures.
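The locality and linear-superposability knowledge attributed to influence graphs can be mimicked with a small sketch (all node names and gains below are invented, not from the paper): a sparse map from each control node to the field nodes it influences, with the field value at each node obtained by superposing the per-control contributions.

```python
def field_response(influences, controls):
    """influences[c][f]: steady-state effect of a unit input at control
    node c on field node f. By linear superposability, the field value
    at each node is the sum of the per-control contributions; locality
    shows up as sparsity of the inner dictionaries."""
    field = {}
    for c, u in controls.items():
        for f, gain in influences[c].items():
            field[f] = field.get(f, 0.0) + gain * u
    return field

# Invented two-control example; only f2 is influenced by both controls.
influences = {"c1": {"f1": 1.0, "f2": 0.3},
              "c2": {"f2": 0.9, "f3": 1.0}}
temps = field_response(influences, {"c1": 2.0, "c2": 1.0})
```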


8.
In the past few years, several papers have presented methods to simulate values of a stationary random function with the same probability distribution and autocorrelation as spatially distributed data actually observed in nature. Moreover, the simulated values can be generated around existing data, with the expected similarity between the two respected. This capability has not yet been fully exploited, and the purpose of this paper is to show how simulated values can help the geologist solve practical problems. First, a simulation package written at the Ecole Polytechnique of Montreal, POLYSIM2, is described. Then, several examples of application are presented. They concern the estimation of the size of geochemical anomalies, the appraisal of economic ore reserves in a uranium deposit, and finally the evaluation of the fluctuations of the mill-feed grade in a porphyry copper deposit.
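The core of such simulation methods, reproducing a target spatial covariance, can be hinted at with an unconditional sketch (this is not POLYSIM2; the function names and the exponential covariance choice are assumptions): build the covariance between the sample points, take its Cholesky factor, and colour white noise with it.

```python
import math, random

def chol(a):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate_field(xs, sill=1.0, rng=10.0, seed=3):
    """Unconditional simulation of a 1-D Gaussian field with exponential
    covariance C(h) = sill * exp(-3|h| / range): factor the covariance
    between the points and colour white noise with the factor.
    (Generating values around existing data would add a kriging step.)"""
    n = len(xs)
    C = [[sill * math.exp(-3.0 * abs(xs[i] - xs[j]) / rng)
          for j in range(n)] for i in range(n)]
    L = chol(C)
    r = random.Random(seed)
    e = [r.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]

field = simulate_field([float(i) for i in range(8)])
```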

9.
This paper presents an approach that partitions data sets of unlabeled binary vectors without a priori information about the number of clusters or the saliency of the features. The unsupervised binary feature selection problem is approached using finite mixture models of multivariate Bernoulli distributions. Using stochastic complexity, the proposed model determines simultaneously the number of clusters in a given data set composed of binary vectors and the saliency of the features used. We conduct several applications involving real data, including document classification and image categorization, to show the merits of the proposed approach.
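The underlying mixture machinery can be sketched with a plain EM pass. This is a simplification: the paper selects the number of clusters and the feature saliency via stochastic complexity, which is omitted here, and the function names are invented. The number of clusters is fixed and the means are seeded from evenly spaced data points to keep the run deterministic.

```python
import math

def em_bernoulli_mixture(data, k, iters=50):
    """Plain EM for a mixture of multivariate Bernoullis over binary
    vectors: E-step computes responsibilities from the component
    log-likelihoods; M-step updates mixing weights and clamped means."""
    d = len(data[0])
    pis = [1.0 / k] * k
    step = max(1, len(data) // k)
    mus = [[0.6 if x else 0.4 for x in data[j * step]] for j in range(k)]
    eps = 1e-6
    for _ in range(iters):
        # E-step: responsibilities via the log-sum-exp trick
        resp = []
        for x in data:
            logs = [math.log(pis[j]) + sum(
                        math.log(mus[j][t] if x[t] else 1.0 - mus[j][t])
                        for t in range(d))
                    for j in range(k)]
            mx = max(logs)
            w = [math.exp(lv - mx) for lv in logs]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: mixing weights and clamped per-cluster Bernoulli means
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pis[j] = nj / len(data)
            mus[j] = [min(max(sum(r[j] * x[t]
                                  for r, x in zip(resp, data)) / nj,
                              eps), 1.0 - eps)
                      for t in range(d)]
    return pis, mus

# Two well-separated binary prototypes; EM should recover them.
data = [[1, 1, 0]] * 6 + [[0, 0, 1]] * 6
pis, mus = em_bernoulli_mixture(data, 2)
```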

10.
Bayesian models are increasingly used to analyze complex multivariate outcome data. However, diagnostics for such models have not been well developed. We present a diagnostic method of evaluating the fit of Bayesian models for multivariate data based on posterior predictive model checking (PPMC), a technique in which observed data are compared to replicated data generated from model predictions. Most previous work on PPMC has focused on the use of test quantities that are scalar summaries of the data and parameters. However, scalar summaries are unlikely to capture the rich features of multivariate data. We introduce the use of dissimilarity measures for checking Bayesian models for multivariate outcome data. This method has the advantage of checking the fit of the model to the complete data vectors or vector summaries with reduced dimension, providing a comprehensive picture of model fit. An application with longitudinal binary data illustrates the methods.
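A toy version of the check can be written in a few lines (names invented; a simple squared-distance discrepancy stands in for the paper's dissimilarity measures): for each posterior draw, simulate one replicated data vector and record how often its vector-level discrepancy is at least as large as that of the observed vector.

```python
import random

def discrepancy(y, p):
    """Vector-level discrepancy: squared distance between a binary vector
    and the predicted Bernoulli means (a stand-in for the dissimilarity
    measures discussed in the paper)."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))

def ppmc_pvalue(y_obs, posterior_draws, seed=7):
    """Posterior predictive check: simulate one replicate per posterior
    draw of the success probabilities and return the proportion of draws
    whose replicate is at least as discrepant as the observed vector.
    Values near 0 or 1 flag misfit."""
    rng = random.Random(seed)
    extreme = 0
    for p in posterior_draws:
        y_rep = [1 if rng.random() < pi else 0 for pi in p]
        if discrepancy(y_rep, p) >= discrepancy(y_obs, p):
            extreme += 1
    return extreme / len(posterior_draws)

# Invented "posterior": 500 identical draws of the success probabilities.
draws = [[0.75, 0.25, 0.75, 0.25]] * 500
pval = ppmc_pvalue([1, 0, 1, 1], draws)
```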

11.
Distributed and Parallel Databases - We propose a novel supervised classification algorithm for spatially dependent data, built as an extension of kernel discriminant analysis, that we named...

12.
Discovering global knowledge from distributed data sources is challenging; the important issues include the ever-increasing data volume at the highly distributed sources and the general concern about data privacy. Properly abstracting the distributed data with a compact representation that retains sufficient local detail for global knowledge discovery can in principle address both the scalability and the data privacy challenges. This calls for formal methodologies to support knowledge discovery on abstracted data. In this paper, we propose to abstract distributed data as Gaussian mixture models and learn a family of generative models from the abstracted data using a modified EM algorithm. To demonstrate the effectiveness of the proposed approach, we applied it to learn (a) data cluster models and (b) data manifold models, and evaluated their performance using both synthetic and benchmark data sets, with promising results in terms of both effectiveness and scalability. We also demonstrate that the proposed approach is robust against heterogeneous data distributions over the distributed sources.

13.
14.
The concepts of semistability and exponential semistability are well-developed for finite-dimensional systems with nonisolated equilibrium points, where asymptotic or exponential stability is not possible. Definitions of semistability and exponential semistability have recently been formulated for networks with time-delays. This paper further extends the semistability theory to continuous and discrete spatially distributed systems. This requires the definition of the notions of exact and approximate semicontrollability and semiobservability, and discrete approximate semicontrollability and semiobservability. Also introduced is the property of weak semistability. Necessary and sufficient conditions are given for exponential semistability and weak semistability, and sufficient conditions are given for semistability.

15.
In this paper we propose a new approach, called a fuzzy class model for Poisson regression, for the analysis of heterogeneous count data. On the basis of the fuzzy set concept and fuzzy classification maximum likelihood (FCML) procedures we create an FCML algorithm for fuzzy class Poisson regression models. Traditionally, the EM algorithm has been used for latent class regression models, so the accuracy and effectiveness of the EM and FCML algorithms for estimating the parameters are compared. The results show that the proposed FCML algorithm offers better accuracy and effectiveness and can serve as another good tool for regression analysis of heterogeneous count data. This work was supported in part by the National Science Council of Taiwan under Grant NSC-89-2213-E-033-007.

16.
Capture-recapture methods are used to estimate the prevalence of diseases in the field of epidemiology. The information used for estimation is available from multiple lists, thereby giving rise to the problems of list dependence and heterogeneity. In this paper, modelling is focused on the heterogeneity part. We present a new binomial latent class model which takes into account both the observed and unobserved heterogeneity within capture-recapture data. We adopt the conditional likelihood approach and perform estimation via the EM algorithm. We also derive the mathematical expressions for the computation of the standard error of the unknown population size. An application to data on diabetes patients in a town in northern Italy is discussed.
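For orientation, the classical two-list baseline that such latent class models generalize is Chapman's bias-corrected estimator; it assumes independent lists and a homogeneous capture probability, the very assumptions the paper relaxes. The counts below are invented for illustration.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected two-list estimator of population size:
    n1 and n2 are the sizes of the two lists, m the number of cases
    appearing on both. Assumes independent lists and homogeneous
    capture probabilities (no list dependence or heterogeneity)."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts: 500 cases on list A, 400 on list B, 100 on both.
N_hat = chapman_estimate(500, 400, 100)
```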

17.
A hierarchical latent variable model for data visualization
Visualization has proven to be a powerful and widely applicable tool for the analysis and interpretation of multivariate data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space, it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and subclusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach on a toy data set, and we then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multiphase flows in oil pipelines, and to data in 36 dimensions derived from satellite images.

18.
In this paper we present a model for the analysis of multivariate functional data with unequally spaced observation times that may differ among subjects. Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.

19.
We describe a microcomputer program (COXSURV) for proportional hazards multiple regression analysis of survival and other failure-time data generated in clinical trials and in retrospective clinical epidemiology studies. COXSURV is menu-driven and has powerful variable factoring and data exploratory capabilities for multivariate modeling. A batch mode allows automatic uni- or multivariate analyses for confounder summarization. Model selection for predictive purposes is possible through a step-up algorithm. The partial likelihood method used in the program allows the use of either discrete or continuous time scales by treating tied uncensored observations by either the exact method or by a robust approximation method. The program calculates most standard model fitting statistics for either overall or stratified analyses and uses data layout files compatible with those of other related epidemiologic analysis software.

20.
Yin Chunyong, Zhang Guojie. Journal of Computer Applications, 2021, 41(7): 1947-1955
To address the problem of low classification accuracy in big data environments, an ensemble classification model for distributed data streams is proposed. First, micro-cluster patterns are used to reduce the amount of data transmitted from local nodes to the central node, lowering the communication cost. Then, a sample reconstruction algorithm is used to generate training samples for the global classifier. Finally, an ensemble classification model for concept-drifting data streams is proposed, which adopts a weighted combination strategy of dynamic and stable classifiers and uses a hybrid labelling strategy to label the most representative samples so as to...


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号