Similar documents
 Found 20 similar documents (search time: 31 ms)
1.
Causal discovery aims to uncover causal relationships among variables from observational data; in practical applications, the causal structure among latent variables must be learned from observed data. Existing methods mainly exploit covariance information among observed variables (e.g., tetrad constraints) or introduce non-Gaussianity assumptions (e.g., triad constraints) to learn latent structure under linear causal models, but most are restricted to clearly specified distributions, an assumption that real-world settings often violate. This paper gives an identifiability proof for latent structures under arbitrary distributions, showing that, in the absence of confounders, the minimal condition for identifying the causal direction between two latent variables is that the noise of only one of them be non-Gaussian. On this basis, an algorithm is proposed for learning latent causal structure under arbitrary distributions in linear latent variable models: tetrad constraints are first used to learn the latent-variable skeleton; the causal directions are then learned by enumerating the skeleton's equivalence classes and testing triad constraints within each class, while relaxing the non-Gaussianity requirement to the smallest possible subset of variables, thereby broadening the applicability of linear latent variable models. Experimental results show that, compared with MIMBuild and the triad-constraint method, the algorithm achieves the best F1 score, learns more latent causal structure under arbitrary distributions, and is more robust.
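The (vanishing) tetrad constraint used in the skeleton-learning step can be checked numerically from sample covariances. A minimal sketch in Python, assuming four observed indicators of a single latent variable (the loadings, noise level and sample size are illustrative, not from the paper):

```python
import random

random.seed(0)
n = 10000
latent = [random.gauss(0, 1) for _ in range(n)]
# Four observed indicators of one latent variable: x_i = a_i * L + noise
# (loadings and noise scale are illustrative assumptions).
a = [0.9, 0.8, 1.2, 0.7]
xs = [[a[i] * l + random.gauss(0, 0.05) for l in latent] for i in range(4)]

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

# With a single latent common cause, cov(x0,x1)*cov(x2,x3) equals
# cov(x0,x2)*cov(x1,x3) in population, so the tetrad difference vanishes.
tetrad = (cov(xs[0], xs[1]) * cov(xs[2], xs[3])
          - cov(xs[0], xs[2]) * cov(xs[1], xs[3]))
print(abs(tetrad) < 0.1)
```

In population the tetrad difference is exactly zero whenever one latent variable explains all four indicators; in practice a statistical test replaces the fixed threshold used here.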

2.
Different conditional independence specifications for ordinal categorical data are compared by calculating a posterior distribution over classes of graphical models. The approach is based on the multivariate ordinal probit model where the data are considered to have arisen as truncated multivariate normal random vectors. By parameterising the precision matrix of the associated multivariate normal in Cholesky form, ordinal data models corresponding to directed acyclic conditional independence graphs for the latent variables can be specified and conveniently computed. Where one or more of the variables are binary this parameterisation is particularly compelling, as necessary constraints on the latent variable distribution can be imposed in such a way that a standard, fully normalised, prior can still be adopted. For comparing different directed graphical models a reversible jump Markov chain Monte Carlo (MCMC) approach is proposed. Where interest is focussed on undirected graphical models, this approach is augmented to allow switches in the orderings of variables of associated directed graphs, hence allowing the posterior distribution over decomposable undirected graphical models to be computed. The approach is illustrated with several examples, involving both binary and ordinal variables, and directed and undirected graphical model classes.

3.
Latent Variable Model Predictive Control (LV-MPC) algorithms are developed for trajectory tracking and disturbance rejection in batch processes. The algorithms are based on multi-phase PCA models developed using batch-wise unfolding of batch data arrays. Two LV-MPC formulations are presented, one based on optimization in the latent variable space and the other on direct optimization over a finite vector of future manipulated variables. In both cases prediction of the future trajectories is accomplished using statistical latent variable missing data imputation methods. The proposed LV-MPCs can handle constraints. Furthermore, due to the batch-wise unfolding approach selected in the modeling section, the nonlinear time-varying behavior of batch processes is captured by the linear LV models, thereby yielding very simple and computationally fast nonlinear batch MPC. The methods are tested and compared on a simulated batch reactor case study.
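Batch-wise unfolding, the modeling step the LV-MPC formulations rest on, rearranges a three-way batch data array into a two-way matrix so that standard PCA applies. A toy sketch (array shapes and values are made up):

```python
# A 3-D batch data array (I batches x J variables x K time samples) is
# rearranged into an I x (J*K) matrix, so each row holds the entire
# trajectory of one batch.  Dimensions and entries are illustrative.
I, J, K = 3, 2, 4  # batches, variables, time samples
data = [[[100 * b + 10 * j + k for k in range(K)] for j in range(J)]
        for b in range(I)]

def batchwise_unfold(arr):
    # Each batch becomes one row: all variables' trajectories concatenated.
    return [[arr[b][j][k]
             for j in range(len(arr[0]))
             for k in range(len(arr[0][0]))]
            for b in range(len(arr))]

X = batchwise_unfold(data)
print(len(X), len(X[0]))  # I rows, J*K columns
```

PCA on the rows of `X` then treats each batch as a single observation, which is how the linear latent variable models capture the time-varying behavior within a batch.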

4.
This paper is concerned with data science and analytics as applied to data from dynamic systems for the purpose of monitoring, prediction, and inference. Collinearity is inevitable in industrial operation data. Therefore, we focus on latent variable methods that achieve dimension reduction and collinearity removal. We present a new dimension-reduction expression of the state-space framework to unify dynamic latent variable analytics for process data, dynamic factor models for econometrics, subspace identification of multivariate dynamic systems, and machine learning algorithms for dynamic feature analysis. We unify or differentiate them in terms of model structure, objectives with constraints, and parsimony of parameterization. Kalman filter theory in the latent space is used to give a system-theoretic foundation to some empirical treatments in data analytics. We provide a unifying review of the connections among dynamic latent variable methods, dynamic factor models, subspace identification methods, and dynamic feature extractions, and their uses for prediction and process monitoring. Both unsupervised dynamic latent variable analytics and their supervised counterparts are reviewed. Illustrative examples are presented to show the similarities and differences among the analytics in extracting features for prediction and monitoring.
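A scalar Kalman filter, the latent-space building block the review uses to ground the data-analytic treatments, can be written in a few lines. The state-space model and its noise variances below are illustrative choices only:

```python
# One-dimensional Kalman filter for x_k = a*x_{k-1} + w, y_k = x_k + v.
# All parameters and measurements are made-up illustrative values.
a, q, r = 0.9, 0.01, 0.25          # state transition, process/measurement noise
x_hat, p = 0.0, 1.0                # initial state estimate and its variance
ys = [1.1, 0.8, 1.0, 0.9, 1.05]    # synthetic measurements

for y in ys:
    # Predict step: propagate the estimate and its uncertainty.
    x_pred, p_pred = a * x_hat, a * a * p + q
    # Update step: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_hat = x_pred + k * (y - x_pred)
    p = (1 - k) * p_pred

print(round(x_hat, 2))
```

The posterior variance `p` shrinks as measurements accumulate, which is the system-theoretic counterpart of the empirical smoothing performed by dynamic latent variable analytics.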

5.
In the reliability-based design optimization (RBDO) process, surrogate models are frequently used to reduce the number of simulations because analysis of a simulation model takes a great deal of computational time. On the other hand, to obtain accurate surrogate models, we have to limit the dimension of the RBDO problem and thus mitigate the curse of dimensionality. Therefore, it is desirable to develop an efficient and effective variable screening method for reducing the dimension of the RBDO problem. In this paper, the requirements of variable screening for deterministic design optimization (DDO) and RBDO are compared, and it is found that output variance is critical for identifying important variables in the RBDO process. An efficient approximation method based on the univariate dimension reduction method (DRM) is proposed to calculate the output variance efficiently. For variable screening, the variables that induce larger output variances are selected as important variables. To determine the important variables, hypothesis testing is used so that possible errors are contained within a user-specified error level. An appropriate number of samples for calculating the output variance is also proposed, and a quadratic interpolation method is studied in detail to calculate the output variance efficiently. Using numerical examples, the performance of the proposed method is verified: it is shown that the proposed method finds important variables efficiently and effectively.
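The screening idea, varying one input at a time and ranking inputs by the output variance they induce, can be sketched as follows (the response function, distributions and selection rule are made-up stand-ins for the paper's DRM-based procedure):

```python
import random

random.seed(1)

# Illustrative response: x0 dominates, x1 is nearly inert, x2 is nonlinear.
def response(x):
    return 5 * x[0] + 0.1 * x[1] + x[2] ** 2

means, stds = [1.0, 1.0, 1.0], [0.2, 0.2, 0.2]

def univariate_variance(i, n=5000):
    # Vary only input i about its mean; estimate the induced output variance.
    ys = []
    for _ in range(n):
        x = list(means)
        x[i] = random.gauss(means[i], stds[i])
        ys.append(response(x))
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys) / (n - 1)

variances = [univariate_variance(i) for i in range(3)]
# Keep the variables inducing the largest output variance.
important = sorted(range(3), key=lambda i: -variances[i])[:2]
print(important)
```

The paper replaces the fixed "top two" rule with hypothesis testing so that the screening error stays within a user-specified level.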

6.
Partial least squares (PLS) and a genetic algorithm (GA) are used to predict the detonation performance of energetic materials. The GA selects a small number of variables from quantitative structure-detonation relationship (QSDR) data, so that fewer variables carry more of the variable information, and PLS is then used to build the structure-property model and predict performance. Applying this approach to predicting the performance of furazan and aromatic energetic materials validates its effectiveness.

7.
Guest Editorial     
The identification of continuous-time models from non-uniformly sampled data records is investigated and a new identification algorithm based on the state variable filter approach is derived. It is shown that the orthogonal least squares estimator can be adapted for the identification of continuous-time models from non-uniformly sampled data records, and instrumental variables are introduced to reduce the bias in stochastic system identification. Multiplying the filtered variables obtained from the state variable filter with higher powers of the noise-free output signal prior to the estimation is shown to enhance the parameter estimates. Simulated examples are included to illustrate the models.

8.
The estimation of differences among groups in observational studies is frequently inaccurate owing to bias caused by differences in the distributions of covariates. In order to estimate average treatment effects when the treatment variable is binary, Rosenbaum and Rubin [1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55] proposed an adjustment method for pre-treatment variables using propensity scores. Imbens [2000. The role of the propensity score in estimating dose-response functions. Biometrika 87, 706-710] extended the propensity score methodology to the estimation of average treatment effects with multivalued treatments. However, these studies focused only on estimating the marginal mean structure. In many substantive sciences such as the biological and social sciences, a general estimation method is required to deal with analyses more complex than regression, such as testing group differences on latent variables. For latent variable models, the EM algorithm or traditional Monte Carlo methods are necessary; however, in propensity score adjustment these methods cannot be used because the full distribution is not specified. In this paper, we propose a quasi-Bayesian estimation method for general parametric models that integrates out the distributions of covariates using propensity scores. The proposed Bayes estimates are shown to be consistent, and they can be calculated by existing Markov chain Monte Carlo methods such as the Gibbs sampler.
The proposed method is useful for estimating parameters in latent variable models, for which previous methods were unable to provide valid estimates. We also illustrate the procedure using data obtained from the US National Longitudinal Survey of Children and Youth (NLSY1979-2002) to estimate the effect of maternal smoking during pregnancy on the development of the child's cognitive functioning.
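The core propensity-score idea referenced above, weighting by the inverse of the treatment probability so covariate imbalance drops out, can be illustrated with inverse-propensity weighting on synthetic data. Here the propensity is known rather than estimated, and the data-generating model is an assumption of this sketch, not the paper's estimator:

```python
import math
import random

random.seed(2)
n = 20000
data = []
for _ in range(n):
    x = random.gauss(0, 1)                # confounder
    p = 1 / (1 + math.exp(-x))            # true propensity score P(T=1 | x)
    t = 1 if random.random() < p else 0   # treatment depends on x
    y = 2 * t + x + random.gauss(0, 0.1)  # true treatment effect = 2
    data.append((x, p, t, y))

# Inverse-propensity-weighted estimate of the average treatment effect:
# naive group-mean differences would be biased because x drives both t and y.
ate = (sum(t * y / p for x, p, t, y in data) / n
       - sum((1 - t) * y / (1 - p) for x, p, t, y in data) / n)
print(round(ate, 1))
```

In practice the propensity `p` must itself be estimated from covariates, which is where the paper's quasi-Bayesian machinery for latent variable models enters.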

9.
In the general classification context, the recourse to the so-called Bayes decision rule requires estimating the class-conditional probability density functions. A mixture model for the observed variables, derived by assuming that the data have been generated by an independent factor model, is proposed. Independent factor analysis is in fact a generative latent variable model whose structure closely resembles that of the ordinary factor model, but it assumes that the latent variables are mutually independent and not necessarily Gaussian. The method therefore provides dimension reduction together with a semiparametric estimate of the class-conditional probability density functions. This density approximation is plugged into the classic Bayes rule and its performance is evaluated on both real and simulated data.
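A plug-in Bayes decision rule of the kind described, choosing the class that maximizes prior times estimated class-conditional density, can be sketched with simple Gaussian density estimates standing in for the independent-factor ones:

```python
import math

# Univariate Gaussian density; in the paper this would be the semiparametric
# class-conditional estimate from the independent factor model.
def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Per-class (mu, sigma, prior); all values are illustrative assumptions.
classes = {0: (0.0, 1.0, 0.5), 1: (3.0, 1.0, 0.5)}

def bayes_classify(x):
    # Pick the class maximizing prior * estimated class-conditional density.
    return max(classes, key=lambda k: classes[k][2] * gauss_pdf(x, classes[k][0], classes[k][1]))

print(bayes_classify(0.2), bayes_classify(2.9))
```

The quality of the rule hinges entirely on the density estimates plugged in, which is why the paper's flexible, non-Gaussian latent model matters.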

10.
In clinical studies, covariates are often measured with error due to biological fluctuations, device error and other sources. Summary statistics and regression models that are based on mis-measured data will differ from the corresponding analysis based on the "true" covariate. Statistical analysis can be adjusted for measurement error; however, the various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a measurement-error correction method for a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated, multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well.

11.
In this paper, a robust linear regression method with variable selection is proposed for predicting desirable end-of-line quality variables in complex industrial processes. The development of such prediction models is challenging because there is usually a large pool of candidate explanatory variables, limited sample data, and multicollinearity among explanatory variables. The proposed method is named enumerative partial least squares based nonnegative garrote regression: it employs partial least squares regression in an enumerative manner to generate initial model coefficients and then uses a nonnegative garrote method to shrink the original coefficients so that irrelevant variables are eliminated implicitly. An analysis of the advantages of the proposed method over existing state-of-the-art model construction methods is provided. Two simulation examples as well as an industrial application in a local semiconductor factory unit are used to validate the proposed method. These examples demonstrate substantial improvement in accuracy and robustness of variable selection compared with existing methods; for the industrial case, the improvement in root mean squared error is up to 24.3% compared with previous work.

12.
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered due, in part, to the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. The paper presents an effective method for the parameter estimation of a binary latent variable model (a binary version of the GTM) by adopting a variational approximation to the binomial likelihood. This approximation provides a log-likelihood that is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text-based documents.

13.
Exponential principal component analysis (e-PCA) has been proposed to reduce the dimension of the parameters of probability distributions using Kullback information as a distance between two distributions. It also provides a framework for dealing with various data types, such as binary and integer, for which the Gaussian assumption on the data distribution is inappropriate. In this paper, we introduce a latent variable model for e-PCA. Assuming a discrete distribution on the latent variable leads to mixture models with constraints on their parameters, which provides a framework for clustering on a lower-dimensional subspace of exponential family distributions. We derive a learning algorithm for those mixture models based on the variational Bayes (VB) method. Although intractable integration is required to implement the algorithm for a subspace, an approximation technique using Laplace's method allows us to carry out clustering on an arbitrary subspace. Combined with the estimation of the subspace, the resulting algorithm performs simultaneous dimensionality reduction and clustering. Numerical experiments on synthetic and real data demonstrate its effectiveness for extracting the structure of data as a visualization technique and its high generalization ability as a density estimation model.

14.
Generalized linear mixed models or latent variable models for categorical data are difficult to estimate if the random effects or latent variables vary at non-nested levels, such as persons and test items. Clayton and Rasbash (1999) suggested an Alternating Imputation Posterior (AIP) algorithm for approximate maximum likelihood estimation. For item response models with random item effects, the algorithm iterates between an item wing in which the item mean and variance are estimated for given person effects and a person wing in which the person mean and variance are estimated for given item effects. The person effects used for the item wing are sampled from the conditional posterior distribution estimated in the person wing and vice versa. Clayton and Rasbash (1999) used marginal quasi-likelihood (MQL) and penalized quasi-likelihood (PQL) estimation within the AIP algorithm, but this method has been shown to produce biased estimates in many situations, so we use maximum likelihood estimation with adaptive quadrature. We apply the proposed algorithm to the famous salamander mating data, comparing the estimates with many other methods, and to an educational testing dataset. We also present a simulation study to assess performance of the AIP algorithm and the Laplace approximation with different numbers of items and persons and a range of item and person variances.

15.
The problem of optimizing truss structures in the presence of uncertain parameters, considering both continuous and discrete design variables, is studied. An interval analysis based robust optimization method combined with an improved genetic algorithm is proposed for solving the problem. Uncertain parameters are assumed to be bounded within specified intervals. Natural interval extensions are employed to obtain, explicitly, a conservative approximation of the upper and lower bounds of the structural response, and hence the bounds of the objective function and the constraint function. In this way the design under uncertainty may be performed very efficiently in comparison with probabilistic analysis based methods. A mix-coded genetic algorithm (GA), in which the discrete variables are coded with binary numbers while the continuous variables are coded with real numbers, is developed to handle the continuous and discrete design variables of the optimization model simultaneously. An improved differences control strategy is proposed to prevent the GA from getting stuck in local optima. Several numerical examples concerning the optimization of plane and space truss structures with continuous, discrete or mixed design variables are presented to validate the method developed in this paper. Monte Carlo simulation shows that the interval analysis based optimization method gives much more robust designs than the deterministic optimization method.
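Natural interval extensions, the bounding device described above, replace arithmetic on numbers with arithmetic on intervals so that the result conservatively encloses the true range. A minimal sketch (the response expression is a made-up stand-in for a structural response):

```python
# Interval arithmetic on (lo, hi) pairs.
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    # The product's range is bounded by the four endpoint products.
    prods = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(prods), max(prods))

# Illustrative response r = 3*x + x*y for uncertain x in [1, 2], y in [-1, 1].
x, y = (1.0, 2.0), (-1.0, 1.0)
r = iadd(imul((3.0, 3.0), x), imul(x, y))
print(r)
```

The bounds are conservative: because `x` appears twice, the interval result can be wider than the true range, which is the price paid for the very cheap, explicit evaluation the abstract contrasts with probabilistic analysis.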

16.
This paper presents a random fuzzy economic manufacturing quantity (EMQ) model for a deteriorating process. It is assumed that the setup cost and the average holding cost are characterized as fuzzy variables and that the elapsed time until shift is a random fuzzy variable. As a function of these parameters, the average total cost is also a random fuzzy variable, and the unimodality of its expected value is studied. To obtain the optimal run length and the minimum average cost, a simultaneous perturbation stochastic approximation (SPSA) algorithm based on random fuzzy simulation is provided. Random fuzzy EMQ models with fuzzy deterioration, fuzzy linear deterioration and fuzzy exponential deterioration are presented, respectively; these models can be solved by the proposed algorithm. Numerical examples are presented at the end.
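A bare-bones SPSA iteration, the optimizer the model couples with random fuzzy simulation, estimates the gradient from just two cost evaluations per step regardless of dimension. Here it minimizes a simple deterministic quadratic standing in for the simulated average-cost objective (the gain sequences are textbook-style choices, not from the paper):

```python
import random

random.seed(4)

# Illustrative stand-in for the simulated average-cost objective.
def cost(theta):
    return (theta[0] - 3) ** 2 + (theta[1] + 1) ** 2

theta = [0.0, 0.0]
for k in range(1, 201):
    a, c = 0.1 / k ** 0.602, 0.1 / k ** 0.101   # decaying gain sequences
    # Perturb all coordinates simultaneously with random +/-1 directions.
    delta = [random.choice([-1, 1]) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    # Two cost evaluations give a stochastic gradient estimate.
    g = [(cost(plus) - cost(minus)) / (2 * c * d) for d in delta]
    theta = [t - a * gi for t, gi in zip(theta, g)]

print(round(theta[0]), round(theta[1]))
```

In the paper, each `cost` evaluation is itself a random fuzzy simulation, which is exactly the noisy setting SPSA's two-evaluation gradient estimate is designed for.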

17.
Probabilistic models such as probabilistic principal component analysis (PPCA) have recently attracted much attention in the process monitoring area. An important issue with the PPCA method is how to determine the dimensionality of the latent variable space. In the present paper, one of the most popular Bayesian chemometric methods, Bayesian PCA (BPCA), is introduced for process monitoring; it is based on the recently developed variational inference algorithm. In this monitoring framework, the effectiveness of each extracted latent variable is reflected by a hyperparameter, upon which the dimensionality of the latent variable space can be determined automatically. Meanwhile, for practical use, the developed BPCA-based monitoring method is robust to missing data and also gives satisfactory performance with limited data samples. Another contribution of this paper is the proposal of a new fault reconstruction method under the BPCA model structure. Two case studies are provided to evaluate the performance of the proposed method.

18.
Variational Bayesian Expectation-Maximization (VBEM), an approximate inference method for probabilistic models based on factorizing over latent variables and model parameters, has been a standard technique for practical Bayesian inference. In this paper, we introduce a more general approximate inference framework for conjugate-exponential family models, which we call Latent-Space Variational Bayes (LSVB). In this approach, we integrate out model parameters in an exact way, leaving only the latent variables. It can be shown that the LSVB approach gives better estimates of the model evidence as well as the distribution over latent variables than the VBEM approach, but in practice, the distribution over latent variables has to be approximated. As a practical implementation, we present a First-order LSVB (FoLSVB) algorithm to approximate this distribution over latent variables. From this approximate distribution, one can estimate the model evidence and the posterior over model parameters. The FoLSVB algorithm is directly comparable to the VBEM algorithm and has the same computational complexity. We discuss how LSVB generalizes the recently proposed collapsed variational methods [20] to general conjugate-exponential families. Examples based on mixtures of Gaussians and mixtures of Bernoullis with synthetic and real-world data sets are used to illustrate some advantages of our method over VBEM.

19.
In this paper, the complex relationship between environmental variables and dam static response is expressed using a composition of functions, including nonlinear and linear mappings. The environmental effect and noise disturbance are successfully separated from the monitoring data by analysis of the covariance matrix of multivariate monitoring data of the dam response. Based on this separation process, two multivariate dam safety monitoring models are proposed. In model I, the upper control limits (UCLs) are calculated by performing kernel density estimation (KDE) on the squared prediction error (SPE) of the offline data. For new monitoring data, we can judge whether they are abnormal by comparing the newly calculated SPE with the UCL. When abnormal data are detected, the SPE contribution plots and the SPE control chart of the new monitoring data are jointly used to qualitatively identify the reason for the abnormality. Model II is a dam monitoring model based on latent variables that can be calculated from the separation of the environmental and noise effects. A least squares support vector machine (LS-SVM) model is adopted to simulate the nonlinear mapping from environmental variables to latent variables. The latent variables are predicted, and the prediction interval is calculated to provide a control range for future monitoring data. The two monitoring models are applied to analyze the monitoring data of the horizontal displacement and hydraulic uplift pressure of a roller-compacted concrete (RCC) gravity dam. The analysis results demonstrate the good performance of the two models.
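The model-I monitoring logic, computing an SPE control limit offline and flagging new points above it, can be sketched as below; an empirical quantile stands in for the paper's KDE-based UCL, and the residuals are synthetic:

```python
import random

random.seed(3)

# Offline phase: SPE of training residuals (3 residual channels per sample;
# the noise level and sample count are illustrative assumptions).
train_spe = [sum(random.gauss(0, 0.1) ** 2 for _ in range(3))
             for _ in range(2000)]
# 99% empirical quantile as the upper control limit (the paper uses KDE).
ucl = sorted(train_spe)[int(0.99 * len(train_spe)) - 1]

def is_abnormal(spe):
    # Online phase: flag any new SPE value above the control limit.
    return spe > ucl

print(is_abnormal(1.0), is_abnormal(0.0))
```

When a point is flagged, per-variable contributions to its SPE (contribution plots) indicate which measured channel drove the alarm.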

20.
Regression analysis is a machine learning approach that aims to accurately predict the value of continuous output variables from certain independent input variables, via automatic estimation of their latent relationship from data. Tree-based regression models are popular in the literature due to their flexibility in modelling higher-order nonlinearity and their great interpretability. Conventionally, regression tree models are trained in a two-stage procedure: recursive binary partitioning is employed to produce a tree structure, followed by a pruning process that removes insignificant leaves, with the possibility of assigning multivariate functions to terminal leaves to improve generalisation. This work introduces a novel methodology of node partitioning which, in a single optimisation model, simultaneously performs the two tasks of identifying the break-point of a binary split and assigning multivariate functions to either leaf, thus leading to an efficient regression tree model. Using six real-world benchmark problems, we demonstrate that the proposed method consistently outperforms a number of state-of-the-art regression tree models and methods based on other techniques, with an average improvement of 7–60% in the mean absolute error (MAE) of the predictions.
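The conventional first stage the abstract contrasts with, scanning candidate break-points for the binary split that minimizes per-leaf squared error, can be sketched in a few lines (the data and split rule are illustrative; the paper's joint optimization of split and leaf functions is not reproduced here):

```python
# Toy one-dimensional data with an obvious jump between x=3 and x=4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

def sse(vals):
    # Sum of squared errors around the leaf mean (the per-leaf fit).
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_split(xs, ys):
    # Scan candidate break-points; pick the one minimizing total leaf SSE.
    candidates = sorted(set(xs))[:-1]
    return min(candidates,
               key=lambda c: sse([y for x, y in zip(xs, ys) if x <= c])
                             + sse([y for x, y in zip(xs, ys) if x > c]))

print(best_split(xs, ys))
```

The proposed method instead solves one optimisation model that chooses the break-point and fits a multivariate function in each resulting leaf at the same time.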


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号