Similar Literature (20 results)
1.
Bayesian feature and model selection for Gaussian mixture models   (Total citations: 1; self-citations: 0; citations by others: 1)
We present a Bayesian method for mixture model training that simultaneously treats the feature selection and the model selection problem. The method is based on the integration of a mixture model formulation that takes into account the saliency of the features and a Bayesian approach to mixture learning that can be used to estimate the number of mixture components. The proposed learning algorithm follows the variational framework and can simultaneously optimize over the number of components, the saliency of the features, and the parameters of the mixture model. Experimental results using high-dimensional artificial and real data illustrate the effectiveness of the method.
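The recurring idea in this listing, variational mixture learning that determines the number of components automatically, can be tried directly with scikit-learn's `BayesianGaussianMixture` (a generic variational GMM, not the feature-saliency method of this particular paper). Deliberately over-specifying the number of components lets the variational Dirichlet prior prune the superfluous ones:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters in 2-D; the true number of components is 2.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(8.0, 1.0, size=(200, 2))])

# Over-specify the number of components; the variational Dirichlet process
# prior drives the weights of superfluous components toward zero.
bgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior=1e-3,  # a small prior favours sparse mixtures
    max_iter=500,
    random_state=0,
).fit(X)

effective = int(np.sum(bgmm.weights_ > 0.01))
print(effective)  # typically 2: the superfluous components are pruned
```

This mirrors the "automatic choice of model complexity" behaviour described in several of the abstracts below: components whose posterior weight collapses toward zero are effectively eliminated during fitting.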

2.
In this paper, we propose a Bayesian nonparametric approach for modeling and selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be seen as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by a variational inference method. Due to the nature of the Bayesian nonparametric approach, the problems of overfitting and underfitting are prevented. Moreover, the obstacle of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Compared to other approximation techniques, such as Markov chain Monte Carlo (MCMC), which require high computational cost and whose convergence is difficult to diagnose, the whole inference process in the proposed variational learning framework is analytically tractable with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computational power, which makes it suitable for large-scale applications. The effectiveness of our model is experimentally investigated through both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
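The stick-breaking representation mentioned above can be sketched in a few lines. This is a truncated version (the full Dirichlet process uses an infinite sum); `alpha` is the DP concentration parameter:

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights pi_k = v_k * prod_{j<k}(1 - v_j),
    where v_k ~ Beta(1, alpha). Truncation at num_sticks replaces the
    infinite sum of the full Dirichlet process."""
    v = rng.beta(1.0, alpha, size=num_sticks)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(1)
pi = stick_breaking_weights(alpha=2.0, num_sticks=50, rng=rng)
print(round(float(pi.sum()), 4))  # close to 1; the leftover mass is the truncation error
```

Smaller `alpha` concentrates mass on the first few sticks (few clusters); larger `alpha` spreads it over more sticks, which is why the concentration parameter controls the effective number of clusters.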

3.
We developed a variational Bayesian learning framework for the infinite generalized Dirichlet mixture model (i.e. a weighted mixture of Dirichlet process priors based on the generalized inverted Dirichlet distribution) that has proven its capability to model complex multidimensional data. We also integrate a "feature selection" approach to highlight the features that are most informative in order to construct an appropriate model in terms of clustering accuracy. Experiments on synthetic data as well as real data generated from visual scenes and handwritten digit datasets illustrate and validate the proposed approach.

4.
Variational methods, which have become popular in the neural computing/machine learning literature, are applied to the Bayesian analysis of mixtures of Gaussian distributions. It is also shown how the deviance information criterion (DIC) can be extended to these types of model by exploiting the use of variational approximations. The use of variational methods for model selection and the calculation of a DIC are illustrated with real and simulated data. The variational approach allows the simultaneous estimation of the component parameters and the model complexity. It is found that initial selection of a large number of components results in superfluous components being eliminated as the method converges to a solution. This corresponds to an automatic choice of model complexity. The appropriateness of this is reflected in the DIC values.

5.
Finite mixtures are widely used in the fields of information processing and data analysis. However, model selection, i.e., the selection of the number of components in the mixture for a given sample data set, remains a rather difficult task. Recently, Bayesian Ying-Yang (BYY) harmony learning has provided a new approach to Gaussian mixture modeling with the attractive feature that model selection can be performed automatically during parameter learning. In this paper, based on the same BYY harmony learning framework for finite mixtures, we propose an adaptive gradient BYY learning algorithm for Poisson mixtures with automated model selection. Simulation experiments demonstrate that this adaptive gradient BYY learning algorithm can automatically determine the number of actual Poisson components for a sample data set, with good estimates of the parameters of the original or true mixture, provided the components are separated to a certain degree. Moreover, the adaptive gradient BYY learning algorithm is successfully applied to texture classification.

6.
In recent years, many authors have considered the application of machine learning methodologies to effect robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, usually tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models, a cornerstone of nonparametric Bayesian statistics, have emerged as promising candidates for clustering applications where the number of clusters is unknown a priori. Motivated by this, to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation of the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology, considering a number of demanding robot learning by demonstration scenarios.
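The GMR idea underlying this entry can be sketched with an ordinary (finite) Gaussian mixture: fit a GMM on joint (x, y) samples, then predict E[y | x] by conditioning each Gaussian component on x. This is a minimal illustration of the regression step only, not the Dirichlet process extension the paper proposes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM on joint (x, y) data drawn from a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = np.sin(x) + rng.normal(0, 0.1, size=500)
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.column_stack([x, y]))

def gmr_predict(gmm, x_query):
    """E[y | x]: responsibility-weighted conditional means of the components."""
    mu = gmm.means_           # shape (K, 2): [mean_x, mean_y] per component
    cov = gmm.covariances_    # shape (K, 2, 2)
    var_x = cov[:, 0, 0]
    # Responsibility of each component for x_query, from the x-marginal.
    resp = gmm.weights_ * np.exp(-0.5 * (x_query - mu[:, 0]) ** 2 / var_x) / np.sqrt(var_x)
    resp /= resp.sum()
    # Conditional mean of y given x within each Gaussian component.
    cond = mu[:, 1] + cov[:, 1, 0] / var_x * (x_query - mu[:, 0])
    return float(resp @ cond)

print(round(gmr_predict(gmm, 1.5), 2))  # close to sin(1.5)
```

The Dirichlet process version replaces the fixed `n_components=5` with a nonparametric prior, which is exactly the model-size selection problem the abstract describes.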

7.
We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel-generation approaches we consider, grounded in ideas from information theory, allow the incorporation of the data's structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework using both Gibbs sampling and Metropolis-Hastings for parameter estimation, and Bayes factors for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach uses priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the exponential family of distributions, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and the effectiveness of the proposed method with two challenging real-world applications, namely object detection and visual scene analysis and classification.

8.
There has been growing interest in subspace data modeling over the past few years. Methods such as principal component analysis, factor analysis, and independent component analysis have gained in popularity and have found many applications in image modeling, signal processing, and data compression, to name just a few. As applications and computing power grow, more and more sophisticated analyses and meaningful representations are sought. Mixture modeling methods have been proposed for principal and factor analyzers that exploit local gaussian features in the subspace manifolds. Meaningful representations may be lost, however, if these local features are nongaussian or discontinuous. In this article, we propose extending the gaussian analyzers mixture model to an independent component analyzers mixture model. We employ recent developments in variational Bayesian inference and structure determination to construct a novel approach for modeling nongaussian, discontinuous manifolds. We automatically determine the local dimensionality of each manifold and use variational inference to calculate the optimum number of ICA components needed in our mixture model. We demonstrate our framework on complex synthetic data and illustrate its application to real data by decomposing functional magnetic resonance images into meaningful (and medically useful) features.

9.
In this paper, we consider the problem of unsupervised discrete feature selection/weighting. Indeed, discrete data are an important component in many data mining, machine learning, image processing, and computer vision applications. However, much of the published work on unsupervised feature selection has concentrated on continuous data. We propose a probabilistic approach that assigns relevance weights to discrete features that are considered as random variables modeled by finite discrete mixtures. The choice of finite mixture models is justified by their flexibility, which has led to their widespread application in different domains. For the learning of the model, we consider both Bayesian and information-theoretic approaches through stochastic complexity. Experimental results are presented to illustrate the feasibility and merits of our approach on a difficult problem: clustering and recognizing visual concepts in different image data. The proposed approach is also successfully applied to text clustering.

10.
Estimating reliable class-conditional probabilities is a prerequisite for implementing Bayesian classifiers, and how to estimate the probability density functions (PDFs) is also a fundamental problem for other probabilistic induction algorithms. The finite mixture model (FMM) is able to represent arbitrarily complex PDFs by using a mixture of multimodal distributions, but it assumes that the mixture components follow a given distribution, which may not hold for real-world data. This paper presents a non-parametric kernel mixture model (KMM) based probability density estimation approach, in which the data sample of a class is assumed to be drawn from several unknown, independent hidden subclasses. Unlike traditional FMM schemes, we simply use the k-means clustering algorithm to partition the data sample into several independent components, and the regional density diversities of the components are combined using Bayes' theorem. On the basis of the proposed kernel mixture model, we present a three-step Bayesian classifier, which includes partitioning, structure learning, and PDF estimation. Experimental results show that KMM is able to improve the quality of the PDFs estimated by the conventional kernel density estimation (KDE) method, and also show that KMM-based Bayesian classifiers outperform existing Gaussian, GMM, and KDE-based Bayesian classifiers.
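The KMM recipe described above (partition a class sample with k-means, fit a kernel density per partition, combine the regional densities weighted by partition size) can be sketched as follows. This is an illustrative reconstruction of the idea, not the paper's exact estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

# Bimodal 1-D sample standing in for one class's data.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])

# Step 1: partition the sample into hidden subclasses with k-means.
k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(sample.reshape(-1, 1))
parts = [sample[labels == j] for j in range(k)]

# Step 2: one KDE per partition; mixing weights from partition sizes.
weights = [len(p) / len(sample) for p in parts]
kdes = [gaussian_kde(p) for p in parts]

# Step 3: combine the regional densities into one class-conditional PDF.
def density(x):
    return sum(w * kde(x)[0] for w, kde in zip(weights, kdes))

print(density(-4.0) > density(0.0))  # True: the modes are denser than the valley
```

Fitting one KDE per partition lets each region pick its own bandwidth, which is the "regional density diversity" the abstract refers to.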

11.
For multimode processes, the Gaussian mixture model (GMM) has in recent years been applied to estimate the probability density function of process data under normal operating conditions. However, learning a GMM with the expectation maximization (EM) algorithm from process data can be difficult or even infeasible for high-dimensional and collinear process variables. To address this issue, a novel multimode process monitoring approach based on a PCA mixture model is proposed. First, the PCA technique is applied directly to the covariance matrix of each Gaussian component to reduce the dimension of the process variables and to obtain nonsingular covariance matrices. Then the Bayesian Ying-Yang incremental EM algorithm is adopted to automatically optimize the number of mixture components. With the obtained PCA mixture model, a novel process monitoring scheme is derived for fault detection of multimode processes. Three case studies are provided to evaluate the monitoring performance of the proposed method.
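The core dimensionality-reduction step above (eigendecompose each component's covariance and keep only the leading directions, so that collinear variables no longer make the covariance singular) can be sketched for a single mode. This is a hypothetical minimal illustration with synthetic collinear data, not the paper's monitoring scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
# Collinear 3-D data: the third variable is a near-exact copy of the first,
# so the full 3x3 covariance matrix is (numerically) singular.
base = rng.normal(size=(500, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 1e-6 * rng.normal(size=500)])

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain the directions explaining 99% of the variance; the near-zero
# eigenvalue from the collinear variable is dropped, leaving a nonsingular
# reduced covariance for this Gaussian component.
keep = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.99)) + 1
print(keep)  # 2: the collinear direction carries (almost) no variance
```

In the full PCA mixture model this reduction is done per component, so each operating mode gets its own local subspace.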

12.
李绍园, 韦梦龙, 黄圣君. 《软件学报》 (Journal of Software), 2022, 33(4): 1274-1286
Traditional supervised learning requires the ground-truth labels of training samples, but in many cases true labels are not easy to collect. In contrast, crowdsourcing learning collects annotations from multiple fallible non-experts and estimates the samples' true labels through some fusion scheme. We note that existing work on deep crowdsourcing learning models annotator correlation insufficiently, while work on non-deep crowdsourcing learning shows that modeling annotator correlation helps improve learning performance. A deep generative crowdsourcing learning method is therefore proposed, which ...

13.
卿湘运, 王行愚. 《计算机学报》 (Chinese Journal of Computers), 2007, 30(8): 1333-1343
The goal of subspace clustering is to group a given set of data over different feature subsets. This unsupervised learning method attempts to discover patterns in the data that are "similar under different representations", and has attracted considerable attention and research in related fields. We first extend the "mean and variance shift" model proposed by Hoff into a new nonparametric clustering model based on feature subsets, whose advantage is that variational Bayesian methods can be applied to learn the model parameters. The model combines a Dirichlet process mixture model with a nonparametric model for selecting feature subsets, and can automatically choose the number of clusters while performing subspace clustering. A Markov chain Monte Carlo algorithm for posterior inference of the parameters is then given. For reasons of computational speed, a variational Bayesian method is proposed for learning the model parameters. Experimental results on synthetic data and an application to face clustering show that the model can simultaneously select relevant features and group the data points that exhibit similar patterns on those features. Experiments with the sampling-free variational Bayesian method on the UCI "Multiple Features" database show that it can infer the model parameters quickly.

14.
Existing trajectory prediction methods have difficulty describing the motion trajectories of moving objects accurately, especially in the complex and uncertain environment of vehicular ad hoc networks (VANETs, also known as the Internet of Vehicles). To address this problem, an environment self-adaptive trajectory prediction method based on the variational Gaussian mixture model (VGMM), called ESATP (environment self-adaptive prediction method based on VGMM), is proposed. First, building on the traditional Gaussian mixture model, variational Bayesian approximate inference is used to handle the mixture of Gaussian distributions. Second, a variational Bayesian expectation-maximization algorithm is designed to learn the parameters of the Gaussian mixture model, making effective use of prior information on the parameters to obtain a more accurate prediction model. Finally, according to the characteristics of the input trajectory data, a parameter-adaptive selection algorithm automatically tunes the parameter combination, flexibly adjusting the number of Gaussian mixture components and the trajectory segment size. Experimental results show that the proposed method achieves high prediction accuracy and can be applied in vehicle positioning products.

15.
Parameter estimation is at the core of many fundamental problems in statistical research and practice. In particular, finite mixture models have long relied heavily on deterministic approaches such as expectation maximization (EM). Despite their successful use across a wide spectrum of areas, these approaches are inclined to converge to local solutions. An alternative is the adoption of Bayesian inference, which naturally addresses data uncertainty while ensuring good generalization. To this end, in this paper we propose a fully Bayesian approach for Langevin mixture model estimation and selection via an MCMC algorithm based on Gibbs sampling, Metropolis–Hastings, and Bayes factors. We demonstrate the effectiveness and the merits of the proposed learning framework through synthetic data and challenging applications involving topic detection and tracking and image categorization.
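The Metropolis–Hastings step that several MCMC-based entries in this listing rely on can be shown in its simplest form. This is the generic random-walk building block on a toy posterior (a Gaussian location parameter with a weak Gaussian prior), not the Langevin mixture sampler itself:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.0, size=100)

def log_post(mu):
    # Log posterior up to a constant: Gaussian likelihood + N(0, 10^2) prior.
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * mu ** 2 / 100.0

mu, chain = 0.0, []
for _ in range(5000):
    prop = mu + rng.normal(0.0, 0.5)      # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio); symmetric proposal,
    # so the proposal densities cancel.
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop
    chain.append(mu)

burned = chain[1000:]                      # discard burn-in samples
print(round(float(np.mean(burned)), 1))    # close to the sample mean of `data`
```

With a weak prior and 100 observations, the posterior mean essentially equals the sample mean, which gives a cheap sanity check on the chain.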

16.
In recent years, Bayesian methods using Gaussian mixture models as patch priors have achieved excellent image restoration performance. To address the drawbacks of such models, namely a fixed number of components and a heavy reliance on external learning, a new image prior model based on the Dirichlet process mixture model is proposed. The model learns a generic external prior from a database of clean images and an internal prior from the degraded image, and fuses the internal and external priors naturally by exploiting the additivity of the model's sufficient statistics. Through mechanisms for creating and merging clusters, the model's complexity adapts as the data grow or shrink, so an interpretable and compact model can be learned. To solve for the variational posterior distributions of all latent variables, a scalable batch-update variational algorithm combining the cluster creation and merging mechanisms is proposed, overcoming the inefficiency of the traditional coordinate-ascent algorithm on large data sets and its tendency to become trapped in local optima. In image denoising and inpainting experiments, the proposed model outperforms traditional methods in both objective quality metrics and visual appearance, validating its effectiveness.

17.
To exploit the temporal relevance of net-load time-series data in power grids, a data-relevance Dirichlet process mixture model (DDPMM) is proposed to characterize the uncertainty of the net load. First, a Dirichlet process mixture model is fitted to the observed and predicted net-load data to obtain a mixture probability model. Then, a variational Bayesian inference method that accounts for data relevance is proposed, with a modified posterior distribution, to solve the mixture probability model for its optimal parameters. Finally, the marginal probability distribution of the prediction error corresponding to each net-load forecast value is obtained, realizing the uncertainty characterization. The method is validated on net-load data from the Belgian power grid; the case studies show that, compared with the traditional Dirichlet process mixture model and the Gaussian mixture model (GMM), the proposed data-relevance Dirichlet process mixture model characterizes the uncertainty of the net load more effectively.

18.
In this paper, we propose a novel approach for simultaneous localized feature selection and model detection for unsupervised learning. In our approach, the local feature saliencies, together with the other parameters of the Gaussian mixture, are estimated by Bayesian variational learning. Experiments performed on both synthetic and real-world data sets demonstrate that our approach is superior to both global feature selection and subspace clustering methods.

19.
The advent of mixture models has opened the possibility of flexible models which are practical to work with. Practitioners commonly assume that the data are generated from a Gaussian mixture. The inverted Dirichlet mixture has been shown to be a better alternative to the Gaussian mixture and to be of significant value in a variety of applications involving positive data. The inverted Dirichlet is, however, often undesirable, since it forces an assumption of positive correlation. Our focus here is to develop a Bayesian alternative to both the Gaussian and the inverted Dirichlet mixtures when dealing with positive data. The alternative that we propose is based on the generalized inverted Dirichlet distribution, which offers high flexibility and ease of use, as we show in this paper. Moreover, it has a more general covariance structure than the inverted Dirichlet. The proposed mixture model is subjected to a fully Bayesian analysis based on Markov chain Monte Carlo (MCMC) simulation methods, namely Gibbs sampling and Metropolis–Hastings, to compute the posterior distribution of the parameters, and on the Bayesian information criterion (BIC) for model selection. The adoption of this purely Bayesian learning choice is motivated by the fact that Bayesian inference makes it possible to deal with uncertainty in a unified and consistent manner. We evaluate our approach on the basis of two challenging applications concerning object classification and forgery detection.

20.
To improve the noise robustness of image segmentation and to determine the number of segments adaptively, a Markov random field is embedded in the stick-breaking construction of the cluster-label prior probabilities, introducing spatial-correlation constraints into the probabilistic formulation of the Dirichlet process mixture model and thereby strengthening the spatial smoothness of the clusters. Variational inference is then used to obtain a converged analytical solution for the cluster labels, yielding an image segmentation algorithm based on stick-breaking variational Bayesian inference that learns the pixels' cluster labels and the number of segments simultaneously and adaptively, and avoids the computational complexity that introducing spatial-correlation constraints causes in traditional methods. Numerical experiments on the Berkeley BSD500 image benchmark show that the algorithm achieves higher PRI values than existing mixture-model clustering segmentation algorithms and exhibits better noise robustness at noise variances below 0.1.
