Similar articles
1.
The advent of mixture models has opened the possibility of flexible models that are practical to work with. Practitioners commonly assume that data are generated from a Gaussian mixture. The inverted Dirichlet mixture has been shown to be a better alternative to the Gaussian mixture and to be of significant value in a variety of applications involving positive data. The inverted Dirichlet is, however, usually undesirable, since it forces an assumption of positive correlation. Our focus here is to develop a Bayesian alternative to both the Gaussian and the inverted Dirichlet mixtures when dealing with positive data. The alternative that we propose is based on the generalized inverted Dirichlet distribution, which offers high flexibility and ease of use, as we show in this paper. Moreover, it has a more general covariance structure than the inverted Dirichlet. The proposed mixture model is subjected to a fully Bayesian analysis based on Markov chain Monte Carlo (MCMC) simulation methods, namely Gibbs sampling and Metropolis–Hastings, used to compute the posterior distribution of the parameters, and on the Bayesian information criterion (BIC), used for model selection. The adoption of this purely Bayesian learning choice is motivated by the fact that Bayesian inference allows uncertainty to be dealt with in a unified and consistent manner. We evaluate our approach on two challenging applications concerning object classification and forgery detection.
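
The BIC-based model selection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention BIC = -2 ln L + k ln n (lower is better); the abstract does not state which convention the authors use, and the function names and the `fits` structure are hypothetical.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """BIC under the assumed convention -2*lnL + k*ln(n); lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def select_num_components(fits, num_samples):
    """Pick the number of components with the lowest BIC.

    `fits` is a hypothetical mapping: number of components ->
    (maximized log-likelihood, number of free parameters).
    """
    scores = {
        m: bic(loglik, k, num_samples) for m, (loglik, k) in fits.items()
    }
    return min(scores, key=scores.get)
```

The penalty term `k*ln(n)` grows with the number of free parameters, so a larger mixture is preferred only when it improves the log-likelihood enough to pay for its extra parameters.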

2.
We develop a variational Bayesian learning framework for the infinite generalized Dirichlet mixture model (i.e. a weighted mixture of Dirichlet process priors based on the generalized inverted Dirichlet distribution), which has proven its capability to model complex multidimensional data. We also integrate a “feature selection” approach to highlight the features that are most informative in order to construct an appropriate model in terms of clustering accuracy. Experiments on synthetic data as well as real data generated from visual scenes and handwritten digits datasets illustrate and validate the proposed approach.

3.
We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel generation approaches that we consider, grounded on ideas from information theory, allow the incorporation of the data structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework using both the Gibbs sampler and Metropolis-Hastings for parameter estimation and the Bayes factor for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach places priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the exponential family of distributions, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and the effectiveness of the proposed method with two challenging real-world applications, namely object detection and visual scene analysis and classification.
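
For readers unfamiliar with the inverted Dirichlet, the sketch below evaluates its log-density using the standard textbook form, p(x | α) = Γ(Σα) / ΠΓ(α_d) · Π x_d^(α_d − 1) · (1 + Σx_d)^(−Σα), with D positive coordinates and D + 1 shape parameters. The function name is hypothetical and nothing here is taken from the paper's own implementation.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the inverted Dirichlet at a positive vector x.

    x has D entries; alpha has D + 1 positive shape parameters.
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have exactly one more entry than x")
    if any(v <= 0 for v in x) or any(a <= 0 for a in alpha):
        raise ValueError("x and alpha must be strictly positive")
    total = sum(alpha)
    # Normalizing constant: Gamma(sum alpha) / prod Gamma(alpha_d).
    log_norm = math.lgamma(total) - sum(math.lgamma(a) for a in alpha)
    # zip stops after the first D shape parameters, as the density requires.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - total * math.log1p(sum(x))
```

With D = 1 this reduces to the beta prime density, which gives a simple sanity check.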

4.
Data clustering is a fundamental unsupervised learning task in several domains such as data mining, computer vision, information retrieval, and pattern recognition. In this paper, we propose and analyze a new clustering approach based on both hierarchical Dirichlet processes and the generalized Dirichlet distribution, which leads to an interesting statistical framework for data analysis and modelling. Our approach can be viewed as a hierarchical extension of the infinite generalized Dirichlet mixture model previously proposed in Bouguila and Ziou (IEEE Trans Neural Netw 21(1):107–122, 2010). The proposed clustering approach tackles the problem of modelling grouped data, where observations are organized into groups that we allow to remain statistically linked by sharing mixture components. The resulting clustering model is learned using a principled variational Bayes inference-based algorithm that we have developed. Extensive experiments and simulations, based on two challenging applications, namely image categorization and web service intrusion detection, demonstrate our model's usefulness and merits.

5.
In this paper, we propose a Bayesian nonparametric approach for modeling and selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be seen as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by a variational inference method. Due to the nature of the Bayesian nonparametric approach, the problems of overfitting and underfitting are prevented. Moreover, the obstacle of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Compared to other approximation techniques, such as Markov chain Monte Carlo (MCMC), which require high computational cost and whose convergence is difficult to diagnose, the whole inference process in the proposed variational learning framework is analytically tractable with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computational power, which makes it suitable for large applications. The effectiveness of our model is experimentally investigated through both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
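
The stick-breaking representation mentioned in this abstract can be sketched as follows. This is a generic illustration of the construction with Beta(1, α) stick fractions and a finite truncation level, as commonly used in variational treatments; the function name, the truncation argument and the choice of α are assumptions, not details from the paper.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw the first `truncation` mixing weights of a stick-breaking process.

    Returns the weights and the leftover stick length that a truncated
    model would assign to the remaining, unrepresented components.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)  # share broken off the remainder
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining
```

Smaller values of `alpha` concentrate mass on the first few components, which is how the model can carry a formally infinite number of clusters while using only a handful.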

6.
The Gaussian mixture model based on the Dirichlet distribution (Dirichlet Gaussian mixture model) has recently received great attention for modeling and processing data. This paper studies a new Dirichlet Gaussian mixture model for image segmentation. First, we propose a new way to incorporate the local spatial information between neighboring pixels based on the Dirichlet distribution. Its main advantages are simplicity, ease of implementation and fast computational speed. Second, existing Dirichlet Gaussian models use a complex log-likelihood function and require many parameters that are difficult to estimate. The proposed model has fewer parameters and its log-likelihood function has a simpler form. Finally, to estimate the parameters of the proposed Dirichlet Gaussian mixture model, a gradient method is adopted to minimize the negative log-likelihood function. Numerical experiments are conducted using the proposed model on various synthetic, natural and color images. We demonstrate through extensive simulations that the proposed model is superior to other model-based algorithms for image segmentation.

7.
This paper proposes an unsupervised algorithm for learning a finite mixture of scaled Dirichlet distributions. Parameter estimation is based on the maximum likelihood approach, and the minimum message length (MML) criterion is proposed for selecting the optimal number of components. This research work is motivated by the flexibility issues of the Dirichlet distribution, the widely used model for multivariate proportional data, which have prompted a number of scholars to search for generalizations of the Dirichlet. By introducing the extra parameters of the scaled Dirichlet, several useful statistical models can be obtained. Experimental results are presented using both synthetic and real datasets. Moreover, challenging real-world applications are empirically investigated to evaluate the efficiency of our proposed statistical framework.

8.
Learning appropriate statistical models is a fundamental data analysis task which has been the topic of continuing interest. Recently, finite Dirichlet mixture models have proved to be an effective and flexible model learning technique in several machine learning and data mining applications. In this article, the problem of learning and selecting finite Dirichlet mixture models is addressed using an expectation propagation (EP) inference framework. Within the proposed EP learning method for finite mixture models, all the involved parameters and the model complexity (i.e. the number of mixture components) can be evaluated simultaneously in a single optimization framework. Extensive simulations using synthetic data along with two challenging real-world applications involving automatic image annotation and human action video categorization demonstrate that our approach is able to achieve better results than comparable techniques.

9.
This paper presents an online algorithm for mixture model-based clustering. Mixture modeling is the problem of identifying and modeling components in a given set of data. The online algorithm is based on unsupervised learning of finite Dirichlet mixtures and a stochastic approach for updating the estimates. For the selection of the number of clusters, we use the minimum message length (MML) approach. The proposed method is validated on synthetic data and in an application concerning the dynamic summarization of image databases.

10.
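To account for the temporal correlation in power grid net load time series, a data-relevance Dirichlet process mixture model (DDPMM) is proposed to characterize net load uncertainty. First, a Dirichlet mixture model is fitted to the observed and forecast net load data to obtain their mixture probability model. Next, a variational Bayesian inference method that accounts for data relevance is proposed; it modifies the posterior distribution used to solve the mixture probability model and thereby yields the optimal parameters of the mixture model. Finally, the marginal probability distribution of the forecast error corresponding to a given net load forecast value is obtained, which provides the uncertainty characterization. The approach is tested on net load data from the Belgian power grid. The case study results show that, compared with methods such as the conventional Dirichlet mixture model and the Gaussian mixture model (GMM), the proposed DDPMM characterizes net load uncertainty more effectively.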

11.
Finite mixture models are among the most widely used probabilistic techniques for image segmentation. Although the best-known and most commonly used distribution in mixture models is the Gaussian, it is certainly not the best approximation for image segmentation and other related image processing problems. In this paper, we propose and investigate the use of several other mixture models, based on the Dirichlet, generalized Dirichlet and Beta–Liouville distributions, which offer more flexibility in data modeling, for image segmentation. A maximum likelihood (ML) based algorithm is applied to estimate the parameters of the resulting segmentation model. Spatial information is also employed to determine the number of regions in an image, and several color spaces are investigated and compared. The experimental results show that the proposed segmentation framework yields good overall performance on various color scenes, better than that of comparable techniques.

12.
The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation to the Fisher scoring algorithm, which is more robust regarding the choice of initial parameter values. Then, we use a novel approach to estimate the optimal number of components, based on the minimum message length criterion. Moreover, we consider a generalization of the multinomial model obtained by introducing the Dirichlet as prior, yielding the Dirichlet Compound Multinomial (DCM). Even though DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density function usually leads to undesired complications. In this article, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using the minorization–maximization framework. To evaluate and compare the performance of our proposed models, we have considered three challenging real‐world applications that involve high‐dimensional count vectors, namely, sentiment analysis, facial expression recognition, and human action recognition. The results show that the proposed algorithms increase the clustering efficiency of their respective models remarkably, and the best results are achieved by the second parametrization of DCM, which can accommodate over‐dispersed count data.
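
To make the role of the Gamma function in the DCM density concrete, the following sketch evaluates the standard DCM log-probability of a count vector, ln P(x | α) = ln n! − Σ ln x_i! + ln Γ(A) − ln Γ(n + A) + Σ [ln Γ(x_i + α_i) − ln Γ(α_i)], where n = Σx_i and A = Σα_i. It shows the conventional parametrization only, not either of the two alternative representations the article proposes; the function name is hypothetical.

```python
import math


def dcm_log_probability(counts, alpha):
    """Log-probability of a count vector under the Dirichlet Compound Multinomial."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_i!), computed in log space.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_ratio = math.lgamma(a) - math.lgamma(n + a)
    log_prod = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_ratio + log_prod
```

Every term goes through `math.lgamma`, which hints at why gradient-based fitting of α requires digamma and trigamma evaluations and why simpler representations are attractive.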

13.
This paper presents an unsupervised approach for feature selection and extraction in mixtures of generalized Dirichlet (GD) distributions. Our method defines a new mixture model that is able to extract independent and non-Gaussian features without loss of accuracy. The proposed model is learned using the Expectation-Maximization algorithm by minimizing the message length of the data set. Experimental results show the merits of the proposed methodology in the categorization of object images.

14.
Recently, hybrid generative discriminative approaches have emerged as an efficient knowledge representation and data classification engine. However, little attention has been devoted to the modeling and classification of non-Gaussian and especially proportional vectors. Our main goal, in this paper, is to discover the true structure of this kind of data by building probabilistic kernels from generative mixture models based on the Liouville family, from which we develop the Beta-Liouville distribution, and which includes the well-known Dirichlet as a special case. The Beta-Liouville has a more general covariance structure than the Dirichlet, which makes it more practical and useful. Our learning technique is based on a principled purely Bayesian approach whose resulting models are used to generate support vector machine (SVM) probabilistic kernels based on information divergence. In particular, we show the existence of closed-form expressions of the Kullback-Leibler and Rényi divergences between two Beta-Liouville distributions and then between two Dirichlet distributions as a special case. Through extensive simulations and a number of experiments involving synthetic data, visual scenes and texture image classification, we demonstrate the effectiveness of the proposed approaches.
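
As an illustration of the Dirichlet special case, the sketch below computes the well-known closed-form Kullback-Leibler divergence between two Dirichlet distributions: KL(Dir(a) ‖ Dir(b)) = ln Γ(a₀) − Σ ln Γ(a_i) − ln Γ(b₀) + Σ ln Γ(b_i) + Σ (a_i − b_i)(ψ(a_i) − ψ(a₀)), with a₀ = Σa_i and b₀ = Σb_i. This is the standard textbook expression rather than the paper's own derivation, and because the Python standard library has no digamma function, a small recurrence-plus-asymptotic-series approximation (a hypothetical helper) is included.

```python
import math


def digamma(x):
    """Approximate digamma for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def dirichlet_kl(a, b):
    """KL divergence KL(Dir(a) || Dir(b)) for positive parameter vectors."""
    if len(a) != len(b):
        raise ValueError("parameter vectors must have the same length")
    a0, b0 = sum(a), sum(b)
    log_norm = (
        math.lgamma(a0) - sum(math.lgamma(v) for v in a)
        - math.lgamma(b0) + sum(math.lgamma(v) for v in b)
    )
    psi_a0 = digamma(a0)
    cross = sum((ai - bi) * (digamma(ai) - psi_a0) for ai, bi in zip(a, b))
    return log_norm + cross
```

A divergence of this kind is not symmetric, so a kernel built from it would typically combine both directions; how the paper does so is not stated in the abstract.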

15.
闫小喜  韩崇昭 《自动化学报》2011,37(11):1313-1321
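To address component pruning in the Gaussian mixture implementation of the probability hypothesis density (PHD), a component pruning algorithm based on the Dirichlet distribution is proposed to improve the performance of the Gaussian mixture PHD implementation. The algorithm estimates the mixture parameters under the maximum a posteriori (MAP) criterion, takes a Dirichlet distribution with negative exponents that depends only on the mixing weights as the prior over the mixture parameters, and derives the update formula for the mixing weights using Lagrange multipliers. It exploits the instability of the negative-exponent Dirichlet distribution to drive components unrelated to the target intensity to extinction during the MAP iterations. This instability also resolves the problem of several nearby components jointly describing a single intensity peak, which benefits the subsequent extraction of multi-target states. Simulation results show that the Dirichlet-based component pruning algorithm outperforms the pruning algorithm of the typical Gaussian mixture implementation.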
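
The pruning mechanism can be sketched under an explicit assumption: if the prior on the mixing weights is proportional to Π w_k^(−c) for some constant c > 0, then maximizing Σ n_k ln w_k − c Σ ln w_k subject to Σ w_k = 1 with a Lagrange multiplier gives w_k proportional to n_k − c, and components whose soft count n_k falls below c are clipped to zero. The constant `c`, the function name and the clipping rule are assumptions made for illustration; the paper's actual update formula may differ.

```python
def map_weight_update(soft_counts, c):
    """MAP-style mixing-weight update under an assumed prior prod_k w_k**(-c).

    soft_counts[k] is the total responsibility assigned to component k.
    Components with soft count at or below c receive weight zero (pruned).
    """
    shifted = [max(0.0, n - c) for n in soft_counts]
    total = sum(shifted)
    if total == 0.0:
        raise ValueError("every component was pruned; c is too large")
    return [s / total for s in shifted]
```

Because the subtraction penalizes weakly supported components disproportionately, repeated updates tend to starve them, which matches the "instability" the abstract describes.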

16.
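Because non-rigid point set registration algorithms based on the Gaussian mixture model (GMM) are sensitive to heavy-tailed points and outliers, a non-rigid point set registration algorithm based on a Student's t mixture model with local spatial constraints is proposed. The Gaussian mixture model is generalized to a t mixture model within the expectation maximization (EM) framework. The Dirichlet distribution serves as the prior on the weights of the floating points, and Dirichlet parameters that encode local spatial constraints are constructed. The EM algorithm is used to obtain closed-form solutions for the registration parameters, and the degrees of freedom of the floating points are computed to change their probability density distribution, which avoids errors in estimating the outlier level. Experiments show that the proposed registration algorithm offers small registration error, good robustness and strong resistance to interference.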

17.
An infinite mixture of autoregressive models is developed. The unknown parameters in the mixture autoregressive model follow a mixture distribution, which is governed by a Dirichlet process prior. One main feature of our approach is the generalization of a finite mixture model by having the number of components unspecified. A Bayesian sampling scheme based on a weighted Chinese restaurant process is proposed to generate partitions of observations. Using the partitions, Bayesian prediction that accounts for possible model uncertainty, determination of the most probable number of mixture components, clustering of time series and outlier detection in time series can all be carried out. Numerical results from simulated and real data are presented to illustrate the methodology.
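
How a Chinese restaurant process generates partitions can be shown with a short sketch. It draws from the plain (unweighted) CRP prior, where observation i joins an existing cluster with probability proportional to its size and opens a new one with probability proportional to a concentration parameter; the weighted variant in the paper additionally reweights these choices by how well each cluster explains the data, which is not reproduced here. The function name and arguments are hypothetical.

```python
import random


def sample_crp_partition(num_items, concentration, rng):
    """Sample cluster labels for `num_items` observations from a CRP prior."""
    cluster_sizes = []  # cluster_sizes[k] = number of items in cluster k
    labels = []
    for i in range(num_items):
        # Total mass: i items already seated plus the concentration parameter.
        threshold = rng.random() * (i + concentration)
        cumulative = 0.0
        choice = len(cluster_sizes)  # default: open a new cluster
        for k, size in enumerate(cluster_sizes):
            cumulative += size
            if threshold < cumulative:
                choice = k
                break
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        labels.append(choice)
    return labels, cluster_sizes
```

The number of clusters is never fixed in advance; it emerges from the sampled partition, which is what lets the number of mixture components remain unspecified.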

18.
Defining valid patents in a particular technological field is an indispensable step in patent analysis. To minimise the risk of missing valid patents, domain experts manually exclude irrelevant patents, known as noise patents, from an initial patent set derived using a loose retrieval query. However, this task has become time-consuming and labour-intensive due to the increasing number of patents and rising complexity of technological knowledge. This study proposes a semi-automated approach to noise patent filtering based on information entropy theory and latent Dirichlet allocation. The proposed approach comprises four discrete steps: (1) structuring patents using a term-weighting method; (2) recommending noise patent seeds based on the information quantity of patents in terms of focal keyword groups; (3) measuring text similarities for patent clustering using latent Dirichlet allocation; and (4) identifying potential noise patent clusters with respect to the noise patent seeds. Our case study confirms that the proposed approach is valuable as a complementary noise patent filtering tool that will enable domain experts to focus more on their own knowledge-intensive tasks such as prior art analysis and research and development (R&D) strategy formulation.

19.
This paper investigates the possibility of extracting latent aspects of a video in order to develop a video fingerprinting framework. Semantic visual information about humans, more specifically face occurrences in video frames, along with a generative probabilistic model, namely Latent Dirichlet Allocation (LDA), are used for this purpose. The latent variables, namely the video topics, are modeled as a mixture of distributions of faces in each video. The method also involves a clustering approach based on the Scale-Invariant Feature Transform (SIFT) for clustering the detected faces and adapts the bag-of-words concept into a bag-of-faces one, in order to ensure exchangeability between topic distributions. Experimental results on three different data sets show low misclassification rates of the order of 2% and false rejection rates of 0%. These rates provide evidence that the proposed method performs very efficiently for video fingerprinting.
