Similar Documents
1.
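The Dirichlet process is a stochastic process used in nonparametric Bayesian models. Because of its clustering property, models built on it can determine the number of parameters through simple Gibbs sampling, which makes model selection convenient. In recent years it has developed rapidly in both theory and applications and has attracted increasing attention. The hierarchical Dirichlet process (HDP) is a nonparametric generalization of the LDA model and can be used to build mixture models with infinitely many components. HDP is widely used in probabilistic topic modeling. This paper first explains the principle of the Dirichlet process and its sampling methods. It then extends the process to the hierarchical Dirichlet process, with emphasis on Dirichlet-process-based mixture models. Finally, it surveys applications of the hierarchical Dirichlet process.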
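The abstract does not say which construction the survey uses. The standard truncated stick-breaking construction shows compactly why a Dirichlet process yields a discrete random measure with an unbounded number of occupied components. This is a minimal sketch under illustrative assumptions, none of which come from the paper:

- the truncation level `K`;
- the Gaussian base measure N(0, 10²);
- the unit observation variance.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k} (1 - beta_j).
    Setting the last beta to 1 gives the leftover stick to component K,
    so the K weights sum to one.
    """
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_dp_mixture(alpha, K, n, rng):
    """Draw n points from a truncated DP mixture of unit-variance Gaussians.

    Atom locations come from an assumed base measure H = N(0, 10^2).
    """
    weights = stick_breaking_weights(alpha, K, rng)
    atoms = rng.normal(0.0, 10.0, size=K)
    labels = rng.choice(K, size=n, p=weights)
    data = rng.normal(atoms[labels], 1.0)
    return weights, labels, data

rng = np.random.default_rng(0)
w, z, x = sample_dp_mixture(alpha=2.0, K=50, n=500, rng=rng)
print("weights sum:", w.sum())
print("occupied clusters:", len(np.unique(z)))
```

A smaller `alpha` concentrates mass on fewer sticks, so fewer clusters are occupied. This is the behaviour that lets DP mixtures infer the number of components from data.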

2.
Zhang Lin, Liu Hui. Acta Automatica Sinica, 2012, 38(10): 1709-1713
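A model-based clustering algorithm is proposed for Illumina GoldenGate methylation microarray data. The algorithm builds an infinite beta mixture model with a Dirichlet process prior and derives the cluster structure from both the data and the model. Experimental results show that it effectively estimates the number of clusters, the mixture weight of each cluster, and the characteristics of each cluster, giving satisfactory clustering results.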

3.
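The classical nonparametric Dirichlet process mixture model for image segmentation can segment images automatically when the number of classes is unknown. However, its slow computation limits real-time clinical use. This paper improves on the classical nonparametric model. The image is first smoothed with anisotropic diffusion filtering. A Markov random field spatial constraint is then used as the prior of the Dirichlet process mixture model for segmentation. In segmentation experiments on 15 brain-tumor magnetic resonance images, the new algorithm controls the number of segmentation classes at convergence more effectively. It also clearly outperforms the classical Dirichlet process mixture segmentation algorithm in both segmentation accuracy and computation speed.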
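The anisotropic diffusion preprocessing step above can be sketched with the classic Perona-Malik scheme. This is an assumption: the abstract does not name the diffusion variant, conductance function, or parameters. The exponential conductance, `kappa`, `step` and `n_iter` below are illustrative choices.

```python
import numpy as np

def perona_malik(image, n_iter=20, kappa=0.1, step=0.2):
    """Perona-Malik anisotropic diffusion on a 2-D float image.

    Uses the exponential conductance g(d) = exp(-(d / kappa)^2), which
    diffuses strongly in flat regions and weakly across strong edges.
    step <= 0.25 keeps this explicit 4-neighbour scheme stable.
    """
    u = image.astype(float).copy()
    for _ in range(n_iter):
        # Differences to the four neighbours; borders get zero flux.
        north = np.zeros_like(u); north[1:, :] = u[:-1, :] - u[1:, :]
        south = np.zeros_like(u); south[:-1, :] = u[1:, :] - u[:-1, :]
        west = np.zeros_like(u); west[:, 1:] = u[:, :-1] - u[:, 1:]
        east = np.zeros_like(u); east[:, :-1] = u[:, 1:] - u[:, :-1]
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in (north, south, west, east))
        u += step * flux
    return u

# Synthetic example: a noisy step edge, standing in for an MR slice.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[:, 32:] = 1.0
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
smoothed = perona_malik(noisy)
print("noise std before/after:", noisy[:, :28].std(), smoothed[:, :28].std())
```

Noise inside each flat region is averaged away. The unit-height edge has a gradient far above `kappa`, so almost no intensity flows across it.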

4.
Qing Xiangyun, Wang Xingyu. Chinese Journal of Computers, 2007, 30(8): 1333-1343
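Subspace clustering aims to group a given set of data on different feature subsets. This unsupervised learning approach tries to discover patterns in which data are "similar under different representations", and it has attracted considerable attention and research in related fields. The paper first extends Hoff's "mean and variance shift" model into a new nonparametric clustering model based on feature subsets; its advantage is that its parameters can be learned with variational Bayesian methods. The model combines a Dirichlet process mixture model with a nonparametric model for selecting feature subsets, so it can automatically choose the number of clusters and perform subspace clustering. A Markov chain Monte Carlo algorithm for posterior inference of the parameters is then given. For computational speed, variational Bayesian learning of the model parameters is proposed. Experiments on simulated data and an application to face clustering both show that the model can simultaneously select relevant features and the data points that share similar patterns on those features. Applying the sampling-free variational Bayesian method to the UCI "Multiple Features" database shows that it infers the model parameters quickly.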

5.
Dirichlet Process and Its Applications in Natural Language Processing
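The Dirichlet process is a typical nonparametric (variable-parameter) Bayesian model. Its advantage is that the number and nature of its parameters are flexible and can be computed autonomously from the model and the data. In recent years it has become a research hotspot in machine learning and natural language processing. This paper systematically introduces the origin and development of the Dirichlet process, with a focus on model computation, and analyzes in detail its application to specific problems in natural language processing. Finally, it discusses future research directions and trends for the Dirichlet process.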

6.
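Topic models are an important tool for mining latent topics in microblogs. However, most existing topic models derive from Latent Dirichlet Allocation (LDA), which requires the user to specify the number of topics in advance. To mine microblog topics automatically, the authors propose MB-HDP, a nonparametric Bayesian model based on the Hierarchical Dirichlet Process (HDP). The method proceeds in four steps:

- It assumes that messages are non-exchangeable, to suit the microblog setting.
- It uses time information, user interests and hashtags to aggregate topically related messages, which alleviates the data sparsity of short microblog texts.
- It extends the Chinese Restaurant Franchise (CRF) to model microblog topics.
- It infers the distribution parameters of MB-HDP with a corresponding Markov Chain Monte Carlo (MCMC) sampling method.

Experiments show that MB-HDP clearly outperforms both LDA and HDP in topic quality, perplexity and model complexity.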
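MB-HDP's own extension of the Chinese Restaurant Franchise (non-exchangeable messages, aggregation by time, user interest and hashtags) is not reproduced here. As a point of reference, the sketch below draws seating assignments from the standard CRF prior only. The concentration parameters `alpha` (tables within a restaurant) and `gamma` (dishes across the franchise) are chosen arbitrarily. The sketch is a generative illustration, not the paper's MCMC sampler.

```python
import numpy as np

def chinese_restaurant_franchise(group_sizes, alpha, gamma, rng):
    """Seat customers with the standard (unextended) Chinese restaurant franchise.

    Each group j is a restaurant. A customer joins an existing table in
    proportion to its occupancy, or opens a new table with weight alpha.
    Each new table orders a dish (topic) from the franchise-wide menu:
    an existing dish in proportion to how many tables serve it, or a new
    dish with weight gamma. Returns per-group dish indices and, for each
    dish, the number of tables serving it.
    """
    dish_table_counts = []          # tables serving each dish, across all groups
    assignments = []
    for n_j in group_sizes:
        table_sizes, table_dish, dishes = [], [], []
        for _ in range(n_j):
            weights = np.array(table_sizes + [alpha], dtype=float)
            t = int(rng.choice(len(weights), p=weights / weights.sum()))
            if t == len(table_sizes):   # open a new table and choose its dish
                dw = np.array(dish_table_counts + [gamma], dtype=float)
                k = int(rng.choice(len(dw), p=dw / dw.sum()))
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            dishes.append(table_dish[t])
        assignments.append(dishes)
    return assignments, dish_table_counts

rng = np.random.default_rng(1)
groups, tables_per_dish = chinese_restaurant_franchise([30, 30, 30], alpha=1.0, gamma=1.0, rng=rng)
shared = set(groups[0]) & set(groups[1]) & set(groups[2])
print("dishes in use:", len(tables_per_dish), "shared by all groups:", len(shared))
```

Because all restaurants draw dishes from one shared menu, topics are reused across groups. That sharing is the property HDP-based topic models build on.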

7.
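Cognitive radio networks use dynamic spectrum access to share spectrum by exploiting the idle periods of licensed bands. Characterizing spectrum utilization and predicting future utilization help in designing efficient spectrum sensing algorithms and thus in optimizing spectrum access strategies. By extending the standard hierarchical Dirichlet process, this paper proposes UTD-HDP (a UTD-extended hierarchical Dirichlet process). It is a cross-channel nonparametric Bayesian model for clustering wireless spectrum utilization data and estimating their distribution parameters. The model adaptively characterizes wireless spectrum utilization and predicts future spectrum utilization with high accuracy.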

8.
Multi-Target Tracking Algorithm Based on Online Estimation of Target Birth Intensity
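In multi-target tracking, the target birth intensity is often unknown. To address this, an online birth-intensity estimation algorithm based on the Dirichlet distribution is proposed to improve the performance of the probability hypothesis density (PHD) filter. Its main elements are:

- A finite mixture model describes the unknown birth intensity. A negative-exponent Dirichlet distribution, which depends only on the mixture weights, serves as the prior of the mixture parameters.
- An online maximum a posteriori estimate of the mixture weights is derived with the Lagrange multiplier method. This estimation exploits the instability of the negative-exponent Dirichlet distribution to drive components unrelated to the birth data to extinction.
- Stochastic approximation is the online estimation strategy for the component means and variances. Online update formulas for them are derived from missing (incomplete) data.
- The prior distribution of newborn targets at the initial step is unavailable. The mixture is therefore initialized by adding a uniform component.
- Starting from the current multi-target state estimates, birth-target data are obtained with the PHD filter in a way that weakens the influence of clutter.

Simulation results show that the proposed online birth-intensity estimation algorithm improves the performance of the PHD filter in multi-target tracking.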
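The Lagrange-multiplier step above can be reconstructed under one common parameterization. It is assumed here because the abstract does not give the exact prior. The prior is a Dirichlet whose exponents all equal −λ (λ > 0). The soft counts n_k are the expected number of birth-data points assigned to component k, with Σ_k n_k = N.

```latex
\begin{aligned}
p(\mathbf w) &\propto \prod_{k=1}^{K} w_k^{-\lambda}, \qquad \sum_{k=1}^{K} w_k = 1,\\
\mathcal L(\mathbf w,\mu) &= \sum_{k=1}^{K} (n_k - \lambda)\log w_k + \mu\Big(1 - \sum_{k=1}^{K} w_k\Big),\\
\frac{\partial \mathcal L}{\partial w_k} &= \frac{n_k - \lambda}{w_k} - \mu = 0
\;\Rightarrow\; w_k = \frac{n_k - \lambda}{\mu},
\qquad \mu = \sum_{j=1}^{K} (n_j - \lambda) = N - K\lambda,\\
\hat w_k &= \frac{\max(0,\, n_k - \lambda)}{\sum_{j=1}^{K} \max(0,\, n_j - \lambda)}.
\end{aligned}
```

The last line adds the usual safeguard. A component whose soft count falls below λ would receive a negative weight, so it is set to zero and removed, and the normalizer is recomputed over the surviving components. This is the "driving to extinction" effect described in the abstract.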

9.
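An algorithm is proposed that uses a nonparametric Dirichlet process mixture model for automatic image segmentation. The method does not require the number of classes to be initialized; it determines the number of classes automatically during segmentation. The model replaces the number of clusters with a random variable governed by a control parameter, and the range of the number of clusters is set by adjusting that parameter. The algorithm was tested on highly noisy natural images and on clinical magnetic resonance images, and compared with other segmentation algorithms. The results show strong noise robustness and suppression of the bias-field effect in magnetic resonance image segmentation. Accuracy analysis shows that the Dice similarity coefficients of all segmentation results exceed 90%, indicating that the algorithm is highly accurate and robust.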

10.
Yan Xiaoxi, Han Chongzhao. Acta Automatica Sinica, 2011, 37(11): 1313-1321
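The Gaussian mixture implementation of the probability hypothesis density (PHD) filter must prune mixture components. To address this, a component pruning algorithm based on the Dirichlet distribution is proposed to improve that implementation. The algorithm estimates the mixture parameters under the maximum a posteriori (MAP) criterion. It uses a negative-exponent Dirichlet distribution, which depends only on the mixture weights, as the prior of the mixture parameters, and derives the weight update formula with Lagrange multipliers. The algorithm exploits the instability of the negative-exponent Dirichlet distribution to drive components unrelated to the target intensity to extinction during the MAP iterations. The same instability also resolves the problem of several nearby components jointly describing one intensity peak, which helps the subsequent extraction of multi-target states. Simulation results show that the Dirichlet-based pruning algorithm outperforms the pruning used in the typical Gaussian mixture implementation.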
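The pruning effect described above can be illustrated outside the PHD filter. The sketch below is an assumption-laden stand-in. It fits an ordinary 1-D Gaussian mixture by EM and replaces the usual weight update with the MAP update under a Dirichlet prior whose exponents all equal −`lam`. That is, w_k ∝ max(0, n_k − `lam`), where n_k is the component's soft count. It does not model PHD intensities, whose weights need not sum to one. The value of `lam`, the initial means and the synthetic data are illustrative choices.

```python
import numpy as np

def map_weights(soft_counts, lam):
    """MAP weights under a Dirichlet prior with every exponent equal to -lam.

    w_k is proportional to max(0, n_k - lam): components whose soft count
    n_k falls below lam get weight 0 and are pruned.
    """
    w = np.maximum(soft_counts - lam, 0.0)
    return w / w.sum()

def em_with_dirichlet_pruning(x, init_means, lam, n_iter=100, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture using the MAP weight update above.

    At least one component always survives when lam < len(x) / K, because
    the largest soft count is at least len(x) / K.
    """
    means = np.asarray(init_means, float)
    variances = np.ones_like(means)
    weights = np.full(len(means), 1.0 / len(means))
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k * N(x_i | mean_k, var_k).
        dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2.0 * np.pi * variances)
        r = weights * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: MAP weights, then drop the components driven to zero.
        weights = map_weights(r.sum(axis=0), lam)
        keep = weights > 0
        weights, r = weights[keep], r[:, keep]
        nk = r.sum(axis=0)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    return weights, means, variances

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
w, m, v = em_with_dirichlet_pruning(x, np.linspace(-6.0, 6.0, 8), lam=20.0)
print(len(w), "components survive; weights:", np.round(w, 3))
```

The example starts with eight components on two well-separated clusters. Components that capture only a few points are zeroed out, and how many survive depends on `lam`.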

11.
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

12.
In this paper we analyze the problem of learning and updating of uncertainty in Dirichlet models, where updating refers to determining the conditional distribution of a single variable when some evidence is known. We first obtain the most general family of prior-posterior distributions which is conjugate to a Dirichlet likelihood and we identify those hyperparameters that are influenced by data values. Next, we describe some methods to assess the prior hyperparameters and we give a numerical method to estimate the Dirichlet parameters in a Bayesian context, based on the posterior mode. We also give formulas for updating uncertainty by determining the conditional probabilities of single variables when the values of other variables are known. A time series approach is presented for dealing with the cases in which samples are not identically distributed, that is, the Dirichlet parameters change from sample to sample. This typically occurs when the population is observed at different times. Finally, two examples are given that illustrate the learning and updating processes and the time series approach.
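For the ordinary Dirichlet, the "updating" step (the distribution of one variable once others are observed) has a closed form, thanks to the neutrality property. Suppose the observed components sum to s. The unobserved components, rescaled by 1/(1 − s), are again Dirichlet with their original parameters, independently of the observed values. The sketch below uses that standard result. It does not implement the paper's more general conjugate family or its posterior-mode estimator, and the function names are hypothetical.

```python
import numpy as np

def conditional_beta_params(alpha, i, observed):
    """Conditional law of X_i given observed components of X ~ Dirichlet(alpha).

    `observed` maps component index -> observed value. With s the sum of the
    observed values, X_i / (1 - s) ~ Beta(alpha_i, a_rest - alpha_i), where
    a_rest sums alpha over the unobserved components. Returns (a, b, scale)
    with X_i = scale * Beta(a, b). If i is the only unobserved component,
    b == 0 and X_i = 1 - s deterministically.
    """
    alpha = np.asarray(alpha, float)
    if i in observed:
        raise ValueError("component i is already observed")
    s = sum(observed.values())
    unobserved = [k for k in range(len(alpha)) if k not in observed]
    a_rest = alpha[unobserved].sum()
    return alpha[i], a_rest - alpha[i], 1.0 - s

def conditional_mean_var(alpha, i, observed):
    """Conditional mean and variance of X_i given the observed components."""
    a, b, scale = conditional_beta_params(alpha, i, observed)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return scale * mean, scale ** 2 * var

m, v = conditional_mean_var([2.0, 3.0, 5.0], i=0, observed={2: 0.4})
print(m, v)   # approximately 0.24 and 0.0144
```

The update needs only sums of parameters and observed values. This is what makes conditional queries cheap in Dirichlet models.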

13.
Clustering analysis aims to group a set of similar data objects into the same cluster. Topic models, which belong to the soft clustering methods, are powerful tools to discover latent clusters/topics behind large data sets. Due to the dynamic nature of temporal data, clusters often exhibit complicated patterns such as birth, branch and death. However, most existing temporal clustering models assume that clusters evolve as a linear chain, and they cannot model and detect branching of clusters. In this paper, we present evolving Dirichlet processes (EDP for short) to model nonlinear evolutionary traces behind temporal data, especially for temporal text collections. In the setting of EDP, temporal collections are divided into epochs. In order to model cluster branching over time, EDP allows each cluster in an epoch to form Dirichlet processes (DP) and uses a combination of the cluster-specific DPs as the prior for cluster distributions in the next epoch. To model hierarchical temporal data, such as online document collections, we propose a new class of evolving hierarchical Dirichlet processes (EHDP for short) which extends the hierarchical Dirichlet processes (HDP) to model evolving temporal data. We design an online learning framework based on Gibbs sampling to infer the evolutionary traces of clusters over time. In experiments, we validate that EDP and EHDP can capture nonlinear evolutionary traces of clusters on both synthetic and real-world text collections and achieve better results than their peers.

14.
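The hierarchical Dirichlet process is a Bayesian nonparametric model. As a probabilistic topic model for analyzing massive data, it solves the dynamic clustering problem that latent Dirichlet allocation cannot. Taking a factor-graph view, this paper combines message passing with Gibbs sampling to solve posterior inference in Bayesian nonparametric models. It then compares the resulting algorithm with LDA and HDP in terms of perplexity. Experimental results show that the algorithm converges faster than HDP sampling and ultimately reaches the perplexity that LDA attains at its optimal number of topics.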
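Perplexity, the comparison metric above, is the exponential of the negative average per-word log-likelihood. The sketch below assumes two inputs have already been estimated for the evaluation documents:

- `theta`, the document-topic proportions;
- `phi`, the topic-word distributions.

How these are obtained for held-out text varies between papers and is not specified in the abstract.

```python
import numpy as np

def perplexity(doc_word_counts, theta, phi):
    """Perplexity of a topic model on a set of documents.

    doc_word_counts: (D, V) word counts; theta: (D, K) document-topic
    proportions; phi: (K, V) topic-word distributions.
    perplexity = exp(-sum_d sum_w n_dw log p(w|d) / sum_d N_d),
    with p(w|d) = sum_k theta_dk * phi_kw.
    """
    p_w_given_d = theta @ phi                       # (D, V)
    counts = np.asarray(doc_word_counts, float)
    mask = counts > 0                               # skip 0 * log 0 terms
    log_lik = (counts[mask] * np.log(p_w_given_d[mask])).sum()
    return float(np.exp(-log_lik / counts.sum()))
```

As a sanity check, a model that spreads probability uniformly over a vocabulary of V words has perplexity exactly V. Lower values mean the model assigns higher probability to the observed words.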

15.
The advent of mixture models has opened the possibility of flexible models which are practical to work with. Practitioners commonly assume that data are generated from a Gaussian mixture. The inverted Dirichlet mixture has been shown to be a better alternative to the Gaussian mixture and to be of significant value in a variety of applications involving positive data. The inverted Dirichlet is, however, usually undesirable, since it forces an assumption of positive correlation. Our focus here is to develop a Bayesian alternative to both the Gaussian and the inverted Dirichlet mixtures when dealing with positive data. The alternative that we propose is based on the generalized inverted Dirichlet distribution, which offers high flexibility and ease of use, as we show in this paper. Moreover, it has a more general covariance structure than the inverted Dirichlet. The proposed mixture model is subjected to a fully Bayesian analysis based on Markov chain Monte Carlo (MCMC) simulation methods, namely Gibbs sampling and Metropolis–Hastings, used to compute the posterior distribution of the parameters, and on the Bayesian information criterion (BIC), used for model selection. The adoption of this purely Bayesian learning choice is motivated by the fact that Bayesian inference allows one to deal with uncertainty in a unified and consistent manner. We evaluate our approach on two challenging applications concerning object classification and forgery detection.
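The limitation claimed above for the inverted Dirichlet can be checked numerically. An inverted Dirichlet vector can be generated as X_d = Y_d / Y_0 from independent Gamma(α_d, 1) and Gamma(α_0, 1) variables. The shared denominator Y_0 makes every pairwise covariance positive: Cov(X_i, X_j) = α_i α_j / ((α_0 − 1)²(α_0 − 2)) for α_0 > 2. The sketch compares that formula with a simulation. The parameter values are arbitrary, and the generalized inverted Dirichlet itself is not implemented.

```python
import numpy as np

def sample_inverted_dirichlet(alpha, alpha0, n, rng):
    """Sample n draws from an inverted Dirichlet via gamma ratios.

    With independent Y_d ~ Gamma(alpha_d, 1) and Y_0 ~ Gamma(alpha0, 1),
    X_d = Y_d / Y_0 is inverted-Dirichlet distributed on the positive orthant.
    """
    y = rng.gamma(shape=np.asarray(alpha, float), size=(n, len(alpha)))
    y0 = rng.gamma(shape=alpha0, size=(n, 1))
    return y / y0

def inverted_dirichlet_cov(alpha, alpha0, i, j):
    """Analytic Cov(X_i, X_j) for i != j; requires alpha0 > 2. Always positive."""
    return alpha[i] * alpha[j] / ((alpha0 - 1) ** 2 * (alpha0 - 2))

rng = np.random.default_rng(0)
x = sample_inverted_dirichlet([2.0, 3.0], alpha0=10.0, n=200_000, rng=rng)
print("empirical cov:", np.cov(x, rowvar=False)[0, 1])
print("analytic cov: ", inverted_dirichlet_cov([2.0, 3.0], 10.0, 0, 1))
```

No choice of positive parameters can make the covariance negative. This is the motivation for the more general covariance structure of the generalized inverted Dirichlet.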

16.
This paper proposes an unsupervised algorithm for learning a finite mixture of scaled Dirichlet distributions. Parameter estimation is based on the maximum likelihood approach, and the minimum message length (MML) criterion is proposed for selecting the optimal number of components. This research is motivated by the flexibility issues of the Dirichlet distribution, the widely used model for multivariate proportional data, which have prompted a number of scholars to search for generalizations of the Dirichlet. By introducing the extra parameters of the scaled Dirichlet, several useful statistical models can be obtained. Experimental results are presented using both synthetic and real datasets. Moreover, challenging real-world applications are empirically investigated to evaluate the efficiency of our proposed statistical framework.
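For reference, the scaled Dirichlet density with shape parameters α_d and scale parameters β_d arises as the closure of independent Gamma(α_d, rate β_d) variables:

f(x) = Γ(α₊) / ∏ Γ(α_d) · ∏ β_d^{α_d} x_d^{α_d − 1} / (Σ β_d x_d)^{α₊}, where α₊ = Σ α_d.

The sketch below evaluates its log-density. Treating β as rates is an assumption, since parameterizations differ between papers. Neither the maximum-likelihood fit nor the MML criterion is shown.

```python
import math
import numpy as np

def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log-density of the scaled Dirichlet on the simplex (x_d > 0, sum x_d = 1).

    log f = lgamma(a+) - sum lgamma(a_d) + sum a_d log b_d
            + sum (a_d - 1) log x_d - a+ log(sum b_d x_d),   a+ = sum a_d.
    With all beta_d equal it reduces to the ordinary Dirichlet.
    """
    x, alpha, beta = (np.asarray(v, float) for v in (x, alpha, beta))
    a_plus = alpha.sum()
    log_norm = math.lgamma(a_plus) - sum(math.lgamma(a) for a in alpha)
    return float(
        log_norm
        + np.sum(alpha * np.log(beta))
        + np.sum((alpha - 1) * np.log(x))
        - a_plus * np.log(np.dot(beta, x))
    )

print(scaled_dirichlet_logpdf([0.2, 0.3, 0.5], alpha=[2, 3, 4], beta=[1, 1, 1]))
print(scaled_dirichlet_logpdf([0.2, 0.3, 0.5], alpha=[2, 3, 4], beta=[5, 5, 5]))  # equal up to rounding
```

Because only the ratios of the β_d matter, the extra D parameters add one less degree of freedom than their count suggests. They let the density tilt mass toward some components in a way a plain Dirichlet cannot.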

17.
Data clustering is a fundamental unsupervised learning task in several domains such as data mining, computer vision, information retrieval, and pattern recognition. In this paper, we propose and analyze a new clustering approach based on both hierarchical Dirichlet processes and the generalized Dirichlet distribution, which leads to an interesting statistical framework for data analysis and modelling. Our approach can be viewed as a hierarchical extension of the infinite generalized Dirichlet mixture model previously proposed in Bouguila and Ziou (IEEE Trans Neural Netw 21(1):107–122, 2010). The proposed clustering approach tackles the problem of modelling grouped data where observations are organized into groups that we allow to remain statistically linked by sharing mixture components. The resulting clustering model is learned using a principled variational Bayes inference-based algorithm that we have developed. Extensive experiments and simulations, based on two challenging applications, namely image categorization and web service intrusion detection, demonstrate the usefulness and merits of our model.

18.
LDA Feature Selection Method for Spam Processing
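Spam processing is a long-standing research topic, and more and more text classification techniques are being applied to it. Topic models such as LDA (Latent Dirichlet Allocation) are receiving growing attention in automatic summarization, information retrieval and other discrete-data applications. This paper introduces the LDA model into spam processing as a feature selection method. Combining LDA feature selection with a centroid + KNN classifier yields a simple test spam filter. Preliminary experimental results show that LDA-based feature selection outperforms the common IG (information gain) and MI (mutual information) feature selection methods. The filtering performance of the test filter is comparable to that of other filters.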

19.
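Hierarchical topic models are an important tool for building topic hierarchies. Most existing hierarchical topic models introduce the nCRP construction into a topic model to provide a tree-structured prior over document topics. However, they cannot produce topic hierarchies with clear domain meaning, i.e., domain topic hierarchies. Moreover, domain topics are related not only hierarchically: subtopics under different parent topics can also be associated through shared subdomain aspects. Existing research on topic relations offers no suitable model for generating such domain topic hierarchies. To mine both the hierarchical and the associative relations among domain topics automatically and effectively from domain texts, this work contributes in four areas:

- First, the nCRP construction is improved with a topic-sharing mechanism, giving the nCRP+ hierarchical construction. It provides the topics of a topic model with a tree-structured prior in which topic aspects are shared across levels.
- Second, nCRP+ is combined with the HDP model to build a reallocated hierarchical Dirichlet process. The result is the rHDP (reallocated hierarchical Dirichlet processes) hierarchical topic model.
- Third, domain knowledge is defined by combining domain classification information, word semantics and the domain representativeness of topic words. It comprises a voting-based domain membership degree, the semantic relatedness between words and domain topics, and a hierarchical topic-word contribution degree.
- Finally, domain knowledge is used to improve how rHDP assigns domain topics and topic words. This gives the hierarchical topic model rHDP_DK (rHDP with domain knowledge), with an improved sampling procedure.

Experimental results show that nCRP+-based hierarchical topic models outperform the nCRP-based hierarchical topic models (hLDA, nHDP) and the neural topic model TSNTM on all evaluation metrics. The topic hierarchies generated by rHDP_DK have clearly defined domain topic levels, and the topic words of associated subtopics show clear domain differences. In addition, the model provides a general framework for automatically mining domain topic hierarchies.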
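nCRP+ modifies the nested Chinese restaurant process. For orientation, the sketch below samples document paths from the standard nCRP only; the topic-sharing mechanism of nCRP+ and the rHDP machinery are not reproduced. `gamma` and `depth` are illustrative parameters, and the function name is hypothetical.

```python
import numpy as np

def ncrp_paths(n_docs, depth, gamma, rng):
    """Draw root-to-leaf paths from a standard nested Chinese restaurant process.

    At each node, a document follows an existing child with probability
    proportional to the number of earlier documents that took it, or creates
    a new child with weight gamma. Each path is a tuple of child indices,
    one per level below the root.
    """
    children = {}          # node (path-prefix tuple) -> visit counts of its children
    paths = []
    for _ in range(n_docs):
        node = ()
        for _ in range(depth):
            counts = children.setdefault(node, [])
            weights = np.array(counts + [gamma], dtype=float)
            c = int(rng.choice(len(weights), p=weights / weights.sum()))
            if c == len(counts):       # open a new child table (new subtopic)
                counts.append(0)
            counts[c] += 1
            node = node + (c,)
        paths.append(node)
    return paths

rng = np.random.default_rng(2)
paths = ncrp_paths(n_docs=100, depth=3, gamma=1.0, rng=rng)
print("distinct level-1 topics:", len({p[:1] for p in paths}))
print("distinct leaves:", len(set(paths)))
```

In plain nCRP a child node belongs to exactly one parent, so subtopics under different parents never share anything. This is the restriction the nCRP+ topic-sharing mechanism is described as relaxing.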

20.
Finite mixture models have been applied for different computer vision, image processing and pattern recognition tasks. The majority of the work done concerning finite mixture models has focused on mixtures for continuous data. However, many applications involve and generate discrete data for which discrete mixtures are better suited. In this paper, we investigate the problem of discrete data modeling using finite mixture models. We propose a novel, well-motivated mixture that we call the multinomial generalized Dirichlet mixture. The novel model is compared with other discrete mixtures. We designed experiments involving the modeling and summarization of spatial color image databases, and text classification, to show the robustness, flexibility and merits of our approach.
