An Experimental Comparison of Model-Based Clustering Methods
Authors: Marina Meilă, David Heckerman
Affiliation: Microsoft Research, Redmond, WA 98052, USA
Abstract: We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a "winner take all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
Keywords: clustering, model-based clustering, naive-Bayes model, multinomial-mixture model, EM algorithm, agglomerative clustering, initialization
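
The abstract describes EM and a "winner take all" variant for a naive-Bayes model with a hidden root node. The following is a minimal sketch of that idea, not the authors' implementation. It assumes binary variables (so each cluster is a product of Bernoullis rather than general multinomials), and the function name `em_naive_bayes`, the perturbation scale 0.1, the clipping bounds, and the smoothing constant `eps` are all illustrative choices that do not come from the paper. The initialization loosely follows option (2), a random perturbation of the data marginals.

```python
import numpy as np

def em_naive_bayes(X, k, n_iter=50, hard=False, seed=0, eps=1e-9):
    """Fit a k-cluster naive-Bayes mixture to binary data X of shape (n, d).

    hard=False runs standard EM; hard=True assigns each case wholly to its
    most probable cluster ("winner take all"). Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Initialization (assumed): perturb the marginal P(x_j = 1) per cluster.
    marginals = X.mean(axis=0)
    theta = np.clip(marginals + 0.1 * rng.uniform(-1, 1, size=(k, d)), 0.05, 0.95)
    pi = np.full(k, 1.0 / k)

    def log_joint(pi, theta):
        # log p(x_i, z = c) for every case i and cluster c, shape (n, k)
        return X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each case.
        lj = log_joint(pi, theta)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        if hard:
            # Winner take all: replace soft posteriors by one-hot assignments.
            resp = np.eye(k)[resp.argmax(axis=1)]

        # M-step: re-estimate mixing weights and per-cluster parameters.
        nk = resp.sum(axis=0)
        pi = (nk + eps) / (nk + eps).sum()
        theta = np.clip((resp.T @ X + eps) / (nk[:, None] + 2 * eps),
                        1e-6, 1 - 1e-6)

    # Log-likelihood of the data under the final parameters.
    loglik = np.logaddexp.reduce(log_joint(pi, theta), axis=1).sum()
    return pi, theta, resp, loglik
```

Calling the function once with `hard=False` and once with `hard=True` on the same data gives a rough way to compare the two batch variants by their final log-likelihood, in the spirit of the comparison the abstract reports.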
This article is indexed in SpringerLink and other databases.