Improving Markov Chain Monte Carlo Model Search for Data Mining
Authors: Paolo Giudici, Robert Castelo
Affiliation: (1) Department of Economics and Quantitative Methods, University of Pavia, Via San Felice n. 5, 27100 Pavia, Italy; (2) Institute of Information and Computing Sciences, University of Utrecht, P.O. Box 80089, 3508 Utrecht, The Netherlands
Abstract: The motivation of this paper is the application of MCMC model scoring procedures to data mining problems, which involve a large number of competing models and other relevant model choice aspects. To achieve this aim we analyze one of the most popular Markov chain Monte Carlo methods for structural learning in graphical models, namely, the MC3 algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such a context, typically high-dimensional in the number of variables, little can be known a priori and, therefore, a good model search algorithm is crucial. We present and describe in detail our implementation of the MC3 algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the possibility of commuting easily between the two classes of models constitutes an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish. Furthermore, in order to improve the MC3 method we provide several graphical monitors which can help in extracting results and in assessing the goodness of the Markov chain Monte Carlo approximation to the posterior distribution of interest. We apply our proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application which concerns market basket analysis.
Keywords: Bayesian structural learning; convergence diagnostics; Dirichlet distribution; market basket analysis; Markov chain Monte Carlo
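
The abstract names the MC3 algorithm without showing its mechanics, so the following Python sketch may help illustrate the general idea of a Metropolis-Hastings walk over DAG structures. It is not the authors' implementation. The function names `is_acyclic` and `mc3`, the `log_score` callback, and the toy score are all assumptions made for illustration; in a real application `log_score` would be a log marginal likelihood plus log prior. The sketch also uses a simplified symmetric proposal (toggle one uniformly chosen directed edge and stay put if the result is cyclic) rather than a uniform draw from the neighbourhood of valid graphs.

```python
import math
import random
from collections import Counter


def is_acyclic(n, edges):
    """Return True if the directed graph on nodes 0..n-1 has no cycle (Kahn's algorithm)."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for i, j in edges:
        children[i].append(j)
        indegree[j] += 1
    stack = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                stack.append(c)
    return removed == n  # every node removed means no cycle


def mc3(n, log_score, n_iter, seed=0):
    """Metropolis-Hastings walk over DAGs on n nodes; returns visit counts per graph.

    log_score is a hypothetical user-supplied function mapping a frozenset of
    directed edges (i, j) to an unnormalised log posterior score.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    current = frozenset()  # start from the empty graph
    current_score = log_score(current)
    visits = Counter()
    for _ in range(n_iter):
        edge = rng.choice(pairs)
        proposal = current ^ {edge}  # add the edge if absent, delete it if present
        if is_acyclic(n, proposal):
            new_score = log_score(proposal)
            # The proposal is symmetric, so the acceptance ratio is the score ratio.
            accept_prob = math.exp(min(0.0, new_score - current_score))
            if rng.random() < accept_prob:
                current, current_score = proposal, new_score
        visits[current] += 1  # a rejected or cyclic proposal counts as staying put
    return visits


# Toy score (an assumption, not a Bayesian score): penalise each edge that
# differs from a fixed target structure 0 -> 1 -> 2.
target = frozenset({(0, 1), (1, 2)})


def toy_log_score(graph):
    return -2.0 * len(graph ^ target)


visits = mc3(3, toy_log_score, n_iter=5000, seed=1)
best_graph, best_count = visits.most_common(1)[0]
print(sorted(best_graph), best_count / 5000)
```

The visit frequencies in `visits` approximate the posterior probabilities of the structures under the supplied score, which is the quantity that graphical monitors of the kind mentioned in the abstract would track over the course of the chain.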
This article is indexed in SpringerLink and other databases.