Scalable pattern mining with Bayesian networks as background knowledge
Authors: Szymon Jaroszewicz, Tobias Scheffer, Dan A. Simovici
Affiliation: (1) National Institute of Telecommunications, Warsaw, Poland; (2) Max Planck Institute for Computer Science, Saarbrücken, Germany; (3) University of Massachusetts at Boston, Boston, MA, USA
Abstract: We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the interestingness of an attribute set as the divergence between its behavior as observed in the data and the behavior that can be explained given the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference infeasible, together with a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with a prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case study sheds light on the practical usefulness of the approach.
Keywords: Association rule; Background knowledge; Interestingness; Bayesian network; Data stream
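
The abstract describes interestingness as a divergence between observed and model-explained behavior but does not give a formula here. The following is a minimal sketch under stated assumptions: it takes the divergence to be the largest absolute difference between the empirical probability of a value combination and the probability predicted by the model, and it uses an edgeless ("empty") network, so the predicted joint distribution is simply a product of marginals. The function names, the toy data, and the choice of divergence are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def empirical_joint(rows, attrs):
    """Relative frequency of each value combination of `attrs` in `rows`."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return {values: c / n for values, c in counts.items()}


def independent_model_joint(marginals, attrs):
    """Joint predicted by an edgeless network: the product of the marginals.

    Assumption: the background model has no edges, so attributes are
    treated as mutually independent.
    """
    joint = {}
    for values in product(*(marginals[a].keys() for a in attrs)):
        p = 1.0
        for a, v in zip(attrs, values):
            p *= marginals[a][v]
        joint[values] = p
    return joint


def interestingness(observed, predicted):
    """Assumed divergence: maximum absolute difference over value combinations."""
    keys = set(observed) | set(predicted)
    return max(abs(observed.get(k, 0.0) - predicted.get(k, 0.0)) for k in keys)


# Toy database in which A and B always take the same value.
rows = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 0, "B": 0},
    {"A": 0, "B": 0},
]
# Hypothetical background knowledge: uniform, independent marginals.
marginals = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.5, 1: 0.5}}

attrs = ("A", "B")
score = interestingness(
    empirical_joint(rows, attrs),
    independent_model_joint(marginals, attrs),
)
print(score)  # 0.25
```

In this toy case the pair {A, B} scores 0.25 because the data show a perfect dependence that the edgeless model cannot explain, while each attribute on its own scores 0. An analyst could respond by adding an edge between A and B to the network, which is the kind of interactive refinement the abstract outlines.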
This article is indexed in SpringerLink and other databases.