MODL: A Bayes optimal discretization method for continuous attributes
Authors: Marc Boullé
Affiliation: France Telecom R&D, 2, Avenue Pierre Marzin, 22300 Lannion, France
Abstract: While real data often comes in mixed format, both discrete and continuous, many supervised induction algorithms require discrete data. Efficient discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of induction models. In this paper, we propose a new discretization method, MODL (French patent No. 04 00179), founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model space. This results in the definition of a Bayes optimal evaluation criterion for discretizations. We then propose a new super-linear optimization algorithm that manages to find near-optimal discretizations. Extensive comparative experiments on both real and synthetic data demonstrate the high inductive performance obtained by the new discretization method.
Editor: Tom Fawcett
Keywords: Data mining; Machine learning; Discretization; Bayesianism; Data analysis
This article is indexed in SpringerLink and other databases.