MODL: A Bayes optimal discretization method for continuous attributes
Authors: | Marc Boullé |
Affiliation: | (1) France Telecom R&D, 2, Avenue Pierre Marzin, 22300 Lannion, France |
Abstract: | While real data often comes in mixed format, discrete and continuous, many supervised induction algorithms require discrete
data. Efficient discretization of continuous attributes is an important problem that affects the speed, accuracy and understandability
of the induction models. In this paper, we propose a new discretization method, MODL¹, founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model
space. This results in the definition of a Bayes optimal evaluation criterion of discretizations. We then propose a new super-linear
optimization algorithm that manages to find near-optimal discretizations. Extensive comparative experiments both on real and
synthetic data demonstrate the high inductive performance obtained by the new discretization method.
Editor: Tom Fawcett
¹ French patent No. 04 00179. |
Keywords: | Data mining; Machine learning; Discretization; Bayesianism; Data analysis |
This article is indexed in SpringerLink and other databases. |