Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm |
| |
Authors: | Justin C.W. Debuse; Victor J. Rayward-Smith |
| |
Affiliation: | (1) School of Information Systems, University of East Anglia, Norwich, NR4 7TJ, UK |
| |
Abstract: | An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the resulting discretisations are then evaluated using a simulated annealing based data mining algorithm. The results suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated that, under certain circumstances, the discretisation produced may increase the problem size or allow overfitting by the data mining algorithm. Such cases, in which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms. |
| |
Keywords: | discretisation; data mining; simulated annealing |
This article is indexed in SpringerLink and other databases. |
|