Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm
Authors: Justin C.W. Debuse, Victor J. Rayward-Smith
Affiliation: School of Information Systems, University of East Anglia, Norwich, NR4 7TJ, UK
Abstract: An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the resulting discretisations are then evaluated using a simulated annealing based data mining algorithm. The results suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated that, under certain circumstances, the discretisation produced may increase the problem size or allow overfitting by the data mining algorithm. Such cases, in which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.
Keywords: discretisation; data mining; simulated annealing
This article is indexed in SpringerLink and other databases.