Noise reduction for instance-based learning with a local maximal margin approach |
| |
Authors: | Nicola Segata Enrico Blanzieri Sarah Jane Delany Pádraig Cunningham |
| |
Affiliation: | (1) Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, Trento, Italy; (2) Dublin Institute of Technology, Dublin, Ireland; (3) Computer Science, University College Dublin, Dublin, Ireland |
| |
Abstract: | To some extent the problem of noise reduction in machine learning has been finessed by the development of learning techniques
that are noise-tolerant. However, it is difficult to make instance-based learning noise-tolerant, and noise reduction still
plays an important role in k-nearest neighbour classification. There are also other motivations for noise reduction: for instance, the elimination of noise
may result in simpler models, or data cleansing may be an end in itself. In this paper we present a novel approach to noise
reduction based on local Support Vector Machines (LSVM), which brings the benefits of maximal margin classifiers to bear on
noise reduction. This provides a more robust alternative to the majority rule on which almost all existing noise reduction
techniques are based. Roughly speaking, for each training example an SVM is trained on its neighbourhood, and if the SVM classification
for the central example disagrees with its actual class, there is evidence in favour of removing it from the training set.
We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited
with our method, as well as experiments specific to the spam filtering application domain. We present a further evaluation
on two artificial datasets where we analyse two different types of noise (Gaussian feature noise and mislabelling noise) and
the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other
analysed algorithms for real datasets, for artificial datasets perturbed by Gaussian noise, and in the presence of uneven class
densities. |
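The editing rule summarised in the abstract (train an SVM on each example's neighbourhood and flag the example if the SVM disagrees with its label) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a pure-Python linear soft-margin SVM trained by dual coordinate descent as a stand-in, since the paper's kernel and parameter choices are not given here; labels are assumed to be +1/-1; each neighbourhood is assumed to be the k nearest other examples (the example itself excluded); a single-class neighbourhood is assumed to predict that class without training; and a flagged example is simply one whose local prediction disagrees with its label, whereas the paper's exact decision rule may differ. The names `train_linear_svm`, `predict`, `lsvm_noise_filter` and the defaults `k=7`, `C=10.0` are hypothetical.

```python
"""Sketch of local-SVM noise filtering (hypothetical names, pure Python)."""


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_linear_svm(X, y, C=10.0, sweeps=200):
    """Linear soft-margin SVM trained by dual coordinate descent.

    Labels must be +1/-1. A constant 1.0 feature is appended so the bias is
    learned (and regularised) as the last weight. Returns the weight vector.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)
    for _ in range(sweeps):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            qii = dot(xi, xi)                 # >= 1 because of the bias feature
            grad = yi * dot(w, xi) - 1.0      # partial derivative of the dual in alpha_i
            new = min(max(alpha[i] - grad / qii, 0.0), C)  # clip to the box [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                # keep w = sum_i alpha_i * y_i * x_i consistent with alpha
                w = [wj + delta * yi * xj for wj, xj in zip(w, xi)]
                alpha[i] = new
    return w


def predict(w, x):
    """Sign of the linear decision function; ties go to +1."""
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


def lsvm_noise_filter(X, y, k=7, C=10.0):
    """Return indices of examples whose local SVM disagrees with their label."""
    flagged = []
    for i, xi in enumerate(X):
        # Assumption: neighbourhood = k nearest *other* examples (x_i excluded).
        others = [j for j in range(len(X)) if j != i]
        nbrs = sorted(others, key=lambda j: sq_dist(X[j], xi))[:k]
        labels = [y[j] for j in nbrs]
        if len(set(labels)) == 1:
            # Assumption: a single-class neighbourhood predicts that class.
            pred = labels[0]
        else:
            w = train_linear_svm([X[j] for j in nbrs], labels, C)
            pred = predict(w, xi)
        if pred != y[i]:
            flagged.append(i)  # evidence in favour of removing example i
    return flagged
```

For example, with two well-separated clusters and one point placed inside the +1 cluster but labelled -1, `lsvm_noise_filter(X, y, k=7)` flags that point: its neighbourhood contains both classes, and the local maximal-margin boundary puts it on the +1 side. Unlike the majority rule of editors such as Wilson's Edited Nearest Neighbour, the decision depends on where the example falls relative to a local separating hyperplane rather than on a count of neighbour labels.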
| |
Keywords: | |
This document is indexed in SpringerLink and other databases. |
|