Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning
Affiliation: 1. Department of Industrial Management, University of Seville, Spain; 2. Engineering School, University of Oviedo, Spain; 1. DEI, University of Padua, viale Gradenigo 6, Padua, Italy; 2. BioMediTech Institute and Faculty of Biomedical Sciences and Engineering, Tampere University of Technology, BioMediTech, Tampere, Finland; 3. Computer Information Systems, Missouri State University, 901 S. National, Springfield, MO 65804, USA
Abstract: Learning from imbalanced datasets is challenging for standard algorithms, which are designed to work with balanced class distributions. Among the strategies for tackling this problem, methods that generate artificial data are more general than algorithmic modifications: the data they produce can be used by any algorithm, so the user's options are not constrained. In this paper, we present a new oversampling method, Self-Organizing Map-based Oversampling (SOMO), which applies a Self-Organizing Map to produce a two-dimensional representation of the input space, allowing artificial data points to be generated effectively. SOMO comprises three major stages. First, a Self-Organizing Map produces a two-dimensional representation of the original, usually high-dimensional, space. Next, it generates within-cluster synthetic samples. Finally, it generates between-cluster synthetic samples. We also present empirical results showing that the performance of algorithms improves when artificial data generated by SOMO are used, and that our method outperforms various oversampling methods.
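The three stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how clusters are selected or how many samples go to each stage, so the tiny SOM trainer, the even within/between split (`within_frac`), the uniform choice of clusters, and all function names below are assumptions made only for illustration.

```python
import numpy as np

def train_som(X, rows, cols, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)  # init from data
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-2       # decaying neighbourhood width
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)    # squared grid distance to BMU
        W += lr * np.exp(-d2 / (2 * sigma ** 2))[:, None] * (x - W)
    return W, grid

def assign(X, W):
    """Index of the nearest SOM node for every row of X."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

def interpolate(A, B, rng):
    """One synthetic point on the segment between a random a in A and b in B."""
    a = A[rng.integers(0, len(A))]
    b = B[rng.integers(0, len(B))]
    return a + rng.random() * (b - a)

def somo_sketch(X, y, minority=1, rows=3, cols=3, n_new=100, within_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: the SOM maps the input space onto a 2-D grid of nodes.
    W, grid = train_som(X, rows, cols, seed=seed)
    X_min = X[y == minority]
    node = assign(X_min, W)
    clusters = {int(k): X_min[node == k] for k in np.unique(node)}
    keys = list(clusters)
    # Stage 2: within-cluster samples (both endpoints from the same node).
    n_within = int(n_new * within_frac)
    within = [interpolate(clusters[int(k)], clusters[int(k)], rng)
              for k in rng.choice(keys, n_within)]
    # Stage 3: between-cluster samples (endpoints from grid-adjacent nodes).
    pairs = [(a, b) for a in keys for b in keys
             if a < b and np.abs(grid[a] - grid[b]).sum() == 1]
    if not pairs:                      # assumption: fall back to within-cluster
        pairs = [(k, k) for k in keys]
    between = [interpolate(clusters[a], clusters[b], rng)
               for a, b in (pairs[i] for i in rng.integers(0, len(pairs), n_new - n_within))]
    return np.vstack(within + between)

# Usage on toy data: 200 majority vs. 20 minority points in 5 dimensions.
rng = np.random.default_rng(42)
X_maj = rng.normal(0, 1, (200, 5))
X_min = rng.normal(3, 1, (20, 5))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 200 + [1] * 20)
X_new = somo_sketch(X, y, n_new=180)
print(X_new.shape)  # (180, 5)
```

Because every synthetic point is an interpolation between two minority samples, the output stays inside the region already covered by the minority class. Restricting between-cluster pairs to adjacent grid nodes is one way to use the SOM's topology so that interpolation happens only between clusters that are neighbours on the map.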
This article is indexed in ScienceDirect and other databases.