Synthetic minority oversampling for function approximation problems
Authors: Lourdes Pelayo, Scott Dick
Affiliation: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
Abstract: Imbalanced data sets are a common occurrence in important machine learning problems. Research in improving learning under imbalanced conditions has largely focused on classification problems (i.e., problems with a categorical dependent variable). However, imbalanced data also occur in function approximation, and far less attention has been paid to this case. We present a novel stratification approach for imbalanced function approximation problems. Our solution extends the SMOTE oversampling preprocessing technique to continuous-valued dependent variables by identifying regions of the feature space with a low density of examples and high variance in the dependent variable. Synthetic examples are then generated between nearest neighbors in these regions. In an empirical validation, our approach reduces the normalized mean-squared prediction error in 18 out of 21 benchmark data sets, and compares favorably with state-of-the-art approaches.
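The abstract describes the method only at a high level: locate sparse, high-variance regions of the feature space, then interpolate synthetic examples between nearest neighbors there. The Python sketch below is a rough illustration of those two steps under our own assumptions, not the authors' published procedure: it uses mean k-NN distance as a density proxy, neighborhood variance of the target as the variance proxy, and upper-quantile thresholds; all function names and parameters are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sparse_high_variance_mask(X, y, k=5, density_q=0.75, var_q=0.75):
    """Flag examples in low-density / high-variance neighborhoods.

    Density proxy: mean distance to the k nearest neighbors (larger = sparser).
    Variance proxy: variance of y over that same neighborhood.
    Quantile thresholds are illustrative choices, not values from the paper.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                   # column 0 is the point itself
    sparsity = dist[:, 1:].mean(axis=1)
    local_var = np.var(y[idx[:, 1:]], axis=1)
    return (sparsity >= np.quantile(sparsity, density_q)) & \
           (local_var >= np.quantile(local_var, var_q))

def interpolate_synthetic(X, y, mask, k=5, n_new=100, seed=0):
    """SMOTE-style interpolation of features *and* target between neighbors
    drawn from the flagged region."""
    rng = np.random.default_rng(seed)
    X_r, y_r = X[mask], y[mask]
    nn = NearestNeighbors(n_neighbors=min(k, len(X_r) - 1) + 1).fit(X_r)
    _, idx = nn.kneighbors(X_r)
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.integers(len(X_r))                 # seed example
        j = idx[i, rng.integers(1, idx.shape[1])]  # one of its neighbors (not itself)
        lam = rng.random()                         # interpolation factor in [0, 1)
        X_new.append(X_r[i] + lam * (X_r[j] - X_r[i]))
        y_new.append(y_r[i] + lam * (y_r[j] - y_r[i]))
    return np.vstack(X_new), np.asarray(y_new)

# Usage sketch: augment the training set before fitting a regressor.
# mask = sparse_high_variance_mask(X_train, y_train)
# X_syn, y_syn = interpolate_synthetic(X_train, y_train, mask)
# X_aug = np.vstack([X_train, X_syn]); y_aug = np.concatenate([y_train, y_syn])
```

The key difference from classification SMOTE shown here is that the continuous dependent variable is interpolated with the same factor as the features, so each synthetic pair lies on the segment between the two real examples in the joint (X, y) space.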
Keywords: data mining; learning from imbalanced data sets; machine learning; sample selection bias; stratification