Synthetic minority oversampling for function approximation problems
Authors: Lourdes Pelayo, Scott Dick
Affiliation: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
Abstract: Imbalanced data sets are a common occurrence in important machine learning problems. Research in improving learning under imbalanced conditions has largely focused on classification problems (i.e., problems with a categorical dependent variable). However, imbalanced data also occur in function approximation, and far less attention has been paid to this case. We present a novel stratification approach for imbalanced function approximation problems. Our solution extends the SMOTE oversampling preprocessing technique to continuous-valued dependent variables by identifying regions of the feature space with a low density of examples and high variance in the dependent variable. Synthetic examples are then generated between nearest neighbors in these regions. In an empirical validation, our approach reduces the normalized mean-squared prediction error in 18 out of 21 benchmark data sets, and compares favorably with state-of-the-art approaches.
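The abstract describes the method only at a high level: locate sparse, high-variance regions of the feature space, then interpolate synthetic examples between nearest neighbors there. The Python sketch below is a rough illustration of those two steps under our own assumptions, not the authors' published procedure: it uses mean k-NN distance as a density proxy, neighborhood variance of the target as the variance proxy, and upper-quantile thresholds; all function names and parameters are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sparse_high_variance_mask(X, y, k=5, density_q=0.75, var_q=0.75):
    """Flag examples in low-density / high-variance neighborhoods.

    Density proxy: mean distance to the k nearest neighbors (larger = sparser).
    Variance proxy: variance of y over that same neighborhood.
    Quantile thresholds are illustrative choices, not values from the paper.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                   # column 0 is the point itself
    sparsity = dist[:, 1:].mean(axis=1)
    local_var = np.var(y[idx[:, 1:]], axis=1)
    return (sparsity >= np.quantile(sparsity, density_q)) & \
           (local_var >= np.quantile(local_var, var_q))

def interpolate_synthetic(X, y, mask, k=5, n_new=100, seed=0):
    """SMOTE-style interpolation of features *and* target between neighbors
    drawn from the flagged region."""
    rng = np.random.default_rng(seed)
    X_r, y_r = X[mask], y[mask]
    nn = NearestNeighbors(n_neighbors=min(k, len(X_r) - 1) + 1).fit(X_r)
    _, idx = nn.kneighbors(X_r)
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.integers(len(X_r))                 # seed example
        j = idx[i, rng.integers(1, idx.shape[1])]  # one of its neighbors (not itself)
        lam = rng.random()                         # interpolation factor in [0, 1)
        X_new.append(X_r[i] + lam * (X_r[j] - X_r[i]))
        y_new.append(y_r[i] + lam * (y_r[j] - y_r[i]))
    return np.vstack(X_new), np.asarray(y_new)

# Usage sketch: augment the training set before fitting a regressor.
# mask = sparse_high_variance_mask(X_train, y_train)
# X_syn, y_syn = interpolate_synthetic(X_train, y_train, mask)
# X_aug = np.vstack([X_train, X_syn]); y_aug = np.concatenate([y_train, y_syn])
```

The key difference from classification SMOTE shown here is that the continuous dependent variable is interpolated with the same factor as the features, so each synthetic pair lies on the segment between the two real examples in the joint (X, y) space.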
Keywords: data mining; learning from imbalanced data sets; machine learning; sample selection bias; stratification