SMOTE过采样及其改进算法研究综述
引用本文:石洪波,陈雨文,陈鑫.SMOTE过采样及其改进算法研究综述[J].智能系统学报,2019,14(6):1073-1083.
作者姓名:石洪波  陈雨文  陈鑫
作者单位:山西财经大学 信息学院, 山西 太原 030031
摘    要:近年来不平衡分类问题受到广泛关注。SMOTE过采样通过添加生成的少数类样本改变不平衡数据集的数据分布,是改善不平衡数据分类模型性能的流行方法之一。本文首先阐述了SMOTE的原理、算法以及存在的问题,针对SMOTE存在的问题,分别介绍了其4种扩展方法和3种应用的相关研究,最后分析了SMOTE应用于大数据、流数据、少量标签数据以及其他类型数据的现有研究和面临的问题,旨在为SMOTE的研究和应用提供有价值的借鉴和参考。

关 键 词:不平衡数据分类  SMOTE  算法  k-NN  过采样  欠采样  高维数据  分类型数据

Summary of research on SMOTE oversampling and its improved algorithms
SHI Hongbo, CHEN Yuwen, CHEN Xin. Summary of research on SMOTE oversampling and its improved algorithms[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1073-1083.
Authors: SHI Hongbo  CHEN Yuwen  CHEN Xin
Affiliation: School of Information, Shanxi University of Finance and Economics, Taiyuan, Shanxi, 030031
Abstract: In recent years, the problem of imbalanced classification has received considerable attention. The synthetic minority oversampling technique (SMOTE), a popular method for improving the performance of classifiers on imbalanced data, adds synthetic minority-class samples to change the distribution of an imbalanced data set. In this paper, we first describe the principle and algorithm of SMOTE and the problems it has. Then, in response to those problems, we review related research on four types of extension methods and three types of applications. Finally, we analyze existing research on, and the open problems in, applying SMOTE to big data, streaming data, data with few labels, and other types of data, with the aim of providing a useful reference for the research and application of SMOTE.
Keywords: imbalanced data classification  SMOTE  algorithm  k-NN  oversampling  undersampling  high-dimensional data  categorical data
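
The abstract describes SMOTE only as adding generated minority-class samples. For readers who want a concrete picture, the following is a minimal sketch of the commonly described interpolation step, not code taken from the paper: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the brute-force neighbour search are illustrative assumptions.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; X_min holds minority-class rows only.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    rng = np.random.default_rng(seed)

    # Brute-force squared Euclidean distances between all minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate: x_new = x_base + gap * (x_neighbour - x_base), gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_sketch(minority, n_new=3, k=2))
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the bounding box of the minority class. The sketch assumes purely numeric features and a data set small enough for an all-pairs distance matrix.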