Improved SMOTE Algorithm for Imbalanced Datasets
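Citation: WANG Chaoxue, ZHANG Tao, MA Chunsen. Improved SMOTE Algorithm for Imbalanced Datasets[J]. Journal of Frontiers of Computer Science and Technology, 2014(6): 727-734.
Authors: WANG Chaoxue; ZHANG Tao; MA Chunsen
Affiliations: [1] School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; [2] Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Funding: The National Natural Science Foundation of China under Grant No. 31170393, the Natural Science Foundation of Shaanxi Province of China under Grant No. 2012JM8023, and the Natural Science Fund of the Education Department of Shaanxi Province under Grant No. 12JK0726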
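Abstract: To address the shortcomings of SMOTE (synthetic minority over-sampling technique) when it synthesizes new minority-class samples, this paper proposes an improved SMOTE algorithm, GA-SMOTE. The key idea is to introduce the three basic operators of the genetic algorithm into SMOTE: the selection operator chooses minority-class samples in a differentiated rather than uniform way, and the crossover and mutation operators control the quality of the synthesized samples. GA-SMOTE is combined with SVM (support vector machine) to handle classification on imbalanced data. Extensive experiments on UCI datasets show that GA-SMOTE performs well in the overall quality of the synthesized samples and effectively improves the classification performance of SVM on imbalanced datasets.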
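For reference, the baseline synthesis step that GA-SMOTE modifies can be sketched in plain Python. This is a minimal sketch of classic SMOTE interpolation, assuming Euclidean distance and the common variant that draws the base sample uniformly at random; the names `k_nearest` and `smote` are illustrative and do not come from the paper. The uniform draw marked below is the step that, according to the abstract, GA-SMOTE's selection operator replaces with a differentiated choice.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    order = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in order[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Classic SMOTE: interpolate between a minority sample and one of its k minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # uniform choice: every minority sample treated alike
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # one gap in [0, 1) for all features, so the point lies on the segment
        x, y = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic
```

Because a single `gap` is shared by all features, every synthetic point lies on the line segment between two existing minority samples; SMOTE never places points outside that segment.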

Keywords: imbalanced dataset; classification; genetic operator; synthetic minority over-sampling technique (SMOTE)

Improved SMOTE Algorithm for Imbalanced Datasets
Authors and affiliations: WANG Chaoxue, ZHANG Tao, MA Chunsen (1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; 2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China)
Abstract: Based on an analysis of the shortcomings of SMOTE (synthetic minority over-sampling technique) in synthesizing minority-class samples, this paper presents an improved SMOTE algorithm, GA-SMOTE. The key of GA-SMOTE lies in introducing the three basic operators of the genetic algorithm (GA) into SMOTE: the selection operator is used to choose minority-class samples in a differentiated way, and the crossover and mutation operators are used to finely control the quality of the synthesized minority-class samples. GA-SMOTE and SVM (support vector machine) are combined to handle the classification problem on imbalanced datasets. Extensive experiments on UCI datasets show that GA-SMOTE achieves a strong overall synthesis effect for minority-class samples and yields better SVM classification performance on imbalanced datasets.
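The abstract does not specify how the three genetic operators are parameterized, so the following is only a hypothetical illustration of the general idea, not the authors' GA-SMOTE. It assumes roulette-wheel selection driven by per-sample weights, per-gene arithmetic crossover between a selected sample and one of its k minority neighbours, and Gaussian mutation with probability `pm` and scale `sigma`. The weighting function `borderline_weights` is a swapped-in heuristic in the spirit of Borderline-SMOTE (it favours minority samples surrounded by majority points); all function names, parameters, and default values here are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    order = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in order[:k]]


def borderline_weights(minority, majority, k=5, eps=1e-3) -> List[float]:
    """Hypothetical weighting (not from the paper): share of majority points among each
    minority sample's k nearest neighbours in the combined data, plus eps so no weight is 0."""
    pool = [(p, 0) for p in minority] + [(q, 1) for q in majority]
    weights = []
    for x in minority:
        near = sorted(pool, key=lambda t: math.dist(x, t[0]))[1:k + 1]  # skip x itself
        weights.append(sum(lab for _, lab in near) / max(len(near), 1) + eps)
    return weights


def roulette_select(weights: Sequence[float], rng: random.Random) -> int:
    """Selection operator: index i is chosen with probability weights[i] / sum(weights)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def crossover(p1: Vector, p2: Vector, rng: random.Random) -> Vector:
    """Arithmetic crossover: each gene is an independent convex mix of the two parents."""
    return [a + rng.random() * (b - a) for a, b in zip(p1, p2)]


def mutate(child: Vector, pm: float, sigma: float, rng: random.Random) -> Vector:
    """Gaussian mutation: each gene is perturbed by N(0, sigma^2) with probability pm."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < pm else g for g in child]


def ga_smote_sketch(minority, n_new, weights, k=5, pm=0.1, sigma=0.05, seed=0) -> List[Vector]:
    """Hypothetical GA-flavoured SMOTE: select, cross over with a neighbour, then mutate."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    out = []
    for _ in range(n_new):
        i = roulette_select(weights, rng)  # differentiated choice of the base sample
        j = rng.choice(k_nearest(minority, i, k))  # mate drawn from its minority neighbours
        out.append(mutate(crossover(minority[i], minority[j], rng), pm, sigma, rng))
    return out
```

Setting `pm=0` turns mutation into a no-op. Unlike plain SMOTE, the per-gene `alpha` in `crossover` lets a child fall anywhere in the axis-aligned box spanned by its two parents rather than only on the segment between them; whether the paper uses this or a single interpolation factor is not stated in the abstract.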
Keywords: imbalanced dataset; classification; genetic operator; synthetic minority over-sampling technique (SMOTE)