Research on classification for imbalanced dataset based on improved SMOTE
Citation: WANG Chaoxue, PAN Zhengmao, DONG Lili, MA Chunsen, ZHANG Xing. Research on classification for imbalanced dataset based on improved SMOTE[J]. Computer Engineering and Applications, 2013, 49(2): 184-187, 245
Authors: WANG Chaoxue, PAN Zhengmao, DONG Lili, MA Chunsen, ZHANG Xing
Affiliation: 1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Funding: National Natural Science Foundation of China; Natural Science Project of the Shaanxi Provincial Department of Education
Abstract: To address the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique) in synthesizing new minority-class samples, an improved SMOTE algorithm (SSMOTE) is proposed. The key to SSMOTE is that it introduces the concept of support and roulette wheel selection into SMOTE and makes full use of the distribution information of nearest neighbors from the other class, which gives fine control over both the quality and the quantity of the synthesized minority-class samples. SSMOTE is combined with the KNN (K-Nearest Neighbor) algorithm to handle classification on imbalanced datasets. Extensive experiments on UCI datasets compare SSMOTE with related algorithms from other important literature. The results show that SSMOTE performs well in overall synthesis of new samples and effectively improves the classification performance of KNN on imbalanced datasets.

Keywords: imbalanced datasets; classification; support; roulette wheel selection; Synthetic Minority Over-sampling Technique (SMOTE)
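
The abstract names two ingredients, SMOTE-style interpolation and roulette wheel selection, without giving the definition of support. The following is a minimal sketch of how those two pieces can fit together, not the authors' SSMOTE. The function names (`nearest`, `roulette_pick`, `weighted_smote`) are hypothetical, and the weight used here (the share of minority-class samples among a point's k nearest neighbors in the whole dataset) is an assumed stand-in for support, chosen only so the example runs.

```python
import math
import random


def nearest(points, query_idx, candidates, k):
    """Return up to k indices from `candidates` closest to points[query_idx]."""
    others = [i for i in candidates if i != query_idx]
    others.sort(key=lambda i: math.dist(points[i], points[query_idx]))
    return others[:k]


def roulette_pick(weights, rng):
    """Roulette wheel selection: index i is drawn with probability
    weights[i] / sum(weights). Falls back to a uniform draw if all
    weights are zero."""
    total = sum(weights)
    if total <= 0:
        return rng.randrange(len(weights))
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def weighted_smote(points, labels, minority_label, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE interpolation,
    choosing each base sample by roulette wheel selection."""
    rng = random.Random(seed)
    all_idx = list(range(len(points)))
    min_idx = [i for i in all_idx if labels[i] == minority_label]
    if len(min_idx) < 2:
        raise ValueError("need at least two minority samples")

    # ASSUMPTION: placeholder weight, not the paper's definition of support.
    # It is the fraction of minority samples among the k nearest neighbors
    # taken over the whole dataset (both classes).
    weights = []
    for i in min_idx:
        nn = nearest(points, i, all_idx, k)
        weights.append(sum(labels[j] == minority_label for j in nn) / len(nn))

    synthetic = []
    for _ in range(n_new):
        base = min_idx[roulette_pick(weights, rng)]
        # Classic SMOTE step: pick one of the base's minority-class neighbors
        # and interpolate at a random position on the connecting segment.
        mate = rng.choice(nearest(points, base, min_idx, k))
        gap = rng.random()
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(points[base], points[mate])]
        )
    return synthetic
```

With this placeholder weight, minority samples surrounded mostly by majority-class neighbors are selected less often as bases; the selection rule in the paper itself may differ.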
This article is indexed in CNKI, Wanfang Data, and other databases.