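Research on classification algorithm of imbalanced datasets based on improved SMOTE
Cite this article: ZHAO Qinghua, ZHANG Yihao, MA Jianfen, DUAN Qianqian. Research on classification algorithm of imbalanced datasets based on improved SMOTE[J]. Computer Engineering and Applications, 2018, 54(18): 168-173. DOI: 10.3778/j.issn.1002-8331.1705-0334
Authors: ZHAO Qinghua, ZHANG Yihao, MA Jianfen, DUAN Qianqian
Affiliation: College of Information Engineering & Key Lab of Advanced Transducers and Intelligent Control System (Ministry of Education, Shanxi), MicroNano System Research Center, Taiyuan University of Technology, Taiyuan 030600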
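Abstract: When the combined random forest and SMOTE algorithm is applied to imbalanced datasets, the resulting sample distribution is marginalized and the computational complexity is high. To address these problems, this paper proposes two improved SMOTE-based algorithms, TSMOTE (triangle SMOTE) and MDSMOTE (Max Distance SMOTE). Their core idea is to restrict the generation of new samples to a bounded region so that the distribution of the sample set tends toward the center, and to construct synthetic samples from fewer positive-class sample points, thereby limiting the sample region and reducing algorithmic complexity. Extensive experiments on six imbalanced datasets show that, compared with the traditional algorithm, the improved algorithms greatly reduce running time and achieve higher G-mean, F-value, and AUC values.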

Keywords: random forest; SMOTE algorithm; imbalanced dataset
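
The abstract builds on SMOTE, which creates synthetic minority (positive-class) samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that classic baseline step, in plain Python. It is not the paper's TSMOTE or MDSMOTE, whose construction rules the abstract does not spell out. The function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch: interpolate between a minority sample
    and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Random position on the segment from `base` to `neighbour`.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with four 2-D points (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority points. The nearest-neighbour search costs one pass over the minority class per generated sample, which is where much of the running time comes from in this naive form.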

Research on classification algorithm of imbalanced datasets based on improved SMOTE
ZHAO Qinghua, ZHANG Yihao, MA Jianfen, DUAN Qianqian. Research on classification algorithm of imbalanced datasets based on improved SMOTE[J]. Computer Engineering and Applications, 2018, 54(18): 168-173. DOI: 10.3778/j.issn.1002-8331.1705-0334
Authors:ZHAO Qinghua  ZHANG Yihao  MA Jianfen  DUAN Qianqian
Affiliation: MicroNano System Research Center, College of Information Engineering & Key Lab of Advanced Transducers and Intelligent Control System (Ministry of Education), Taiyuan University of Technology, Taiyuan 030600, China
Abstract: Combining random forest with the SMOTE algorithm on imbalanced datasets suffers from two problems: the sample distribution becomes marginalized, and the computational complexity is high. This paper proposes two improved algorithms, TSMOTE (triangle SMOTE) and MDSMOTE (Max Distance SMOTE). Their core idea is to restrict the generation of new samples to a bounded region so that the distribution of the sample set tends to be centralized, which reduces the time complexity relative to the traditional SMOTE algorithm. Extensive experiments on six imbalanced datasets show that the improved algorithms reduce running time and achieve higher G-mean, F-value, and AUC values than the traditional SMOTE method.
Keywords:random forest  SMOTE algorithm  imbalanced dataset  
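
The three scores named in the abstract can be computed from a classifier's predictions as sketched below. This sketch assumes the usual definitions: G-mean as the geometric mean of sensitivity and specificity, F-value as the balanced F1 score, and AUC as the probability that a positive sample is scored above a negative one. The abstract does not state which F-value weighting the authors used, so F1 is an assumption. All function names and the toy labels and scores are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the minority class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn


def g_mean(y_true, y_pred):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_value(y_true, y_pred):
    # Assumes the balanced F1 form; the source does not give the weighting.
    tp, fn, fp, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 2 positives, 6 negatives (made-up values).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.6, 0.05]

print(round(g_mean(y_true, y_pred), 4))   # 0.6455
print(round(f_value(y_true, y_pred), 4))  # 0.5
print(round(auc(y_true, scores), 4))      # 0.9167
```

Unlike plain accuracy, all three scores stay low when a classifier ignores the minority class, which is why they are common choices for imbalanced datasets.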