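Ensemble Classification Algorithm for Imbalanced Data Combined with Local Area Density
Cite this article: YANG Hao, CHEN Hongmei. Ensemble Classification Algorithm for Imbalanced Data Combined with Local Area Density[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2020, 14(2): 274-284.
Authors: YANG Hao  CHEN Hongmei
Affiliations: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; Key Laboratory of Cloud Computing and Intelligent Technology (Southwest Jiaotong University), Chengdu 611756
Funding: National Natural Science Foundation of China, No. 61572406
Abstract: Traditional oversampling is one of the effective approaches to classifying imbalanced data. SMOTE-based oversampling methods become less effective when a dataset exhibits class overlapping or small disjuncts. To address this problem, an oversampling algorithm based on the local density of samples, MOLAD, is proposed. Building on it, and in order to solve the classification problem for imbalanced data, an algorithm named LADBMOTE is proposed that combines MOLAD with Bagging-based ensemble learning in the sampling stage. LADBMOTE first computes the K nearest neighbors of each minority-class sample according to MOLAD, then uses all K nearest neighbors for sampling to generate K balanced datasets, and finally applies Bagging-based ensemble learning to combine the classifiers trained on the K balanced datasets. On 20 public imbalanced datasets from KEEL, LADBMOTE is compared with 7 currently popular algorithms for handling imbalanced data. The experimental results show that LADBMOTE achieves better classification performance and stronger robustness across different classifiers.

Keywords: imbalanced data  nearest-neighbor computation strategy  ensemble learning  oversampling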

Ensemble Classification Algorithm for Imbalanced Data Combined with Local Area Density
YANG Hao, CHEN Hongmei. Ensemble Classification Algorithm for Imbalanced Data Combined with Local Area Density[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(2): 274-284.
Authors: YANG Hao  CHEN Hongmei
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
Abstract: Oversampling is one of the effective ways to deal with imbalanced classification problems. This paper addresses the degraded sampling quality of oversampling methods based on SMOTE (synthetic minority oversampling technique) when class overlapping and small disjuncts occur in a dataset. An oversampling method based on local area density, MOLAD, is proposed. Furthermore, to solve the classification problem for imbalanced datasets, a method named LADBMOTE is proposed, which combines MOLAD with Bagging-based ensemble learning in the sampling stage. LADBMOTE first calculates the K nearest neighbors of each minority-class sample according to MOLAD and then selects all K nearest neighbors for sampling, so that K balanced datasets are generated. Bagging-based ensemble learning is then used to combine the classifiers trained on the K balanced datasets. LADBMOTE is compared with 7 currently popular algorithms for handling imbalanced data on 20 imbalanced datasets published on KEEL. The experimental results show that LADBMOTE achieves better and more robust classification performance across different classifiers.
Keywords: imbalanced data  strategy for calculating nearest neighbors  ensemble learning  oversampling
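
The pipeline outlined in the abstract (K nearest neighbors per minority sample, K balanced datasets, one classifier per dataset, combined prediction) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not define MOLAD's local-density neighbor computation, so plain Euclidean nearest neighbors stand in for it; dataset j is assumed to be built by SMOTE-style interpolation toward each sample's j-th neighbor; a nearest-centroid model stands in for the base classifier; and a simple majority vote stands in for the Bagging-based ensemble (no bootstrap resampling is done). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def k_nearest(idx, points, k):
    """Indices of the k nearest other points to points[idx].

    Assumption: plain Euclidean distance, used as a stand-in for the
    local-density-based neighbor computation of MOLAD.
    """
    order = sorted(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )
    return order[:k]


def balanced_datasets(minority, majority, k, rng):
    """Build k balanced datasets (assumed scheme: one per neighbor rank).

    Dataset j adds synthetic minority points interpolated between each
    minority sample and its j-th nearest minority neighbor until both
    classes have the same size. Requires len(minority) > k.
    """
    neighbors = [k_nearest(i, minority, k) for i in range(len(minority))]
    need = len(majority) - len(minority)
    datasets = []
    for j in range(k):
        synthetic = []
        for n in range(need):
            i = n % len(minority)  # cycle through the minority samples
            a, b = minority[i], minority[neighbors[i][j]]
            t = rng.random()  # SMOTE-style random point on the segment a-b
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
        X = majority + minority + synthetic
        y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
        datasets.append((X, y))
    return datasets


def fit_centroids(X, y):
    """Stand-in base classifier: one centroid per class label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}


def predict(models, x):
    """Majority vote over the per-dataset nearest-centroid predictions."""
    votes = Counter(
        min(m, key=lambda lab: math.dist(m[lab], x)) for m in models
    )
    return votes.most_common(1)[0][0]


# Toy data: label 0 is the majority class, label 1 the minority class.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
minority = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(8)]

datasets = balanced_datasets(minority, majority, k=3, rng=rng)
models = [fit_centroids(X, y) for X, y in datasets]
print(predict(models, (4.0, 4.0)), predict(models, (0.0, 0.0)))
```

Using an odd K in the two-class case keeps the majority vote free of ties. Swapping in a different base classifier only requires replacing fit_centroids and the per-model prediction inside predict.
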
This article is indexed by databases including VIP (维普) and Wanfang Data (万方数据).