AdaBoost-based class imbalance learning algorithm
Citation: Qin Mengmei, Qiu Jianlin, Lu Pengcheng, Chen Lulu, Zhao Weikang. AdaBoost-based class imbalance learning algorithm[J]. Application Research of Computers, 2017, 34(11).
Authors: Qin Mengmei  Qiu Jianlin  Lu Pengcheng  Chen Lulu  Zhao Weikang
Affiliation: School of Electronic Information, Nantong University; School of Computer Science and Technology, Nantong University
Foundation items: National Natural Science Foundation of China (61202006/61272424); Open Project of the State Key Laboratory for Novel Software Technology (KFKT2012B29); Natural Science Foundation of Jiangsu Province (BK2010277); Jiangsu Science and Technology Innovation Fund (BC2013167); Natural Science Foundation of Jiangsu Higher Education Institutions (12KJB520014).
Abstract: When dealing with class-imbalanced data, borderline examples of the minority class are easily misclassified. To reduce the impact of class imbalance on classifier performance, this paper proposes an adaptive borderline sampling algorithm, AB-SMOTE. AB-SMOTE adaptively samples the borderline examples of the minority class, improving the balance and effectiveness of the data set. AB-SMOTE is then combined with a data-cleaning technique to form ABTAdaBoost, an AdaBoost-based ensemble algorithm. ABTAdaBoost consists of three stages: the first stage applies AB-SMOTE to the training set to reduce its class imbalance; the second stage applies Tomek links data cleaning to remove noise and the overlapping examples introduced by sampling, improving the usability of the data; the third stage uses AdaBoost to build an ensemble classifier from N weak classifiers. Experiments on 12 UCI data sets, using J48 decision trees and naive Bayes as base classifiers, show that ABTAdaBoost outperforms the other algorithms compared.

Keywords: machine learning  class imbalance learning  ensemble learning  SMOTE  data cleaning techniques
Received: 2016-08-05
Revised: 2017-08-28
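The first two stages of the pipeline described in the abstract can be illustrated with a minimal, self-contained Python sketch. This is an assumption-laden simplification, not the paper's method: it shows plain SMOTE-style interpolation (where AB-SMOTE adds adaptive, borderline-aware sampling) and a brute-force Tomek-links search; all function names are illustrative.

```python
import math
import random

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote_oversample(minority, n_new, k=3, seed=0):
    """Stage 1 (simplified): generate n_new synthetic minority samples by
    interpolating a base point toward one of its k nearest minority-class
    neighbours -- the core SMOTE step that AB-SMOTE adapts for borderline
    examples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p != base),
                            key=lambda p: euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

def tomek_links(X, y):
    """Stage 2: find Tomek links -- pairs of opposite-class points that are
    each other's nearest neighbour. Such pairs mark noise or class overlap
    and are removed before training."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))
    links = set()
    for i in range(len(X)):
        j = nearest(i)
        if y[i] != y[j] and nearest(j) == i:
            links.add(frozenset((i, j)))
    return links

# Stage 3 (not shown here) trains an AdaBoost ensemble of N weak learners
# (e.g. J48 decision trees or naive Bayes) on the rebalanced, cleaned data.
```

Because each synthetic point lies on the segment between two real minority points, oversampling never leaves the convex hull of the minority class; the Tomek-links pass then prunes exactly the borderline pairs where the two classes touch.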
