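Research on Boosting-based Imbalanced Data Classification
Cite this article: LI Qiu-jie, MAO Yao-bin, WANG Zhi-quan. Research on Boosting-based Imbalanced Data Classification[J]. Computer Science (计算机科学), 2011, 38(12): 224-228.
Authors: LI Qiu-jie (李秋洁), MAO Yao-bin (茅耀斌), WANG Zhi-quan (王执锉)
Affiliation: School of Automation, Nanjing University of Science and Technology, Nanjing 210094
Funding: Supported by the National Natural Science Foundation of China (60974129, 70931002)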
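Abstract: This paper studies boosting-based classification algorithms for imbalanced data. After summarizing and analyzing existing algorithms, it proposes a weight sampling boosting algorithm. Samples are drawn by weight sampling, which changes the original data distribution and yields a classifier suited to imbalanced data. In essence, the algorithm uses a sampling function to adjust the form of the original boosting loss function and further emphasize the classification loss of positive samples, so that the classifier concentrates on discriminating positive samples effectively and the overall recognition rate of positive samples rises. The algorithm is simple to implement and practical. Experimental results on UCI data sets show that, for imbalanced data classification, weight sampling boosting outperforms original boosting and earlier algorithms.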

Keywords: imbalanced data classification, Boosting, sampling

Research on Boosting-based Imbalanced Data Classification
LI Qiu-jie, MAO Yao-bin, WANG Zhi-quan. Research on Boosting-based Imbalanced Data Classification[J]. Computer Science, 2011, 38(12): 224-228.
Authors: LI Qiu-jie, MAO Yao-bin, WANG Zhi-quan
Affiliation: School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract: This paper investigates boosting-based classification algorithms for imbalanced data. Building on an analysis of existing algorithms, a weight sampling boosting algorithm is proposed. Weight sampling changes the data distribution so that the trained classifier is suited to imbalanced data. In essence, the sampling function adjusts the loss function of naive boosting to emphasize the positive examples, so that the classifier focuses on classifying these examples correctly and the recognition rate of positive examples improves. The algorithm is simple and practical, and experiments on UCI data sets show that it outperforms naive boosting and previous algorithms on imbalanced data classification.
Keywords: Imbalanced data classification, Boosting, Sampling
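
The abstract describes the idea only in outline: examples are resampled according to their boosting weights, with a sampling function that gives positive examples extra emphasis. The Python sketch below illustrates that general idea and is not the authors' algorithm. The abstract does not state the form of the sampling function, so the constant multiplier `pos_factor` is a hypothetical stand-in, and the decision-stump weak learner, the AdaBoost-style weight update, and all function names are likewise assumptions made for illustration.

```python
import numpy as np

def fit_stump(X, y):
    """Fit the best single-feature threshold stump on a sample (labels in {-1, +1})."""
    best = (None, None, None, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] >= t, s, -s)
                err = np.mean(pred != y)
                if err < best[3]:
                    best = (j, t, s, err)
    return best[:3]

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] >= t, s, -s)

def weight_sampling_boost(X, y, rounds=10, pos_factor=2.0, seed=0):
    """AdaBoost-style loop whose weak learners are trained on weight-resampled data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)  # boosting weights, start uniform
    ensemble = []
    for _ in range(rounds):
        # Assumed sampling function: scale the weights of positives by pos_factor.
        p = w * np.where(y == 1, pos_factor, 1.0)
        p /= p.sum()
        idx = rng.choice(n, size=n, replace=True, p=p)
        stump = fit_stump(X[idx], y[idx])
        pred = stump_predict(stump, X)
        # Weighted error on the full training set, clipped to keep alpha finite.
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Standard exponential reweighting: misclassified examples gain weight.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy imbalanced data: 20 negatives (values 0..4) and 5 positives (value 10).
X = np.concatenate([np.tile(np.arange(5.0), 4), np.full(5, 10.0)]).reshape(-1, 1)
y = np.concatenate([-np.ones(20, dtype=int), np.ones(5, dtype=int)])
model = weight_sampling_boost(X, y, rounds=5)
recall = np.mean(predict(model, X)[y == 1] == 1)  # recognition rate of positives
print(recall)
```

Raising `pos_factor` draws positives into each training sample more often, which is one simple way to shift the classifier's attention toward the minority class; the paper's actual sampling function may differ.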
This article is indexed in CNKI, Wanfang Data, and other databases.