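A Novel Cost Sensitive Learning Algorithm Based on Re-sampling
Citation: GU Qiong, YUAN Lei, NING Bin, XIONG Qi-jun, HUA Li, LI Wen-xin. A Novel Cost Sensitive Learning Algorithm Based on Re-sampling[J]. Computer Engineering & Science, 2011, 33(9): 130.
Authors: GU Qiong, YUAN Lei, NING Bin, XIONG Qi-jun, HUA Li, LI Wen-xin
Affiliation: School of Mathematics and Computer Science, Xiangfan University, Xiangyang, Hubei 441053
Funding: National Natural Science Foundation of China; National 863 Program; Natural Science Foundation of Hubei Province; Young and Middle-aged Scholars Project of the Hubei Provincial Department of Education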
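Abstract: Most research on imbalanced datasets concentrates either purely on restructuring the dataset or purely on cost-sensitive learning. Because an imbalanced class distribution and unequal misclassification costs often occur together, this paper proposes a cost-sensitive learning algorithm based on hybrid re-sampling whose objective is the minimum misclassification cost. The algorithm integrates the two types of solution: it first reconstructs the sample class space so that the two classes of the original dataset become roughly balanced, and then applies a cost-sensitive learning algorithm for classification. This improves classification accuracy on the minority class while effectively reducing the total misclassification cost. Experimental results show that the algorithm outperforms traditional algorithms on imbalanced class problems.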
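The first step of the algorithm, rebalancing the two classes before any classifier is trained, can be illustrated with a minimal Python sketch. The abstract does not name the re-sampling techniques, so this sketch assumes plain random undersampling of the majority class combined with random oversampling (duplication) of the minority class, both to half the dataset size. The function `hybrid_resample` and its parameters are hypothetical and are not the authors' code.

```python
import random


def hybrid_resample(X, y, minority=1, seed=0):
    """Balance a binary dataset by undersampling the majority class and
    randomly oversampling the minority class to a common target size.

    Assumption (not from the paper): the per-class target size is half the
    dataset size, i.e. the mean of the two class counts.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")
    target = len(y) // 2

    # Random undersampling: keep `target` majority samples, drawn without replacement.
    kept_majority = rng.sample(majority_idx, min(target, len(majority_idx)))
    # Random oversampling: keep every minority sample, then add duplicates
    # drawn with replacement until the minority class reaches `target`.
    n_extra = max(target - len(minority_idx), 0)
    boosted_minority = minority_idx + [rng.choice(minority_idx) for _ in range(n_extra)]

    idx = kept_majority + boosted_minority
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Usage: 90 majority (label 0) vs 10 minority (label 1) samples.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = hybrid_resample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 50 50
```

Random duplication is only the simplest oversampling choice; synthetic-sample methods such as SMOTE are common alternatives, and the abstract does not say which one the paper uses.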

Keywords: classification; imbalanced dataset; hybrid re-sampling; cost-sensitive learning

Abstract: Most studies on imbalanced dataset classification focus on re-sampling or on cost-sensitive learning systems in isolation, neglecting the fact that an imbalanced class distribution and unequal misclassification costs often occur together. We propose a novel cost-sensitive learning (CSL) algorithm that combines re-sampling with CSL techniques to address the misclassification problem on imbalanced datasets. On the one hand, the re-sampling step balances the dataset by reconstructing both the majority and the minority class. On the other hand, classification is performed to minimize misclassification cost rather than to maximize accuracy, where misclassifying a minority-class sample costs much more than misclassifying a majority-class sample. A cost-sensitive learning procedure is then conducted for classification. Experimental results show that the proposed method improves classification accuracy and effectively decreases the misclassification cost, and that it outperforms traditional algorithms on imbalanced problems.
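The decision step described above, classifying by minimum misclassification cost rather than maximum accuracy, can be sketched as a minimum-expected-cost rule. This is an illustration under assumptions: the probability estimates, the 10:1 cost ratio and the function names are hypothetical, since the abstract states only that misclassifying the minority class costs much more. With zero cost for correct predictions, the rule predicts the minority class whenever p × C_FN > (1 - p) × C_FP, which is the same as p > C_FP / (C_FP + C_FN).

```python
def min_cost_predict(p_minority, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is minority (label 1).
    cost_fn: cost of predicting majority (0) for a true minority sample.
    cost_fp: cost of predicting minority (1) for a true majority sample.
    Correct predictions are assumed to cost 0.
    """
    cost_if_predict_minority = (1.0 - p_minority) * cost_fp
    cost_if_predict_majority = p_minority * cost_fn
    return 1 if cost_if_predict_minority < cost_if_predict_majority else 0


def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum of misclassification costs over a labelled set."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


# Usage with illustrative costs: missing a minority sample costs 10x more.
COST_FN, COST_FP = 10.0, 1.0
probs = [0.05, 0.2, 0.6, 0.95]  # hypothetical classifier outputs
y_true = [0, 1, 0, 1]

by_accuracy = [1 if p >= 0.5 else 0 for p in probs]               # [0, 0, 1, 1]
by_cost = [min_cost_predict(p, COST_FN, COST_FP) for p in probs]  # [0, 1, 1, 1]
print(total_cost(y_true, by_accuracy, COST_FN, COST_FP))  # 11.0
print(total_cost(y_true, by_cost, COST_FN, COST_FP))      # 1.0
```

Under these assumed costs, the minimum-cost rule accepts one cheap false positive to avoid an expensive false negative, cutting the total cost on this toy set from 11 to 1.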