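不均衡数据集学习中基于初分类的过抽样算法 (Over-sampling algorithm based on preliminary classification in imbalanced data sets learning)
Cite this article: 韩慧, 王路, 温明, 王文渊. 不均衡数据集学习中基于初分类的过抽样算法[J]. 计算机应用 (Journal of Computer Applications), 2006, 26(8): 1894-1897.
Authors: 韩慧 (HAN Hui), 王路 (WANG Lu), 温明 (WEN Ming), 王文渊 (WANG Wen-yuan)
Affiliation: Department of Automation, Tsinghua University, Beijing 100084, China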
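Abstract: To effectively improve classification performance on the minority class of imbalanced data sets, an over-sampling algorithm based on preliminary classification is proposed. First, the test set is given a preliminary classification, so that as much useful information about the majority class as possible is retained. Second, the samples that the preliminary classification predicts to be minority class are classified again, which improves minority-class performance. Using data sets from the University of California, Irvine (UCI), the algorithm was compared experimentally with the synthetic minority over-sampling technique (SMOTE) and with under-sampling. The results show that it achieves better classification performance on both the minority class and the majority class than the other two methods.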

Keywords: imbalanced data sets; over-sampling; under-sampling
Article ID: 1001-9081(2006)08-1894-04
Received: 2006-03-01
Revised: 2006-05-08

Over-sampling algorithm based on preliminary classification in imbalanced data sets learning
HAN Hui, WANG Lu, WEN Ming, WANG Wen-yuan. Over-sampling algorithm based on preliminary classification in imbalanced data sets learning[J]. Journal of Computer Applications, 2006, 26(8): 1894-1897.
Authors:HAN Hui  WANG Lu  WEN Ming  WANG Wen-yuan
Affiliation:Department of Automation, Tsinghua University, Beijing 100084, China
Abstract:To effectively improve the classification performance of the minority class in imbalanced data sets, an over-sampling algorithm based on preliminary classification is presented. First, a preliminary classification is made on the test data in order to retain as much useful information about the majority class as possible. Then the test samples predicted to belong to the minority class are reclassified to improve the classification performance of the minority class. Using data sets provided by the University of California, Irvine, the new algorithm was compared with the synthetic minority over-sampling technique and with an under-sampling method. The experimental results show that the new algorithm outperforms both in classification performance on the minority class and on the majority class.
Keywords:imbalanced data sets  over-sampling  under-sampling
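
The two-stage flow described in the abstract (a preliminary classification of the test data, then reclassification of only those samples predicted to be minority class) can be sketched roughly as follows. The abstract does not say which base classifier is used, how the over-sampling is performed, or which training data each stage sees, so everything specific here is an assumption made for illustration: the k-nearest-neighbour stand-in learner, the interpolation-based over-sampling, the choice to train stage 1 on the original data and stage 2 on the over-sampled data, and all function names (`knn_classifier`, `oversample_minority`, `two_stage_predict`). It is not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], int]

MAJORITY, MINORITY = 0, 1


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classifier(X: Sequence[Point], y: Sequence[int], k: int = 3) -> Classifier:
    """Return a k-nearest-neighbour predictor (assumed stand-in base learner)."""
    def predict(p: Point) -> int:
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], p))[:k]
        votes = sum(y[i] for i in nearest)  # number of minority neighbours
        return MINORITY if 2 * votes > len(nearest) else MAJORITY
    return predict


def oversample_minority(
    X: Sequence[Point], y: Sequence[int], rng: random.Random
) -> Tuple[List[Point], List[int]]:
    """Balance the classes by adding synthetic minority points.

    Assumption: each synthetic point is a random interpolation between two
    randomly chosen minority samples; the paper's own scheme may differ.
    """
    minority = [x for x, label in zip(X, y) if label == MINORITY]
    X_new, y_new = list(X), list(y)
    if not minority:
        return X_new, y_new
    n_needed = (len(X) - len(minority)) - len(minority)
    for _ in range(max(0, n_needed)):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        X_new.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
        y_new.append(MINORITY)
    return X_new, y_new


def two_stage_predict(
    X_train: Sequence[Point],
    y_train: Sequence[int],
    X_test: Sequence[Point],
    seed: int = 0,
) -> List[int]:
    """Preliminary classification, then reclassify predicted-minority samples."""
    rng = random.Random(seed)
    # Stage 1 (assumed): learner trained on the original, imbalanced data.
    stage1 = knn_classifier(X_train, y_train)
    # Stage 2 (assumed): learner trained on the over-sampled, balanced data.
    X_bal, y_bal = oversample_minority(X_train, y_train, rng)
    stage2 = knn_classifier(X_bal, y_bal)

    labels = []
    for p in X_test:
        if stage1(p) == MAJORITY:
            labels.append(MAJORITY)   # stage-1 majority predictions are kept
        else:
            labels.append(stage2(p))  # predicted minority: classify again
    return labels
```

Calling `two_stage_predict(X_train, y_train, X_test)` with tuples of floats as points and 0/1 labels returns one label per test point. Only the samples flagged as minority in the first stage ever reach the second classifier, which is the one structural property taken directly from the abstract.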
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.