
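A Clustering-Based Sampling Method for Minority-Class Samples
Cite this article: CHEN Chuan, ZHANG Hua-xiang. Clustering Based Oversampling Approach for Minority Class Data [J]. 山东电子 (Shandong Electronics), 2011(5): 65-68.
Authors: CHEN Chuan (陈川), ZHANG Hua-xiang (张化祥)
Affiliations: [1] School of Information Science and Engineering, Shandong Normal University, Jinan 250014; [2] Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, 250014
Funding: Supported by the Shandong Province Science and Technology Research Program (2008B0026, ZR2010FM021, 2010G0020115)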
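Abstract: For classification on imbalanced data sets, SMOTE oversamples the minority class by linearly interpolating between neighboring instances. After SMOTE interpolation, however, regions where instances were dense remain relatively dense and regions where they were sparse remain relatively sparse, which hurts classification performance. To address this problem, this paper proposes a clustering-based oversampling method, C-SMOTE. The method first groups the minority-class instances into several clusters and then generates new instances with SMOTE, treating each cluster as a unit. Experimental results show that C-SMOTE preserves the overall classification accuracy on the data set while improving classification precision on the minority class.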

Keywords: SMOTE; imbalanced data set; clustering; minority class

Clustering Based Oversampling Approach for Minority Class Data
Authors: CHEN Chuan, ZHANG Hua-xiang
Abstract: SMOTE is a popular oversampling method for classification on imbalanced data sets; it produces new instances by linear interpolation between existing minority instances. Its main disadvantage is that, after SMOTE, regions of the sample space that were dense remain dense and regions that were sparse remain sparse. We therefore propose a cluster-based oversampling method (C-SMOTE) to overcome this disadvantage. In this method, the minority class samples are first clustered into several groups, and the groups are then used as units to generate new samples with SMOTE. Experimental results show that our method improves classification accuracy for the minority class without reducing the overall accuracy.
Keywords: SMOTE; imbalanced dataset; clustering; minority class
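
The two-step idea in the abstract (cluster the minority class, then run SMOTE within each cluster) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which clustering algorithm is used, how many clusters are formed, or how many synthetic samples each cluster receives. The sketch assumes plain k-means, a fixed cluster count, and an equal number of new samples per cluster (so that sparse clusters gain relatively more). The names kmeans, smote_within, c_smote and all parameters are hypothetical.

```python
import math
import random


def kmeans(points, n_clusters, n_iter=20, rng=None):
    """Plain k-means. ASSUMPTION: the abstract does not name the clustering algorithm."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if the cluster is empty).
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def smote_within(cluster, n_new, k=5, rng=None):
    """SMOTE-style linear interpolation restricted to the points of one cluster."""
    rng = rng or random.Random(0)
    if len(cluster) < 2:
        return []  # a single point has no neighbour to interpolate with
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(cluster))
        base = cluster[i]
        # The k nearest neighbours of `base` inside the same cluster.
        others = [p for j, p in enumerate(cluster) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def c_smote(minority, n_clusters=3, n_new_per_cluster=10, k=5, seed=0):
    """Cluster the minority samples, then oversample each cluster separately.

    ASSUMPTION: every cluster gets the same number of new samples; the
    abstract does not state the allocation rule used in the paper.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, n_clusters, rng=rng)
    synthetic = []
    for c in range(n_clusters):
        cluster = [p for p, lab in zip(minority, labels) if lab == c]
        synthetic.extend(smote_within(cluster, n_new_per_cluster, k, rng))
    return synthetic
```

Here minority is expected to be a list of equal-length numeric tuples, for example c_smote(minority, n_clusters=2) for two-dimensional points. Because each synthetic point is interpolated between two members of the same cluster, new samples never fall on a segment that bridges two clusters.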
This article is indexed in 维普 (VIP) and other databases.