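Clustering boundary over-sampling classification method for imbalanced data
Cite this article: 楼晓俊, 孙雨轩, 刘海涛. 聚类边界过采样不平衡数据分类方法[J]. 浙江大学学报(自然科学版 ), 2013, 47(6): 944-950.
Authors: 楼晓俊 (LOU Xiao-jun), 孙雨轩 (SUN Yu-xuan), 刘海涛 (LIU Hai-tao)
Affiliations: 1. Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050; 2. Wuxi Internet of Things Industry Research Institute (无锡物联网产业研究院), Wuxi, Jiangsu 214135
Funding: National Basic Research Program of China ("973" Program) (2011CB302906); National Science and Technology Major Project (2010ZX03006-004).
Abstract: The traditional SMOTE over-sampling method generates synthetic samples blindly, is sensitive to noise, and is prone to over-fitting. To address these problems, an improved clustering-boundary over-sampling method (CB-SMOTE) is proposed. A "clustering consistency index" is introduced to locate the boundary of the minority-class samples. The nearest-neighbor density of the boundary samples is then used to remove noise points and to determine the number of synthetic samples, and the SMOTE rule for synthesizing new samples is optimized. The result is a guided over-sampling method whose synthetic samples are more favorable for classifier learning. The classification performance of six methods was compared experimentally on UCI public data sets. The results show that CB-SMOTE achieves high classification accuracy on both minority-class and majority-class samples and is more stable under changes in the over-sampling rate.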


Clustering boundary over-sampling classification method for imbalanced data sets
LOU Xiao-jun, SUN Yu-xuan, LIU Hai-tao. Clustering boundary over-sampling classification method for imbalanced data sets[J]. Journal of Zhejiang University (Engineering Science), 2013, 47(6): 944-950.
Authors: LOU Xiao-jun, SUN Yu-xuan, LIU Hai-tao
Abstract: The synthetic minority over-sampling technique (SMOTE) is a widely used method for imbalanced data classification. However, SMOTE synthesizes new samples without any guidance, which makes it sensitive to noise and prone to over-fitting. To resolve this problem, a novel over-sampling classification method for imbalanced data sets, called cluster boundary-synthetic minority over-sampling technique (CB-SMOTE), is proposed. A clustering consistency index is introduced to find the boundary minority samples. A k-nearest density is then defined to calculate the number of new synthetic samples and to reject noise samples, and the rule for synthesizing new samples is modified accordingly. The result is a guided over-sampling method whose new samples are more beneficial for classifier learning. Six classification methods were compared on University of California Irvine (UCI) data sets. Experimental results show that the proposed method outperforms the other methods on both minority and majority samples, and that it is more stable across different over-sampling rates.
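For context, the baseline SMOTE interpolation that the abstract takes as its starting point can be sketched roughly as follows. This is a minimal illustration of classic, unguided SMOTE only. It does not implement CB-SMOTE, because the clustering consistency index and the k-nearest density are not defined in this abstract. The function name `smote` and the parameters `n_new`, `k`, and `seed` are hypothetical names chosen for the example.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch: interpolate between a minority sample
    and one of its k nearest minority-class neighbours.

    Illustrative only; names and defaults are assumptions, and this
    is the unguided baseline, not the CB-SMOTE method.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a random base sample and a random one of its k neighbours.
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]

    # New sample lies at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[pick] - X[base])
```

Because every base sample is chosen uniformly and no sample is screened out, a noisy minority point is as likely to seed new samples as any other. That is the unguided behaviour the abstract describes as the motivation for selecting boundary samples and rejecting noise.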