
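Imbalanced Data Classification Based on Active Learning SMOTE
Cite this article: Zhang Yong, Li Zhuoran, Liu Xiaodan. Imbalanced data classification based on active learning SMOTE [J]. Computer Applications and Software (计算机应用与软件), 2012(3): 91-93, 162.
Authors: Zhang Yong, Li Zhuoran, Liu Xiaodan
Affiliation: College of Computer and Information Technology, Liaoning Normal University
Funding: National Natural Science Foundation of China (10771092); Doctoral Start-up Fund of the Liaoning Provincial Department of Science and Technology (20081079); Dalian Science and Technology Fund (2010J21DW019)
Abstract: The Synthetic Minority Over-sampling Technique (SMOTE) is a typical over-sampling method for data preprocessing. It can effectively balance imbalanced data, but it also introduces noise and other problems that reduce classification accuracy. To address this, an imbalanced data classification method based on active learning SMOTE, called ALSMOTE, is proposed; it draws on the classification performance of the active learning support vector machine. Because the active learning support vector machine uses a distance-based strategy that actively selects the best samples, it can pick out the valuable majority class samples in imbalanced data and discard those of little value, which improves computational efficiency and mitigates the problems introduced by SMOTE. First, SMOTE is applied to balance a small subset of the samples, yielding an initial classifier; an active learning strategy is then used to refine the classifier's accuracy. Experimental results show that the method effectively improves classification accuracy on imbalanced data.

Keywords: active learning; imbalanced data set; Synthetic Minority Over-sampling Technique (SMOTE); support vector machine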

ACTIVE LEARNING SMOTE BASED IMBALANCED DATA CLASSIFICATION
Zhang Yong, Li Zhuoran, Liu Xiaodan. ACTIVE LEARNING SMOTE BASED IMBALANCED DATA CLASSIFICATION [J]. Computer Applications and Software, 2012(3): 91-93, 162.
Authors: Zhang Yong, Li Zhuoran, Liu Xiaodan
Affiliation: College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, Liaoning, China
Abstract: Synthetic Minority Over-sampling Technique (SMOTE) is a typical over-sampling data preprocessing method that can effectively balance imbalanced data. However, it introduces noise and other problems, which degrade classification accuracy. To solve this problem, an imbalanced data classification approach called ALSMOTE, based on active learning SMOTE, is proposed, drawing on the classification performance of the active learning SVM. Because the active learning SVM uses a distance-based strategy to actively select the most informative samples, it can pick out the valuable majority class samples from imbalanced data and discard the less valuable ones, which improves computational efficiency and mitigates the problems introduced by SMOTE. First, SMOTE is used to balance a small portion of the samples to obtain an initial classifier; active learning strategies are then applied to adjust the classifier's accuracy. Experimental results show that the proposed method effectively improves classification accuracy on imbalanced data.
Keywords: Active learning; Imbalanced data set; SMOTE; SVM
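
The abstract names two building blocks without giving their details: SMOTE interpolation and a distance-based choice of majority class samples. The sketch below illustrates both ideas in plain Python and is not the authors' ALSMOTE implementation. The function names `smote` and `closest_to_boundary`, the parameter `k`, the toy data, and the linear boundary given by `w` and `b` (standing in for a trained SVM) are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two points (tuples of floats)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b_i + gap * (t_i - b_i) for b_i, t_i in zip(base, target))
        )
    return synthetic


def closest_to_boundary(pool, w, b, n_select):
    """Return the n_select pool points with the smallest distance
    |w.x + b| / ||w|| to a linear decision boundary (hypothetical stand-in
    for the decision function of a trained SVM)."""
    norm = math.sqrt(sum(w_i * w_i for w_i in w))

    def distance(x):
        return abs(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b) / norm

    return sorted(pool, key=distance)[:n_select]


# Toy data, invented for illustration only.
minority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (1.1, 1.1)]
majority = [(3.0, 3.0), (2.0, 2.1), (4.0, 3.5), (1.6, 1.5), (5.0, 5.0)]

new_points = smote(minority, n_new=4)
# Assumed boundary x + y - 3 = 0; pick the two majority points nearest to it.
queried = closest_to_boundary(majority, w=(1.0, 1.0), b=-3.0, n_select=2)
```

Each synthetic point lies on the segment between two existing minority samples, which is also how SMOTE can place points in noisy regions. Keeping only the majority samples nearest the boundary is one plausible reading of the abstract's "distance-based" selection; the paper itself should be consulted for the exact criterion.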
This article is indexed in CNKI and other databases.