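A Classification Method for Imbalanced Datasets Based on the NKSMOTE Algorithm
Cite as:WANG Li, CHEN Hong-mei. A Classification Method for Imbalanced Datasets Based on the NKSMOTE Algorithm[J]. Computer Science (计算机科学), 2018, 45(9): 260-265
Authors:WANG Li  CHEN Hong-mei
Affiliation:School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; Key Laboratory of Cloud Computing and Intelligent Technology (university key laboratory), Southwest Jiaotong University, Chengdu 611756 (listed for both authors)
Funding:Supported by the National Natural Science Foundation of China (61572406)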
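Abstract:When synthesizing samples, SMOTE (Synthetic Minority Over-sampling TEchnique) looks for the K nearest neighbors only within the minority class, which leaves the density of the minority class samples unchanged after oversampling. To address this, a new oversampling algorithm, NKSMOTE (New Kernel Synthetic Minority Over-Sampling Technique), is proposed. The algorithm first uses a nonlinear mapping function to map the samples into a high-dimensional kernel space, then computes, in the kernel space, the K nearest neighbors of each minority class sample among all samples, and finally assigns different oversampling rates to the minority class samples according to how strongly their distribution affects classification performance, thereby changing the imbalance degree of the dataset. The experiments use Decision Tree (DT), error BackPropagation (BP), and Random Forest (RF) as classifiers, and compare several classical oversampling methods with the proposed method in multiple groups of experiments. Experimental results on UCI datasets show that the NKSMOTE algorithm achieves better classification performance.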

Keywords:SMOTE algorithm  Oversampling  Kernel space  Imbalance degree  Classification
Received:2017-08-12
Revised:2017-12-19
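
The neighbor search described in the abstract (map to a kernel space, then find the K nearest neighbors of each minority sample among all samples) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the kernel, so the Gaussian (RBF) kernel, the `gamma` parameter, and the function names `rbf_kernel` and `kernel_knn` are assumptions. The distance uses the standard kernel-trick identity ||φ(a) − φ(b)||² = K(a,a) − 2K(a,b) + K(b,b), so the mapping φ never has to be computed explicitly.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix. The kernel choice is an assumption;
    the abstract only says 'a nonlinear mapping function'."""
    sq = (A**2).sum(1)[:, None] - 2.0 * A @ B.T + (B**2).sum(1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_knn(X, y, minority_label, k=5, gamma=1.0):
    """For each minority sample, return the indices of its k nearest
    neighbors among ALL samples, measured in the kernel feature space."""
    K = rbf_kernel(X, X, gamma)
    diag = np.diag(K)
    # Kernel trick: squared feature-space distance from kernel values only.
    d2 = diag[:, None] - 2.0 * K + diag[None, :]
    minority_idx = np.flatnonzero(y == minority_label)
    d2_min = d2[minority_idx].copy()
    # A sample must not be reported as its own neighbor.
    d2_min[np.arange(len(minority_idx)), minority_idx] = np.inf
    neighbors = np.argsort(d2_min, axis=1)[:, :k]
    return minority_idx, neighbors

# Toy data: two minority points (label 1) next to one majority point (label 0).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]])
y = np.array([1, 1, 0, 0, 0])
minority_idx, neighbors = kernel_knn(X, y, minority_label=1, k=2)
print(minority_idx, neighbors)
```

Because the search runs over all samples, a neighbor list can contain majority-class points (index 4 in the toy data), which is the information plain SMOTE discards by searching only inside the minority class.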

NKSMOTE Algorithm Based Classification Method for Imbalanced Dataset
WANG Li and CHEN Hong-mei. NKSMOTE Algorithm Based Classification Method for Imbalanced Dataset[J]. Computer Science, 2018, 45(9): 260-265
Authors:WANG Li and CHEN Hong-mei
Affiliation:School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China (listed for both authors)
Abstract:When synthesizing samples, SMOTE (Synthetic Minority Over-sampling TEchnique) searches for the K nearest neighbors of each minority sample only within the minority class, so the density of the minority class samples remains unchanged after oversampling. To overcome this shortcoming, this paper proposes an improved algorithm, NKSMOTE (New Kernel Synthetic Minority Over-Sampling Technique). First, a nonlinear mapping function maps the samples to a high-dimensional kernel space, and the K nearest neighbors of each minority class sample are then computed among all samples in that space. Next, different oversampling rates are assigned to different minority samples, according to how strongly the distribution of the minority class samples influences classification performance, thereby changing the imbalance degree of the dataset. In the experiments, several classical oversampling methods are compared with the proposed method, with Decision Tree (DT), error BackPropagation (BP), and Random Forest (RF) as the base classifiers. Experimental results on UCI datasets show that NKSMOTE achieves better classification performance.
Keywords:SMOTE algorithm  Over-sampling  Kernel space  Imbalanced rate  Classification
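
The abstract states that different minority samples receive different oversampling rates, but it does not give the rule. Purely as an illustration, the sketch below splits a budget of synthetic samples in proportion to the number of majority-class points among each minority sample's K nearest neighbors. This is an ADASYN-style weighting swapped in for illustration, not the formula from the paper; the function name `allocate_counts` and its parameters are hypothetical.

```python
import numpy as np

def allocate_counts(majority_neighbors, n_new):
    """Split n_new synthetic samples across minority samples in proportion
    to how many majority-class points sit among each sample's neighbors.
    Hypothetical ADASYN-style rule, NOT the rule used in the paper."""
    w = np.asarray(majority_neighbors, dtype=float)
    if w.sum() == 0:
        # No minority sample is near the majority class: spread evenly.
        w = np.ones_like(w)
    w = w / w.sum()
    counts = np.floor(w * n_new).astype(int)
    # Give the leftover samples to the largest fractional shares.
    remainder = int(n_new - counts.sum())
    order = np.argsort(-(w * n_new - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts

# Three minority samples with 0, 1 and 3 majority points among their neighbors.
counts = allocate_counts([0, 1, 3], n_new=8)
print(counts)
```

Under this rule, minority samples surrounded by their own class receive few or no synthetic samples, while those close to the majority class receive more, so the allocation always sums to the requested `n_new`.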