
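Research on the SMOTE method based on density
Cite this article: WANG Junhong, DUAN Bingqian. Research on the SMOTE method based on density[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6): 865-872.
Authors: WANG Junhong  DUAN Bingqian
Affiliation: School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
Abstract: Resampling techniques are widely used to address the classification of imbalanced classes. Among them, the SMOTE (Synthetic Minority Oversampling Technique) algorithm proposed by Chawla alleviates data imbalance to a certain extent, but it oversamples minority-class data indiscriminately, which easily causes over-fitting. To address this problem, this paper proposes a new over-sampling method, DS-SMOTE. The DS-SMOTE algorithm identifies sparse samples based on sample density and uses them as the seed samples for sampling; it then follows the idea of SMOTE and generates synthetic samples between each seed sample and its k nearest neighbors. Experimental results show that, compared with other methods of the same kind, DS-SMOTE achieves considerably higher precision and G-mean, which indicates that it has certain advantages in classifying imbalanced data.

Keywords: imbalance  classification  sampling  precision  density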

Research on the SMOTE method based on density
WANG Junhong, DUAN Bingqian. Research on the SMOTE method based on density[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6): 865-872.
Authors: WANG Junhong  DUAN Bingqian
Affiliation: School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Abstract: In recent years, over-sampling has been widely used in the classification of imbalanced classes. The SMOTE (Synthetic Minority Oversampling Technique) algorithm, proposed by Chawla, alleviates data imbalance to a certain extent, but it oversamples minority-class data indiscriminately, which can lead to over-fitting. To address this problem, this paper presents a new over-sampling method, DS-SMOTE, which identifies sparse samples based on their density and uses them as seed samples during sampling. Following the idea of SMOTE, synthetic samples are then generated between each seed sample and its k nearest neighbors. In the experiments, the proposed algorithm showed considerable improvement in precision and G-mean compared with similar algorithms, which indicates that it has certain advantages in classifying imbalanced data.
Keywords: imbalance  classification  sampling  precision  density
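
The two steps described in the abstract (pick sparse minority samples as seeds, then interpolate between a seed and one of its k nearest neighbors) can be illustrated with a minimal sketch. The abstract does not give the density formula or the sparsity threshold, so both are assumptions here: density is taken as the inverse of the mean distance to the k nearest minority neighbors, and a sample is treated as sparse when its density is at or below the mean density. The function name `ds_smote_sketch` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def ds_smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples from sparse seed samples.

    Assumptions (not taken from the paper): density is the inverse of the
    mean distance to the k nearest minority neighbors, and a sample is
    "sparse" when its density is at or below the mean density.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices and distances of the k nearest neighbors of each sample.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    # Assumed density measure and sparsity rule (see docstring).
    density = 1.0 / (nn_dist.mean(axis=1) + 1e-12)
    seeds = np.flatnonzero(density <= density.mean())

    # SMOTE-style interpolation: a random point on the segment between
    # a seed sample and one of its k nearest neighbors.
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        s = rng.choice(seeds)
        nb = rng.choice(nn_idx[s])
        gap = rng.random()
        synthetic[i] = X_min[s] + gap * (X_min[nb] - X_min[s])
    return synthetic, seeds


if __name__ == "__main__":
    # Four tightly clustered points and one isolated point.
    X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5]], dtype=float)
    new_points, seed_idx = ds_smote_sketch(X, n_new=4, k=2)
    print("seed indices:", seed_idx)
    print(new_points)
```

Restricting the seeds to sparse samples is what distinguishes this from plain SMOTE, which would draw seeds uniformly from all minority samples. A different density estimate or threshold would change which samples qualify as seeds.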