
Hybrid Algorithm of DBSCAN and Improved SMOTE for Oversampling
Cite this article: WANG Liang, YE Jimin. Hybrid Algorithm of DBSCAN and Improved SMOTE for Oversampling[J]. Computer Engineering and Applications, 2020, 56(18): 111-118.
Authors: WANG Liang, YE Jimin
Affiliation: School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
Funding: Fundamental Research Funds for the Central Universities; National Natural Science Foundation of China
Abstract: Conventional oversampling algorithms such as SMOTE (Synthetic Minority Over-sampling Technique) ignore within-class imbalance, over-extend the classification region of the minority class, and synthesize new samples that are highly similar to one another. To address within-class imbalance and the diversity of synthetic samples together, this paper proposes DB-MCSMOTE (DBSCAN and Midpoint Centroid Synthetic Minority Over-sampling Technique), an oversampling algorithm that combines DBSCAN with an improved SMOTE. The algorithm first clusters the minority class samples with DBSCAN. It then computes the density and sampling weight of each cluster from a proposed cluster density distribution function. Within each cluster, the improved SMOTE algorithm (MCSMOTE) synthesizes new samples on the line segments between minority class samples that lie far apart, which increases the diversity of the synthetic samples and yields a new dataset that is balanced both between and within classes. Experiments on a two-dimensional synthetic dataset and nine UCI datasets show that DB-MCSMOTE effectively improves classifier performance on both the minority class and the dataset as a whole.

Keywords: oversampling; within-class imbalance; minority class; diversity; Synthetic Minority Over-sampling Technique (SMOTE) algorithm; Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
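The abstract describes the pipeline only at a high level: the paper's cluster density distribution function and the exact midpoint/centroid rules of MCSMOTE are not given here. The Python sketch below is therefore a hypothetical illustration under stated assumptions, not the authors' implementation. It clusters the minority class with plain DBSCAN, drops noise points, gives each cluster an assumed sampling weight proportional to its mean pairwise distance divided by its size (so sparser clusters receive more synthetic samples), and creates each new sample by SMOTE-style interpolation x + λ(y − x), λ ∈ [0, 1), between a seed point x and one of its `k_far` farthest points in the same cluster. The names `db_oversample`, `allocate`, `k_far`, and the weighting rule are illustrative assumptions.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region(i):
        # Indices within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_region = region(j)
            if len(j_region) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_region)
    return labels


def mean_pairwise_distance(pts):
    pairs = [(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def allocate(total, weights):
    """Split `total` into integer counts proportional to `weights` (largest remainder)."""
    raw = [total * w for w in weights]
    counts = [math.floor(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1
    return counts


def db_oversample(minority, n_new, eps, min_pts, k_far=3, seed=0):
    """HYPOTHETICAL sketch: DBSCAN clusters + far-pair interpolation (not the paper's exact rules)."""
    rng = random.Random(seed)
    labels = dbscan(minority, eps, min_pts)
    groups = {}
    for p, lab in zip(minority, labels):
        if lab != -1:  # assumption: noise points are never used as seeds
            groups.setdefault(lab, []).append(p)
    clusters = [c for c in groups.values() if len(c) >= 2]
    if not clusters:
        return []  # nothing to interpolate between
    # Assumed weighting: inverse of (size / mean distance), so sparser clusters get more samples.
    inv_density = [mean_pairwise_distance(c) / len(c) for c in clusters]
    total = sum(inv_density)
    if total > 0:
        weights = [v / total for v in inv_density]
    else:
        weights = [1 / len(clusters)] * len(clusters)

    synthetic = []
    for members, count in zip(clusters, allocate(n_new, weights)):
        for _ in range(count):
            xi = rng.randrange(len(members))
            x = members[xi]
            # Partner drawn from the k_far points farthest from x within the same cluster.
            others = [members[j] for j in range(len(members)) if j != xi]
            others.sort(key=lambda p: math.dist(x, p), reverse=True)
            y = rng.choice(others[:k_far])
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, y)))
    return synthetic
```

Because every synthetic point is a convex combination of two members of the same cluster, it stays inside that cluster's convex hull; this is one plausible reading of why clustering first limits how far the minority region is extended. Drawing partners from far rather than near points spreads the new samples across the hull instead of stacking them next to existing ones, which is the diversity concern the abstract raises.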
This article is indexed in Wanfang Data and other databases.