
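Denoising and Adaptive Hybrid Sampling Based on Hierarchical Density Clustering
Cite this article: JIANG Xin-Ying (姜新盈), WANG Shu-Fan (王舒梵), YAN Tao (严涛). Denoising and Adaptive Hybrid Sampling Based on Hierarchical Density Clustering[J]. Computer Systems & Applications (计算机系统应用), 2022, 31(10): 206-210.
Authors: JIANG Xin-Ying (姜新盈)  WANG Shu-Fan (王舒梵)  YAN Tao (严涛)
Affiliation: School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China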
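Abstract: To address intra-class imbalance, noise, and the limited coverage of generated samples in imbalanced data, an adaptive denoising hybrid sampling algorithm based on hierarchical density clustering (ADHSBHD) is proposed. First, the HDBSCAN clustering algorithm is used to cluster the minority class and the majority class separately; the intersection of the global outliers and the local outliers is treated as the noise set, and the original dataset is processed after the noise samples are removed. Second, based on the average distance of each cluster of minority-class samples, a sampling method with broader coverage adaptively synthesizes new samples. Finally, some majority-class points that contribute little to classification are deleted to balance the dataset. ADHSBHD is evaluated on 7 real datasets, and the results demonstrate its effectiveness.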

Keywords: imbalanced data  classification  clustering  hybrid sampling
Received: 2022/1/27
Revised: 2022/2/24

Denoising and Adaptive Hybrid Sampling Based on Hierarchical Density Clustering
JIANG Xin-Ying, WANG Shu-Fan, YAN Tao. Denoising and Adaptive Hybrid Sampling Based on Hierarchical Density Clustering[J]. Computer Systems & Applications, 2022, 31(10): 206-210.
Authors:JIANG Xin-Ying  WANG Shu-Fan  YAN Tao
Affiliation:School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
Abstract: Imbalanced data suffer from intra-class imbalance, noise, and limited coverage of generated samples. To address these problems, an adaptive denoising hybrid sampling algorithm based on hierarchical density clustering (ADHSBHD) is proposed. First, the HDBSCAN clustering algorithm is used to cluster the minority class and the majority class separately; the intersection of the global and local outliers is regarded as the noise set, and the original dataset is processed after the noise samples are eliminated. Second, according to the average distance of each cluster of minority-class samples, a sampling method with broader coverage adaptively synthesizes new samples. Finally, some majority-class points that contribute little to classification are deleted to balance the dataset. The ADHSBHD algorithm is evaluated on seven real datasets, and the results demonstrate its effectiveness.
Keywords: imbalanced data  classification  clustering  hybrid sampling
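
The two data-level steps named in the abstract (noise as the intersection of two outlier sets, and cluster-wise adaptive synthesis) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: cluster assignments and outlier indices are taken as given (in the paper they would come from HDBSCAN), the "average distance of each cluster" is read as the mean pairwise Euclidean distance inside the cluster, sparser clusters are assumed to receive proportionally more synthetic samples, and plain linear interpolation stands in for the paper's broader-coverage sampling method. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def noise_set(global_outliers, local_outliers):
    """Indices flagged as outliers by BOTH criteria (set intersection)."""
    return set(global_outliers) & set(local_outliers)


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all point pairs in one cluster."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def allocate_synthetic(clusters, n_new):
    """Split n_new samples across clusters in proportion to their spread.

    Assumption: the weighting rule is illustrative; the abstract does not
    state the formula used by ADHSBHD.
    """
    spreads = [mean_pairwise_distance(c) for c in clusters]
    total = sum(spreads)
    if total == 0:
        weights = [1 / len(clusters)] * len(clusters)
    else:
        weights = [s / total for s in spreads]
    counts = [int(n_new * w) for w in weights]
    # Hand out what flooring left over, largest fractional part first.
    remainder = n_new - sum(counts)
    order = sorted(range(len(clusters)),
                   key=lambda i: n_new * weights[i] - counts[i],
                   reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def interpolate(cluster, k, rng):
    """Create k points on segments between random pairs of one cluster.

    Stand-in for the paper's sampling method (SMOTE-style interpolation).
    """
    new_points = []
    for _ in range(k):
        p, q = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        new_points.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return new_points


# Toy minority class, already split into a dense and a sparse cluster.
minority_clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 14.0), (14.0, 10.0)],
]
rng = random.Random(0)
counts = allocate_synthetic(minority_clusters, 10)
synthetic = [interpolate(c, k, rng) for c, k in zip(minority_clusters, counts)]
noise = noise_set({1, 5, 9}, {5, 9, 12})
```

In this toy example the sparse cluster is four times as spread out as the dense one, so it receives most of the ten synthetic points. The final undersampling step is omitted because the abstract does not say how a point's contribution to classification is measured.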