
An Imbalanced Data Classification of Hybrid Sampling Based on Clustering
Cite this article:SHI Ming-hua,WU Guang-chao.An Imbalanced Data Classification of Hybrid Sampling Based on Clustering[J].Computer and Modernization (计算机与现代化),2020,0(5):34-38.
Authors:SHI Ming-hua (史明华)  WU Guang-chao (吴广潮)
Affiliation:School of Mathematics, South China University of Technology, Guangzhou, Guangdong 510641, China (both authors)
Received:2020-05-21

Abstract:Imbalanced classification problems arise widely in real-world applications. Most resampling algorithms focus on balance between classes and pay little attention to imbalance in the data distribution within classes. To address this, a hybrid sampling algorithm based on clustering is proposed. First, the original data set is clustered. Next, the imbalance ratio is calculated for each cluster, and the samples in that cluster are processed according to the size of the ratio. Finally, the balanced data set is used to train a GBDT classifier. Experiments show that, compared with several traditional algorithms, the proposed algorithm achieves a higher F1-value and AUC and better classification performance.

Keywords:imbalanced data  clustering  hybrid sampling  GBDT
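
The abstract outlines the pipeline (cluster, compute a per-cluster imbalance ratio, resample each cluster accordingly) but does not give the clustering method, the thresholds, or the exact processing rule. The sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: cluster assignments are taken as given, the imbalance ratio is assumed to be the majority count divided by the minority count, and the hypothetical rule is to randomly undersample the majority class and duplicate minority samples until both reach the cluster's mean class size. The names `imbalance_ratios`, `hybrid_resample` and `ir_threshold` are made up for this example.

```python
import random
from collections import defaultdict


def imbalance_ratios(y, cluster_ids, minority=1):
    """Return {cluster: (n_majority, n_minority)} for every cluster."""
    counts = defaultdict(lambda: [0, 0])
    for label, c in zip(y, cluster_ids):
        counts[c][1 if label == minority else 0] += 1
    return {c: tuple(v) for c, v in counts.items()}


def hybrid_resample(X, y, cluster_ids, minority=1, ir_threshold=1.5, seed=0):
    """Balance each cluster whose majority/minority ratio exceeds ir_threshold.

    Assumed rule (not taken from the paper): in such a cluster, undersample
    the majority class and oversample the minority class by duplication so
    that both end up at the mean of the two class sizes. Clusters that
    contain a single class, or whose ratio is at most ir_threshold
    (assumed >= 1), are kept unchanged.
    """
    rng = random.Random(seed)
    # cluster -> (indices of majority samples, indices of minority samples)
    groups = defaultdict(lambda: ([], []))
    for i, (label, c) in enumerate(zip(y, cluster_ids)):
        groups[c][1 if label == minority else 0].append(i)

    keep = []
    for c in sorted(groups):
        maj, mino = groups[c]
        if not maj or not mino or len(maj) / len(mino) <= ir_threshold:
            keep += maj + mino  # already balanced enough, or single-class
            continue
        target = (len(maj) + len(mino)) // 2
        keep += rng.sample(maj, target)                          # undersample
        keep += mino + rng.choices(mino, k=target - len(mino))   # oversample
    return [X[i] for i in keep], [y[i] for i in keep]
```

In a fuller pipeline, `cluster_ids` could come from any clustering step (for example k-means), and the returned samples could be passed to a gradient boosting classifier such as scikit-learn's `GradientBoostingClassifier`; both choices are assumptions here, since the abstract names only "clustering" and "GBDT".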
This article is indexed in Wanfang Data (万方数据) and other databases.