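An imbalanced data processing method based on hybrid CGAN and SMOTEENN
Cite this article: LIU Ning, ZHU Bo, YIN Yan-chao, LI Xiu-chen. An imbalanced data processing method based on hybrid CGAN and SMOTEENN[J]. Control and Decision, 2023, 38(9): 2614-2621.
Authors: LIU Ning  ZHU Bo  YIN Yan-chao  LI Xiu-chen
Affiliation: Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China
Funding: National Natural Science Foundation of China (52065033).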
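Abstract: Conditional generative adversarial networks (CGAN) can learn the distribution of the data. When introduced into imbalanced data processing to oversample the minority class, a CGAN can generate new samples that conform to the original data distribution, and it therefore performs better than traditional resampling methods. However, how well a CGAN learns the data distribution is limited by the sample size: when the minority class is small, the CGAN cannot fully learn its distribution, and the quality of the generated samples is hard to guarantee. To address this problem, this paper proposes an imbalanced data balancing method that combines the CGAN with the synthetic minority over-sampling technique and edited nearest neighbor (SMOTEENN). First, starting from the existing minority class samples, SMOTEENN is used to generate a minority class sample set of a certain size. Then, a CGAN model is trained on this enlarged set, so that it can generate new samples consistent with the distribution of the original minority class. Finally, the CGAN is used to generate new samples that conform to the original minority class distribution, and these are used to construct a balanced dataset. To verify the effectiveness of the proposed method, comparative experiments are carried out on public imbalanced datasets. The experimental results show that, compared with several classical imbalanced data processing methods and methods reported in recent literature, the proposed method has clear advantages on several evaluation metrics for imbalanced data classification.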
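The abstract gives no implementation details, so the following is only a rough, pure-Python sketch of the first step (SMOTEENN pre-augmentation of the minority class), assuming a binary problem. It uses textbook SMOTE followed by Wilson's Edited Nearest Neighbours cleaning; the function names `smote`, `enn`, `smoteenn` and the defaults `k_smote=5`, `k_enn=3` are hypothetical choices, not taken from the paper. The CGAN training and regeneration steps are not shown; the minority samples returned here are what such a CGAN would be trained on.

```python
import math
import random
from collections import Counter


def _knn(point, pool, k):
    """Return the k (vector, label) pairs in `pool` nearest to `point` (Euclidean)."""
    return sorted(pool, key=lambda item: math.dist(point, item[0]))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Textbook SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the base sample itself
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def enn(samples, k=3):
    """Wilson's Edited Nearest Neighbours: drop every sample whose label differs
    from the majority label of its k nearest neighbours (itself excluded)."""
    kept = []
    for i, (x, label) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        votes = Counter(lbl for _, lbl in _knn(x, others, k))
        if votes.most_common(1)[0][0] == label:
            kept.append((x, label))
    return kept


def smoteenn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Binary case: oversample the minority class with SMOTE up to the majority
    size, then clean the combined set with ENN. Returns (vector, label) pairs."""
    minority = [tuple(x) for x, lbl in zip(X, y) if lbl == minority_label]
    n_majority = sum(1 for lbl in y if lbl != minority_label)
    synthetic = smote(minority, n_majority - len(minority), k=k_smote, seed=seed)
    samples = [(tuple(x), lbl) for x, lbl in zip(X, y)]
    samples += [(s, minority_label) for s in synthetic]
    return enn(samples, k=k_enn)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D data: 40 majority samples around (0, 0), 6 minority samples around (3, 3).
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(6)]
    y = [0] * 40 + [1] * 6
    resampled = smoteenn(X, y, minority_label=1)
    # In the described pipeline, these minority samples would be the CGAN's
    # training set (CGAN itself not shown).
    cgan_training_set = [x for x, lbl in resampled if lbl == 1]
    print(Counter(lbl for _, lbl in resampled), len(cgan_training_set))
```

The ENN rule above is Wilson's majority vote. In practice a library implementation such as `imblearn.combine.SMOTEENN` from imbalanced-learn would typically be used; its `EditedNearestNeighbours` defaults to the stricter `kind_sel="all"` rule, so its output may differ from this sketch.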

Keywords: imbalanced data  data balancing  resampling methods  CGAN  SMOTEENN
