Spectral clustering-fused adaptive synthetic oversampling approach for imbalanced data processing

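Cite this article:	LIU Jinping, ZHOU Jiaming, HE Junbin, TANG Zhaohui, XU Pengfei, ZHANG Guoyong. Spectral clustering-fused adaptive synthetic oversampling approach for imbalanced data processing[J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2020, 15(4): 732-739.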

Authors:	刘金平 (LIU Jinping), 周嘉铭 (ZHOU Jiaming), 贺俊宾 (HE Junbin), 唐朝晖 (TANG Zhaohui), 徐鹏飞 (XU Pengfei), 张国勇 (ZHANG Guoyong)

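Affiliations:	1. Hu’nan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hu’nan Normal University, Changsha, Hu’nan 410081, China; 2. Hu’nan Institute of Metrology and Test, Changsha, Hu’nan 410014, China; 3. School of Automation, Central South University, Changsha, Hu’nan 410082, China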

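Abstract:	Classification is a major research topic in pattern recognition. Most classical classifiers implicitly assume that the dataset is balanced, whereas real-world datasets are often class-imbalanced: the number of samples in the normal (majority) class differs greatly from the number in the abnormal (minority) class. Left unprocessed, such data tends to make a classifier ignore the minority class and favor the majority class, which degrades the classification results. To address this imbalanced distribution, this paper proposes a synthetic sampling algorithm fused with spectral clustering. Spectral clustering is first used to analyze the distribution of the minority-class samples in the imbalanced dataset; the minority class is then oversampled according to that distribution information, yielding a relatively balanced sample set for training the classification model. Extensive experiments on multiple imbalanced datasets show that the proposed method effectively addresses the data imbalance problem and improves the classifier's accuracy on minority-class samples.
Keywords:	adaptive synthetic sampling approach; imbalanced dataset; spectral clustering; oversampling; pattern classification; data distribution; biased classifier; data preprocessing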
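
The abstract's point that a classifier can "ignore the minority class" is easy to see with a small worked example. The sketch below uses made-up labels (95 majority, 5 minority) and a hypothetical helper, `accuracy_and_minority_recall`, neither of which comes from the paper; it only illustrates why overall accuracy can look good while every minority sample is misclassified.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority=1):
    """Return (overall accuracy, recall on the minority class)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    minority_total = sum(t == minority for t in y_true)
    minority_hit = sum(t == minority and p == minority for t, p in pairs)
    return correct / len(y_true), minority_hit / minority_total


# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc, rec = accuracy_and_minority_recall(y_true, y_pred)
print(acc, rec)  # 0.95 0.0
```

Accuracy is 95% even though no minority sample is recognized, which is why minority-class accuracy is the quantity the abstract reports as improved.
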
Spectral clustering-fused adaptive synthetic oversampling approach for imbalanced data processing

LIU Jinping, ZHOU Jiaming, HE Junbin, TANG Zhaohui, XU Pengfei, ZHANG Guoyong. Spectral clustering-fused adaptive synthetic oversampling approach for imbalanced data processing[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 732-739.

Authors:	LIU Jinping, ZHOU Jiaming, HE Junbin, TANG Zhaohui, XU Pengfei, ZHANG Guoyong

Affiliations:	1. Hu’nan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hu’nan Normal University, Changsha 410081, China; 2. Hu’nan Institute of Metrology and Test, Changsha 410014, China; 3. School of Automation, Central South University, Changsha 410082, China

Abstract:	Classification is a research hotspot in the field of machine learning. Most classic classifiers assume that the dataset is balanced, whereas real-world datasets often suffer from class imbalance: the number of samples in the normal/majority class differs greatly from the number in the anomalous/minority class. If the data are not processed, the classifier tends to ignore the minority class and be biased towards the majority class, which degrades the classification results. Focusing on the problem of data imbalance, this paper proposes a spectral clustering-fused comprehensive sampling algorithm (SCF-ADASYN). First, spectral clustering is employed to analyze the distribution of the minority-class samples in the imbalanced dataset; the minority class is then oversampled based on this distribution information to obtain a relatively balanced dataset for training the classification model. A large number of experiments have been carried out on multiple imbalanced datasets. The results show that SCF-ADASYN effectively alleviates the imbalance in the dataset and that the classification accuracy of the tested classifiers on the imbalanced datasets improves significantly.

Keywords:	adaptive synthetic sampling approach (ADASYN); imbalanced dataset; spectral clustering; oversampling; pattern classification; data distribution; biased classifier; data pre-processing
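
The abstract describes two steps: cluster the minority class with spectral clustering, then oversample it using the resulting distribution information. It gives no further detail, so the following is only a minimal NumPy sketch of one way such a pipeline could look, not the authors' SCF-ADASYN. The function names (`spectral_labels`, `cluster_adasyn`), the RBF affinity, the ADASYN-style weighting by majority neighbours, and the rule of interpolating only within a cluster are all assumptions made for illustration.

```python
import numpy as np


def spectral_labels(X, n_clusters, gamma=1.0, n_iter=50, seed=0):
    """Cluster rows of X with a basic normalized spectral clustering."""
    rng = np.random.default_rng(seed)
    # RBF affinity between all pairs of samples (assumed kernel choice).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)
    # D^{-1/2} W D^{-1/2}: its top eigenvectors match the bottom
    # eigenvectors of the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    E = vecs[:, -n_clusters:]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    # Plain Lloyd's k-means on the spectral embedding.
    centers = E[rng.choice(len(E), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels


def cluster_adasyn(X_min, X_maj, n_clusters=2, k=5, gamma=1.0, seed=0):
    """Oversample X_min up to the size of X_maj (illustrative assumption)."""
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    rng = np.random.default_rng(seed)
    labels = spectral_labels(X_min, n_clusters, gamma=gamma, seed=seed)
    n_new = max(len(X_maj) - len(X_min), 0)
    # ADASYN-style difficulty: share of majority points among the k nearest
    # neighbours of each minority sample, searched over the whole dataset.
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.concatenate([np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)])
    d2 = ((X_min[:, None, :] - X_all[None, :, :]) ** 2).sum(-1)
    idx = np.arange(len(X_min))
    d2[idx, idx] = np.inf  # a sample is not its own neighbour
    r = is_maj[np.argsort(d2, axis=1)[:, :k]].mean(axis=1)
    # Fall back to uniform weights if no minority sample has majority neighbours.
    p = r / r.sum() if r.sum() > 0 else np.full(len(X_min), 1.0 / len(X_min))
    seeds = rng.choice(len(X_min), size=n_new, p=p)
    synth = np.empty((n_new, X_min.shape[1]))
    for j, i in enumerate(seeds):
        # Interpolate towards a random member of the same cluster
        # (may pick i itself, which yields a duplicate of X_min[i]).
        m = rng.choice(np.flatnonzero(labels == labels[i]))
        synth[j] = X_min[i] + rng.random() * (X_min[m] - X_min[i])
    return synth, labels


# Toy data (hypothetical): one majority blob and two small minority blobs.
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(60, 2))
X_min = np.vstack([rng.normal(2.5, 0.5, size=(8, 2)),
                   rng.normal(-2.5, 0.5, size=(7, 2))])
synth, labels = cluster_adasyn(X_min, X_maj, n_clusters=2)
print(synth.shape)  # (45, 2): 60 majority minus 15 minority samples
```

Restricting interpolation to members of the same cluster is the design choice that the clustering step motivates here: a synthetic point is never placed on a line joining two separate minority sub-groups, where it could land inside majority territory.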
