
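An over sampling algorithm based on clustering
Cite this article: WANG Huan, ZHOU Zhongmei. An over sampling algorithm based on clustering[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 134-139.
Authors: WANG Huan  ZHOU Zhongmei
Affiliation: School of Computer, Minnan Normal University, Zhangzhou 363000, Fujian, China
Funding: National Natural Science Foundation of China (61170129)
Abstract: To synthesize more meaningful new samples in oversampling, a clustering-based oversampling algorithm, ClusteredSMOTE-Boost, is proposed. Noisy minority-class samples are filtered out, and each remaining minority-class sample serves as a target sample for synthesizing new samples. The entire training set is clustered, and the weight of each target sample and the number of samples synthesized from it are determined by the characteristics of the cluster it falls in. All target samples are then clustered; K nearest neighbors are selected within the target sample's cluster, and one of them is chosen at random and combined with the target sample to synthesize a new sample. New samples are thus kept as similar as possible to the samples in the target sample's cluster, and the boundary complexity introduced by adding samples is reduced. Experimental results show that ClusteredSMOTE-Boost clearly outperforms the three classical algorithms SMOTE-Boost, ADASYN-Boost, and BorderlineSMOTE-Boost on every measure.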

Keywords: over sampling  sample weights  clustering  classification  imbalanced data
Received: 2017-08-24

An over sampling algorithm based on clustering
WANG Huan, ZHOU Zhongmei. An over sampling algorithm based on clustering[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 134-139.
Authors:WANG Huan  ZHOU Zhongmei
Affiliation:School of Computer, Minnan Normal University, Zhangzhou 363000, Fujian, China
Abstract: To generate meaningful new samples in oversampling, a clustering-based algorithm, ClusteredSMOTE-Boost, was proposed. The algorithm filtered out the noisy minority class samples and took the remaining minority class samples as target samples for synthesizing new samples. The whole training set was clustered, and the weight of each target sample and the number of samples synthesized from it were determined according to the characteristics of the cluster containing that target sample. All target samples were then clustered, the K nearest neighbors of each target sample were selected within its cluster, and one of these neighbors was randomly chosen to synthesize a new sample with the target sample. Thus, new samples were similar to the samples in the target sample's cluster, which reduced the boundary complexity caused by the additional samples. The experimental results showed that ClusteredSMOTE-Boost was superior to the three classical algorithms SMOTE-Boost, ADASYN-Boost, and BorderlineSMOTE-Boost on a variety of measures.
Keywords:over sampling  instance weights  classification  cluster  imbalanced data  
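
The within-cluster synthesis step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the clustering method, the noise filter, or the rule that maps cluster characteristics to a per-sample synthesis count, so this sketch assumes the cluster labels (`cluster_ids`) and the counts (`n_new`) are computed beforehand and passed in. The function name `synthesize_within_clusters`, the Euclidean distance, and the SMOTE-style linear interpolation are likewise assumptions made for illustration.

```python
import numpy as np


def synthesize_within_clusters(X_min, cluster_ids, n_new, k=5, seed=0):
    """Create n_new[i] synthetic points for each target sample X_min[i].

    Hypothetical helper: neighbours are searched only among the target
    samples that share the target's cluster, then one neighbour is picked
    at random and combined with the target by linear interpolation.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    synthetic = []
    for i, x in enumerate(X_min):
        # Candidate neighbours: same cluster, excluding the target itself.
        mates = np.flatnonzero(cluster_ids == cluster_ids[i])
        mates = mates[mates != i]
        if mates.size == 0:
            continue  # assumption: a singleton cluster yields no new samples
        dists = np.linalg.norm(X_min[mates] - x, axis=1)
        nearest = mates[np.argsort(dists)[:k]]  # up to k in-cluster neighbours
        for _ in range(int(n_new[i])):
            neighbour = X_min[rng.choice(nearest)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(x + gap * (neighbour - x))
    if not synthetic:
        return np.empty((0, X_min.shape[1]))
    return np.vstack(synthetic)


# Toy usage: two well-separated groups of target (minority) samples.
X_demo = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10]]
labels_demo = [0, 0, 0, 1, 1]   # assumed to come from any clustering step
counts_demo = [2, 2, 2, 1, 1]   # assumed per-sample synthesis counts
new_points = synthesize_within_clusters(X_demo, labels_demo, counts_demo, k=2)
```

Because each new point lies on the segment between a target sample and a neighbour from the same cluster, it cannot land in the gap between clusters, which is the property the abstract credits with limiting added boundary complexity.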
This article is indexed in CNKI and other databases.