
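An Imbalanced Data Classification Method Based on Clustering-Ensemble Under-sampling
Cite as: ZHANG Xiao-shan (张枭山), LUO Qiang (罗强). An Imbalanced Data Classification Method Based on Clustering-Ensemble Under-sampling [J]. Computer Science (计算机科学), 2015, 42(Z11): 63-66
Authors: ZHANG Xiao-shan (张枭山), LUO Qiang (罗强)
Affiliations: Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065; Department of Computer, College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing 401520
Abstract: Imbalanced data classification problems are widespread in practice. Most traditional classification algorithms assume that the class distribution of the data set is balanced, so their results are biased toward the majority class and their performance is unsatisfactory. To address this, an improved AdaBoost classification algorithm based on clustering-ensemble under-sampling is proposed. The algorithm first performs clustering ensemble, then draws a certain proportion of majority-class samples from each cluster according to the sample weights and combines them with all minority-class samples to form a balanced data set. Within the AdaBoost framework, misclassified majority-class and minority-class samples receive different weight adjustments, and the few base classifiers with the best classification performance are selectively combined into the ensemble. Experimental results show that the algorithm has certain advantages in handling imbalanced data classification.

Keywords: machine learning; imbalanced data; clustering ensemble; under-sampling; ensemble learning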
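
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the clustering-ensemble procedure nor the exact sampling proportion, so the sketch assumes that cluster ids are already available, that the per-cluster proportion defaults to the overall minority/majority ratio, and that larger sample weights make a majority sample more likely to be drawn. The function name `undersample_by_cluster`, its parameters, and the label encoding are hypothetical.

```python
import random
from collections import defaultdict

MINORITY, MAJORITY = 1, 0  # hypothetical label encoding


def undersample_by_cluster(labels, clusters, weights, ratio=None, seed=0):
    """Return sorted indices of a roughly balanced subset.

    labels[i]:   MINORITY or MAJORITY
    clusters[i]: cluster id from any clustering (ensemble) step, assumed given
    weights[i]:  positive sample weight, e.g. the current AdaBoost weight
    ratio:       fraction of majority samples kept per cluster; defaults to
                 n_minority / n_majority, capped at 1 (an assumption)
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == MINORITY]
    majority_by_cluster = defaultdict(list)
    for i, y in enumerate(labels):
        if y == MAJORITY:
            majority_by_cluster[clusters[i]].append(i)

    n_majority = len(labels) - len(minority)
    if ratio is None:
        ratio = min(1.0, len(minority) / n_majority) if n_majority else 0.0

    chosen = list(minority)  # every minority sample is kept
    for members in majority_by_cluster.values():
        k = min(len(members), round(ratio * len(members)))
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # samples with larger weights tend to receive larger keys.
        keyed = sorted(members,
                       key=lambda i: rng.random() ** (1.0 / weights[i]),
                       reverse=True)
        chosen.extend(keyed[:k])
    return sorted(chosen)
```

With 2 minority and 8 majority samples split evenly over two clusters, the default ratio is 0.25, so one majority sample is drawn from each cluster and the result holds 2 minority and 2 majority indices.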

Unbalanced Data Classification Algorithm Based on Clustering Ensemble Under-sampling
ZHANG Xiao-shan and LUO Qiang. Unbalanced Data Classification Algorithm Based on Clustering Ensemble Under-sampling[J]. Computer Science, 2015, 42(Z11): 63-66
Authors: ZHANG Xiao-shan and LUO Qiang
Affiliations: Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Department of Computer, College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing 401520, China
Abstract: Imbalanced data is widespread in the real world. Most traditional classification algorithms nevertheless assume a balanced class distribution, so their results are biased toward the majority class and their performance is unsatisfactory. This paper proposes an enhanced AdaBoost algorithm based on clustering-ensemble under-sampling. The algorithm first clusters the sample data by clustering ensemble. Then, according to the sample weights, a certain proportion of majority-class samples is randomly selected from each cluster and merged with all minority-class samples to generate a balanced training set. Within the AdaBoost framework, the algorithm applies different weight adjustments to misclassified majority-class and minority-class samples, and selects several of the better-performing base classifiers to form the final ensemble. Experimental results show that the algorithm has a certain advantage in imbalanced data classification.
Keywords: machine learning; imbalanced data; clustering ensemble; under-sampling; ensemble learning
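
The class-dependent weight adjustment mentioned in the abstract can be sketched as follows. The abstract does not state the actual update rule, so this is only one plausible form and not the authors' formula: it takes the standard AdaBoost update and multiplies the exponent for misclassified samples by a class-dependent cost. The function name `asymmetric_update` and the parameters `cost_minority` and `cost_majority` are hypothetical, and the input weights are assumed to sum to 1.

```python
import math


def asymmetric_update(weights, labels, predictions,
                      cost_minority=1.5, cost_majority=1.0, minority=1):
    """One boosting round: return (alpha, new normalized weights).

    The cost values are illustrative assumptions, not taken from the paper.
    """
    # Weighted error of the current base classifier.
    err = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)

    updated = []
    for w, y, p in zip(weights, labels, predictions):
        if y == p:
            factor = math.exp(-alpha)        # correct: shrink the weight
        else:
            cost = cost_minority if y == minority else cost_majority
            factor = math.exp(alpha * cost)  # wrong: grow it by class cost
        updated.append(w * factor)
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

When the base classifier is better than chance (alpha > 0) and `cost_minority` exceeds `cost_majority`, a misclassified minority sample gains more weight than a misclassified majority sample, so later rounds focus more on the minority class.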