A Classification Method for Imbalanced Data Based on Clustering-Ensemble Under-sampling

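Cite this article:	张枭山, 罗强. A Classification Method for Imbalanced Data Based on Clustering-Ensemble Under-sampling [J]. 计算机科学 (Computer Science), 2015, 42(Z11): 63-66

Authors:	张枭山 (ZHANG Xiao-shan), 罗强 (LUO Qiang)

Affiliations:	Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065; Department of Computer, College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing 401520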

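Abstract:	Imbalanced-data classification problems are widespread in practice, yet most traditional classification algorithms assume that the class distribution of the data set is balanced, so their results are biased toward the majority class and their performance is unsatisfactory. To address this, an improved AdaBoost classification algorithm based on clustering-ensemble under-sampling is proposed. The algorithm first performs a clustering ensemble and then, according to the sample weights, draws a certain proportion of the majority-class samples from each cluster and combines them with all of the minority-class samples to form a balanced data set. Within the AdaBoost framework, misclassifications of majority-class and minority-class samples receive different weight adjustments, and the few base classifiers with the best classification performance are selectively ensembled. Experimental results show that the algorithm has certain advantages in classifying imbalanced data.
Keywords:	machine learning; imbalanced data; clustering ensemble; under-sampling; ensemble learning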
Unbalanced Data Classification Algorithm Based on Clustering Ensemble Under-sampling

ZHANG Xiao-shan and LUO Qiang. Unbalanced Data Classification Algorithm Based on Clustering Ensemble Under-sampling[J]. Computer Science, 2015, 42(Z11): 63-66

Authors:	ZHANG Xiao-shan and LUO Qiang

Affiliation:	Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Department of Computer, College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing 401520, China

Abstract:	Imbalanced data is widespread in the real world, yet most traditional classification algorithms assume a balanced class distribution. As a result, their predictions are biased toward the majority class and their performance is unsatisfactory. This paper proposes an enhanced AdaBoost algorithm based on clustering-ensemble under-sampling. The algorithm first clusters the sample data with a clustering ensemble. Then, according to the sample weights, a certain proportion of the majority-class samples is randomly selected from each cluster and merged with all of the minority-class samples to generate a balanced training set. Within the AdaBoost framework, the algorithm applies different weight adjustments to misclassified majority-class and minority-class samples, and selects the several base classifiers with the best performance to form the final ensemble. Experimental results show that the algorithm has a certain advantage in classifying imbalanced data.

Keywords:	machine learning; imbalanced data; clustering ensemble; under-sampling; ensemble learning
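
The under-sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering ensemble, the sampling proportion, or how the weights enter the draw. Here the cluster ids are taken as given inputs, and the function name `cluster_undersample`, the 0/1 label encoding, the `ratio` parameter, and the use of weighted sampling without replacement are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def cluster_undersample(labels, clusters, weights, ratio, seed=0):
    """Return sorted indices of a more balanced training subset.

    labels   -- class per sample: 1 = minority, 0 = majority (assumed encoding)
    clusters -- cluster id per sample (assumed output of a clustering ensemble)
    weights  -- strictly positive sample weights, e.g. current AdaBoost weights
    ratio    -- fraction of majority samples to keep from each cluster (0 < ratio <= 1)
    """
    rng = random.Random(seed)

    # All minority-class samples are always kept.
    minority = [i for i, y in enumerate(labels) if y == 1]

    # Group majority-class sample indices by cluster.
    by_cluster = defaultdict(list)
    for i, y in enumerate(labels):
        if y == 0:
            by_cluster[clusters[i]].append(i)

    kept = []
    for members in by_cluster.values():
        # Keep roughly ratio * cluster size samples, but at least one.
        k = min(len(members), max(1, round(ratio * len(members))))
        # Weighted sampling without replacement: each sample gets the key
        # u ** (1 / w) with u uniform in [0, 1); the k largest keys win,
        # so samples with larger weights are more likely to be selected.
        ranked = sorted(
            members,
            key=lambda i: rng.random() ** (1.0 / weights[i]),
            reverse=True,
        )
        kept.extend(ranked[:k])

    return sorted(minority + kept)


# Hypothetical toy data: 2 minority samples, 8 majority samples in 2 clusters.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
clusters = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
weights = [0.1] * 10
subset = cluster_undersample(labels, clusters, weights, ratio=0.25)
print(subset)  # both minority indices plus one majority index per cluster
```

Drawing from every cluster, rather than from the majority class as a whole, keeps each region of the majority class represented in the reduced set, which appears to be the motivation for clustering before under-sampling.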
