
Combined Classifier Algorithm for Imbalanced Datasets
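Cite this article: WU Guang-chao, CHEN Qi-gang. Combined classifier algorithm for imbalanced datasets[J]. Computer Engineering and Design, 2007, 28(23): 5687-5689, 5761
Authors: WU Guang-chao (吴广潮), CHEN Qi-gang (陈奇刚)
Affiliations: School of Mathematical Sciences, South China University of Technology, Guangzhou, Guangdong 510640; School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640; School of Mathematical Sciences, South China University of Technology, Guangzhou, Guangdong 510640
Funding: National Natural Science Foundation of China; Natural Science Foundation of Guangdong Province; Natural Science Foundation of South China University of Technology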
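Abstract: To improve classification performance on the minority class, a combined classifier algorithm based on data preprocessing is studied. The dataset is first preprocessed with Tomek links. The majority-class samples in the resulting dataset are then split into several subsets according to the imbalance ratio, and each subset is merged with the minority-class samples to form a new subset. A least squares support vector machine (LS-SVM) is trained on each new subset, the trained sub-classifiers are combined into one classification system, and the class of a new test sample is decided by a vote of this system. Experimental results show that the algorithm outperforms plain LS-SVM, the over-sampling method, and the under-sampling method in classification performance on both the majority and the minority class.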
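The preprocessing and splitting steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code. It uses Euclidean distance and a brute-force distance matrix. It removes both members of every Tomek link, as the English abstract states. As a hypothetical choice, it sets the number of disjoint majority subsets to the imbalance ratio rounded to the nearest integer, because the abstract does not give the exact rule. The function names `tomek_link_mask` and `build_training_subsets` are illustrative.

```python
import numpy as np


def tomek_link_mask(X, y):
    """Return a boolean mask that is False for every point belonging to a Tomek link."""
    # Brute-force pairwise squared Euclidean distances (O(n^2) memory).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = d.argmin(axis=1)                # nearest neighbour of each point
    idx = np.arange(len(X))
    # i and nn[i] form a Tomek link when they are mutual nearest
    # neighbours with different labels; both members are flagged.
    in_link = (nn[nn] == idx) & (y[nn] != y)
    return ~in_link


def build_training_subsets(X, y, minority_label, seed=None):
    """Remove Tomek links, then pair disjoint majority subsets with all minority points."""
    keep = tomek_link_mask(X, y)
    X, y = X[keep], y[keep]
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    # Hypothetical rule: number of subsets = imbalance ratio, rounded, at least 1.
    k = max(1, int(round(len(maj_idx) / len(min_idx))))
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(maj_idx), k)
    subsets = []
    for part in parts:
        rows = np.concatenate([part, min_idx])
        subsets.append((X[rows], y[rows]))
    return subsets
```

Each returned `(X_sub, y_sub)` pair is roughly balanced and would serve as the training set for one sub-classifier.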

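Keywords: imbalanced datasets; least squares support vector machine; combined classifier; data preprocessing; imbalance ratio
Article ID: 1000-7024(2007)23-5687-03
Received: 2007-02-25
Revised: 2007-02-25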

Combined classifier algorithm for imbalanced datasets
WU Guang-chao,CHEN Qi-gang. Combined classifier algorithm for imbalanced datasets[J]. Computer Engineering and Design, 2007, 28(23): 5687-5689,5761
Authors: WU Guang-chao, CHEN Qi-gang
Abstract: To improve classification performance on the minority class, a combined classifier algorithm based on data preprocessing is presented. First, the Tomek links method is applied to preprocess the dataset: all data points that belong to Tomek links are removed to form a new dataset. Then the majority-class data points in the new dataset are split into several disjoint subsets according to the imbalance ratio, and each subset is combined with the minority class to form a new training dataset. Finally, a least squares support vector machine (LS-SVM) is trained on each training dataset, and all of the LS-SVM classifiers are combined into a classification system. The label of a new test point is determined by voting. Experimental results show that the proposed algorithm outperforms LS-SVM, the synthetic minority over-sampling technique (SMOTE), and under-sampling (US) in classification performance on both the majority and the minority class.
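The training and voting stage can be sketched as below. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions the abstract does not state: a Gaussian (RBF) kernel, labels in {-1, +1} with the minority class as +1, hand-picked hyperparameters `gamma` and `sigma`, and ties in the vote resolved toward the minority class. The names `LSSVM`, `rbf_kernel`, and `vote` are hypothetical. `fit` solves the standard LS-SVM classifier dual, which replaces the SVM quadratic program with one linear system in the bias `b` and the multipliers `alpha`.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


class LSSVM:
    """Minimal LS-SVM classifier; labels must be -1 or +1."""

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma = gamma    # regularisation constant (assumed value)
        self.sigma = sigma    # RBF kernel width (assumed value)

    def fit(self, X, y):
        n = len(y)
        omega = np.outer(y, y) * rbf_kernel(X, X, self.sigma)
        # Assemble [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1].
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = y
        A[1:, 0] = y
        A[1:, 1:] = omega + np.eye(n) / self.gamma
        rhs = np.concatenate(([0.0], np.ones(n)))
        sol = np.linalg.solve(A, rhs)
        self.b, self.alpha = sol[0], sol[1:]
        self.X, self.y = X, y
        return self

    def decision_function(self, Xt):
        # f(x) = sum_i alpha_i * y_i * K(x, x_i) + b
        return rbf_kernel(Xt, self.X, self.sigma) @ (self.alpha * self.y) + self.b

    def predict(self, Xt):
        return np.where(self.decision_function(Xt) >= 0, 1, -1)


def vote(classifiers, Xt):
    """Majority vote over sub-classifiers; ties go to the minority class (+1), an assumption."""
    total = np.sum([clf.predict(Xt) for clf in classifiers], axis=0)
    return np.where(total >= 0, 1, -1)
```

In the pipeline the abstract describes, one `LSSVM` would be fitted on each training subset (majority subset plus all minority points), and `vote` would then assign the label of each new test point.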
Keywords: imbalanced datasets; least squares support vector machine; combined classifier; data preprocessing; imbalance ratio
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.