A Density-Based Classification Algorithm for Imbalanced Data
Authors: DU Hong-le  ZHANG Yan
Affiliation: 1. School of Mathematics and Computer Application, Shangluo University
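Abstract: To address the shift of the separating hyperplane and the low recognition rate of the minority class on imbalanced data, a classification algorithm for imbalanced data based on sample density is proposed. The algorithm first computes the sample density and the class sample density, and determines the number of clusters from the relationship between the class sample densities. It then clusters the majority-class samples with the K-means algorithm and uses the resulting cluster centers as a sample set that replaces the original majority-class sample set. Finally, it trains on the newly constructed training set to obtain the final decision function. Experimental results show that the algorithm improves the classification performance of SVM on imbalanced data, especially for the minority class.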

Keywords: support vector machine    imbalanced dataset    sample density    undersampling    K-nearest neighbor

A Classification Algorithm for Imbalanced Dataset of Sample Density
DU Hong-le, ZHANG Yan. A Classification Algorithm for Imbalanced Dataset of Sample Density[J]. Journal of Xihua University: Natural Science Edition, 2015, 34(5): 16-23, 74.
Authors: DU Hong-le  ZHANG Yan
Abstract: To mitigate classifier overfitting and improve classification performance, a new algorithm based on sample density is proposed for imbalanced data classification. First, it computes the density of each sample and of each class. It then determines the number of clusters from the relationship between the class densities and clusters the majority-class samples into that many clusters using the K-means algorithm. The cluster centers are treated as new samples, and a new training set is constructed from these centers together with the minority-class samples. Training on this new set yields the decision function. The method mitigates the effect of class imbalance and improves the classification performance of SVM. Experiments on an artificial dataset and six UCI datasets show that the algorithm is effective for imbalanced data, especially for minority-class samples.
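
The undersampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the density formula or the rule that maps class densities to a cluster count, so the cluster count `k` is simply passed in as a parameter. The function names (`kmeans_centers`, `undersample_majority`), the label convention (-1 for majority, +1 for minority), and the plain Lloyd-style K-means are all choices made for this sketch. SVM training is left out; the returned samples and labels could be handed to any SVM implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_centers(points, k, iterations=50, seed=0):
    """Plain Lloyd-style K-means; returns k cluster centers as tuples."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct positions drawn from the data.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean_point(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers


def undersample_majority(majority, minority, k):
    """Replace the majority class by k cluster centers (assumed label -1)
    and keep every minority sample unchanged (assumed label +1)."""
    centers = kmeans_centers(majority, k)
    samples = centers + [tuple(p) for p in minority]
    labels = [-1] * len(centers) + [1] * len(minority)
    return samples, labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 2-D data: 200 majority points, 10 minority points.
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]
    # Assumption for the demo only: one cluster per minority sample.
    samples, labels = undersample_majority(majority, minority, k=len(minority))
    print(len(samples), labels.count(-1), labels.count(1))  # prints: 20 10 10
```

Setting `k` equal to the minority-class size, as in the demo, yields a balanced training set; the paper instead derives the cluster count from the class sample densities.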