
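k-nearest neighbor classification algorithm for class-imbalanced problems
Cite as: GUO Huaping, ZHOU Jun, WU Chang'an, FAN Ming. k-nearest neighbor classification algorithm for class-imbalanced problems[J]. Journal of Computer Applications, 2018, 38(4): 955-959.
Authors: GUO Huaping  ZHOU Jun  WU Chang'an  FAN Ming
Affiliations: 1. School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan 464000; 2. School of Information Engineering, Zhengzhou University, Zhengzhou 450000
Abstract: Because the k-nearest neighbor (kNN) method does not handle class-imbalanced problems well, a new kNN classification algorithm for class-imbalanced problems is proposed. Unlike traditional kNN, in the learning stage the algorithm first uses a partitioning algorithm (such as K-Means) to divide the majority-class data set into several clusters, then merges each cluster with the minority-class data set to form a new training set for one kNN model; that is, the algorithm builds a classifier library containing multiple kNN models. In the prediction stage, a partitioning algorithm (such as K-Means) is used to select one model from the classifier library to predict the class of a sample. In this way, each kNN model can both discover local characteristics of the data and take full account of the effect of class imbalance on classifier performance. The algorithm also improves the prediction efficiency of kNN. To further improve its performance, the Synthetic Minority Over-sampling TEchnique (SMOTE) is applied to the algorithm. Experimental results on KEEL data sets show that, even when the majority-class set is partitioned by a random partitioning strategy, the proposed algorithm improves the generalization performance of kNN on the evaluation measures recall, g-mean, f-measure and AUC; in addition, oversampling further improves the algorithm's performance on class-imbalanced problems, and the algorithm clearly outperforms other advanced class-imbalance methods.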

Keywords: class-imbalance techniques  k-nearest neighbor  partitioning  oversampling
Received: 2017-09-08
Revised: 2017-10-30
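
The learning and prediction stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `PartitionedKNN`, the helpers `kmeans` and `knn_predict`, the parameter defaults, and above all the rule "pick the model whose cluster center is nearest to the sample" are assumptions, since the abstract only states that a partitioning method is used to select a model. SMOTE is omitted.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means on X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (unchanged if empty).
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    # Final assignment against the final centers.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[counts.argmax()]


class PartitionedKNN:
    """Hypothetical sketch: one kNN training set per majority-class cluster."""

    def __init__(self, n_clusters=3, k=3, seed=0):
        self.n_clusters = n_clusters
        self.k = k
        self.seed = seed

    def fit(self, X, y, minority_label=1):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        is_min = y == minority_label
        X_min, y_min = X[is_min], y[is_min]
        X_maj, y_maj = X[~is_min], y[~is_min]
        # Learning stage: partition the majority class only.
        self.centers_, labels = kmeans(X_maj, self.n_clusters, seed=self.seed)
        # Each "model" is a cluster of majority points merged with ALL
        # minority points (kNN is lazy, so storing the data is the training).
        self.models_ = []
        for c in range(self.n_clusters):
            in_c = labels == c
            self.models_.append((np.vstack([X_maj[in_c], X_min]),
                                 np.concatenate([y_maj[in_c], y_min])))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        out = []
        for x in X:
            # Assumed selection rule: model of the nearest cluster center.
            c = np.linalg.norm(self.centers_ - x, axis=1).argmin()
            X_c, y_c = self.models_[c]
            out.append(knn_predict(X_c, y_c, x, self.k))
        return np.array(out)
```

Because each query is compared only against one cluster plus the minority set rather than the whole training set, a sketch like this also suggests where the efficiency gain mentioned in the abstract could come from.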

k-nearest neighbor classification method for class-imbalanced problem
GUO Huaping, ZHOU Jun, WU Chang'an, FAN Ming. k-nearest neighbor classification method for class-imbalanced problem[J]. Journal of Computer Applications, 2018, 38(4): 955-959.
Authors:GUO Huaping  ZHOU Jun  WU Chang'an  FAN Ming
Affiliation:1. School of Computer and Information Technology, Xinyang Normal University, Xinyang Henan 464000, China;2. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450000, China
Abstract: To improve the performance of the k-Nearest Neighbor (kNN) model on class-imbalanced data, a new kNN classification algorithm was proposed. Unlike traditional kNN, in the learning process the majority set was partitioned into several clusters by a partitioning method (such as K-Means), and each cluster was then merged with the minority set to form a new training set for one kNN model, so that a classifier library consisting of several kNN models was constructed. For prediction, a partitioning method (such as K-Means) was used to select a model from the classifier library to predict the class of a sample. In this way, the kNN model can discover local characteristics of the data while also taking full account of the effect of data imbalance on classifier performance. The prediction efficiency of kNN was improved as well. To further enhance the proposed algorithm, the Synthetic Minority Over-sampling TEchnique (SMOTE) was applied to it. Experimental results on KEEL data sets show that, even when the majority set is partitioned by a random partition strategy, the proposed algorithm improves the generalization performance of kNN on the evaluation measures recall, g-mean, f-measure and Area Under ROC Curve (AUC); with oversampling, its performance on class-imbalanced problems improves further and is clearly better than that of other state-of-the-art methods.
Keywords: class-imbalanced problem  k-Nearest Neighbor (kNN)  partitioning  oversampling
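
For readers unfamiliar with the evaluation measures named in the abstract, the sketch below computes recall, g-mean and f-measure from confusion-matrix counts, treating the minority class as positive. It uses the commonly cited definitions (g-mean as the geometric mean of sensitivity and specificity, f-measure with equal weight on precision and recall); the abstract does not spell out the paper's exact formulas, so these are assumptions, as is the function name `imbalance_metrics`.

```python
import math


def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(tp, fn, fp, tn):
    """Recall, g-mean and f-measure, with the minority class as positive."""
    recall = _ratio(tp, tp + fn)          # sensitivity on the minority class
    specificity = _ratio(tn, tn + fp)     # accuracy on the majority class
    precision = _ratio(tp, tp + fp)
    g_mean = math.sqrt(recall * specificity)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "g_mean": g_mean, "f_measure": f_measure}
```

A classifier that labels everything as the majority class gets zero recall and zero g-mean even when its plain accuracy is high, which is why such measures are preferred on imbalanced data.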
