
New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN
Cite this article: WANG Ziqi, HE Jinwen, JIANG Liangxiao. New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN[J]. Computer Engineering and Applications, 2019, 55(22): 40-45. DOI: 10.3778/j.issn.1002-8331.1809-0275
Authors: WANG Ziqi  HE Jinwen  JIANG Liangxiao
Affiliation: School of Computer Science, China University of Geosciences, Wuhan 430074, China
Funding: Key Program of the Joint Funds of the National Natural Science Foundation of China
Abstract: As one of the top ten algorithms in data mining, the K-Nearest-Neighbor (KNN) algorithm is widely used because it is non-parametric, simple, and effective, and requires no training time. However, when KNN faces a large set of high-dimensional training examples, its high classification time complexity becomes a bottleneck to its application. In addition, the class imbalance problem caused by an uneven class distribution of the training examples can also harm its classification performance. To address these two problems, this paper proposes a new redundancy-based algorithm for pruning the training examples of KNN classifiers (RBKNN for short). By introducing a preprocessing step over the training set, RBKNN computes a redundancy score for each training example and randomly prunes some of the highly redundant examples, thereby reducing the size of the training set and balancing the class distribution. Experimental results show that RBKNN significantly improves the classification efficiency of KNN while maintaining or improving its classification accuracy.

Keywords: KNN classifiers  example reduction  fast classification  class imbalance

New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN
WANG Ziqi,HE Jinwen,JIANG Liangxiao. New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN[J]. Computer Engineering and Applications, 2019, 55(22): 40-45. DOI: 10.3778/j.issn.1002-8331.1809-0275
Authors:WANG Ziqi  HE Jinwen  JIANG Liangxiao
Affiliation:School of Computer Science, China University of Geosciences, Wuhan 430074, China
Abstract: As one of the top 10 algorithms in data mining, the K-Nearest-Neighbor (KNN) algorithm is widely used because it is a non-parametric, simple, and effective algorithm that requires no training time. However, when it faces a massive number of high-dimensional training examples, its high classification time complexity becomes a bottleneck to its application. In addition, its classification performance is often harmed when the class distribution of the training examples is skewed and the class imbalance problem occurs. To address these two issues, this paper proposes a new redundancy-based algorithm for reducing the number of training examples (RBKNN for short). By introducing a preprocessing step, RBKNN first computes the redundancy of each training example and then randomly deletes some highly redundant training examples. RBKNN can thus not only reduce the size of the training example set but also make the class distribution of the training examples more balanced. The experimental results show that RBKNN significantly improves the efficiency of KNN while maintaining or improving its classification accuracy.
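The abstract outlines the RBKNN preprocessing idea without specifying the redundancy measure. The following is a minimal sketch of the general approach, under the assumption (not stated in this abstract) that an example's redundancy is the fraction of its k nearest neighbors sharing its class label; the paper's actual measure and pruning rule may differ. The function `prune_by_redundancy` and its parameters are hypothetical names for illustration.

```python
import numpy as np

def prune_by_redundancy(X, y, k=5, prune_fraction=0.3, seed=None):
    """Sketch of redundancy-based training-set pruning (hypothetical details).

    Redundancy is assumed here to be the fraction of an example's k nearest
    neighbors that share its class label; highly redundant examples are
    candidates for random removal, as the abstract describes.
    """
    rng = np.random.default_rng(seed)
    n = len(X)

    # Brute-force pairwise Euclidean distances (adequate for a sketch).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # an example is not its own neighbor

    # Redundancy score: fraction of k nearest neighbors with the same label.
    redundancy = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]
        redundancy[i] = np.mean(y[nbrs] == y[i])

    # Randomly drop half of the most redundant examples (illustrative rule).
    n_candidates = max(1, int(n * prune_fraction))
    candidates = np.argsort(redundancy)[-n_candidates:]
    drop = rng.choice(candidates, size=n_candidates // 2, replace=False)
    keep = np.setdiff1d(np.arange(n), drop)
    return X[keep], y[keep]
```

A standard KNN classifier is then trained on the pruned `(X, y)`; because dense same-class regions lose more examples than sparse ones, the surviving set is both smaller and closer to class-balanced, which is the effect the abstract attributes to RBKNN.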
Keywords:KNN classifiers  example reduction  fast classification  class imbalance  
This article is indexed by Wanfang Data and other databases.