
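A fast kNN classification algorithm for big data*
Cite as: SU Yi-juan, DENG Zhen-yun, CHENG De-bo, ZONG Ming. A fast kNN classification algorithm for big data*[J]. Application Research of Computers, 2016, 33(4).
Authors: SU Yi-juan  DENG Zhen-yun  CHENG De-bo  ZONG Ming
Affiliations: College of Computer and Information Engineering, Guangxi Teachers Education University; College of Computer Science and Information Engineering, Guangxi Normal University; College of Computer Science and Information Engineering, Guangxi Normal University; College of Computer Science and Information Engineering, Guangxi Normal University
Funding: National Natural Science Foundation of China (61170131, 61263035, and 61363009); National 863 Program (2012AA011005); National 973 Program (2013CB329404); Guangxi Natural Science Foundation (2012GXNSFGA060004); Guangxi Bagui Innovation Team and Guangxi 100 Talents Program; Key Project of Science and Technology Research for Guangxi Universities (2013ZD041); Guangxi Natural Science Foundation project (2014jjAA70175)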
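Abstract: The test-time complexity of the k-nearest neighbor (KNN) algorithm is at least linear, which makes it very inefficient on large data sets. To address this problem, this paper proposes a fast KNN classification algorithm for big data. The algorithm introduces a training stage into KNN: a linear-complexity clustering method partitions the large data set into blocks. At test time, the block nearest to the test sample is found and used as the new training set for KNN classification. This greatly reduces the test-time cost of KNN and makes it applicable to large data sets. Experiments show that the proposed algorithm classifies noticeably faster than classical KNN while keeping its accuracy close to that of classical KNN.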

Keywords: k-nearest neighbor  big data  blocking  cluster centers
Received: 2014/11/30
Revised: 2016/2/21

A fast KNN classification algorithm under large data
SU Yi-juan, DENG Zhen-yun, CHENG De-bo and ZONG Ming. A fast KNN classification algorithm under large data[J]. Application Research of Computers, 2016, 33(4).
Authors:SU Yi-juan  DENG Zhen-yun  CHENG De-bo and ZONG Ming
Affiliation: College of Computer information Technology, Guangxi Teachers Education University, Nanning; College of Computer Science information, GuangXi Normal University, Guilin; College of Computer Science information, GuangXi Normal University, Guilin
Abstract: The testing complexity of the K-Nearest Neighbor (KNN) algorithm is at least linear, so its efficiency is low when the sample set is large. To address this problem, this paper proposes a fast KNN classification algorithm for big data. The algorithm introduces a training process into the KNN method: it partitions the data into blocks using a linear-complexity clustering method. At test time, the algorithm selects the block nearest to the test sample and uses it as the new training set for KNN classification. This greatly reduces the testing overhead of KNN and makes it applicable to big data. Experiments show that the proposed algorithm achieves classification accuracy close to that of traditional KNN while classifying significantly faster.
Keywords:k-nearest neighbor  big data  cluster  cluster centers
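
The two stages described in the abstract (block the training data by clustering, then run KNN only inside the block nearest to the test sample) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not name the clustering method, so a few k-means passes are used here as an assumed stand-in (each pass is linear in the number of samples), and the names `train_blocks`, `predict`, `n_blocks`, and `n_iter` are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_blocks(X, y, n_blocks, n_iter=5, seed=0):
    """Partition the training set into blocks (assumed stand-in: k-means).

    Each pass costs O(n * n_blocks * d), i.e. linear in the sample count n.
    Returns a list of (center, points, labels) tuples, one per non-empty block.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, n_blocks)]
    assign = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center.
        assign = [
            min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in X
        ]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    blocks = []
    for j, center in enumerate(centers):
        idx = [i for i, a in enumerate(assign) if a == j]
        if idx:  # drop empty blocks
            blocks.append((center, [X[i] for i in idx], [y[i] for i in idx]))
    return blocks


def predict(blocks, x, k):
    """Classify x with KNN restricted to the block whose center is nearest."""
    _, pts, labels = min(blocks, key=lambda b: sq_dist(x, b[0]))
    nearest = sorted(range(len(pts)), key=lambda i: sq_dist(x, pts[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With `n_blocks` blocks of roughly equal size, a query compares against `n_blocks` centers plus about `n / n_blocks` samples instead of all `n`, which is where the test-time saving would come from. The trade-off is that true nearest neighbors lying just across a block boundary are missed, which is consistent with the abstract's claim of approximately (not exactly) matching classical KNN accuracy.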