
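A Fast KNN Text Classification Algorithm
Cite this article: SUN Rong-zong. A Fast KNN Text Classification Algorithm[J]. Digital Community & Smart Home, 2010(1).
Author: SUN Rong-zong
Affiliation: Department of Computer Science and Technology, School of Electronics and Information Engineering, Tongji University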
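Abstract: KNN (K-Nearest Neighbor) is one of the best text classification algorithms in the vector space model. However, when the sample set is large and the text vectors have many dimensions, the classification efficiency of KNN drops sharply. This paper proposes an improved algorithm that raises the efficiency of KNN classification. During training, the algorithm computes the distribution range of each class of texts; during classification, it narrows the K-nearest-neighbor search range according to where the text vector to be classified lies in the sample space. Experiments show that the improved algorithm significantly increases classification efficiency while keeping KNN classification performance essentially unchanged.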
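
The abstract does not say how each class's "distribution range" is represented or how the search is narrowed, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes Euclidean distance, summarizes each class by a centroid and a radius (both hypothetical choices), and uses the triangle inequality to skip classes that cannot contain any of the K nearest neighbors. The names `train` and `classify` are illustrative.

```python
import heapq
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples):
    """samples: list of (vector, label) pairs.

    Assumption: a class's "distribution range" is a ball given by its
    centroid and the largest centroid-to-member distance.
    """
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    model = {}
    for label, vecs in by_class.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = max(euclidean(centroid, v) for v in vecs)
        model[label] = (centroid, radius, vecs)
    return model


def classify(model, query, k):
    """Return (predicted label, number of training vectors actually scanned)."""
    # Triangle inequality: no member of a class can be closer to the query
    # than dist(query, centroid) - radius.
    bounds = sorted(
        (max(0.0, euclidean(query, centroid) - radius), label)
        for label, (centroid, radius, _) in model.items()
    )
    best = []  # max-heap of the k nearest so far, stored as (-distance, label)
    scanned = 0
    for bound, label in bounds:
        # Classes are sorted by lower bound, so once one class cannot beat
        # the current k-th nearest distance, none of the remaining ones can.
        if len(best) == k and bound > -best[0][0]:
            break
        for vec in model[label][2]:
            scanned += 1
            d = euclidean(query, vec)
            if len(best) < k:
                heapq.heappush(best, (-d, label))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, label))
    votes = Counter(label for _, label in best)
    return votes.most_common(1)[0][0], scanned


# Toy data: two well-separated classes in a 2-dimensional space.
samples = [
    ([0, 0], "a"), ([1, 0], "a"), ([0, 1], "a"),
    ([10, 10], "b"), ([11, 10], "b"), ([10, 11], "b"),
]
model = train(samples)
label, scanned = classify(model, [0.5, 0.5], k=3)
```

On this toy data the query lies inside class "a", so only that class's three vectors are scanned and class "b" is skipped. Because the pruning bound is exact under these assumptions, the neighbors found match a brute-force search; real text classifiers often use cosine similarity instead, which would need a different bound.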

Keywords: text classification  K-nearest neighbor  algorithm

An Improved KNN Algorithm for Text Classification
SUN Rong-zong. An Improved KNN Algorithm for Text Classification[J]. Digital Community & Smart Home, 2010(1).
Authors: SUN Rong-zong
Affiliation: Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
Abstract: KNN (K-Nearest Neighbor) is one of the best text classification algorithms based on the vector space model. However, its efficiency is very low for text classification tasks with high dimensionality and large sample sets. This paper introduces a new algorithm to improve that efficiency. The distribution of the training samples of each class is computed during training. According to the position of a document in the sample space, the algorithm can reduce the search range for its K nearest neighbors in th...
Keywords: text classification  KNN  algorithm
This article is indexed in CNKI and other databases.