
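A KNN Text Classification Method Based on Association Analysis
Cite this article: FAN Heng-liang, CHENG Wei-qing. A KNN Text Classification Method Based on Association Analysis[J]. Computer Technology and Development, 2014, 0(6): 71-74
Authors: FAN Heng-liang, CHENG Wei-qing
Affiliation: School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Funding: National Natural Science Foundation of China (61170322, 71171117); Natural Science Foundation of Jiangsu Province (BK2010524)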
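Abstract: The KNN algorithm has important applications in text classification, a branch of data mining. After analyzing the shortcomings of the traditional KNN method, this paper proposes an improved KNN algorithm based on association analysis. The method first extracts, for each class of training documents, the frequent feature sets of that class and the documents associated with them. Then, based on the association analysis results for each class, it determines an appropriate number of neighbors k for a document of unknown class and quickly selects k nearest neighbors from the labeled training documents; the class of the unknown document is decided from the classes of these neighbors. Compared with text classification based on traditional KNN, the improved method determines the value of k better and reduces time complexity. Experimental results show that the proposed text classification method based on improved KNN increases both the efficiency and the accuracy of text classification.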

Keywords: data mining; text classification; KNN; association analysis

An Improved KNN Approach of Text Classification Based on Association Analysis
FAN Heng-liang, CHENG Wei-qing. An Improved KNN Approach of Text Classification Based on Association Analysis[J]. Computer Technology and Development, 2014, 0(6): 71-74
Authors: FAN Heng-liang, CHENG Wei-qing
Affiliation: School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Abstract: The KNN algorithm is widely applied in text classification, a branch of data mining. After analyzing the deficiencies of the traditional KNN method, this paper proposes an improved KNN algorithm based on association analysis. The method first extracts the frequent feature sets of each class of training documents, together with the documents associated with them. When a document of unknown class is to be classified, the results of the association analysis are used to determine the number of nearest neighbors k and to find the k nearest neighbors quickly among the training documents of all classes; the class of the document is then decided from the classes of its neighbors. Compared with the traditional KNN algorithm, the method is considerably better at selecting the number of nearest neighbors, and it also reduces the time complexity of the algorithm. Experimental results show that the proposed algorithm achieves better efficiency and accuracy in text classification.
Keywords: data mining; text classification; KNN; association analysis
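
The abstract outlines the pipeline (mine frequent feature sets per class, use them to pick k and a reduced pool of candidate neighbors, then vote) but does not give the mining procedure, the rule for choosing k, or the similarity measure. The Python sketch below is one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: documents are represented as sets of terms, frequent sets are limited to one or two terms, k is capped by a hypothetical `k_max` parameter, and Jaccard similarity stands in for whatever measure the paper uses.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_frequent_sets(docs, labels, min_support=0.5, max_size=2):
    """Map each class to {frequent feature set: indices of documents containing it}."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    result = {}
    for label, indices in by_class.items():
        cover = defaultdict(set)  # feature set -> documents of this class containing it
        for idx in indices:
            terms = sorted(docs[idx])
            for size in range(1, max_size + 1):
                for itemset in combinations(terms, size):
                    cover[frozenset(itemset)].add(idx)
        threshold = min_support * len(indices)
        result[label] = {s: ids for s, ids in cover.items() if len(ids) >= threshold}
    return result


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(doc, docs, labels, frequent, k_max=5):
    """Vote among the nearest training documents associated with matching feature sets."""
    # Candidate neighbors: documents tied to a frequent set fully contained in doc.
    candidates = set()
    for sets in frequent.values():
        for itemset, ids in sets.items():
            if itemset <= doc:
                candidates |= ids
    if not candidates:
        candidates = set(range(len(docs)))  # no match: fall back to plain KNN
    k = min(k_max, len(candidates))  # assumed rule for choosing k
    ranked = sorted(candidates, key=lambda i: (-jaccard(doc, docs[i]), i))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data, invented for this example.
train_docs = [
    {"goal", "match", "team"},
    {"match", "team", "coach"},
    {"goal", "team", "league"},
    {"stock", "market", "price"},
    {"market", "price", "bank"},
    {"stock", "price", "trade"},
]
train_labels = ["sports"] * 3 + ["finance"] * 3

frequent = mine_frequent_sets(train_docs, train_labels, min_support=0.6)
print(classify({"team", "goal", "season"}, train_docs, train_labels, frequent))
print(classify({"price", "stock", "index"}, train_docs, train_labels, frequent))
```

In this reading the saving comes from computing similarities only against the candidate pool rather than against every training document, which is where a lower running time than plain KNN would be expected.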
This article is indexed in VIP (维普) and other databases.