Study of Text Classification Method Based on Rough Set and Improved KNN Algorithm
Cite this article: SHAO Li. Study of Text Classification Method Based on Rough Set and Improved KNN Algorithm[J]. Computer and Modernization (计算机与现代化), 2012(2): 86-89.
Author: SHAO Li
Affiliation: Teaching Affairs Office, Aba Teachers College, Wenchuan, Sichuan 623000, China
Funding: Sichuan Provincial Department of Science and Technology 2010 research project (2010JY0J41); Sichuan Provincial Department of Education 2010 research project (10SA090); Aba Teachers College planned project (ASB10-14)
Abstract: The KNN algorithm is commonly used in automatic text classification and achieves high accuracy on low-dimensional text. When it handles large volumes of high-dimensional text, however, traditional KNN must process many training samples, which increases the cost of computing sample similarities and reduces classification efficiency. To address this, the paper first applies rough set attribute reduction to the decision table built from the high-dimensional text information, removing redundant attributes, and then classifies the texts with an improved cluster-based KNN algorithm. Simulation experiments show that the method improves the precision and accuracy of text classification.

Keywords: rough set; improved KNN; text classification
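
The abstract names two stages, rough set attribute reduction followed by a cluster-based KNN, but gives no algorithmic detail. The sketch below is therefore only an illustration under stated assumptions, not the paper's method: it uses a greedy dependency-based reduction (in the style of QuickReduct) on a toy binary term table, and it stands in for the "improved cluster-based KNN" by summarizing each class with a single centroid and voting among the nearest centroids. All function names, the toy data, and the one-centroid-per-class simplification are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough set dependency degree: share of rows in the positive region."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full-table dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return sorted(reduct)


def class_centroids(vectors, labels):
    """Assumed simplification: one cluster (centroid) per class."""
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[y].append(v)
    return [
        (tuple(sum(col) / len(vs) for col in zip(*vs)), y)
        for y, vs in groups.items()
    ]


def knn_predict(reps, x, k=1):
    """Majority vote among the k cluster representatives nearest to x."""
    nearest = sorted(reps, key=lambda r: dist(r[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Toy decision table: each row is a document, each column a term (1 = present).
rows = [(1, 0, 1, 0), (1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 0)]
labels = ["sports", "sports", "finance", "finance"]

reduct = quick_reduct(rows, labels)                    # attributes to keep
reduced = [tuple(r[a] for a in reduct) for r in rows]  # drop redundant columns
reps = class_centroids(reduced, labels)

new_doc = (0, 1, 1, 0)
prediction = knn_predict(reps, tuple(new_doc[a] for a in reduct))
print(reduct)      # [0]
print(prediction)  # finance
```

In this toy table a single term already separates the two classes, so the reduction keeps one column and the classifier compares a new document against two centroids instead of four training samples. That is the efficiency argument the abstract makes, shown at a very small scale.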
This article is indexed in CNKI, VIP (维普), and other databases.