
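Fast KNN Algorithm for Text Classification Based on Rough Set
Cite this article: SUN Rong-zong, MIAO Duo-qian, WEI Zhi-hua, LI Wen. Fast KNN Algorithm for Text Classification Based on Rough Set[J]. Computer Engineering, 2010, 36(24): 175-177
Authors: SUN Rong-zong  MIAO Duo-qian  WEI Zhi-hua  LI Wen
Affiliation: (Tongji University, a. Department of Computer Science and Technology, School of Electronics and Information Engineering; b. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai 201804)
Funding: National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program
Abstract:
An obvious drawback of the traditional K nearest neighbor (KNN) method is the heavy cost of computing sample similarities; in text classification with large numbers of high-dimensional samples, its complexity is too high to be practical. To address this, rough set theory is introduced into text classification: the concepts of upper and lower approximation are used to characterize the distribution of each class's training samples, and the ranges of each class's upper and lower approximations are computed during training. During classification, based on where the vector of the text to be classified lies in the sample space, the improved algorithm can directly determine the class of some texts and narrow the KNN search range. Experiments show that the algorithm significantly improves classification efficiency while keeping KNN classification performance essentially unchanged.

Keywords: text classification  K nearest neighbor  rough set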
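
For background, the upper and lower approximations mentioned in the abstract are standard rough set notions. The definitions below are the textbook (Pawlak) form, not quoted from the paper; the symbols $U$ (universe of samples), $R$ (an equivalence relation on $U$), $[x]_R$ (the equivalence class of $x$) and $X$ (the set of samples of one class) are notation assumed here, and the abstract does not say how the paper adapts them to a vector space.

```latex
\underline{R}X = \{\, x \in U : [x]_R \subseteq X \,\}, \qquad
\overline{R}X = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}, \qquad
BN_R(X) = \overline{R}X \setminus \underline{R}X
```

Under this reading, samples in $\underline{R}X$ certainly belong to the class, samples outside $\overline{R}X$ certainly do not, and only samples in the boundary region $BN_R(X)$ remain undecided.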

Fast KNN Algorithm for Text Classification Based on Rough Set
SUN Rong-zong, MIAO Duo-qian, WEI Zhi-hua, LI Wen. Fast KNN Algorithm for Text Classification Based on Rough Set[J]. Computer Engineering, 2010, 36(24): 175-177
Authors: SUN Rong-zong  MIAO Duo-qian  WEI Zhi-hua  LI Wen
Affiliation: (a. Department of Computer Science and Technology, School of Electronics and Information Engineering; b. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China)
Abstract:
The traditional K Nearest Neighbor (KNN) algorithm has a serious drawback: computing sample similarities is very expensive. For text classification tasks with a large number of high-dimensional samples, its complexity is too high for practical use. To address this, rough set theory is introduced into the classification process. The distribution of each class's training samples is described with the concepts of upper and lower approximation, and the ranges of the upper and lower approximation spaces of each class are computed during training. During classification, the algorithm uses the position of a document in the sample space to label some documents directly, which reduces the KNN search range. Experimental results show that the algorithm greatly reduces classification time while keeping almost the same classification performance as the traditional KNN algorithm.
Keywords: text classification  K Nearest Neighbor (KNN)  rough set
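
The abstract does not specify how the approximation regions are constructed, so the following Python sketch is only one possible reading of the idea, not the authors' algorithm. It assumes Euclidean distance and spherical regions around each class centroid: a hypothetical "lower" radius inside which no training sample of another class falls, and a hypothetical "upper" radius that covers every training sample of the class. The function names `train` and `classify` are likewise made up for illustration.

```python
import math
from collections import Counter, defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples, labels):
    """Per class: centroid, lower radius, upper radius (assumed spherical regions)."""
    by_class = defaultdict(list)
    for vec, lab in zip(samples, labels):
        by_class[lab].append(vec)
    model = {}
    for lab, vecs in by_class.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Upper radius: smallest ball around the centroid covering the whole class.
        upper = max(dist(v, centroid) for v in vecs)
        # Lower radius: shrink the ball until no other-class sample lies strictly inside.
        foreign = [dist(v, centroid) for v, l in zip(samples, labels) if l != lab]
        lower = min([upper] + foreign)
        model[lab] = (centroid, lower, upper)
    return model


def classify(query, samples, labels, model, k=3):
    """Return (label, number of training samples compared against the query)."""
    d_cent = {lab: dist(query, c) for lab, (c, _, _) in model.items()}
    # Inside exactly one lower region: label directly, no neighbor search.
    certain = [lab for lab, (_, lower, _) in model.items() if d_cent[lab] < lower]
    if len(certain) == 1:
        return certain[0], 0
    # Otherwise search only the classes whose upper region contains the query.
    candidates = {lab for lab, (_, _, upper) in model.items() if d_cent[lab] <= upper}
    if not candidates:
        candidates = set(model)  # outside every region: fall back to plain KNN
    pool = [(dist(query, v), l) for v, l in zip(samples, labels) if l in candidates]
    pool.sort(key=lambda t: t[0])
    votes = Counter(l for _, l in pool[:k])
    return votes.most_common(1)[0][0], len(pool)
```

Calling `model = train(samples, labels)` once and then `classify(query, samples, labels, model)` per document returns the predicted label together with the number of sample distances computed, so the saving over brute-force KNN (which always compares against every training sample) can be observed directly. Whether such a shortcut preserves accuracy depends on how well the assumed regions match the real class distributions.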
This article is indexed in VIP, Wanfang Data, and other databases.