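A New Chinese Text Classification Algorithm: the One-Class SVM-KNN Algorithm
Cite this article: LIU Wen, WU Chen. A New Chinese Text Classification Algorithm: the One-Class SVM-KNN Algorithm[J]. Microcomputer Development (微机发展), 2012, 0(5): 83-86
Authors: LIU Wen (刘文), WU Chen (吴陈)
Affiliation: Intelligent Information Processing Laboratory, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003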
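Abstract: Chinese text classification is widely used in databases and search engines. The K-nearest neighbor (KNN) algorithm is a common classifier for Chinese text, but it must store all training samples and does not build a classifier until a test sample needs to be classified. It also suffers from class skew and from high storage and computation costs. One-class SVM works well on problems with only a single class but is not suited to multi-class classification. To address these shortcomings of KNN while exploiting the properties of one-class SVM, the One-Class SVM-KNN algorithm is proposed, and its definition and a detailed analysis are given. Experiments show that the method overcomes the shortcomings of KNN well and that its recall and precision are clearly better than those of KNN.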

Keywords: Chinese text classification; support vector machine; K-nearest neighbor; One-Class SVM-KNN

A New Text Classification Algorithm One Class SVM-KNN
LIU Wen, WU Chen. A New Text Classification Algorithm One Class SVM-KNN[J]. Microcomputer Development, 2012, 0(5): 83-86
Authors:LIU Wen  WU Chen
Affiliation: (The Opening Laboratory of Intelligent Computing, Jiangsu University of Science and Technology, Zhenjiang 212003, China)
Abstract: Text classification is widely used in databases and search engines. KNN is widely used for Chinese text categorization; however, it has several shortcomings in this application. KNN keeps all training samples and does no work until a sample has to be classified, so storage and computation become costly when the sample set is very large. It is also prone to classification deviation (class skew). One-class SVM is a simple and effective algorithm for single-class problems. To address the problems of KNN, a new algorithm that combines one-class SVM and KNN is proposed, which achieves a better classification effect. Experimental results show that the recall of the proposed method is clearly higher than that of KNN.
Keywords: Chinese text classification; support vector machine; K-nearest neighbour; One Class SVM-KNN
This article is indexed by VIP (维普) and other databases.