
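Research on Using Unlabeled Data to Improve the Performance of SVM Classifier
Cite this article: ZHU Yu, NIE Feng-guang, GUO Li. Research on Using Unlabeled Data to Improve the Performance of SVM Classifier[J]. Computer Engineering and Applications, 2006, 42(27): 166-167,170
Authors: ZHU Yu  NIE Feng-guang  GUO Li
Affiliations: Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100080; Graduate University of Chinese Academy of Sciences, Beijing 100049; Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100080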
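Abstract: A major difficulty of supervised learning algorithms is that they require a large amount of labeled training data, and producing it by manual labeling is impractical. This paper proposes that, when an SVM classifier has only a small number of labeled training samples, the Rocchio method and the KNN method be used to select data with high similarity and large discrimination from a large pool of unlabeled data and add them to the training set, compensating for the shortage of training samples. Experiments show that the algorithm makes effective use of abundant unlabeled data, reduces the amount of manual labeling, and improves the performance of the SVM classifier.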

Keywords: text categorization  unlabeled  Rocchio method  K-nearest neighbor method  support vector machine
Article ID: 1002-8331-(2006)27-0166-02
Received: 2005-12-01
Revised: 2005-12-01

Research on Using Unlabeled Data to Improve the Performance of SVM Classifier
ZHU Yu, NIE Feng-guang, GUO Li. Research on Using Unlabeled Data to Improve the Performance of SVM Classifier[J]. Computer Engineering and Applications, 2006, 42(27): 166-167,170
Authors: ZHU Yu  NIE Feng-guang  GUO Li
Affiliation: 1. Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100080; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049
Abstract: A major difficulty of supervised learning is its need for a large amount of labeled training data, which is impractical to produce by manual labeling. This paper presents a method that uses unlabeled data to improve the performance of an SVM classifier that has only a few labeled training examples. Using Rocchio's algorithm or K Nearest Neighbor, it selects data with high similarity and appropriate difference from the unlabeled data set and adds them to the training set. Experiments show that this method uses the unlabeled data effectively, reduces the need for labeled data, and improves the classifier's performance.
Keywords:text categorization  unlabeled  Rocchio's algorithm  K Nearest Neighbor  Support Vector Machine
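
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact selection criteria, so the sketch assumes a Rocchio-style rule in which each class is represented by the centroid of its labeled vectors, and an unlabeled vector is accepted only if its cosine similarity to the best centroid is high (`min_sim`) and clearly exceeds the similarity to the runner-up centroid (`min_gap`). The function names, both thresholds, and the toy two-dimensional vectors are hypothetical. The KNN variant and the SVM retraining on the enlarged training set are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(X, y):
    """Rocchio-style class prototypes: the mean vector of each labeled class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def select_unlabeled(X_lab, y_lab, X_unl, min_sim=0.8, min_gap=0.2):
    """Return (vector, pseudo_label) pairs judged safe to add to the training set.

    min_sim and min_gap are assumed thresholds, not values from the paper.
    """
    protos = centroids(X_lab, y_lab)
    picked = []
    for x in X_unl:
        # Rank classes by similarity of x to each class centroid, best first.
        ranked = sorted(((cosine(x, c), label) for label, c in protos.items()),
                        reverse=True)
        best_sim, best_label = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        # Keep x only if it is close to one class and well separated from the rest.
        if best_sim >= min_sim and best_sim - runner_up >= min_gap:
            picked.append((x, best_label))
    return picked


# Toy data: two classes, three unlabeled vectors (the middle one is ambiguous).
X_lab = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y_lab = ["sports", "sports", "finance", "finance"]
X_unl = [[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]]

picked = select_unlabeled(X_lab, y_lab, X_unl)
print([label for _, label in picked])  # ['sports', 'finance']
```

In this sketch the ambiguous vector `[0.5, 0.5]` is rejected because it is equally similar to both centroids; the accepted pairs would then be appended to the labeled set before the SVM is retrained.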
This article is indexed in databases including CNKI, VIP, and Wanfang Data.