SVM based adaptive learning method for text classification from positive and unlabeled documents
Authors:Tao Peng  Wanli Zuo  Fengling He
Affiliation:(1) College of Computer Science and Technology, Jilin University, No.2699 Qianjin Road, Changchun, Jilin, 130012, China;(2) Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, 130012, China
Abstract:Automatic text classification is one of the most important tools in information retrieval. This paper presents a novel text classifier trained from positive and unlabeled examples. The primary challenge, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. First, we identify many more reliable negative documents, with a very low error rate, using an improved 1-DNF algorithm. Second, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented during the iteration. Third, unlike previous PU-oriented text classification work, we construct the final classifier as the weighted vote of all classifiers generated in the iteration steps, instead of choosing one of them as the final classifier. Finally, we discuss a PSO (Particle Swarm Optimization) based approach for setting the weights of this vote, which can discover the best combination of weights. In addition, we built a focused crawler based on link contexts, guided by different classifiers, to evaluate our method. Several comprehensive experiments were conducted using the Reuters data set and thousands of web pages. Experimental results show that our method improves performance (F1-measure) compared with PEBL, and that a focused web crawler guided by our PSO-based classifier outperforms several other classifiers in both harvest rate and target recall.
Keywords:Text classification  Machine learning  Improved 1-DNF algorithm  SVM  PSO  Focused web crawling
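The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the basic 1-DNF rule (a term is a positive feature if its document frequency is higher in the positive set than in the unlabeled set), not the improved variant, whose details the abstract does not give. The names `positive_features`, `reliable_negatives`, and `weighted_vote`, and the representation of a document as a set of terms, are hypothetical. The vote weights are set by hand here; the PSO search for the weights is not shown.

```python
from typing import Callable, List, Sequence, Set

Doc = Set[str]  # hypothetical representation: a document is a set of terms


def positive_features(P: Sequence[Doc], U: Sequence[Doc]) -> Set[str]:
    """Basic 1-DNF rule: terms more frequent in P than in U (P and U non-empty)."""
    vocab = set().union(*P, *U)

    def doc_freq(term: str, docs: Sequence[Doc]) -> float:
        # Fraction of documents in `docs` that contain `term`.
        return sum(term in d for d in docs) / len(docs)

    return {t for t in vocab if doc_freq(t, P) > doc_freq(t, U)}


def reliable_negatives(P: Sequence[Doc], U: Sequence[Doc]) -> List[Doc]:
    """Unlabeled documents that contain no positive feature at all."""
    pf = positive_features(P, U)
    return [d for d in U if not (d & pf)]


def weighted_vote(
    classifiers: Sequence[Callable[[Doc], int]],
    weights: Sequence[float],
    doc: Doc,
) -> int:
    """Combine +1/-1 predictions of all classifiers by a weighted sum."""
    score = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if score > 0 else -1


# Toy data (made up for illustration only).
P = [{"svm", "kernel"}, {"svm", "margin"}]
U = [{"svm", "text"}, {"football", "goal"}, {"kernel", "goal"}]
negatives = reliable_negatives(P, U)

# Stand-ins for the SVMs produced in successive iterations.
clfs = [
    lambda d: 1 if "svm" in d else -1,
    lambda d: 1 if "kernel" in d else -1,
    lambda d: -1,
]
weights = [0.6, 0.3, 0.1]  # hand-picked; the paper searches these with PSO
label = weighted_vote(clfs, weights, {"svm"})
```

In this toy run only the document about football contains no positive feature, so it is the single reliable negative. In the iterative step described in the abstract, each stand-in function would be replaced by an SVM trained on the positive set and the current negative set.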
This article is indexed in SpringerLink and other databases.