Active Learning Based Text Categorization

Cite this article:	QIN Gang-Li, HUANG Ke, YANG Jia-Ben. Active Learning Based Text Categorization [J]. Computer Science (计算机科学), 2003, 30(10): 45-48.

Authors:	QIN Gang-Li, HUANG Ke, YANG Jia-Ben

Affiliations:	1. Department of Automation, Tsinghua University, Beijing 100084; 2. Department of Computer Science, Tsinghua University, Beijing 100084

Abstract:	In text categorization, unlabeled documents generally far outnumber labeled ones. Text categorization is a classification problem in a high-dimensional vector space, and more training samples generally improve a text classifier's accuracy. How to add unlabeled documents to the training set, and so expand it, is therefore a valuable problem. This paper introduces the theory of active learning and applies it to text categorization, exploring how unlabeled documents can be used to improve classifier accuracy. The expectation is that this approach will improve accuracy by drawing on a relatively large number of unlabeled document samples. We propose an active-learning-based algorithm for text categorization; experiments on the Reuters news corpus show that, when enough training samples are available, the algorithm effectively improves the text classifier's accuracy by incorporating unlabeled document samples.

Keywords:	Machine learning; active learning; text categorization; algorithm; feature extraction; VSM
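
The abstract does not spell out the authors' algorithm, so the following is only a rough illustration of the general idea it refers to: pool-based active learning, here with least-confidence sampling. A classifier is trained on a small labeled seed set, the unlabeled document it is least sure about is sent to a labeling oracle, and the model is retrained on the enlarged training set. Everything in the sketch is an assumption made for illustration and not taken from the paper: the names `NaiveBayes`, `active_learn`, and `oracle`, the choice of a multinomial Naive Bayes classifier, the query strategy, and the toy documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use proper feature extraction.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.n_docs = len(labels)
        return self

    def predict_proba(self, doc):
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / self.n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.counts[c][w] + 1) / total)
            log_scores[c] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def active_learn(labeled, pool, oracle, rounds):
    """Grow the training set by querying labels for the least-confident pool documents.

    labeled: list of (doc, label) pairs; pool: list of unlabeled docs;
    oracle: callable mapping a doc to its label (a human annotator in practice).
    """
    labeled, pool = list(labeled), list(pool)  # work on copies
    model = NaiveBayes().fit(*zip(*labeled))
    for _ in range(rounds):
        if not pool:
            break
        # Least confidence: the document whose top posterior is smallest.
        query = min(pool, key=lambda d: max(model.predict_proba(d).values()))
        pool.remove(query)
        labeled.append((query, oracle(query)))
        model = NaiveBayes().fit(*zip(*labeled))  # retrain on the enlarged set
    return model, labeled


# Hypothetical toy data standing in for a labeled seed set and an unlabeled pool.
seed = [("oil prices rise", "energy"), ("wheat harvest falls", "grain")]
true_labels = {
    "crude oil output": "energy",
    "corn and wheat exports": "grain",
    "opec oil meeting": "energy",
    "wheat and corn harvest": "grain",
}
pool = list(true_labels)
oracle = lambda doc: true_labels[doc]

model, labeled = active_learn(seed, pool, oracle, rounds=2)
print("queried:", [doc for doc, _ in labeled[len(seed):]])
print("prediction:", model.predict("crude oil exports"))
```

In this sketch the oracle is a dictionary lookup; in a real setting it would be a human annotator, and the point of the query strategy is to spend that labeling effort on the documents expected to help the classifier most.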
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.