Research on Using Unlabeled Text to Improve the Performance of Centroid-based Classification Algorithms

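Cite this article:	HE Yao, ZHANG Shun-miao. Research on Using Unlabeled Text to Improve the Performance of Centroid-based Classification Algorithms[J]. Digital Community & Smart Home, 2007, 3(16): 1125-1126.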

Authors:	HE Yao, ZHANG Shun-miao

Affiliation:	Department of Computer and Information Science, Fujian University of Technology, Fuzhou, Fujian 350014 (both authors)

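Abstract:	Centroid-based classification is efficient, but it needs a large number of training documents (labeled documents) to train a classifier that classifies accurately. Because labeling them costs considerable human and material effort, training documents are limited in number, while many unlabeled documents are available on the Web. We therefore improve centroid-based classification and propose the ONUC and OFFUC algorithms to remedy its sharp drop in performance when training documents are insufficient. Since centroid-based classification is sensitive to outliers, we also trim outlying (edge) training documents. Experiments show that, when training documents are very few, the proposed algorithms perform better than standard centroid-based classification and other classic semi-supervised algorithms.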
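The abstract does not say how ONUC and OFFUC work, so the sketch below is not those algorithms. It is a generic illustration, under stated assumptions, of the two ideas the abstract combines: a centroid classifier (one unit-length centroid per class, documents assigned by cosine similarity) and self-training that folds pseudo-labeled unlabeled documents back into the centroids. The names `self_train` and `rounds`, and the toy vectors and labels, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def unit(v: Sequence[float]) -> Vector:
    """Scale a term-weight vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def centroid(docs: Sequence[Sequence[float]]) -> Vector:
    """Class centroid: sum of the unit-length document vectors, renormalized."""
    total = [0.0] * len(docs[0])
    for doc in docs:
        for i, x in enumerate(unit(doc)):
            total[i] += x
    return unit(total)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def train(labeled: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One centroid per class, computed from that class's documents."""
    return {label: centroid(docs) for label, docs in labeled.items()}


def classify(doc: Sequence[float], centroids: Dict[str, Vector]) -> str:
    """Assign the class whose centroid has the highest cosine similarity."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


def self_train(labeled: Dict[str, List[Vector]], unlabeled: List[Vector],
               rounds: int = 3) -> Dict[str, Vector]:
    """Hypothetical self-training loop (not ONUC/OFFUC): label the unlabeled
    documents with the current centroids, then recompute every centroid from
    the labeled documents plus the pseudo-labeled ones."""
    centroids = train(labeled)
    for _ in range(rounds):
        pseudo = {label: list(docs) for label, docs in labeled.items()}
        for doc in unlabeled:
            pseudo[classify(doc, centroids)].append(doc)
        centroids = train(pseudo)
    return centroids


# Toy 3-term vectors; labels and weights are made up for illustration.
labeled = {"sports": [[1.0, 0.1, 0.0]], "finance": [[0.0, 0.1, 1.0]]}
unlabeled = [[0.9, 0.3, 0.0], [0.1, 0.2, 0.8], [0.8, 0.0, 0.1]]
centroids = self_train(labeled, unlabeled)
print(classify([0.7, 0.2, 0.1], centroids))  # sports
```

Pseudo-labels are recomputed from scratch in each round, so a document can move to another class as the centroids shift; a more cautious variant might add only documents whose similarity to the winning centroid clears a threshold.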
Keywords:	centroid-based classification; text categorization; unlabeled documents; labeled documents
Article number:	1009-3044(2007)16-31125-02
Revised:	2 August 2007
Abstract:	Centroid-based classification algorithms are a highly efficient class of algorithms for text categorization. However, to build a good classification model they require a large number of labeled documents. In practical applications, labeled documents are often very sparse because manually labeling data is tedious and costly, while unlabeled documents are often abundant. We therefore propose the OFFUC and ONUC algorithms to remedy the dramatic degradation of centroid-based classification algorithms when training data is scarce. Because training items that lie far from the center of their category reduce classification accuracy, we exclude them from consideration. Experimental results show that the proposed OFFUC and ONUC algorithms improve the performance of centroid-based classification algorithms and outperform generic semi-supervised methods when the number of labeled texts is very small.
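The abstract describes excluding training documents that lie far from the center of their category, but the record does not say how distance is measured or how many documents are dropped. The sketch below is one possible reading, not the authors' procedure: it assumes cosine similarity to the class centroid and a hypothetical `keep_ratio` parameter that keeps only the most central fraction of a class before the centroid is recomputed.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def centroid(docs: Sequence[Sequence[float]]) -> List[float]:
    """Sum of the unit-length document vectors, renormalized."""
    total = [0.0] * len(docs[0])
    for doc in docs:
        for i, x in enumerate(unit(doc)):
            total[i] += x
    return unit(total)


def trim_outliers(docs: Sequence[Sequence[float]], keep_ratio: float = 0.8):
    """Keep the ceil(keep_ratio * n) documents most similar to the class
    centroid and drop the rest as outliers (keep_ratio is a hypothetical knob)."""
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    c = centroid(docs)

    def similarity(doc: Sequence[float]) -> float:
        # Cosine similarity, since both vectors are unit length.
        return sum(a * b for a, b in zip(unit(doc), c))

    ranked = sorted(docs, key=similarity, reverse=True)
    return ranked[: max(1, math.ceil(len(docs) * keep_ratio))]


# One toy class in which [0, 1] is an obvious outlier.
docs = [[1, 0], [0.9, 0.1], [0.95, 0.05], [0, 1]]
trimmed = trim_outliers(docs, keep_ratio=0.75)
print(len(trimmed), [0, 1] in trimmed)  # 3 False
print(centroid(trimmed))
```

Trimming happens per class, before the final centroid is computed, so a single stray document no longer pulls the centroid toward another class.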

This article is indexed in CNKI, VIP, and other databases.
