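Web Text Classification Without Labeled Training Samples
Cite this article: LIU Li-Zhen, SONG Han-Tao, LU Yu-Chang. Web Text Classification Without Labeled Training Samples [J]. Computer Science (计算机科学), 2006, 33(3): 200-201.
Authors: LIU Li-Zhen  SONG Han-Tao  LU Yu-Chang
Affiliations: 1. Information Engineering College, Capital Normal University, Beijing 100037
2. Department of Computer, Beijing Institute of Technology, Beijing 100081
3. Department of Computer, Tsinghua University, Beijing 100084
Funding: Research project of the Ministry of Science and Technology; Beijing Outstanding Talent Training Fund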
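Abstract: Obtaining class-labeled training samples for text classification is costly. To address this problem, this paper improves on the traditional fuzzy clustering method and proposes the Fuzzy Partition Clustering Method (FPCM), which combines the unsupervised nature of clustering with prior knowledge about the samples. Related texts are clustered by a similarity measure, yielding relatively objective clusters and a small number of labeled texts that give supervised learning a basis for classification. The classifier is then trained with naive Bayes incremental learning. The paper further balances the selection of candidate samples by estimating the classification error loss, which improves classification accuracy and yields a learning model for unlabeled text classification with a wider range of application.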

Keywords: Web text classification  Fuzzy clustering  Naive Bayes
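
The abstract gives no formulas for FPCM, so the following is only a minimal sketch of the general idea it describes: centroid-based fuzzy clustering of document vectors by similarity, followed by keeping the documents with high membership as a small pseudo-labeled set. It uses the standard fuzzy c-means membership and centroid updates with cosine distance as a stand-in, not the authors' method. The function names, the fuzzifier `m`, and the `threshold` value are all assumptions made for illustration.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def memberships(docs, centroids, m=2.0, eps=1e-9):
    """Standard fuzzy c-means memberships: u_ij is proportional to d_ij^(-2/(m-1))."""
    result = []
    for doc in docs:
        dists = [cosine_distance(doc, c) + eps for c in centroids]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result

def update_centroids(docs, u, m=2.0):
    """Each centroid is the mean of all documents weighted by u_ij^m."""
    dim = len(docs[0])
    centroids = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centroids.append([
            sum(w * doc[t] for w, doc in zip(weights, docs)) / total
            for t in range(dim)
        ])
    return centroids

def fuzzy_cluster(docs, init_centroids, m=2.0, iters=20):
    """Alternate membership and centroid updates for a fixed number of rounds."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        u = memberships(docs, centroids, m)
        centroids = update_centroids(docs, u, m)
    return memberships(docs, centroids, m), centroids

def confident_labels(u, threshold=0.8):
    """Assumed rule: keep a document as pseudo-labeled only if its top membership is high."""
    picked = {}
    for i, row in enumerate(u):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            picked[i] = j
    return picked
```

Documents whose membership is split between clusters are left unlabeled under this rule, so only the clearer cases would seed a supervised learner.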

The Method of Web Text Classification of Using Non-labeled Training Sample
LIU Li-Zhen, SONG Han-Tao, LU Yu-Chang. The Method of Web Text Classification of Using Non-labeled Training Sample [J]. Computer Science, 2006, 33(3): 200-201.
Authors:LIU Li-Zhen  SONG Han-Tao  LU Yu-Chang
Affiliation:1. Information Engineering College, Capital Normal University, Beijing 100037; 2. Department of Computer, Beijing Institute of Technology, Beijing 100081; 3. Department of Computer, Tsinghua University, Beijing 100084
Abstract:Bayesian learning theory estimates unlabeled samples from prior information and sample data. In text classification, unlabeled texts are classified by learning from class-labeled samples, but labeled training samples are very difficult to obtain. This paper analyzes the problem from a clustering point of view. Clustering is an unsupervised learning method that does not depend on predefined classes or labeled training samples. The paper improves on traditional fuzzy clustering and proposes the Fuzzy Partition Clustering Method (FPCM), a dynamic clustering method based on the centroid technique. Fuzzy partition clustering of unlabeled Web texts yields a few labeled texts, which give supervised learning a basis for classification. The approach combines prior knowledge about the samples with the unsupervised nature of clustering, and clusters related texts by measuring their similarity. Naive Bayes incremental learning is then used to design and train the classifier. In addition, classification precision is improved by selecting a balanced set of candidate samples after estimating the classification error loss. The result is a text classification model that learns from unlabeled training samples and has a wider range of application.
Keywords:Web text classification  Fuzzy clustering  Naive Bayes
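
The abstract does not specify how the naive Bayes incremental step or the error-loss estimate is computed. As a hedged illustration only, the sketch below uses a multinomial naive Bayes model with Laplace smoothing whose counts can be updated one document at a time, and treats the estimated loss of a candidate as `1 - max posterior` (expected 0-1 loss), picking the lowest-loss candidates per predicted class so the selection stays balanced. The class `IncrementalNB`, the function `select_balanced`, and the parameter `per_class` are hypothetical names, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

class IncrementalNB:
    """Multinomial naive Bayes whose counts grow one labeled document at a time."""

    def __init__(self, vocab):
        self.vocab = set(vocab)
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class

    def update(self, tokens, label):
        """Incremental step: fold one (pseudo-)labeled document into the counts."""
        self.class_docs[label] += 1
        self.word_counts[label].update(t for t in tokens if t in self.vocab)

    def posterior(self, tokens):
        """Class posteriors with Laplace smoothing, normalized in log space."""
        total_docs = sum(self.class_docs.values())
        log_scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((counts[t] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

def select_balanced(model, unlabeled, per_class=1):
    """Assumed rule: per predicted class, keep the candidates with the lowest 1 - max posterior."""
    by_class = defaultdict(list)
    for idx, tokens in enumerate(unlabeled):
        post = model.posterior(tokens)
        label = max(post, key=post.get)
        by_class[label].append((1.0 - post[label], idx))
    chosen = []
    for label, items in by_class.items():
        for _loss, idx in sorted(items)[:per_class]:
            chosen.append((idx, label))
    return chosen
```

In a loop, the selected candidates would be passed back to `update` with their predicted labels, so each round adds the same number of documents to every class instead of letting one class dominate the training set.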
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.