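A study of the C4.5Bagging algorithm for Chinese text classification
Cite this article: ZHANG Xiang, ZHOU Ming-quan, GENG Guo-hua, HUO Fan. A study of the C4.5Bagging algorithm for Chinese text classification[J]. Computer Engineering and Applications, 2009, 45(26): 135-137. DOI: 10.3778/j.issn.1002-8331.2009.26.039
Authors: ZHANG Xiang  ZHOU Ming-quan  GENG Guo-hua  HUO Fan
Affiliations: Visualization Technology Institute, Northwest University, Xi’an 710127; College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055; Visualization Technology Institute, Northwest University, Xi’an 710127; College of Information Science and Technology, Beijing Normal University, Beijing 100875; Visualization Technology Institute, Northwest University, Xi’an 710127
Abstract: A new Bagging method is proposed for Chinese text classification. The method uses the C4.5 decision tree algorithm as the weak classifier, obtains multiple training sets by resampling instances, and combines the classifiers' outputs by a voting rule to produce the final classification. Experiments show that the algorithm achieves higher precision, recall, and F1 than the C4.5, kNN, and naive Bayes classifiers, and therefore performs better overall.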

Keywords: Bagging algorithm  C4.5 algorithm  Chinese text classification
Received: 2008-05-20
Revised: 2008-08-01
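
The procedure outlined in the abstract (resample the training instances, train one weak classifier per resample, combine the outputs by voting) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. A one-level decision stump chosen by gain ratio, the split criterion used by C4.5, stands in for a full C4.5 tree. Documents are assumed to be already segmented and encoded as binary term-presence vectors. The names `train_stump`, `bagging_fit`, and `bagging_predict`, the toy data, and the ensemble size of 11 are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def train_stump(X, y):
    """Train a one-level tree on binary features, split chosen by gain ratio.

    Assumption: this stump is a simplified stand-in for a full C4.5 tree
    (no recursion, no pruning, no continuous attributes).
    """
    default = Counter(y).most_common(1)[0][0]
    best_ratio, best = 0.0, None
    for j in range(len(X[0])):
        parts = {0: [], 1: []}
        for row, label in zip(X, y):
            parts[row[j]].append(label)
        if not parts[0] or not parts[1]:
            continue  # feature is constant in this sample, cannot split
        n = len(y)
        remainder = sum(len(p) / n * entropy(p) for p in parts.values())
        gain = entropy(y) - remainder
        split_info = entropy([row[j] for row in X])
        ratio = gain / split_info
        if ratio > best_ratio:
            leaves = {v: Counter(p).most_common(1)[0][0] for v, p in parts.items()}
            best_ratio, best = ratio, (j, leaves)
    if best is None:
        return lambda row: default  # no useful split: predict majority class
    j, leaves = best
    return lambda row: leaves[row[j]]


def bagging_fit(X, y, n_models=11, seed=0, train=train_stump):
    """Train n_models weak classifiers, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the weak classifiers' outputs by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): columns mark presence of three terms per document.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 0, 1]]
y = ["sports", "sports", "sports", "finance", "finance", "finance"]

models = bagging_fit(X, y, n_models=11, seed=0)
print(bagging_predict(models, [1, 1, 0]))
```

An odd ensemble size avoids tied votes in the two-class case. With more classes, ties are still possible and are broken here by whichever label was counted first.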

C4.5Bagging algorithm for Chinese text categorization
ZHANG Xiang, ZHOU Ming-quan, GENG Guo-hua, HUO Fan. C4.5Bagging algorithm for Chinese text categorization[J]. Computer Engineering and Applications, 2009, 45(26): 135-137. DOI: 10.3778/j.issn.1002-8331.2009.26.039
Authors:ZHANG Xiang  ZHOU Ming-quan  GENG Guo-hua  HUO Fan
Affiliation: 1. Visualization Technology Institute, Northwest University, Xi’an 710127, China; 2. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China; 3. College of Information Science and Technology, Beijing Normal University, Beijing 100875, China
Abstract: A new Bagging method is developed for Chinese text classification. The C4.5 decision tree is used as the weak classifier, and multiple training sets are obtained by resampling instances. The classifiers' outputs are then combined by voting to give the final classification. Experimental results show that the C4.5Bagging classifier achieves higher precision, recall, and F-measure than C4.5, kNN, and Naive Bayes.
Keywords:Bagging  C4.5  Chinese text categorization
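
The three measures named in the abstract can be computed per class from predicted and true labels, as in the minimal sketch below. The abstract does not say how the per-class scores were averaged across categories, so this example handles a single class only. The function name `precision_recall_f1` and the sample labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["sports", "sports", "sports", "finance", "finance"]
y_pred = ["sports", "sports", "finance", "finance", "sports"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="sports")
print(p, r, f1)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.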
This article is indexed in CNKI, Wanfang Data, and other databases.