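A Method for Eliminating Class Noise in Text Classifiers Based on Feature Class Attribute Analysis
Cite this article: WANG Qiang, GUAN Yi, WANG Xiao-Long. A Method for Eliminating Class Noise in Text Classifiers Based on Feature Class Attribute Analysis. Acta Automatica Sinica, 2007, 33(8): 809-816. doi: 10.1360/aas-007-0809
Authors: WANG Qiang  GUAN Yi  WANG Xiao-Long
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: This paper proposes an algorithm that uses the class attributes of text features to eliminate class noise (ECN) during text classification. By analyzing the class-indicative information carried by a document's key features, the algorithm proactively predicts the set of classes to which an unseen document may belong, thereby reducing the number of classifiers that take part in the decision, lowering classification latency, and improving classification accuracy. Experiments on Chinese and English test corpora show that the algorithm reaches F values of 0.76 and 0.93, respectively, with a clear gain in classifier run-time efficiency and good overall performance. Further experiments show that the algorithm extends well: combined with a feedback learning strategy, classification performance improves further, with F values reaching 0.806 and 0.943.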

Keywords: Class attribute analysis; class noise elimination; text classification
Received: 2006-04-24
Revised: 2006-04-24

A Method for Eliminating Class Noise in Text Classification Based on Feature Class Attribute
WANG Qiang, GUAN Yi, WANG Xiao-Long. A Method for Eliminating Class Noise in Text Classification Based on Feature Class Attribute. ACTA AUTOMATICA SINICA, 2007, 33(8): 809-816. doi: 10.1360/aas-007-0809
Authors: WANG Qiang  GUAN Yi  WANG Xiao-Long
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: This paper presents a novel algorithm for eliminating class noise in text classification, based on an analysis of the class attributes of features. The algorithm mines the most representative class information carried by text features and uses it to prejudge the candidate class labels of an unseen document. The document is then classified only within this candidate class space, which reduces the number of decisions, cuts time cost, and improves accuracy. Experimental results on Chinese and English corpora show that the algorithm performs well: the F measure is 0.76 and 0.93, respectively, and the run-time efficiency of the classifier improves markedly. A further experiment indicates that the algorithm extends well: with a feedback learning strategy, the F measure improves further to 0.806 and 0.943.
Keywords: Class attribute analysis; eliminating class noise; text classification
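
The abstract describes the candidate-class idea only at a high level, so the following is a minimal sketch of that general idea, not the authors' algorithm. It assumes a simple rule that is not taken from the paper: a feature "indicates" a class when a large share of the training documents containing it belong to that class. All names (`learn_feature_classes`, `candidate_classes`, `classify`, `min_share`, `min_count`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_feature_classes(labeled_docs, min_share=0.8, min_count=2):
    """Map each feature to the classes it strongly indicates.

    Assumed rule (not from the paper): a feature indicates a class when at
    least `min_share` of the documents containing it carry that class label,
    and the feature occurs in at least `min_count` documents.
    """
    doc_counts = defaultdict(Counter)
    for tokens, label in labeled_docs:
        for tok in set(tokens):
            doc_counts[tok][label] += 1

    indicators = {}
    for tok, by_class in doc_counts.items():
        total = sum(by_class.values())
        if total < min_count:
            continue
        strong = {c for c, n in by_class.items() if n / total >= min_share}
        if strong:
            indicators[tok] = strong
    return indicators


def candidate_classes(tokens, indicators, all_classes):
    """Union of classes indicated by the document's features.

    Falls back to every class when no feature carries a class indication.
    """
    candidates = set()
    for tok in set(tokens):
        candidates |= indicators.get(tok, set())
    return candidates or set(all_classes)


def classify(tokens, scorers, indicators):
    """Run only the per-class scorers that survive candidate pruning."""
    candidates = candidate_classes(tokens, indicators, scorers.keys())
    return max(sorted(candidates), key=lambda c: scorers[c](tokens))


# Toy training data: (tokens, class label) pairs.
train = [
    (["goal", "match", "team"], "sports"),
    (["goal", "league", "team"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["stock", "bank", "rate"], "finance"),
    (["team", "market"], "finance"),
]
indicators = learn_feature_classes(train)
classes = ["sports", "finance"]
print(sorted(candidate_classes(["goal", "team"], indicators, classes)))  # ['sports']
```

In this sketch, `team` appears in both classes and so indicates neither, while `goal` narrows the candidate set to a single class; `classify` then evaluates only that class's scorer. A document with no indicative feature falls back to the full class set, so pruning never leaves the classifier without a candidate.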
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.