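A Variance-Based Text Feature Selection Algorithm
Cite this article: YUAN Yi, WANG Xin-fang. A Variance-Based Text Feature Selection Algorithm[J]. Computer Engineering, 2012, 38(12): 155-157.
Authors: YUAN Yi (袁轶), WANG Xin-fang (王新房)
Affiliation: School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048
Abstract: In Chinese text classification, traditional feature selection algorithms classify poorly when the feature dimension is low. To address this, an evaluation function incorporating the idea of variance is proposed. It selects terms that carry strong category information, improving the classification precision of rare categories while keeping overall classification performance from degrading. Experiments on the TanCorpV1.0 corpus with a centroid-vector classifier show that the method has a clear advantage in low-dimensional spaces: compared with nine common feature selection algorithms, including document frequency and information gain, it achieves considerably higher macro-averaged values in every case.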

Keywords: text classification; feature selection; variance; category information; macro-average
Received: 2011-07-15
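The abstract does not state the evaluation function itself, so the following is only a minimal sketch of one way a variance criterion can rank terms by category information, not the authors' formula. It assumes (hypothetically) that a term's score is the population variance, across categories, of its per-class document rate, i.e. the fraction of a category's documents that contain the term. A term spread evenly over all categories scores 0, while a term concentrated in a few categories scores high; dividing by category size is intended to keep small (rare) categories from being dominated by large ones. The names `variance_scores` and `select_top_k` are illustrative.

```python
from collections import defaultdict
from statistics import pvariance


def variance_scores(docs, labels):
    """Hypothetical variance criterion: for each term, the population variance
    of rate(t, c) = (#docs of class c containing t) / (#docs of class c)."""
    class_sizes = defaultdict(int)
    df = defaultdict(lambda: defaultdict(int))  # term -> class -> document count
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):  # document frequency: count each term once per doc
            df[term][label] += 1
    classes = sorted(class_sizes)
    scores = {}
    for term, per_class in df.items():
        # Missing classes read as 0 because per_class is a defaultdict(int).
        rates = [per_class[c] / class_sizes[c] for c in classes]
        scores[term] = pvariance(rates)
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [["stock", "market"], ["stock", "price"],
        ["goal", "match"], ["match", "team", "market"]]
labels = ["finance", "finance", "sports", "sports"]
scores = variance_scores(docs, labels)
print(select_top_k(scores, 2))
```

In this toy corpus, "stock" (present in every finance document and no sports document) scores 0.25, while "market", which appears in half of each category's documents, scores 0 and is ranked last.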

Text Feature Selection Algorithm Based on Variance
YUAN Yi, WANG Xin-fang. Text Feature Selection Algorithm Based on Variance[J]. Computer Engineering, 2012, 38(12): 155-157.
Authors: YUAN Yi, WANG Xin-fang
Affiliation: School of Automation & Information Engineering, Xi’an University of Technology, Xi’an 710048, China
Abstract: Traditional feature selection methods perform poorly when the feature dimension is low. A new method based on variance is proposed to solve this problem. It selects words that carry class information, maintaining overall categorization accuracy while improving performance on rare classes. This paper compares the new method with traditional feature selection methods such as Document Frequency (DF), Information Gain (IG), Mutual Information (MI) and Chi-square Statistics (CHI). Rocchio is used as the evaluation classifier. Experimental results on the TanCorpV1.0 corpus show that the proposed Variance Feature Selection Method (VFSM) outperforms the traditional methods in macro-averaged F1.
Keywords: text categorization; feature selection; variance; class information; macro-averaged measures
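The evaluation setup above (a Rocchio-style centroid classifier scored by macro-averaged F1) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's experimental code: documents are assumed to be L2-normalized term-frequency vectors restricted to a chosen feature set, each class is represented by the mean of its training vectors (the negative-centroid term of full Rocchio is omitted), and a document goes to the class whose centroid has the highest cosine similarity. Macro-averaged F1 is the unweighted mean of per-class F1, so a rare class counts as much as a large one. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def to_vector(tokens, features):
    """L2-normalized term-frequency vector restricted to the selected features."""
    counts = Counter(t for t in tokens if t in features)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def train_centroids(docs, labels, features):
    """Simplified centroid classifier: each class is the mean of its document vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for tokens, label in zip(docs, labels):
        for t, w in to_vector(tokens, features).items():
            sums[label][t] += w
    return {c: {t: w / sizes[c] for t, w in sums[c].items()} for c in sizes}


def predict(tokens, centroids, features):
    """Assign the class whose centroid has the highest cosine similarity."""
    vec = to_vector(tokens, features)  # already unit length (or empty)

    def cosine(centroid):
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
        return dot / norm if norm else 0.0

    # sorted() makes tie-breaking deterministic: max keeps the first best class.
    return max(sorted(centroids), key=lambda c: cosine(centroids[c]))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


docs = [["stock", "market"], ["stock", "price"],
        ["goal", "match"], ["match", "team", "market"]]
labels = ["finance", "finance", "sports", "sports"]
features = {"stock", "price", "goal", "match", "team"}
centroids = train_centroids(docs, labels, features)
print(predict(["stock", "price"], centroids, features))
```

In a low-dimensional comparison, `features` would be the top-k terms chosen by whichever selection criterion is under test, with k varied and `macro_f1` computed on held-out predictions.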
This article is indexed in CNKI, VIP, Wanfang Data and other databases.