
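A feature weighting algorithm based on class variance
Cite this article: Zhou Pengcheng, Liu Xumin, Xu Weixiang. A feature weighting algorithm based on class variance[J]. Application Research of Computers, 2018, 35(12)
Authors: Zhou Pengcheng, Liu Xumin, Xu Weixiang
Affiliations: Information Engineering College, Capital Normal University; Information Engineering College, Capital Normal University; School of Traffic and Transportation, Beijing Jiaotong University
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Abstract: Feature weight computation is the foundation of text classification. Traditional probability-based feature weighting algorithms typically rely only on statistics such as term frequency, inverse document frequency and inverse class frequency, and ignore the relationships among classes. In multi-class problems, however, these relationships carry important statistical information. To address this shortcoming, this paper proposes a feature weighting algorithm based on class variance, which measures the relationship among classes by computing the variance of the per-class document frequencies. Classification experiments with five feature weighting algorithms were carried out on the Sogou news dataset. The results show that, compared with the other four algorithms, the proposed algorithm achieves considerably higher macro-averaged and micro-averaged F1, improving text classification performance.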
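
The abstract does not give the weighting formula, so the following is only a minimal sketch of the underlying idea: the variance of a term's document frequency across classes. Normalising the frequency by class size, using the population variance, and the function names `class_document_frequencies` and `class_variance` are assumptions made for illustration, not the authors' definitions. How the variance is combined with term frequency into a final weight is not stated on this page and is left out.

```python
from collections import Counter

def class_document_frequencies(docs, labels, term):
    """Fraction of documents in each class that contain `term` (assumed normalisation)."""
    totals = Counter(labels)          # number of documents per class
    hits = Counter()                  # documents per class containing the term
    for tokens, label in zip(docs, labels):
        if term in tokens:
            hits[label] += 1
    return {c: hits[c] / totals[c] for c in totals}

def class_variance(docs, labels, term):
    """Population variance of the per-class document frequencies of `term`."""
    freqs = list(class_document_frequencies(docs, labels, term).values())
    mean = sum(freqs) / len(freqs)
    return sum((f - mean) ** 2 for f in freqs) / len(freqs)

# Toy corpus (hypothetical data): each document is a list of tokens.
docs = [
    ["football", "match", "goal"],
    ["football", "team", "coach"],
    ["stock", "market", "match"],
    ["stock", "bank", "rate"],
]
labels = ["sports", "sports", "finance", "finance"]

print(class_variance(docs, labels, "football"))  # term concentrated in one class
print(class_variance(docs, labels, "match"))     # term spread evenly over classes
```

In this toy corpus a term that occurs in only one class gets a larger variance than a term that occurs equally often in every class, which is the kind of between-class signal the abstract says plain frequency statistics miss.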

Keywords: feature weighting; class variance; text classification; support vector machine
Received: 2017-08-09
Revised: 2018-11-04

A feature weighting algorithm based on class variance
Zhou Pengcheng, Liu Xumin, Xu Weixiang. A feature weighting algorithm based on class variance[J]. Application Research of Computers, 2018, 35(12)
Authors: Zhou Pengcheng, Liu Xumin, Xu Weixiang
Abstract: Feature weight computation is the basis of text categorization. Traditional probability-based feature weighting algorithms usually count only term frequency, inverse document frequency and inverse class frequency, and ignore the relationships among classes. In multi-class classification problems, however, these relationships are statistically important. To address this shortcoming, this paper proposes a feature weighting algorithm based on class variance, which measures the relationship among classes through the variance of the class document frequencies. Classification experiments with five feature weighting algorithms were conducted on the Sogou news dataset. The results show that, compared with the other four feature weighting algorithms, the proposed algorithm substantially improves both macro-averaged F1 and micro-averaged F1, and thus improves text categorization performance.
Keywords: feature weighting; class variance; text classification; SVM
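
The two evaluation measures named in the abstract differ in how they average over classes. The sketch below uses the standard definitions (macro: mean of per-class F1; micro: F1 over pooled counts) and is written for illustration only; the helper names `f1` and `macro_micro_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Return (macro-averaged F1, micro-averaged F1) for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(*c) for c in counts) / len(counts)
    # Micro: pool the counts over all classes first, so every document weighs the same.
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    return macro, micro

# Toy labels (hypothetical data).
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(macro, micro)
```

Macro-averaged F1 is sensitive to performance on small classes, while micro-averaged F1 is dominated by the large ones, which is why reporting both gives a fuller picture on a multi-class news dataset.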