
Feature selection combining new document frequency with binary discernibility matrix
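Cite this article: MA Chun-hua, ZHU Hao-dong, ZHONG Yong. Feature selection combining new document frequency with binary discernibility matrix[J]. Journal of Computer Applications, 2009, 29(8): 2268-2271.
Authors: MA Chun-hua, ZHU Hao-dong, ZHONG Yong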
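Affiliations: 1. Suihua College; 2. Chengdu Institute of Computer Application, Chinese Academy of Sciences; 3. Graduate School of the Chinese Academy of Sciences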
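Abstract: Feature selection is a core research topic in text categorization. Several classic feature selection methods are analyzed and their shortcomings summarized. A new type of document frequency is proposed, rough set theory is introduced, and an attribute reduction algorithm based on a binary discernibility matrix is presented. Finally, this attribute reduction algorithm is combined with the new document frequency to yield a comprehensive feature selection method. The method first uses the new document frequency for preliminary feature selection, filtering out some terms, and then applies the proposed attribute reduction algorithm to eliminate redundancy. Classification experiments on news from 8 categories of People's Daily Online, with 300 documents per category, show that this feature selection method outperforms mutual information, CHI (the chi-square statistic), and information gain in classification precision and recall.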
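The abstract above describes a two-stage pipeline but does not define the "new document frequency" or give the steps of the reduction algorithm. The following is therefore only a minimal Python sketch of the general idea under stated assumptions: classic document frequency (DF) thresholding stands in for the paper's new document frequency, each document is treated as a binary term-presence vector, and the reduction step is a common greedy heuristic over a binary discernibility matrix (repeatedly take the column with the most 1s, discard the rows it covers, then prune columns that became redundant), not necessarily the authors' algorithm. The names `df_prefilter`, `binary_discernibility_matrix`, `greedy_reduct`, `select_features` and the threshold `min_df` are hypothetical.

```python
from itertools import combinations


# ASSUMPTION: classic document frequency (DF) stands in for the paper's
# "new document frequency", whose formula the abstract does not give.
def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return df


def df_prefilter(docs, min_df):
    """Stage 1: keep terms whose DF reaches the hypothetical threshold min_df."""
    df = document_frequency(docs)
    return sorted(term for term, n in df.items() if n >= min_df)


def binary_discernibility_matrix(rows, labels):
    """One 0/1 row per pair of objects with different decision labels;
    bit j is 1 when the pair differs on condition attribute j."""
    matrix = []
    for i, k in combinations(range(len(rows)), 2):
        if labels[i] != labels[k]:
            bits = [int(a != b) for a, b in zip(rows[i], rows[k])]
            if any(bits):  # all-zero rows (inconsistent pairs) cannot be discerned; skip them
                matrix.append(bits)
    return matrix


# ASSUMPTION: a generic greedy heuristic, not necessarily the authors' algorithm.
def greedy_reduct(matrix, n_attrs):
    """Stage 2: pick attributes until every matrix row has a 1 in a chosen column,
    then prune chosen attributes that have become redundant."""
    uncovered = list(matrix)
    chosen = []
    while uncovered:
        # Attribute with the most 1s among still-undiscerned pairs (ties: lowest index).
        counts = [sum(row[j] for row in uncovered) for j in range(n_attrs)]
        best = max(range(n_attrs), key=lambda j: counts[j])
        chosen.append(best)
        uncovered = [row for row in uncovered if not row[best]]
    # Backward pass: drop any attribute whose removal still discerns every pair.
    for j in list(chosen):
        rest = [c for c in chosen if c != j]
        if all(any(row[c] for c in rest) for row in matrix):
            chosen = rest
    return sorted(chosen)


def select_features(docs, labels, min_df):
    """Two-stage pipeline: DF pre-filter, then discernibility-based reduction."""
    vocab = df_prefilter(docs, min_df)
    # ASSUMPTION: each document is a binary term-presence vector over vocab.
    rows = [[int(t in present) for t in vocab] for present in map(set, docs)]
    matrix = binary_discernibility_matrix(rows, labels)
    return [vocab[j] for j in greedy_reduct(matrix, len(vocab))]


if __name__ == "__main__":
    docs = [
        ["stock", "market", "rise", "the"],
        ["stock", "price", "fall", "the"],
        ["match", "goal", "team", "the"],
        ["team", "coach", "goal", "the"],
    ]
    labels = ["finance", "finance", "sports", "sports"]
    print(select_features(docs, labels, min_df=2))  # ['goal']
```

On this toy corpus, "the" passes the DF filter because it occurs in every document, yet the reduction drops it because it never separates a finance document from a sports document; "stock" and "team" are dropped as redundant once "goal" already discerns every cross-class pair. This mirrors the division of labor the abstract describes (DF removes rare terms, reduction removes redundancy), though the greedy choice is cheap and does not guarantee a minimum-size reduct.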

Keywords: feature selection; text categorization; document frequency; binary discernibility matrix; Rough Set (RS); attribute reduction
Received: 2009-03-23
Revised: 2009-05-14

Feature selection combining new document frequency with binary discernibility matrix
MA Chun-hua, ZHU Hao-dong, ZHONG Yong. Feature selection combining new document frequency with binary discernibility matrix[J]. Journal of Computer Applications, 2009, 29(8): 2268-2271.
Authors: MA Chun-hua, ZHU Hao-dong, ZHONG Yong
Affiliation: 1. Department of Computer Science and Technology, Suihua College, Suihua, Heilongjiang 152061, China; 2. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; 3. Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
Abstract: Feature selection is a core research topic in text categorization. Several classic feature selection methods were analyzed and their deficiencies were summarized. A new document frequency was proposed, and Rough Set (RS) theory was adopted to provide an attribute reduction algorithm based on a binary discernibility matrix. Based on the attribute reduction algorithm and the new document frequency, a comprehensive feature selection method was given. The comprehensive method first used the new document frequen...
Keywords: feature selection; text categorization; document frequency; binary discernibility matrix; Rough Set (RS); attribute reduction
This article is indexed in CNKI, Wanfang Data, and other databases.