Research on feature selection algorithm combined with improved CHI and RFFS
Cite this article: QIU Ningjia, ZHOU Wen, WANG Peng, TAO Yue. Research on feature selection algorithm combined with improved CHI and RFFS[J]. Computer Engineering and Applications, 2018, 54(21): 133-140.
Authors: QIU Ningjia  ZHOU Wen  WANG Peng  TAO Yue
Affiliation: College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Abstract: The traditional CHI algorithm ignores the term frequency of feature words, so important feature words are easily missed during selection. To address this problem, and to combine the speed of Filter-type feature selection with the accuracy of Wrapper-type methods, a feature selection algorithm is proposed that pairs an improved CHI algorithm (TDF-CHI) with Random Forest Feature Selection (RFFS). First, TDF-CHI selects features by computing how strongly both the document frequency and the term frequency of each feature word correlate with the category, which removes redundant features. RFFS then measures the importance of the remaining features and performs a second round of selection, optimizing the feature set so that classifier performance improves further. To verify the improved algorithm, it is tested on news text data with commonly used classifiers. The experiments show that the feature words selected by the improved algorithm give better classification results than those selected by the traditional CHI algorithm, raising both the accuracy and the recall of the classifiers.
Keywords: feature selection  TDF-CHI  Random Forest Feature Selection (RFFS)  text classification
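
The limitation that motivates the paper can be illustrated with a minimal sketch of the classical CHI statistic, chi2(t, c) = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), where A, B, C and D count documents by whether they contain the term and whether they belong to the category. The abstract does not give the TDF-CHI formula or the RFFS settings, so neither is reproduced here; the function names and the toy corpus are illustrative assumptions, not the authors' code.

```python
def chi_square(docs, labels, term, category):
    """Classical CHI score of `term` for `category`.

    Only document-level presence is counted, so how often the term
    occurs inside a document has no effect on the score.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_category = label == category
        if present and in_category:
            a += 1  # term present, document in category
        elif present:
            b += 1  # term present, document in another category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest CHI score over all categories."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    categories = sorted(set(labels))
    scores = {
        t: max(chi_square(docs, labels, t, c) for c in categories)
        for t in vocabulary
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(vocabulary, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy corpus: tokenized documents with one label each.
docs = [
    ["goal", "goal", "goal", "match"],
    ["goal", "team"],
    ["stock", "market"],
    ["stock", "stock", "stock", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

selected, scores = select_top_k(docs, labels, k=2)
print(selected, scores)
```

Removing the repeated tokens from each document leaves every score unchanged, which is the blind spot the abstract attributes to traditional CHI. In the proposed method, the terms that survive this kind of filter stage would then be ranked a second time by random forest importance.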