
Interaction based algorithm for feature selection in text categorization
Cite this article: TANG Xiaochuan, QIU Xiwei, LUO Liang. Interaction based algorithm for feature selection in text categorization[J]. Journal of Computer Applications, 2018, 38(7): 1857-1861. DOI: 10.11772/j.issn.1001-9081.2018010114
Authors: TANG Xiaochuan, QIU Xiwei, LUO Liang
Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
Funding: National Natural Science Foundation of China (61602094).
Abstract: To address feature selection in text categorization, a feature selection algorithm that accounts for interactions among features, called Max-Interaction, was proposed. Firstly, an information-theoretic feature selection model for text categorization was established based on Joint Mutual Information (JMI). Secondly, the assumptions made by existing feature selection algorithms were relaxed, and the feature selection problem was transformed into an interaction optimization problem. Thirdly, a max-min method was employed to avoid overestimating higher-order interactions. Finally, a text categorization feature selection algorithm based on sequential forward search and higher-order interactions was proposed. In the comparison experiments, Max-Interaction improved average classification accuracy by 5.5% over Interaction Weight Feature Selection (IWFS) and by 6% over Chi-square, and it achieved higher classification accuracy than the compared methods in 93% of the experiments. Therefore, Max-Interaction can effectively exploit feature interactions to improve the performance of feature selection in text categorization.

Keywords: feature selection; text categorization; interaction; Mutual Information (MI); information measure
Received: 2018-01-16
Revised: 2018-02-28
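
The abstract names the ingredients of the method (joint mutual information, a max-min rule, sequential forward search) but does not give the scoring formula, so the following is only a minimal sketch of that general family and not the authors' Max-Interaction algorithm. It assumes discrete features, plug-in entropy estimates, and a score equal to the minimum pairwise joint mutual information between a candidate, each already selected feature, and the class. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def mutual_info(xs, y):
    """I(X1,...,Xk; Y), where xs is a list of feature columns."""
    return entropy(*xs) + entropy(y) - entropy(*xs, y)


def interaction(a, b, y):
    """Three-way interaction: I(A,B;Y) - I(A;Y) - I(B;Y)."""
    return mutual_info([a, b], y) - mutual_info([a], y) - mutual_info([b], y)


def forward_select(features, y, k):
    """Greedy forward search with a max-min joint-MI score (assumed criterion).

    features maps a feature name to its column of discrete values.
    """
    remaining = dict(features)
    # Seed with the feature that has the highest individual MI with the class.
    first = max(remaining, key=lambda f: mutual_info([remaining[f]], y))
    selected = [first]
    del remaining[first]
    while remaining and len(selected) < k:
        # Score a candidate by its weakest pairing with the selected set,
        # which keeps one strong pairing from inflating the estimate.
        def score(f):
            return min(mutual_info([features[f], features[s]], y) for s in selected)

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data: the class is the XOR of a and b; d is unrelated to the class.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [u ^ v for u, v in zip(a, b)]

xor_interaction = interaction(a, b, y)
chosen = forward_select({"a": a, "b": b, "d": d}, y, k=2)
print(xor_interaction, chosen)
```

In this toy data every feature has zero mutual information with the class on its own, so a purely univariate filter cannot rank them. The seed therefore falls to the first feature in dictionary order, and the pairwise joint score is what then separates `b` (which determines the class together with `a`) from the irrelevant `d`.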