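An improved text feature selection method based on mutual information
Cite this article: LIU Haifeng (刘海峰), CHEN Qi (陈琦), ZHANG Yihao (张以皓). An improved text feature selection method based on mutual information [J]. Computer Engineering and Applications (计算机工程与应用), 2012, 48(25): 1-4,97
Authors: LIU Haifeng (刘海峰)  CHEN Qi (陈琦)  ZHANG Yihao (张以皓)
Affiliations: 1. Institute of Sciences, PLA University of Science and Technology, Nanjing 210007
2. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007
Funding: National Natural Science Foundation of China (No. 71071161)
Abstract: An optimized mutual information (MI) method for text feature selection is proposed. The shortcomings of the MI model are addressed in three ways: a weight factor distinguishes positively correlated features from negatively correlated ones; term-frequency information is introduced into MI as a correction factor to suppress low-frequency words; and features are weighted by position to account for where a feature term occurs in the text. The method improves the feature selection efficiency of the MI model. Text classification experiments confirm that the proposed optimized MI feature selection method is sound and effective.

Keywords: text classification  feature selection  mutual information  feature dimensionality reduction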

Improved mutual information method of feature selection in text categorization
LIU Haifeng, CHEN Qi, ZHANG Yihao. Improved mutual information method of feature selection in text categorization[J]. Computer Engineering and Applications, 2012, 48(25): 1-4,97
Authors: LIU Haifeng    CHEN Qi    ZHANG Yihao
Affiliation: 1. Institute of Sciences, PLA University of Science and Technology, Nanjing 210007, China; 2. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, China
Abstract: This paper proposes an optimized Mutual Information (MI) method for text feature selection. It addresses the deficiencies of MI in three ways. Weight factors distinguish positively correlated features from negatively correlated ones. A correction factor introduces term-frequency information to suppress low-frequency words. Features are further weighted according to their position in the text. Together these changes improve the feature selection efficiency of the MI model. Text classification experiments show that the proposed optimized MI method is sound and effective.
Keywords: Text Categorization (TC)  feature selection  Mutual Information (MI)  feature reduction
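
The abstract names the improvements but gives no formulas, so the following Python sketch is only illustrative. `pointwise_mi` is the standard MI score between a term and a class, computed from document counts. `adjusted_score`, its `rel_freq` and `neg_weight` parameters, and the toy counts are assumptions made for this example and are not the paper's formulas. The sketch shows why plain MI favors rare terms and how a frequency factor could counteract that; position-based weighting is not sketched.

```python
import math


def pointwise_mi(n_tc, n_t, n_c, n):
    """Standard MI(t, c) = log(P(t, c) / (P(t) * P(c))) from document counts.

    n_tc: documents in class c that contain term t
    n_t:  documents that contain term t
    n_c:  documents in class c
    n:    total number of documents
    """
    if n_tc == 0:
        return float("-inf")  # t never co-occurs with c
    return math.log((n_tc * n) / (n_t * n_c))


def adjusted_score(mi, rel_freq, neg_weight=0.5):
    """Hypothetical adjustment, NOT the paper's formula.

    rel_freq:   assumed frequency factor in [0, 1]; small for rare terms
    neg_weight: assumed down-weighting of negatively correlated terms (mi < 0)
    """
    if mi == float("-inf"):
        return 0.0
    sign_weight = 1.0 if mi >= 0 else neg_weight
    return sign_weight * abs(mi) * rel_freq


# Toy corpus (made-up counts): 1000 documents, 100 of them in class c.
rare = pointwise_mi(n_tc=1, n_t=1, n_c=100, n=1000)       # term seen in one document
common = pointwise_mi(n_tc=80, n_t=200, n_c=100, n=1000)  # term seen in 200 documents
print(rare > common)  # plain MI ranks the rare term higher

rare_adj = adjusted_score(rare, rel_freq=0.01)
common_adj = adjusted_score(common, rel_freq=0.8)
print(rare_adj < common_adj)  # the assumed frequency factor reverses the ranking
```

In this toy case the rare term scores log(10) under plain MI while the more informative common term scores only log(4), which is the low-frequency bias the abstract refers to.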
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.