
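Comparison and Improvement of Feature Selection Methods in Text Categorization
Cite this article: SHAN Li-li, LIU Bing-quan, SUN Cheng-jie. Comparison and improvement of feature selection methods in text categorization[J]. Journal of Harbin Institute of Technology, 2011, 0(Z1): 319-324
Authors: SHAN Li-li, LIU Bing-quan, SUN Cheng-jie
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology
Funding: Supported by the National Natural Science Foundation of China (61073127)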
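Abstract: To select effective classification features for a text categorization system oriented to the tourism domain, and thereby improve its classification performance, this paper re-evaluates the commonly used feature selection methods through comprehensive experiments that take into account the training set, training process and classification algorithm used by the system. Five common feature selection methods are compared. For the three functions that performed best, namely expected cross entropy, information gain and mutual information, a different improvement is proposed for each on the basis of theoretical analysis and experiments. The experimental results show that, in this application, the improved expected cross entropy method is the most effective at improving the system's classification performance.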

Keywords: text categorization; feature selection; expected cross entropy

Comparison and Improvement of feature selection method for text categorization
SHAN Li-li,LIU Bing-quan,SUN Cheng-jie. Comparison and Improvement of feature selection method for text categorization[J]. Journal of Harbin Institute of Technology, 2011, 0(Z1): 319-324
Authors: SHAN Li-li, LIU Bing-quan, SUN Cheng-jie
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract: Feature selection strongly affects the performance of text categorization systems. In this paper, we measured the effect of several popular feature selection methods on the performance of a text categorization system oriented to the tourism domain. Of the five methods compared, the three that performed best were selected: expected cross entropy, information gain and mutual information. Based on theoretical analysis and experiments, we proposed a separate modification for each of the three. Experimental results showed that, in our application, the modified expected cross entropy method yielded better performance than the others.
Keywords: Text categorization; Feature selection; Expected cross entropy
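The abstract names the three scoring functions but does not give their formulas. As a rough reference point, here is a minimal Python sketch of the standard, unmodified textbook forms of expected cross entropy, information gain and mutual information, computed from document frequencies over a labeled corpus. The paper's improved variants are not described in the abstract and are not reproduced here. The function names, the document representation, the use of natural logarithms, the max-over-categories aggregation for mutual information and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

# Assumption: a document is modelled as (set of terms it contains, category label).
Doc = Tuple[Set[str], str]


def _plogp(p: float) -> float:
    """p * ln(p), using the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def _counts(docs: List[Doc], term: str):
    """Return N, docs per category, docs per category containing term, docs containing term."""
    n = len(docs)
    per_cat = Counter(cat for _, cat in docs)
    with_term = Counter(cat for terms, cat in docs if term in terms)
    return n, per_cat, with_term, sum(with_term.values())


def expected_cross_entropy(docs: List[Doc], term: str) -> float:
    """ECE(t) = P(t) * sum_c P(c|t) * ln(P(c|t) / P(c))."""
    n, per_cat, with_term, n_t = _counts(docs, term)
    if n_t == 0:
        return 0.0
    total = 0.0
    for cat, n_c in per_cat.items():
        p_c_given_t = with_term[cat] / n_t
        if p_c_given_t > 0:
            total += p_c_given_t * math.log(p_c_given_t / (n_c / n))
    return (n_t / n) * total


def information_gain(docs: List[Doc], term: str) -> float:
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    n, per_cat, with_term, n_t = _counts(docs, term)
    n_not_t = n - n_t
    gain = -sum(_plogp(n_c / n) for n_c in per_cat.values())  # H(C)
    if n_t > 0:
        gain += (n_t / n) * sum(_plogp(with_term[c] / n_t) for c in per_cat)
    if n_not_t > 0:
        gain += (n_not_t / n) * sum(
            _plogp((per_cat[c] - with_term[c]) / n_not_t) for c in per_cat
        )
    return gain


def mutual_information(docs: List[Doc], term: str, cat: str) -> float:
    """MI(t, c) = ln(P(t, c) / (P(t) P(c))); -inf when t never occurs in c."""
    n, per_cat, with_term, n_t = _counts(docs, term)
    a = with_term[cat]
    if a == 0:
        return float("-inf")
    return math.log(a * n / (n_t * per_cat[cat]))


def mutual_information_max(docs: List[Doc], term: str) -> float:
    """Global MI score (assumed aggregation): maximum of MI(t, c) over categories."""
    cats = {cat for _, cat in docs}
    return max(mutual_information(docs, term, c) for c in cats)


# Hypothetical toy corpus for illustration only (not data from the paper).
TOY_DOCS: List[Doc] = [
    ({"beach", "hotel"}, "travel"),
    ({"beach", "sun"}, "travel"),
    ({"stock", "market"}, "finance"),
    ({"market", "hotel"}, "finance"),
]

if __name__ == "__main__":
    vocab = sorted(set().union(*(terms for terms, _ in TOY_DOCS)))
    ranked = sorted(vocab, key=lambda t: expected_cross_entropy(TOY_DOCS, t), reverse=True)
    for t in ranked:
        print(f"{t:8s} ECE={expected_cross_entropy(TOY_DOCS, t):.3f} "
              f"IG={information_gain(TOY_DOCS, t):.3f} "
              f"MI_max={mutual_information_max(TOY_DOCS, t):.3f}")
```

On this hypothetical corpus, `beach` and `market` score highest under expected cross entropy and information gain, because each is confined to one category and appears in both of its documents. `hotel` is split evenly across the two categories and scores zero under all three measures. Mutual information gives the rarer `sun` and `stock` the same maximum score as `beach`, which illustrates its well-known bias toward low-frequency terms.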
This article is indexed in CNKI and other databases.