
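Feature Selection Method for Text Category Based on Independence Theory
Cite this article: FENG Xia, LIU Zhi-hui, TIAN Ji-cun. Feature Selection Method for Text Category Based on Independence Theory[J]. Computer Engineering, 2010, 36(12): 22-24.
Authors: FENG Xia, LIU Zhi-hui, TIAN Ji-cun
Affiliation: School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300
Funding: National Natural Science Foundation of China (60776806, 60672174); Civil Aviation University of China Doctoral Start-up Fund (06qd08s)
Abstract: The degree of independence between a feature and each document class in a text collection reflects how representative the feature is, and feature selection for text classification is the process of choosing highly representative features that can improve classification performance. Based on this principle, two new feature selection methods for text classification, DHChi2 and EIBA, are proposed, and the two methods are combined in a reasonable way. Experimental results show that applying independence theory to feature selection for text classification helps improve classification performance.

Keywords: feature selection; text classification; hypothesis testing; independence theory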

Feature Selection Method for Text Category Based on Independence Theory
FENG Xia, LIU Zhi-hui, TIAN Ji-cun. Feature Selection Method for Text Category Based on Independence Theory[J]. Computer Engineering, 2010, 36(12): 22-24.
Authors: FENG Xia, LIU Zhi-hui, TIAN Ji-cun
Affiliation: School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300
Abstract: The degree of independence between a feature and each document category in a text collection reflects how representative the feature is, and feature selection for text categorization is the process of choosing highly representative features that improve categorization performance. Based on this principle, this paper proposes two new feature selection methods, DHChi2 and EIBA, and combines the two in a reasonable way. Experimental results show that applying independence theory to feature selection for text categorization helps improve categorization performance.
Keywords: feature selection; text category; hypothesis test; independence theory
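
The abstract does not define DHChi2 or EIBA, so the following is only a minimal sketch of the classical chi-square test of independence between a term and a document category, which is the general idea the abstract builds on. It is not the authors' method. The function names `chi2_term_class` and `select_features`, the 2x2 count layout, and the "take the maximum score over categories" ranking rule are all assumptions made for illustration.

```python
def chi2_term_class(a, b, c, d):
    """Classical chi-square statistic for a 2x2 term/category table.

    Assumed layout (hypothetical, not taken from the paper):
      a: documents in the category that contain the term
      b: documents outside the category that contain the term
      c: documents in the category that lack the term
      d: documents outside the category that lack the term
    A larger value means the term and the category are less independent.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(counts, k):
    """Keep the k terms with the highest per-category chi-square score.

    counts maps term -> {category: (a, b, c, d)}. Scoring a term by its
    maximum over categories is one common convention, assumed here.
    """
    scores = {
        term: max(chi2_term_class(*cell) for cell in per_category.values())
        for term, per_category in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, a term that appears in every document of a category and nowhere else scores far above a term spread evenly across categories, so `select_features` would rank it first. Under the usual hypothesis-testing reading, a score above the chi-square critical value for one degree of freedom leads to rejecting independence between the term and the category.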
This article is indexed in VIP, Wanfang Data, and other databases.