Hypothesis Test-based Feature Selection for Text Categorization
Cite this article: FENG Xia, LIU Zhihui, TIAN Jicun. Hypothesis Test-based Feature Selection for Text Categorization[J]. Information and Control, 2011, 40(3). DOI: 10.3724/SP.J.1219.2011.00273
Authors: FENG Xia, LIU Zhihui, TIAN Jicun
Affiliation: School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Funding: National Natural Science Foundation of China; Civil Aviation University of China Doctoral Program Start-up Fund
Abstract: In a T-C (term-category) two-way four-fold contingency table, a feature and a document category are mutually independent if and only if they are uncorrelated. Building on this equivalence, the paper applies two novel hypothesis tests of independence to measure the degree of correlation between features and document categories, and selects from the feature space of the text collection a feature subset that is highly representative of document content for use in text categorization. Experimental results show that applying hypothesis testing to feature selection for text categorization helps improve classification performance.

Keywords: feature selection; hypothesis test; text categorization; T-C two-way four-fold contingency table
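
As a rough illustration of how a term can be scored against a category from a T-C four-fold table, the sketch below uses Pearson's chi-square test of independence. This is a stand-in only: the abstract does not name the two tests the authors propose, so nothing here should be read as their method. The cell names `a`, `b`, `c`, `d`, the function names, and the example counts are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed (hypothetical) layout of one T-C four-fold table:
#                     in category   not in category
#   term present           a               b
#   term absent            c               d
Table = Tuple[int, int, int, int]


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (statistic, p-value) for Pearson's chi-square test on a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero row or column total carries no evidence against independence.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def select_features(tables: Dict[str, Table], k: int) -> List[str]:
    """Keep the k terms whose tables give the largest test statistic."""
    ranked = sorted(tables, key=lambda t: chi_square_2x2(*tables[t])[0], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up document counts for three terms against one category.
    example = {
        "airport": (40, 10, 10, 40),
        "the": (25, 25, 25, 25),
        "runway": (30, 5, 20, 45),
    }
    for term, cells in example.items():
        stat, p = chi_square_2x2(*cells)
        print(f"{term}: chi2={stat:.2f}, p={p:.3g}")
    print(select_features(example, k=2))
```

A larger statistic (smaller p-value) is stronger evidence against independence, so terms are ranked by the statistic. A term spread evenly across both categories, like "the" in the made-up counts, scores zero and is dropped first.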
This article is indexed in CNKI, Wanfang Data, and other databases.