
Improved CHI text feature selection based on word frequency information
Citation: LIU Haifeng, SU Zhan, LIU Shousheng. Improved CHI text feature selection based on word frequency information[J]. Computer Engineering and Applications, 2013, 49(22): 110-114.
Authors: LIU Haifeng  SU Zhan  LIU Shousheng
Affiliation: Institute of Sciences, PLA University of Science and Technology, Nanjing 210007, China
Funding: National Natural Science Foundation of China (No.71071161, No.61273209); Natural Science Foundation of Jiangsu Province (No.BK2012511).
Abstract: CHI is a commonly used text feature selection method. To address the model's shortcomings, the CHI model is optimized step by step on the basis of feature term frequency, considering in turn the distribution of a feature within a class, its distribution between classes, and its distribution across different texts of the same class, so that term frequency information is used effectively. An improved CHI model based on word frequency information is proposed. Subsequent text categorization experiments demonstrate the effectiveness of the optimized CHI model.

Keywords: text categorization  feature selection  χ² statistic  distribution within class  distribution between class
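
For orientation, the baseline that the abstract starts from can be sketched as follows. This is the standard textbook CHI (chi-square) score computed from document counts, not the improved model proposed in the paper, whose formulas are not given on this page. The function name `chi_square`, the toy corpus, and the class labels are all hypothetical and chosen only for illustration.

```python
def chi_square(docs, labels, term, cls):
    """Standard CHI score of `term` for class `cls`.

    Uses the usual 2x2 document-count table:
      a = documents in cls that contain term
      b = documents outside cls that contain term
      c = documents in cls that lack term
      d = documents outside cls that lack term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens      # presence only; repeats are ignored
        in_cls = label == cls
        if present and in_cls:
            a += 1
        elif present:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero denominator means the term or the class is degenerate here.
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["goal", "goal", "match"],
    ["goal", "team"],
    ["stock", "market"],
    ["market", "goal"],
]
labels = ["sports", "sports", "finance", "finance"]

score_market = chi_square(docs, labels, "market", "finance")
score_goal = chi_square(docs, labels, "goal", "sports")

# Repeating a term inside one document leaves the score unchanged.
docs_repeated = [["goal"] * 10 + ["match"]] + docs[1:]
score_goal_repeated = chi_square(docs_repeated, labels, "goal", "sports")
```

In this sketch the score depends only on whether a term occurs in a document, so `score_goal` and `score_goal_repeated` are equal. That insensitivity to how often a term occurs is plausibly the kind of shortcoming that a term-frequency-based refinement addresses, though the abstract does not spell out which shortcomings it targets.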