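A Text Feature Selection Method Based on Improved Mutual Information and Information Entropy
Cite this article: CHENG Wei-qing, TANG Xuan. A Text Feature Selection Method Based on Improved Mutual Information and Information Entropy[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2013(5): 63-68.
Authors: CHENG Wei-qing, TANG Xuan
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
Funding: Supported by the National Natural Science Foundation of China (61170322, 71171117) and the Natural Science Foundation of Jiangsu Province (BK2010524)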
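Abstract: Mutual information is a commonly used evaluation function for feature selection, but studies have shown that it yields relatively low classification accuracy. To address the tendency of mutual information to favor low-frequency words, this paper proposes a new feature evaluation function, TFMIIE, which combines information entropy with an improved mutual information measure: the improved mutual information avoids the bias toward rare, low-frequency words, while feature entropy helps remove feature words whose class membership is uncertain. Experimental results show that when TFMIIE is used for feature selection and the resulting feature subset is used to represent texts and build classifiers, the precision and recall of text classification improve by about 40% over the mutual-information-based method, which confirms the effectiveness of the proposed text feature selection method based on improved mutual information and information entropy.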

Keywords: feature selection; text classification; evaluation function; mutual information; information entropy

A Text Feature Selection Method Using the Improved Mutual Information and Information Entropy
CHENG Wei-qing, TANG Xuan. A Text Feature Selection Method Using the Improved Mutual Information and Information Entropy[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2013(5): 63-68.
Authors: CHENG Wei-qing, TANG Xuan
Affiliation: School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract: Mutual information is a commonly used evaluation function for feature selection, but research has shown that it can lead to low classification accuracy. To overcome the tendency of mutual information to select low-frequency words, this paper proposes a new feature evaluation function, TFMIIE, which combines information entropy with an improved mutual information. The improved mutual information avoids selecting rare, low-frequency words, and the feature entropy helps remove feature words whose class membership is unclear. Experimental results show that selecting features with TFMIIE, and using them to represent text and build classifiers, raises the precision and recall of text classification by about 40% relative to mutual information, which validates the proposed text feature selection method using the improved mutual information and the information entropy.
Keywords:feature selection  text classification  evaluation function  mutual information  information entropy
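
The abstract does not state the TFMIIE formula, so the sketch below does not attempt to reproduce it. It only illustrates the two standard ingredients the abstract names, under stated assumptions: textbook pointwise mutual information between a term and a class, estimated from document counts, and the entropy of the class distribution among documents containing a term. The function names, the toy corpus, and the class labels are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def pointwise_mi(docs, labels, term, cls):
    """Textbook MI(term, cls) = log(P(term, cls) / (P(term) * P(cls))),
    estimated from document counts. This is NOT the paper's improved measure."""
    n = len(docs)
    n_cls = sum(1 for y in labels if y == cls)
    n_term = sum(1 for d in docs if term in d)
    n_both = sum(1 for d, y in zip(docs, labels) if y == cls and term in d)
    if n_both == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log((n_both * n) / (n_term * n_cls))


def class_entropy_given_term(docs, labels, term):
    """Entropy (in bits) of the class distribution over documents containing term.
    Low values mean the term is concentrated in few classes."""
    counts = Counter(y for d, y in zip(docs, labels) if term in d)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


# Hypothetical toy corpus: each document is a set of tokens.
sports = ([{"match", "report"} for _ in range(5)]
          + [{"match"} for _ in range(4)]
          + [{"zugzwang"}])                      # a one-off rare word
finance = ([{"market", "report"} for _ in range(5)]
           + [{"market"} for _ in range(4)]
           + [{"market", "match"}])
docs = sports + finance
labels = ["sports"] * 10 + ["finance"] * 10

mi_rare = pointwise_mi(docs, labels, "zugzwang", "sports")   # 1 document
mi_common = pointwise_mi(docs, labels, "match", "sports")    # 9 of 10 sports documents
h_report = class_entropy_given_term(docs, labels, "report")  # evenly split across classes
h_match = class_entropy_given_term(docs, labels, "match")    # mostly one class
```

In this toy corpus the word that occurs in a single document receives a higher mutual information score than the word that covers nine of the ten documents in the class, which is the low-frequency bias the abstract describes. The evenly split word "report" has the maximum two-class entropy of 1 bit, so an entropy term can flag it as uninformative about the class.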
This article is indexed in VIP (维普) and other databases.