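Feature Selection Based on Term Weight for Text Categorization
Cite this article: WANG Hai-juan, HAN Li-xin, ZHEN Zhi-long. Feature selection based on term weight for text categorization [J]. Computer Engineering and Design, 2010, 31(5).
Authors: WANG Hai-juan  HAN Li-xin  ZHEN Zhi-long
Affiliations: 1. Department of Mathematics, Tonghua Normal College, Tonghua, Jilin 134002
2. College of Computer and Information Engineering, Hohai University, Nanjing, Jiangsu 210098
3. Department of Computer Science, Tonghua Normal College, Tonghua, Jilin 134002
Funding: National Natural Science Foundation of China
Abstract: To improve the efficiency and effectiveness of text categorization and to reduce computational complexity, a weighted text feature selection method is proposed after an analysis of the classical feature selection methods. The method uses not only the number of documents in the dataset but also the weight of each index term, and constructs new evaluation functions that improve on information gain, expected cross entropy, and weight of evidence for text. A KNN classifier is trained and tested on the Reuters-21578 standard dataset. The experimental results show that the method selects effective features and improves text categorization performance.

Keywords: text categorization  feature selection  term weight  information gain  expected cross entropy  weight of evidence for text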

Feature selection based on term weight for text categorization
WANG Hai-juan,HAN Li-xin,ZHEN Zhi-long.Feature selection based on term weight for text categorization[J].Computer Engineering and Design,2010,31(5).
Authors:WANG Hai-juan  HAN Li-xin  ZHEN Zhi-long
Affiliation: WANG Hai-juan1, HAN Li-xin2, ZHEN Zhi-long3 (1. Department of Mathematics, Tonghua Normal College, Tonghua 134002, China; 2. College of Computer and Information Engineering, Hohai University, Nanjing 210098, China; 3. Department of Computer Science, Tonghua Normal College, Tonghua 134002, China)
Abstract: To improve the efficiency and effectiveness of text categorization and to reduce its computational complexity, a text feature selection method based on term weight is proposed after an analysis of the classical methods. The method uses not only the number of documents in the dataset but also the weight of each term within the text, and constructs new evaluation functions that improve on information gain, expected cross entropy, and weight of evidence for text. Using a K-nearest neighbor classifier, Reuters-21...
Keywords:text categorization  feature selection  term weight  information gain  expected cross entropy  weight of evidence for text
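
The abstract does not give the new evaluation functions, so the following is only a rough sketch of the general idea for information gain, not the authors' formula. It computes classical information gain from document counts, then a hypothetical weighted variant in which each document contributes its term weight (assumed to be normalized to the range 0 to 1) as a soft count instead of a 0/1 presence flag. The function names, the soft-count scheme, and the toy data are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def class_mass(docs, term, weighted):
    """Per class, accumulate (present, absent) mass for `term`.

    docs: list of (label, {term: weight}) pairs, weights assumed in [0, 1].
    weighted=False: classical document counts (term present or not).
    weighted=True: ASSUMED variant, the term weight acts as a soft count.
    """
    mass = {}
    for label, weights in docs:
        w = weights.get(term, 0.0)
        if not weighted:
            w = 1.0 if w > 0 else 0.0
        present, absent = mass.get(label, (0.0, 0.0))
        mass[label] = (present + w, absent + 1.0 - w)
    return mass

def information_gain(docs, term, weighted=False):
    """IG(term) = H(class) - H(class | term present/absent)."""
    mass = class_mass(docs, term, weighted)
    total = sum(p + a for p, a in mass.values())
    h_class = entropy([(p + a) / total for p, a in mass.values()])
    h_cond = 0.0
    for idx in (0, 1):  # 0: term present, 1: term absent
        side = sum(m[idx] for m in mass.values())
        if side > 0:
            h_cond += side / total * entropy([m[idx] / side for m in mass.values()])
    return h_class - h_cond

# Toy data (made up): two classes, one candidate term.
docs = [
    ("grain", {"wheat": 0.9}),
    ("grain", {"wheat": 0.8}),
    ("trade", {"wheat": 0.1}),
    ("trade", {}),
]
print(information_gain(docs, "wheat"))
print(information_gain(docs, "wheat", weighted=True))
```

In this toy case the count-based score treats the weak occurrence of the term in a "trade" document the same as the strong occurrences in the "grain" documents, while the soft-count variant discounts it. Terms would then be ranked by score and the top ones kept as features before training a classifier such as KNN.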
This article is indexed in CNKI, Wanfang Data, and other databases.