
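Improvement of Feature Weighting Algorithm in Text Classification
Cite this article: Shen Zhibin, Bai Qingyuan. Improvement of Feature Weighting Algorithm in Text Classification[J]. Journal of Nanjing Normal University, 2008, 8(4): 95-98.
Authors: Shen Zhibin, Bai Qingyuan
Affiliation: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002
Funding: Ministry of Education startup fund for scholars returning from overseas; open project fund of the Institute of Software, Chinese Academy of Sciences; Fuzhou University Science and Technology Development Fund; Fujian Provincial Department of Education fund
Abstract: TFIDF is a commonly used method for representing document feature weights. The method is simple and easy to apply, but it ignores how a feature term is distributed across the classes, so it cannot truly reflect the term's contribution to distinguishing each class. To address this shortcoming, this paper proposes BOR-TFIDF, which readjusts each feature term's discriminative power for each class, i.e., corrects each feature term's weight, and uses a classifier to verify its effectiveness. The method outperforms the original TFIDF algorithm, and the experiments show that the improved strategy is feasible.

Keywords: text classification; feature weighting; TFIDF; class discrimination; BOR-TFIDF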

Improvement of Feature Weighting Algorithm in Text Classification
Shen Zhibin, Bai Qingyuan. Improvement of Feature Weighting Algorithm in Text Classification[J]. Journal of Nanjing Nor Univ: Eng and Technol, 2008, 8(4): 95-98.
Authors: Shen Zhibin, Bai Qingyuan
Affiliation: Shen Zhibin, Bai Qingyuan (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002, China)
Abstract: TFIDF is a common method for weighting the terms in a document. The method is simple, but it ignores how a feature is distributed across the classes, so it cannot truly reflect each feature's contribution to distinguishing each class. To address this shortcoming, we propose BOR-TFIDF, which readjusts each feature's discriminative power for each class, i.e., modifies each feature's weight. A classifier is then used to check its validity. The method is better than traditional TFIDF and proves that the BOR-TFIDF method is...
Keywords: TFIDF; BOR-TFIDF
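
The limitation named in the abstract, that TFIDF ignores how a feature is distributed across classes, can be illustrated with a minimal sketch. It assumes one common TF-IDF variant (term frequency normalized by document length, natural-log IDF, no smoothing); the abstract does not state the exact formula the paper uses, and the BOR-TFIDF correction itself is not described there, so it is not reproduced here. The toy corpus and the name `tfidf` are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain TF-IDF (assumed variant): weight(t, d) = tf(t, d) * ln(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical toy corpus of (class label, tokens).
# Note that the class labels are never passed to tfidf().
corpus = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["match", "team", "coach"]),
    ("finance", ["stock", "market", "team"]),
    ("finance", ["stock", "bank", "goal"]),
]
weights = tfidf([tokens for _, tokens in corpus])

# "match" occurs only in sports documents, while "goal" occurs once in each
# class, yet both appear in two documents and receive the same weight.
print(weights[0]["match"] == weights[0]["goal"])  # True
```

In this sketch "match" is clearly the better indicator of the sports class, but because the weight depends only on document counts, it cannot be told apart from "goal". A class-aware weighting scheme such as the one the abstract proposes would aim to separate the two.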
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.