
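Improvement and Application of TFIDF Method Based on Text Classification
Citation: ZHANG Yufang, PENG Shiming, LV Jia. Improvement and Application of TFIDF Method Based on Text Classification[J]. Computer Engineering, 2006, 32(19): 76-78
Authors: ZHANG Yufang, PENG Shiming, LV Jia
Affiliations: 1. College of Computer Science, Chongqing University, Chongqing 400045
2. College of Mathematics and Computer Science, Chongqing Normal University, Chongqing 400047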
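Abstract: TFIDF is a common method for weighting the feature terms of a document. It is simple to apply, but it undervalues terms that occur frequently within a single class: because IDF falls as the number of documents containing a term rises, a term that appears in many documents of the same class receives a low weight, even though it represents that class's text features and should be weighted more heavily. This paper modifies the IDF expression in TFIDF to raise the weight of terms that occur frequently within a class. To verify the effectiveness of the change, the improved TFIDF is used to select feature terms and a genetic algorithm is used to train the classifier. The method outperforms the other algorithms compared, and the experiments show that the improved strategy is feasible.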
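The abstract does not state the modified IDF expression, so the Python sketch below only illustrates the idea under stated assumptions. It shows (a) classic TFIDF, w(t, d) = tf(t, d) * log(N / n_t), where a term spread over many documents of one class gets a low IDF, and (b) one hypothetical class-aware IDF, idf'(t, c) = log(N * m / (m + k) + 0.01), where m and k count documents inside and outside class c that contain t, used to keep the top-k feature terms per class. The function names, the formula, the 0.01 smoothing constant and the toy corpus are illustrative assumptions, not the paper's definitions; the genetic-algorithm classifier is not sketched.

```python
import math
from collections import Counter


def classic_tfidf(docs):
    """Classic TF-IDF: w(t, d) = tf(t, d) * log(N / n_t).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()  # n_t: number of documents that contain term t
    for tokens in docs:
        df.update(set(tokens))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in docs
    ]


def class_aware_idf(term, label, docs, labels):
    """HYPOTHETICAL class-aware IDF (illustrative assumption, not the paper's formula).

    m = documents of class `label` that contain `term`
    k = documents of other classes that contain `term`
    idf'(t, c) = log(N * m / (m + k) + 0.01), which grows as the term
    concentrates in class `label`.
    """
    m = sum(1 for d, y in zip(docs, labels) if y == label and term in d)
    k = sum(1 for d, y in zip(docs, labels) if y != label and term in d)
    if m == 0:
        return 0.0  # term never occurs in this class
    return math.log(len(docs) * m / (m + k) + 0.01)


def select_features(docs, labels, k_per_class):
    """Per class, keep the k terms with the largest (class tf) * idf'(t, c).

    Ties are broken alphabetically so the result is deterministic.
    """
    selected = {}
    for label in sorted(set(labels)):
        class_tf = Counter()  # term frequency summed over this class's documents
        for tokens, y in zip(docs, labels):
            if y == label:
                class_tf.update(tokens)
        scored = [
            (tf * class_aware_idf(t, label, docs, labels), t)
            for t, tf in class_tf.items()
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[label] = [t for _, t in scored[:k_per_class]]
    return selected


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],    # sports
    ["goal", "team", "coach"],    # sports
    ["goal", "score"],            # sports
    ["stock", "market", "team"],  # finance
]
labels = ["sports", "sports", "sports", "finance"]

w = classic_tfidf(docs)
print(w[0]["goal"], w[0]["team"])  # equal: both occur in 3 of 4 documents
print(class_aware_idf("goal", "sports", docs, labels))  # ~1.389, concentrated in sports
print(class_aware_idf("team", "sports", docs, labels))  # ~0.985, also occurs in finance
print(select_features(docs, labels, 1))  # {'finance': ['market'], 'sports': ['goal']}
```

In the toy corpus, classic IDF cannot tell "goal" (confined to sports) from "team" (shared with finance) because both occur in three of four documents; the class-aware variant ranks "goal" higher for the sports class, which is the kind of behaviour the abstract describes.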

Keywords: text classification; feature selection; TFIDF; class discrimination
Article ID: 1000-3428(2006)19-0076-03
Received: 2005-12-25
Revised: 2005-12-25
