
Research on Text Classification Method Based on Improved TF-IDF Algorithm

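Cite this article:	HE Ke-da, ZHU Zheng-tao, CHENG Yu. A Research on Text Classification Method Based on Improved TF-IDF Algorithm[J]. Journal of Guangdong University of Technology, 2016, 33(5): 49-53.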

Authors:	HE Ke-da, ZHU Zheng-tao, CHENG Yu

Affiliation:	School of Information Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China

Funding:	National Natural Science Foundation of China (11204043)

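Abstract:	Identifying category keywords is the first key problem in text classification. Building on the use of category keywords and the TF-IDF algorithm to classify text, this paper proposes an improved TF-IDF algorithm. First, a category keyword library is built, then expanded and deduplicated, which overcomes the inability of the vector space model to adjust weights well. Next, a document-length weight is added to correct the weights of keywords in each document, which effectively addresses the insufficient class-discrimination ability of the original feature terms. Experiments with a Bayesian classifier verify the effectiveness of the algorithm and show improved text classification accuracy.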
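The abstract names the steps (select category keywords with TF-IDF, expand and deduplicate the library, correct keyword weights by document length) but gives no formulas. The Python sketch below is one possible reading under stated assumptions, not the authors' method. The base weight is the classic raw count × log(N / df); the length factor avg_len / len(doc), the top_k cutoff, the extra_terms expansion dictionary, and the rule that drops keywords shared by several categories are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def length_corrected_tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Base weight: raw term count * log(N / df). The factor avg_len / len(doc)
    is a HYPOTHETICAL document-length correction; the abstract does not give
    the paper's formula. Documents are assumed to be non-empty token lists.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    avg_len = sum(len(doc) for doc in docs) / n_docs
    weights = []
    for doc in docs:
        length_factor = avg_len / len(doc)  # assumption: shorter docs weigh more
        weights.append({
            term: count * math.log(n_docs / df[term]) * length_factor
            for term, count in Counter(doc).items()
        })
    return weights


def build_keyword_library(docs, labels, top_k=3, extra_terms=None):
    """Build a per-category keyword list: select, expand, deduplicate.

    extra_terms is a hypothetical {label: [terms]} stand-in for whatever
    expansion source the paper uses (e.g. a synonym dictionary).
    """
    per_class = {}
    for weights, label in zip(length_corrected_tfidf(docs), labels):
        per_class.setdefault(label, Counter()).update(weights)  # sum weights per class

    library = {}
    for label, totals in per_class.items():
        keywords = [term for term, _ in totals.most_common(top_k)]  # selection
        keywords += (extra_terms or {}).get(label, [])              # expansion
        library[label] = list(dict.fromkeys(keywords))              # in-class dedup, order kept

    # Assumed cross-class dedup: a keyword shared by several categories
    # does not discriminate between them, so it is dropped everywhere.
    shared = Counter(term for kws in library.values() for term in kws)
    return {label: [t for t in kws if shared[t] == 1] for label, kws in library.items()}


docs = [["ball", "team", "score", "team"], ["team", "coach", "ball"],
        ["stock", "market", "price"], ["market", "bank", "stock", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
print(build_keyword_library(docs, labels, extra_terms={"sport": ["match", "team"]}))
# → {'sport': ['team', 'coach', 'ball', 'match'], 'finance': ['stock', 'price', 'market']}
```

Summing weights per category before taking the top terms is one simple way to turn document-level TF-IDF into category keywords; the paper may select keywords differently.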
Keywords:	keyword extraction; feature selection; text classification; preprocessing
Received:	2015-09-22
A Research on Text Classification Method Based on Improved TF-IDF Algorithm

HE Ke-da, ZHU Zheng-tao, CHENG Yu. A Research on Text Classification Method Based on Improved TF-IDF Algorithm[J]. Journal of Guangdong University of Technology, 2016, 33(5): 49-53.

Authors:	HE Ke-da, ZHU Zheng-tao, CHENG Yu

Affiliation:	School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China

Abstract:	Establishing category keywords is the first key problem to solve in text classification. Building on text classification with category keywords and the TF-IDF algorithm, an improved TF-IDF algorithm is proposed to overcome the inability of the vector space model to adjust weights well. First, a category keyword library is established, then expanded and deduplicated. Keyword weights within each document are then corrected by a document-length weight, which effectively addresses the insufficient class-discrimination ability of the original feature terms. Experiments with a Bayesian classifier verify the effectiveness of the algorithm, which improves text classification accuracy.
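The abstract says classification is done with a Bayesian method but does not specify the variant or how the corrected weights enter it. As a hedged illustration only, the sketch below assumes a multinomial naive Bayes classifier whose features are restricted to a category keyword vocabulary, using raw keyword counts with add-alpha (Laplace) smoothing. The names train_keyword_nb and classify, the alpha default, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_keyword_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes whose features are only the category keywords.

    Uses raw keyword counts with add-alpha smoothing (an assumption); how the
    paper feeds its corrected TF-IDF weights into the classifier is not stated.
    """
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        # Count only keyword tokens from documents of class c.
        counts = Counter(t for doc, lab in zip(docs, labels) if lab == c
                         for t in doc if t in vocab)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def classify(doc, log_prior, log_lik):
    """Return the class maximizing log P(c) + sum of log P(t | c) over keyword tokens."""
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)


docs = [["ball", "team", "score", "team"], ["team", "coach", "ball"],
        ["stock", "market", "price"], ["market", "bank", "stock", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
vocab = {"team", "coach", "ball", "stock", "price", "market"}
model = train_keyword_nb(docs, labels, vocab)
print(classify(["ball", "team", "win"], *model))  # → sport
```

Tokens outside the keyword vocabulary (here "win") are ignored, which is the point of building a category keyword library before classification.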

Keywords:	keyword extraction; feature selection; text classification; preprocessing
