
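Improved weighting formula in auto text classification
Cite this article: KOU Sha-sha, WEI Zhen-jun. Improved weighting formula in auto text classification[J]. Computer Engineering and Design, 2005, 26(6): 1616-1618.
Authors: KOU Sha-sha, WEI Zhen-jun
Affiliation: Department of Information Research, PLA Information Engineering University, Zhengzhou, Henan 450002
Abstract: In automatic text classification, TF-IDF is a commonly used formula for computing term weights. However, TF-IDF is an empirical formula without a solid theoretical foundation, and it does not suit every situation. Using information theory and probability, this paper proves that when the training texts all belong to a single category, the importance of a term is proportional to its document frequency. TF-IDF is modified accordingly, yielding an improved weighting formula. An experimental comparison between the improved formula and TF-IDF shows that the improved formula raises the classification accuracy of the algorithm.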

Keywords: text classification; TF-IDF; vector space; feature term; feature weight
Article ID: 1000-7024(2005)06-1616-03

Improved weighting formula in auto text classification
KOU Sha-sha,WEI Zhen-jun.Improved weighting formula in auto text classification[J].Computer Engineering and Design,2005,26(6):1616-1618.
Authors:KOU Sha-sha  WEI Zhen-jun
Abstract: In automatic text classification, TF-IDF is often used to calculate the weight of a term. However, it is only an empirical formula without a solid theoretical foundation, so it is not suitable in every situation. Using information theory and probability, it is proven that when the texts of the training set all belong to one category, the importance of a term is proportional to the document frequency of the term. TF-IDF is improved on this basis, and an improved weighting formula is obtained. In an experimental comparison with TF-IDF, the improved weighting formula achieves better classification accuracy.
Keywords:text classification  TF-IDF  vector space model  feature term  term weighting
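
The abstract's central observation can be illustrated with a small sketch. The code below computes classic TF-IDF weights (raw term count times log(N / df)) on a toy corpus whose documents all belong to one category, and contrasts them with a weight that grows with document frequency. Everything here is an assumption made for illustration: the corpus is invented, and `tf_df` is a hypothetical stand-in that simply scales the term count by df / N. It is not the improved formula from the paper, which this record does not reproduce.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def tf_idf(docs):
    """Classic TF-IDF: raw term count multiplied by log(N / df)."""
    n = len(docs)
    df = document_frequency(docs)
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def tf_df(docs):
    """Hypothetical DF-proportional weight: raw term count multiplied by df / N.

    Illustrative assumption only; NOT the formula derived in the paper.
    """
    n = len(docs)
    df = document_frequency(docs)
    return [
        {term: count * df[term] / n for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Invented toy corpus: every document belongs to the same category.
sports_docs = [
    ["match", "goal", "team"],
    ["team", "coach", "match"],
    ["match", "referee", "penalty"],
]

if __name__ == "__main__":
    for name, weights in (("TF-IDF", tf_idf(sports_docs)), ("TF*DF/N", tf_df(sports_docs))):
        print(name, {term: round(w, 3) for term, w in weights[0].items()})
```

In this corpus the term "match" occurs in every document, so its IDF factor is log(3/3) = 0 and classic TF-IDF discards it, even though it is the term most characteristic of the category. The DF-proportional stand-in ranks it highest instead, which is the direction of the effect the abstract describes for single-category training sets.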
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.