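An Improved Naive Bayes Algorithm Based on TF-IDF
Cite as: XU Tian-hua, WU Ming-li. An Improved Naive Bayes Algorithm Based on TF-IDF[J]. Computer Technology and Development, 2020(2): 75-79.
Authors: XU Tian-hua, WU Ming-li
Affiliation: School of Informatics, North China University of Technology
Funding: National Natural Science Foundation of China (61672040)
Abstract: Text classification algorithms, with naive Bayes as a representative example, commonly suffer from uniform feature weights and reliance on a single indicator. To address this, an improved naive Bayes algorithm based on TF-IDF is proposed: the TF-IDF-DL naive Bayes algorithm. Building on TF-IDF, it introduces a decentralized word-frequency factor and a feature-word position factor to make the feature weights more accurate. To verify its effectiveness, experiments were run on the Sogou news dataset from Sogou Labs. The results show that introducing TF-IDF-DL into naive Bayes classification gives good accuracy, recall, and F1 in text classification. Compared with the TF-IDF-dist Bayesian scheme from similar domestic research, classification accuracy improves by 8.6%, recall by 11.7%, and F1 by 7.4%. The algorithm therefore improves classification performance and also achieves reasonably good results on categories that are hard to distinguish.

Keywords: naive Bayes; TF-IDF algorithm; decentralization; location information; feature weight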
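
To make the starting point of the abstract concrete, the following is a minimal sketch of a multinomial naive Bayes classifier whose term counts are replaced by TF-IDF weights. It is not the TF-IDF-DL algorithm: the abstract gives no formulas for the decentralized word-frequency factor or the position factor, so neither is implemented here. The function names (`tfidf_weights`, `train`, `predict`), the smoothed IDF formula, the Laplace smoothing, and the toy data are all assumptions made for illustration, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: tf * idf} dict per tokenized (non-empty) document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): stays positive even for terms in every document.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return weighted


def train(docs, labels):
    """Fit log-priors and TF-IDF-weighted log-likelihoods with Laplace smoothing."""
    weights = tfidf_weights(docs)
    vocab = {t for doc in docs for t in doc}
    class_docs = Counter(labels)
    class_term_weight = defaultdict(Counter)
    for doc_weights, label in zip(weights, labels):
        # Accumulate TF-IDF mass per class instead of raw term counts.
        class_term_weight[label].update(doc_weights)
    model = {}
    for label, n in class_docs.items():
        total = sum(class_term_weight[label].values())
        log_prior = math.log(n / len(docs))
        log_like = {
            t: math.log((class_term_weight[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, doc):
    """Return the class with the highest log-score; out-of-vocabulary terms are skipped."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(model, key=score)


# Toy data, invented for illustration only (not the Sogou news dataset).
toy_docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
]
toy_labels = ["sports", "sports", "finance", "finance"]
toy_model = train(toy_docs, toy_labels)
print(predict(toy_model, ["goal", "team"]))
```

A position or decentralization factor of the kind the abstract describes would presumably enter as an extra multiplier inside `tfidf_weights`, but its exact form would have to come from the full paper.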

An Improved Naive Bayes Algorithm Based on TF-IDF
XU Tian-hua, WU Ming-li. An Improved Naive Bayes Algorithm Based on TF-IDF[J]. Computer Technology and Development, 2020(2): 75-79.
Authors: XU Tian-hua, WU Ming-li
Affiliation: School of Informatics, North China University of Technology, Beijing 100144, China
Abstract: Text classification algorithms, of which naive Bayes is a representative example, commonly suffer from uniform feature weights and reliance on a single indicator. To solve this problem, we propose an improved TF-IDF-based naive Bayes algorithm, the TF-IDF-DL naive Bayes algorithm. Building on TF-IDF, the algorithm introduces a decentralized word-frequency factor and a feature-word position factor to make the feature weights more accurate. To verify its effect, we run experiments on the Sogou news dataset from Sogou Labs. The experiments show that introducing TF-IDF-DL into naive Bayes classification gives good accuracy, recall, and F1 in text classification. Compared with the TF-IDF-dist Bayesian scheme from similar domestic research, classification accuracy increases by 8.6%, recall by 11.7%, and F1 by 7.4%. The proposed algorithm therefore improves classification performance and, to some extent, also classifies hard-to-distinguish categories well.
Keywords: naive Bayes; TF-IDF algorithm; decentralization; location information; feature weight
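
The abstract reports gains in accuracy, recall, and F1 but does not say how these are computed or averaged across classes. As a hedged illustration, the sketch below computes macro-averaged precision, recall, and F1 from true and predicted labels. Treating the first metric as per-class precision and using macro-averaging are both assumptions, and the function name `macro_prf` is hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never seen.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Invented labels, for illustration only.
p, r, f = macro_prf(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(round(p, 4), round(r, 4), round(f, 4))
```

Macro-averaging weights every class equally, which makes weak performance on hard-to-distinguish categories visible in the summary figures. A micro-averaged or per-class report would be an equally plausible reading of the abstract.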