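Naive Bayes classification algorithm based on improved feature weighting
Citation: Ding Yue, Wang Xueming. Naive Bayes classification algorithm based on improved feature weighting[J]. Application Research of Computers, 2019, 36(12).
Authors: Ding Yue, Wang Xueming
Affiliation: College of Computer Science and Technology, Guizhou University (both authors)
Funding: National Natural Science Foundation of China ([2011]61163049); Guizhou Provincial Natural Science Foundation, Qiankehe J Zi ([2014]7641)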
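Abstract: The traditional naive Bayes classification algorithm does not distinguish feature terms by their importance, which makes its classification results inaccurate. To address this problem, Jensen-Shannon (JS) divergence is introduced to represent the amount of information that a feature term provides. To compensate for the shortcomings of JS divergence, it is adjusted from three aspects: term frequency within and between classes, document frequency, and inverse class frequency corrected by the coefficient of variation. Finally, a weight is computed for each feature term and substituted into the naive Bayes formula. Comparative experiments against other algorithms show that the naive Bayes algorithm based on JS divergence and improved along the term, document, and class dimensions achieves the best classification results, with classification performance substantially higher than that of the other algorithms compared.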

Keywords: text classification; naive Bayes; JS divergence; term frequency; document frequency; class frequency
Received: 2018/7/9
Revised: 2019/10/31

Naive Bayes classification algorithm based on improved feature weighting
Ding Yue and Wang Xueming. Naive Bayes classification algorithm based on improved feature weighting[J]. Application Research of Computers, 2019, 36(12).
Authors: Ding Yue and Wang Xueming
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025
Abstract: The traditional naive Bayes classification algorithm does not distinguish feature terms by their importance, which makes its classification results inaccurate. To solve this problem, this paper introduces Jensen-Shannon (JS) divergence and uses it to express the amount of information provided by each feature term. To address the deficiencies of JS divergence, the paper adjusts it from three aspects: word frequency, document frequency, and inverse class frequency corrected by the coefficient of variation. Finally, it calculates a weight for each feature term and introduces the weights into the naive Bayes formula. Comparison experiments against other algorithms show that this method effectively improves the naive Bayes classification algorithm, and that the classification performance of naive Bayes with JS-divergence feature weighting is substantially better than that of the other algorithms compared.
Keywords: text classification; naive Bayes; Jensen-Shannon divergence; word frequency; document frequency; class frequency
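
The general idea in the abstract, weighting each feature term by a JS-divergence score and inserting that weight into the naive Bayes formula, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weight formula, so the sketch assumes a plain weight w(t) = JS(P(class | t present) || P(class)) and omits the paper's corrections for term frequency, document frequency, and inverse class frequency. The function names (`js_divergence`, `train`, `predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def train(docs, labels):
    """docs: list of token lists; labels: one class name per document."""
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    class_docs = Counter(labels)
    prior = {c: class_docs[c] / len(docs) for c in classes}

    tf = {c: Counter() for c in classes}   # term counts per class
    df = defaultdict(Counter)              # document counts per (term, class)
    for d, c in zip(docs, labels):
        tf[c].update(d)
        for t in set(d):
            df[t][c] += 1

    # Laplace-smoothed multinomial likelihood P(term | class).
    likelihood = {}
    for c in classes:
        total = sum(tf[c].values()) + len(vocab)
        likelihood[c] = {t: (tf[c][t] + 1) / total for t in vocab}

    # ASSUMPTION (not from the paper): the weight of a term is the JS divergence
    # between P(class | term present) and the class prior P(class).
    prior_vec = [prior[c] for c in classes]
    weights = {}
    for t in vocab:
        n_t = sum(df[t].values())
        posterior = [df[t][c] / n_t for c in classes]
        weights[t] = js_divergence(posterior, prior_vec)
    return classes, prior, likelihood, weights


def predict(doc, model):
    """Return argmax_c of log P(c) + sum_t w(t) * count(t) * log P(t | c)."""
    classes, prior, likelihood, weights = model
    counts = Counter(t for t in doc if t in weights)  # ignore unseen terms
    scores = {}
    for c in classes:
        score = math.log(prior[c])
        for t, n in counts.items():
            score += weights[t] * n * math.log(likelihood[c][t])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = [["goal", "match", "team"], ["team", "win", "goal"],
        ["stock", "market", "team"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
model = train(docs, labels)
print(predict(["goal", "win"], model))
```

In this sketch a term that occurs in only one class (such as "goal") receives a larger weight than a term spread across classes (such as "team"), so it contributes more to the log-score. Setting every weight to 1 recovers ordinary multinomial naive Bayes.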