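Improvement of Bayes Classification Algorithm Based on Text Filtering
Cite this article:LU Jin-quan, XU Kai-yong, DAI Le-yu. Improvement of Bayes Classification Algorithm Based on Text Filtering[J]. Computer and Modernization, 2016, 0(9): 100.
Authors:LU Jin-quan  XU Kai-yong  DAI Le-yu
Abstract:Because the traditional Bayes classification algorithm cannot meet the text-filtering demands of complex networks, a Multi Word-Bayes (MWB) classification algorithm is proposed. On the one hand, the algorithm introduces the idea of Term Frequency-Inverse Document Frequency (TF-IDF) feature weighting, which addresses the traditional algorithm's reliance on word frequency alone with no regard for the relationships between texts. On the other hand, it treats the relationships between words as an important reference for text classification, which overcomes the traditional algorithm's neglect of semantic analysis during classifier training. Experimental results show that MWB achieves better classification performance in spam-text filtering.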

Keywords:Bayes classification algorithm  TF-IDF  semantic analysis  text filtering
Received:2016-09-13

Improvement of Bayes Classification Algorithm Based on Text Filtering
LU Jin-quan, XU Kai-yong, DAI Le-yu. Improvement of Bayes Classification Algorithm Based on Text Filtering[J]. Computer and Modernization, 2016, 0(9): 100.
Authors:LU Jin-quan  XU Kai-yong  DAI Le-yu
Abstract:Owing to the complexity of the network, the traditional Bayes classification algorithm cannot meet the demands of text filtering, so a Multi Word-Bayes (MWB) classification algorithm is proposed. On the one hand, MWB introduces Term Frequency-Inverse Document Frequency (TF-IDF) feature weighting to remedy the traditional Bayes algorithm's reliance on word frequency alone, with no regard for the relationships between texts. On the other hand, the new algorithm treats the relationships between words as an important reference, which overcomes the traditional algorithm's neglect of semantic analysis in classifier training. Experimental results show that the MWB classification algorithm achieves better classification performance in text filtering.
Keywords:Bayes classification algorithm  TF-IDF  semantic analysis  text filtering  
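
As a rough illustration of the two ideas in the abstract, the sketch below builds a multinomial naive Bayes classifier whose per-class counts are TF-IDF weighted and whose features include adjacent word pairs alongside single words. The abstract does not give the MWB formulas, so everything here is an assumption: the class name `TfidfNaiveBayes`, the helper `features`, the smoothed IDF form, the Laplace smoothing, and the use of adjacent word pairs as a stand-in for word-to-word relations. It should not be read as the authors' algorithm.

```python
import math
from collections import Counter, defaultdict


def features(tokens):
    """Single words plus adjacent word pairs (hypothetical stand-in for word relations)."""
    pairs = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + pairs


class TfidfNaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights instead of raw counts (illustrative only)."""

    def fit(self, docs, labels):
        feats = [features(d) for d in docs]
        self.n_docs = len(feats)
        # Document frequency: in how many training texts each feature occurs.
        df = Counter(f for fs in feats for f in set(fs))
        # Smoothed IDF (assumed form): ln((1 + N) / (1 + df)) + 1.
        self.idf = {f: math.log((1 + self.n_docs) / (1 + c)) + 1.0
                    for f, c in df.items()}
        self.vocab = set(df)
        self.prior = Counter(labels)
        # Per-class accumulated TF-IDF weight of each feature.
        self.weight = defaultdict(lambda: defaultdict(float))
        self.total = defaultdict(float)
        for fs, y in zip(feats, labels):
            for f, tf in Counter(fs).items():
                w = tf * self.idf[f]
                self.weight[y][f] += w
                self.total[y] += w
        return self

    def predict(self, doc):
        # Features never seen in training are ignored.
        counts = Counter(f for f in features(doc) if f in self.vocab)
        best, best_score = None, -math.inf
        for y in self.prior:
            score = math.log(self.prior[y] / self.n_docs)
            denom = self.total[y] + len(self.vocab)  # Laplace smoothing
            for f, tf in counts.items():
                score += tf * math.log((self.weight[y].get(f, 0.0) + 1.0) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


# Toy usage with made-up, pre-tokenized training texts.
train_docs = [
    ["win", "free", "prize", "now"],
    ["free", "prize", "click", "now"],
    ["meeting", "schedule", "for", "monday"],
    ["project", "meeting", "notes", "attached"],
]
train_labels = ["spam", "spam", "ham", "ham"]
model = TfidfNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["free", "prize"]))
```

Weighting by IDF reduces the influence of features that appear in many training texts, and the word-pair features let the classifier tell "free prize" from the same two words occurring far apart. Chinese text would need a word segmentation step before `features` is applied, which is outside the scope of this sketch.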