
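Email Filtering Technology Based on the Naive Bayes Model
Cite this article:YANG He, SUN Guang-lu, HE Yong-jun. Email Filtering Technology Based on the Naive Bayes Model[J]. Journal of Harbin University of Science and Technology, 2014(1):49-53.
Authors:YANG He  SUN Guang-lu  HE Yong-jun
Affiliation:[1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, Heilongjiang, China [2] Research Center of Information Security and Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, Heilongjiang, China
Funding:Heilongjiang Province New Century Excellent Talents Training Program for Regular Institutions of Higher Education (1155-ncet-008); Ministry of Education Humanities and Social Sciences Project (11YJC740048); Heilongjiang Province Education Science Planning Project (GBC1211062); Heilongjiang Province Higher Education Teaching Reform Project (2011-NP33).
Abstract:When the Naive Bayes algorithm is applied to spam filtering, its effectiveness depends heavily on how well the email content is modeled, yet research on email content modeling is still immature, which limits the performance of Bayesian methods in spam filtering. This paper models email content with three probability distributions and accordingly proposes a Naive Bayes algorithm under each of the three distributions. To improve training efficiency, the algorithms adopt an incremental spam filtering method. The three Bayesian algorithms are compared experimentally on two public datasets, trec05p-1 and trec06p, and the range of application of each of the three distributions is analyzed. From the perspective of modeling email content with different distributions, the results provide an effective basis for choosing a spam filtering method.

Keywords:email filtering  Naive Bayes  machine learning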

SPAM Filtering with Naive Bayes
YANG He,SUN Guang-lu,HE Yong-jun.SPAM Filtering with Naive Bayes[J].Journal of Harbin University of Science and Technology,2014(1):49-53.
Authors:YANG He  SUN Guang-lu    HE Yong-jun
Affiliation:1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; 2. Research Center of Information Security and Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China
Abstract:The effectiveness of Naive Bayes in spam filtering depends on how well the mail content is modeled. However, mail content modeling is not yet mature, which limits the performance of Bayesian methods in spam filtering. This paper models email content with three kinds of probability distribution and proposes three Naive Bayes algorithms based on them. To improve training efficiency, an incremental training algorithm is used in the experimental procedure. Experiments on trec06p and trec05p-1 show that the three proposed algorithms achieve good performance in different scenarios. This finding also provides an effective basis for selecting a filtering method.
Keywords:e-mail filtering  naive Bayes  machine learning
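
The abstract mentions incremental training of a Naive Bayes filter but does not name the three probability distributions it compares. As a minimal sketch only, assuming a multinomial word-count model with add-one smoothing (an assumption, not necessarily one of the paper's three models), incremental training can look like the following. The class name `IncrementalNaiveBayes`, its methods, and the labels "spam"/"ham" are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes that is updated one message at a time.

    Illustrative sketch: the multinomial model and the smoothing constant
    are assumptions, not details taken from the paper.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # additive smoothing constant
        self.doc_counts = Counter()             # messages seen per label
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.total_tokens = Counter()           # tokens seen per label
        self.vocab = set()                      # all distinct tokens seen

    def update(self, tokens, label):
        """Fold one labeled message into the counts (no retraining pass)."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.vocab.update(tokens)

    def log_score(self, tokens, label):
        """Return log P(label) + sum of log P(token | label), smoothed."""
        n_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + self.alpha) / (
            n_docs + self.alpha * len(self.LABELS)
        )
        denom = self.total_tokens[label] + self.alpha * max(len(self.vocab), 1)
        score = math.log(prior)
        for token in tokens:
            count = self.token_counts[label][token]
            score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Pick the label with the higher log score (ties go to 'ham')."""
        spam = self.log_score(tokens, "spam")
        ham = self.log_score(tokens, "ham")
        return "spam" if spam > ham else "ham"


# Usage: train on messages as they arrive, then classify a new one.
nb = IncrementalNaiveBayes()
nb.update("win cash prize now".split(), "spam")
nb.update("meeting agenda for monday".split(), "ham")
label = nb.predict("cash prize".split())
```

Because `update` only increments counters, each new labeled message costs time proportional to its length, which is the practical appeal of an incremental filter over retraining on the full corpus.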
This article is indexed in CNKI, VIP, and other databases.