A Survey of Content-based Anti-spam Email Filtering
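Cite this article: WANG Bin, PAN Wen-feng. A Survey of Content-based Anti-spam Email Filtering[J]. Journal of Chinese Information Processing, 2005, 19(5): 3-12.
Authors: WANG Bin, PAN Wen-feng
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Funding: National Key Basic Research Program of China (973 Program)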
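Abstract: The spam problem is becoming increasingly serious and has drawn wide attention from researchers. Content-based filtering is currently one of the mainstream techniques for addressing it. Content-based spam filtering today falls mainly into two groups: rule-based methods and probabilistic/statistical methods. This paper reviews the corpora and evaluation methods used in spam filtering research, and summarizes the filtering techniques in current use together with comparative experiments among them, including Ripper, decision trees, Rough Sets, Rocchio, Boosting, Bayes, kNN, SVM, and Winnow. The experimental results show that Boosting, Flexible Bayes, SVM, and Winnow are currently among the better spam filtering methods, and their results on the evaluation corpora are already very high; however, much work remains before they are ready for truly practical use.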

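Keywords: computer application; Chinese information processing; overview; junk email; anti-spam; information filtering; text classification
Article ID: 1003-0077(2005)05-0001-10
Received: 2004-09-02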
Revised: 2004-09-02; 2005-03-10

A Survey of Content-based Anti-spam Email Filtering
WANG Bin, PAN Wen-feng. A Survey of Content-based Anti-spam Email Filtering[J]. Journal of Chinese Information Processing, 2005, 19(5): 3-12.
Authors: WANG Bin, PAN Wen-feng
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Abstract: The volume of junk email on the Internet has grown tremendously in the past few years and is causing serious problems. Content-based filtering is one of the mainstream technologies used so far. This paper provides an overview of the state of the art in this research field, including benchmark corpora, evaluation methods, and filtering approaches. Many filtering approaches, including Ripper, Decision Trees, Rough Sets, Rocchio, Boosting, Bayes, kNN, SVM, and Winnow, are discussed and compared. The experimental results show that some approaches, such as Boosting, Flexible Bayes, SVM, and Winnow, achieve very good results on research corpora. However, much more work is needed before they are ready for practical use.
Keywords: computer application; Chinese information processing; overview; junk email; anti-spam; information filtering; text classification
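The abstract places Bayes among the probabilistic/statistical content-based filters it compares. As a rough illustration of that family only, the sketch below is a generic textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is not the authors' implementation and not the "Flexible Bayes" variant the survey reports on; the class name `NaiveBayesSpamFilter`, the regex tokenizer, the `"spam"`/`"ham"` labels, and the toy training messages are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of ASCII letters/digits.
    # Chinese mail would need a word segmenter instead (assumption, not from the paper).
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Generic multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes both classes have at least one training message before classify().
    """

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        # Accumulate per-class token counts, per-class document counts, and the vocabulary.
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum over tokens of log P(token | label), computed in log
        # space so that many small probabilities do not underflow.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokenize(text):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        # Pick the label with the higher posterior score.
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("win free money now", "spam")
    f.train("cheap pills free offer", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("please review the attached report", "ham")
    print(f.classify("free money offer"))      # spam
    print(f.classify("monday report review"))  # ham
```

Add-one smoothing keeps a token that never appeared with one class from driving that class's probability to zero; the rule-based filters the abstract contrasts with this family would instead match hand-written patterns.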
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.