
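Online Spam Filtering Based on Ensemble Learning of Multi-filter
Cite this article: LIU Wu-ying, WANG Ting. Online Spam Filtering Based on Ensemble Learning of Multi-filter[J]. Journal of Chinese Information Processing (中文信息学报), 2008, 22(1): 67-73.
Authors: LIU Wu-ying, WANG Ting
Affiliation: School of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
Funding: National Natural Science Foundation of China; Trans-Century Training Programme Foundation for Talents, Ministry of Education
Abstract: Spam filtering is the task of deciding online whether each email is Spam or Ham (non-spam); it is a process of continual self-learning driven by user feedback. This paper extracts linguistic features and behavioral features from emails to build multiple simple filters, then combines these simple filters with ensemble learning methods, achieving higher performance than the simple filters alone. Experiments show that learning from a single feature has low computational complexity and runs faster, whereas ensemble learning is more effective. The method proposed in this paper, which applies SVM ensemble learning to email filtering, performs best among the various ensemble learning methods.


Keywords: computer application; Chinese information processing; spam filtering; machine learning; ensemble learning; support vector machine
Article ID: 1003-0077(2008)01-0067-07
Received: 2007-05-22
Revised: 2007-12-03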

Online Spam Filtering Based on Ensemble Learning of Multi-filter
LIU Wu-ying, WANG Ting. Online Spam Filtering Based on Ensemble Learning of Multi-filter[J]. Journal of Chinese Information Processing, 2008, 22(1): 67-73.
Authors:LIU Wu-ying  WANG Ting
Affiliation:School of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
Abstract: Spam filtering is the task of labeling emails as Spam or Ham in an online setting, which is essentially a self-learning procedure driven by user feedback. Simple filters that use linguistic features or behavioral features already exist. In this paper, we use ensemble learning to combine multiple such filters and achieve higher performance than any single filter. The experimental results show that single-feature learning is fast while ensemble learning is more effective, and that the proposed SVM ensemble method performs best among the ensemble methods.
Keywords: computer application; Chinese information processing; spam filtering; machine learning; ensemble learning; support vector machine
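
The idea summarized in the abstract (several simple online filters whose outputs are combined by a learned ensemble and updated from user feedback) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not specify the features, the base filters, or how the SVM is trained. Here the hypothetical `CountFilter` scores an email by smoothed spam/ham log-odds of its features, `body_words` and `sender_domain` stand in for a linguistic and a behavioral feature, and `HingeCombiner` is a linear combiner updated by stochastic gradient steps on the regularized hinge loss, which is one simple online stand-in for a linear SVM.

```python
import math


class CountFilter:
    """Simple online filter (assumed design): mean smoothed log-odds of spam per feature."""

    def __init__(self, extract):
        self.extract = extract  # hypothetical extractor: email dict -> list of strings
        self.spam = {}
        self.ham = {}

    def score(self, email):
        feats = self.extract(email)
        if not feats:
            return 0.0
        total = sum(
            math.log((self.spam.get(f, 0) + 1) / (self.ham.get(f, 0) + 1))
            for f in feats
        )
        return total / len(feats)  # > 0 leans spam, < 0 leans ham

    def learn(self, email, is_spam):
        counts = self.spam if is_spam else self.ham
        for f in self.extract(email):
            counts[f] = counts.get(f, 0) + 1


class HingeCombiner:
    """Linear combiner trained online with hinge-loss SGD (linear-SVM-style, assumed)."""

    def __init__(self, n, lr=0.1, reg=0.01):
        self.w = [1.0] * n  # start as a plain sum of the filter scores
        self.b = 0.0
        self.lr = lr
        self.reg = reg

    def score(self, xs):
        return sum(w * x for w, x in zip(self.w, xs)) + self.b

    def learn(self, xs, is_spam):
        y = 1.0 if is_spam else -1.0
        violated = y * self.score(xs) < 1.0  # margin check before any update
        for i, x in enumerate(xs):
            grad = self.reg * self.w[i] - (y * x if violated else 0.0)
            self.w[i] -= self.lr * grad
        if violated:
            self.b += self.lr * y


class EnsembleFilter:
    """Combines several simple filters; all parts learn from each feedback label."""

    def __init__(self, filters):
        self.filters = filters
        self.combiner = HingeCombiner(len(filters))

    def classify(self, email):
        xs = [f.score(email) for f in self.filters]
        return self.combiner.score(xs) > 0  # True means Spam

    def feedback(self, email, is_spam):
        # Score first, so the combiner trains on what the filters knew before the label.
        xs = [f.score(email) for f in self.filters]
        self.combiner.learn(xs, is_spam)
        for f in self.filters:
            f.learn(email, is_spam)


def body_words(email):
    """Stand-in linguistic feature: lowercase whitespace tokens of the body."""
    return email["body"].lower().split()


def sender_domain(email):
    """Stand-in behavioral feature: domain part of the sender address."""
    return [email["from"].split("@")[-1].lower()]


ensemble = EnsembleFilter([CountFilter(body_words), CountFilter(sender_domain)])
spam = {"from": "promo@cheap-deals.example", "body": "win free money now"}
ham = {"from": "alice@univ.example", "body": "meeting notes attached"}
for _ in range(5):  # simulated stream of user feedback
    ensemble.feedback(spam, True)
    ensemble.feedback(ham, False)
```

After this toy feedback stream, `ensemble.classify(...)` can be called on unseen messages. Scoring before learning in `feedback` mirrors the online setting the abstract describes, where a judgment is made first and the user's label arrives afterwards.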
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.