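Research on Content-Based Spam E-mail Filtering Technology
Cite this article: 马建斌 (MA Jian-bin), 薛博洋 (XUE Bo-yang). Research on Content-Based Spam E-mail Filtering Technology [J]. 天津轻工业学院学报 (Journal of Tianjin University of Light Industry), 2010(2): 72-75.
Authors: 马建斌 (MA Jian-bin)  薛博洋 (XUE Bo-yang)
Affiliations: [1] College of Information Science and Technology, Agricultural University of Hebei, Baoding 071001 [2] College of Modern Science and Technology, Agricultural University of Hebei, Baoding 071001
Abstract: A method for filtering spam e-mail is proposed. Lexical features of each e-mail are extracted with the tf-idf method, effective features are selected with the χ² feature selection method, and several structural features with clear discriminating power are also extracted; a support vector machine then filters spam e-mail automatically. In experiments on the Chinese Academy of Sciences Chinese spam corpus (Cspam), recognition accuracy exceeded 82%. In addition, combining tf-idf lexical features with structural features improved classification accuracy, indicating that the method can improve the accuracy of spam e-mail filtering.

Keywords: content  spam e-mail filtering  tf-idf  structural features  support vector machine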

Research on Technology of Spam Filtering Based on Contents
MA Jian-bin,XUE Bo-yang.Research on Technology of Spam Filtering Based on Contents[J].Journal of Tianjin University of Light Industry,2010(2):72-75.
Authors:MA Jian-bin  XUE Bo-yang
Affiliation: 1. College of Information Science and Technology, Agricultural University of Hebei, Baoding 071001, China; 2. College of Modern Science and Technology, Agricultural University of Hebei, Baoding 071001, China
Abstract: A method for filtering spam is proposed. The tf-idf method is used to extract the lexical features of e-mails, and the χ² method is used to select effective features. Several structural features that clearly discriminate spam are also extracted. The support vector machine algorithm is adopted to filter spam automatically. In experiments on the Cspam dataset, the evaluation value F is above 82%, and combining tf-idf lexical features with structural features improves classification accuracy, which indicates that the method can improve the accuracy of spam filtering.
Keywords: contents  spam filtering  tf-idf  structural features  support vector machine
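
The first two steps named in the abstract, tf-idf weighting and χ² feature selection, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact tf-idf variant, so the sketch assumes length-normalised term frequency times log(N/df); the toy messages, the labels, and the function names `tfidf`, `chi_square` and `select_features` are all hypothetical. The structural features and the support vector machine step are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def chi_square(docs, labels, term, positive="spam"):
    """Chi-square statistic of the 2x2 table (term present?) x (class)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in doc
        if present and label == positive:
            a += 1  # term present, spam
        elif present:
            b += 1  # term present, not spam
        elif label == positive:
            c += 1  # term absent, spam
        else:
            d += 1  # term absent, not spam
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t), reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: two spam and two legitimate messages.
docs = [
    ["free", "prize", "click"],
    ["free", "offer", "click"],
    ["meeting", "agenda", "notes"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

vectors = tfidf(docs)
selected = select_features(docs, labels, k=4)

# Restrict each tf-idf vector to the selected terms; vectors of this kind,
# extended with structural features, would be the classifier input.
reduced = [{t: v.get(t, 0.0) for t in selected} for v in vectors]
```

On this toy corpus the terms that occur in every message of one class and in none of the other receive the highest χ² scores, so they are the ones kept. A real corpus such as Cspam would additionally need Chinese word segmentation before the tokens could be counted.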
This article is indexed in 维普 (VIP) and other databases.