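Research on a Spam Filtering Method Based on the Term Co-Occurrence Model
Cite this article: ZHANG Yanping, SHI Ke, XU Qingpeng, XIE Fei. Research on a Spam Filtering Method Based on the Term Co-Occurrence Model [J]. Journal of Chinese Information Processing, 2009, 23(6): 61-67.
Authors: ZHANG Yanping  SHI Ke  XU Qingpeng  XIE Fei
Affiliation: 1. Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, Hefei, Anhui 230039;
2. Provincial Branch, Anhui Radio and TV University, Hefei, Anhui 230001;
3. Hefei University of Technology, Hefei, Anhui 230009
Funding: National Key Basic Research Program of China (973 Program); National Natural Science Foundation of China; Ministry of Education Social Science Research Fund, Youth Project
Abstract: Spam filtering is the task of deciding whether an email is spam or not. The traditional way to represent an email builds on the vector space model and uses a feature selection method such as information gain to extract a subset of words that stands for the email's content; this representation carries too little semantic information. This paper proposes a new method that combines the traditional representation with the term co-occurrence model to represent email features, and then applies the cross covering algorithm to classify the emails and obtain an email classifier. Experiments show that the proposed filtering algorithm improves filtering performance over the traditional method, and that the dimensions selected through term co-occurrence are more representative than those selected by the traditional method.

Keywords: computer application  Chinese information processing  vector space model  spam filtering  term co-occurrence model  cross covering algorithm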
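The abstract names information gain as a typical feature selection method on top of the vector space model, but does not give the formula used. The following is a minimal sketch of the textbook definition for binary term presence, not the authors' implementation. The function names (`entropy`, `information_gain`, `select_terms`) and the toy data in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of `term` with respect to the class labels.

    docs   : list of sets of tokens (one set per email; hypothetical format)
    labels : list of class labels, e.g. "spam" / "ham"
    """
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            weight = len(subset) / n
            conditional += weight * entropy(list(Counter(subset).values()))
    return base - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

With four toy emails, `[{"free", "money"}, {"free", "offer"}, {"meeting", "notes"}, {"meeting", "money"}]` labeled spam, spam, ham, ham, the terms "free" and "meeting" each split the classes perfectly, while "money" appears once in each class and therefore has zero gain.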
  

Spam Filter Based on Term Co-Occurrence Model
ZHANG Yanping, SHI Ke, XU Qingpeng, XIE Fei. Spam Filter Based on Term Co-Occurrence Model [J]. Journal of Chinese Information Processing, 2009, 23(6): 61-67.
Authors: ZHANG Yanping  SHI Ke  XU Qingpeng  XIE Fei
Affiliation: 1. Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei, Anhui 230039, China;
2. Anhui Shengzhi Radio and TV University, Hefei, Anhui 230001, China;
3. Hefei University of Technology, Hefei, Anhui 230009, China
Abstract: The aim of spam filtering is to distinguish spam from ham. Traditional methods use the vector space model together with feature selection to extract features that represent the content of an email. However, these methods do not take the semantic information among words into account. This paper proposes a new method that extracts email features by combining the vector space model with the term co-occurrence model. The covering algorithm is then employed to classify the emails. Experiments show that the proposed method significantly improves filtering performance compared with traditional methods, and that the features selected with the term co-occurrence model are more representative than those chosen by the vector space model alone.
Keywords: computer application  Chinese information processing  vector space model  spam filter  term co-occurrence model  covering algorithm
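The abstract does not say how term co-occurrence is measured or how the co-occurrence features are merged with the single-term features. As a rough illustration only, the sketch below assumes document-level co-occurrence of unordered term pairs, keeps pairs that appear in at least `min_df` emails, and appends one binary dimension per kept pair to a binary term vector. The names `cooccurrence_pairs`, `to_vector`, and `min_df` are hypothetical, and the covering-algorithm classifier is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(docs, min_df=2):
    """Return the term pairs that co-occur in at least `min_df` documents.

    docs : list of token lists, one per email (hypothetical input format).
    Co-occurrence here means "both terms appear somewhere in the same email";
    this is an assumption, not a definition taken from the paper.
    """
    counts = Counter()
    for tokens in docs:
        # Sort the unique tokens so each unordered pair has one canonical form.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return sorted(p for p, c in counts.items() if c >= min_df)


def to_vector(tokens, terms, pairs):
    """Binary feature vector: one dimension per term, then one per term pair."""
    present = set(tokens)
    term_part = [1 if t in present else 0 for t in terms]
    pair_part = [1 if a in present and b in present else 0 for a, b in pairs]
    return term_part + pair_part
```

For instance, given the emails `["free", "money", "now"]`, `["free", "money"]`, and `["meeting", "notes"]`, only the pair `("free", "money")` reaches `min_df=2`, so a new email containing both words gets a 1 in the extra pair dimension in addition to its single-term dimensions.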
This article is indexed in Wanfang Data and other databases.