A Method of Chinese Spam Filtering Based on Suffix Array Clustering (SAC)
Chinese title: 一种基于后缀数组聚类(SAC)的中文垃圾邮件过滤方法
Cite this article: LI Xiang-Ying, CHEN Zhong, TANG Li-Yong, LI Xin. A Method of Chinese Spam Filtering Based on Suffix Array Clustering (SAC)[J]. Computer Science (计算机科学), 2006, 33(5): 107-109
Authors: LI Xiang-Ying (李翔鹰), CHEN Zhong (陈钟), TANG Li-Yong (唐礼勇), LI Xin (李欣)
Affiliation: Department of Computer Science and Technology, Peking University, Beijing 100871; Department of Electronics and Information Engineering, Liaoning Technical University, Fuxin, Liaoning 123000
Abstract: The naive Bayes algorithm is widely used in spam filtering, but it performs poorly on Chinese spam. Using a clustering approach, this paper proposes a feature (token) extraction method for Chinese email based on suffix array clustering (SAC). It also compares experimental results for naive Bayes filtering of Chinese spam under different feature extraction methods. The experiments show that the proposed method significantly improves filtering performance on Chinese spam.

Keywords: Naive Bayes; Spam filtering; Suffix array clustering
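
The abstract does not describe the SAC procedure itself, so the following is only a rough illustration of the general idea behind suffix-array-based feature extraction: repeated substrings can be pulled out of unsegmented Chinese text without a word segmenter and then used as candidate tokens for a Bayesian filter. The function names, the length limits, and the sample string are all hypothetical, the suffix array is built naively, and no clustering step is shown; this is not the authors' algorithm.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffix start positions.
    # Adequate for short illustrative inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    # Length of the longest common prefix of two strings.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def repeated_substrings(text, min_len=2, max_len=4):
    """Return substrings of length min_len..max_len occurring at least twice.

    Suffixes sharing a prefix are adjacent in the suffix array, so every
    repeated substring shows up as a common prefix of some adjacent pair.
    """
    sa = build_suffix_array(text)
    found = set()
    for prev, cur in zip(sa, sa[1:]):
        lcp = common_prefix_len(text[prev:], text[cur:])
        for k in range(min_len, min(lcp, max_len) + 1):
            found.add(text[cur:cur + k])
    return found


# Hypothetical sample text; the repeated substrings become candidate features.
candidates = repeated_substrings("免费领取大奖免费注册领取")
```

In a filter along these lines, the resulting candidate substrings would stand in for the word tokens that a naive Bayes classifier normally counts per class.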
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.