
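Experimental design and comparative study of Simplified Chinese spam classification
Cite this article: Li Weijie, Xu Yong. Experimental design and comparative study of Simplified Chinese spam classification[J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(25): 128-132
Authors: Li Weijie (李维杰), Xu Yong (徐勇)
Affiliation: Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong 518005 (both authors)
Funding: National Natural Science Foundation of China; Natural Science Foundation of Guangdong Province
Abstract: This paper surveys the technical approaches and methods for spam filtering. Building on an analysis of keyword-based methods and statistical methods, it proposes combining the two and filtering spam with classifiers from pattern recognition: Bayesian, nearest-neighbor, and perceptron. Using a feature set selected by the mutual-information maximization criterion, a comparative analysis of the classification techniques reveals the strengths and weaknesses of the Bayesian, nearest-neighbor, and perceptron classifiers for spam filtering. The paper also offers useful ideas for applying the mutual-information maximization criterion to spam filtering.

Keywords: spam; classifier; Bayesian; nearest neighbor; perceptron
Article ID: 1002-8331(2007)25-0128-05
Revised: 2007-05-01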

Simplified Chinese spam mail filter: design and performance evaluation
LI Wei-jie, XU Yong. Simplified Chinese spam mail filter: design and performance evaluation[J]. Computer Engineering and Applications, 2007, 43(25): 128-132
Authors:LI Wei-jie  XU Yong
Affiliation:Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong 518005, China
Abstract: Approaches to and methods for filtering unsolicited bulk e-mail, also known as spam, are analyzed. Keyword-based methods and statistical learning methods are examined, and a new method that combines the two is proposed. The method filters spam using naïve Bayesian decision theory, nearest-neighbor classification, and linear classification based on the perceptron criterion function, all taken from pattern classification. The feature set used by the three classifiers is obtained by mutual information. A comparison of the three decision methods shows their respective advantages and disadvantages. The paper also points out a useful idea for filtering spam using mutual information.
Keywords:spam mail  classification  Bayesian decision  nearest-neighbor decision  perceptron criterion function
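
The abstract states that the feature set is selected by mutual information but gives no formula. As a rough illustration only, the sketch below scores each term by the mutual information between its presence in a message and the class label, then keeps the top-scoring terms. It assumes messages are already tokenized (Chinese text would first need word segmentation, not shown here) and that term presence is treated as binary; the function names `mutual_information` and `select_features` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {term: MI in bits} between binary term presence and class label.

    docs   -- list of token lists, one per message (assumed pre-tokenized)
    labels -- list of class labels, same length as docs
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()   # term -> number of messages containing it
    term_class = Counter()   # (term, label) -> messages of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):          # presence only, not frequency
            term_count[term] += 1
            term_class[(term, label)] += 1

    scores = {}
    for term, n_t in term_count.items():
        mi = 0.0
        for label, n_c in class_count.items():
            n_tc = term_class[(term, label)]
            # Two cells per class: term present, term absent.
            for joint, marginal in ((n_tc, n_t), (n_c - n_tc, n - n_t)):
                if joint > 0:             # 0 * log(0) is taken as 0
                    mi += (joint / n) * math.log2(joint * n / (marginal * n_c))
        scores[term] = mi
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On a toy corpus where one term appears only in spam, another only in legitimate mail, and a third in every message, the first two score 1 bit each and the third scores 0, so `select_features` drops the uninformative term. The selected terms could then serve as the input features for any of the three classifiers the abstract compares.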
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.