Research on the Characteristic of Partial Dependency for Spam Classification
Cite this article: LIU Zhen, TAN Liang, ZHOU Ming-tian. Research on the Characteristic of Partial Dependency for Spam Classification[J]. Acta Electronica Sinica (电子学报), 2007, 35(10): 1870-1874.
Authors: LIU Zhen (刘震)  TAN Liang (谭良)  ZHOU Ming-tian (周明天)
Affiliation: Westone United Laboratory, College of Computer Science and Engineering, UESTC, Chengdu, Sichuan 610054, China
Funding: National 863 High-Tech Research and Development Program of China (No.863-104-03-01)
Abstract: Because a false positive (legitimate mail flagged as spam) degrades email filtering more than a false negative (spam that slips through), an email filter should be more sensitive to the cost of false positives. This paper introduces a weight-coefficient function with a partial-dependency characteristic and, with it, proposes an improved-fitting Logistic Regression model for spam classification that supports asymmetric training. Tests on a real email sample set show that, with no loss of classification accuracy, the new model exhibits a clear partial dependency between false positive rate and false negative rate, and that it is also robust to perturbed feature data.

Keywords: spam  partial dependency  false positive rate  false negative rate
Article ID: 0372-2112(2007)10-1870-05
Revised: 2006-09-25
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
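
The asymmetric training idea in the abstract can be illustrated with a rough sketch. The abstract does not give the paper's weight-coefficient function, so the code below substitutes a much simpler stand-in: a constant per-class weight on the log loss, with legitimate mail weighted more heavily than spam. It is not the authors' model. Every name here (`train_weighted_logreg`, `ham_weight`, `make_toy_data`, and so on) is hypothetical, and the data is synthetic.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    # Append a constant column so the last weight acts as the intercept.
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_weighted_logreg(X, y, ham_weight=5.0, spam_weight=1.0,
                          lr=0.1, epochs=2000):
    """Gradient descent on a class-weighted log loss.

    Labels: 1 = spam, 0 = legitimate mail (ham).
    ASSUMPTION: constant per-class weights stand in for the paper's
    weight-coefficient function, which the abstract does not specify.
    """
    Xb = add_bias(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    # Errors on ham count ham_weight times as much as errors on spam.
    sample_w = np.where(y == 1, spam_weight, ham_weight)
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (sample_w * (p - y)) / sample_w.sum()
        w -= lr * grad
    return w


def predict(w, X, threshold=0.5):
    return (sigmoid(add_bias(X) @ w) >= threshold).astype(int)


def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fpr = np.mean(y_pred[y_true == 0] == 1)  # ham flagged as spam
    fnr = np.mean(y_pred[y_true == 1] == 0)  # spam let through
    return fpr, fnr


def make_toy_data(seed=0, n=500):
    # Synthetic 1-D "spamminess" score: two overlapping Gaussians.
    rng = np.random.default_rng(seed)
    ham = rng.normal(-1.0, 1.0, size=(n, 1))
    spam = rng.normal(1.0, 1.0, size=(n, 1))
    X = np.vstack([ham, spam])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for hw in (1.0, 5.0):
        w = train_weighted_logreg(X, y, ham_weight=hw)
        fpr, fnr = error_rates(y, predict(w, X))
        print(f"ham_weight={hw}: FPR={fpr:.3f}, FNR={fnr:.3f}")
```

On this toy data, raising `ham_weight` moves the decision boundary toward the spam side, so the false positive rate should fall while the false negative rate rises. That trade-off is the kind of bias between the two error rates that the abstract describes, although the paper reports achieving it without losing classification accuracy.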