
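Chinese Spam Filtering Model Resistant to Good Word Attacks
Cite this article: Deng Wei (邓蔚), Qin Zhiguang (秦志光), Liu Qiao (刘峤), Cheng Hongrong (程红蓉). Chinese spam filtering model resistant to good word attacks [J]. Journal of Electronic Measurement and Instrument (电子测量与仪器学报), 2010, 24(12): 1146-1152.
Authors: Deng Wei (邓蔚)  Qin Zhiguang (秦志光)  Liu Qiao (刘峤)  Cheng Hongrong (程红蓉)
Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
Funding: National Natural Science Foundation of China; National "863" Program
Abstract: To address the threat of good word attacks currently facing Chinese spam filtering, a robust Chinese spam filtering model is proposed. The model is based on a multiple-instance learning mechanism and combines Chinese word segmentation with feature selection to convert an email into a combination of several instances; it then applies a multiple-instance logistic regression model for learning and classification. Under multiple-instance learning, an email is spam when at least one of its instances is spam, and legitimate otherwise. Good word attacks were applied to the training datasets and to the test datasets separately, and the model was tested on several large-scale public Chinese spam filtering corpora. The experimental results show that, against good word attacks in Chinese email filtering, a classifier using the multiple-instance counterattack strategy is more robust than one using the single-instance counterattack strategy.

Keywords: Chinese spam filtering  adversarial learning  multiple instance learning  logistic regression  good word attacks  robustness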

Chinese spam filtering model for combating good word attacks
Deng Wei, Qin Zhiguang, Liu Qiao, Chen Hongrong. Chinese spam filtering model for combating good word attacks[J]. Journal of Electronic Measurement and Instrument, 2010, 24(12): 1146-1152.
Authors:Deng Wei  Qin Zhiguang  Liu Qiao  Chen Hongrong
Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract: To combat good word attacks in Chinese spam filtering, this paper proposes a robust Chinese spam filtering model. The model is based on a multiple-instance learning mechanism and uses Chinese word segmentation and feature selection to transform an email into a bag of multiple instances. It then applies a multiple-instance logistic regression model to the bags. Under multiple-instance learning, an email is classified as spam if at least one instance in the corresponding bag is spam, and as legitimate if all of its instances are legitimate. With good word attacks applied to both the training dataset and the testing dataset, the performance of the model is evaluated on several large Chinese spam corpora. The experimental results show that a classifier using the multiple-instance counterattack strategy is more robust to good word attacks in the Chinese spam filtering domain than its single-instance counterpart.
Keywords:Chinese spam filtering  adversarial learning  multiple instance learning  logistic regression  good word attacks  robustness
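
The bag-level decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token weights, the bias, the fixed-size splitting in `to_bag`, and all function names are hypothetical, the emails are assumed to be already segmented into words, and training of the logistic regression model is not shown.

```python
import math

# Hypothetical weights of an already-trained logistic regression model
# (illustrative values only, not taken from the paper).
# Positive = spam-indicative, negative = "good word".
WEIGHTS = {
    "免费": 2.0,   # "free"
    "发票": 2.5,   # "invoice"
    "优惠": 1.5,   # "discount"
    "会议": -2.0,  # "meeting"
    "报告": -1.5,  # "report"
    "项目": -1.5,  # "project"
}
BIAS = -1.0


def instance_score(tokens):
    """Logistic-regression probability that one group of tokens is spam."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def to_bag(tokens, size):
    """Split a token list into consecutive instances of at most `size` tokens.

    Fixed-size splitting is an assumption made for this sketch.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def single_is_spam(tokens, threshold=0.5):
    """Single-instance baseline: score the whole email as one instance."""
    return instance_score(tokens) >= threshold


def bag_is_spam(tokens, size=3, threshold=0.5):
    """Multiple-instance rule: spam if at least one instance is spam."""
    return any(instance_score(inst) >= threshold for inst in to_bag(tokens, size))


spam = ["免费", "发票", "优惠"]
legit = ["会议", "报告", "项目"]
# Good word attack: the spammer pads the message with legitimate-looking words.
attacked = spam + legit * 2

print(single_is_spam(attacked), bag_is_spam(attacked))
```

In this toy setting the appended good words pull the whole-email score below the threshold, while the instance that still contains only the spam tokens keeps the bag classified as spam.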
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.