抗好词攻击的中文垃圾邮件过滤模型 / Chinese spam filtering model for combating good word attacks


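Cite this article:	Deng Wei, Qin Zhiguang, Liu Qiao, Cheng Hongrong. 抗好词攻击的中文垃圾邮件过滤模型 (Chinese spam filtering model for combating good word attacks)[J]. 电子测量与仪器学报 (Journal of Electronic Measurement and Instrument), 2010, 24(12): 1146-1152.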

Authors:	Deng Wei (邓蔚), Qin Zhiguang (秦志光), Liu Qiao (刘峤), Cheng Hongrong (程红蓉)

Affiliation:	School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731

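Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China ("863" Program)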

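Abstract:	To address the threat that good word attacks currently pose to Chinese spam filtering, a robust Chinese spam filtering model is proposed. The model is based on multiple-instance learning: combining Chinese word segmentation with feature selection, it converts each email into a combination of several instances, and then applies a multiple-instance logistic regression model for learning and classification. Under multiple-instance learning, an email is spam if at least one of its instances is spam content; otherwise it is legitimate. Good word attacks were applied separately to the training and test datasets, and the model was tested on several large public Chinese spam filtering corpora. Experimental results show that, against good word attacks in Chinese email filtering, a classifier using the multiple-instance counterattack strategy is more robust than one using a single-instance counterattack strategy.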
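Keywords:	Chinese spam filtering; adversarial learning; multiple-instance learning; logistic regression; good word attack; robustness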
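The abstract states the modelling steps (segment the email, select features, form a bag of instances, train multiple-instance logistic regression) but not how instances are formed or how instance probabilities are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each run of `chunk_size` segmented tokens is assumed to form one binary instance over a selected vocabulary, and bag probabilities use the noisy-OR rule, P(spam) = 1 - ∏(1 - p_i), one common way to encode "spam if at least one instance is spam". Training is plain gradient ascent on the bag log-likelihood, using the per-instance gradient p_i · (y(1 - P)/P - (1 - y)) with respect to each instance logit. The names `email_to_bag`, `train_milr`, the toy vocabulary and the toy emails are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bag = List[List[float]]

def email_to_bag(tokens: Sequence[str], vocab: Dict[str, int], chunk_size: int = 4) -> Bag:
    """Hypothetical instance construction: each run of `chunk_size` segmented
    tokens becomes one binary instance over the selected vocabulary."""
    bag: Bag = []
    for start in range(0, len(tokens), chunk_size):
        x = [0.0] * len(vocab)
        for tok in tokens[start:start + chunk_size]:
            if tok in vocab:  # tokens outside the selected features are dropped
                x[vocab[tok]] = 1.0
        bag.append(x)
    return bag

def instance_prob(w: List[float], b: float, x: List[float]) -> float:
    """Logistic regression probability that a single instance is spam."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def bag_prob(w: List[float], b: float, bag: Bag) -> float:
    """Noisy-OR: the bag is spam unless every instance is legitimate."""
    return 1.0 - math.prod(1.0 - instance_prob(w, b, x) for x in bag)

def train_milr(bags: List[Bag], labels: List[int], dim: int,
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Per-bag gradient ascent on the bag-level log-likelihood."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            ps = [instance_prob(w, b, x) for x in bag]
            # Clamp P so the y*(1-P)/P term never divides by zero.
            P = min(max(1.0 - math.prod(1.0 - p for p in ps), 1e-12), 1.0 - 1e-12)
            for x, p in zip(bag, ps):
                # d(log-likelihood)/d(instance logit) = p * (y*(1-P)/P - (1-y))
                g = p * (y * (1.0 - P) / P - (1 - y))
                for j, xj in enumerate(x):
                    w[j] += lr * g * xj
                b += lr * g
    return w, b

# Toy, hypothetical data: already-segmented emails with labels (1 = spam).
vocab = {"代开": 0, "发票": 1, "优惠": 2, "会议": 3, "报告": 4, "周五": 5}
emails = [
    (["代开", "发票", "优惠", "联系"], 1),
    (["会议", "报告", "周五", "安排", "代开", "发票", "优惠", "电话"], 1),
    (["会议", "报告", "周五", "安排"], 0),
    (["周五", "会议", "时间", "确认", "报告", "附件"], 0),
]
bags = [email_to_bag(tokens, vocab) for tokens, _ in emails]
w, b = train_milr(bags, [y for _, y in emails], dim=len(vocab))

# A good word attack: legitimate-looking words appended to a spam email.
attacked = ["代开", "发票", "优惠", "联系"] + ["会议", "报告", "周五", "安排"] * 2
print(bag_prob(w, b, email_to_bag(attacked, vocab)) > 0.5)
```

Because the noisy-OR bag probability can never fall below that of its most spam-like instance, appending whole chunks of good words cannot pull the attacked email's score below that of its spam chunk. This only holds for the chunking assumed here.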
Chinese spam filtering model for combating good word attacks

Deng Wei, Qin Zhiguang, Liu Qiao, Cheng Hongrong. Chinese spam filtering model for combating good word attacks[J]. Journal of Electronic Measurement and Instrument, 2010, 24(12): 1146-1152.

Authors:	Deng Wei, Qin Zhiguang, Liu Qiao, Cheng Hongrong

Affiliation:	School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract:	To combat good word attacks in Chinese spam filtering, this paper proposes a robust Chinese spam filtering model. The model is based on multiple-instance learning and uses Chinese word segmentation and feature selection to transform each email into a bag of instances; a multiple-instance logistic regression model is then applied to these bags. Under the multiple-instance learning assumption, an email is classified as spam if at least one instance in its bag is spam, and as legitimate if all of its instances are legitimate. With good word attacks applied to both the training and the test datasets, the model is evaluated on several large Chinese spam corpora. The experimental results show that, against good word attacks in Chinese spam filtering, a classifier using the proposed multiple-instance counterattack strategy is more robust than its single-instance counterpart.

Keywords:	Chinese spam filtering; adversarial learning; multiple-instance learning; logistic regression; good word attacks; robustness
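The central claim above is that a multiple-instance classifier resists good word attacks better than a single-instance one. The toy sketch below illustrates the mechanism under explicit assumptions and does not reproduce the paper's experiments: hand-picked hypothetical weights stand in for a trained logistic regression model, features are binary term presence, the multiple-instance view splits the email into fixed 3-token chunks, and bags are scored with a noisy-OR rule. The names `WEIGHTS`, `single_instance_prob`, `multi_instance_prob` and `good_word_attack` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical weights standing in for a trained model (not from the paper).
WEIGHTS: Dict[str, float] = {
    "代开": 2.0, "发票": 2.0, "优惠": 1.0,                              # spam-indicative terms
    "会议": -1.5, "报告": -1.5, "周五": -1.0, "附件": -1.0, "项目": -1.0,  # "good" words
}
BIAS = -1.0

def score(tokens: Sequence[str]) -> float:
    """Logistic spam probability from binary presence features."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in set(tokens))
    return 1.0 / (1.0 + math.exp(-z))

def single_instance_prob(tokens: Sequence[str]) -> float:
    # Single-instance view: the whole email is one feature vector.
    return score(tokens)

def multi_instance_prob(tokens: Sequence[str], chunk_size: int = 3) -> float:
    # Multiple-instance view: the email is a bag of fixed-size chunks,
    # combined with noisy-OR (spam if any chunk looks like spam).
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    return 1.0 - math.prod(1.0 - score(c) for c in chunks)

def good_word_attack(tokens: Sequence[str], good_words: Sequence[str]) -> List[str]:
    # The attacker appends words typical of legitimate mail.
    return list(tokens) + list(good_words)

spam = ["代开", "发票", "优惠"]
attacked = good_word_attack(spam, ["会议", "报告", "周五", "附件", "项目"])
for name, email in [("original", spam), ("attacked", attacked)]:
    print(name, round(single_instance_prob(email), 3), round(multi_instance_prob(email), 3))
# original 0.982 0.982
# attacked 0.119 0.983
```

The appended good words lower the single feature vector's linear score from 4 to -2, flipping its decision, while the chunk holding the spam terms keeps its high probability. The robustness here depends on how instances are formed: an attacker who interleaves good words inside the spam-bearing chunk would lower that chunk's score as well.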
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.
