
Spam Filtering Method Based on Learning from Small Samples

Cite this article:	PAN Jie-zhu, ZHOU Xiao, WU Gong-qing, HU Xue-gang. Spam Filtering Method Based on Learning from Small Samples[J]. Computer Engineering, 2010, 36(21): 245-247.

Authors:	PAN Jie-zhu, ZHOU Xiao, WU Gong-qing, HU Xue-gang

Affiliation:	(1. Department of Computer Science and Technology, Hefei Normal University, Hefei 230061, China; 2. School of Computer and Information, Hefei University of Technology, Hefei 230009, China)

Funding:	National "973" Program of China; National Natural Science Foundation of China; Anhui Provincial Natural Science Research Fund for Higher Education Institutions

Abstract:	Client-side spam filters rarely have enough labeled e-mails to train on. To address this problem, this paper proposes a spam filtering method based on learning from small samples, which uses easily obtained unlabeled e-mails to improve filtering performance. An initial Naïve Bayes (NB) classifier is trained on a small set of labeled e-mails and used to label the unlabeled e-mails probabilistically. A new classifier is then trained on all the data, and the EM algorithm repeats these two steps until convergence. Experimental results show that, given only 5 to 20 labeled training e-mails, the method effectively improves spam filtering performance.

Keywords:	learning from small samples; EM algorithm; unlabeled data; spam filtering
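
The procedure summarized in the abstract (train an initial Naive Bayes classifier on a few labeled e-mails, label the unlabeled e-mails probabilistically, retrain on all data, and repeat with EM until convergence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the feature representation, smoothing, or stopping rule, so the bag-of-words multinomial model, Laplace smoothing (`alpha`), the convergence test on posterior change (`tol`), the function names, and the toy e-mails are all assumptions.

```python
import math
from collections import Counter

HAM, SPAM = 0, 1  # assumed class encoding


def train_nb(docs, weights, vocab, alpha=1.0):
    """M-step: fit multinomial Naive Bayes from soft class weights.

    docs: list of token lists; weights: one [P(ham), P(spam)] pair per doc.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    class_mass = [alpha, alpha]          # smoothed, weighted document counts
    word_mass = [Counter(), Counter()]   # weighted word counts per class
    for tokens, w in zip(docs, weights):
        counts = Counter(tokens)
        for c in (HAM, SPAM):
            class_mass[c] += w[c]
            for tok, n in counts.items():
                word_mass[c][tok] += w[c] * n
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_like = []
    for c in (HAM, SPAM):
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_like.append(
            {t: math.log((word_mass[c][t] + alpha) / denom) for t in vocab}
        )
    return log_prior, log_like


def posterior(model, tokens):
    """E-step for one e-mail: return [P(ham | tokens), P(spam | tokens)]."""
    log_prior, log_like = model
    scores = [
        lp + sum(ll[t] for t in tokens if t in ll)  # unknown words are skipped
        for lp, ll in zip(log_prior, log_like)
    ]
    peak = max(scores)                   # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_spam_filter(labeled, unlabeled, max_iter=50, tol=1e-6):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    lab_docs = [d for d, _ in labeled]
    lab_w = [[1.0, 0.0] if y == HAM else [0.0, 1.0] for _, y in labeled]
    vocab = {t for d in lab_docs + unlabeled for t in d}
    # Initial classifier: trained on the small labeled set only.
    model = train_nb(lab_docs, lab_w, vocab)
    prev = None
    for _ in range(max_iter):
        # E-step: probabilistically label the unlabeled e-mails.
        unl_w = [posterior(model, d) for d in unlabeled]
        # M-step: retrain on labeled + soft-labeled data.
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, vocab)
        # Assumed stopping rule: posteriors have stopped changing.
        if prev is not None:
            change = max(
                (abs(a[SPAM] - b[SPAM]) for a, b in zip(unl_w, prev)),
                default=0.0,
            )
            if change < tol:
                break
        prev = unl_w
    return model


# Toy data (hypothetical): two labeled e-mails, four unlabeled ones.
labeled = [
    ("win cash prize now".split(), SPAM),
    ("meeting agenda for monday".split(), HAM),
]
unlabeled = [
    "claim your cash prize".split(),
    "free prize win now".split(),
    "monday project meeting notes".split(),
    "agenda for the project review".split(),
]

model = em_spam_filter(labeled, unlabeled)
print(posterior(model, "free cash".split()))
```

In this toy setup the word "free" never appears in a labeled e-mail, so the initial classifier has no evidence about it. The EM iterations pick up that evidence from the unlabeled e-mails, which is the kind of gain from unlabeled data that the abstract describes.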
This article is indexed in databases including VIP and Wanfang Data.