Email categorization with maximum entropy model (最大熵模型在邮件分类中的应用)
Citation: LI Jun-hui, LI Pei-feng, ZHU Qiao-ming, QIAN Pei-de. Email categorization with maximum entropy model[J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(35): 126-129
Authors: LI Jun-hui (李军辉)  LI Pei-feng (李培峰)  ZHU Qiao-ming (朱巧明)  QIAN Pei-de (钱培德)
Affiliation: School of Computer Science and Technology, Suzhou University, Suzhou, Jiangsu 215006, China (all four authors)
Funding: Jiangsu Province High-Technology Research and Development Program
Abstract: Email categorization assigns each email to one of a set of predefined categories based on its content and properties. This paper applies the maximum entropy model to email categorization. It describes the email preprocessing procedure, presents the features extracted from email headers, and analyzes how the number of features, the number of iterations, and the choice of email feature fields affect categorization performance. It also compares hierarchical categorization with flat (direct) categorization. Experiments show that about 2,000 features and 250 iterations are appropriate. Using all email fields gives the best overall performance; for legitimate email, however, using only the email header and subject gives the best results, even though that configuration has the worst overall performance, a finding that the hierarchical categorization experiments confirm. The results also indicate that hierarchical categorization outperforms flat categorization. The paper concludes with a summary and directions for future work.

Keywords: maximum entropy model  email classification  feature  hierarchical categorization
Article ID: 1002-8331(2007)35-0126-04
Revised: 2007-05-01
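
To make the setup described in the abstract concrete, the following is a minimal sketch of a conditional maximum entropy classifier over field-tagged email features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the training algorithm, so plain batch gradient ascent is used here; the field names (`header`, `subject`, `body`), the frequency-based feature cutoff, the learning rate, and the toy emails are all hypothetical. Only the defaults of 2,000 features and 250 iterations are taken from the abstract.

```python
import math
from collections import Counter, defaultdict

def extract_features(email, fields):
    """Turn the selected email fields into field-tagged token features."""
    return {f"{field}:{tok}" for field in fields
            for tok in email.get(field, "").lower().split()}

def predict_proba(weights, feats):
    """p(c | x) = exp(sum of weights for class c) / Z, computed stably."""
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}

def train_maxent(emails, labels, fields=("header", "subject", "body"),
                 max_features=2000, iterations=250, lr=0.5):
    """Fit the model by batch gradient ascent on the log-likelihood.

    Assumption: gradient ascent and a frequency cutoff stand in for
    whatever estimation and feature selection the paper actually used.
    """
    counts = Counter(f for e in emails for f in extract_features(e, fields))
    vocab = {f for f, _ in counts.most_common(max_features)}
    data = [extract_features(e, fields) & vocab for e in emails]
    weights = {c: defaultdict(float) for c in sorted(set(labels))}
    for _ in range(iterations):
        grad = {c: defaultdict(float) for c in weights}
        for feats, y in zip(data, labels):
            probs = predict_proba(weights, feats)
            for c in weights:
                # Observed minus expected feature count for class c.
                delta = (1.0 if c == y else 0.0) - probs[c]
                for f in feats:
                    grad[c][f] += delta
        for c in weights:
            for f, g in grad[c].items():
                weights[c][f] += lr * g / len(data)
    return weights, vocab, fields

def classify(model, email):
    """Return the most probable category for one email."""
    weights, vocab, fields = model
    probs = predict_proba(weights, extract_features(email, fields) & vocab)
    return max(probs, key=probs.get)

# Hypothetical toy data, for illustration only.
emails = [
    {"header": "from: promo@example.com", "subject": "win free prize",
     "body": "click now to claim free money"},
    {"header": "from: deals@example.com", "subject": "free offer",
     "body": "claim your prize money now"},
    {"header": "from: alice@example.org", "subject": "meeting agenda",
     "body": "please review the project schedule"},
    {"header": "from: bob@example.org", "subject": "project report",
     "body": "the schedule and agenda are attached"},
]
labels = ["spam", "spam", "legitimate", "legitimate"]

model = train_maxent(emails, labels)
header_subject_model = train_maxent(emails, labels, fields=("header", "subject"))
```

Passing a different `fields` tuple, as in `header_subject_model`, is one way to mimic the comparison of email feature fields mentioned in the abstract; the toy data is far too small to reproduce any of the reported results.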
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.