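E-mail classification based on a concept vector space model
Cite this article: ZENG Chao, LU Zhao, GU Jun-zhong. E-mail classification based on concept vector space model [J]. Journal of Computer Applications, 2008, 28(12): 3248-3250.
Authors: ZENG Chao, LU Zhao, GU Jun-zhong
Affiliation: School of Information Science and Technology, East China Normal University, Shanghai 200241
Abstract: An e-mail classification method based on a concept vector space model is proposed. When the feature vector of an e-mail is extracted, the WordNet lexical ontology serves as the basis: terms are replaced by synonym-set (synset) concepts, and the hypernym-hyponym relations between synsets are also taken into account, so that a concept vector space model of the e-mail is built and used as its feature vector. The TF*IWF*IWF method is used to revise the weights of the concept vector, and the category of the e-mail is finally determined by the simple vector distance classification method. Experimental results show that the method effectively improves e-mail classification accuracy when the number of training samples is limited.

Keywords: e-mail classification; WordNet; concept vector; vector space model
Received: 2008-06-27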

E-mail classification based on concept vector space model
ZENG Chao, LU Zhao, GU Jun-zhong. E-mail classification based on concept vector space model [J]. Journal of Computer Applications, 2008, 28(12): 3248-3250.
Authors: ZENG Chao, LU Zhao, GU Jun-zhong
Affiliation: ZENG Chao, LU Zhao, GU Jun-zhong (Institute of Computer Applications, East China Normal University, Shanghai 200241, China)
Abstract: A new approach to e-mail classification based on the concept vector space model is proposed. During the training process, the feature vector of each e-mail is extracted by replacing terms with WordNet synonym sets (synsets) and by taking the hypernymy-hyponymy relations between synsets into account. The TF*IWF*IWF method is then used to revise the weights of the concept vector. Finally, the category of the e-mail is determined using the simple vector distance classification method. Experimental results show that, compared with the term-based VSM approach, this approach improves the accuracy of e-mail classification, especially when the training set is small.
Keywords: e-mail classification; WordNet; concept vector; vector space model
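
The abstract names three steps: mapping terms to synset concepts with hypernym support, TF*IWF*IWF weighting, and simple vector distance classification. The following is a minimal sketch of how those steps could fit together, not the authors' implementation. Several things here are assumptions that the abstract does not state: the tiny `TERM_TO_SYNSET` and `HYPERNYM` tables are hypothetical stand-ins for WordNet, the 0.5 hypernym discount is arbitrary, IWF is taken to be the logarithm of total concept occurrences divided by the occurrences of one concept, and "simple vector distance" is read as nearest class centroid under cosine similarity.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for WordNet (assumption, not real WordNet data).
TERM_TO_SYNSET = {
    "car": "car.n.01", "automobile": "car.n.01",
    "truck": "truck.n.01",
    "meeting": "meeting.n.01", "conference": "meeting.n.01",
    "seminar": "seminar.n.01",
}
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01",
    "truck.n.01": "motor_vehicle.n.01",
    "seminar.n.01": "meeting.n.01",
}
HYPERNYM_WEIGHT = 0.5  # assumed discount for a concept credited via its hyponym


def concept_counts(tokens):
    """Map tokens to concept frequencies; each direct hypernym gets discounted credit."""
    counts = Counter()
    for tok in tokens:
        synset = TERM_TO_SYNSET.get(tok.lower())
        if synset is None:
            continue  # terms outside the toy lexicon are dropped
        counts[synset] += 1.0
        parent = HYPERNYM.get(synset)
        if parent is not None:
            counts[parent] += HYPERNYM_WEIGHT
    return counts


def iwf_table(all_counts):
    """Assumed IWF(c) = log(total concept occurrences / occurrences of c)."""
    totals = Counter()
    for counts in all_counts:
        totals.update(counts)
    grand_total = sum(totals.values())
    return {c: math.log(grand_total / n) for c, n in totals.items()}


def weight(counts, iwf):
    """TF*IWF*IWF: frequency times squared IWF; concepts unseen in training get 0."""
    return {c: tf * iwf.get(c, 0.0) ** 2 for c, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train(labelled_docs):
    """Return (iwf, centroids); each centroid is the mean weighted vector of a class."""
    counted = [(label, concept_counts(tokens)) for label, tokens in labelled_docs]
    iwf = iwf_table([counts for _, counts in counted])
    sums, sizes = defaultdict(Counter), Counter()
    for label, counts in counted:
        sums[label].update(weight(counts, iwf))
        sizes[label] += 1
    centroids = {
        lab: {c: w / sizes[lab] for c, w in vec.items()} for lab, vec in sums.items()
    }
    return iwf, centroids


def classify(tokens, iwf, centroids):
    """Assign the class whose centroid is most similar to the e-mail's concept vector."""
    vec = weight(concept_counts(tokens), iwf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Usage with made-up training data: "automobile" never co-occurs with "truck"
# in training, yet both map to concepts that the "travel" class already covers.
TRAIN = [
    ("travel", ["car", "truck"]),
    ("travel", ["automobile"]),
    ("work", ["meeting", "seminar"]),
    ("work", ["conference"]),
]
iwf, centroids = train(TRAIN)
print(classify(["truck", "automobile"], iwf, centroids))
```

Because synonyms collapse into one concept and hyponyms also credit their hypernym, two e-mails can overlap in concept space without sharing any literal term. That is one plausible reason the abstract reports gains when the training set is small, although the sketch itself does not demonstrate this.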
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.