
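Email Classification Based on an Improved Vector Space Model (基于改进向量空间模型的邮件分类)
Cite as: 廖玲, 文敦伟. Email Classification Based on an Improved Vector Space Model [J]. Computer and Digital Engineering (计算机与数字工程), 2007, 35(4): 190-193.
Authors: 廖玲, 文敦伟
Affiliations: [1] School of Information Science and Engineering, Central South University, Changsha 410083; [2] School of Computing and Information Systems, Athabasca University, Athabasca T9S 3A3, Canada
Abstract: Content-based email classification generally represents emails with the vector space model. That model is built solely on the frequency with which independent words occur in the email body; it takes no account of the email's structural features or of the context in which a word appears. As a result, the feature vector cannot represent the email's content accurately, and classification accuracy suffers. This paper proposes an improved vector space model that, in view of the structure specific to email, uses the paragraph as the unit of segmentation and adjusts the weights of feature terms by analyzing the relationships between paragraphs and the content within each paragraph. An email classification system is designed on this model, and the system is tested and its results analyzed.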

Keywords: vector space model; email classification; paragraph structure
Revised manuscript received: August 7, 2006

Email Classification Based on Text Structure
Lian Ling, Wen Dunwei. Email Classification Based on Text Structure [J]. Computer and Digital Engineering, 2007, 35(4): 190-193.
Authors: Lian Ling, Wen Dunwei
Abstract: Content-based email classification often uses the Vector Space Model (VSM) to represent emails. This model is based on the frequencies of words that are treated as independent of one another; it ignores the structure of emails and the context surrounding the words. As a result, the vectors cannot describe emails accurately, which reduces the precision of email categorization. In this paper, a method that modifies the basic VSM is proposed. In keeping with email structure, the modified VSM takes paragraphs as units and adjusts the weights of the features by analyzing the relationships between paragraphs as well as the terms within them. An experimental system is designed, and its results are presented and analyzed.
Keywords: vector space model; email classification; paragraph structure
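
The abstract does not state the paper's actual weighting formula, so the following is only a minimal sketch of the general idea it describes: split an email into paragraphs and scale each term's count by a per-paragraph weight before building the vector. The function names, the blank-line paragraph splitter, the crude tokenizer, and the "first paragraph counts double" heuristic are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def paragraph_weighted_vector(email_body, paragraph_weight):
    """Build a term -> weight map, scaling each paragraph's term counts.

    paragraph_weight(index, total) is a hypothetical callback that returns
    the multiplier for the paragraph at position `index` out of `total`.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", email_body) if p.strip()]
    vector = Counter()
    for i, paragraph in enumerate(paragraphs):
        w = paragraph_weight(i, len(paragraphs))
        for term, count in Counter(tokenize(paragraph)).items():
            vector[term] += w * count
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight maps."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_paragraph_boost(index, total):
    """Assumed heuristic: the opening paragraph counts double."""
    return 2.0 if index == 0 else 1.0
```

A classifier built on this sketch could compare `paragraph_weighted_vector(body, first_paragraph_boost)` against per-category centroid vectors with `cosine` and pick the closest category; a weighting callback that reflects relationships between paragraphs, as the abstract describes, would replace the simple positional heuristic.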
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.