
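A Compression Algorithm for E-mail Feature Information
Cite this article: HE Xiao-wei, ZHANG Jin-feng. A compression algorithm for e-mail feature information [J]. Computer and Modernization, 2006(1): 13-15, 33.
Authors: HE Xiao-wei, ZHANG Jin-feng
Affiliations: 1. School of Information Science and Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004
2. Hangzhou No. 14 Middle School, Hangzhou, Zhejiang 310006
Abstract: Principal component analysis is one of the important methods of data analysis. It constructs a series of linear combinations of the original variables such that the combinations are mutually uncorrelated and reflect as much of the information in the original variables as possible. To address shortcomings in current spam processing, this paper applies principal component analysis to a large number of spam e-mail samples and tallies the words that commonly occur in spam together with their contribution rates to spam. These serve as the basis for dimensionality reduction when judging whether an unknown e-mail is spam; e-mail information is compressed accordingly, yielding vectors that carry much information in few dimensions.

Keywords: principal components; contribution rate; eigenvalue; covariance matrix; correlation matrix
Article ID: 1006-2475(2006)01-0013-03
Received: 2005-08-12
Revised: 2005-08-12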

A Condensation Arithmetic of Character Information of E-mail
HE Xiao-wei, ZHANG Jin-feng. A Condensation Arithmetic of Character Information of E-mail [J]. Computer and Modernization, 2006(1): 13-15, 33.
Authors: HE Xiao-wei, ZHANG Jin-feng
Affiliation: 1. School of Information Science and Engineering, Zhejiang Normal University, Jinhua 321004, China; 2. Hangzhou Fourteenth Middle School, Hangzhou 310006, China
Abstract: Principal component analysis (PCA) is one of the most important methods in data analysis. It constructs a series of linear combinations of the original variables such that the combinations are mutually uncorrelated and reflect as much of the information in the original variables as possible. To address shortcomings in current spam e-mail processing, this paper applies PCA to a large number of spam e-mail samples in order to obtain the words common in spam and their contribution rates. These serve as the basis for dimensionality reduction when judging whether an unknown e-mail is spam. On this basis, the method compresses e-mail information into vectors that carry more information in fewer dimensions.
Keywords:principal components  contribution rate  eigenvalue  covariance matrix  correlation matrix
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
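
The abstract describes the general PCA pipeline (covariance matrix, eigenvalues, contribution rates, projection to a lower dimension) without giving the paper's actual procedure. The following is a minimal sketch of that generic pipeline, not the authors' implementation. The word list, the count matrix `X`, the function name `pca_compress`, and the 0.85 cumulative-contribution threshold are all illustrative assumptions. Here the contribution rate of the i-th component is taken to be its eigenvalue divided by the sum of all eigenvalues, which is the standard definition.

```python
import numpy as np

def pca_compress(X, threshold=0.85):
    """Project rows of X onto the fewest principal components whose
    cumulative contribution rate reaches `threshold` (assumed value)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Covariance matrix of the word-count features (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    total = eigvals.sum()
    if total == 0.0:
        raise ValueError("all samples are identical; no variance to analyze")
    # Contribution rate of each component: eigenvalue / sum of eigenvalues.
    contribution = eigvals / total
    cumulative = np.cumsum(contribution)
    # Smallest k whose cumulative contribution reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    components = eigvecs[:, :k]
    return centered @ components, components, contribution, mean

# Hypothetical toy data: rows are spam samples, columns are counts of the
# (made-up) words "free", "winner", "click", "meeting".
X = np.array([
    [3, 2, 4, 0],
    [2, 1, 3, 0],
    [4, 3, 5, 1],
    [1, 0, 1, 0],
    [5, 3, 6, 0],
], dtype=float)

Z, components, contribution, mean = pca_compress(X)
print("contribution rates:", np.round(contribution, 4))
print("kept components:", components.shape[1], "of", X.shape[1])

# An unknown e-mail is compressed with the same mean and components.
x_new = np.array([2.0, 2.0, 3.0, 0.0])
z_new = (x_new - mean) @ components
print("compressed unknown e-mail:", z_new)
```

The keywords list both a covariance matrix and a correlation matrix. This sketch uses the covariance matrix; standardizing each column to unit variance before the decomposition would give the correlation-matrix variant.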