19 similar documents found (search time: 140 ms)
1.
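To address the fuzziness of the information in emails and the asymmetric cost of misclassifying legitimate email versus spam, this paper proposes an email filtering method based on a dual-membership fuzzy support vector machine. Each sample is assigned its own pair of membership values, from which the optimal classifier is obtained, improving filtering accuracy. Simulation experiments show that the method effectively reduces the misclassification of legitimate email as spam while maintaining high accuracy.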
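The abstract does not give the optimisation problem, so the following is only a sketch of one common way to write a dual-membership (bilateral-weighted) fuzzy SVM objective, assumed here for illustration: each sample carries a membership to the spam class and a membership to the legitimate class, and each membership scales the slack incurred if the sample is treated as belonging to that class. The function names, the linear decision function, and the value of `C` are all hypothetical.

```python
def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b (spam if positive)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def dual_membership_objective(w, b, samples, C=1.0):
    """Assumed objective: 0.5*||w||^2 + C * sum(m_spam*xi + m_legit*eta).

    samples -- list of (x, m_spam, m_legit) triples
    xi      -- hinge slack if the sample is treated as spam (+1)
    eta     -- hinge slack if the sample is treated as legitimate (-1)
    """
    regulariser = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for x, m_spam, m_legit in samples:
        f = decision_value(w, b, x)
        xi = max(0.0, 1.0 - f)
        eta = max(0.0, 1.0 + f)
        penalty += m_spam * xi + m_legit * eta
    return regulariser + C * penalty
```

Giving legitimate mail a larger `m_legit` makes pushing it to the spam side more expensive, which is one way to encode the asymmetric cost the abstract mentions.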
2.
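This paper studies the accuracy of spam filtering. Email is a special kind of high-dimensional, complex text, and traditional single models such as the support vector machine (SVM) or K-nearest neighbors (KNN) struggle to identify spam, which leads to low filtering accuracy. To improve spam filtering accuracy, a spam filtering model that combines KNN with SVM (SVM-KNN) is proposed. Email feature vectors are first fed to the SVM for training to find the set of support vectors; the distance between the email to be classified and the optimal hyperplane is then computed. If the distance exceeds a threshold, the SVM determines the email type; otherwise KNN does. Simulation results show that SVM-KNN overcomes the difficulties faced by single models and improves spam filtering accuracy, making it an effective tool for email management.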
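The decision rule described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear SVM whose weights `w` and bias `b` are already trained, takes the KNN vote over the support vectors (the abstract does not say which samples serve as neighbors), and uses illustrative values for `threshold` and `k`.

```python
import math
from collections import Counter


def svm_knn_predict(x, w, b, support_vectors, threshold=1.0, k=3):
    """Trust the SVM far from the hyperplane; fall back to KNN near it.

    w, b            -- weights and bias of an already-trained linear SVM (assumed)
    support_vectors -- list of (vector, label) pairs with label in {+1, -1}
    threshold, k    -- illustrative values, not taken from the paper
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Geometric distance from x to the hyperplane w . x + b = 0.
    distance = abs(score) / math.sqrt(sum(wi * wi for wi in w))
    if distance > threshold:
        return 1 if score > 0 else -1
    # Near the boundary: majority vote among the k nearest support vectors.
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With an odd `k` and two classes the vote cannot tie, which is why `k=3` is used as the default in this sketch.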
3.
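Email has become an indispensable means of communication in daily life, but the flood of spam threatens computer network security and seriously disrupts normal information exchange, so anti-spam measures are increasingly important. The support vector machine is a machine learning algorithm developed on the basis of statistical learning theory that performs well on small-sample learning, nonlinear problems, and high-dimensional pattern recognition. The support vector machine is therefore used to filter spam: text emails are first preprocessed, suitable email features are extracted, the emails are converted into a vector space model, and finally the support vector machine is used for classification. Experiments show that the support vector machine improves filtering performance.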
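The pipeline in this abstract (preprocess, extract features, build a vector space model, classify with an SVM) can be sketched in a few lines. The sketch makes several assumptions that are not from the paper: whitespace tokenization, raw term counts as features, and a linear SVM trained by plain subgradient descent on the regularised hinge loss, which stands in for whatever solver the authors used. All function names and hyperparameters are hypothetical.

```python
def tokenize(text):
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index of the vector space model."""
    vocab = {}
    for doc in docs:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def to_vector(doc, vocab):
    """Represent a document as a term-count vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for token in tokenize(doc):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=50, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss (labels +1 / -1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin: take a hinge step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(doc, vocab, w, b):
    """Return +1 (spam) or -1 (legitimate) for a raw text document."""
    x = to_vector(doc, vocab)
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

A usage sketch on toy data: build the vocabulary from the training emails, vectorise them with `to_vector`, call `train_linear_svm`, then pass new text to `predict`.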
4.
5.
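An improved classification method based on the support vector machine is proposed. Samples that lie near the separating hyperplane in feature space are classified by a collective vote of their K-nearest neighbors in both the feature space and the sample space. Applied to spam filtering, the method effectively reduces the probability of misjudging the legitimacy of an email. Finally, a spam filtering example verifies the effectiveness of the method.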
6.
7.
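Spam poses a serious threat to the security of computer systems and to people's daily lives, and anti-spam has become a research topic of real practical importance. Since spam filtering is essentially a classification problem, a server-front-end anti-spam filtering method is proposed that uses an improved ν-support vector machine algorithm to classify email content and filter out spam. The results show that, compared with the direct incremental support vector machine algorithm, the method improves filtering accuracy and has practical value.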
8.
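A Spam Filtering Method Based on the SVM Algorithm    Total citations: 4, self-citations: 1, citations by others: 3
Content-based filtering is one of the mainstream techniques for tackling spam. Since spam filtering is essentially a classification problem, a spam filtering method based on the support vector machine is proposed, with the SMO classification algorithm incorporated into spam classification. Experiments show that the SMO algorithm achieves good classification results and shortens the classification time of the support vector machine classifier.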
9.
10.
11.
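This paper proposes a multi-level grams feature extraction method to improve spam filters based on the online support vector machine model. Such filters have achieved very good results on large-scale spam datasets, but compared with logistic regression models their computational cost is very high, which makes them hard to adopt in industry. The proposed multi-level grams feature extraction method effectively reduces the number of features, extracts more precise and effective features, greatly reduces the model's running time, and at the same time improves filtering performance. Experiments show that the method reduces the running time of the online support vector machine model from 10337 s to 3784 s while halving the model's (1-ROCA)%.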
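The abstract does not describe how the levels of the multi-level grams method are defined, so the sketch below is not that method. It only illustrates the general idea of collecting n-grams at several lengths and then cutting the feature set down, here with a simple frequency threshold chosen as an assumption. The function names, the lengths, and `min_count` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character n-grams of the text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def multi_length_ngram_features(text, lengths=(2, 3), min_count=2):
    """Count n-grams of several lengths and keep only the frequent ones.

    The frequency cutoff stands in for feature reduction; the paper's own
    selection criterion is not given in the abstract.
    """
    counts = Counter()
    for n in lengths:
        counts.update(char_ngrams(text, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

Fewer surviving features means shorter vectors for the online SVM to process, which is the kind of saving the abstract reports.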
12.
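As the harm caused by spam in email applications grows more serious, machine-learning-based spam filtering has become one of the current research hotspots in Internet applications. Based on an analysis of existing machine-learning-based spam processing methods, and taking into account the characteristics of Chinese information processing, a Chinese spam filtering method based on the support vector machine (SVM) is proposed, designed, and implemented. Experiments show that, with limited samples, the SVM-based Chinese spam filtering method achieves high accuracy and stability.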
13.
14.
15.
The annoyance of spam emails increasingly plagues both individuals and organizations. In response, most prior research investigates spam filtering as a classical text categorization task, in which training examples must include both spam (positive examples) and legitimate (negative examples) emails. However, in many spam filtering scenarios, obtaining legitimate emails for training purposes can be more difficult than collecting spam and unclassified emails. Hence, it is more appropriate to construct a classification model for spam filtering that uses positive training examples (i.e., spam) and unlabeled instances only and does not require legitimate emails as negative training examples. Several single-class learning techniques, such as PNB and PEBL, have been proposed in the literature. However, they incur inherent limitations with regard to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address these limitations. Specifically, we follow the two-stage framework of PEBL but extend each stage with an ensemble strategy. The empirical evaluation results from two spam filtering corpora suggest that our proposed E2 technique generally outperforms benchmark techniques (i.e., PNB and PEBL) and exhibits more stable performance than its counterparts.
16.
Journal of Zhejiang University, Series C (English Edition), 2012, (3): 187-195
This paper addresses the challenge of large margin classification for spam filtering in the presence of an adversary who disguises the spam mails to avoid being detected. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary could afford to modify a spam message only to a certain extent, without damaging its utility for the spammer. Under this assumption, we present a large margin approach for classification of spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming optimization. We performed a group of experiments using the TREC 2006 Spam Corpus. Results showed that the performance of the standard support vector machine (SVM) degrades rapidly when more words are injected or removed by the adversary, while the proposed approach is more stable under the disguise attack.
17.
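A Word Sequence Kernel Applied to Spam Filtering    Total citations: 1, self-citations: 0, citations by others: 1
Kernel functions commonly used in support vector machines (SVM) ignore text structure and therefore lose a large amount of semantic information. To address this, a word sequence kernel (WSK) with a class-relevance measure is proposed and applied to spam filtering. Email text features are first extracted and the class-relevance measure of each feature is computed. The word sequence kernel is then used as the kernel function to train the support vector machine, with the class-relevance measure used during training to compute each word's decay coefficient. Finally, the emails are classified. Experimental results show that, compared with commonly used kernel functions and the string kernel, the improved word sequence kernel achieves higher classification accuracy and improves the accuracy of spam filtering.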
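To make the idea of a word sequence kernel concrete, here is a naive sketch of a gap-weighted subsequence kernel over words. It is a simplification under stated assumptions: it uses one uniform decay factor for every word, whereas the paper derives per-word decay coefficients from a class-relevance measure, and it enumerates subsequences by brute force, which is only practical for very short texts. The function name and default values are hypothetical.

```python
from itertools import combinations


def word_sequence_kernel(s, t, n=2, decay=0.5):
    """Naive gap-weighted kernel over word subsequences of length n.

    Every pair of matching (possibly non-contiguous) word subsequences
    contributes decay ** (span in s + span in t), so sequences spread over
    longer spans count for less. A uniform decay is an assumption here.
    """
    s_words, t_words = s.split(), t.split()
    total = 0.0
    for i in combinations(range(len(s_words)), n):
        for j in combinations(range(len(t_words)), n):
            if all(s_words[a] == t_words[b] for a, b in zip(i, j)):
                span_s = i[-1] - i[0] + 1
                span_t = j[-1] - j[0] + 1
                total += decay ** (span_s + span_t)
    return total
```

Unlike a bag-of-words kernel, this value depends on word order: reordering the words of one text changes which subsequences match.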
18.
Electronic mail has revolutionized traditional communication systems because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as spam emails. A major concern is the development of suitable filters that can adequately capture those emails and achieve a high performance rate. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within the context of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernels, as they directly affect the separation of emails in the feature space. This paper presents a thorough investigation of several distance-based kernels and specifies spam filtering behaviors using SVM. The majority of kernels used in recent studies concern continuous data and neglect the structure of the text. In contrast to classical kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit the spam filtering problem. In addition, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail feature mapping variants in TC that yield improved performance for the standard SVM in the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that the active online method using string kernels achieves higher precision and recall rates.
19.
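This paper studies an improved naive Bayes text classification algorithm based on the SVM algorithm and its application to spam SMS filtering. Naive Bayes suffers from its conditional independence assumption, heavy dependence on the distribution of the sample space, and inherent instability, which increase the algorithm's time complexity. To address these shortcomings, an improved SVM-based naive Bayes solution for spam SMS filtering is proposed that combines the efficient classification of naive Bayes with the incremental learning of SVM and its independence from the sample space. The classification problem is first converted into a quadratic optimization problem using the structural risk minimization principle and a nonlinear transformation, and naive Bayes is then used to filter the messages, improving classification accuracy and stability. Simulation results show that the algorithm quickly obtains the optimal classification feature subset and effectively improves both the accuracy and the speed of spam SMS filtering.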