Similar literature
20 similar documents found.
1.
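Lin Dongmao (林冬茂). Computer Simulation (《计算机仿真》), 2012, 29(2): 120-123
This paper studies the accuracy of spam detection with the aim of improving network security. Email features are high-dimensional and highly redundant; traditional detection models can neither reduce the feature dimensionality nor eliminate the redundant information, which leads to long computation times, high space complexity, and low detection accuracy. To improve detection accuracy, a two-layer spam detection model combining a whitelist with a support vector machine (SVM) is proposed. Feature clustering is used to reduce the feature dimensionality and eliminate redundancy among features. Whitelist detection serves as the first line of defense, detecting spam from known addresses, and the SVM serves as the second line of defense, detecting new spam. The model is evaluated on spam data, and the experimental results show that the two-layer model effectively improves both the efficiency and the accuracy of spam detection, providing an effective means of managing email communication.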

2.
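Wang Qingsong (王青松), Wei Ruyu (魏如玉). Computer Science (《计算机科学》), 2016, 43(4): 256-259, 269
The naive Bayes algorithm is widely used in spam filtering, and feature extraction is an indispensable step of the algorithm. Previous spam filtering methods for Chinese extract features with the word as the unit of text features; with large-scale email training samples, the time efficiency of this approach becomes a bottleneck for filtering. To address this, a phrase-based Bayesian filtering method for Chinese spam is proposed. In the feature extraction stage it incorporates a new phrase analysis method from the text classification field and extracts features with the phrase as the unit, following rules for base noun phrases, base verb phrases, and basic semantic analysis. Comparative spam filtering experiments using words and phrases as units confirm the effectiveness of the proposed method.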

3.
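LDA feature selection method in spam processing   Cited by: 1 (self: 0, others: 1)
Spam processing is a long-standing research topic, and more and more text classification techniques are being ported to spam processing applications. Topic models such as LDA (Latent Dirichlet Allocation) have attracted growing attention in automatic summarization, information retrieval, and other discrete-data applications. This paper introduces the LDA model into spam processing as a feature selection method. Combining LDA feature selection with a centroid + KNN classifier yields a simple test spam filter. Preliminary experimental results show that LDA-based feature selection outperforms the commonly used IG and MI feature selection methods, and that the filtering performance of the test filter is comparable to that of other filters.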

4.
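Research on a spam detection system based on the K-nearest-neighbor method and mobile agent technology*   Cited by: 2 (self: 1, others: 1)
To address the increasingly serious spam problem, a new spam detection system based on the K-nearest-neighbor method and mobile agent technology is designed. The paper briefly introduces the K-nearest-neighbor method and mobile agent technology, and describes in detail the architecture, workflow, and key technologies of the system. Experimental results show that, compared with similar systems, the system runs faster, places lower demands on network stability, and effectively blocks the spread of spam.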

5.
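As a new vehicle for disseminating user information, microblogs play an important role in initiating and spreading online public opinion. Large numbers of noise microblogs and similar microblogs, produced by users' intentional (posting advertisements) and unintentional (reposting) actions, severely hamper public opinion analysis and user browsing. Detecting these noise and similar microblogs in order to purify microblog data has become an urgent problem. Based on statistical data, this paper analyzes the characteristics of noise microblogs and similar microblogs and proposes a filtering method for microblog text streams that combines noise discrimination with content similarity detection: noise microblogs are filtered by features such as URL links, character rate, and high-frequency words; similar microblogs are detected and removed by a two-fold content filter consisting of segment filtering and index filtering. Experiments show that the method purifies microblog data effectively, filtering out similar and noise microblogs efficiently and accurately.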

6.
Highly discriminative statistical features for email classification   Cited by: 2 (self: 2, others: 0)
This paper reports on email classification and filtering, more specifically on spam versus ham and phishing versus spam classification, based on content features. We test the validity of several novel statistical feature extraction methods. The methods rely on dimensionality reduction in order to retain the most informative and discriminative features. We successfully test our methods under two schemas. The first one is a classic classification scenario using a 10-fold cross-validation technique for several corpora, including four ground truth standard corpora: Ling-Spam, SpamAssassin, PU1, and a subset of the TREC 2007 spam corpus, and one proprietary corpus. In the second schema, we test the anticipatory properties of our extracted features and classification models with two proprietary datasets, formed by phishing and spam emails sorted by date, and with the public TREC 2007 spam corpus. The contributions of our work are an exhaustive comparison of several feature selection and extraction methods in the context of email classification on different benchmarking corpora, and the evidence that the technique of biased discriminant analysis in particular offers better discriminative features for classification, gives stable classification results regardless of the number of features chosen, and robustly retains its discriminative value over time and data setups. These findings are especially useful in a commercial setting, where short profile rules are built based on a limited number of features for filtering emails.

7.
Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter out the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam; unlike server-based systems, it is cost-efficient and not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages: in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best, with a relatively small amount of resources for spam detection, compared to other distributed methods.

8.
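To better address the spam problem and improve defenses against spam, this paper starts from one of the causes of spam, the email directory harvest attack (DHA). Based on an analysis of how DHA works, it proposes a blacklist-based defense strategy that uses both an email address threshold and an IP address threshold as lock conditions, and tests the strategy by simulation under limited attack resources. The analysis shows that the strategy defends effectively against DHA, and that the settings of the filtering threshold and the lock time are the key points of the defense.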
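The abstract does not define the two thresholds, so the following is only a minimal Python sketch of one plausible reading: invalid-recipient attempts are counted per sender address and per source IP, and either counter reaching its threshold locks that key for a fixed time. The class name `DhaGuard`, its method names, and the default threshold and lock values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class DhaGuard:
    """Hypothetical blacklist: lock a sender address or source IP after too many invalid recipients."""

    def __init__(self, address_threshold=3, ip_threshold=5, lock_seconds=600.0):
        # All three defaults are illustrative assumptions, not values from the paper.
        self.address_threshold = address_threshold
        self.ip_threshold = ip_threshold
        self.lock_seconds = lock_seconds
        self.invalid_counts = defaultdict(int)  # (kind, value) -> invalid attempts so far
        self.locked_until = {}                  # (kind, value) -> time at which the lock expires

    def is_locked(self, sender, ip, now):
        """Return True if either the sender address or the source IP is currently locked."""
        keys = (("address", sender), ("ip", ip))
        return any(self.locked_until.get(key, 0.0) > now for key in keys)

    def report_invalid_recipient(self, sender, ip, now):
        """Record one delivery attempt to a nonexistent mailbox."""
        self._count(("address", sender), self.address_threshold, now)
        self._count(("ip", ip), self.ip_threshold, now)

    def _count(self, key, threshold, now):
        self.invalid_counts[key] += 1
        if self.invalid_counts[key] >= threshold:
            # Threshold reached: lock this key and restart its counter.
            self.locked_until[key] = now + self.lock_seconds
            self.invalid_counts[key] = 0
```

Time is passed in explicitly as `now` so that the lock time can be varied in a simulation, which is the kind of parameter the abstract identifies as critical.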

9.
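This paper introduces several spam filtering methods in common use and describes in detail two content-based filtering algorithms, the Bayesian algorithm and the Winnow algorithm. Current studies of Chinese spam rely on different corpora, so comparative analysis of the algorithms' effectiveness is lacking. Improved versions of the Bayesian and Winnow algorithms are implemented and tested on a public email corpus from CCERT. The test results show that both algorithms achieve good filtering performance.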
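For readers unfamiliar with Winnow, the sketch below shows the textbook multiplicative-update variant (often called Winnow2) over sparse binary token features. It is not the improved algorithm the paper implements, whose details the abstract does not give; the class name `Winnow`, the helper `train`, and the threshold and promotion factor are assumptions chosen for illustration.

```python
class Winnow:
    """Textbook Winnow2 over sparse binary features (the set of tokens present in a message)."""

    def __init__(self, threshold=2.5, alpha=2.0):
        self.threshold = threshold  # decision threshold (illustrative value)
        self.alpha = alpha          # promotion/demotion factor (illustrative value)
        self.weights = {}           # token -> weight; unseen tokens implicitly weigh 1.0

    def score(self, tokens):
        return sum(self.weights.get(t, 1.0) for t in set(tokens))

    def predict(self, tokens):
        """Return True if the message is classified as spam."""
        return self.score(tokens) >= self.threshold

    def update(self, tokens, is_spam):
        """Mistake-driven update: weights change only when the prediction is wrong."""
        if self.predict(tokens) == is_spam:
            return
        # Missed spam: promote the active tokens. False alarm: demote them.
        factor = self.alpha if is_spam else 1.0 / self.alpha
        for t in set(tokens):
            self.weights[t] = self.weights.get(t, 1.0) * factor


def train(model, examples, epochs=5):
    """Run several passes over (tokens, is_spam) pairs."""
    for _ in range(epochs):
        for tokens, is_spam in examples:
            model.update(tokens, is_spam)
```

Because updates are multiplicative and happen only on mistakes, the weights of tokens that never co-occur with an error stay at their initial value, which keeps the model sparse on large vocabularies.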

10.
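When processing high-dimensional data, the number of samples required grows exponentially while the usefulness of distances between samples gradually diminishes, which leads to the curse of dimensionality. Text tag data usually suffer from excessively high dimensionality, which hampers the detection of spam tags. Using the mathematical model of the support vector machine, this paper builds a large-scale spam tag detection model for Folksonomy. To reduce the impact of high dimensionality on spam tag detection, and inspired by kernel principal component analysis, the idea of dimensionality reduction is introduced into data reduction, and a large-scale SVM dataset reduction model based on kernel principal component analysis is proposed. This is finally instantiated as a new spam tag detection method, a large-scale spam tag detection model based on kernel principal component analysis and support vector machines (KPCA-SVM). In spam tag detection, the model shortens test time without affecting the data's features, and its detection performance is good.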

11.
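To counter the serious adverse effects of spam on the Internet and its users, this paper uses the efficient Bayesian classification algorithm to implement a spam filtering system on the Hadoop platform. The system overcomes the shortcomings of traditional parallel systems in programming and scalability, and takes full advantage of the cloud computing environment, making it simple to implement, easy to extend, and better performing. Related experiments were carried out to validate the design.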

12.
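Classification is one of the most important problems in machine learning and data mining research. Automatic text classification in particular is a research hotspot and core technology in information retrieval and data mining, and has received wide attention and developed rapidly in recent years. A spam filtering system based on Bayesian probabilistic inference is designed. It uses the weights of probability tests to describe the correlations among data, thereby resolving inconsistency among the data and even the problem of mutual independence. As the leading application of the Internet, email has always been popular among Internet users, but in recent years the spam problem has grown increasingly serious. The results of the above research are applied to filtering spam on today's Internet, and experiments demonstrate the effectiveness of the method.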

13.
Web spam uses numerous techniques to mislead Web search engines in exchange for financial profit. Many semi-automatic propagation models have been proposed to combat Web spam. In this paper, distrust propagation is used to detect Web spam. An automatic distrust seed set propagation algorithm (DSP) is proposed, which extends the seed set so that distrust propagates further and more Web spam is detected. Experiments are conducted on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets; the results show that DSP enhanced the baseline algorithms, detecting 17.73% more spam hosts in the former dataset and 8.59% more spam hosts in the latter.

14.
Artificial immune system inspired behavior-based anti-spam filter   Cited by: 2 (self: 1, others: 1)
This paper proposes a novel behavior-based anti-spam technology for email service based on an artificial immune-inspired clustering algorithm. The suggested method is capable of continuously delivering the most relevant spam emails from the collection of all spam emails that are reported by the members of the network. Mail servers could implement the anti-spam technology by using the “black lists” that have already been recognized. Two main concepts are introduced: defining the behavior-based characteristics of spam, and continuously identifying similar groups of spam while processing spam streams. Experimental results using real-world datasets reveal that the proposed technology is reliable, efficient and scalable. Since no single technology can achieve one hundred percent spam detection with zero false positives, the proposed method may be used in conjunction with other filtering systems to minimize errors.

15.
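As a product of the development of Internet technology, email has brought communication convenience to Internet users worldwide, but it is inevitably being used in ways contrary to its original purpose. Most prominently, the resulting spam spreads like a plague, pollutes the network environment, consumes large amounts of transmission, storage, and computing resources, and disrupts normal network operation. The spam problem is growing more serious and has attracted wide attention from researchers. Content-based filtering is one of the mainstream techniques for tackling spam. Because the precision of the commonly used feature string matching technique no longer meets the rising product requirements of filtering system users, a neighboring-category classification method is introduced; using an email filtering system based on the Bayesian algorithm to analyze pornographic spam samples, the precision of spam detection is clearly improved.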

16.
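High-performance Chinese spam filter   Cited by: 2 (self: 0, others: 2)
A high-performance Chinese spam filter based on the online filtering mode is designed and implemented; it identifies constantly changing spam well. Building on a logistic regression model, the paper proposes byte-level n-grams for extracting email features and trains the filter with the TONE (Train On or Near Error) method. Experiments on several large-scale public evaluation datasets for Chinese spam filtering show that the filter outperforms the best result of that year's evaluation on the TREC 06C data, reaches a 1-ROCA value of 0.0000% on the SEWM 07 immediate-feedback task, and clearly outperforms all other methods in the online filtering task of the SEWM 08 evaluation.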
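The combination named in the abstract can be illustrated with a small Python sketch: overlapping byte n-grams as binary features, an online logistic regression model, and a TONE rule that updates only when a message is misclassified or its score lies close to the decision boundary. The names `byte_ngrams` and `OnlineLogisticFilter`, the UTF-8 encoding, and the values of `n`, `learning_rate`, and `margin` are assumptions made for illustration; the abstract does not state the paper's settings.

```python
import math


def byte_ngrams(text, n=4):
    """Return the set of overlapping n-byte substrings of the UTF-8 encoded text."""
    data = text.encode("utf-8")  # the encoding is an assumption
    return {data[i:i + n] for i in range(len(data) - n + 1)}


class OnlineLogisticFilter:
    """Hypothetical online logistic regression filter with TONE-style training."""

    def __init__(self, n=4, learning_rate=0.5, margin=0.4):
        # Illustrative hyperparameters, not the paper's values.
        self.n = n
        self.learning_rate = learning_rate
        self.margin = margin
        self.weights = {}  # n-gram -> weight

    def predict_proba(self, text):
        """Estimated probability that the message is spam."""
        z = sum(self.weights.get(g, 0.0) for g in byte_ngrams(text, self.n))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, text, is_spam):
        """Update only on an error or near the boundary; return True if an update happened."""
        p = self.predict_proba(text)
        wrong = (p >= 0.5) != is_spam
        near = abs(p - 0.5) < self.margin
        if not (wrong or near):
            return False
        label = 1.0 if is_spam else 0.0
        step = self.learning_rate * (label - p)  # gradient step for the logistic loss
        for g in byte_ngrams(text, self.n):
            self.weights[g] = self.weights.get(g, 0.0) + step
        return True
```

Byte n-grams need no word segmentation, which is convenient for Chinese text, and skipping confidently correct messages keeps the number of weight updates low in an online setting.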

17.
In their arms race against developers of spam filters, spammers have recently introduced the image spam trick to make the analysis of emails’ body text ineffective. The trick consists of embedding the spam message into an attached image, which is often randomly modified to evade signature-based detection, and obfuscated to prevent text recognition by OCR tools. Detecting image spam turns out to be an interesting instance of the problem of content-based filtering of multimedia data in adversarial environments, which is gaining increasing relevance in several applications and media. In this paper we give a comprehensive survey and categorisation of computer vision and pattern recognition techniques proposed so far against image spam, and make an experimental analysis and comparison of some of them on real, publicly available data sets.

18.
In this paper we present a detailed study of the behavioral characteristics of spammers based on a two-month email trace collected at a large US university campus network. We analyze the behavioral characteristics of spammers that are critical to spam control, including the distributions of message senders, spam and non-spam messages by spam ratios; the statistics of spam messages from different spammers; the spam arrival patterns across the IP address space; and the active duration of spammers, among others. In addition, we formally confirm an informal observation that spammers may hijack network prefixes in sending spam messages, by correlating the arrivals of spam messages with the BGP route updates of the corresponding networks. We present the detailed results of the measurement study and discuss the implications of the findings for (content-independent) anti-spam efforts.

19.
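Deng Weibin (邓维斌), Hong Zhiyong (洪智勇). Journal of Computer Applications (《计算机应用》), 2010, 30(8): 2006-2009
How to effectively combine an email's header information and content information for spam filtering has attracted much attention from researchers. Because rough sets handle uncertain information well, a two-stage email filtering method based on rough sets is proposed: emails are first classified as legitimate, spam, or suspicious according to their header information, and suspicious emails are then classified as legitimate or spam according to their content. Experiments on Chinese and English email sets show that the proposed method not only improves spam filtering accuracy but also substantially reduces the rate at which legitimate emails are wrongly rejected.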

20.
As the problem of spam email increases, we examined users’ attitudes toward and experience with spam as a function of gender and age. College-age, working-age, and retirement-age men and women were surveyed. Most respondents strongly disliked receiving spam yet took few actions against it. There were fewer gender differences than predicted, but age was a significant predictor of several responses. Retirement-age men rated themselves as significantly lower in expertise than did working-age men, and the oldest and youngest age groups took fewer actions against spam, used the computer less often, and spent fewer hours online than did the working-age respondents. Older respondents were more likely than younger ones to report making a purchase as a result of a spam email and received the same amount of spam as other age groups in spite of lower overall use of the computer. The results suggest both that older computer users may be more vulnerable to spam, and that the usability of email for all users may be threatened by the inability of users to effectively take action against spam.
