1.
2.
Jianying Zhou, Wee-Yung Chin, Rodrigo Roman, Javier Lopez 《Information Security Technical Report》2007,12(3):179-185
Spam is a big problem for email users. The battle between spamming and anti-spamming technologies has been going on for many years. Although many advanced anti-spamming technologies are progressing significantly, spam is still able to bombard many email users. The problem worsens when some anti-spamming methods unintentionally filter out legitimate emails instead. In this paper, we first review existing anti-spam technologies, then propose a layered defense framework using a combination of anti-spamming methods. Under this framework, the server-level defense targets common spam while the client-level defense further filters specific spam for individual users. This layered structure improves filtering accuracy while reducing the number of false positives. A sub-system using our pre-challenge method is implemented as an add-on in Microsoft Outlook 2002. In addition, we extend our client-based pre-challenge method to a domain-based solution, further reducing the overhead for individual email users.
3.
4.
Email classification and prioritization expert systems have the potential to automatically group emails and users into communities based on their communication patterns, which is one of the most tedious tasks. The exchange of emails among users, along with the time and content information, determines the pattern of communication. Intelligent systems extract these patterns from an email corpus of a single user or of all users and are limited to statistical analysis. However, the email information revealed in those methods is either constricted or widespread, i.e. single or all users respectively, which limits the usability of the resultant communities. In contrast to these extreme views of the email information, we relax the aforementioned restrictions by considering a subset of all users as multi-user information in an incremental way to extend the personalization concept. Accordingly, we propose a multi-user personalized email community detection method to discover the groupings of email users based on their structural and semantic intimacy. We construct a social graph using multi-user personalized emails. Subsequently, the social graph is enriched with expedient attributes, such as semantics, to identify user communities through a collaborative similarity measure. The multi-user personalized communities, which are evaluated through different quality measures, enable email systems to filter spam or malicious emails and suggest contacts while composing emails. The experimental results over two randomly selected users from an email network, as constrained information, unveil partial interaction among 80% of email users with a 14% search space reduction, where we notice a 25% improvement in the clustering coefficient.
5.
Juan Carlos Gomez, Erik Boiy, Marie-Francine Moens 《Knowledge and Information Systems》2012,31(1):23-53
This paper reports on email classification and filtering, more specifically on spam versus ham and phishing versus spam classification,
based on content features. We test the validity of several novel statistical feature extraction methods. The methods rely
on dimensionality reduction in order to retain the most informative and discriminative features. We successfully test our
methods under two schemas. The first one is a classic classification scenario using a 10-fold cross-validation technique for
several corpora, including four ground truth standard corpora: Ling-Spam, SpamAssassin, PU1, and a subset of the TREC 2007
spam corpus, and one proprietary corpus. In the second schema, we test the anticipatory properties of our extracted features
and classification models with two proprietary datasets, formed by phishing and spam emails sorted by date, and with the public
TREC 2007 spam corpus. The contributions of our work are an exhaustive comparison of several feature selection and extraction
methods in the frame of email classification on different benchmarking corpora, and the evidence that especially the technique
of biased discriminant analysis offers better discriminative features for the classification, gives stable classification
results regardless of the number of features chosen, and robustly retains their discriminative value over time and data
setups. These findings are especially useful in a commercial setting, where short profile rules are built based on a limited
number of features for filtering emails.
6.
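Spam handling is a very important function of email services. Building on a representation of a standard email collection in the vector space model, followed by dimensionality reduction, this paper uses a neural network ensemble to construct an email classifier that filters messages. Experiments on a spam corpus show that the method filters spam effectively.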
7.
8.
Electronic mail has revolutionized traditional communication because it is convenient, economical,
fast, and easy to use. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful
emails known as spam emails. A major concern is the development of suitable filters that can adequately capture those emails and achieve a high performance
rate. Machine learning (ML) researchers have developed many approaches in order to tackle this problem. Within the context
of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering.
Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using
SVM is the choice of kernels, as they directly affect the separation of emails in the feature space. This paper presents a thorough
investigation of several distance-based kernels and specifies spam filtering behaviors using SVM. The majority of kernels used
in recent studies concern continuous data and neglect the structure of the text. In contrast to classical kernels, we propose
the use of various string kernels for spam filtering. We show how effectively string kernels suit the spam filtering problem.
In addition, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors
usable by SVM kernels. We detail feature mapping variants in TC that yield improved performance for the standard SVM in
the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering. We present
empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in
real time. We show that the active online method using string kernels achieves higher precision and recall rates.
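A string kernel compares two texts directly through the substrings they share instead of through a fixed numeric feature vector. The abstract above does not say which string kernels were studied, so the following is only a minimal sketch of one well-known example, the p-spectrum kernel. The function names and the default `p=3` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def spectrum_features(text: str, p: int = 3) -> Counter:
    """Count every contiguous substring of length p in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 3) -> int:
    """p-spectrum kernel: inner product of the two p-gram count vectors.

    Assumption: this is one common string kernel, chosen for illustration;
    it is not necessarily one of the kernels evaluated in the paper.
    """
    fs = spectrum_features(s, p)
    ft = spectrum_features(t, p)
    # A Counter returns 0 for missing keys, so unshared p-grams add nothing.
    return sum(count * ft[gram] for gram, count in fs.items())
```

A kernel of this form could be supplied to an SVM implementation that accepts a precomputed kernel matrix; larger values mean the two messages share more character sequences.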
9.
A comparative study for content-based dynamic spam classification using four machine learning algorithms (total citations: 1; self-citations: 0; citations by others: 1)
The growth in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, namely Naïve Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and extracted feature sizes. Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool. Generally, the performance of the SVM and RVM classifiers is clearly superior to that of the NB classifier. Compared with SVM, RVM is shown to provide similar classification results with fewer relevance vectors and much faster testing time. Despite its slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
10.
The annoyance of spam emails increasingly plagues both individuals and organizations. In response, most prior research treats spam filtering as a classical text categorization task, in which training examples must include both spam (positive examples) and legitimate (negative examples) emails. However, in many spam filtering scenarios, obtaining legitimate emails for training purposes can be more difficult than collecting spam and unclassified emails. Hence, it is more appropriate to construct a classification model for spam filtering that uses only positive training examples (i.e., spam) and unlabeled instances and does not require legitimate emails as negative training examples. Several single-class learning techniques, such as PNB and PEBL, have been proposed in the literature. However, they have inherent limitations with regard to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address these limitations. Specifically, we follow the two-stage framework of PEBL but extend each stage with an ensemble strategy. The empirical evaluation results from two spam filtering corpora suggest that our proposed E2 technique generally outperforms benchmark techniques (i.e., PNB and PEBL) and exhibits more stable performance than its counterparts.
11.
The Internet has been flooded with spam emails, and during the last decade there has been an increasing demand for reliable anti-spam email filters. The problem of filtering emails can be considered a classification problem in the field of supervised learning. Theoretically, many mature technologies, for example, support vector machines (SVM), can be used to solve this problem. However, in real enterprise applications, the training data are typically collected via honeypots and are thus very large and heavily biased towards spam emails. This challenges both the efficiency and the effectiveness of conventional technologies. In this article, we propose an undersampling method to compress and balance the training set used for the conventional SVM classifier with minimal information loss. The key observation is that we can make a trade-off between training set size and information loss by carefully defining a similarity measure between data samples. Our experiments show that the SVM classifier performs better when our compressing and balancing approach is applied.
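The idea of trading training set size against information loss through a similarity measure can be illustrated with a small sketch. The abstract does not give the similarity measure or the selection procedure, so everything here is an assumption made for illustration: cosine similarity, a greedy pass, the `threshold` parameter, and the function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def undersample(samples, threshold=0.95):
    """Greedily drop samples that are near-duplicates of one already kept.

    Hypothetical procedure, not the paper's method: a sample is kept only
    if its similarity to every kept sample is below the threshold.
    """
    kept = []
    for x in samples:
        if all(cosine(x, k) < threshold for k in kept):
            kept.append(x)
    return kept
```

Applied to the over-represented spam class only, a lower `threshold` compresses the set more aggressively at the cost of discarding more information, which is the trade-off the abstract describes.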
12.
A new technique for managing and disseminating Web-based email prefetches messages and generates dynamic pages, displaying them at the network edge. Compared to other popular Web-based email servers, the prefetching and caching emails (PACE) prototype shows improved performance with respect to user-perceived latency. Additionally, PACE's centralized neural-network-based personalized spam filter will filter spam and viruses at the server's origin, thus saving bandwidth. Another major concern for users is email accounts being clogged with spam. Spam filters can be classified as server-side or client-side. Server-side filters are integrated with email servers and filter out spam at the server end.
13.
Xun Yue, Ajith Abraham, Zhong-Xian Chi, Yan-You Hao, Hongwei Mo 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2007,11(8):729-740
This paper proposes a novel behavior-based anti-spam technology for email service based on an artificial immune-inspired clustering
algorithm. The suggested method is capable of continuously delivering the most relevant spam emails from the collection of
all spam emails that are reported by the members of the network. Mail servers could implement the anti-spam technology by
using the “black lists” that have been already recognized. Two main concepts are introduced: defining the behavior-based
characteristics of spam, and continuously identifying similar groups of spam when processing the spam streams. Experimental
results using real-world datasets reveal that the proposed technology is reliable, efficient and scalable. Since no single
technology can achieve one hundred percent spam detection with zero false positives, the proposed method may be used in conjunction
with other filtering systems to minimize errors.
14.
An Operable Email Based Intelligent Personal Assistant (total citations: 1; self-citations: 0; citations by others: 1)
The recent phenomena of email-function-overloading and email-centricness in daily life and business have created new problems
for users. There is a practical need for developing a software assistant to facilitate the management of personal and organizational
emails, and to enable users to complete their email-centric jobs or tasks smoothly. This paper presents the status, goals,
and key technical elements of an Email-Centric Intelligent Personal Assistant, called ECIPA. ECIPA provides various assisting
functions, including automated and cost-sensitive spam filtering based on corresponding analysis, ontology-mediated email
classification, query and archiving. ECIPA can learn from dynamic user behaviors to effectively sort and automatically respond to
email. Techniques developed in Web Intelligence (WI) are adopted to implement ECIPA. In order to facilitate cooperation of
ECIPAs of different users, the concept of operable email, an extension of traditional email with an operable form, is introduced. ECIPA can in fact be viewed as a family of collaborative agents working together on the operable email.
15.
16.
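The Bayesian and Fisher filtering algorithms both use statistics to filter spam, and both filter well. This paper weights the occurrence probability of a given phrase (word), which reduces the misjudgment of rarely occurring words by the naive Bayes and naive Fisher email filtering algorithms and raises the accuracy of the system's spam decisions. The design allows personalized spam filtering schemes, supports downloading mail from a mail server via the mail retrieval protocols (POP3, IMAP), parses messages using the mail parsing protocol (MIME), and supports the mail sending protocol (SMTP) to help users send mail.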
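The abstract does not state the weighting formula. As a minimal sketch of how weighting a word's probability can soften judgments about rare words, the following uses a common smoothing scheme that blends the observed probability with an assumed prior in proportion to how often the word has been seen. The function name, the prior of 0.5 and the weight of 1.0 are assumptions for illustration, not the paper's values.

```python
def weighted_prob(basic_prob, count, assumed_prob=0.5, weight=1.0):
    """Blend an observed word probability with a neutral prior.

    basic_prob   -- observed P(category | word) from the training data
    count        -- how many times the word has been seen in total
    assumed_prob -- prior used when there is little evidence (assumption)
    weight       -- how many observations the prior is worth (assumption)
    """
    return (weight * assumed_prob + count * basic_prob) / (weight + count)
```

With these defaults, a word seen once and only in spam scores 0.75 instead of 1.0, while a word seen 99 times only in spam scores 0.995, so a single occurrence no longer decides the classification on its own.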
17.
With the growing use of email as an essential and popular means of communication over the Internet, there comes a serious threat that affects the Internet and society. This problem is known as spam. By receiving spam messages, Internet users are exposed to security issues, and minors are exposed to inappropriate content. Moreover, spam messages waste resources in terms of storage, bandwidth, and productivity. What makes the problem worse is that spammers keep inventing new techniques to dodge spam filters. In addition, the massive data flow of hundreds of millions of individuals and the large number of attributes make the problem more cumbersome and complex. Therefore, proposing evolutionary and adaptable spam detection models becomes a necessity. In this paper, an intelligent detection system based on a Genetic Algorithm (GA) and a Random Weight Network (RWN) is proposed to deal with email spam detection tasks. In addition, an automatic identification capability is embedded in the proposed system to detect the most relevant features during the detection process. The proposed system is evaluated through a series of extensive experiments based on three email corpora. The experimental results confirm that the proposed system can achieve remarkable results in terms of accuracy, precision, and recall. Furthermore, the proposed detection system can automatically identify the most relevant features of spam emails.
18.
Dr. Guido Schryen 《WIRTSCHAFTSINFORMATIK》2004,46(4):281-288
Spam as unsolicited email has certainly crossed the border of just being bothersome. In 2003, it surpassed legitimate email, growing to more than 50% of all Internet emails. Annually, it causes economic damage of several billion euros. To fight spam, practical systems deploy, besides legal approaches, mainly technical means that focus on blocking and filtering mechanisms. This article introduces the spam field and describes, assesses, and classifies the currently most important approaches against spam.
19.
Automatic thesaurus construction for spam filtering using revised back propagation neural network (total citations: 1; self-citations: 0; citations by others: 1)
Email has become one of the fastest and most economical forms of communication. Email is also one of the most ubiquitous and pervasive applications, used on a daily basis by millions of people worldwide. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. This paper proposes a new spam filtering system using a revised back propagation (RBP) neural network and automatic thesaurus construction. The conventional back propagation (BP) neural network learns slowly and is prone to becoming trapped in a local minimum, which leads to poor performance and efficiency. The authors present the RBP neural network to overcome these limitations of the conventional BP neural network. A well-constructed thesaurus has been recognized as a valuable tool in the effective operation of text classification; it can also overcome the problems of keyword-based spam filters, which ignore the relationships between words. The authors conduct experiments on the Ling-Spam corpus. Experimental results show that the proposed spam filtering system is able to achieve higher performance, especially with the combination of the RBP neural network and automatic thesaurus construction.
20.
Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling) based inductive learning approach in detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate emails. We study the performance for various network model complexities using spambase, a publicly available benchmark dataset. Results reveal that classification accuracies of 91.7% can be achieved using only 10 out of the available 57 attributes, selected through abductive learning as the most effective feature subset (i.e. 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naïve Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy with false-positive rates as low as 4.3% and yet requires shorter training time.
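The 82.5% data reduction figure quoted above follows from keeping 10 of the 57 attributes. A short check of that arithmetic, with a hypothetical helper name chosen for illustration:

```python
def data_reduction(selected: int, total: int) -> float:
    """Percentage of attributes discarded when only `selected` of `total` are kept."""
    return 100.0 * (1 - selected / total)


# 10 of 57 attributes kept, as reported in the abstract.
reduction = round(data_reduction(10, 57), 1)
```

Here `reduction` evaluates to 82.5, matching the figure in the abstract.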