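To improve the accuracy and efficiency of spam filtering in email, this work builds on the naive Bayes and K-nearest neighbors (KNN) algorithms to improve traditional spam filtering. It introduces the concepts of legitimate and illegitimate attributes of an email and proposes a new classification algorithm based on them: SEASF (Simple and Efficient Algorithm to Spam Filter based on legitimate attribute and nonlicet attribute). SEASF has low computational complexity, which makes it suitable for large-scale deployments and online email filtering. Results from applying SEASF to spam filtering show that the algorithm substantially improves classification accuracy, with satisfactory classification speed.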
“Fast-flux” refers to rapidly assigning different IP addresses to the same domain name. Although there are some legitimate uses for this technique, it has recently become a favorite tool for cyber criminals to launch collaborative attacks. After it was first observed by Honeynet, fast-flux was reported to have been used in phishing, malware spreading, spam, and other malicious activities linked to criminal organizations. Combined with peer-to-peer networking, distributed command and control, web-based load balancing, and proxy redirection, fast-flux makes Internet attacks more resistant to discovery and countermeasures. This article gives a comprehensive survey of fast-flux attacks. It discusses important issues including technical background, classification, characterization, measurement and detection, and mitigation. Challenges in detecting and mitigating fast-flux attacks are also pointed out.
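As a rough illustration of the definition in the fast-flux abstract above (many different IP addresses assigned to one domain name in quick succession), the following minimal sketch flags domains whose DNS answers show many distinct IPs together with short TTLs. The function name `flag_fast_flux`, the thresholds, and the sample observations are assumptions made for illustration and are not taken from the survey. Because the technique also has legitimate uses, such as load balancing, a heuristic like this can only suggest candidates for closer inspection.

```python
from collections import defaultdict


def flag_fast_flux(observations, min_distinct_ips=5, max_avg_ttl=300):
    """Return the sorted domains that look like fast-flux candidates.

    observations: iterable of (domain, ip, ttl_seconds) tuples taken from
    repeated DNS lookups. Both thresholds are hypothetical defaults.
    """
    ips = defaultdict(set)
    ttls = defaultdict(list)
    for domain, ip, ttl in observations:
        ips[domain].add(ip)
        ttls[domain].append(ttl)

    flagged = []
    for domain, seen_ips in ips.items():
        avg_ttl = sum(ttls[domain]) / len(ttls[domain])
        # Many distinct addresses AND short-lived records are both required.
        if len(seen_ips) >= min_distinct_ips and avg_ttl <= max_avg_ttl:
            flagged.append(domain)
    return sorted(flagged)


# Made-up observations using reserved documentation addresses.
sample = [("flux.example", f"192.0.2.{i}", 60) for i in range(1, 7)]
sample += [("stable.example", "198.51.100.10", 3600)] * 6
print(flag_fast_flux(sample))
```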
In today’s connected world there is more data than we could imagine. The number of network users is increasing day by day, and a large number of social networks keep users connected all the time. These social networks give users complete freedom to post content of political, commercial, or entertainment value. Some of this data may be sensitive and may therefore have a greater impact on society. The trustworthiness of data is important on public social networking sites such as Facebook and Twitter. Because of their large user bases and openness, there is a high likelihood of spam messages spreading in these networks. Spam detection is a technique for identifying data and marking it as false. Many machine learning approaches have been proposed to detect spam in social networks. The efficiency of a spam detection algorithm is determined by its cost and its accuracy. To improve the detection of spam in social networks, this study proposes using statistical features modelled through a supervised boosting approach, stochastic gradient boosting, to evaluate English-language Twitter data sets. The performance of the proposed model is evaluated using simulation results.
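The abstract above names stochastic gradient boosting but gives no implementation details, so the following is only a minimal sketch of the general technique: at each round a one-split decision stump is fitted to the log-loss pseudo-residuals on a random subsample drawn without replacement. The two toy features (a hypothetical "contains URL" flag and a noise value), every function name, and all hyperparameters are assumptions for illustration. Leaf values are plain residual means, which is a simplification of the usual leaf update.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, rows):
    """Best single-feature threshold split (squared error) on the sampled rows."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows}):
            left = [residuals[i] for i in rows if X[i][j] <= t]
            right = [residuals[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def stump_predict(stump, x):
    j, t, lm, rm = stump
    return lm if x[j] <= t else rm


def fit_sgb(X, y, n_rounds=50, learning_rate=0.3, subsample=0.7, seed=0):
    """Return (learning_rate, stumps) for binary labels y in {0, 1}."""
    rng = random.Random(seed)
    n = len(X)
    k = max(2, int(subsample * n))
    scores = [0.0] * n
    stumps = []
    for _ in range(n_rounds):
        # Pseudo-residuals of the log-loss: label minus predicted probability.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        rows = rng.sample(range(n), k)  # the "stochastic" part: subsample rows
        stump = fit_stump(X, residuals, rows)
        if stump is None:
            continue
        stumps.append(stump)
        for i in range(n):
            scores[i] += learning_rate * stump_predict(stump, X[i])
    return learning_rate, stumps


def predict(model, x):
    learning_rate, stumps = model
    score = sum(learning_rate * stump_predict(s, x) for s in stumps)
    return 1 if score > 0 else 0


# Toy data: [has_url, noise]; label 1 = spam. Entirely made up.
X = [[0, 0.1], [0, 0.5], [0, 0.3], [0, 0.9], [1, 0.2], [1, 0.6], [1, 0.4], [1, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_sgb(X, y)
print([predict(model, x) for x in X])
```

In practice a library implementation would normally be used instead; the sketch is only meant to show where the random subsampling enters the boosting loop.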
The brevity of mobile phone messages makes it difficult to distinguish lexical patterns that identify spam. This paper proposes a novel approach to spam classification of extremely short messages that uses not only lexical features, which reflect the content of a message, but also new stylistic features, which indicate the manner in which the message is written. Experiments on two mobile phone message collections in two different languages show that the approach significantly outperforms previous content-based approaches, regardless of language.
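The distinction the abstract above draws between lexical features (what a message says) and stylistic features (how it is written) can be sketched as a small feature extractor. The abstract does not list the paper's actual features, so the ones below (message length, uppercase ratio, digit ratio, punctuation count, average token length) and the name `extract_features` are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter


def extract_features(message):
    """Return a dict mixing lexical (word count) and stylistic features."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())

    # Lexical features: one count per lowercased token.
    features = {f"word={tok}": cnt for tok, cnt in Counter(tokens).items()}

    # Stylistic features: hypothetical examples of "manner of writing".
    n_chars = len(message) or 1  # avoid division by zero on empty input
    features["style:length"] = len(message)
    features["style:upper_ratio"] = sum(ch.isupper() for ch in message) / n_chars
    features["style:digit_ratio"] = sum(ch.isdigit() for ch in message) / n_chars
    features["style:punct_count"] = sum(ch in string.punctuation for ch in message)
    features["style:avg_token_len"] = (
        sum(len(tok) for tok in tokens) / len(tokens) if tokens else 0.0
    )
    return features


feats = extract_features("WIN cash NOW!!! Call 0800")
print(feats["word=win"], feats["style:punct_count"], feats["style:length"])
```

The resulting dictionaries could be fed to any classifier that accepts sparse feature maps; which classifier the paper used is not stated in the abstract.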