Similar documents
1.
Online social networks have become immensely popular in recent years and are now a major source for tracking the reverberation of events and news throughout the world. However, the diversity and popularity of online social networks attract malicious users who inject new forms of spam. Spamming is a malicious activity in which a fake user spreads unsolicited messages in the form of bulk messages, fraudulent reviews, malware/viruses, hate speech, profanity, or advertising for marketing scams. In addition, spammers usually form a connected community of spam accounts and use them to spread spam to a large set of legitimate users. Consequently, it is highly desirable to detect such spammer communities in social networks. Even though a significant amount of work has been done on detecting spam messages and accounts, little research has addressed the detection of spammer communities and hidden spam accounts. In this work, an unsupervised approach called SpamCom is proposed for detecting spammer communities on Twitter. We model the Twitter network as a multilayer social network and exploit overlapping community-based features of users, represented in the form of hypergraphs, to identify spammers based on their structural behavior and URL characteristics. The use of community-based features, graph and URL characteristics of user accounts, and content similarity among users makes our technique very robust and efficient.

2.
《Computer Networks》2007,51(10):2616-2630
Unsolicited commercial email, commonly known as spam, has become a pressing problem in today’s Internet. In this paper, we re-examine the architectural foundations of the current email delivery system that are responsible for the proliferation of email spam. We argue that the difficulties in controlling spam stem from the fact that the current email system is fundamentally sender-driven and distinctly lacks receiver control over email delivery. Based on these observations, we propose a Differentiated Mail Transfer Protocol (DMTP), which grants receivers greater control over how messages from different senders should be delivered on the Internet. We also develop a simple mathematical model to study the effectiveness of DMTP in controlling spam. Through numerical experiments we demonstrate that DMTP can effectively reduce the maximum revenue that a spammer can gather. Moreover, compared to the current SMTP-based email system, the proposed email system can force spammers to stay online for longer periods of time, which may significantly improve the performance of various real-time blacklists of spammers. Finally, DMTP provides an incremental deployment path from the current SMTP-based system in today’s Internet.

3.
Spam in online social networks (OSNs) is a systemic problem that poses a threat to these services by undermining their value to advertisers and potential investors, as well as by negatively affecting users’ engagement. As spammers keep creating new accounts and evasive techniques after being caught, a deeper understanding of their spamming strategies is vital to the design of future social media defense mechanisms. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics. Our analysis includes over 100 million messages collected from Twitter over the course of 1 month. We show that there exist two behaviorally distinct categories of spammers and that they employ different spamming strategies. Then, we illustrate how users in these two categories demonstrate different individual properties as well as social interaction patterns. Finally, we analyze the detectability of spam accounts with respect to three categories of features, namely content attributes, social interactions, and profile properties.

4.
Twitter spam detection is a recent area of research in which most previous work has focused on the identification of malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody’s lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34K trending topics and 20 million tweets. We then propose a reduced set of features that spammers can hardly manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other sets of features with the aim of analyzing emergent characteristics of spam in social networks. We have also conducted an extensive evaluation showing that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to Twitter spam detection in trending topics in real time, mainly because it analyzes tweets instead of user accounts.

5.
The main purpose of most spam e-mail messages distributed on the Internet today is to entice recipients into visiting World Wide Web pages that are advertised through spam. In essence, e-mail spamming is a campaign that advertises URL addresses at a massive scale and at minimum cost for the advertisers and those advertised. Nevertheless, the characteristics of URL addresses and of web sites advertised through spam have not been studied extensively. In this paper, we investigate the properties of URL dissemination through spam e-mail, and the characteristics of URL addresses disseminated through spam. We conclude that spammers advertise URL addresses non-repetitively and that spam-advertised URLs are short-lived, elusive, and therefore hard to detect and filter. We also observe that reputable URL addresses are sometimes used as decoys against e-mail users and spam filters. These observations can be valuable for the configuration of spam filters and for driving the development of new techniques to fight spam.

6.
Liu Bo, Ni Zeyang, Luo Junzhou, Cao Jiuxin, Ni Xudong, Liu Benyuan, Fu Xinwen 《World Wide Web》2019,22(6):2953-2975

Social networking websites with microblogging functionality, such as Twitter or Sina Weibo, have emerged as popular platforms for discovering real-time information on the Web. Like most Internet services, these websites have become the targets of spam campaigns, which contaminate Web contents and damage user experiences. Spam campaigns have become a great threat to social network services. In this paper, we investigate crowd-retweeting spam in Sina Weibo, the counterpart of Twitter in China. We carefully analyze the characteristics of crowd-retweeting spammers in terms of their profile features, social relationships and retweeting behaviors. We find that although these spammers are likely to connect more closely than legitimate users, the underlying social connections of crowd-retweeting campaigns are different from those of other existing spam campaigns because of the unique features of retweets that are spread in a cascade. Based on these findings, we propose retweeting-aware link-based ranking algorithms to infer more suspicious accounts by using identified spammers as seeds. Our evaluation results show that our algorithms are more effective than other link-based strategies.
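
The idea of inferring suspicious accounts from identified seed spammers can be illustrated with a generic seed-based link ranking. The sketch below is a personalized-PageRank-style propagation, not the retweeting-aware algorithms of the paper, whose details the abstract does not give. The function name `propagate_suspicion`, the `links` adjacency format, and the damping value are assumptions made for illustration.

```python
def propagate_suspicion(links, seeds, damping=0.85, iterations=50):
    """Spread suspicion from seed spammers along outgoing links.

    links: dict mapping an account to the list of accounts it links to
           (hypothetical format; the meaning of a link is up to the caller).
    seeds: non-empty collection of accounts already identified as spammers.
    Returns a dict account -> suspicion score; the scores sum to 1.
    """
    nodes = set(links) | {v for targets in links.values() for v in targets} | set(seeds)
    # All restart probability is concentrated on the seed accounts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * score[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # An account without outgoing links returns its mass to the seeds.
                for n in nodes:
                    nxt[n] += damping * score[u] * restart[n]
        score = nxt
    return score


links = {"s1": ["a", "b"], "a": ["b"], "b": ["s1"], "c": ["d"], "d": ["c"]}
scores = propagate_suspicion(links, seeds={"s1"})
```

Accounts that are unreachable from any seed (here `c` and `d`) keep a score of zero, so ranking by score surfaces only accounts connected to known spammers.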


7.
With the rise of social networking services such as Facebook and Twitter, the problem of spam and content pollution has become more significant and intractable. Using social networking services, users are able to develop relationships and share messages with others in a very convenient manner; however, they are vulnerable to receiving spam messages. The automatic detection of spammers or content polluters on the network can effectively reduce the burden on the service provider in making a decision on appropriate counteractions. Content polluters can be automatically identified by using the supervised learning technique of artificial intelligence. To build a classification model with high accuracy automatically from the training data set, it is important to identify a set of useful features that can classify polluters and non-polluters. Moreover, because we deal with a huge amount of raw data in this process, the efficiency of data preparation and model creation are also critical issues that need to be addressed. In this paper, we present an efficient method for detecting content polluters on Twitter. Specifically, we propose a set of features that can be easily extracted from the messages and behaviors of Twitter users and construct a new breed of classifiers based on these features. The proposed approach requires only a minimal number of feature values per Twitter user and thus adds considerably less time to the overall mining process compared to other methods. Experiments confirm that the proposed approach outperforms previous approaches in both classification accuracy and processing time.

8.
With the increasing use of email as an essential and popular means of communication over the Internet, there comes a serious threat that impacts the Internet and society. This problem is known as spam. By receiving spam messages, Internet users are exposed to security issues, and minors are exposed to inappropriate content. Moreover, spam messages waste resources in terms of storage, bandwidth, and productivity. What makes the problem worse is that spammers keep inventing new techniques to dodge spam filters. On the other hand, the massive data flow of hundreds of millions of individuals and the large number of attributes make the problem more cumbersome and complex. Therefore, proposing evolutionary and adaptable spam detection models becomes a necessity. In this paper, an intelligent detection system based on a Genetic Algorithm (GA) and a Random Weight Network (RWN) is proposed to deal with email spam detection tasks. In addition, an automatic identification capability is embedded in the proposed system to detect the most relevant features during the detection process. The proposed system is evaluated through a series of extensive experiments based on three email corpora. The experimental results confirm that the proposed system can achieve remarkable results in terms of accuracy, precision, and recall. Furthermore, the proposed detection system can automatically identify the most relevant features of the spam emails.

9.
10.
Earlier works on detecting spam e-mails usually compare the contents of e-mails against specific keywords, an approach that is not robust because spammers frequently change the terms used in e-mails. In this paper we present a novel feature extraction method for spam filtering. Instead of classifying e-mails according to keywords, this study analyzes spamming behaviors and extracts the representative ones as features for describing the characteristics of e-mails. A back-propagation neural network is designed and implemented, which builds a classification model by considering the behavior-based features revealed by e-mails’ headers and syslogs. Since spamming behaviors change infrequently compared with the keywords used in spam, behavior-based features are more robust over time, so the behavior-based filtering mechanism outperforms keyword-based filtering. The experimental results indicate that our methods are more useful in distinguishing spam e-mails than keyword-based comparison.

11.
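Spam e-mails are short and are sent repeatedly and in bulk across the network within a given period. Exploiting these characteristics, a signature-based near-duplicate spam detection algorithm (ASD) is proposed. The algorithm takes the sentence as its basic unit and computes digests of all sentences contained in an e-mail, so that near-duplicate spam detection becomes a comparison of the similarity of two digest sets. Compared with the near-duplicate text detection algorithms DSC, DSC-SS, and I-Match, ASD shows moderate storage requirements for the sample set, short running time, high robustness, high precision, and high recall in near-duplicate spam detection.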
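
The sentence-digest idea behind ASD can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the sentence splitter, the use of SHA-256 as the digest, Jaccard similarity as the set comparison, and the 0.5 threshold are all choices made here, since the abstract does not specify them.

```python
import hashlib
import re


def sentence_digests(text: str) -> set[str]:
    """Return the set of digests of the sentences in text."""
    # Assumption: sentences end at common Chinese or English terminators.
    sentences = re.split(r"[。！？!?.\n]+", text)
    digests = set()
    for sentence in sentences:
        # Drop whitespace and fold case so trivial edits do not change the digest.
        normalized = "".join(sentence.split()).lower()
        if normalized:
            digests.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return digests


def digest_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two digest sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(mail: str, known_spam: str, threshold: float = 0.5) -> bool:
    """Flag mail as near-duplicate spam if enough sentences match known_spam."""
    return digest_similarity(sentence_digests(mail), sentence_digests(known_spam)) >= threshold


mail_a = "Buy cheap watches now. Visit our site today. Limited offer!"
mail_b = "Buy cheap watches now. Visit our site today. Hello friend!"
print(digest_similarity(sentence_digests(mail_a), sentence_digests(mail_b)))
```

Storing only the digest set per known spam sample keeps the sample store small, which is consistent with the moderate storage requirement the abstract reports.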

12.
《Knowledge》2005,18(4-5):187-195
Spam filtering is a particularly challenging machine learning task because the data distribution and the concept being learned change over time. It exhibits a particularly awkward form of concept drift, as the change is driven by spammers wishing to circumvent spam filters. In this paper we show that lazy learning techniques are appropriate for such dynamically changing contexts. We present a case-based system for spam filtering that can learn dynamically. We evaluate its performance as the case-base is updated with new cases. We also explore the benefit of periodically redoing the feature selection process to bring new features into play. Our evaluation shows that these two levels of model update are effective in tracking concept drift.
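
Why a lazy learner suits drifting spam can be seen in a toy case base: adding a new case changes later classifications immediately, with no retraining step. The sketch below is a hypothetical k-nearest-neighbour case base over token sets with Jaccard similarity; the class name `CaseBase`, the tokenization, and the similarity measure are assumptions and do not reproduce the system described in the paper.

```python
from collections import Counter


class CaseBase:
    """A toy lazy learner: stores labeled messages and classifies by k-NN vote."""

    def __init__(self, k: int = 3):
        self.k = k
        self.cases: list[tuple[frozenset[str], str]] = []

    def add(self, text: str, label: str) -> None:
        """Store a new case; this is the only 'training' a lazy learner needs."""
        self.cases.append((frozenset(text.lower().split()), label))

    def classify(self, text: str) -> str:
        """Majority label of the k most similar stored cases (case base must be non-empty)."""
        query = frozenset(text.lower().split())

        def similarity(case: tuple[frozenset[str], str]) -> float:
            tokens = case[0]
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        nearest = sorted(self.cases, key=similarity, reverse=True)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


cb = CaseBase(k=1)
cb.add("meeting agenda for monday", "ham")
cb.add("cheap pills online", "spam")
before = cb.classify("claim your prize for free")
cb.add("claim your free prize now", "spam")  # a newly reported spam variant
after = cb.classify("claim your prize for free")
```

In this example the unseen spam variant is first misclassified as legitimate and is caught once a single matching case has been added.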

13.
14.
Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam; it is cost-efficient and, unlike server-based systems, not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages; in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best with a relatively small amount of resources for spam detection compared to other distributed methods.

15.
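Online reviews strongly influence users' purchasing decisions. To boost their own reputation or disparage competitors' products, some sellers hire large numbers of paid posters to write fake reviews in an organized and strategic way, misleading potential consumers. To detect such organized spammer groups, a group detection algorithm is proposed that jointly considers network structure and reviewers' behavioral features. First, the closeness between reviewers is derived from the correlation of their ratings and review times, and a reviewer relationship graph is constructed. Second, communities are detected on this graph using label propagation, yielding a set of candidate groups. Finally, the bipartite graph corresponding to each candidate group is restored and, with contrast suspiciousness as the evaluation metric, the final spammers are identified on each bipartite graph. Experimental results on real-world datasets demonstrate the effectiveness of the algorithm.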
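
The community detection step on the reviewer relationship graph can be illustrated with a small weighted label propagation. This is a sketch under assumptions: the closeness weights are taken as given (the abstract does not state how rating and time correlation are combined), and nodes are visited in sorted order with ties broken by label name so the result is reproducible, whereas label propagation is usually run with random ordering and random tie-breaking.

```python
from collections import defaultdict


def label_propagation(edges: dict[tuple[str, str], float], max_rounds: int = 20) -> dict[str, str]:
    """Weighted label propagation on an undirected graph.

    edges maps a reviewer pair (u, v) to a closeness weight.
    Returns a dict reviewer -> community label.
    """
    neighbors: dict[str, dict[str, float]] = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    labels = {node: node for node in neighbors}  # each node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):  # fixed visiting order keeps the sketch deterministic
            scores: dict[str, float] = defaultdict(float)
            for nb, weight in neighbors[node].items():
                scores[labels[nb]] += weight
            # Adopt the label with the largest total weight; ties go to the smallest label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,  # tightly connected group 1
    ("x", "y"): 1.0, ("y", "z"): 1.0, ("x", "z"): 1.0,  # tightly connected group 2
    ("c", "x"): 0.1,                                      # weak link between the groups
}
communities = label_propagation(edges)
```

Each resulting label identifies one candidate group; in the approach described above, these candidates would then be examined further on their bipartite reviewer-product graphs.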

16.
In their arms race against developers of spam filters, spammers have recently introduced the image spam trick to make the analysis of emails’ body text ineffective. It consists in embedding the spam message into an attached image, which is often randomly modified to evade signature-based detection, and obfuscated to prevent text recognition by OCR tools. Detecting image spam turns out to be an interesting instance of the problem of content-based filtering of multimedia data in adversarial environments, which is gaining increasing relevance in several applications and media. In this paper we give a comprehensive survey and categorisation of computer vision and pattern recognition techniques proposed so far against image spam, and make an experimental analysis and comparison of some of them on real, publicly available data sets.

17.
Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling) based inductive learning approach in detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate emails. We study the performance for various network model complexities using spambase, a publicly available benchmark dataset. Results reveal that classification accuracies of 91.7% can be achieved using only 10 out of the available 57 attributes, selected through abductive learning as the most effective feature subset (i.e. 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naïve Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy with false-positive rates as low as 4.3% and yet requires shorter training time.

18.
Spam can be defined as unsolicited e-mail, often of a commercial nature, sent indiscriminately to multiple mailing lists, individuals, or newsgroups. Spoofing (Templeton and Levitt, 2003) is a technique often used by spammers to make themselves harder to trace. Trojan viruses embedded in e-mail messages also employ spoofing techniques to make the source of the message more difficult to locate (Ishibashi et al., 2003). Spam filters and virus scanners can eliminate only a certain amount of spam and also risk catching legitimate e-mails. As the SoBig virus has demonstrated, virus scanners themselves actually add to the e-mail traffic through notification and bounceback messages. Simple Mail Transfer Protocol (SMTP) is flawed in that it allows e-mail headers to be faked and does not allow the sender to be authenticated as the real sender of the message. If this problem can be solved, it will result in a reduction in spam e-mail messages and more security for existing e-mails, and it will allow e-mail viruses to be tracked down and stopped more effectively (Schwartz and Garfinkel, 1998). This approach is known as “trusted e-mail.”

19.
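A spam SMS filtering algorithm based on complex networks   Cited by: 1 (self-citations: 0, citations by others: 1)
Identifying and filtering users who send spam SMS messages has significant research value and social importance. As new forms and content of spam SMS emerge, traditional keyword matching and sending-rate filtering can no longer handle the problem effectively. Building on a formal representation of the SMS sending/receiving network, and using real data on SMS sending and receiving and on call relationships as an example, the network properties of the SMS sending network are measured and analyzed. The anomalous sending and receiving patterns and behaviors of spam SMS users on the network are then analyzed and mined, and on this basis a filtering algorithm based on the degree of voice-call association and the SMS reply ratio (the NASFA algorithm) is proposed. Experiments and analysis show that the algorithm identifies spam SMS senders efficiently while effectively limiting the rate at which normal users are misidentified as spam SMS users.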
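
The two signals named for NASFA, voice-call association and SMS reply ratio, can be illustrated with a simple threshold rule. Everything concrete in this sketch is an assumption: the abstract does not define the two quantities, so here the reply ratio is taken as the share of recipients who texted back, the voice association as the share of recipients the sender also has call records with, and the thresholds and the `SenderStats` structure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SenderStats:
    recipients: set[str]     # distinct numbers the sender texted
    repliers: set[str]       # numbers that texted the sender back
    call_contacts: set[str]  # numbers the sender has call records with


def reply_ratio(s: SenderStats) -> float:
    """Share of recipients who replied (assumed definition)."""
    if not s.recipients:
        return 0.0
    return len(s.recipients & s.repliers) / len(s.recipients)


def voice_association(s: SenderStats) -> float:
    """Share of recipients the sender also calls (assumed definition)."""
    if not s.recipients:
        return 0.0
    return len(s.recipients & s.call_contacts) / len(s.recipients)


def looks_like_spammer(s: SenderStats, min_recipients: int = 50,
                       max_reply: float = 0.05, max_voice: float = 0.05) -> bool:
    """Flag high-volume senders whose recipients neither reply nor share calls with them."""
    if len(s.recipients) < min_recipients:
        return False  # low-volume senders are never flagged, limiting false positives
    return reply_ratio(s) <= max_reply and voice_association(s) <= max_voice


bulk = SenderStats(recipients={f"n{i}" for i in range(100)}, repliers={"n1"}, call_contacts=set())
normal = SenderStats(recipients={"a", "b", "c"}, repliers={"a", "b"}, call_contacts={"a", "c"})
```

Requiring both signals to be low, rather than either one, is one way to keep the misidentification rate of normal users down, which the abstract reports as a goal of the algorithm.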

20.
Issues with spam     
This article takes a detailed look at unsolicited commercial email, better known as “spam”, examining the current state of spam dissemination, how it is distributed by spammers, the impact and problems spam is causing the IT industry, and what methods, both legislative and technological, are being employed by various segments of government and the IT industry to help control and/or eliminate spam. In analyzing these legislative and technological means, the pros and cons of each method and its potential societal impacts are discussed.
