Similar literature (20 results)
1.
The Journal of Supercomputing - Twitter social network has gained more popularity due to the increase in social activities of registered users. Twitter performs dual functions of online social...

2.
This paper presents a novel spam filtering technique called Symbiotic Filtering (SF) that aggregates distinct local filters from several users to improve the overall performance of spam detection. SF is a hybrid approach combining features of both Collaborative Filtering (CF) and Content-Based Filtering (CBF). It allows social networks to be used to personalize and tailor the set of filters that serve as input to the filtering. A comparison is performed against the commonly used Naive Bayes CBF algorithm. Several experiments were conducted on the well-known Enron data, under both fixed and incremental symbiotic groups. We show that our system is competitive in performance and is robust against both dictionary and focused contamination attacks. Moreover, it can be implemented and deployed with little effort and low communication costs, while assuring privacy.

3.
Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter out the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam. The system is cost-efficient and, unlike server-based systems, not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages; in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best with a relatively small amount of resources for spam detection compared to other distributed methods.

4.
Image spam is unsolicited bulk email, where the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. First, we consider principal component analysis (PCA), where we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second approach focuses on the extraction of a broad set of image features and selection of an optimal subset using support vector machines (SVM). Both of these detection strategies provide high accuracy with low computational complexity. Further, we develop a new spam image dataset that cannot be detected using our PCA or SVM approach. This new dataset should prove valuable for improving image spam detection capabilities.
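
The PCA scoring step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes images are already flattened into equal-length lists of pixel values, keeps only the first principal component (found by power iteration), and scores an image by the distance between its projection and the nearest training projection. That scoring rule, and the names `fit_top_component` and `pca_score`, are assumptions made for illustration; the abstract says only that scores come from projecting images onto the eigenspace.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_top_component(train, iterations=100):
    """Return (mean, unit vector) for the first principal component of `train`."""
    n, dim = len(train), len(train[0])
    mean = [sum(col) / n for col in zip(*train)]
    centered = [[x - m for x, m in zip(row, mean)] for row in train]
    # Arbitrary non-uniform start vector, normalized to unit length.
    v = [float(j + 1) for j in range(dim)]
    start_norm = math.sqrt(_dot(v, v))
    v = [x / start_norm for x in v]
    # Power iteration on X^T X, computed as X^T (X v) without forming the matrix.
    for _ in range(iterations):
        proj = [_dot(row, v) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:  # degenerate data or unlucky start: keep the current v
            break
        v = [x / norm for x in w]
    return mean, v

def pca_score(image, train, mean, component):
    """Assumed score: distance from the image's projection to the nearest
    projection of a training (spam) image. Lower means more spam-like."""
    weight = _dot([x - m for x, m in zip(image, mean)], component)
    train_weights = [
        _dot([x - m for x, m in zip(row, mean)], component) for row in train
    ]
    return min(abs(weight - tw) for tw in train_weights)

# Toy "spam images" (hypothetical 4-pixel vectors) that vary along one direction.
train = [[0, 0, 5, 5], [2, 2, 5, 5], [4, 4, 5, 5]]
mean, component = fit_top_component(train)
near = pca_score([3, 3, 5, 5], train, mean, component)
far = pca_score([10, 10, 5, 5], train, mean, component)
```

A projection-only score ignores whatever part of an image lies outside the retained eigenspace, so a practical detector would keep more components and tune a decision threshold on labeled data.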

5.
Graph regularization methods for Web spam detection   Cited by: 1 (self-citations: 0, citations by others: 1)
We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.

6.
Spam is a serious, universal problem that affects almost all computer users. It troubles not only ordinary internet users but also companies and organizations, since it costs a huge amount of money in lost productivity and wastes users' time and network bandwidth. Many studies indicate that spam costs organizations billions of dollars yearly. This work presents a machine learning method inspired by the human immune system, called the Artificial Immune System (AIS), an emerging method that still needs further exploration. Core modifications were applied to the standard AIS with the aid of the Genetic Algorithm. An Artificial Neural Network is also applied to spam detection in a new manner. The SpamAssassin corpus is used in all our simulations.

7.
In this paper we propose a novel inference method for maximum a posteriori estimation with a Markov random field prior. The central idea is to integrate a kind of joint "voting" of neighboring labels into a message passing scheme similar to loopy belief propagation (LBP). While LBP operates with many pairwise interactions, we formulate "messages" sent from a neighborhood as a whole; hence the name neighborhood-consensus message passing (NCMP). The practical algorithm is much simpler than LBP and combines the flexibility of iterated conditional modes (ICM) with some ideas of more general message passing. The proposed method is also a generalization of the iterated conditional expectations algorithm (ICE): we revisit ICE and redefine it in a message passing framework in a more general form. We also develop a simplified version of NCMP, called weighted iterated conditional modes (WICM), that is suitable for large neighborhoods. We verify the potential of our methods on four different benchmarks, showing improvement in quality and/or speed over related inference techniques.

8.
We present a library of PVS meta-theories that can be used to verify a class of distributed systems in which agent communication is via message-passing. The theoretical work, as outlined in Chandy et al. (Form Aspect Comput 2011, to appear) consists of iterative schemes for solving systems of linear equations, such as message-passing extensions of the Gauss and Gauss-Seidel methods. We briefly review that work and discuss the challenges in formally verifying it.

9.
Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling) based inductive learning approach in detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate emails. We study the performance for various network model complexities using spambase, a publicly available benchmark dataset. Results reveal that classification accuracies of 91.7% can be achieved using only 10 out of the available 57 attributes, selected through abductive learning as the most effective feature subset (i.e. 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naïve Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy with false-positive rates as low as 4.3% and yet requires shorter training time.

10.
Kennedy, Steve. ITNOW, 2005, 47(5): 22

11.
Link spam is created with the intention of boosting a target's rank in exchange for business profit. This unethical way of deceiving Web search engines is known as Web spam. Many anti-link-spam detection techniques have been proposed over time. Web spam detection is a crucial task because of the damage spam does to Web search engines and its global cost of billions of dollars annually. In this paper, we propose a novel technique that incorporates weight properties to enhance Web spam detection algorithms. Weight properties can be defined as the influence of one Web node on another Web node. We modified existing Web spam detection algorithms with our technique and evaluated their performance on a large public Web spam dataset, WEBSPAM-UK2007. Overall, the modified algorithms outperform the benchmark algorithms by up to 30.5% at host level and 6.11% at page level.

12.
13.
Kawintiranon, Kornraphop; Singh, Lisa; Budak, Ceren. Machine Learning, 2022, 111(7): 2515-2536
Machine Learning - Social media data has a mix of high and low-quality content. One form of commonly studied low-quality content is spam. Most studies assume that spam is context-neutral. We show...

14.
In this paper, we present a generic statistical approach to identify spam profiles on Online Social Networks (OSNs). Our study is based on real datasets containing both normal and spam profiles crawled from the Facebook and Twitter networks. We have identified a set of 14 generic statistical features to identify spam profiles. The identified features are common to both Facebook and Twitter. For the classification task, we have used three different classification algorithms (naïve Bayes, JRip, and J48) and evaluated them on both individual and combined datasets to establish the discriminative property of the identified features. On the combined dataset, the detection rate (DR) is 0.957 and the false positive rate (FPR) is 0.048; on the Facebook dataset, the DR and FPR are 0.964 and 0.089, respectively; and on the Twitter dataset, they are 0.976 and 0.075, respectively. We have also analyzed the contribution of each individual feature towards the detection accuracy of spam profiles. Thereafter, we have considered the 7 most discriminative features and proposed a clustering-based approach to identify spam campaigns on Facebook and Twitter.
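
For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows the standard definitions: detection rate is the fraction of spam profiles that are flagged, and false positive rate is the fraction of normal profiles that are wrongly flagged. The function names and the confusion-matrix counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
def detection_rate(true_positives, false_negatives):
    """DR (recall on the spam class): flagged spam / all spam."""
    return true_positives / (true_positives + false_negatives)

def false_positive_rate(false_positives, true_negatives):
    """FPR: normal profiles wrongly flagged / all normal profiles."""
    return false_positives / (false_positives + true_negatives)

# Hypothetical counts for 100 spam and 100 normal profiles.
dr = detection_rate(true_positives=90, false_negatives=10)
fpr = false_positive_rate(false_positives=5, true_negatives=95)
print(f"DR = {dr:.3f}, FPR = {fpr:.3f}")
```

A classifier is better when DR is high and FPR is low at the same time, which is why the abstract reports both values for each dataset.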

15.
With computers and the Internet being essential in everyday life, malware poses serious and evolving threats to their security, making the detection of malware of utmost concern. Accordingly, there has been much research on intelligent malware detection that applies data mining and machine learning techniques. Though great results have been achieved with these methods, most of them are built on shallow learning architectures. Due to its superior ability in feature learning through multilayer deep architecture, deep learning is starting to be leveraged in industrial and academic research for different applications. In this paper, based on the Windows application programming interface calls extracted from portable executable files, we study how a deep learning architecture can be designed for intelligent malware detection. We propose a heterogeneous deep learning framework composed of an AutoEncoder stacked with multilayer restricted Boltzmann machines and a layer of associative memory to detect previously unknown malware. The proposed deep learning model performs greedy layer-wise training for unsupervised feature learning, followed by supervised parameter fine-tuning. Unlike existing works, which use only files with class labels (either malicious or benign) during the training phase, we utilize both labeled and unlabeled file samples to pre-train multiple layers in the heterogeneous deep learning framework from the bottom up for feature learning. A comprehensive experimental study on a real and large file collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed deep learning framework can further improve the overall performance in malware detection compared with traditional shallow learning methods, deep learning methods with a homogeneous framework, and other existing anti-malware scanners. The proposed heterogeneous deep learning framework can also be readily applied to other malware detection tasks.

16.
A method of spam detection based on cognitive pattern recognition is proposed. The connection between email category and the cognition of email users' interests in life and work is analyzed. Under the guidance of cognitive pattern recognition theory, the mechanism of spam detection based on intelligent cognition of email users' interests in life and work is discussed. The spam detection algorithm and its concrete implementation are then given. Experimental results demonstrate that the spam detection algorithm has good learning ability and scalability and achieves high recognition accuracy.

17.
Zhu, Hui-Juan; Jiang, Tong-Hai; Ma, Bo; You, Zhu-Hong; Shi, Wei-Lei; Cheng, Li. Neural Computing & Applications, 2018, 30(11): 3353-3361
Mobile phones are rapidly becoming the most widespread and popular form of communication; thus, they are also the most important attack target of malware. The amount of malware on mobile phones is increasing exponentially and poses a serious security threat. Google's Android is the most popular smartphone platform in the world, and its permission-declaration access control mechanism cannot identify malware. In this paper, we propose an ensemble machine learning system for the detection of malware on Android devices. More specifically, four groups of features, including permissions, monitoring system events, sensitive API and permission rate, are extracted to characterize each Android application (app). An ensemble random forest classifier is then learned to detect whether an app is potentially malicious or not. The performance of the proposed method is evaluated on an actual data set using tenfold cross-validation. The experimental results demonstrate that the proposed method can achieve a high accuracy of 89.91%. To further assess the performance of our method, we compared it with the state-of-the-art support vector machine classifier. Comparison results demonstrate that the proposed method is extremely promising and could provide a cost-effective alternative for Android malware detection.


18.
In this paper, a three-layer back-propagation neural network (BPNN) is employed for spam detection by using a concentration based feature construction (CFC) approach. In the CFC approach, 'self' and 'non-self' concentrations are constructed through 'self' and 'non-self' gene libraries, respectively, to form a two-element concentration vector that expresses the e-mail efficiently. A three-layer BPNN with a two-element input is then employed to classify e-mails automatically. Comprehensive experiments conducted on two public benchmark corpora, PU1 and Ling, demonstrate that the proposed CFC-based BPNN classifier is not only very fast but also achieves 97% and 99% classification accuracy on PU1 and Ling, respectively, using just a two-element concentration feature vector.
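
The two-element feature vector described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: each concentration is assumed to be the fraction of a message's distinct words that appear in the corresponding gene library, the gene libraries are tiny hypothetical word sets, and the BPNN that would consume the vector is omitted.

```python
def concentration_vector(message, self_genes, nonself_genes):
    """Return (self concentration, non-self concentration) for one e-mail.

    Assumed definition: each value is the share of the message's distinct
    words that occur in the given gene library.
    """
    words = set(message.lower().split())
    if not words:
        return (0.0, 0.0)
    self_conc = len(words & self_genes) / len(words)
    nonself_conc = len(words & nonself_genes) / len(words)
    return (self_conc, nonself_conc)

# Hypothetical gene libraries: words typical of legitimate mail and of spam.
SELF_GENES = {"meeting", "report"}
NONSELF_GENES = {"free", "money", "winner"}

vector = concentration_vector("Free money meeting now", SELF_GENES, NONSELF_GENES)
```

Reducing every message to two numbers keeps the classifier input tiny, which is consistent with the speed claim in the abstract.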

19.
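To detect spam web pages effectively, we analyze the distributions of page content features and link features and find that the features of normal pages are regularly distributed, whereas those of spam pages are scattered. Based on this difference, we propose fitting the feature distribution of normal pages with a distribution function, computing the difference between the ratio of normal to spam pages and the distribution function, and using this difference as a threshold in a C4.5 decision tree to detect spam pages. Experimental results show that the method effectively reduces the number of misclassified normal pages and improves accuracy.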

20.
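A P2P-based collaborative spam email detection system   Cited by: 1 (self-citations: 0, citations by others: 1)
Qiu Mingming (邱明明), Wu Guoxin (吴国新). Computer Engineering and Design (计算机工程与设计), 2007, 28(11): 2559-2562, 2596
Many email users are likely to hold the same opinion about whether a given message is spam. Exploiting this property, we propose a design for a P2P-based collaborative spam detection system. The system relies on message digest techniques, which protect user privacy while providing good resistance to attacks. A P2P network with super nodes is built at the mail server layer to share spam information in a distributed manner. At the user layer, a multi-agent architecture is designed: it uses the mail server's preliminary analysis of received mail together with a personalized Bayesian agent to provide a degree of personalized spam detection.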
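
The digest-sharing idea in this abstract can be sketched as below. The sketch rests on several assumptions that the abstract does not confirm: SHA-256 over a whitespace- and case-normalized body stands in for the unspecified digest technique, a single in-memory dictionary stands in for the super-node P2P network, and the class name `SharedSpamIndex` and the two-report threshold are hypothetical.

```python
import hashlib

def email_digest(body):
    """Digest of a normalized body, so peers share hashes rather than content."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class SharedSpamIndex:
    """Stand-in for the distributed store: digest -> set of reporting users."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # hypothetical number of reports required
        self.reports = {}

    def report(self, reporter_id, body):
        """Record that `reporter_id` considers this message spam."""
        self.reports.setdefault(email_digest(body), set()).add(reporter_id)

    def is_spam(self, body):
        """A message is flagged once enough distinct users have reported it."""
        return len(self.reports.get(email_digest(body), ())) >= self.threshold

index = SharedSpamIndex(threshold=2)
index.report("alice", "Buy  NOW")
index.report("bob", "buy now")
flagged = index.is_spam("Buy now")
```

An exact hash matches only identical normalized text, so a spammer can evade it with small edits; a practical system would more likely use a similarity-preserving digest, and would combine the shared verdict with the per-user Bayesian agent the abstract mentions.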
