20 similar documents found (search time: 15 ms)
1.
Clotilde Lopes Paulo Cortez Pedro Sousa Miguel Rocha Miguel Rio 《Expert systems with applications》2011,38(8):9365-9372
This paper presents a novel spam filtering technique called Symbiotic Filtering (SF) that aggregates distinct local filters from several users to improve the overall performance of spam detection. SF is a hybrid approach combining some features from both Collaborative (CF) and Content-Based Filtering (CBF). It allows for the use of social networks to personalize and tailor the set of filters that serve as input to the filtering. A comparison is performed against the commonly used Naive Bayes CBF algorithm. Several experiments were held with the well-known Enron data, under both fixed and incremental symbiotic groups. We show that our system is competitive in performance and is robust against both dictionary and focused contamination attacks. Moreover, it can be implemented and deployed with little effort and low communication costs, while assuring privacy.
2.
Zubair A. Baig Sadiq M. Sait AbdulRahman Shaheen 《Engineering Applications of Artificial Intelligence》2013,26(7):1731-1740
Network intrusion detection has been an area of rapid advancement in recent times. Similar advances in the field of intelligent computing have led to the introduction of several classification techniques for accurately identifying and differentiating network traffic into normal and anomalous. Group Method for Data Handling (GMDH) is one such supervised inductive learning approach for the synthesis of neural network models. In this paper, we propose a GMDH-based technique for classifying network traffic into normal and anomalous. Two variants of the technique, namely, Monolithic and Ensemble-based, were tested on the KDD-99 dataset. The dataset was preprocessed and all features were ranked based on three feature ranking techniques, namely, Information Gain, Gain Ratio, and GMDH by itself. The results obtained showed that the proposed intrusion detection scheme yields high attack detection rates, nearly 98%, when compared with other intelligent classification techniques for network intrusion detection.
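Information Gain, one of the feature ranking techniques named in the abstract above, can be illustrated with a minimal sketch. It assumes discrete feature values and uses the textbook definition (label entropy minus the entropy remaining after splitting on the feature); the function names and the toy records are hypothetical and are not taken from the paper or from the KDD-99 schema.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return feature names sorted by decreasing information gain."""
    names = list(rows[0].keys())
    gains = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy records; the field names are illustrative only.
rows = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
labels = ["normal", "anomalous", "normal", "anomalous"]
ranking = rank_features(rows, labels)
```

In this toy data the `flag` field separates the two classes completely while `protocol` carries no information about the label, so `flag` is ranked first.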
3.
Email is one of the most ubiquitous and pervasive applications, used daily by millions of people worldwide; individuals and organizations increasingly rely on email to communicate and share information and knowledge. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. Processing and managing email efficiently is becoming a big challenge for individuals and organizations. This paper proposes new email classification models using a linear neural network trained by the perceptron learning algorithm and a nonlinear neural network trained by the back-propagation learning algorithm. An efficient semantic feature space (SFS) method is introduced in these classification models. The traditional back-propagation neural network (BPNN) learns slowly and is prone to getting trapped in a local minimum, so a modified back-propagation neural network (MBPNN) is presented to overcome these limitations. A vector space model based email classification system suffers from a large number of features and ambiguity in the meaning of terms, which leads to a sparse and noisy feature space. We therefore use the SFS to convert the original sparse and noisy feature space into a semantically richer feature space, which helps to accelerate learning. The experiments are conducted with different training set sizes and extracted feature sizes. Experimental results show that the models using MBPNN outperform the traditional BPNN, and that the use of SFS can greatly reduce the feature dimensionality and improve email classification performance.
4.
Neural Computing and Applications - Email has become extremely popular among people nowadays. In fact, it has been reported to be the cheapest, popular and fastest means of communication in recent...
5.
Image spam is unsolicited bulk email, where the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. First, we consider principal component analysis (PCA), where we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second approach focuses on the extraction of a broad set of image features and selection of an optimal subset using support vector machines (SVM). Both of these detection strategies provide high accuracy with low computational complexity. Further, we develop a new spam image dataset that cannot be detected using our PCA or SVM approach. This new dataset should prove valuable for improving image spam detection capabilities.
6.
Email spam causes a serious waste of time and resources. This paper addresses the email spam filtering problem and proposes an online active multi-field learning approach, which is based on the following ideas: (1) Email spam filtering is an online application, which suggests an online learning idea; (2) An email document has a multi-field text structure, which suggests a multi-field learning idea; and (3) It is costly to obtain a label for a real-world email spam filter, which suggests an active learning idea. The online learner regards email spam filtering as an incremental supervised binary streaming text classification. The multi-field learner combines multiple results predicted by field classifiers in a novel compound weight schema, and each field classifier calculates the arithmetical average of multiple conditional probabilities calculated from feature strings according to a data structure of string-frequency index. Comparing the current variance of field classifying results with the historical variance, the active learner evaluates the classifying confidence and takes the more uncertain email as the more informative sample for which to request a label. The experimental results show that the proposed approach can achieve state-of-the-art performance with greatly reduced label requirements and very low space-time costs. The performance of our online active multi-field learning, measured by the standard (1-ROCA)%, even exceeds the full feedback performance of some advanced individual text classification algorithms.
7.
Adewole Kayode Sakariyah Anuar Nor Badrul Kamsin Amirrudin Sangaiah Arun Kumar 《Multimedia Tools and Applications》2019,78(4):3925-3960
Multimedia Tools and Applications - Short message communication media, such as mobile and microblogging social networks, have become attractive platforms for spammers to disseminate unsolicited...
8.
Residual analysis for feature detection (citations: 5 total, 0 self, 5 by others)
Chen M.-H. Lee D. Pavlidis T. 《IEEE transactions on pattern analysis and machine intelligence》1991,13(1):30-40
It is shown that, in a very simple form, residual analysis achieves results that are at least as good as, if not better than, those obtained by other techniques. The method can be extended in many ways. For example, moving average filters or regularization can be used to obtain the residual images. Also, the strength of the correlation, measured by D_rr(0), can be used to eliminate noise, weak edges, etc. A more ambitious extension is to consider smoothing filters that leave invariant the function representing the reflectance from smooth surfaces.
9.
Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter the spammers. This paper provides a fully distributed pattern recognition system within P2P networks using the distributed associative memory tree (DASMET) algorithm to detect spam; it is cost-efficient and, unlike server-based systems, not prone to a single point of failure. This algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages: in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best with a relatively small amount of resources for spam detection compared to other distributed methods.
10.
Graph regularization methods for Web spam detection (citations: 1 total, 0 self, 1 by others)
We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.
11.
12.
Yuanning Liu Youwei Wang Lizhou Feng Xiaodong Zhu 《Pattern Analysis & Applications》2016,19(2):369-383
Feature selection is an important technique for improving the efficiency and accuracy of spam filtering. Among the numerous methods, document frequency-based feature selections ignore the effect of term frequency information, and thus always produce unsatisfactory results. In this paper, a hybrid method (called HBM), which combines document frequency information and term frequency information, is proposed. To maintain the category distinguishing ability of the selected features, an optimal document frequency-based feature selection (called ODFFS) is chosen; terms which are indeed discriminative will be selected by ODFFS. For the remaining features, term frequency information is considered and the terms with the highest HBM values are selected. Further, a novel method called feature subset evaluating parameter optimization (FSEPO) is proposed for parameter optimization. Experiments with support vector machine (SVM) and Naïve Bayesian (NB) classifiers are carried out on four corpora: PU1, LingSpam, SpamAssassin and Trec2007. Six feature selections (information gain, Chi square, improved Gini-index, multi-class odds ratio, normalized term frequency-based discriminative power measure, and comprehensively measure feature selection) are compared with HBM. Experimental results show that HBM is significantly superior to the other feature selection methods on the four corpora when SVM and NB are applied, respectively.
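The distinction the abstract above draws between document frequency and term frequency can be made concrete with a small sketch. It only computes the two raw statistics on tokenized documents; it does not reproduce the HBM, ODFFS or FSEPO formulas, which the abstract does not give. The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def frequency_stats(documents):
    """Return (document_frequency, term_frequency) counters for tokenized documents."""
    df, tf = Counter(), Counter()
    for tokens in documents:
        tf.update(tokens)       # every occurrence of a term counts
        df.update(set(tokens))  # each document counts at most once per term
    return df, tf


# Hypothetical toy corpus, not drawn from the corpora named in the abstract.
spam = [["free", "free", "free", "offer"], ["free", "offer"]]
df, tf = frequency_stats(spam)
```

Here both terms occur in the same number of documents, so a selector that looks only at document frequency cannot tell them apart, whereas the term-frequency counts show that `free` is used far more heavily than `offer`.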
13.
14.
Network immunization and virus propagation in email networks: experimental evaluation and analysis (citations: 1 total, 1 self, 0 by others)
Network immunization strategies have emerged as possible solutions to the challenges of virus propagation. In this paper, an existing interactive model is introduced and then improved in order to better characterize the way a virus spreads in email networks with different topologies. The model is used to demonstrate the effects of a number of key factors, notably nodes’ degree and betweenness. Experiments are then performed to examine how the structure of a network and human dynamics affect virus propagation. The experimental results reveal that a virus spreads in two distinct phases and show that the most efficient immunization strategy is the node-betweenness strategy. Moreover, the results also explain, in terms of human dynamics, why old viruses can still survive in networks today.
15.
Liang Min Hou Jie-Bo Zhu Xiaobin Yang Chun Qin Jingyan 《International Journal on Document Analysis and Recognition》2022,25(3):163-175
International Journal on Document Analysis and Recognition (IJDAR) - Detecting arbitrary shape scene texts is challenging mainly due to the varied aspect ratios, curves, and scales. In this paper,...
16.
17.
Copy-move detection aims to find duplicated regions in an image. In this paper, an effective method based on region features is proposed to detect copy-move forgeries, especially when a region is copied multiple times or the image contains multiple copy-move groups. First, a maximally stable color region detector is applied to extract features, and these features are represented by Zernike moments. Then an improved matching strategy considering the n best-matching features is applied to deal with the multiple-copy problem. Moreover, a hierarchical clustering algorithm is developed to estimate transformation matrices and confirm the existence of forgery. Based on these matrices, the duplicated regions can be located at pixel level. Experimental results indicate that the proposed scheme outperforms other similar state-of-the-art techniques.
18.
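To control the spread of viruses, a novel anomaly-detection-based scheme for preventing email viruses is presented. By analyzing the anomalous behavior of virus-bearing emails and giving users corresponding risk warnings, the scheme aims to stop viruses from spreading further. The strategy can detect the constantly emerging new viruses as well as variants of existing ones, and so meets the current development requirements of anti-virus technology.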
19.
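To address the low efficiency of traditional feature-point registration algorithms and their false detections of feature points, an image registration algorithm based on feature-point detection is proposed. The feature-point detection method is improved: non-feature points are filtered out using the gray-level relationship between each pixel and its surrounding pixels; the remaining points are then examined precisely with the proposed diamond-shaped template to build the feature-point set; and the feature-point sets are registered with the iterative closest point (ICP) algorithm. Experimental results show that the improved algorithm is markedly better in feature-point detection accuracy and detection time, and achieves good registration results.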
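The iterative closest point (ICP) step named in the abstract above can be sketched as follows. This is a generic 2D rigid ICP under simplifying assumptions (brute-force nearest-neighbour matching, a fixed number of iterations, a closed-form rotation estimate); it is not the authors' implementation, and all function names and the sample points are hypothetical.

```python
import math


def nearest(point, candidates):
    """Return the candidate closest to point (brute-force search)."""
    return min(candidates, key=lambda c: (c[0] - point[0]) ** 2 + (c[1] - point[1]) ** 2)


def best_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping paired 2D points src onto dst."""
    n = len(src)
    cx_s = sum(x for x, _ in src) / n
    cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n
    cy_d = sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cx_s, ys - cy_s, xd - cx_d, yd - cy_d
        dot += xs * xd + ys * yd
        cross += xs * yd - ys * xd
    theta = math.atan2(cross, dot)  # angle that maximises the alignment of the centred pairs
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def apply_transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(source, target, iterations=20):
    """Alternate nearest-neighbour matching and rigid re-estimation."""
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_transform(current, theta, tx, ty)
    return current


# Hypothetical feature points: the source is the target under a small rotation and shift.
target = [(0.0, 0.0), (5.0, 0.0), (5.0, 2.0), (1.0, 4.0)]
source = apply_transform(target, 0.1, 0.3, -0.2)
aligned = icp(source, target)
```

ICP of this kind relies on the initial misalignment being small enough for nearest-neighbour matches to be mostly correct, which is why the sample points are only slightly displaced.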
20.
Roger L. King Michael A. Hicks Stephen P. Signer 《Engineering Applications of Artificial Intelligence》1993,6(6):565-573
The use of an unsupervised learning technique for classifying geological features in the roof overlying an underground coal mine is described. The technique uses torque, thrust, drill speed, penetration rate, and drill position data from a roof bolter as inputs for the classification. Data were obtained from an underground coal mine in the western United States and initially classified using clustering. Some of the available approaches for clustering are reviewed and the rationale used in selecting the chosen approach is discussed. The cluster centers, or exemplars, obtained from this approach can be used to train two supervised neural networks involving the back-propagation of error learning algorithm.