Similar Articles
 20 similar articles found (search time: 15 ms)
1.

One of the major challenges in cyberspace and Internet of Things (IoT) environments is the existence of fake or phishing websites that steal users' information. A website, as a multimedia system, provides access to different types of data such as text, images, video, and audio, and each of these data types can be exploited by phishers to mount a phishing attack. In a phishing attack, users are directed to fake pages and their sensitive information is stolen by the phisher. Machine learning and data mining algorithms are widely used to classify websites and detect phishing attacks, and classification accuracy depends heavily on the feature selection method used to choose appropriate features. In this research, an improved spotted hyena optimization (ISHO) algorithm is proposed to select suitable features for classifying phishing websites with a support vector machine. The proposed ISHO algorithm achieved better accuracy than the standard spotted hyena optimization algorithm. In addition, the results indicate that ISHO outperforms three other meta-heuristic algorithms: particle swarm optimization, the firefly algorithm, and the bat algorithm. The proposed algorithm is also compared with a number of previously proposed classification algorithms on the same dataset.
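Metaheuristic feature selectors such as ISHO are usually run as wrappers: each search agent's position is turned into a 0/1 feature mask, and the mask is scored by training the classifier on the selected columns. The sketch below shows only that scoring layer, under stated assumptions: the sigmoid transfer function, the weighting `alpha=0.99`, and the function names are illustrative choices, not the paper's ISHO update rules, and the SVM error rate is passed in rather than computed to keep the example dependency-free.

```python
import math
import random

def sigmoid_binarize(position, rng=None):
    """Map a continuous search-agent position to a 0/1 feature mask.

    Uses an S-shaped (sigmoid) transfer function, a common choice in binary
    metaheuristic feature selection (an assumption, not taken from the paper).
    """
    if rng is None:
        rng = random.Random()
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]

def wrapper_fitness(mask, error_rate, alpha=0.99):
    """Fitness to minimise: weighted classifier error plus a subset-size penalty.

    error_rate would come from evaluating an SVM on the selected features
    (e.g. by cross-validation); it is a plain argument in this sketch.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty feature subset cannot be evaluated
    return alpha * error_rate + (1 - alpha) * n_selected / len(mask)
```

In a full wrapper, every candidate mask produced by the optimizer in each iteration would be scored this way and the lowest-fitness mask kept as the selected feature subset.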


2.
We propose a novel classification model that consists of features of website URLs and content for automatically detecting Chinese phishing e-Business websites. The model incorporates several unique domain-specific features of Chinese e-Business websites. We evaluated the proposed model using four different classification algorithms and approximately 3,000 Chinese e-Business websites. The results show that the Sequential Minimal Optimization (SMO) algorithm performs the best. The proposed model outperforms two baseline models in detection precision, recall, and F-measure. The results of a sensitivity analysis demonstrate that domain-specific features have the most significant impact on the detection of Chinese phishing e-Business websites.
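The abstract does not list its URL features, so the sketch below only illustrates the general idea of turning a URL into a numeric feature vector for a classifier such as SMO. The six features and their names are generic lexical features chosen as assumptions; none of the paper's Chinese e-Business domain-specific features are reproduced.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url):
    """Extract a few generic lexical URL features (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for domain names
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_hyphen_in_host": int("-" in host),
        "host_is_ip": host_is_ip,
        "uses_https": int(parsed.scheme == "https"),
    }
```

For example, `url_features("http://192.168.0.1/login")` sets `host_is_ip` to 1, a pattern often associated with phishing pages hosted without a registered domain.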

3.
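Through research on online text classification for filtering spam text streams, a new conditional-probability ensemble method is proposed. Texts are represented as lexical sequences, classification knowledge is stored in an index structure, and online training and online classification algorithms for the model are designed and implemented. Multiple text features are extracted from emails and mobile-phone short messages (SMS), and spam-filtering experiments are conducted on the TREC07P email corpus and a real-world Chinese SMS corpus. The experimental results show that the proposed method achieves good spam-filtering performance.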
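The online setting described above (an index of classification knowledge, updated one message at a time) can be sketched as follows. This is a minimal illustration under assumptions: character n-grams stand in for the lexical-sequence representation, and averaging smoothed per-n-gram conditional probabilities is one simple way to combine them; the paper's exact ensemble formula is not given in the abstract.

```python
from collections import defaultdict

class OnlineNgramFilter:
    """Online spam filter: character n-gram index + averaged conditional probabilities."""

    def __init__(self, n=3):
        self.n = n
        # index: n-gram -> [ham_count, spam_count] (counts of messages containing it)
        self.index = defaultdict(lambda: [0, 0])

    def _ngrams(self, text):
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def train(self, text, is_spam):
        """Update the index with one labelled message (online training)."""
        for g in self._ngrams(text):
            self.index[g][int(is_spam)] += 1

    def spam_score(self, text):
        """Average of Laplace-smoothed P(spam | n-gram) over n-grams seen before."""
        probs = []
        for g in self._ngrams(text):
            if g in self.index:  # membership test does not create entries
                ham, spam = self.index[g]
                probs.append((spam + 1) / (ham + spam + 2))
        return sum(probs) / len(probs) if probs else 0.5
```

Because training only increments counters, each new labelled message can be absorbed immediately, which is what makes the approach suitable for text streams such as SMS.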

4.
In this paper, we present a new rule-based method to detect phishing attacks in internet banking. Our rule-based method uses two novel feature sets proposed to determine the identity of a webpage. The proposed feature sets include four features that evaluate the identity of page resources and four features that identify the access protocol of page resource elements. In the first feature set, we use approximate string matching algorithms to determine the relationship between the content and the URL of a page. The proposed features are independent of third-party services such as search engine results and/or web browser history. We employed the support vector machine (SVM) algorithm to classify webpages. Our experiments indicate that the proposed model can detect phishing pages in internet banking with a true positive rate of 99.14% and a false negative rate of only 0.86%. A sensitivity analysis demonstrates the significant impact of our proposed features compared with traditional features. We extracted the hidden knowledge from the proposed SVM model by adopting a related method, and embedded the extracted rules into a browser extension named PhishDetector to make the method more practical and easy to use. Evaluation of the implemented browser extension indicates that it can detect phishing attacks in internet banking with high accuracy and reliability. PhishDetector can also detect zero-day phishing attacks.
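The two feature families described above can be illustrated with small helpers. This is a sketch under assumptions: Levenshtein edit distance is used as one standard approximate string matching algorithm (the abstract does not say which ones the authors used), and `https_resource_ratio` is a hypothetical example of an access-protocol feature, not one of the paper's eight features.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalised similarity in [0, 1]; 1 means identical strings."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def https_resource_ratio(resource_urls):
    """Fraction of page resources (images, scripts, CSS) loaded over HTTPS."""
    if not resource_urls:
        return 0.0
    return sum(u.lower().startswith("https://") for u in resource_urls) / len(resource_urls)
```

A page whose title says "paypal" while its domain label is "paypa1" would score a high but imperfect similarity (1 - 1/6), the kind of near-match that such identity features are meant to expose.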

5.
Highly discriminative statistical features for email classification   (Total citations: 2; self-citations: 2; citations by others: 0)
This paper reports on email classification and filtering, specifically spam-versus-ham and phishing-versus-spam classification based on content features. We test the validity of several novel statistical feature extraction methods, which rely on dimensionality reduction to retain the most informative and discriminative features. We successfully test our methods under two schemes. The first is a classic classification scenario using 10-fold cross-validation on several corpora, including four standard ground-truth corpora (Ling-Spam, SpamAssassin, PU1, and a subset of the TREC 2007 spam corpus) and one proprietary corpus. In the second scheme, we test the anticipatory properties of our extracted features and classification models on two proprietary datasets of phishing and spam emails sorted by date, and on the public TREC 2007 spam corpus. The contributions of our work are an exhaustive comparison of several feature selection and extraction methods for email classification on different benchmark corpora, and evidence that biased discriminant analysis in particular offers more discriminative features for classification, gives stable classification results regardless of the number of features chosen, and robustly retains its discriminative value across time and data setups. These findings are especially useful in a commercial setting, where short profile rules are built from a limited number of features to filter emails.
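To make "retaining the most discriminative features" concrete, the sketch below ranks features with the Fisher score, a simple filter criterion that rewards large between-class mean separation relative to within-class spread. Note plainly that this is a swapped-in, much simpler technique than the biased discriminant analysis the paper favours; it only illustrates what a discriminative-feature criterion measures.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Per-feature Fisher score for a two-class problem.

    X: list of samples (lists of floats); y: list of 0/1 labels.
    Both classes must be present. Higher scores mean better separation.
    """
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, label in zip(X, y) if label == 0]
        b = [x[j] for x, label in zip(X, y) if label == 1]
        num = (mean(a) - mean(b)) ** 2
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            scores.append(num / denom)
        else:
            # zero within-class spread: perfect separation if the means differ
            scores.append(float("inf") if num > 0 else 0.0)
    return scores

def top_k_features(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
```

Keeping only `top_k_features(scores, k)` mirrors the commercial use case in the abstract, where filtering rules must be built from a small number of features.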

6.
7.
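The principles and forms of phishing attacks are analyzed, and corresponding countermeasures are proposed for different attack methods, such as spoofed emails, counterfeit banking and online securities sites, fraudulent e-commerce sites, and Trojans/hacking.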

8.
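To address the low accuracy of traditional classification algorithms on Uyghur text, a Uyghur SMS text classification model based on a deep belief network (DBN) is proposed. Deep learning mimics the multi-layer structure of the human brain, extracting features progressively from low to high levels and mining the underlying distribution of the data set, thereby improving classification accuracy. The DBN is initialized by layer-wise unsupervised pre-training and combined with a softmax regression classifier to classify texts. Experiments are conducted on a collected Uyghur SMS data set. The results show that, compared with the KNN, SVM, and decision tree algorithms, the DBN achieves better classification performance and higher accuracy.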
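The final step described above, a softmax regression classifier on top of the DBN's learned features, can be sketched as below. The weights and biases are placeholders for values that would come from training; the stacked-RBM pre-training of the DBN itself is not shown, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_predict(features, weights, biases):
    """Softmax-regression layer applied to a feature vector (e.g. a DBN's top layer).

    weights[k] is the weight vector for class k and biases[k] its bias.
    Returns (predicted class index, class probabilities).
    """
    logits = [sum(w * x for w, x in zip(wk, features)) + bk
              for wk, bk in zip(weights, biases)]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs
```

Subtracting the maximum logit leaves the probabilities unchanged but prevents `math.exp` from overflowing on large activations.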

9.
Ever-larger volumes of phishing attacks are encountered every day because of attackers' high financial returns. Recently, there has been significant interest in applying machine learning to phishing web page detection. Unlike prior work, this paper introduces the predicted labels of textual contents as part of the features and proposes a novel framework for phishing web page detection using hybrid features consisting of URL-based, web-based, rule-based, and textual content-based features. We implement this framework with an efficient two-stage extreme learning machine (ELM). The first stage constructs classification models on the textual contents of web pages using ELM; in this stage, optical character recognition (OCR) is used as an auxiliary tool to extract textual contents from web pages in image format. In the second stage, a classification model on the hybrid features is built using a linear-combination-model-based ensemble of ELMs (LC-ELMs), with the weights calculated by the generalized inverse. Experimental results indicate that the proposed framework is promising for detecting phishing web pages.
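The LC-ELMs step, combining base-model outputs linearly with weights obtained from a generalized inverse, amounts to a least-squares fit. The pure-Python sketch below assumes the output matrix has full column rank, in which case the Moore-Penrose solution equals the normal-equation solution; the function names are hypothetical and the ELM base learners themselves are not shown.

```python
def ensemble_weights(H, y):
    """Least-squares combination weights w minimising ||H w - y||^2.

    H: n x m matrix (list of rows); column k holds base model k's outputs.
    y: n target values. Assumes H has full column rank, so that the
    generalized-inverse solution pinv(H) y equals (H^T H)^{-1} H^T y.
    """
    m = len(H[0])
    # Normal equations A w = b with A = H^T H and b = H^T y.
    A = [[sum(row[i] * row[j] for row in H) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * t for row, t in zip(H, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w

def combine(outputs, w):
    """Ensemble score for one sample, given its base-model outputs."""
    return sum(o * wk for o, wk in zip(outputs, w))
```

Forming H^T H squares the condition number, so for real data a library routine such as NumPy's `numpy.linalg.pinv` or `numpy.linalg.lstsq` would be the safer choice; the explicit version is shown only to make the computation visible.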

10.
11.
Phishing Detection System Based on an SVM Active Learning Algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)
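Targeting phishing attacks, this work starts from the URL and identifies and classifies websites using combined features of the URL and the web page content, achieving phishing detection while maintaining detection efficiency and accuracy. An SVM active learning algorithm and a classification model suited to small sample sets are used to improve classification performance. Experimental results show that the phishing detection system achieves high detection accuracy.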
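Active learning lets a small labelled set go further by asking for labels only on the most informative samples. The abstract does not state the query strategy, so the sketch below assumes the common margin-based rule for SVMs: query the unlabelled samples closest to the current hyperplane. The linear weights `w`, bias `b`, and function names are illustrative assumptions.

```python
def decision_value(w, b, x):
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_queries(w, b, unlabeled, k=1):
    """Pick the indices of the k unlabeled samples closest to the SVM hyperplane.

    Samples with the smallest |f(x)| are the ones the current model is least
    sure about, so labelling them is expected to help most.
    """
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: abs(decision_value(w, b, unlabeled[i])))
    return ranked[:k]
```

In a full loop, the selected URLs/pages would be labelled, added to the training set, the SVM retrained, and the selection repeated until the labelling budget is spent.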

12.
Information Fusion, 2007, 8(4): 337-346
This paper presents a novel multi-level wavelet-based fusion algorithm that combines information from the fingerprint, face, iris, and signature images of an individual into a single composite image. The proposed approach reduces memory size, increases recognition accuracy by using multi-modal biometric features, and withstands common attacks such as smoothing, cropping, JPEG 2000 compression, and filtering due to tampering. The fusion algorithm is validated using the verification algorithms we developed, existing algorithms, and a commercial algorithm. In addition to our multi-modal database, experiments are also performed on other well-known databases such as the FERET face database and the CASIA iris database. The effectiveness of the fusion algorithm is experimentally validated by computing the matching scores and equal error rates before fusion, after reconstruction of the biometric images, and when the composite fused image is subjected to both frequency and geometric attacks. The results show that the fusion process reduces the memory required for storing the multi-modal images by 75%, while the integrity of the biometric features and the recognition performance of the resulting composite fused image are not significantly affected. The fusion and reconstruction algorithms have complexity O(n log n) and are suitable for many real-time applications. We also propose a multi-modal biometric algorithm that further reduces the equal error rate compared with individual biometric images.
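The core mechanism behind wavelet fusion is a multi-level decomposition into approximation and detail coefficients, a rule for merging coefficients from several inputs, and an exact inverse transform. The sketch below illustrates this in 1-D with the Haar wavelet; the fusion rule (average approximations, keep the larger-magnitude detail) is an assumed generic rule, not the paper's algorithm, which operates on 2-D images from four biometric modalities.

```python
import math

def haar_forward(signal, levels):
    """Multi-level 1-D orthonormal Haar transform.

    len(signal) must be divisible by 2**levels. Returns the final
    approximation and the list of detail coefficients, finest level first.
    """
    approx = list(signal)
    details = []
    s = 1 / math.sqrt(2)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])
        approx = [(a + b) * s for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    s = 1 / math.sqrt(2)
    for d in reversed(details):
        out = []
        for a, dd in zip(approx, d):
            out.extend([(a + dd) * s, (a - dd) * s])
        approx = out
    return approx

def fuse(sig_a, sig_b, levels):
    """Toy fusion rule (assumption): average approximations, keep larger-magnitude details."""
    a_app, a_det = haar_forward(sig_a, levels)
    b_app, b_det = haar_forward(sig_b, levels)
    app = [(x + y) / 2 for x, y in zip(a_app, b_app)]
    det = [[x if abs(x) >= abs(y) else y for x, y in zip(da, db)]
           for da, db in zip(a_det, b_det)]
    return haar_inverse(app, det)
```

Because the Haar transform is orthonormal, `haar_inverse(*haar_forward(x, L))` reproduces `x` up to floating-point error, which is the property that lets the individual biometric images be reconstructed after fusion.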

13.
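Phishing (also called phishing attacks or online phishing) is an emerging form of attack and fraud that is mainly spread and carried out over the Internet. It has been increasing year by year, causing property and economic losses to users and financial institutions. How to identify phishing-related Internet risks promptly and effectively, and how to limit the impact of phishing attacks, has become an urgent problem for financial institutions. As a result, major banks, securities firms, and security companies have launched their own anti-phishing monitoring services. Current anti-phishing techniques generally use crawlers to actively search the Internet at large scale for counterfeit sites, crawl large numbers of suspected phishing websites, and check each one to determine whether it is a phishing site. Faced with a massive number of suspicious websites, detecting phishing sites efficiently and quickly has become a new challenge. This paper introduces a new approach to website logo detection based on image recognition, used to filter the massive set of suspected phishing websites and speed up phishing detection.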
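The abstract does not name its image recognition technique. As a swapped-in illustration of fast logo pre-filtering, the sketch below uses an average hash: reduce a logo to a bit string (pixel above or below the mean brightness) and compare bit strings by Hamming distance. It assumes the logo has already been cropped, converted to grayscale, and resized (typically to 8x8); that preprocessing is omitted to avoid an imaging-library dependency, and the `max_distance` threshold is a hypothetical value.

```python
def average_hash(gray):
    """Average hash of a small grayscale image (list of rows of 0-255 ints)."""
    pixels = [p for row in gray for p in row]
    avg = sum(pixels) / len(pixels)
    return [1 if p > avg else 0 for p in pixels]

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def looks_like_logo(candidate, reference, max_distance=5):
    """Flag a page logo whose hash is within max_distance bits of a brand logo."""
    return hamming(average_hash(candidate), average_hash(reference)) <= max_distance
```

Comparing short bit strings is cheap, so a crawler could discard pages whose logos match no protected brand before running slower, more precise checks.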

14.
Data in the cloud is protected by various mechanisms to ensure security and user privacy. However, deceptive attacks such as phishing can obtain users' data and use it for malicious purposes. In spite of many technological advances, phishing still serves as the first step in a series of attacks. With technological advances, the availability of and access to phishing kits have improved drastically, making them an ideal tool for hackers to execute attacks. Phishing cases show the use of foreign characters to disguise the original Uniform Resource Locator (URL), typosquatting of popular domain names, the use of reserved characters for redirections, and multi-chain phishing. Such phishing URLs can be stored as part of a document and uploaded to the cloud, giving hackers an opening in cloud storage. Cloud servers are becoming a trusted tool for executing these attacks. Existing software for blacklisting phishing URLs lacks protection against multi-level phishing and expects security to be provided at the client end (browser). At the same time, the avalanche effect and immutability of blockchain make it a strong source of security. Considering these trends, a blockchain-based filtering implementation for preserving the integrity of user data stored in the cloud is proposed. The proposed Phish Block detects homographic phishing URLs with an accuracy of 91%, which supports the security of cloud storage.
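Homographic URLs replace letters with look-alike characters from other scripts (for example, a Cyrillic "а" inside an otherwise Latin domain). The sketch below is a simple heuristic for that pattern only, not Phish Block's method, and its blockchain layer is not shown. It assumes that the first word of a character's Unicode name is a usable proxy for its script, and it flags punycode ("xn--") labels and labels that mix scripts.

```python
import unicodedata
from urllib.parse import urlparse

def scripts_in(label):
    """Rough script of each letter, taken from the first word of its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_suspicious_host(url):
    """Flag punycode labels and labels that mix Unicode scripts (e.g. Latin + Cyrillic)."""
    host = urlparse(url).hostname or ""
    for label in host.split("."):
        if label.startswith("xn--"):
            return True
        if len(scripts_in(label)) > 1:
            return True
    return False
```

This heuristic misses whole-script confusables (a domain written entirely in look-alike Cyrillic letters) and can flag legitimate mixed-script labels, for example Japanese names combining kana and kanji, so it would only be one signal among several.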

15.
This article examines how attackers are likely to respond to the current move towards two-factor authentication as a defence against phishing scams, and describes an alternative approach, available today, that provides a longer-term solution. In recent months, newspaper and television reports have highlighted how highly organized criminal gangs are launching large-scale, carefully planned attacks against high-street banks and other services, both in the UK and overseas. These so-called 'phishing' attacks begin with an email that appears to come from the bank and leads the recipient to a convincing web page, where they are tricked into entering their username and password.
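One widely used second factor is a time-based one-time password (TOTP), standardised in RFC 6238. The sketch below implements the HMAC-SHA1 variant with the Python standard library; the default 30-second step and 6 digits are the usual defaults, not details taken from this article. Because the code stays valid for its whole time step, a phishing page that relays it to the real bank immediately can still succeed, one reason two-factor authentication alone is not a complete answer.

```python
import hashlib
import hmac
import struct
import time

def totp(secret, for_time=None, step=30, digits=6):
    """RFC 6238 time-based one-time password (HMAC-SHA1 variant).

    secret: shared key as bytes; for_time: Unix time in seconds (defaults to now).
    """
    counter = int((time.time() if for_time is None else for_time) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the RFC test key `b"12345678901234567890"` and time 59, the 8-digit code is `"94287082"`.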

16.
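Various rule-based classification methods currently perform well in email filtering. When a mail filter is trained, some emails in the training set have ambiguous categories, and extracting these emails with blurred class boundaries from the training set can noticeably improve classification. A clustering-based filtering method is proposed that clusters the email training set according to the common features shared by these ambiguous emails. Experiments show that, compared with purely rule-based classification algorithms, this method performs better on all evaluation metrics.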
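One way to turn "clustering the training set to find ambiguous emails" into code is to cluster the emails (with any algorithm, e.g. k-means) and then flag clusters whose members carry mixed spam/ham labels. The sketch below covers only that second step; the purity criterion and the `min_purity=0.8` threshold are assumptions for illustration, not the paper's procedure.

```python
from collections import Counter, defaultdict

def ambiguous_samples(cluster_ids, labels, min_purity=0.8):
    """Return indices of samples in clusters whose label purity is below min_purity.

    cluster_ids[i] is the cluster assigned to sample i by some clustering
    algorithm; labels[i] is its spam/ham label.
    """
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)
    flagged = []
    for idx in members.values():
        counts = Counter(labels[i] for i in idx)
        purity = counts.most_common(1)[0][1] / len(idx)  # share of the majority label
        if purity < min_purity:
            flagged.extend(idx)
    return sorted(flagged)
```

The flagged emails could then be handled separately (for example, excluded from rule induction or given their own rules), which is the kind of treatment the abstract credits for the improved metrics.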

17.
18.
Zhang Hongpo, Cheng Ning, Zhang Yang, Li Zhanbo. Applied Intelligence, 2021, 51(7): 4503-4514

A label flipping attack is a poisoning attack that flips the labels of training samples to reduce the classification performance of the model. Robustness measures how well a machine learning algorithm holds up under adversarial attack. The naive Bayes (NB) algorithm is a noise-tolerant and robust machine learning technique that shows good robustness in tasks such as document classification and spam filtering. Here we propose two novel label flipping attacks to evaluate the robustness of NB under label noise. On three spam classification datasets, Spambase, TREC 2006c, and TREC 2007, our attack goal is to increase the false negative rate of NB under label noise without affecting the classification of normal mail. Our evaluation shows that at a noise level of 20%, the false negative rate on Spambase and TREC 2006c increases by about 20%, and the test error on TREC 2007 rises to nearly 30%. We compared the classification accuracy of five classic machine learning algorithms (random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), and NB) and two deep learning models (AlexNet, LeNet) under the proposed label flipping attacks. The experimental results show that the two types of label noise apply to a variety of classification models and effectively reduce their accuracy.
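To make the attack goal concrete: relabelling some spam training samples as ham teaches the classifier that spam-like content is legitimate, which raises the false negative rate while leaving genuine ham untouched. The sketch below is a plain random-flipping baseline, explicitly not either of the paper's two attacks (which are not described in the abstract); interpreting the noise level as a fraction of spam samples is also an assumption.

```python
import random

def flip_spam_labels(labels, rate, seed=0):
    """Relabel a fraction `rate` of spam (1) training samples as ham (0).

    Random baseline only; ham labels are never changed, matching the goal
    of raising false negatives without disturbing normal-mail classification.
    """
    rng = random.Random(seed)
    spam_idx = [i for i, y in enumerate(labels) if y == 1]
    to_flip = set(rng.sample(spam_idx, round(rate * len(spam_idx))))
    return [0 if i in to_flip else y for i, y in enumerate(labels)]
```

Training NB (or any of the compared models) on the flipped labels and evaluating on clean test data would then show how much the false negative rate rises at a given noise level.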


19.
In this paper, we propose a novel supervised dimension reduction algorithm based on the K-nearest neighbor (KNN) classifier. The proposed algorithm reduces the dimension of the data in order to improve the accuracy of KNN classification. This heuristic algorithm finds independent dimensions that decrease the Euclidean distance between a sample and its K nearest within-class neighbors and increase the Euclidean distance between that sample and its M nearest between-class neighbors. It is a linear dimension reduction algorithm that produces a mapping matrix for projecting data into a low-dimensional space, and the dimension reduction step is followed by a KNN classifier. Therefore, it is applicable to high-dimensional multiclass classification. Experiments on artificial data such as Helix and Twin-peaks show the algorithm's ability for data visualization. The algorithm is compared with state-of-the-art algorithms in classifying eight different multiclass data sets from the UCI collection, and the simulation results show that the proposed algorithm outperforms the existing algorithms. Visual place classification is an important problem for intelligent mobile robots that not only involves high-dimensional data but also requires solving a multiclass classification problem. A proper dimension reduction method is usually needed to reduce the computational and memory complexity of algorithms in large environments, so our method is well suited to this problem. We extract color histograms of omnidirectional camera images as primary features, reduce the features into a low-dimensional space, and apply a KNN classifier. Experiments on five real data sets showed the superiority of the proposed algorithm over the others.
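The pipeline described above (project with a linear mapping matrix, then classify with KNN) can be sketched as follows. Learning the mapping matrix is the paper's contribution and is not shown; here `W` is simply supplied by the caller, and the function names are illustrative.

```python
import math
from collections import Counter

def project(x, W):
    """Map x to the low-dimensional space: z = W^T x, with W given as d x r (list of rows)."""
    r = len(W[0])
    return [sum(x[i] * W[i][k] for i in range(len(x))) for k in range(r)]

def knn_predict(train_X, train_y, query, k=3, W=None):
    """Majority vote among the k nearest training samples (Euclidean distance).

    If a projection matrix W is supplied, distances are computed after projection.
    """
    if W is not None:
        train_X = [project(x, W) for x in train_X]
        query = project(query, W)
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

For example, if a third feature is pure noise with large values, unprojected KNN can be misled by it, while a `W` that drops that dimension lets the informative features decide the neighbours.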

20.
Wang Qingsong (王青松), Wei Ruyu (魏如玉). Computer Science (计算机科学), 2016, 43(4): 256-259, 269
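The naive Bayes algorithm is widely used in spam filtering, and feature extraction is an essential step in it. Previous Chinese spam-filtering methods extract features using words as the unit of text features; with large-scale email training samples, the time efficiency of this approach becomes a bottleneck in email filtering. To address this, a phrase-based Bayesian Chinese spam-filtering method is proposed. In the feature-extraction stage, it incorporates a new phrase-analysis method from the text classification field and extracts features in units of phrases according to rules for base noun phrases, base verb phrases, and basic semantic analysis. Comparative experiments filtering spam with words and with phrases as the feature unit confirm the effectiveness of the proposed method.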
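The classifier side of this method is standard multinomial naive Bayes; the contribution lies in what counts as a feature. The sketch below assumes each email has already been reduced to a list of phrase strings by a separate chunker (the base noun/verb phrase rules are not reproduced), and uses Laplace smoothing with log-probabilities; the function names are illustrative.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial naive Bayes over pre-extracted features (words or phrases)."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for feats, c in zip(docs, labels):
        counts[c].update(feats)
    vocab = {f for feats in docs for f in feats}
    return prior, counts, vocab

def classify_nb(model, feats):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    prior, counts, vocab = model
    best, best_score = None, -math.inf
    for c in prior:
        total = sum(counts[c].values())
        score = math.log(prior[c])
        for f in feats:
            if f in vocab:  # ignore features never seen in training
                score += math.log((counts[c][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Since a phrase covers several words, each email yields fewer feature tokens, which is consistent with the efficiency motivation given in the abstract.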
