Similar Articles (20 results)
1.
To counter the forged HTTPS sites and other obfuscation techniques commonly used by phishing attackers, this work draws on RMLR and PhishDef, two mainstream phishing-detection methods based on machine learning and rule matching, adds feature extraction from webpage text keywords and page sub-links, and proposes the Nmap-RF classification method. Nmap-RF is an ensemble phishing-website detection method that combines rule matching with random forests. Sites are first pre-filtered by page protocol; if a site is judged to be phishing at this stage, the remaining feature-extraction steps are skipped. Otherwise, text-keyword confidence, sub-link confidence, similarity to phishing vocabulary, and page PageRank serve as key features, with common URL, Whois, DNS, and HTML-tag information as auxiliary features, and a random forest model makes the final classification. Experiments show that Nmap-RF can check a phishing page in 9-10 μs on average, filters out 98.4% of illegitimate pages, and reaches an average overall precision of 99.6%.
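The two-stage flow described above (a rule-based pre-filter by page protocol first, with the random forest reached only by pages that pass it) can be sketched as follows; the concrete rules and the stub classifier are illustrative assumptions, not the paper's actual Nmap-RF rules:

```python
# Hedged sketch of the two-stage pre-filter + classifier flow. The rules
# below (bare-IP host, non-HTTPS) are invented toy cues for illustration.

def rule_prefilter(url: str) -> bool:
    """Flag a URL as phishing outright by simple protocol/host rules."""
    host = url.split("//")[-1].split("/")[0]
    uses_https = url.startswith("https://")
    ip_host = host.replace(".", "").isdigit()   # bare-IP host, a classic cue
    return ip_host or not uses_https

def classify(url: str, rf_predict) -> str:
    """Rules first; feature extraction and the forest are skipped on a hit."""
    if rule_prefilter(url):
        return "phishing"
    return "phishing" if rf_predict(url) else "legitimate"

print(classify("http://192.0.2.7/login", lambda u: False))   # caught by rules
print(classify("https://example.com/", lambda u: False))
```

In the real method the second stage would be a trained random forest over the key and auxiliary features; here a callable stands in for it.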

2.
To address the low detection accuracy and high false-positive rates of most current phishing-website detection methods, this paper proposes a detection method based on feature selection and ensemble learning. The method first selects features with the FSIGR algorithm, which combines the strengths of filter and wrapper models: it measures each feature by both information relevance and discriminative power, applies a forward-incremental, backward-recursive elimination strategy, and evaluates candidate subsets by classification accuracy to obtain the optimal feature subset. A random forest classifier is then trained on the optimal subset. Experiments on a UCI dataset show that the method effectively raises detection accuracy and lowers the false-positive rate, making it practical for real applications.
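FSIGR's information-relevance side is built on information gain. A minimal pure-Python computation of information gain for a discrete feature (the full FSIGR combination of filter and wrapper stages is not reproduced here) might look like:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# toy example: the feature perfectly predicts the label, so IG = H(Y) = 1 bit
print(information_gain([0, 0, 1, 1], ["legit", "legit", "phish", "phish"]))
```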

3.
Phishing attacks continue to pose serious risks for consumers and businesses, as well as threatening global security and the economy. Developing countermeasures against such attacks is therefore an important step towards defending critical infrastructures such as banking. Although different types of classification algorithms for filtering phishing have been proposed in the literature, the scale and sophistication of phishing attacks have continued to increase steadily. In this paper, we propose a new approach, a multi-tier classification model for phishing email filtering. We also propose an innovative method for extracting the features of phishing email based on weighting of message content and message header, selecting features according to priority ranking. We further examine the impact of rescheduling the classifier algorithms in a multi-tier classification process to find the optimum scheduling. A detailed empirical performance analysis of the proposed algorithm is presented. The results of the experiments show that the proposed algorithm substantially reduces false positives with lower complexity.

4.
We propose a novel classification model that consists of features of website URLs and content for automatically detecting Chinese phishing e-Business websites. The model incorporates several unique domain-specific features of Chinese e-Business websites. We evaluated the proposed model using four different classification algorithms and approximately 3,000 Chinese e-Business websites. The results show that the Sequential Minimal Optimization (SMO) algorithm performs the best. The proposed model outperforms two baseline models in detection precision, recall, and F-measure. The results of a sensitivity analysis demonstrate that domain-specific features have the most significant impact on the detection of Chinese phishing e-Business websites.

5.
Website phishing is considered one of the crucial security challenges for the online community due to the massive number of online transactions performed on a daily basis. Website phishing can be described as mimicking a trusted website to obtain sensitive information from online users, such as usernames and passwords. Blacklists, whitelists, and search-based methods are examples of solutions that minimise the risk of this problem. One intelligent data mining approach, Associative Classification (AC), is a potential solution that may detect phishing websites with high accuracy. According to experimental studies, AC often extracts classifiers containing simple "If-Then" rules with a high degree of predictive accuracy. In this paper, we investigate the problem of website phishing using a developed AC method called Multi-label Classifier based Associative Classification (MCAC) to assess its applicability to the phishing problem. We also identify features that distinguish phishing websites from legitimate ones, and survey intelligent approaches used to handle the phishing problem. Experimental results using real data collected from different sources show that AC, particularly MCAC, detects phishing websites with higher accuracy than other intelligent algorithms. Further, MCAC generates new hidden knowledge (rules) that other algorithms are unable to find, which improves its classifiers' predictive performance.
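As a minimal illustration of how an associative classifier of this kind applies its "If-Then" rules, in priority order with a default class for unmatched cases; the rules below are invented for illustration, not MCAC's mined output:

```python
# Illustrative ordered rule list: (condition on a feature dict, predicted class).
# The first matching rule fires; a default class covers unmatched cases.

RULES = [
    (lambda f: f["has_ip_host"] and not f["uses_https"], "phishing"),
    (lambda f: f["url_length"] > 75, "phishing"),
    (lambda f: f["domain_age_days"] > 365, "legitimate"),
]
DEFAULT = "legitimate"

def predict(features: dict) -> str:
    """Apply rules in priority order; fall back to the default class."""
    for condition, label in RULES:
        if condition(features):
            return label
    return DEFAULT

sample = {"has_ip_host": True, "uses_https": False,
          "url_length": 30, "domain_age_days": 10}
print(predict(sample))  # phishing
```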

6.
Shilling-attack detection is, in essence, a classification of genuine versus fake users, and existing detection algorithms are not robust against attack models with selected items, such as bandwagon and segment attacks. To address this problem, by analysing the rating distributions of genuine and fake users and combining them with an ID3 decision tree, a shilling-attack detection algorithm based on user-rating dispersion, Dispersion-C, is proposed. The algorithm measures rating dispersion with three features: the extreme-rating ratio, the variance of ratings with extremes removed, and the standard deviation of the user's ratings. These serve as classification features for the ID3 decision tree, which selects split attributes by information gain to train the classifier. Experimental results show that Dispersion-C detects all the considered shilling-attack types well and has good robustness.
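The three dispersion features named above can be computed directly; a sketch assuming a 1-5 rating scale (the rating scale and the sample profiles are assumptions):

```python
import statistics

def dispersion_features(ratings, r_min=1, r_max=5):
    """Return the three rating-dispersion features used as tree inputs:
    extreme-rating ratio, variance with extremes removed, std deviation."""
    non_extreme = [r for r in ratings if r not in (r_min, r_max)]
    extreme_ratio = 1 - len(non_extreme) / len(ratings)
    trimmed_var = statistics.pvariance(non_extreme) if len(non_extreme) > 1 else 0.0
    std_dev = statistics.pstdev(ratings)
    return extreme_ratio, trimmed_var, std_dev

# a shilling profile tends to rate with extremes only; a genuine user's
# ratings are more evenly spread
print(dispersion_features([5, 5, 5, 1, 5]))
print(dispersion_features([2, 3, 4, 3]))
```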

7.
8.
In this paper, we present a new rule-based method to detect phishing attacks in internet banking. Our rule-based method uses two novel feature sets proposed to determine webpage identity: four features that evaluate the identity of page resources, and four features that identify the access protocol of page resource elements. In the first feature set, we use approximate string matching algorithms to determine the relationship between the content and the URL of a page. Our proposed features are independent of third-party services such as search-engine results or web-browser history. We employ a support vector machine (SVM) algorithm to classify webpages. Our experiments indicate that the proposed model can detect phishing pages in internet banking with a 99.14% true-positive rate and only a 0.86% false-negative rate. A sensitivity analysis demonstrates the significant impact of our proposed features over traditional features. We extracted the hidden knowledge from the proposed SVM model by adopting a related method and embedded the extracted rules into a browser extension named PhishDetector to make the method more functional and easy to use. Evaluation of the implemented browser extension indicates that it can detect phishing attacks in internet banking with high accuracy and reliability, including zero-day phishing attacks.
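One of the ideas above, relating a page's claimed identity to its URL via approximate string matching, can be sketched with Python's `difflib`; the matching algorithm and the sample inputs are stand-in assumptions, not the paper's exact features:

```python
from difflib import SequenceMatcher

def identity_similarity(page_title: str, url_host: str) -> float:
    """Approximate-string-match score between a page's claimed identity
    (e.g. a title keyword) and its actual host name."""
    return SequenceMatcher(None, page_title.lower(), url_host.lower()).ratio()

# a legitimate host tends to match the claimed identity far better than
# a phishing host does
legit = identity_similarity("MyBank Online", "mybank.com")
phish = identity_similarity("MyBank Online", "secure-login-verify.example")
print(legit > phish)
```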

9.
Current research on phishing-website identification places ever higher demands on identification speed, so a fast phishing-website identification method based on a hybrid feature-selection model is proposed. The model has three main parts: primary feature selection, secondary feature selection, and classification. It combines information gain with the chi-square test and a random-forest-based recursive feature elimination algorithm, and uses a distribution function and its gradient to find the best cut-off threshold and obtain the optimal dataset, improving identification efficiency. Experiments show that after feature screening with this model, dataset dimensionality drops by 79.2% and classification time complexity falls by 32% with almost no loss in precision. Evaluated on a large phishing dataset from the UCI machine learning repository, the model loses 1.7% in precision, but dimensionality falls by 70% and classification time complexity by 41.1%, effectively improving classification efficiency.
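Of the filter scores combined by the hybrid model, the chi-square statistic for a binary feature against a binary class can be computed from a 2x2 contingency table; this is a sketch of that one score only (the combination with information gain and RF-based recursive elimination is omitted):

```python
def chi_square(feature, labels):
    """Pearson chi-square statistic for a binary feature vs. a binary class."""
    # observed 2x2 contingency counts
    obs = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, labels):
        obs[(f, y)] += 1
    n = len(labels)
    chi2 = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row = sum(obs[(f, c)] for c in (0, 1))
            col = sum(obs[(r, y)] for r in (0, 1))
            expected = row * col / n
            if expected:
                chi2 += (obs[(f, y)] - expected) ** 2 / expected
    return chi2

# a feature independent of the class scores 0; a perfectly dependent one
# scores n (here 4)
print(chi_square([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
print(chi_square([0, 0, 1, 1], [0, 0, 1, 1]))  # 4.0
```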

10.
Distinguishing phishing from legitimate websites is a great challenge for web service providers because the users of such websites are indistinguishable. Phishing websites also create traffic across the entire network and broaden the spread of malware through it, highlighting the demand for detection methods that can process massive datasets (i.e., big data). Although boosting mechanisms have been applied to phishing detection, these methods are prone to significant output errors, specifically due to combining all website features in the training stage. Big data systems rely on MapReduce, a popular parallel programming model, to process massive datasets. To address these issues, a probabilistic latent semantic and greedy Levy gradient boosting (PLS-GLGB) algorithm for website phishing detection using MapReduce is proposed. A feature-selection-based model using probabilistic intersective latent semantic preprocessing minimizes errors in website phishing detection: missing data in each URL are identified and discarded before further processing to ensure data quality. With the preprocessed features (URLs), feature vectors are then updated by the greedy Levy divergence gradient model, which selects the optimal features in each URL and accurately detects the websites, efficiently differentiating phishing from legitimate sites. Experiments are conducted using one of the largest public corpora of phishing websites, the PhishTank dataset. Results show that PLS-GLGB outperforms state-of-the-art phishing detection methods while substantially reducing detection time and errors.

11.
Phishing attacks are security attacks that affect not only individuals' or organizations' websites but also Internet of Things (IoT) devices and networks; the IoT environment is particularly exposed to such attacks. Attackers may use thingbot software to disperse hidden junk emails that users do not notice. Machine learning, deep learning, and other methods have been used to design detectors for these attacks, but detection accuracy still needs improvement. This study proposes an optimized ensemble classification method for phishing website (PW) detection. A Genetic Algorithm (GA) was used to optimize the method by tuning the parameters of several ensemble Machine Learning (ML) methods, including Random Forest (RF), AdaBoost (AB), XGBoost (XGB), Bagging (BA), GradientBoost (GB), and LightGBM (LGBM), and by ranking the optimized classifiers to pick the best ones as the base of the proposed method. A PW dataset of 4,898 PWs and 6,157 legitimate websites (LWs) was used for the experiments. As a result, detection accuracy was enhanced, reaching 97.16%.
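The GA-driven hyperparameter search can be sketched with a toy fitness surface standing in for cross-validated accuracy; in the real method each candidate would train the ensemble (RF, XGBoost, etc.), so everything below, including the parameter ranges and fitness shape, is an illustrative assumption:

```python
import random

random.seed(0)

def fitness(params):
    """Toy stand-in for cross-validated accuracy, peaking at an assumed
    optimum of (n_trees=120, depth=8)."""
    n_trees, depth = params
    return -((n_trees - 120) ** 2) / 1e4 - ((depth - 8) ** 2) / 1e2

def genetic_search(pop_size=20, generations=30):
    """Elitist GA: keep the top half, refill with crossover + mutation."""
    pop = [(random.randint(10, 300), random.randint(1, 20))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a[0], b[1])                       # one-point crossover
            if random.random() < 0.3:                  # mutation
                child = (child[0] + random.randint(-10, 10),
                         max(1, child[1] + random.randint(-2, 2)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_search()
print(best)
```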

12.
Phishing is an emerging class of attack and fraud carried out mainly over the Internet, and it has been rising year by year, inflicting financial and economic losses on users and financial institutions. Identifying phishing-related Internet risks promptly and effectively, and containing the impact of phishing attacks, has become an urgent problem for financial institutions. Major banks, securities firms, and security companies have therefore launched their own anti-phishing monitoring services. Current anti-phishing techniques generally use crawlers to actively search the Internet at scale for counterfeit sites, collecting large numbers of suspicious pages and then testing each one to decide whether it is a phishing site. Faced with this mass of suspicious sites, testing them quickly and efficiently is itself a hard problem. This paper presents a new approach based on image recognition that detects website logos, used to filter the mass of suspicious phishing sites and speed up phishing-site detection.

13.
A Phishing Detection System Based on an SVM Active Learning Algorithm
To counter phishing attacks, the system starts from the URL, identifying and classifying combined features of the URL and the web page content to detect phishing while maintaining detection efficiency and accuracy. A support vector machine active-learning algorithm and a classification model suited to small sample sets are used to improve classification performance. Experimental results show that the phishing detection system achieves high detection accuracy.

14.
In this paper, we employ a novel two-stage soft computing approach for data imputation to assess the severity of phishing attacks. The imputation method involves the K-means algorithm and a multilayer perceptron (MLP) working in tandem. The hybrid is applied to replace missing values in financial data used for predicting the severity of phishing attacks on financial firms. After imputing the missing values, we mine the firms' financial data, along with a structured form of the textual data, using a multilayer perceptron (MLP), a probabilistic neural network (PNN), and decision trees (DT) separately. Of particular significance are the overall classification accuracies of 81.80%, 82.58%, and 82.19% obtained using MLP, PNN, and DT respectively. These results outperform those of prior research, and the overall classification accuracies for the three risk levels of phishing attacks are also superior for all three classifiers.
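The K-means half of the imputation stage can be sketched as follows: complete rows are clustered, and a record's missing field is filled with that field's mean in the nearest cluster, measured on the observed fields only. The MLP refinement of the paper's tandem scheme is omitted, and all data are invented:

```python
def kmeans(rows, k=2, iters=10):
    """Plain K-means on complete rows (Euclidean, fixed iteration count)."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(r, centroids[i])))
            clusters[nearest].append(r)
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def impute(row, missing_idx, centroids, clusters):
    """Fill row[missing_idx] with that field's mean in the nearest cluster."""
    def dist(point):
        return sum((a - b) ** 2
                   for i, (a, b) in enumerate(zip(row, point))
                   if i != missing_idx)
    nearest = min(range(len(centroids)), key=lambda i: dist(centroids[i]))
    members = clusters[nearest]
    return sum(r[missing_idx] for r in members) / len(members)

complete = [[1.0, 10.0], [1.2, 11.0], [8.0, 100.0], [8.2, 98.0]]
centroids, clusters = kmeans(complete, k=2)
print(impute([1.1, None], 1, centroids, clusters))
```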

15.
Raj, Chahat; Meel, Priyanka. Applied Intelligence, 2021, 51(11): 8132-8148

An upsurge of false information circulates on the internet: social media and websites are flooded with unverified news posts comprising text, images, audio, and video. A system is therefore needed that detects fake content across multiple data modalities. There has been considerable research on classification techniques for textual fake news detection, while frameworks dedicated to visual fake news detection remain few. We explored state-of-the-art methods using deep networks such as CNNs and RNNs for multi-modal online information credibility analysis; they show rapid improvement in classification tasks without requiring pre-processing. To aid ongoing research on fake news detection using CNN models, we build textual and visual modules and analyze their performance on multi-modal datasets. We exploit latent features inside text and images using layers of convolutions, examine how well these convolutional neural networks classify when provided with only latent features, and analyze what types of images need to be fed in for efficient fake news detection. We propose a multi-modal Coupled ConvNet architecture that fuses both data modules and efficiently classifies online news based on its textual and visual content. We then offer a comparative analysis of the results of all models over three datasets. The proposed architecture outperforms various state-of-the-art methods for fake news detection with considerably high accuracies.

16.
The ability to automatically detect fraudulent escrow websites is important for alleviating online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, making systems for detecting such websites an important endeavor. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machine ensemble classifier enabled accuracies of over 90% and 96% for page-level and site-level classification, respectively, when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary due to the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods for detecting fake escrow websites. The results may also be useful for informing existing online escrow fraud resources and communities of practice about the plethora of fraud cues pervasive in fake websites.

17.

Feature subset selection (FSS) plays an essential role in data mining, particularly in high-dimensional medical data analysis, by supplying early detection with essential features and high accuracy. Modern feature selection models use optimization algorithms to extract features with particular properties and reach the highest possible accuracy. Many optimization algorithms, such as the genetic algorithm, have parameters that must be tuned for good results, and tuning these parameter values is a difficult challenge for the feature selection procedure. A method independent of controlling parameters is therefore desirable. Binary teaching-learning-based optimization (BTLBO) is a sophisticated wrapper-based meta-heuristic that involves no algorithm-specific parameters, requiring only standard process parameters such as population size and number of iterations to extract a selected feature set from data. This paper introduces a new modified binary teaching-learning-based optimization (NMBTLBO) to select feature subsets, using the binary classification accuracy of a support vector machine (SVM) as the fitness function. NMBTLBO contains two changes: a new updating procedure, and a new way of selecting the primary teacher in the teacher phase. The proposed technique was used to classify rheumatic disease datasets collected from the Baghdad Teaching Hospital Outpatient Rheumatology Clinic during 2016-2018.
Compared with the original BTLBO algorithm, the improved NMBTLBO achieves a clear gain in accuracy. Validation tested the accuracy of four classification methods: K-nearest neighbors, decision trees, support vector machines, and K-means. The classification accuracy of all four methods increased with NMBTLBO feature selection compared to BTLBO: the SVM classifier reached 89% with BTLBO-SVM and 95% with NMBTLBO-SVM, while decision trees reached 94% with BTLBO-SVM features and 95% with NMBTLBO-SVM features. The analysis indicates that the new method (NMBTLBO) enhances classification accuracy.
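One teacher-phase step of a binary TLBO, with bit-vectors as feature-selection masks, might be sketched as follows; the toy fitness stands in for SVM accuracy, and the NMBTLBO modifications to teacher selection are not reproduced:

```python
import random

random.seed(1)

def teacher_phase(population, fitness):
    """One teacher-phase step of a binary TLBO sketch: each learner copies
    bits from the teacher (current best mask) with probability 0.5, and the
    move is kept only if fitness does not decrease (greedy acceptance)."""
    teacher = max(population, key=fitness)
    new_pop = []
    for learner in population:
        candidate = [t if random.random() < 0.5 else b
                     for b, t in zip(learner, teacher)]
        new_pop.append(candidate if fitness(candidate) >= fitness(learner)
                       else learner)
    return new_pop

def fitness(bits):
    # toy stand-in for SVM accuracy: features 0 and 2 are informative,
    # and every selected feature carries a small cost
    return 2 * bits[0] + 2 * bits[2] - sum(bits)

pop = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
for _ in range(20):
    pop = teacher_phase(pop, fitness)
best = max(pop, key=fitness)
print(best, fitness(best))
```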


18.
A Decision Tree That Determines Condition Attributes by Correlation
韩家新, 王家华. 微机发展, 2003, 13(5): 38-39, 42
Decision trees are an important class of classifiers in data mining. After reviewing several classic decision-tree classification algorithms, this paper studies a decision-tree classifier based on a correlation measure. Its main idea is to use an attribute-correlation measure during tree construction to determine the order in which condition attributes split the data; threshold settings and processing simplify the pruning and optimization of the tree and avoid the poor splits that information entropy can produce. The paper describes the execution of the algorithm in detail, along with a proof of correctness and a time-complexity analysis.
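A minimal example of ordering condition attributes by a correlation measure instead of information entropy; the paper's specific measure and thresholds are not reproduced, and Pearson correlation stands in here:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither column is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_attributes(data, labels):
    """Order condition attributes by |correlation| with the class label,
    giving the order in which the tree would pick its split attributes."""
    scores = [abs(pearson(list(col), labels)) for col in zip(*data)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

data = [[1, 5], [2, 3], [3, 8], [4, 1]]   # attribute 0 tracks the label
labels = [0, 0, 1, 1]
print(rank_attributes(data, labels))  # [0, 1]
```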

19.
The rapid development of information technology has produced a large accumulation of text data, much of it short text. Text classification is important for automatically extracting knowledge from these massive short texts. However, because keywords occur only rarely in short texts and labelled training samples are usually scarce, general text-mining algorithms struggle to reach acceptable accuracy on them. Some semantics-based classification methods achieve better accuracy but are too inefficient for massive data. This paper proposes a novel short-text classification algorithm based on a semantic feature graph of the text, with classification performed by a kNN-like method. Experiments show that the algorithm exceeds other algorithms in both accuracy and performance when classifying massive short texts.
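The kNN-like classification step can be sketched with cosine similarity over bag-of-words vectors; the paper's semantic feature graph is replaced by plain term counts purely for illustration, and the tiny training set is invented:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(text, labelled, k=3):
    """Majority vote among the k most similar labelled short texts."""
    vec = Counter(text.split())
    neighbours = sorted(labelled,
                        key=lambda item: cosine(vec, Counter(item[0].split())),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [("cheap pills buy now", "spam"),
         ("buy cheap meds online", "spam"),
         ("meeting agenda for monday", "ham"),
         ("monday project meeting notes", "ham")]
print(knn_classify("buy pills online now", train, k=3))  # spam
```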

20.
One of the fastest-growing research areas in the information technology industry is big data analytics. Big data is created by social websites such as Facebook, WhatsApp, and Twitter, where opinions about products, people, initiatives, political issues, research achievements, and entertainment are discussed. A single data analytics method cannot be applied across social websites, since their data formats differ. Several approaches, techniques, and tools have been used for big data analytics, opinion mining, and sentiment analysis, but accuracy can still be improved. The proposed work performs sentiment analysis on Twitter data about cloth products using Simulated Annealing incorporated with a Multiclass Support Vector Machine (SA-MSVM). SA-MSVM is a hybrid heuristic approach for selecting and classifying text-based sentiment words, following a Natural Language Processing (NLP) pipeline applied to tweets extracted from the Twitter dataset. The simulated annealing algorithm searches for relevant features and selects and identifies the sentiment terms that customers use in criticism. SA-MSVM was implemented and evaluated in MATLAB, and the results were verified. The results show that SA-MSVM has more potential for sentiment analysis and classification than the existing Support Vector Machine (SVM) approach, obtaining 96.34% accuracy in classifying product reviews compared with existing systems.
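The simulated-annealing feature-search step can be sketched over a binary feature mask with a toy score standing in for classifier accuracy; the cooling schedule, score, and feature count below are all illustrative assumptions, not SA-MSVM's actual configuration:

```python
import math
import random

random.seed(42)

def simulated_annealing(score, n_features, steps=200, t0=1.0):
    """Flip one feature bit per step; accept worse subsets with a
    temperature-decaying probability, tracking the best subset seen."""
    current = [random.randint(0, 1) for _ in range(n_features)]
    best = current[:]
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9        # linear cooling schedule
        cand = current[:]
        cand[random.randrange(n_features)] ^= 1   # flip one bit
        delta = score(cand) - score(current)
        if delta >= 0 or random.random() < math.exp(delta / t):
            current = cand
            if score(current) > score(best):
                best = current[:]
    return best

# toy score standing in for classifier accuracy: features 0 and 3 are
# informative, and every selected feature carries a small cost
score = lambda bits: 3 * bits[0] + 3 * bits[3] - sum(bits)
best = simulated_annealing(score, n_features=6)
print(best, score(best))
```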


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号