
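Phishing Website Detection Method Based on Integrated Learning
Cite this article: YU Enze, Nurbol, YU Qing. Phishing Website Detection Method Based on Integrated Learning[J]. Computer Engineering and Applications, 2019, 55(18): 81-88. DOI: 10.3778/j.issn.1002-8331.1812-0362
Authors: YU Enze, Nurbol, YU Qing
Affiliation (all three authors): College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract: To counter the forged HTTPS websites and other obfuscation techniques commonly used by phishing attackers, this paper draws on RMLR and PhishDef, two current mainstream phishing-detection methods based on machine learning and rule matching, adds a feature-extraction step covering page text keywords, page sub-links, and similar information, and proposes the Nmap-RF classification method. Nmap-RF is an ensemble phishing website detection method that combines rule matching with a random forest. A website is first pre-filtered according to its web protocol; if it is judged to be a phishing site, the subsequent feature-extraction steps are skipped. Otherwise, text keyword confidence, page sub-link confidence, similarity to phishing vocabulary, and page PageRank serve as key features, while common URL, Whois, DNS, and page tag information serve as auxiliary features, and a random forest classification model produces the final classification. Experiments show that the Nmap-RF ensemble method detects phishing pages in 9~10 μs on average, filters out 98.4% of illegitimate pages, and reaches an average overall accuracy of 99.6%.

Keywords: phishing web pages; ensemble learning; rule matching; phishing page obfuscation techniques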

Phishing Website Detection Method Based on Integrated Learning
YU Enze, Nurbol, YU Qing. Phishing Website Detection Method Based on Integrated Learning[J]. Computer Engineering and Applications, 2019, 55(18): 81-88. DOI: 10.3778/j.issn.1002-8331.1812-0362
Authors: YU Enze, Nurbol, YU Qing
Affiliation: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract: To counter the fake HTTPS websites and other obfuscation techniques commonly used by phishing attackers, this paper draws on RMLR and PhishDef, two current mainstream phishing-detection methods based on machine learning and rule matching, and adds feature extraction for web page text keywords and web page sub-links. It proposes the Nmap-RF classification method, an ensemble phishing website detection method that combines rule matching with a random forest. A website is first pre-filtered according to its web protocol; if it is determined to be a phishing website, the subsequent feature-extraction steps are skipped. Otherwise, text keyword confidence, page sub-link confidence, phishing vocabulary similarity, and page PageRank are taken as key features, while common URL, Whois, DNS, and web page tag information are used as auxiliary features, and a random forest classification model gives the final classification. Experiments show that the Nmap-RF ensemble method detects phishing pages in 9~10 μs on average, filters out 98.4% of illegitimate pages, and reaches an average overall accuracy of 99.6%.
Keywords: phishing websites; ensemble learning; rule matching; phishing obfuscation techniques
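
The two-stage flow described in the abstract (a rule-based pre-filter that short-circuits feature extraction, followed by a classifier on the remaining pages) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the protocol rule, so `toy_protocol_rule` and its `cert_valid` field are hypothetical; the feature values are assumed to be precomputed scores in [0, 1] where higher means more suspicious (lower for PageRank); and `toy_vote_model` is a hand-written majority vote of threshold stumps standing in for a trained random forest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Key features named in the abstract; the dictionary keys are hypothetical.
KEY_FEATURES = (
    "keyword_confidence",
    "sublink_confidence",
    "phish_vocab_similarity",
    "pagerank",
)


@dataclass(frozen=True)
class Verdict:
    is_phishing: bool
    stage: str  # "rule" if the pre-filter decided, "model" otherwise


def toy_protocol_rule(page: Dict) -> bool:
    """Hypothetical pre-filter: page claims HTTPS but its certificate failed validation."""
    return page["url"].startswith("https://") and not page.get("cert_valid", True)


def extract_features(page: Dict) -> List[float]:
    """Placeholder extractor: assumes the scores are already present in the page record."""
    return [float(page[name]) for name in KEY_FEATURES]


def toy_vote_model(features: List[float]) -> bool:
    """Stand-in for a trained random forest: majority vote of four threshold stumps."""
    votes = [
        features[0] > 0.5,  # many phishing-style keywords
        features[1] > 0.5,  # suspicious sub-links
        features[2] > 0.5,  # close to phishing vocabulary
        features[3] < 0.2,  # very low PageRank
    ]
    return sum(votes) > len(votes) / 2


def classify(
    page: Dict,
    rule: Callable[[Dict], bool] = toy_protocol_rule,
    extractor: Callable[[Dict], List[float]] = extract_features,
    model: Callable[[List[float]], bool] = toy_vote_model,
) -> Verdict:
    # Stage 1: rule matching. A hit ends the pipeline before any feature extraction.
    if rule(page):
        return Verdict(True, "rule")
    # Stage 2: extract features and let the classifier decide.
    return Verdict(model(extractor(page)), "model")


if __name__ == "__main__":
    flagged = {"url": "https://login.example", "cert_valid": False}
    print(classify(flagged))
```

Because the rule stage returns before `extract_features` is called, the flagged page above needs no feature values at all, which mirrors the abstract's point that feature extraction is skipped for pre-filtered sites. In a fuller version, a trained random forest (for example from a machine learning library) would replace `toy_vote_model`, and the auxiliary URL, Whois, DNS, and tag features would be appended to the feature vector.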
This article is indexed in Wanfang Data and other databases.