
Malicious Webpage Detection Method for Webpage Content Link Hierarchy Semantic Tree
Citation: CHEN Bengang, SONG Lipeng. Malicious Webpage Detection Method for Webpage Content Link Hierarchy Semantic Tree[J]. Computer Engineering and Applications, 2020, 56(11): 90-97.
Authors: CHEN Bengang, SONG Lipeng
Affiliation: Research Institute of Big Data and Network Security, School of Big Data, North University of China, Taiyuan 030051, China
Funding: National Natural Science Foundation of China; 14th Graduate Science and Technology Project of North University of China
Abstract: Malicious webpage detection that relies only on URL features fails when attackers hide links behind URL shortening services, and detection is further complicated by the severe imbalance between malicious and benign webpages. To address both problems, this paper proposes a cost-sensitive learning method for malicious webpage detection that incorporates features from a hierarchical semantic tree of webpage content. The method constructs a link hierarchy semantic tree from the webpage content and extracts features from it, which addresses the feature invalidation caused by URL shortening services; it then builds a cost-sensitive detection model to handle the class imbalance. Experimental results show that, compared with existing methods, the proposed method not only copes with URL shortening services but also achieves a low false negative rate of 2.1% and a low false positive rate of 3.3% on the class-imbalanced detection task. In addition, on a dataset of 250,000 unlabeled samples, the method improves recall by 38.2% over the anti-virus tool VirusTotal.
Keywords: malicious webpage detection; URL shortening service; link hierarchy semantic tree; cost-sensitive learning
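
The abstract does not say which classifier or cost values the authors use, so the following is only a minimal sketch of the general idea behind cost-sensitive detection, not the paper's model. It assumes a classifier has already produced a probability that each page is malicious, and it applies the standard minimum-expected-cost decision rule: flag a page when the expected cost of missing it exceeds the expected cost of a false alarm. The function names, the cost values `cost_fn` and `cost_fp`, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def cost_sensitive_predict(p_malicious: Sequence[float],
                           cost_fn: float = 10.0,
                           cost_fp: float = 1.0) -> List[int]:
    """Label a page malicious (1) when p * cost_fn > (1 - p) * cost_fp.

    The inequality is equivalent to p > cost_fp / (cost_fp + cost_fn).
    cost_fn and cost_fp are illustrative values, not taken from the paper.
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if p > threshold else 0 for p in p_malicious]


def error_rates(y_true: Sequence[int],
                y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate); 1 means malicious."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    fnr = fn / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fnr, fpr


if __name__ == "__main__":
    # Toy, imbalanced data: 2 malicious pages among 10 (made up for illustration).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    probs = [0.80, 0.30, 0.05, 0.02, 0.20, 0.01, 0.03, 0.04, 0.06, 0.08]

    equal_cost = cost_sensitive_predict(probs, cost_fn=1.0, cost_fp=1.0)
    high_miss_cost = cost_sensitive_predict(probs, cost_fn=10.0, cost_fp=1.0)

    print("equal costs (FNR, FPR):", error_rates(y_true, equal_cost))
    print("10:1 costs  (FNR, FPR):", error_rates(y_true, high_miss_cost))
```

On this toy data, equal costs miss one of the two malicious pages, while weighting a miss ten times more heavily than a false alarm catches both at the price of one false alarm. That is the trade-off a cost-sensitive model makes when malicious pages are rare.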
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.