首页 | 本学科首页   官方微博 | 高级检索  
     


Feature evaluation for web crawler detection with data mining techniques
Authors:Dusan Stevanovic  Aijun An  Natalija Vlajic
Affiliation:1. Stollery Children''s Hospital, Canada;2. University of Alberta, 2C3.79 WMC, 8440 – 112 St. NW, Edmonton, AB, T6G 2B7, Canada;1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210023, China;2. Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai 200240, China;3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Abstract:Distributed Denial of Service (DDoS) is one of the most damaging attacks on the Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study we examine the effect of applying seven well-established data mining classification algorithms on static web server access logs in order to: (1) classify user sessions as belonging to either automated web crawlers or human visitors and (2) identify which of the automated web crawlers sessions exhibit ‘malicious’ behavior and are potentially participants in a DDoS attack. The classification performance is evaluated in terms of classification accuracy, recall, precision and F1 score. Seven out of nine vector (i.e. web-session) features employed in our work are borrowed from earlier studies on classification of user sessions as belonging to web crawlers. However, we also introduce two novel web-session features: the consecutive sequential request ratio and standard deviation of page request depth. The effectiveness of the new features is evaluated in terms of the information gain and gain ratio metrics. The experimental results demonstrate the potential of the new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.
Keywords:
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号