SAAD, a content-based Web Spam Analyzer and Detector
Authors: Víctor M. Prieto, Manuel Álvarez, Fidel Cacheda
Affiliation: Communications and Information Technologies Department, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
Abstract: Web Spam is one of the main difficulties that crawlers have to overcome, and therefore one of the main problems of the WWW. There are several studies on characterising and detecting Web Spam pages; however, none of them deals with all the possible kinds of Web Spam. This paper analyses different kinds of Web Spam pages and identifies new elements that characterise them, in order to define heuristics that can partially detect them. We also discuss and explain several heuristics in terms of their effectiveness and computational efficiency. Taking these into account, we study several sets of heuristics and demonstrate how they improve on current results. Finally, we propose a new Web Spam detection system called SAAD (Spam Analyzer And Detector), which is based on the set of proposed heuristics and their use in a C4.5 classifier improved by means of Bagging and Boosting techniques. We have also tested our system on several well-known Web Spam datasets and found it to be very effective.
Keywords: Web characterization; Web Spam; Malware; Data mining; Statistical properties of Web Spam
This article is indexed in ScienceDirect and other databases.
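The abstract names the building blocks of SAAD (content heuristics used as features for a C4.5 classifier improved with Bagging and Boosting) without showing how they fit together. The Python sketch below illustrates that kind of pipeline under explicit assumptions: the four features in `content_features` are hypothetical examples of content-based signals, not the paper's heuristic set; one-level decision stumps stand in for full C4.5 trees; only bagging is shown (a boosting stage such as AdaBoost is omitted); and the toy pages `TOY_HAM` and `TOY_SPAM` are invented for illustration.

```python
import random
import re
import zlib
from collections import Counter

# Hypothetical content heuristics; NOT the exact SAAD feature set.
def content_features(html: str) -> list[float]:
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    words = re.findall(r"[a-z0-9]+", text.lower())
    n = len(words)
    raw = " ".join(words).encode("utf-8")
    # Repetitive (keyword-stuffed) text compresses unusually well.
    ratio = len(raw) / len(zlib.compress(raw)) if raw else 0.0
    # Share of all words taken by the single most frequent word.
    top_share = Counter(words).most_common(1)[0][1] / n if n else 0.0
    # Fraction of the raw HTML that is visible text.
    visible = len(text.strip()) / len(html) if html else 0.0
    return [float(n), ratio, top_share, visible]

# A decision stump: (feature index, threshold, spam_if_above).
# Assumption: a stand-in for a C4.5 tree, which is too long for a sketch.
Stump = tuple[int, float, bool]

def predict_stump(s: Stump, row: list[float]) -> int:
    f, t, above = s
    return int((row[f] > t) == above)  # 1 = spam, 0 = not spam

def fit_stump(X: list[list[float]], y: list[int]) -> Stump:
    # Exhaustively pick the feature/threshold/direction with fewest errors.
    best: Stump = (0, 0.0, True)
    best_err = len(y) + 1
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (True, False):
                err = sum(predict_stump((f, t, above), row) != label
                          for row, label in zip(X, y))
                if err < best_err:
                    best, best_err = (f, t, above), err
    return best

# Bagging: train each stump on a bootstrap sample, then take a majority vote.
def fit_bagged(X: list[list[float]], y: list[int],
               n_models: int = 25, seed: int = 0) -> list[Stump]:
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models: list[Stump], row: list[float]) -> int:
    votes = sum(predict_stump(s, row) for s in models)
    return int(2 * votes > len(models))

# Invented toy pages (hypothetical data, not from any Web Spam dataset).
TOY_HAM = [
    "<html><head><title>Weather</title></head><body><p>Rain is expected "
    "in the north tomorrow, with sunny spells later in the week.</p></body></html>",
    "<html><body><h1>Library hours</h1><p>The library opens at nine and "
    "closes at six on weekdays.</p></body></html>",
    "<html><body><p>Our team released version two of the parser with "
    "faster error reporting.</p></body></html>",
]
TOY_SPAM = [
    "<html><body>" + "cheap pills buy now " * 40 + "</body></html>",
    "<html><body>" + "casino bonus free casino " * 30 + "</body></html>",
    "<html><body><title>best loans</title>"
    + "loans loans cheap loans " * 35 + "</body></html>",
]

if __name__ == "__main__":
    X = [content_features(p) for p in TOY_HAM + TOY_SPAM]
    y = [0] * len(TOY_HAM) + [1] * len(TOY_SPAM)
    models = fit_bagged(X, y)
    page = "<html><body>" + "replica watches cheap " * 60 + "</body></html>"
    print("spam" if predict_bagged(models, content_features(page)) else "ham")
```

A fuller version would replace `fit_stump` with a C4.5-style tree learner and add a boosting stage; the feature extraction and the final vote over an ensemble would keep the same shape.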