An enhanced Web page change detection approach based on limiting similarity computations to elements of same type
Authors: Hassan Artail, Michel Abi-Aad
Affiliation: (1) Electrical and Computer Engineering, American University of Beirut, Bliss Street, P.O. Box 11-0236, Beirut, 1107 2020, Lebanon
Abstract: This paper describes an efficient Web page change detection approach that restricts the similarity computations between two versions of a given Web page to nodes with the same HTML tag type. Before the similarity computations are performed, the HTML page is transformed into an XML-like structure in which each node corresponds to a matched pair of opening and closing HTML tags. Analytical expressions and supporting experimental results quantify the improvements of the proposed approach over the traditional one, which computes similarities across all nodes of both pages. The improvements are shown to depend strongly on the diversity of tags in the page: the more diverse the page (i.e., the more it mixes text, images, links, etc.), the greater the improvements, and the more uniform the page, the smaller they are.
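The core idea in the abstract, grouping nodes by tag type and computing similarities only within a group, can be illustrated with a short Python sketch. This is not the authors' implementation. The flattening of each element into a (tag, direct text) pair, the use of difflib.SequenceMatcher's ratio as the similarity measure, and the names ElementCollector, parse_nodes and match_nodes are all hypothetical choices made for illustration, since the abstract does not specify the tree representation or the similarity metric.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag; treated as leaf nodes.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class ElementCollector(HTMLParser):
    """Hypothetical helper: flattens HTML into (tag, text) nodes, one per element."""

    def __init__(self):
        super().__init__()
        self.nodes = []  # finished (tag, text) nodes, in closing order
        self.stack = []  # open elements as [tag, [text fragments]]

    def _finish_top(self):
        tag, parts = self.stack.pop()
        self.nodes.append((tag, "".join(parts).strip()))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # No content: use attribute values (e.g. an img src) as the node text.
            self.nodes.append((tag, " ".join(v or "" for _, v in attrs)))
        else:
            self.stack.append([tag, []])

    def handle_endtag(self, tag):
        # Ignore stray closing tags; otherwise also close elements left open inside.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack[-1][0] != tag:
                self._finish_top()
            self._finish_top()

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][1].append(data)  # direct text only, not descendants'


def parse_nodes(html):
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    while collector.stack:  # close anything the page left unclosed
        collector._finish_top()
    return collector.nodes


def match_nodes(old, new, same_type_only=True):
    """Pairs each old node with its most similar new node.

    Returns (matches, comparisons): matches maps an old index to
    (new index, similarity score); comparisons counts similarity calls.
    Old nodes with no candidate (e.g. a deleted tag type) stay unmatched.
    """
    by_tag = defaultdict(list)
    for j, (tag, _) in enumerate(new):
        by_tag[tag].append(j)
    matches, comparisons = {}, 0
    for i, (tag, text) in enumerate(old):
        candidates = by_tag.get(tag, []) if same_type_only else range(len(new))
        best = None
        for j in candidates:
            comparisons += 1
            score = SequenceMatcher(None, text, new[j][1]).ratio()
            if best is None or score > best[1]:
                best = (j, score)
        if best is not None:
            matches[i] = best
    return matches, comparisons


old_html = "<html><body><h1>News</h1><p>Old story</p><img src='a.png'><a href='/x'>More</a></body></html>"
new_html = "<html><body><h1>News</h1><p>New story</p><img src='b.png'><a href='/x'>More</a></body></html>"
old_nodes, new_nodes = parse_nodes(old_html), parse_nodes(new_html)
for restricted in (True, False):
    matches, count = match_nodes(old_nodes, new_nodes, restricted)
    changed = [old_nodes[i] for i, (_, score) in matches.items() if score < 1.0]
    print(f"same-type only={restricted}: {count} comparisons, changed={changed}")
```

In this toy example each version has six nodes of six different tag types, so restricted matching makes 6 comparisons against 36 for the all-nodes approach. More generally, under this sketch's pair counting, restricted matching costs the sum over tag types t of n_old(t) × n_new(t), versus n_old × n_new for all pairs. If a page of n nodes is split evenly across k tag types in both versions, that is n²/k against n², so the saving grows with tag diversity and vanishes for a page made of a single tag type, which is consistent with the trend the abstract reports (the paper's own analytical expressions may be formulated differently).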
Keywords: Web page; Change detection; Change monitoring; Similarity computation; Performance improvements
This article is indexed in SpringerLink and other databases.