首页 | 本学科首页   官方微博 | 高级检索  
     


A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations
Authors:Hassan   Kassem   
Affiliation:aDepartment of Electrical and Computer Engineering, American University of Beirut, P.O. Box 11-0236, Riad El-Solh, Beirut 1107 2020, Lebanon
Abstract:
This paper describes a fast HTML web page detection approach that saves computation time by limiting the similarity computations between two versions of a web page to nodes having the same HTML tag type, and by hashing the web page in order to provide direct access to node information. This efficient approach is suitable as a client application and for implementing server applications that could serve the needs of users in monitoring modifications to HTML web pages made over time, and that allow for reporting and visualizing changes and trends in order to gain insight about the significance and types of such changes. The detection of changes across two versions of a page is accomplished by performing similarity computations after transforming the web page into an XML-like structure in which a node corresponds to an open–close HTML tag. Performance and detection reliability results were obtained, and showed speed improvements when compared to the results of a previous approach.
Keywords:Web page change detection   Change monitoring   Similarity computation   HTML   Tree similarity
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号