Indexing and querying segmented web pages: the BlockWeb Model
Authors: Emmanuel Bruno  Nicolas Faessel  Hervé Glotin  Jacques Le Maitre  Michel Scholl
Affiliation: 1. LSIS, Université du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France
2. LSIS, Université Paul Cézanne, Avenue Escadrille Normandie-Niemen, 13397, Marseille Cedex 20, France
3. Cedric/Wisdom, CNAM, 292 Rue St Martin, 75141, Paris Cedex 03, France
Abstract: We present a model for indexing and querying web pages based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing, and querying: (i) the blocks of a page most similar to a query may be returned instead of the whole page; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to its neighbor blocks: a block b is said to be permeable to a block b′ in the same page if the content of b′ (text, image, etc.) can be (partially) inherited by b upon indexing. We describe an engine implementing this model, including the transformation of web pages into block hierarchies, the definition of a dedicated language for expressing indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%.
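The permeability idea in the abstract can be illustrated with a minimal sketch. It is not the engine or the indexing-rule language described in the paper. It assumes, for illustration only, that permeability is a numeric weight in [0, 1] and that a block's index is a bag of term weights; the names `Block`, `permeable_to`, and `index_block` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A page block with its own text and weighted links to neighbor blocks."""
    name: str
    text: str
    # Assumption: maps a neighbor block's name to a permeability weight in [0, 1].
    permeable_to: dict = field(default_factory=dict)


def index_block(block, blocks_by_name):
    """Return term weights: the block's own term counts plus, for each
    neighbor it is permeable to, that neighbor's counts scaled by the weight."""
    weights = Counter(block.text.lower().split())
    for neighbor_name, alpha in block.permeable_to.items():
        neighbor = blocks_by_name[neighbor_name]
        for term, count in Counter(neighbor.text.lower().split()).items():
            weights[term] += alpha * count
    return dict(weights)


# An image block has no text of its own but partially inherits its caption.
caption = Block("caption", "harbor at sunset")
image = Block("image", "", permeable_to={"caption": 0.5})
blocks = {b.name: b for b in (caption, image)}
image_index = index_block(image, blocks)
```

Under these assumptions the image block becomes retrievable by the caption's terms at half weight, while the caption block keeps its own full-weight index.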
This article is indexed in SpringerLink and other databases.