Indexing and querying segmented web pages: the BlockWeb Model |
| |
Authors: | Emmanuel Bruno, Nicolas Faessel, Hervé Glotin, Jacques Le Maitre, Michel Scholl |
| |
Affiliation: | 1. LSIS, Université du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France; 2. LSIS, Université Paul Cézanne, Avenue Escadrille Normandie-Niemen, 13397, Marseille Cedex 20, France; 3. Cedric/Wisdom, CNAM, 292 Rue St Martin, 75141, Paris Cedex 03, France
|
| |
Abstract: | We present a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to its neighbor blocks: a block b is said to be permeable to a block b′ in the same page if the content of b′ (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described, including the transformation of web pages into block hierarchies, the definition of a dedicated language to express indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|