Examining topic shifts in content-oriented XML retrieval |
| |
Authors: | Elham Ashoori, Mounia Lalmas, Theodora Tsikrika |
| |
Affiliation: | (1) Department of Computer Science, Queen Mary, University of London, London, E1 4NS, UK |
| |
Abstract: | Content-oriented XML retrieval systems support access to XML repositories by retrieving, in response to user queries, XML
document components (XML elements) instead of whole documents. The retrieved XML elements should not only contain information
relevant to the query, but also provide the right level of granularity. In INEX, the INitiative for the Evaluation of XML
retrieval, a relevant element is defined to be at the right level of granularity if it is both exhaustive and specific to the query. Specificity was introduced to capture how focused an element is on the query (i.e., that it discusses
no topics irrelevant to it). To score XML elements according to how exhaustive and specific they are for a given query, the content
and logical structure of XML documents have been widely used. One source of evidence that has led to promising results with
respect to retrieval effectiveness is element length. This work aims at examining a new source of evidence deriving from the
semantic decomposition of XML documents. We consider that XML documents can be semantically decomposed through the application
of a topic segmentation algorithm. Using the semantic decomposition and the logical structure of XML documents, we propose
a new source of evidence, the number of topic shifts in an element, to reflect its relevance and more particularly its specificity. This paper has three research objectives.
Firstly, we investigate the characteristics of XML elements reflected by their number of topic shifts. Secondly, we compare
topic shifts to element length, by incorporating each of them as a feature in a retrieval setting and examining their effects
in estimating the relevance of XML elements given a query. Finally, we use the number of topic shifts as evidence for capturing
specificity to provide focused access to XML repositories. |
| |
Keywords: | Content-oriented XML retrieval; Topic segmentation; INEX relevance; Right level of granularity; Topic shifts vs length; Focused access |
This article is indexed in SpringerLink and other databases. |
|