首页 | 本学科首页   官方微博 | 高级检索  
     


Examining topic shifts in content-oriented XML retrieval
Authors:Elham Ashoori  Mounia Lalmas  Theodora Tsikrika
Affiliation:(1) Department of Computer Science, Queen Mary, University of London, London, E1 4NS, UK
Abstract:Content-oriented XML retrieval systems support access to XML repositories by retrieving, in response to user queries, XML document components (XML elements) instead of whole documents. The retrieved XML elements should not only contain information relevant to the query, but also provide the right level of granularity. In INEX, the INitiative for the Evaluation of XML retrieval, a relevant element is defined to be at the right level of granularity if it is exhaustive and specific to the query. Specificity was specifically introduced to capture how focused an element is on the query (i.e., discusses no other irrelevant topics). To score XML elements according to how exhaustive and specific they are given a query, the content and logical structure of XML documents have been widely used. One source of evidence that has led to promising results with respect to retrieval effectiveness is element length. This work aims at examining a new source of evidence deriving from the semantic decomposition of XML documents. We consider that XML documents can be semantically decomposed through the application of a topic segmentation algorithm. Using the semantic decomposition and the logical structure of XML documents, we propose a new source of evidence, the number of topic shifts in an element, to reflect its relevance and more particularly its specificity. This paper has three research objectives. Firstly, we investigate the characteristics of XML elements reflected by their number of topic shifts. Secondly, we compare topic shifts to element length, by incorporating each of them as a feature in a retrieval setting and examining their effects in estimating the relevance of XML elements given a query. Finally, we use the number of topic shifts as evidence for capturing specificity to provide a focused access to XML repositories.
Keywords:Content-oriented XML retrieval  Topic segmentation  INEX relevance  Right level of granularity  Topic shifts vs  length  Focused access
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号