RRSi: indexing XML data for proximity twig queries
Authors: Patrick K. L. Ng, Vincent T. Y. Ng
Affiliation: Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Abstract: Twig query pattern matching is a core operation in XML query processing, and indexing XML documents for twig query processing
is fundamental to supporting effective information retrieval. In practice, many XML documents on the web are
heterogeneous and follow their own formats, so documents describing relevant information can have different structures. As a result,
documents of interest to the user whose structures are similar to, but not an exact match for, the query are often missed. In this
paper, we propose RRSi, a novel structural index designed for structure-based query lookup over heterogeneous sources of XML documents that supports
proximate query answers. The index avoids unnecessary processing of structurally irrelevant candidates, which might nonetheless show
good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these are
the first structural indexes to support proximity twig queries on XML documents. The results of our preliminary experiments show
that query processing based on RRSi and oRRSi significantly outperforms previously proposed techniques on XML repositories with structural heterogeneity.
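
The motivating problem, that exact twig matching misses documents whose structure is similar to but not identical with the query, can be illustrated with a small sketch. This is not RRSi or oRRSi, whose internals the abstract does not describe. It swaps in a generic edge-generalization relaxation (every parent-child edge in the pattern loosened to ancestor-descendant), and the names `TwigNode`, `matches_at`, `twig_match` and the sample documents are hypothetical. It also scans the whole document naively, which is exactly the cost a structural index is meant to avoid.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TwigNode:
    """Hypothetical twig-pattern node: a tag name plus child patterns."""
    tag: str
    children: list[TwigNode] = field(default_factory=list)


def matches_at(elem: ET.Element, pattern: TwigNode, relaxed: bool) -> bool:
    """Return True if `pattern` matches with its root mapped to `elem`.

    Exact mode: each pattern edge must map to a parent-child edge.
    Relaxed mode: each pattern edge may map to an ancestor-descendant edge.
    Sibling patterns are not forced onto distinct elements (simplification).
    """
    if elem.tag != pattern.tag:
        return False
    # Element.iter() yields elem itself first, so [1:] gives its descendants.
    candidates = list(elem.iter())[1:] if relaxed else list(elem)
    return all(
        any(matches_at(c, child, relaxed) for c in candidates)
        for child in pattern.children
    )


def twig_match(root: ET.Element, pattern: TwigNode, relaxed: bool = False) -> bool:
    """Return True if the pattern matches rooted at any element of the document."""
    return any(matches_at(e, pattern, relaxed) for e in root.iter())


# Two documents carrying the same information with different structures.
doc_a = ET.fromstring("<book><title>XML</title><author>Ng</author></book>")
doc_b = ET.fromstring(
    "<book><title>XML</title><authors><author>Ng</author></authors></book>"
)
query = TwigNode("book", [TwigNode("title"), TwigNode("author")])

print(twig_match(doc_a, query), twig_match(doc_b, query))  # True False
print(twig_match(doc_b, query, relaxed=True))              # True
```

Exact matching rejects `doc_b` because `author` sits one level deeper, under `authors`; the relaxed match accepts it. A proximity twig query generalizes this idea by admitting and ranking such near-miss structures rather than applying a single fixed relaxation.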
Keywords: XML indexing, Proximity query, Twig query, XML structural similarity
Indexed in SpringerLink and other databases.