Divide, Compress and Conquer: Querying XML via Partitioned Path-Based Compressed Data Blocks |
| |
Authors: | Wilfred Ng, Ho-Lam Lau, Aoying Zhou |
| |
Affiliation: | (1) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; (2) Department of Computer Science and Engineering, Fudan University, Shanghai, China |
| |
Abstract: | We propose a novel partitioned path-based (PPB) grouping strategy to store compressed XML data in a stream of blocks. In addition,
we employ a minimal indexing scheme called block statistic signature (BSS) on the compressed data, which is a simple but effective
technique to support the evaluation of selection and aggregate XPath queries over the compressed data. We present a formal analysis
and an empirical study of these techniques. The BSS indexing is first extended into effective cluster statistic signature (CSS)
and multiple-cluster statistic signature (MSS) indexing by establishing more layers of indexes. We analyze how the response
time is affected by various parameters involved in our compression strategy such as the data stream block size, the number
of cluster layers, and the query selectivity. We also gain further insight into the compression and querying performance
by studying the optimal block size in a stream, which leads to the minimum processing cost for queries. The cost model analysis
provides a solid foundation for predicting the querying performance. Finally, we demonstrate that our PPB grouping and indexing
strategies are not only efficient enough to support path-based selection and aggregate queries over the compressed XML data,
but also require relatively low computation time and storage space when compared with other state-of-the-art compression
strategies. |
| |
Keywords: | data compression; query processing; cost model; markup languages |
This article is indexed in SpringerLink and other databases. |
|
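The abstract describes storing compressed values in a stream of blocks and keeping a small per-block statistic so that selection and aggregate queries can avoid decompressing every block. The following Python sketch illustrates that general idea only. The abstract does not say what a block statistic signature contains, so the fields used here (minimum, maximum, sum, count), the use of zlib, and all names such as `Block`, `build_blocks`, `select_range`, and `total_sum` are assumptions made for illustration, not the authors' design.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One compressed block plus a small statistic kept uncompressed.

    The statistic fields are an assumption for this sketch; the paper's
    actual signature layout is not given in the abstract.
    """
    payload: bytes  # zlib-compressed, newline-separated values
    lo: float       # minimum value in the block
    hi: float       # maximum value in the block
    total: float    # sum of the values in the block
    count: int      # number of values in the block


def build_blocks(values: List[float], block_size: int) -> List[Block]:
    """Split the values of one path into fixed-size blocks and compress each."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        raw = "\n".join(repr(v) for v in chunk).encode("utf-8")
        blocks.append(
            Block(zlib.compress(raw), min(chunk), max(chunk), sum(chunk), len(chunk))
        )
    return blocks


def decode(block: Block) -> List[float]:
    """Decompress one block back into its values."""
    text = zlib.decompress(block.payload).decode("utf-8")
    return [float(s) for s in text.split("\n")]


def select_range(blocks: List[Block], lo: float, hi: float) -> Tuple[List[float], int]:
    """Return values in [lo, hi] and the number of blocks decompressed."""
    hits: List[float] = []
    decompressed = 0
    for b in blocks:
        # Skip blocks whose value range cannot overlap the query range.
        if b.hi < lo or b.lo > hi:
            continue
        decompressed += 1
        hits.extend(v for v in decode(b) if lo <= v <= hi)
    return hits, decompressed


def total_sum(blocks: List[Block]) -> float:
    """Answer an unfiltered sum aggregate from the statistics alone."""
    return sum(b.total for b in blocks)


if __name__ == "__main__":
    prices = [float(x) for x in range(1, 101)]  # hypothetical values of one path
    blocks = build_blocks(prices, block_size=10)
    hits, touched = select_range(blocks, 25.0, 34.0)
    print(len(hits), "matches;", touched, "of", len(blocks), "blocks decompressed")
    print("sum =", total_sum(blocks))
```

In this sketch the block size trades pruning precision against per-block overhead: smaller blocks let more of the stream be skipped but add more statistics and weaken compression, which is the kind of trade-off the abstract's cost model is said to analyze.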