Divide, Compress and Conquer: Querying XML via Partitioned Path-Based Compressed Data Blocks
Authors: Wilfred Ng, Ho-Lam Lau, Aoying Zhou
Affiliation: (1) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; (2) Department of Computer Science and Engineering, Fudan University, Shanghai, China
Abstract: We propose a novel partitioned path-based (PPB) grouping strategy to store compressed XML data in a stream of blocks. In addition, we employ a minimal indexing scheme called the block statistic signature (BSS) on the compressed data, a simple but effective technique to support the evaluation of selection and aggregate XPath queries over the compressed data. We present a formal analysis and an empirical study of these techniques. The BSS indexing is first extended into effective cluster statistic signature (CSS) and multiple-cluster statistic signature (MSS) indexing by establishing more layers of indexes. We then analyze how the response time is affected by the parameters of our compression strategy, such as the data stream block size, the number of cluster layers, and the query selectivity. We also gain further insight into the compression and querying performance by studying the optimal block size in a stream, which leads to the minimum processing cost for queries. The cost model analysis provides a solid foundation for predicting the querying performance. Finally, we demonstrate that our PPB grouping and indexing strategies are not only efficient enough to support path-based selection and aggregate queries over the compressed XML data, but also require relatively low computation time and storage space compared with other state-of-the-art compression strategies.
Keywords: data compression; query processing; cost model; markup languages
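
The abstract does not say what a block statistic signature contains, so the following is only a rough sketch of the general idea under stated assumptions: numeric values reached by one path are grouped into fixed-size blocks, each block is compressed independently, and a small per-block summary lets a query skip blocks or answer an aggregate without decompressing anything. The choice of min, max, sum and count as the signature, the use of `zlib`, and every name here (`Block`, `build_blocks`, `select_range`, `total_sum`) are hypothetical and are not taken from the paper.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    payload: bytes  # zlib-compressed values of one block
    lo: float       # assumed signature field: minimum value in the block
    hi: float       # assumed signature field: maximum value in the block
    total: float    # assumed signature field: sum of the values
    count: int      # assumed signature field: number of values


def build_blocks(values, block_size):
    """Split the values of one path into compressed blocks with signatures."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        raw = "\n".join(repr(v) for v in chunk).encode()
        blocks.append(
            Block(zlib.compress(raw), min(chunk), max(chunk), sum(chunk), len(chunk))
        )
    return blocks


def decode(block):
    """Decompress one block back into its list of float values."""
    return [float(s) for s in zlib.decompress(block.payload).decode().split("\n")]


def select_range(blocks, lo, hi):
    """Selection query: return (matching values, number of blocks decompressed)."""
    matches, opened = [], 0
    for b in blocks:
        # The signature shows this block cannot contain a match, so skip it.
        if b.hi < lo or b.lo > hi:
            continue
        opened += 1
        matches.extend(v for v in decode(b) if lo <= v <= hi)
    return matches, opened


def total_sum(blocks):
    """Aggregate query answered from the signatures alone."""
    return sum(b.total for b in blocks)
```

In this sketch a range selection decompresses only the blocks whose [lo, hi] interval overlaps the predicate, and a sum touches no compressed payload at all. The block size trades signature overhead against the amount of data decompressed per matching block, which is the kind of trade-off the abstract's cost model is said to analyze.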
This article is indexed in SpringerLink and other databases.