Approximate query processing using wavelets |
| |
Authors: | Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim |
| |
Affiliation: | (1) University of Illinois, 1304 W. Springfield Ave., Urbana, IL 61801, USA; e-mail: kaushikc@cs.uiuc.edu , US;(2) Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA; e-mail: {minos,rastogi}@bell-labs.com , US;(3) KAIST and AITrc, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea; e-mail: shim@cs.kaist.ac.kr , KR |
| |
Abstract: | Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent
response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited
in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches
based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries
over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool
for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building
wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms
that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex
queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine
can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational
tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses
in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets
to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate
that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query
execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times
that scale linearly with the size of the data.
Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001 |
| |
Keywords: | : Query processing – Data synopses – Approximate query answers – Wavelet decomposition |
本文献已被 SpringerLink 等数据库收录! |
|