Sampling-based estimators for subset-based queries |
| |
Authors: | Shantanu Joshi, Christopher Jermaine |
| |
Affiliation: | (1) Server Manageability, Oracle, 400 Oracle Parkway, Redwood Shores, CA 94065, USA; (2) Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA |
| |
Abstract: | We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments. Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170. |
| |
Keywords: | Sampling; Approximate query processing; Aggregate query processing |
This article is indexed in SpringerLink and other databases. |
|
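To make the query class named in the abstract concrete, here is a minimal sketch of a subset-based aggregate: a SUM over an outer table whose rows are filtered by a correlated NOT EXISTS subquery. The tables `emp` and `sale`, their columns, and the data are hypothetical and are not taken from the paper. The sketch also runs the same query after naively replacing the inner table with a random sample of its rows. This is not either of the paper's estimators. It only illustrates why the problem needs care: dropping inner rows can only let more outer rows pass the NOT EXISTS test, so with non-negative values the naive answer is never below the exact one.

```python
import random
import sqlite3

# Hypothetical data: employees (id, salary) and sales (emp_id, amount).
EMPLOYEES = [(1, 100.0), (2, 200.0), (3, 300.0), (4, 400.0)]
SALES = [(1, 10.0), (1, 20.0), (2, 5.0), (3, 7.0)]

# Subset-based aggregate: total salary of employees with no sale at all.
QUERY = """
SELECT COALESCE(SUM(e.salary), 0)
FROM emp AS e
WHERE NOT EXISTS (SELECT 1 FROM sale AS s WHERE s.emp_id = e.id)
"""


def total_salary_without_sales(inner_rows):
    """Run QUERY with `inner_rows` as the contents of the inner table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, salary REAL)")
        conn.execute("CREATE TABLE sale (emp_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO emp VALUES (?, ?)", EMPLOYEES)
        conn.executemany("INSERT INTO sale VALUES (?, ?)", inner_rows)
        (total,) = conn.execute(QUERY).fetchone()
    finally:
        conn.close()
    return total


# Exact answer over the full inner table: only employee 4 has no sale.
exact = total_salary_without_sales(SALES)

# Naive approach: evaluate the same query over a 50% sample of the inner table.
rng = random.Random(0)
sampled_sales = rng.sample(SALES, 2)
naive = total_salary_without_sales(sampled_sales)

print("exact:", exact)
print("naive answer from sampled inner table:", naive)
```

Because the naive answer errs in one direction only, simply scaling a sampled result does not give a sound estimate for this query class, which is the gap the estimators described in the abstract are meant to address.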