Streaming multiple aggregations using phantoms |
| |
Authors: | Rui Zhang Nick Koudas Beng Chin Ooi Divesh Srivastava Pu Zhou |
| |
Affiliation: | 1. University of Melbourne, Parkville, VIC, Australia 2. University of Toronto, Toronto, Canada 3. National University of Singapore, Singapore, Singapore 4. AT&T Labs–Research, Middletown, USA
|
| |
Abstract: | Data streams characterize the high speed and large volume input of a new class of applications such as network monitoring, web content
analysis and sensor networks. Among these applications, network monitoring may be the most compelling one—the backbone of
a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks such as traffic
analysis and statistics collection, aggregation is a primitive operation. Various analytical and statistical needs naturally
lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations
over high-speed data streams based on the two-level query processing architecture of GS, a real data stream management system
deployed in AT & T. We discern that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best
set of phantoms to maintain and determine allocation of resources (particularly, space) to compute the aggregations. Experiments
show that our algorithm achieves near-optimal computation costs, which outperforms the best adapted algorithm by more than
an order of magnitude. |
| |
Keywords: | |
本文献已被 SpringerLink 等数据库收录! |
|