Approximate distributed top-k queries |
| |
Authors: | Boaz Patt-Shamir Allon Shafrir |
| |
Affiliation: | (1) Department of Electrical Engineering, Tel Aviv University, Tel Aviv, 69978, Israel |
| |
Abstract: | We consider a distributed system where each node keeps a local count for items (similar to elections where nodes are ballot
boxes and items are candidates). A top-k query in such a system asks which are the k items whose global count, across all nodes in the system, is the largest. In this paper, we present a Monte Carlo algorithm
that outputs, with high probability, a set of k candidates which approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity.
In contrast to previous algorithms, the communication complexity depends only on the global scores and not on the partition
of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity when
compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe
complexity of any non-interactive top-k approximation algorithm. We show that for some natural global distributions (such as the Geometric or Zipf distributions),
our algorithm needs only polylogarithmic number of communication bits per node.
An extended abstract of this paper appeared in Proc. 13th Int. Colloquium on Structural Information and Communication Complexity,
SIROCCO 2006, Lecture Notes in Computer Science 4056, pp. 319–333. |
| |
Keywords: | Distributed algorithms Aggregate queries Communication complexity Sensor networks Random sampling |
本文献已被 SpringerLink 等数据库收录! |
|