Super-EGO: fast multi-dimensional similarity join |
| |
Authors: | Dmitri V Kalashnikov |
| |
Affiliation: | 1. Department of Computer Science, University of California, Irvine, CA, USA
|
| |
Abstract: | Efficient processing of high-dimensional similarity joins plays an important role for a wide variety of data-driven applications. In this paper, we consider $\varepsilon $ -join variant of the problem. Given two $d$ -dimensional datasets and parameter $\varepsilon $ , the task is to find all pairs of points, one from each dataset that are within $\varepsilon $ distance from each other. We propose a new $\varepsilon $ -join algorithm, called Super-EGO, which belongs the EGO family of join algorithms. The new algorithm gains its advantage by using novel data-driven dimensionality re-ordering technique, developing a new EGO-strategy that more aggressively avoids unnecessary computation, as well as by developing a parallel version of the algorithm. We study the newly proposed Super-EGO algorithm on large real and synthetic datasets. The empirical study demonstrates significant advantage of the proposed solution over the existing state of the art techniques. |
| |
Keywords: | |
本文献已被 SpringerLink 等数据库收录! |
|