首页 | 本学科首页   官方微博 | 高级检索  
     


Super-EGO: fast multi-dimensional similarity join
Authors:Dmitri V Kalashnikov
Affiliation:1. Department of Computer Science, University of California, Irvine, CA, USA
Abstract:Efficient processing of high-dimensional similarity joins plays an important role for a wide variety of data-driven applications. In this paper, we consider $\varepsilon $ -join variant of the problem. Given two $d$ -dimensional datasets and parameter $\varepsilon $ , the task is to find all pairs of points, one from each dataset that are within $\varepsilon $ distance from each other. We propose a new $\varepsilon $ -join algorithm, called Super-EGO, which belongs the EGO family of join algorithms. The new algorithm gains its advantage by using novel data-driven dimensionality re-ordering technique, developing a new EGO-strategy that more aggressively avoids unnecessary computation, as well as by developing a parallel version of the algorithm. We study the newly proposed Super-EGO algorithm on large real and synthetic datasets. The empirical study demonstrates significant advantage of the proposed solution over the existing state of the art techniques.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号