High-dimensional similarity joins
Authors: K. Shim, R. Srikant, R. Agrawal
Affiliation: Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Taejon
Abstract: Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that uses a new index structure, called the ε tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces both the number of neighboring leaf nodes considered for the join test and the traversal cost of finding the appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions; hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the ε tree and the R-tree family, and show that the ε tree will perform better for high-dimensional joins. Empirical evaluation on synthetic and real-life data sets shows that a similarity join using the ε tree is from two times to an order of magnitude faster than one using the R+ tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the ε tree can be applied to the R-tree family. The resulting biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the ε tree.
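Illustrative note: the abstract does not define the join operation itself, so the following is a minimal sketch of what a similarity join computes, assuming the usual definition (all pairs of points within Euclidean distance ε of each other). It is not the ε tree from the paper. It swaps in a plain uniform grid over the first few dimensions, which only illustrates the general idea of restricting the join test to neighboring cells. The names `similarity_join`, `eps`, and `grid_dims` are hypothetical and chosen for this example.

```python
import math
from collections import defaultdict
from itertools import product


def similarity_join(points, eps, grid_dims=2):
    """Return sorted index pairs (i, j), i < j, with Euclidean distance <= eps.

    Assumption: a simple grid filter, not the paper's index structure.
    Points are bucketed into cells of width eps along the first `grid_dims`
    dimensions. Two points within eps of each other differ by at most eps in
    every coordinate, so their cell indices differ by at most 1 per gridded
    dimension, and only adjacent cells need the full distance test.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not points:
        return []

    k = min(grid_dims, len(points[0]))

    # Map each cell key to the indices of the points that fall inside it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(p[d] / eps) for d in range(k))
        cells[key].append(idx)

    result = []
    for key, members in cells.items():
        # Visit the cell itself and its 3**k - 1 neighbors.
        for offset in product((-1, 0, 1), repeat=k):
            neighbor = tuple(c + o for c, o in zip(key, offset))
            for i in members:
                for j in cells.get(neighbor, ()):
                    # i < j reports each qualifying pair exactly once.
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        result.append((i, j))
    return sorted(result)


if __name__ == "__main__":
    sample = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0), (3.2, 3.0, 3.1)]
    print(similarity_join(sample, eps=1.0))
```

Gridding only a fixed number of dimensions keeps the neighbor enumeration at 3^k cells regardless of the data's dimensionality. This is a design choice of the sketch and makes no claim about how the paper's structure achieves its scaling.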