Indexing expensive functions for efficient multi-dimensional similarity search |
| |
Authors: | Hanxiong Chen Jianquan Liu Kazutaka Furuse Jeffrey Xu Yu Nobuo Ohbo |
| |
Affiliation: | (1) Center for Web Research, Department of Computer Science, University of Chile, Santiago, Chile |
| |
Abstract: | Similarity search is important in information retrieval applications where objects are usually represented as vectors of high dimensionality. This leads to the increasing need for supporting the indexing of high-dimensional data. On the other hand, indexing structures based on space partitioning are powerless because of the well-known “curse of dimensionality”. Linear scan of the data with approximation is more efficient in the high-dimensional similarity search. However, approaches so far have concentrated on reducing I/O, and ignored the computation cost. For an expensive distance function such as L p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique to address expensive distance functions by “indexing the function” by pre-computing some key values of the function once. Then, the values are used to develop the upper/lower bounds of the distance between a data vector and the query vector. The technique is extremely efficient since it avoids most of the distance function computations; moreover, it does not involve any extra secondary storage because no index is constructed and stored. The efficiency is confirmed by cost analysis, as well as experiments on synthetic and real data. |
| |
Keywords: | |
本文献已被 SpringerLink 等数据库收录! |
|