Quantized ranking for permutation-based indexing |
| |
Affiliation: | 1. Vicomtech Foundation, Basque Research and Technology Alliance (BRTA) Paseo Mikeletegi 57, Donostia/San Sebastian 20009, Spain;2. Institute of Smart Cities, Public University of Navarre, Pamplona 31006, Spain |
| |
Abstract: | The K-Nearest Neighbor (K-NN) search problem is the way to find the K closest and most similar objects to a given query. The K-NN is essential for many applications such as information retrieval and visualization, machine learning and data mining. The exponential growth of data imposes to find approximate approaches to this problem. Permutation-based indexing is one of the most recent techniques for approximate similarity search. Objects are represented by permutation lists ordering their distances to a set of selected reference objects, following the idea that two neighboring objects have the same surrounding. In this paper, we propose a novel quantized representation of permutation lists with its related data structure for effective retrieval on single and multicore architectures. Our novel permutation-based indexing strategy is built to be fast, memory efficient and scalable. This is experimentally demonstrated in comparison to existing proposals using several large-scale datasets of millions of documents and of different dimensions. |
| |
Keywords: | Large-scale indexing Permutation-based indexing Approximate similarity search Metric permutation table Quantized ranking Big-data |
本文献已被 ScienceDirect 等数据库收录! |
|