Learning to hash: forgiving hash functions and applications |
| |
Authors: | Shumeet Baluja Michele Covell |
| |
Affiliation: | (1) Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA |
| |
Abstract: | The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world
tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets,
the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method
to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used
as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world
audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no
comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both
efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data
set. |
| |
Keywords: | Hashing Audio matching Machine learning Locality sensitive hashing |
本文献已被 SpringerLink 等数据库收录! |
|