Similar Articles
20 similar articles found (search time: 15 ms)
1.
Cong Leng, Jian Cheng. Machine Learning, 2015, 100(2-3): 379-398

2.
The strongest well-known measure for the quality of a universal hash-function family H is its being ε-strongly universal, which measures, for randomly chosen h ∈ H, one's inability to guess h(m′) even if h(m) is known for some m ≠ m′. We give example applications in which this measure is too weak, and we introduce a stronger measure for the quality of a hash-function family, ε-variationally universal (ε-VU), which measures one's inability to distinguish h(m′) from a random value even if h(m) is known for some m ≠ m′. We explain the utility of this notion and provide an approach for constructing efficiently computable ε-VU hash-function families.
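As a concrete reference point for the ε-strongly universal measure, the following Python sketch uses the classic Carter-Wegman family h_{a,b}(m) = (a·m + b) mod p over a prime p (an illustration chosen here, not the paper's construction) and checks by exhaustive enumeration, for a small p, that an adversary who sees h(m) guesses h(m′) with probability at most 1/p. It does not implement the ε-VU notion; the function names are hypothetical.

```python
import itertools

# Illustrative sketch (not the paper's construction): the Carter-Wegman family
# h_{a,b}(m) = (a*m + b) mod p over a prime p, which is strongly universal.

def make_family(p):
    """All keys (a, b) of the family; choosing one uniformly picks h at random."""
    return [(a, b) for a in range(p) for b in range(p)]

def h(key, m, p):
    a, b = key
    return (a * m + b) % p

def max_pair_probability(p):
    """Largest Pr[h(m) = y and h(m') = y'] over m != m' and all y, y'."""
    family = make_family(p)
    worst = 0.0
    for m, m2 in itertools.permutations(range(p), 2):
        counts = {}
        for key in family:
            pair = (h(key, m, p), h(key, m2, p))
            counts[pair] = counts.get(pair, 0) + 1
        worst = max(worst, max(counts.values()) / len(family))
    return worst

def best_guess_probability(p, m, m2):
    """Adversary's best chance of guessing h(m2) after seeing h(m), for m != m2."""
    family = make_family(p)
    best = 0.0
    for y in range(p):
        consistent = [key for key in family if h(key, m, p) == y]
        for y2 in range(p):
            hits = sum(1 for key in consistent if h(key, m2, p) == y2)
            best = max(best, hits / len(consistent))
    return best

print(max_pair_probability(5))          # 1/25 for every pair of distinct messages
print(best_guess_probability(7, 1, 3))  # 1/7, i.e. epsilon = 1/p
```

Because (a, b) ↦ (h(m), h(m′)) is a bijection on Z_p × Z_p whenever m ≠ m′, every output pair is equally likely, which is why the enumeration yields exactly 1/p² and 1/p.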

3.
Fast retrieval methods are critical for many large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply to high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sublinear-time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several data sets and show that it enables accurate and fast performance for several vision problems, including example-based object classification, local feature matching, and content-based retrieval.
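For readers unfamiliar with the LSH scheme being generalized, here is a minimal Python sketch of random-hyperplane LSH for cosine similarity in an explicit feature space (the linear case). Each bit records which side of a random Gaussian hyperplane a vector falls on, so two vectors at angle θ disagree on a bit with probability θ/π. Kernelized LSH instead forms such random directions implicitly from kernel evaluations on a sample of the data; that step is not shown, and all names and parameters here are illustrative.

```python
import math
import random

# Illustrative sketch of random-hyperplane LSH for cosine similarity (the linear,
# explicit-feature case); kernelized LSH is not implemented here.

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_vector(x, planes):
    """One bit per hyperplane: 1 if x lies on its non-negative side."""
    return tuple(int(sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0.0) for r in planes)

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def estimated_angle(a, b):
    """Pr[bit differs] = theta / pi, so the Hamming fraction estimates the angle."""
    return math.pi * hamming(a, b) / len(a)

planes = make_hyperplanes(dim=4, n_bits=64)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 5.0]
print(estimated_angle(hash_vector(x, planes), hash_vector(y, planes)))
```

Only the sign of each projection is kept, so vectors that differ by a positive scale factor receive identical codes, and a query is compared against stored codes by Hamming distance.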

4.
A better similarity index structure for high-dimensional feature datapoints is highly desirable for building scalable content-based search systems on feature-rich datasets. In this paper, we introduce sparse principal component analysis (Sparse PCA) and Boosting Similarity Sensitive Hashing (Boosting SSC) into traditional spectral hashing for effective, data-aware binary coding of real data. We call this Sparse Spectral Hashing (SSH). SSH formulates binary coding as thresholding a subset of the eigenvectors of the graph Laplacian while constraining the number of nonzero features. SSH applies convex relaxation and eigenfunction learning so that the coding is globally optimal and extends to datapoints outside the training data. Comparisons in terms of F1 score and AUC show that SSH substantially outperforms other methods on both image and text datasets.

5.
6.
We define a strategy for adding an overflow capability to extendible hashing (EXHASH). We show that this mechanism achieves both an O(1) expected access cost and an O(N) expected storage cost.
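A minimal Python sketch of extendible hashing with an overflow allowance follows. The specific policy is an assumption for illustration, not necessarily the paper's: a full bucket whose local depth already equals the global depth accepts up to `max_overflow` extra records before the directory is doubled. Bucket capacity, the overflow limit, and the class names are hypothetical, and keys are assumed to have distinct hash values.

```python
# Illustrative sketch of extendible hashing with an overflow allowance. The policy
# (overflow only when local depth == global depth), capacities, and names are assumptions.

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth
        self.items = {}      # primary page plus any overflow records

class ExtendibleHash:
    def __init__(self, capacity=2, max_overflow=1):
        self.capacity = capacity
        self.max_overflow = max_overflow
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _index(self, key):
        # Directory index = low-order global_depth bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            if (bucket.depth == self.global_depth
                    and len(bucket.items) < self.capacity + self.max_overflow):
                bucket.items[key] = value  # overflow instead of doubling the directory
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.directory = self.directory * 2  # entry i + 2^g aliases entry i
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        new_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v

plain = ExtendibleHash(capacity=2, max_overflow=0)
with_overflow = ExtendibleHash(capacity=2, max_overflow=1)
for table in (plain, with_overflow):
    for k in (0, 8, 16):
        table.put(k, str(k))
print(plain.global_depth, with_overflow.global_depth)  # prints: 4 0
```

The keys 0, 8, and 16 agree in their low-order bits, so without overflow the directory doubles four times to separate them, while a single overflow slot absorbs the third key with no doubling at all.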

7.
A clocked adversary is a program that can time its operations and base its behavior on the results of those timings. While it is well known that hashing performs poorly in the worst case, recent results have proven that, for reference-string programs, the probability of falling into a bad case can be driven arbitrarily low. We show that this is not true for clocked adversaries. This emphasizes the limits on the applicability of theorems about the behavior of hashing schemes on reference-string programs, and raises a novel set of problems dealing with optimality of, and vulnerability to, clocked adversaries. This work was supported by DARPA and ONR Contracts N00014-85-C-0456 and N00014-85-K-0465, and by NSF Cooperative Agreement DCR-8420948.

8.
Adaptive hashing with signatures combines the adaptive hashing file structure with superimposed signatures and several new algorithms to produce a new order-preserving data structure. This new technique has excellent direct retrieval performance, localized index organization, and improved file index balance. In keeping with the principal advantage of the original adaptive hashing technique, algorithms to improve both primary and secondary memory storage utilization are also discussed. Furthermore, the new data structure is highly flexible, allowing it to be tuned for the best ratio of performance to storage utilization for a given application.
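Superimposed coding, the signature technique this structure builds on, can be sketched in Python as follows. This is a generic illustration under assumed parameters (64-bit signatures, 3 bits per word, SHA-256 as the bit selector), not the paper's algorithm: each word sets a few bits, a record's signature is the bitwise OR of its words' signatures, and a query can match a record only if all of the query's bits are set in the record's signature.

```python
import hashlib

# Illustrative superimposed-coding sketch; F, K, and SHA-256 bit selection are assumptions.
F = 64  # signature width in bits
K = 3   # bits set per word

def word_signature(word, f=F, k=K):
    """Set k pseudo-random bit positions (out of f) derived from the word."""
    sig = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % f)
    return sig

def record_signature(words):
    """Superimpose (bitwise OR) the signatures of all words in a record."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def may_contain(record_sig, query_words):
    """True if every query bit is set; false drops are possible, false negatives are not."""
    q = record_signature(query_words)
    return (record_sig & q) == q

sig = record_signature(["hash", "trie", "bucket"])
print(may_contain(sig, ["trie"]), may_contain(sig, ["hash", "bucket"]))
```

A positive answer only marks a candidate: unrelated words can happen to cover the same bits (a false drop), so the stored record still has to be checked.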

9.
Hashing is so commonly used in computing that one might expect hash functions to be well understood, and that choosing a suitable function should not be difficult. The results of investigations into the performance of some widely used hashing algorithms are presented and it is shown that some of these algorithms are far from optimal. Recommendations are made for choosing a hashing algorithm and measuring its performance.
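One common way to measure a hash function's performance is to hash a realistic key set into a fixed number of buckets and compute a chi-squared statistic over the bucket loads; for a function that spreads keys uniformly it should be roughly the number of buckets minus one. The Python sketch below (an illustration, not the paper's methodology) compares a naive additive hash with 32-bit FNV-1a on sequential keys, using a prime bucket count so that every bit of the hash value influences the bucket.

```python
# Illustrative sketch: chi-squared test of bucket-load uniformity for two string hashes.

def additive_hash(s):
    """Naive hash: sum of byte values (a deliberately weak baseline)."""
    return sum(s.encode())

def fnv1a_32(s):
    """32-bit FNV-1a."""
    h = 0x811C9DC5                            # FNV offset basis
    for byte in s.encode():
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF     # multiply by FNV prime, keep 32 bits
    return h

def chi_squared(keys, hash_fn, n_buckets):
    """Sum of (observed - expected)^2 / expected over all buckets."""
    counts = [0] * n_buckets
    for key in keys:
        counts[hash_fn(key) % n_buckets] += 1
    expected = len(keys) / n_buckets
    return sum((c - expected) ** 2 / expected for c in counts)

keys = [f"key{i}" for i in range(10000)]
for name, fn in [("additive", additive_hash), ("FNV-1a", fnv1a_32)]:
    print(name, round(chi_squared(keys, fn, 1021), 1))
```

The additive hash maps these 10,000 keys to fewer than 200 distinct sums, so its statistic is far larger than FNV-1a's; running the same measurement on keys from the intended application is what makes the comparison meaningful.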

10.
11.
Double hashing with bucket capacity one is augmented with multiple passbits to obtain a significant reduction in unsuccessful search lengths. This improves on the analysis of Martini et al. [P.M. Martini, W.A. Burkhard, Double hashing with multiple passbits, Internat. J. Found. Theoret. Comput. Sci. 14 (6) (2003) 1165-1188] by providing a closed-form expression for the expected unsuccessful search lengths.
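The passbit idea can be sketched in Python as follows. This is an illustrative reading of the abstract, not the analyzed scheme itself: the table size, the number of passbits, and the rule that a key uses passbit (key mod b) are assumptions, and deletions are not supported. During insertion, each occupied slot that a key's probe sequence passes over gets that key's passbit set; an unsuccessful search can then stop at the first slot whose relevant passbit is clear, because no key sharing that passbit ever continued past it.

```python
# Illustrative sketch of double hashing (bucket capacity 1) with multiple passbits.
# Assumptions: integer keys, prime table size, passbit chosen by key % n_passbits, no deletions.

class PassbitTable:
    def __init__(self, size=13, n_passbits=4):
        self.m = size                  # prime, so every probe sequence visits all slots
        self.b = n_passbits
        self.keys = [None] * size
        self.passbits = [0] * size     # bitmask of passbits per slot

    def _probe(self, key):
        h1 = key % self.m
        h2 = 1 + key % (self.m - 1)    # step in 1..m-1, coprime to the prime m
        for i in range(self.m):
            yield (h1 + i * h2) % self.m

    def _bit(self, key):
        return 1 << (key % self.b)

    def insert(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is None:
                self.keys[slot] = key
                return
            if self.keys[slot] == key:
                return
            # Record that a key with this passbit continued past this occupied slot.
            self.passbits[slot] |= self._bit(key)
        raise RuntimeError("table full")

    def search(self, key):
        """Return (found, number of probes)."""
        probes = 0
        for slot in self._probe(key):
            probes += 1
            if self.keys[slot] is None:
                return False, probes
            if self.keys[slot] == key:
                return True, probes
            if not (self.passbits[slot] & self._bit(key)):
                return False, probes   # no key sharing this passbit went past here
        return False, probes

table = PassbitTable(size=13, n_passbits=4)
for key in (5, 18, 31, 44, 7):
    table.insert(key)
print(table.search(18))  # (True, 2)
print(table.search(57))  # (False, 1)
```

Here 18, 31, and 44 all collide with 5 at slot 5 and set passbits 2, 3, and 0 there; a search for 57 also starts at slot 5, finds its passbit 1 clear, and reports a miss after one probe. With a single passbit, any collision would force the search to continue.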

12.
Robust and secure image hashing (cited 8 times; self-citations: 0; citations by others: 8)
Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.

13.
This paper proposes a generalized approach for designing a class of dynamic hashing schemes that require no index and in which the file grows at a rate of (n+1)/n per full expansion, where n is the number of pages of the file, compared with a rate of 2 in linear hashing. Based on this generalized approach, we derive a new dynamic hashing scheme called alternating hashing, in which, when a split occurs in page k, the data records in page k are redistributed to pages k and k+1, or to pages k and k-1, according to whether the level d is even or odd, respectively (d is the number of full expansions that have happened so far). Our performance analysis shows that, for a fixed load control, the proposed scheme can achieve nearly 97% storage utilization, compared with 78% for linear hashing.

14.
15.
Zhang Yong, Ou Weihua, Shi Yufeng, Deng Jiaxin, You Xinge, Wang Anzhi. World Wide Web, 2022, 25(4): 1519-1536
Medical cross-modal retrieval aims to retrieve semantically similar medical instances across different modalities, such as retrieving X-ray images using radiology reports or...

16.
Hashing methods have received significant attention for effective and efficient large-scale similarity search in the computer vision and information retrieval communities. However, most existing cross-view hashing methods focus on either the similarity preservation of data or cross-view correlation. In this paper, we propose graph regularized supervised cross-view hashing (GSCH), which simultaneously preserves the semantic correlation and both intra-view and inter-view similarity. In particular, GSCH uses intra-view similarity to estimate the inter-view similarity structure. We further propose a sequential learning approach to derive the hashing function for each view. Experimental results on benchmark datasets against state-of-the-art methods show the effectiveness of the proposed method.

17.
Trie hashing (TH), a primary-key access method for storing and accessing records of dynamic files, is discussed. The address of a key is computed through a trie. A key search usually requires only one disk access when the trie is in core, and two disk accesses for very large files when the trie must be on disk. A refinement, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file, allows a load factor of up to 100% for ordered insertions, and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree.

18.
Many geographically distributed proxies are increasingly used for collaborative Web caching to improve performance. In hashing-based collaborative Web caching, response times can suffer for URL requests that are hashed to geographically distant or overloaded proxies. In this paper, we present and evaluate a latency-sensitive hashing scheme for collaborative Web caching that takes into account latency delays due to both geographical distances and dynamic load conditions. Each URL request is first hashed into an anchor hash bucket, with each bucket mapping to one of the proxies. Then a number of nearby hash buckets are examined to select the proxy with the smallest latency delay to the browser. Trace-driven simulations are conducted to evaluate the performance of this new latency-sensitive hashing. The results show that (1) in the presence of load imbalance due to skew in request origination or hot-spot references, latency-sensitive hashing effectively balances the load by hashing into geographically distributed proxies for collaborative Web caching, and (2) when the overall system is lightly loaded, latency-sensitive hashing effectively reduces latency delays by directing requests to geographically closer proxies.
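The two-step selection described in this abstract can be sketched in Python. The bucket-to-proxy table, the per-browser latency estimates, and the reading of "nearby" as the next `window` buckets after the anchor are assumptions made for illustration; the paper's exact neighborhood and latency measurements may differ.

```python
import hashlib

# Illustrative sketch of latency-sensitive proxy selection; the bucket table, the
# latency estimates, and "nearby = next `window` buckets" are assumptions.

def anchor_bucket(url, n_buckets):
    """Step 1: hash the URL to an anchor bucket."""
    digest = hashlib.sha256(url.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets

def select_proxy(url, bucket_to_proxy, latency, window=3):
    """Step 2: among the proxies of the anchor bucket and its neighbours,
    return the one with the smallest estimated latency to this browser."""
    n = len(bucket_to_proxy)
    start = anchor_bucket(url, n)
    candidates = [bucket_to_proxy[(start + i) % n] for i in range(window)]
    return min(candidates, key=lambda proxy: latency[proxy])

buckets = ["A", "B", "C", "D", "A", "C"]             # hypothetical bucket -> proxy table
latency_ms = {"A": 80, "B": 15, "C": 40, "D": 120}   # hypothetical per-browser estimates
print(select_proxy("http://example.com/index.html", buckets, latency_ms))
```

The window size is the main design knob in this sketch: `window=1` reduces to plain hash partitioning, which keeps each URL on one proxy, while larger windows trade some of that cache affinity for more freedom to route around distant or overloaded proxies.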

19.
This article presents a procedure for constructing a near-perfect hashing function. The procedure, a modification of Cichelli's algorithm, builds the near-perfect hashing function fast enough to allow larger word sets to be used than were previously possible. The improvement results from examining the original algorithm for the causes of its sluggish performance and then addressing them, while attempting to preserve the basic simplicity of the original algorithm. The improved performance comes at the expense of more storage. The six modifications used to improve performance are explained in detail, and experimental results are given for word sets of varying sizes.
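For context, here is a minimal Python sketch of the original Cichelli scheme that the article modifies (its six improvements are not reproduced). The hash of a word is its length plus integer values g assigned to its first and last letters, and the g values are found by backtracking search so that no two words collide. The word ordering is simplified to sorting by first/last-letter frequency, `max_value` is an assumed bound on g, and minimality of the table is not enforced.

```python
import itertools

# Illustrative sketch of Cichelli's original backtracking search (not the article's
# improved version): h(w) = len(w) + g[first letter] + g[last letter].

def cichelli(words, max_value=10):
    """Return a dict g of letter values giving distinct hashes, or None."""
    freq = {}
    for w in words:
        for c in (w[0], w[-1]):
            freq[c] = freq.get(c, 0) + 1
    # Simplified ordering: words whose end letters are most frequent come first.
    order = sorted(words, key=lambda w: -(freq[w[0]] + freq[w[-1]]))
    g = {}
    used = set()

    def place(i):
        if i == len(order):
            return True
        w = order[i]
        free = sorted(c for c in {w[0], w[-1]} if c not in g)
        for values in itertools.product(range(max_value + 1), repeat=len(free)):
            g.update(zip(free, values))
            value = len(w) + g[w[0]] + g[w[-1]]
            if value not in used:
                used.add(value)
                if place(i + 1):
                    return True
                used.remove(value)  # undo and try the next assignment
            for c in free:
                del g[c]
        return False

    return dict(g) if place(0) else None

keywords = ["if", "then", "else", "while", "do", "for", "case", "of", "begin", "end"]
g = cichelli(keywords)
print({w: len(w) + g[w[0]] + g[w[-1]] for w in keywords})
```

The search is exponential in the worst case, which is the kind of sluggishness on larger word sets that the article sets out to reduce.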

20.
The inevitably poor utilization of storage by computed-access realizations of extendible rectangular arrays can be circumvented by storing such arrays by hashing. This paper studies the extent to which the same switch in storage strategy avoids the even worse utilization of storage by computed-access realizations of extendible ragged (i.e., nonrectangular) arrays. Unfortunately, the dramatic successes of the rectangular case do not carry over: any hashing scheme for extendible ragged arrays with storage demands very much smaller than those of computed-access realizations must suffer expected access time that is close to the worst possible. On the other hand, one can obtain moderate savings in storage demands by hashing ragged arrays, together with tolerable access time. This last result follows from a general technique for trading increased access time for savings in storage. Even more striking savings are attainable if one restricts attention to fully-justified ragged arrays, whose raggedness is more regular than that of general ragged arrays. However, the overall impact of our results is that extendible ragged arrays do not succumb to the storage strategies that work efficiently on their rectangular counterparts.
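To make "computed-access realization" concrete, the Python sketch below uses one standard storage mapping for extendible 2-D arrays, the square-shell order (an assumption chosen for illustration, not necessarily the mapping studied in the paper). It lets the array grow in either dimension without moving elements, but the storage it reserves is set by the largest index in use, so skinny or ragged arrays waste most of it; a hash table keyed by (i, j) stores only the elements present.

```python
# Illustrative sketch: square-shell storage mapping for an extendible 2-D array
# (an assumed example of a computed-access realization), versus storage by hashing.

def shell_address(i, j):
    """Map (row i, column j) to a cell index; shell k holds all pairs with max(i, j) == k."""
    k = max(i, j)
    if i == k and j < k:
        return k * k + j        # row k, columns 0..k-1: cells k*k .. k*k + k - 1
    return k * k + k + i        # column k, rows 0..k: cells k*k + k .. (k+1)**2 - 1

def computed_access_storage(row_lengths):
    """Cells reserved when row r holds row_lengths[r] elements stored by shell_address."""
    return 1 + max(shell_address(i, j)
                   for i, n in enumerate(row_lengths) for j in range(n))

def hashed_storage(row_lengths):
    """Entries in a dict keyed by (i, j): one per element actually stored."""
    cells = {(i, j): None for i, n in enumerate(row_lengths) for j in range(n)}
    return len(cells)

print(computed_access_storage([10] * 10))   # 100 cells for a 10 x 10 array
print(computed_access_storage([100]))       # 9901 cells for a 1 x 100 array
print(computed_access_storage([1, 50, 2]), hashed_storage([1, 50, 2]))  # 2452 53
```

A square array uses the shell layout perfectly, but a ragged array with row lengths 1, 50, and 2 reserves 2,452 cells for 53 elements, which is the storage gap that hashing closes at the price of access time.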


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417