This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We shall adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation methods for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is knowing how to determine an optimal way for organizing the records into buckets based upon the MKH concept.
A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. Then a performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system. 相似文献
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm.We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust. 相似文献
For a large-scale palmprint identification system,it is necessary to speed up the identification process to reduce the response time and also to have a high rate of identification accuracy.In this paper,we propose a novel hashing-based technique called orientation field code hashing for fast palmprint identification.By investigating hashing-based algorithms,we first propose a double-orientation encoding method to eliminate the instability of orientation codes and make the orientation codes more reasonable.Secondly,we propose a window-based feature measurement for rapid searching of the target.We explore the influence of parameters related to hashing-based palmprint identification.We have carried out a number of experiments on the Hong Kong Poly U large-scale database and the CASIA palmprint database plus a synthetic database.The results show that on the Hong Kong Poly U large-scale database,the proposed method is about 1.5 times faster than the state-of-the-art ones,while achieves the comparable identification accuracy.On the CASIA database plus the synthetic database,the proposed method also achieves a better performance on identification speed. 相似文献