Similar documents
20 similar documents found.
1.
This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We shall adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation methods for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is knowing how to determine an optimal way for organizing the records into buckets based upon the MKH concept.

A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. Then a performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system, and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system.
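
The following is a minimal Python sketch of how the DM allocation and the response-time measure used here fit together. It assumes three things: an ORQ selects a contiguous interval of bucket indices along each attribute, every ORQ is equally likely, and a query's response time is the largest number of qualifying buckets on any one disk. The function names are hypothetical. Brute-force enumeration stands in for the paper's closed-form performance formula.

```python
from itertools import product

def dm_disk(bucket, m):
    """Disk Modulo allocation: disk number = (sum of bucket coordinates) mod m."""
    return sum(bucket) % m

def response_time(ranges, m):
    """Response time of one ORQ: the largest number of qualifying buckets on one disk.

    ranges: one inclusive (lo, hi) interval of bucket indices per attribute."""
    load = [0] * m
    for bucket in product(*(range(lo, hi + 1) for lo, hi in ranges)):
        load[dm_disk(bucket, m)] += 1
    return max(load)

def all_intervals(n):
    """All non-empty intervals [lo, hi] of bucket indices 0..n-1."""
    return [(lo, hi) for lo in range(n) for hi in range(lo, n)]

def average_response_time(sizes, m):
    """Average response time over all ORQs on an MKH file with sizes[i] buckets per attribute."""
    queries = list(product(*(all_intervals(n) for n in sizes)))
    return sum(response_time(q, m) for q in queries) / len(queries)
```

For example, `average_response_time([4, 4], 2)` and `average_response_time([4, 4], 1)` give the two-disk and single-disk averages of the same small MKH file, so the two configurations can be compared directly.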


2.
This paper results from an attempt to unify several different file system design theories. We define the term "partial match pattern" and show that, in order to produce file systems that are optimal with respect to partial match patterns, both the multikey hashing (MKH) method [16] and the multidimensional directory (MDD) method [11] must be in such a form that the number of subdivisions is the same for all key domains. We give the conditions under which the string homomorphism hashing (SHH) method [15], the MKH method, and the MDD method are equivalent to one another. We define the so-called Cartesian product files and show that, if all records are present, the records in a Cartesian product file form a shortest spanning path in which the Hamming distance between every pair of consecutive records is 1. Thus the SHH method, the MKH method, the MDD method, and the multikey sorting (MKS) method [10] are linked together. Finally, we show that for both partial match and best match queries, these file systems exhibit a common characteristic: similar records are grouped together.

3.
We consider the complexity of the general information retrieval system design problem and of the design problem for multiattribute file systems based upon the multiple key hashing (MKH) method. We first show that the problem of designing an optimal multiattribute file system is NP-hard. We then derive the performance formula for multiattribute file systems based upon the MKH method and show that the corresponding design problem is related to the prime number problem. The problem of designing optimal multiattribute files based upon the MKH method can be reduced to finding minimal N-tuples, which were discussed by Chang, Lee and Du. We further present a very efficient method for designing good multiple key hashing functions in the case where the number of buckets is a power of a prime number, and we propose a heuristic algorithm for designing good multiple key hashing functions in general.
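
Here is a minimal sketch of an MKH file. It assumes attribute i is hashed into n_i values, so that the file has N = n_1 × ... × n_k buckets. The per-attribute hash values are combined in mixed radix to form the bucket number. The CRC32-based `attr_hash` is a hypothetical stand-in for whatever per-attribute hashing functions a design would actually choose. A partial match query must examine one bucket for every combination of values of its unspecified attributes.

```python
import zlib
from itertools import product

def attr_hash(value, n):
    """Hypothetical per-attribute hash h_i: maps a value to 0..n-1 via CRC32."""
    return zlib.crc32(str(value).encode()) % n

def combine(coords, sizes):
    """Mixed-radix bucket number from one hash value per attribute."""
    addr = 0
    for c, n in zip(coords, sizes):
        addr = addr * n + c
    return addr

def mkh_address(record, sizes):
    """Bucket holding a record; there are prod(sizes) buckets in total."""
    return combine([attr_hash(v, n) for v, n in zip(record, sizes)], sizes)

def buckets_for_query(query, sizes):
    """Buckets to examine for a partial match query (None = unspecified attribute)."""
    choices = [range(n) if q is None else [attr_hash(q, n)]
               for q, n in zip(query, sizes)]
    return [combine(coords, sizes) for coords in product(*choices)]
```

With `sizes = [2, 3, 2]`, the fully unspecified query touches all 12 buckets. A query that fixes only the first attribute touches 3 × 2 = 6 buckets.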

4.
A multidimensional file is one whose data are characterized by several attributes, each specified in a given domain. A partial match query on a multidimensional file extracts all data whose attributes match the values of one or more attributes specified in the query. The disk allocation problem of a multidimensional file F on a database system with multiple disks accessible in parallel is the problem of distributing F among the disks such that the data qualifying for each partial match query are distributed as evenly as possible among the disks of the system. We propose an optimal solution to this problem for multidimensional files with pairwise prime domains based on a large and flexible class of maximum distance separable codes, namely, the redundant residue codes. We also introduce a new family of residue codes, called the redundant nonpairwise prime residue codes, to deal with files whose attribute domains are nonpairwise prime.
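
The "as evenly as possible" criterion can be checked by brute force: for every partial match query, no disk may hold more than ceil(R/M) of the R qualifying buckets. The sketch below is only such a checker. It does not implement the redundant residue code construction. The DM allocation used to exercise it (disk = sum of coordinates mod M) is an example chosen here, and the function names are hypothetical.

```python
from itertools import product

def all_partial_match_queries(sizes):
    """Each attribute is either unspecified (None) or fixed to one of its values."""
    return product(*([None, *range(n)] for n in sizes))

def is_strictly_optimal(sizes, m, allocate):
    """True if every query puts at most ceil(R / m) of its R buckets on any disk."""
    for query in all_partial_match_queries(sizes):
        choices = [range(n) if q is None else [q] for q, n in zip(query, sizes)]
        load = [0] * m
        for bucket in product(*choices):
            load[allocate(bucket)] += 1
        if max(load) > -(-sum(load) // m):    # -(-a // b) is ceil(a / b)
            return False
    return True

def disk_modulo(m):
    """Example allocation: disk = (sum of bucket coordinates) mod m."""
    return lambda bucket: sum(bucket) % m
```

With these definitions, `is_strictly_optimal([2, 3], 2, disk_modulo(2))` holds. In contrast, `is_strictly_optimal([2, 2], 4, disk_modulo(4))` fails, because the query with no attribute specified puts two of its four buckets on the same disk.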

5.
6.
In this paper, we show that the symbolic Gray code hashing mechanism is good not only for best match queries but also for partial match queries. Specifically, we propose a new hashing scheme, called the bucket-oriented symbolic Gray code, which can be used to produce any Cartesian product file; such files have been shown to be good for partial match queries. Many interesting properties of this new multiattribute hashing scheme are discussed and proved, including the property that it is a perfect hashing scheme.

7.
A class of order-preserving dynamic hashing structures is introduced and analyzed. The access method, referred to as the dynamic random-sequential access method (DRSAM), is derived from linear hashing. A new logical-to-physical mapping based on sequential bucket allocation in hash order is proposed. Compared with previous methods, this allocation technique has the following characteristics: (1) the structure captures the hashed order in consecutive storage areas, so order-preserving hashing (OPH) schemes should yield performance improvements for range queries and sequential processing; and (2) it adopts elastic buckets to control file growth. Under specific conditions, this approach outperforms the partial expansion method previously proposed by P.-A. Larson (1982).
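
DRSAM is described as derived from linear hashing, so a sketch of plain linear hashing may help. Buckets split one at a time in a fixed order. An address computed with the current round's modulus is recomputed with the next round's modulus when it falls below the split pointer. This is a minimal in-memory version under assumed parameters: two initial buckets, and a split whenever the average load exceeds `capacity`. It does not model DRSAM's sequential physical allocation or elastic buckets.

```python
class LinearHashFile:
    """Minimal in-memory linear hashing: buckets split one at a time, in order."""

    def __init__(self, initial_buckets=2, capacity=4):
        self.n0 = initial_buckets
        self.level = 0               # current round of splitting
        self.split = 0               # next bucket to split in this round
        self.capacity = capacity     # split when the average load exceeds this
        self.count = 0
        self.buckets = [[] for _ in range(initial_buckets)]

    def address(self, key):
        h = hash(key)
        a = h % (self.n0 * 2 ** self.level)
        if a < self.split:           # bucket already split this round: use next modulus
            a = h % (self.n0 * 2 ** (self.level + 1))
        return a

    def contains(self, key):
        return key in self.buckets[self.address(key)]

    def insert(self, key):
        self.buckets[self.address(key)].append(key)
        self.count += 1
        if self.count > self.capacity * len(self.buckets):
            self._split_next()

    def _split_next(self):
        modulus = self.n0 * 2 ** (self.level + 1)
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])      # new bucket number: split + n0 * 2**level
        for key in old:              # each key stays put or moves to the new bucket
            self.buckets[hash(key) % modulus].append(key)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:   # round complete
            self.level += 1
            self.split = 0
```

No directory is needed: `address` uses only the level and the split pointer. This is the "requires no index" property that linear spiral hashing also shares.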

8.
In this paper, we shall derive two formulas for the average number of buckets to be examined over all possible partial match queries for Cartesian product files and random files, respectively. The superiority of the Cartesian product file is established. A new multi-key file, called a partition file, is introduced. It is shown that both Cartesian product files and random files are special cases of partition files.
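
The sketch below computes the kind of average this abstract refers to. It covers Cartesian product files only and relies on an assumption made here: every one of the 2^k patterns of specified and unspecified attributes is equally likely. A query then examines the product of n_i over its unspecified attributes. Summing that product over all subsets gives prod(1 + n_i), so the average simplifies to prod((1 + n_i)/2). A brute-force search over factorizations of N (a hypothetical stand-in for searching for minimal N-tuples) then picks the best bucket split.

```python
from itertools import product
from math import prod

def avg_buckets_enumerated(sizes):
    """Average buckets examined over all 2**k specified/unspecified patterns."""
    k = len(sizes)
    total = 0
    for pattern in product([False, True], repeat=k):   # True = attribute specified
        total += prod(n for n, spec in zip(sizes, pattern) if not spec)
    return total / 2 ** k

def avg_buckets_closed_form(sizes):
    """Same average: the sum over subsets of products equals prod(1 + n_i)."""
    return prod((1 + n) / 2 for n in sizes)

def factorizations(N, k, smallest=1):
    """Non-decreasing k-tuples of positive integers whose product is N."""
    if k == 1:
        if N >= smallest:
            yield (N,)
        return
    for f in range(smallest, N + 1):
        if N % f == 0:
            for rest in factorizations(N // f, k - 1, f):
                yield (f,) + rest

def best_tuple(N, k):
    return min(factorizations(N, k), key=avg_buckets_closed_form)

print(best_tuple(16, 2), best_tuple(16, 3))   # (4, 4) (2, 2, 4)
```

Balanced splits win under this cost model. For N = 16 and three attributes, (2, 2, 4) averages 5.625 buckets per query, against 6.25 for (1, 4, 4).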

9.
We present a genetic algorithm for tackling a file assignment problem for a large-scale video-on-demand system. The file assignment problem is to find the optimal replication and allocation of movie files to disks so that the request blocking probability is minimized subject to capacity constraints. We adopt a divide-and-conquer strategy, where the entire solution space of file assignments is divided into subspaces. Each subspace is an exclusive set of solutions sharing a common file replication instance. This allows us to utilize a greedy file allocation method for finding a good-quality heuristic solution within each subspace. We further design two performance indices to measure the quality of the heuristic solution with respect to (1) its assignment of multicopy movies and (2) its assignment of single-copy movies. We demonstrate that these techniques, together with ad hoc population handling methods, enable genetic algorithms to operate in a significantly reduced search space and achieve good-quality file assignments in a computationally efficient way.
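
Within one subspace the replication instance is fixed, so only placement remains. The sketch below is a hypothetical greedy placement, not necessarily the paper's. Movies are taken in decreasing order of popularity per copy, and each copy goes to the disk with the most free space that does not already hold that movie. Movie sizes, popularities and disk capacity are assumed inputs, and the blocking-probability evaluation is omitted.

```python
def greedy_allocate(movies, disk_capacity, n_disks):
    """Hypothetical greedy placement for one fixed replication instance.

    movies: list of (name, size, popularity, copies).
    Returns {name: [disk, ...]}, or None if some movie's copies do not fit."""
    free = [disk_capacity] * n_disks
    placement = {}
    # movies carrying the most demand per copy are placed first
    for name, size, popularity, copies in sorted(
            movies, key=lambda m: m[2] / m[3], reverse=True):
        # distinct disks, most free space first, so copies never share a disk
        by_space = sorted(range(n_disks), key=lambda d: free[d], reverse=True)
        chosen = [d for d in by_space if free[d] >= size][:copies]
        if len(chosen) < copies:
            return None
        for d in chosen:
            free[d] -= size
        placement[name] = chosen
    return placement
```

Returning `None` marks the replication instance as infeasible for the given capacities. An outer search, such as a genetic algorithm over replication instances, could then discard or penalize it.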

10.
Trie hashing (TH), a primary key access method for storing and accessing records of dynamic files, is discussed. The key address is computed through a trie. A key search usually requires only one disk access when the trie is in core, and two disk accesses for very large files whose trie must be kept on disk. A refinement of trie hashing, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file; it allows a load factor of up to 100% for ordered insertions and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree.

11.
A file system tailored to the general needs of the office environment is proposed. The system supports large numbers of documents of a wide variety of types, as well as inexact (fuzzy) queries on those documents. It is based on a multilevel file structure that combines and extends multikey extendible hashing and signature files to create a document-retrieval system that is more time efficient than previously proposed systems and is also space efficient.
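
The signature-file half of such a structure can be sketched with superimposed coding. Each word sets a few bits of a fixed-width signature, and a document's signature is the OR of its word signatures. A query can match a document only if all of the query's bits are set. The 64-bit width, the three bits per word and the SHA-256-based bit selection are assumptions made for this sketch. The multikey extendible hashing layer and fuzzy matching are not shown.

```python
import hashlib

SIG_BITS = 64       # assumed signature width
BITS_PER_WORD = 3   # assumed number of bits each word sets

def word_signature(word):
    """Set BITS_PER_WORD bit positions chosen from a SHA-256 digest of the word."""
    digest = hashlib.sha256(word.lower().encode()).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig

def document_signature(words):
    """Superimposed coding: OR together the signatures of all words."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def may_contain(doc_sig, query_words):
    """True if every bit of the query signature is set in the document signature."""
    q = document_signature(query_words)
    return (doc_sig & q) == q
```

The test can report false drops (a signature match for a document that lacks the words), which must be filtered out by reading the document. It never misses a document that contains all of the query words.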

12.
In this paper, we study the allocation of files in a star network. Unlike previous algorithms, which assume that files are independently accessed and independently assigned, our cost model directly incorporates the interaction of files during query processing. We present an adaptive algorithm that is much faster than existing file allocation algorithms, obtains solutions that are on average only 0.1 percent away from the optimal solutions, and possesses many desirable properties, such as satisfying certain necessary and sufficient conditions for file allocation.

13.
The design and implementation of a multikey, extensible hashing file addressing scheme and its application as an access method for a relational database are presented. This file organization was developed for Request, a testbed relational database management system. It offers a viable alternative to indexed sequential files. Access operations, concurrency control, and relational operations are examined. Results of an experimental evaluation are reported.

14.
When a file is to be transmitted from a sender to a recipient and when the latter already has a file somewhat similar to it, remote differential compression seeks to determine the similarities interactively so as to transmit only the part of the new file not already in the recipient's old file. Content-dependent chunking means that the sender and recipient chop their files into chunks, with the cutpoints determined by some internal features of the files, so that when segments of the two files agree (possibly in different locations within the files), the cutpoints in such segments tend to be in corresponding locations, and so the chunks agree. By exchanging hash values of the chunks, the sender and recipient can determine which chunks of the new file are absent from the old one and thus need to be transmitted.

We propose two new algorithms for content-dependent chunking, and we compare their behavior, on random files, with each other and with previously used algorithms. One of our algorithms, the local maximum chunking method, has been implemented and found to work better in practice than previously used algorithms.

Theoretical comparisons between the various algorithms can be based on several criteria, most of which seek to formalize the idea that chunks should be neither too small (so that hashing and sending hash values become inefficient) nor too large (so that agreements of entire chunks become unlikely). We propose a new criterion, called the slack of a chunking method, which seeks to measure how much of an interval of agreement between two files is wasted because it lies in chunks that don't agree.

Finally, we show how to efficiently find the cutpoints for local maximum chunking.
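
Here is a naive sketch of local maximum chunking. A position becomes a cutpoint when its value is strictly larger than the values of the `horizon` positions on each side. Taking each position's value as the CRC32 of the `window` bytes ending there is an assumption of this sketch. The O(n × horizon) scan is not the efficient cutpoint search the abstract mentions.

```python
import zlib

def position_values(data, window=8):
    """Assumed value of each position: CRC32 of the `window` bytes ending there."""
    return [zlib.crc32(data[max(0, i - window + 1):i + 1]) for i in range(len(data))]

def local_max_cutpoints(data, horizon=16, window=8):
    """Cut after position i if its value beats every value within `horizon` on both sides."""
    v = position_values(data, window)
    cuts = []
    for i in range(horizon, len(v) - horizon):
        neighbours = v[i - horizon:i] + v[i + 1:i + horizon + 1]
        if all(v[i] > x for x in neighbours):
            cuts.append(i + 1)
    return cuts

def chunks(data, horizon=16, window=8):
    bounds = [0] + local_max_cutpoints(data, horizon, window) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]
```

Two cutpoints within `horizon` positions of each other would each have to exceed the other's value, which is impossible. Interior chunks are therefore always longer than `horizon` bytes. Inserting bytes near the start of a file only disturbs the cutpoints near the insertion.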

15.
In this paper, we extend the binary Gray code to a symbolic Gray code. We then show that this symbolic Gray code can be used as a multikey hashing function for storing symbolic records. If this hashing function is used, the record stored at location k and the record stored at location k + 1 are nearest neighbors. Thus the symbolic Gray code hashing function exhibits a clustering property: it groups similar records together. This property is desirable when designing nearest neighbor searching (also called best match searching) systems. The hashing function has many other interesting properties. For instance, there exists an address-to-key transformation that determines the record stored at a given location k. Moreover, if there are M records in total, exactly M locations need to be reserved; there are no collisions and no wasted storage. Finally, it is shown that the resulting file exhibits the multiple-attribute tree structure.
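
One natural way to extend the binary Gray code to symbolic keys is the reflected mixed-radix Gray code, sketched below. It assumes that each attribute's symbols have already been numbered 0..n_i - 1, and the paper's exact construction may differ. `address_to_key` (a hypothetical name) plays the role of the address-to-key transformation. The map is a bijection onto 0..n_1 × ... × n_k - 1, so when every key is present each one gets its own location with no collisions.

```python
from math import prod

def address_to_key(addr, radices):
    """Reflected mixed-radix Gray code: the key (digit tuple) stored at `addr`."""
    if not radices:
        return ()
    block = prod(radices[1:])            # addresses per value of the first digit
    first, rest = divmod(addr, block)
    if first % 2 == 1:                   # odd blocks traverse the suffix in reverse
        rest = block - 1 - rest
    return (first,) + address_to_key(rest, radices[1:])

def key_to_address(key, radices):
    """Inverse mapping: the location at which a key is stored."""
    if not radices:
        return 0
    block = prod(radices[1:])
    rest = key_to_address(key[1:], radices[1:])
    if key[0] % 2 == 1:
        rest = block - 1 - rest
    return key[0] * block + rest
```

For radices (3, 2) the storage order is (0,0), (0,1), (1,1), (1,0), (2,0), (2,1). Each record differs from its successor in exactly one attribute.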

16.
We present a distributed algorithm for file allocation that guarantees high assurance, availability, and scalability in a large distributed file system. The algorithm can use replication and fragmentation schemes to allocate the files over multiple servers. The file confidentiality and integrity are preserved, even in the presence of a successful attack that compromises a subset of the file servers. The algorithm is adaptive in the sense that it changes the file allocation as the read-write patterns and the location of the clients in the network change. We formally prove that, assuming read-write patterns are stable, the algorithm converges toward an optimal file allocation, where optimality is defined as maximizing the file assurance.

17.
A dynamic hashing scheme based on extendible hashing is proposed whose directory can grow into a multilevel directory. The scheme is compared with the extendible hashing and extendible hashing tree schemes. The simulation results show that the proposed scheme is superior to the other two with respect to directory space utilization, especially for files with nonuniform record distributions. The scheme can easily be extended to multikey file systems, where it also performs well.
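
For reference, here is a sketch of the single-level extendible hashing scheme that this proposal builds on. A directory of 2^g entries is indexed by the low-order g bits of the hash. An overflowing bucket splits, and the directory doubles only when the bucket's local depth equals g. This is the baseline, not the proposed multilevel directory. The class names and the `capacity` parameter are assumptions. `directory_utilization` (distinct buckets per directory entry) is one simple way to see the directory waste that the abstract's comparison concerns.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth            # local depth: hash bits this bucket distinguishes
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    """Single-level extendible hashing indexed by the low-order bits of hash(key)."""

    def __init__(self, capacity=4):
        self.global_depth = 0
        self.directory = [Bucket(0, capacity)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        # assumes no more than `capacity` keys share an identical hash value
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.directory = self.directory + self.directory   # double the directory
            self.global_depth += 1
        bucket.depth += 1
        new = Bucket(bucket.depth, bucket.capacity)
        bit = 1 << (bucket.depth - 1)                           # newly distinguished bit
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

    def directory_utilization(self):
        """Distinct buckets per directory entry (1.0 means no shared entries)."""
        return len({id(b) for b in self.directory}) / len(self.directory)
```

With skewed hash values, one overfull bucket can double the whole directory while most entries still share buckets. That shows up as a low `directory_utilization`.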

18.
Cartesian product files have been shown to exhibit attractive properties for partial match queries, and the Disk Modulo (DM) allocation method has been shown to perform well when distributing Cartesian product files over an m-disk system. However, no explicit expression had previously been given for the DM method's response time to a given partial match query. In this paper, we derive such a formula based upon the discrete Fourier transform. With this representation, the performance characteristics of the DM method can be given an analytic interpretation. Several theoretical results are derived from the formula, and we also use it to analyze the performance of several popular Disk Modulo algorithms.
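
Under DM, bucket (x_1, ..., x_k) goes to disk (x_1 + ... + x_k) mod m. The number of qualifying buckets on disk d is therefore a coefficient of a product of per-attribute generating functions over Z_m, which a length-m discrete Fourier transform extracts:
c_d = (1/m) Σ_j ω^(-jd) Π_i Σ_{x in V_i} ω^(jx), with ω = e^(2πi/m),
where V_i is the set of values attribute i can take in the query. This formulation was written for this sketch and may not match the paper's formula. The brute-force counter is included only as a cross-check, and all function names are hypothetical.

```python
import cmath
from itertools import product

def qualifying_values(q, n):
    """Values of one attribute that satisfy the query (None = unspecified)."""
    return range(n) if q is None else [q]

def dm_counts_bruteforce(query, sizes, m):
    """Qualifying buckets per disk under DM, by enumeration (cross-check)."""
    counts = [0] * m
    for bucket in product(*(qualifying_values(q, n) for q, n in zip(query, sizes))):
        counts[sum(bucket) % m] += 1
    return counts

def dm_counts_dft(query, sizes, m):
    """Same counts from c_d = (1/m) * sum_j w**(-j*d) * prod_i sum_{x in V_i} w**(j*x)."""
    w = cmath.exp(2j * cmath.pi / m)
    spectrum = []
    for j in range(m):
        s = 1
        for q, n in zip(query, sizes):
            s *= sum(w ** (j * x) for x in qualifying_values(q, n))
        spectrum.append(s)
    return [round((sum(spectrum[j] * w ** (-j * d) for j in range(m)) / m).real)
            for d in range(m)]

def dm_response_time(query, sizes, m):
    """Response time = largest number of qualifying buckets on one disk."""
    return max(dm_counts_dft(query, sizes, m))
```

The spectrum factorizes attribute by attribute, so each term needs only per-attribute sums rather than an enumeration of every qualifying bucket.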

19.
The goal of dynamic hashing is to design a function and a file structure that allow the address space allocated to a file to be increased and reduced without reorganizing the whole file. We propose a new scheme for dynamic hashing in which the file grows at a rate of (n+k)/n per full expansion, where n is the number of pages of the file and k is a given integer constant smaller than n, as compared with a rate of two in linear hashing. Like linear hashing, the proposed scheme (called linear spiral hashing) requires no index; however, when a split occurs, it may or may not add one more physical page, whereas linear hashing always adds one. Therefore, linear spiral hashing can maintain more stable performance through file expansions and achieve much better storage utilization than linear hashing. Our performance analysis shows that linear spiral hashing can achieve nearly 97 percent storage utilization, compared with 78 percent for linear hashing, which is also verified by a simulation study.

20.
The file allocation problem considers a file and a fully connected network with n nodes. It assumes that the overall file usage over a unit time period is known and asks for the optimal set of network sites at which to locate copies of the file. This paper considers the same problem but assumes that user access patterns change over v planning periods in a manner known in advance. A model is presented which shows that there are (2^n - 1)^v possible file allocations. To assist the search of this large solution space, four theorems are presented, which are subsequently used to analyze the problem and to solve an example case.
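
Each planning period's allocation can be any non-empty subset of the n sites. That gives 2^n - 1 choices per period and (2^n - 1)^v plans over v periods. Already for n = 5 and v = 4 that is 31^4 = 923,521 plans. A small enumeration confirms the count (function names are hypothetical):

```python
from itertools import combinations, product

def nonempty_site_sets(n):
    """All non-empty subsets of the n sites: the possible copy locations in one period."""
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)]

def allocation_plans(n, v):
    """One non-empty site set per planning period."""
    return product(nonempty_site_sets(n), repeat=v)

n, v = 3, 2
plans = sum(1 for _ in allocation_plans(n, v))
print(plans, (2 ** n - 1) ** v)   # both 49
```

Exhaustive enumeration like this is only feasible for tiny n and v, which is why results that prune the space matter.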


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP registration No. 09084417)