1.

On the design of multiple key hashing files for concurrent orthogonal range retrieval between two disks

C. Y. Chen C. C. Chang R. C. T. Lee 《Information Systems》1991,16(6):613-625

This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We shall adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation methods for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is knowing how to determine an optimal way for organizing the records into buckets based upon the MKH concept.

A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. Then a performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system.
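The disk modulo rule mentioned in this abstract is commonly stated as: bucket (i_1, ..., i_k) is stored on disk (i_1 + ... + i_k) mod m. The following is a minimal sketch under that assumption, taking the response time of a query to be the largest number of qualifying buckets on any one disk. The function names and the 4 x 3 file shape are illustrative and do not come from the paper; the brute-force average stands in for the paper's performance formula, which the abstract does not give.

```python
from itertools import product

def dm_disk(bucket, m=2):
    """Disk modulo rule: sum of the bucket's coordinates, modulo the number of disks."""
    return sum(bucket) % m

def orq_response_time(ranges, m=2):
    """Largest number of qualifying buckets on any single disk.

    `ranges` holds one inclusive (low, high) pair of bucket coordinates per attribute.
    """
    per_disk = [0] * m
    for bucket in product(*(range(lo, hi + 1) for lo, hi in ranges)):
        per_disk[dm_disk(bucket, m)] += 1
    return max(per_disk)

def all_orqs(shape):
    """Every orthogonal range query on a file with shape[i] subdivisions of attribute i."""
    per_attr = [[(lo, hi) for lo in range(f) for hi in range(lo, f)] for f in shape]
    return product(*per_attr)

def average_response_time(shape, m=2):
    """Brute-force average over all ORQs, each query weighted equally (an assumption)."""
    queries = list(all_orqs(shape))
    return sum(orq_response_time(q, m) for q in queries) / len(queries)

if __name__ == "__main__":
    shape = (4, 3)  # hypothetical MKH file with 4 x 3 = 12 buckets
    print("single disk:", average_response_time(shape, m=1))
    print("two disks (DM):", average_response_time(shape, m=2))
```

Because a query that hits a single bucket still costs one access on two disks, the two-disk average in this sketch cannot fall to exactly half of the single-disk average, which is consistent with the "slightly greater than one half" remark in the abstract.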

2.

Common Properties of Some Multiattribute File Systems

《IEEE transactions on pattern analysis and machine intelligence》1979,(2):160-174

This paper results from an attempt to unify several different file system design theories. We define a term "partial match pattern" and show that in order to produce file systems optimal with respect to partial match patterns, both the multikey hashing (MKH) method [16] and the multidimensional directory (MDD) method [11] must be in such a form that the number of subdivisions is the same for all domains of keys. We show the conditions for the string homomorphism hashing (SHH) method [15], the MKH method, and the MDD method to be equivalent to one another. We define the so-called Cartesian product files and show that if all records are present, the records in a Cartesian product file form a shortest spanning path in which the Hamming distance between every pair of consecutive records is 1. Thus the SHH method, the MKH method, the MDD method, and the multikey sorting (MKS) method [10] are linked together. Finally, we show that for both partial and best match queries, the file systems exhibit a common characteristic: similar records are grouped together.

3.

Optimal information retrieval when queries are not random

C.C. Chang 《Information Sciences》1984,34(3):199-223

We consider the complexity of the general information retrieval system design problem and multiattribute file systems based upon the multiple key hashing (MKH) design problem. We first show that the problem of designing an optimal multiattribute file system is NP-hard. The performance formula for multiattribute file systems based upon the MKH method is derived. We also show that the design problem for a multiattribute file system based upon the MKH method is related to the prime number problem. We show that the problem of designing optimal multiattribute files based upon the MKH method can be reduced to finding minimal N-tuples, which was discussed by Chang, Lee and Du. We further present a very efficient method for designing good multiple key hashing functions in the case where the number of buckets is a power of a prime number. We also propose a heuristic algorithm to design good multiple key hashing functions in general.

4.

Load balanced and optimal disk allocation strategy for partial match queries on multidimensional files

Das S.K. Pinotti C.M. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1211-1219

A multidimensional file is one whose data are characterized by several attributes, each specified in a given domain. A partial match query on a multidimensional file extracts all data whose attributes match the values of one or more attributes specified in the query. The disk allocation problem of a multidimensional file F on a database system with multiple disks accessible in parallel is the problem of distributing F among the disks such that the data qualifying for each partial match query are distributed as evenly as possible among the disks of the system. We propose an optimal solution to this problem for multidimensional files with pairwise prime domains based on a large and flexible class of maximum distance separable codes, namely, the redundant residue codes. We also introduce a new family of residue codes, called the redundant nonpairwise prime residue codes, to deal with files whose attribute domains are nonpairwise prime.

5.

On the File Design Problem for Partial Match Retrieval

《IEEE transactions on pattern analysis and machine intelligence》1985,(2):213-222

6.

Symbolic Gray Code as a Perfect Multiattribute Hashing Scheme for Partial Match Queries

《IEEE transactions on pattern analysis and machine intelligence》1982,(3):235-249

In this paper, we shall show that the symbolic Gray code hashing mechanism is not only good for best matching, but also good for partial match queries. Essentially, we shall propose a new hashing scheme, called bucket-oriented symbolic Gray code, which can be used to produce any arbitrary Cartesian product file, which has been shown to be good for partial match queries. Many interesting properties of this new multiattribute hashing scheme, including the property that it is a perfect hashing scheme, have been discussed and proved.

7.

New order preserving access methods for very large files derived from linear hashing

Hachem N.I. Berra P.B. 《Knowledge and Data Engineering, IEEE Transactions on》1992,4(1):68-82

A class of order-preserving dynamic hashing structures is introduced and analyzed. The access method is referred to as the dynamic random-sequential access method (DRSAM) and is derived from linear hashing. A new logical to physical mapping that is based on sequential bucket allocations in hash order is proposed. With respect to previous methods, this allocation technique has the following characteristics: (1) the structure captures the hashed order in consecutive storage areas so that order preserving (OPH) schemes should result in performance improvements for range queries and sequential processing; and (2) it adapts elastic buckets for the control of file growth. Under specific conditions, this approach outperforms the partial expansion method previously proposed by P.-A. Larson (1982).

8.

Performance Analyses of Cartesian Product Files and Random Files

Chang C. C. Du M. W. Lee R. C. T. 《IEEE transactions on pattern analysis and machine intelligence》1984,(1):88-99

In this paper, we shall derive two formulas for the average number of buckets to be examined over all possible partial match queries for Cartesian product files and random files, respectively. The superiority of the Cartesian product file is established. A new multi-key file, called a partition file, is introduced. It is shown that both Cartesian product files and random files are special cases of partition files.
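The kind of average discussed in this abstract can be illustrated by brute force for a Cartesian product file. This is a sketch under stated assumptions, not the paper's formula: attribute i is split into `shape[i]` subdivisions, a specified attribute value falls into exactly one subdivision, and every subset of specified attributes is treated as equally likely. The function names are hypothetical.

```python
from itertools import combinations
from math import prod

def buckets_examined(shape, specified):
    """Buckets touched when the attributes in `specified` are fixed by the query.

    Each unspecified attribute contributes all of its subdivisions.
    """
    return prod(f for i, f in enumerate(shape) if i not in specified)

def average_buckets(shape):
    """Average over all 2^k partial match patterns, each weighted equally."""
    k = len(shape)
    patterns = [set(c) for r in range(k + 1) for c in combinations(range(k), r)]
    return sum(buckets_examined(shape, s) for s in patterns) / len(patterns)

if __name__ == "__main__":
    # Three hypothetical ways to arrange 12 buckets over two attributes.
    for shape in [(12, 1), (6, 2), (4, 3)]:
        print(shape, average_buckets(shape))
```

Under these assumptions the sum over patterns factors into the product of (1 + F_i), so the average equals that product divided by 2^k; the brute-force loop is kept only to make the counting explicit.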

9.

Evolutionary Optimization of File Assignment for a Large-Scale Video-on-Demand System

Jun Guo Yi Wang Kit-Sang Tang Chan S. Wong E.W.M. Taylor P. Zukerman M. 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(6):836-850

We present a genetic algorithm for tackling a file assignment problem for a large-scale video-on-demand system. The file assignment problem is to find the optimal replication and allocation of movie files to disks so that the request blocking probability is minimized subject to capacity constraints. We adopt a divide-and-conquer strategy, where the entire solution space of file assignments is divided into subspaces. Each subspace is an exclusive set of solutions sharing a common file replication instance. This allows us to utilize a greedy file allocation method for finding a good-quality heuristic solution within each subspace. We further design two performance indices to measure the quality of the heuristic solution on (1) its assignment of multicopy movies and (2) its assignment of single-copy movies. We demonstrate that these techniques, together with ad hoc population handling methods, enable genetic algorithms to operate in a significantly reduced search space and achieve good-quality file assignments in a computationally efficient way.

10.

Trie hashing with controlled load

Litwin W.A. Roussopoulos N. Levy G. Hong W. 《IEEE Transactions on Software Engineering》1991,17(7):678-691

Trie hashing (TH), a primary key access method for storing and accessing records of dynamic files, is discussed. The key address is computed through a trie. A key search usually requires only one disk access when the trie is in core and two disk accesses for very large files when the trie must be on disk. A refinement to trie hashing, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file, allows a high load factor of up to 100% for ordered insertions, and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree.

11.

An efficient file structure for document retrieval in the automated office environment

Du D.H.-C. Ghanta S. Maly K.J. Sharrock S.M. 《Knowledge and Data Engineering, IEEE Transactions on》1989,1(2):258-273

A file system tailored to the general needs of the office environment is proposed. This system supports large numbers of a wide variety of documents and inexact fuzzy queries on the documents. The file system is based on a multilevel file structure that combines and extends multikey extendible hashing and signature files to create a document-retrieval system that is more time efficient than other previously proposed systems and is also space efficient.

12.

Adaptive File Allocation in Star Computer Network

《IEEE transactions on pattern analysis and machine intelligence》1985,(9):959-965

In this paper, we study the allocation of files in a star network. Unlike previous algorithms which assume that files are independently accessed and independently assigned, the interaction of files during the processing of queries is directly incorporated into our cost model. We present an adaptive algorithm, which is much faster than existing algorithms on file allocation, obtains solutions which are on the average only 0.1 percent away from the optimal solutions, and possesses many desirable properties such as the satisfaction of some necessary and sufficient conditions for file allocation.

13.

Multikey, extensible hashing for relational databases

Kelley K.L. Rusinkiewicz M. 《Software, IEEE》1988,5(4):77-85

The design and implementation of a multikey, extensible hashing file addressing scheme and its application as an access method for a relational database are presented. This file organization was developed for Request, a testbed relational database-management system. It offers a viable alternative to indexed sequential files. Access operations, concurrency control, and relational operations are examined. Results of an experimental evaluation are reported.

14.

Content-dependent chunking for differential compression, the local maximum approach

Nikolaj Bjørner Andreas Blass Yuri Gurevich 《Journal of Computer and System Sciences》2010,76(3-4):154-203

When a file is to be transmitted from a sender to a recipient and when the latter already has a file somewhat similar to it, remote differential compression seeks to determine the similarities interactively so as to transmit only the part of the new file not already in the recipient's old file. Content-dependent chunking means that the sender and recipient chop their files into chunks, with the cutpoints determined by some internal features of the files, so that when segments of the two files agree (possibly in different locations within the files) the cutpoints in such segments tend to be in corresponding locations, and so the chunks agree. By exchanging hash values of the chunks, the sender and recipient can determine which chunks of the new file are absent from the old one and thus need to be transmitted. We propose two new algorithms for content-dependent chunking, and we compare their behavior, on random files, with each other and with previously used algorithms. One of our algorithms, the local maximum chunking method, has been implemented and found to work better in practice than previously used algorithms. Theoretical comparisons between the various algorithms can be based on several criteria, most of which seek to formalize the idea that chunks should be neither too small (so that hashing and sending hash values become inefficient) nor too large (so that agreements of entire chunks become unlikely). We propose a new criterion, called the slack of a chunking method, which seeks to measure how much of an interval of agreement between two files is wasted because it lies in chunks that don't agree. Finally, we show how to efficiently find the cutpoints for local maximum chunking.
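The idea of cutpoints determined by local features can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each position gets a hash of the few bytes ending there (here `zlib.crc32` over a window of `window` bytes), and a position is taken as a cutpoint when its hash is strictly greater than every hash within `horizon` positions on either side. The parameter names, the choice of CRC-32, the handling of the file ends, and the naive quadratic scan are all choices made for this example; the abstract's efficient cutpoint search is not reproduced.

```python
import zlib

def position_hashes(data: bytes, window: int = 4):
    """Hash of the (up to) `window` bytes ending at each position."""
    return [zlib.crc32(data[max(0, i - window + 1): i + 1]) for i in range(len(data))]

def local_max_cutpoints(data: bytes, horizon: int = 8, window: int = 4):
    """Positions whose hash strictly exceeds all hashes within `horizon` on both sides.

    Positions without a full horizon on each side are never cutpoints (an assumption).
    """
    hashes = position_hashes(data, window)
    cuts = []
    for i in range(horizon, len(hashes) - horizon):
        neighbours = hashes[i - horizon:i] + hashes[i + 1:i + horizon + 1]
        if all(hashes[i] > v for v in neighbours):
            cuts.append(i)
    return cuts

def chunks(data: bytes, horizon: int = 8, window: int = 4):
    """Split `data` so that each chunk ends just after a cutpoint."""
    cuts = local_max_cutpoints(data, horizon, window)
    bounds = [0] + [c + 1 for c in cuts] + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]

if __name__ == "__main__":
    old = bytes((i * i * 31 + 7 * i) % 251 for i in range(2000))
    new = b"inserted header " + old
    shared = set(chunks(old)) & set(chunks(new))
    print(len(chunks(old)), "chunks in old file,", len(shared), "also present in new file")
```

Because the cutpoint test at a position depends only on bytes within `horizon + window` of it, inserting data at the front of a file shifts the later cutpoints without changing which content they fall on, which is the property the abstract relies on.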

15.

Symbolic gray code as a multikey hashing function

Du HC Lee RC 《IEEE transactions on pattern analysis and machine intelligence》1980,(1):83-90

In this paper, we extend the binary Gray code to symbolic Gray code. We then show that this symbolic Gray code can be used as a multikey hashing function for storing symbolic records. The record stored at location k and the record stored at location k + 1 will be nearest neighbors if this hashing function is used. Thus, this symbolic Gray code hashing function exhibits a clustering property that groups similar records together. This property is desirable for designing nearest neighbor searching (also called best match searching) systems. There are many other interesting properties of this hashing function. For instance, there exists an address-to-key transformation which can be used to determine the record stored at a given location k if this hashing function is used. Moreover, if there are M records in total, exactly M locations need to be reserved; there are no collisions and no wasted storage. At the end of this paper, it is shown that the resulting file exhibits the multiple-attribute tree structure.
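The properties listed in this abstract can be illustrated with a reflected mixed-radix Gray code. This is a sketch that assumes the symbolic Gray code behaves like that standard construction; the paper's own definition may differ in detail. The attribute domains and function names are hypothetical.

```python
def gray_sequence(radices):
    """Reflected mixed-radix Gray code: consecutive tuples differ in exactly one digit."""
    if not radices:
        return [()]
    tail = gray_sequence(radices[1:])
    seq = []
    for d in range(radices[0]):
        # Reverse every other block so the tail is unchanged across block boundaries.
        block = tail if d % 2 == 0 else tail[::-1]
        seq.extend((d,) + t for t in block)
    return seq

def build_tables(domains):
    """Return (key -> address) and (address -> key) for records over symbolic domains."""
    radices = [len(d) for d in domains]
    address_to_key = [
        tuple(dom[i] for dom, i in zip(domains, idx)) for idx in gray_sequence(radices)
    ]
    key_to_address = {key: k for k, key in enumerate(address_to_key)}
    return key_to_address, address_to_key

def hamming(a, b):
    """Number of attributes in which two records differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    domains = [("red", "green", "blue"), ("small", "large")]
    key_to_address, address_to_key = build_tables(domains)
    for k, key in enumerate(address_to_key):
        print(k, key)
```

In this sketch the M possible records occupy addresses 0 to M - 1 with no collisions, the list `address_to_key` plays the role of the address-to-key transformation, and records at locations k and k + 1 differ in a single attribute.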

16.

Secure dynamic fragment and replica allocation in large-scale distributed file systems

Mei A. Mancini L.V. Jajodia S. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(9):885-896

We present a distributed algorithm for file allocation that guarantees high assurance, availability, and scalability in a large distributed file system. The algorithm can use replication and fragmentation schemes to allocate the files over multiple servers. The file confidentiality and integrity are preserved, even in the presence of a successful attack that compromises a subset of the file servers. The algorithm is adaptive in the sense that it changes the file allocation as the read-write patterns and the location of the clients in the network change. We formally prove that, assuming read-write patterns are stable, the algorithm converges toward an optimal file allocation, where optimality is defined as maximizing the file assurance.

17.

Multilevel extendible hashing: a file structure for very large databases

Du D.H.C. Tong S.-R. 《Knowledge and Data Engineering, IEEE Transactions on》1991,3(3):357-370

A dynamic hashing scheme based on extendible hashing is proposed whose directory can grow into a multilevel directory. The scheme is compared to the extendible hashing and the extendible hashing tree schemes. The simulation results reveal that the proposed scheme is superior to the other two with respect to directory space utilization, especially for files with nonuniform record distribution. This scheme can be easily extended to multikey file systems and also has good performance.

18.

Performance Analysis of Disk Modulo Allocation Method for Cartesian Product Files

《IEEE transactions on pattern analysis and machine intelligence》1987,(9):1018-1026

Cartesian product files have been shown to exhibit attractive properties for partial match queries. The Disk Modulo (DM) allocation method is shown to have good performance on the distribution of Cartesian product files into an m-disk system. However, there was no explicit expression made before to represent the DM method's response time to a given partial match query. In this paper, based upon discrete Fourier transform, we derive one formula for such a computation. After obtaining this representation, the performance characteristics of the DM method can now be given an analytic interpretation. Some theoretical results are derived from this formula. We also use our formula to analyze the performance of several popular Disk Modulo algorithms.

19.

Linear spiral hashing for expansible files

Ye-In Chang Chien-I Lee Wann-Bay ChangLiaw 《Knowledge and Data Engineering, IEEE Transactions on》1999,11(6):969-984

The goal of dynamic hashing is to design a function and a file structure that allow the address space allocated to the file to be increased and reduced without reorganizing the whole file. We propose a new scheme for dynamic hashing in which the growth of a file occurs at a rate of (n + k)/n per full expansion, where n is the number of pages of the file and k is a given integer constant which is smaller than n, as compared to a rate of two in linear hashing. Like linear hashing, the proposed scheme (called linear spiral hashing) requires no index; however, the proposed scheme may or may not add one more physical page, instead of always adding one more page in linear hashing, when a split occurs. Therefore, linear spiral hashing can maintain a more stable performance through the file expansions and have much better storage utilization than linear hashing. From our performance analysis, linear spiral hashing can achieve nearly 97 percent storage utilization as compared to 78 percent storage utilization by using linear hashing, which is also verified by a simulation study.
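For context, the baseline that this abstract compares against can be sketched in a few lines. The following is a minimal in-memory illustration of plain linear hashing, not of linear spiral hashing: addresses come from a pair of modulo functions selected by a split pointer, no index is kept, each split adds exactly one page, and the file doubles over a full expansion. The class name, the 0.8 load threshold, and the use of Python lists as unbounded pages are assumptions made for the example.

```python
class LinearHashFile:
    """Toy linear hashing file; pages are Python lists standing in for disk pages."""

    def __init__(self, initial_pages=2, capacity=4):
        self.n0 = initial_pages      # pages at level 0
        self.level = 0               # number of completed full expansions
        self.split = 0               # next page to be split
        self.capacity = capacity     # nominal records per page
        self.pages = [[] for _ in range(initial_pages)]
        self.count = 0

    def address(self, key):
        """Page number for `key`; pages before the split pointer use the finer function."""
        a = hash(key) % (self.n0 * 2 ** self.level)
        if a < self.split:
            a = hash(key) % (self.n0 * 2 ** (self.level + 1))
        return a

    def contains(self, key):
        return key in self.pages[self.address(key)]

    def insert(self, key):
        self.pages[self.address(key)].append(key)
        self.count += 1
        # Assumed policy: split one page whenever the load exceeds 80 percent.
        if self.count > 0.8 * self.capacity * len(self.pages):
            self._split_next()

    def _split_next(self):
        """Split the page at the split pointer; always adds exactly one page."""
        old = self.pages[self.split]
        self.pages[self.split] = []
        self.pages.append([])
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:  # full expansion completed
            self.level += 1
            self.split = 0
        for key in old:
            self.pages[self.address(key)].append(key)

if __name__ == "__main__":
    f = LinearHashFile()
    for k in range(1000):
        f.insert(k)
    print(len(f.pages), "pages at level", f.level)
```

The sketch makes no attempt to reproduce the storage utilization figures quoted in the abstract.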

20.

The file allocation problem under dynamic usage

M. Hatzopoulos J.G. Kollias 《Information Systems》1980,5(3):197-201

The file allocation problem considers a file and a fully connected network having n nodes. The problem assumes that the overall file usage over a unit time period is known and it asks for the optimal set of network sites at which to locate copies of the file. This paper considers the same problem but it assumes that the behavior of the user access patterns changes over v planning periods in a manner known in advance. A model is presented which shows that there are (2^n − 1)^v possible file allocations. To assist the searching of this large solution space four theorems are presented which are subsequently utilized to analyze the problem and to solve an example case.
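The size of the solution space quoted in this abstract can be checked by enumeration for small cases. This sketch assumes that an allocation is a choice, for each of the v planning periods, of a nonempty set of the n sites holding a copy of the file; the function names are hypothetical.

```python
from itertools import combinations, product

def nonempty_site_sets(n):
    """All nonempty subsets of sites 0..n-1; there are 2^n - 1 of them."""
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)]

def allocation_plans(n, v):
    """Every plan: one nonempty set of sites for each of the v periods."""
    return product(nonempty_site_sets(n), repeat=v)

def count_plans(n, v):
    return sum(1 for _ in allocation_plans(n, v))

if __name__ == "__main__":
    for n, v in [(2, 2), (3, 2), (3, 3)]:
        print(n, v, count_plans(n, v), (2 ** n - 1) ** v)
```

Even at n = 10 and v = 5 the count exceeds 10^15, which is why exhaustive search is impractical and pruning results such as the four theorems mentioned above are needed.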