期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Blocking in Parallel Multisearch Problems

W. Dittrich D. A. Hutchinson A. Maheshwari 《Theory of Computing Systems》2001,34(2):145-189

External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in high performance parallel computation systems, for which blockwise communication is a crucial issue. {We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel computers. Our examples originate as algorithms for parallel machines, and we adapt them to the EM situation where the queries and data structure are considered to be much larger than the size of the available internal memory. } This paper presents techniques to achieve blocking for I/ O as well as for communication in multisearch on the BSP and EM-BSP models. We describe improvements to the 1-optimal BSP* multisearch algorithm of [8] which permit larger search trees to be handled. In the area of EM algorithms new algorithms for multisearch in balanced trees are described. For search trees of size O(nlog n) where n is the number of queries, we obtain a work-optimal, parallel, randomized EM multisearch algorithm whose I/ O and communication time are smaller, asymptotically, than the computation time. We obtain a deterministic version via a similar technique. These algorithms are obtained via the simulation techniques of [12], [17], [13], [14], and [24]. For larger trees we describe a parallel, EM algorithm which is 1-optimal considering computation, communication, and I/ O (for suitable parameter constellations) plus I/ O-optimal. We give a lower bound to the number of I/ O operations required for filtering n queries through a binary or multiway search tree of size m when m\geq n ^2+ε , for a constant ε>0 . Online publication February 20, 2001. 相似文献

2.

Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms

Dehne Dittrich Hutchinson 《Algorithmica》2008,36(2):97-122

Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACM Working Group on Storage I/ O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems. 相似文献

3.

Fast Concurrent Access to Parallel Disks

Sanders Egner Korst 《Algorithmica》2003,35(1):21-55

Abstract. High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/ O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of single-disk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks. We show that a shared buffer of O (D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general. If two randomly allocated copies of each block exist, N arbitrary blocks can be read within

I/ O steps with high probability. The redundancy can be further reduced from 2 to 1+1/r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's ``single-disk multi-head' model [1] that allows access to D arbitrary blocks in each I/ O step. This powerful model can be emulated on the physically more realistic independent disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multi-head model first. The emulation result can then be applied directly or further refinements can be added. 相似文献

4.

Parallel Algorithms for Partitioning Sorted Sets and Related Problems

D. Z. Chen W. Chen K. Wada K. Kawaguchi 《Algorithmica》2000,28(2):217-241

We consider the following partition problem: Given a set S of n elements that is organized as k sorted subsets of size n/k each and given a parameter h with 1/k ≤ h ≤ n/k , partition S into g = O(n/(hk)) subsets D ₁ , D ₂ , . . . , D _g of size Θ(hk) each, such that, for any two indices i and j with 1 ≤ i < j ≤ g , no element in D _i is bigger than any element in D _j . Note that with various combinations of the values of parameters h and k , several fundamental problems, such as merging, sorting, and finding an approximate median, can be formulated as or be reduced to this partition problem. The partition problem also finds many applications in solving problems of parallel computing and computational geometry. In this paper we present efficient parallel algorithms for solving the partition problem and a number of its applications. Our parallel partition algorithm runs in O( log n) time using processors in the EREW PRAM model. The complexity bounds of our parallel partition algorithm on the respective special cases match those of the optimal EREW PRAM algorithms for merging, sorting, and finding an approximate median. Using our parallel partition algorithm, we are also able to obtain better complexity bounds (even possibly on a weaker parallel model) than the previously best known parallel algorithms for several important problems, including parallel multiselection, parallel multiranking, and parallel sorting of k sorted subsets. Received May 5, 1996; revised July 30, 1998. 相似文献

5.

Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs 总被引：1，自引：0，他引：1

Chen Zhixiang Fu Ada Wai-Chee Tong Frank Chi-Hung 《World Wide Web》2003,6(3):259-279

Although efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for the success of higher level web log mining, little attention has been paid to algorithmic study of this problem. In this paper we consider two types of user access sessions, interval sessions and gap sessions. We design two efficient algorithms for finding respectively those two types of sessions with the help of some proposed structures. We present theoretical analysis of the algorithms and prove that both algorithms have optimal time complexity and certain error-tolerant properties as well. We conduct empirical performance analysis of the algorithms with web logs ranging from 100 megabytes to 500 megabytes. The empirical analysis shows that the algorithms just take several seconds more than the baseline time, i.e., the time needed for reading the web log once sequentially from disk to RAM, testing whether each user access record is valid or not, and writing each valid user access record back to disk. The empirical analysis also shows that our algorithms are substantially faster than the sorting based session finding algorithms. Finally, optimal algorithms for finding user access sessions from distributed web logs are also presented. 相似文献

6.

Efficient Parallel Algorithms for Planar st -Graphs

Atallah Chen Daescu 《Algorithmica》2008,35(3):194-215

Abstract. Planar st -graphs find applications in a number of areas. In this paper we present efficient parallel algorithms for solving several fundamental problems on planar st -graphs. The problems we consider include all-pairs shortest paths in weighted planar st -graphs, single-source shortest paths in weighted planar layered digraphs (which can be reduced to single-source shortest paths in certain special planar st -graphs), and depth-first search in planar st -graphs. Our parallel shortest path techniques exploit the specific geometric and graphic structures of planar st -graphs, and involve schemes for partitioning planar st -graphs into subgraphs in a way that ensures that the resulting path length matrices have a monotonicity property [1], [2]. The parallel algorithms we obtain are a considerable improvement over the previously best known solutions (when they are applied to these st -graph problems), and are in fact relatively simple. The parallel computational models we use are the CREW PRAM and EREW PRAM. 相似文献

7.

A Simple and Efficient Parallel Disk Mergesort

R. D. Barve J. S. Vitter 《Theory of Computing Systems》2002,35(2):189-215

External sorting—the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks—is a fundamental subroutine in database systems[G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM ) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K,Section 5.4.9] recently identified SRM (which he calls ``randomized striping') as the method of choice for sorting with parallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment[TPI]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM ), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often significantly faster than with DSM (since SRM requires fewer passes). The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a file into several other files. Since both parallel disk merging and multimedia processing deal with streams that get ``consumed' at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications. Received June 28, 2000, and in revised form June 5, 2001. Online publication April 8, 2002. 相似文献

8.

Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms

Dehne Dittrich Hutchinson 《Algorithmica》2003,36(2):97-122

External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACM Working Group on Storage I/ O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems. 相似文献

9.

PRAM programming: in theory and in practice

D. S. Lecomber C. J. Siniolakis K. R. Sujithan 《Concurrency and Computation》2000,12(4):211-226

That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the forseeable future. The current generation of parallel hardware prominently features distributed memory and high‐performance interconnection networks—very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed‐memory machines—from embarassingly parallel problems to sorting and numerical analysis. In contrast it is known that for other classes of problem PRAM‐style shared‐memory simulation on a distributed‐memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines—theoretical and actual—in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256‐processor Cray T3D, an 8‐node IBM SP/2 and a 4‐node shared‐memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct‐mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd. 相似文献

10.

Optimal Parallel Randomized Algorithms for the Voronoi Diagram of Line Segments in the Plane

Rajasekaran Ramaswami 《Algorithmica》2002,33(4):436-460

Abstract. We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done since the sequential time bound for this problem is Ω(n log n) . Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, ó'Dúnlaing, and Yap, which runs in O( log ² n) time using O(n) processors. We obtain this result by using a new ``two-stage' random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size ``blow-up' that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection). 相似文献

11.

Escaping a Grid by Edge-Disjoint Paths

Wun-Tat Chan Francis Y. L. Chin Hing-Fung Ting 《Algorithmica》2008,36(4):343-359

Abstract. We present a technique for designing external memory data structures that support batched operations I/ O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/ O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures. 相似文献

12.

New Algorithms for Disk Scheduling

Andrews Bender Zhang 《Algorithmica》2008,32(2):277-301

Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/ O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental and little attempt has been made to develop algorithms with provable performance guarantees. We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2 -approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution. The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ ) in which all distances are either 0 or some constant α . We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally we consider the on-line version of the problem in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ . 相似文献

13.

Overlapping Computations, Communications and I/O in Parallel Sorting

Clement M. J. Quinn M. J. 《Journal of Parallel and Distributed Computing》1995,28(2)

In this paper, we present a new parallel sorting algorithm which maximizes the overlap between the disk, network, and CPU subsystems of a processing node. This algorithm is shown to be of similar complexity to known efficient sorting algorithms. The pipelining effect exploited by our algorithm should lead to higher levels of performance on distributed memory parallel processors. In order to achieve the best results using this strategy, the CPU, network, and disk operations must take comparable time. We suggest acceptable levels of system balance for sorting machines and analyze the performance of the sorting algorithm as system parameters vary. 相似文献

14.

A Theoretical and Experimental Study on the Construction of Suffix Arrays in External Memory

Crauser Ferragina 《Algorithmica》2008,32(1):1-35

Abstract. The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [32] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency and powerful/ fast search operations supported. In this paper we analyze, both theoretically and experimentally, the I/ O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are state-of-the-art, the other three algorithms are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pose particular attention to differentiate between ``random' and ``contiguous' disk accesses, in order to explain reasonably some practical I/ O phenomena which are related to the experimental behavior of these algorithms and that would otherwise be meaningless in the light of other simpler external-memory models. We also address two other issues. The former is concerned with the problem of building word indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming to achieve a uniform, simple and efficient approach to both the two indexing models. The latter issue is related to the intriguing and apparently counterintuitive ``contradiction' between the effective practical performance of the well-known Baeza-Yates—Gonnet—Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, thus resulting in a novel approach which combines good worst-case bounds with efficient practical performance. 相似文献

15.

A Framework for Simple Sorting Algorithms on Parallel Disk Systems

S. Rajasekaran 《Theory of Computing Systems》2001,34(2):101-114

In this paper we present a simple parallel sorting algorithm and illustrate its application in general sorting, disk sorting, and hypercube sorting. The algorithm (called the (l,m) -mergesort (LMM)) is an extension of the bitonic and odd—even mergesorts. Literature on parallel sorting is abundant. Many of the algorithms proposed, though being theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. The algorithm presented in this paper has the potential of being practical. We present an application to the parallel disk sorting problem. The algorithm is asymptotically optimal (assuming that N is a polynomial in M , where N is the number of records to be sorted and M is the internal memory size). The underlying constant is very small. This algorithm performs better than the disk-striped mergesort (DSM) algorithm when the number of disks is large. Our implementation is as simple as that of DSM (requiring no fancy data structures or prefetch techniques.) As a second application, we prove that we can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni [16]. We also show that Leighton's columnsort algorithm is a special case of LMM. Online publication December 26, 2000. 相似文献

16.

Tree-Based Parallel Algorithm Design

G. L. Miller S. -H. Teng 《Algorithmica》1997,19(4):369-389

In this paper a systematic method for the design of efficient parallel algorithms for the dynamic evaluation of computation trees and/or expressions is presented. This method involves the use of uniform closure properties of certain classes of unary functions. Using this method, optimal parallel algorithms are given for many computation tree problems which are important in parallel algebraic and numerical computation, and parallel code generation on exclusive read and exclusive write parallel random access machines. Our algorithmic result is complemented by a P-complete tree problem. Received February 13, 1995; revised March 25, 1996. 相似文献

17.

BSPlib: The BSP programming library 总被引：1，自引：0，他引：1

Jonathan M.D. Hill Bill McColl Dan C. Stefanescu Mark W. Goudreau Kevin Lang Satish B. Rao Torsten Suel Thanasis Tsantilas Rob H. Bisseling 《Parallel Computing》1998,24(14):1947-1980

BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics. 相似文献

18.

On Certificates and Lookahead in Dynamic Graph Problems

S. Khanna R. Motwani R. H. Wilson 《Algorithmica》1998,21(4):377-394

Recent work in dynamic graph algorithms has led to efficient algorithms for dynamic undirected graph problems such as connectivity. However, no efficient deterministic algorithms are known for the dynamic versions of fundamental directed graph problems like strong connectivity and transitive closure, as well as some undirected graph problems such as maximum matchings and cuts. We provide some explanation for this lack of success by presenting quadratic lower bounds on the certificate complexity of the seemingly difficult problems, in contrast to the known linear certificate complexity for the problems which have efficient dynamic algorithms. In many applications of dynamic (di)graph problems, a certain form of lookahead is available. Specifically, we consider the problems of assembly planning in robotics and the maintenance of relations in databases. These give rise to dynamic strong connectivity and dynamic transitive closure problems, respectively. We explain why it is reasonable, and indeed natural and desirable, to assume that lookahead is available in these two applications. Exploiting lookahead to circumvent their inherent complexity, we obtain efficient dynamic algorithms for strong connectivity and transitive closure. Received August 11, 1995; revised August 23, 1996. 相似文献

19.

Binary Searching with Nonuniform Costs and Its Application to Text Retrieval

G. Navarro R. Baeza-Yates E. F. Barbosa N. Ziviani W. Cunto 《Algorithmica》2000,27(2):145-169

We study the problem of minimizing the expected cost of binary searching for data where the access cost is not fixed and depends on the last accessed element, such as data stored in magnetic or optical disk. We present an optimal algorithm for this problem that finds the optimal search strategy in O(n ³ ) time, which is the same time complexity of the simpler classical problem of fixed costs. Next, we present two practical linear expected time algorithms, under the assumption that the access cost of an element is independent of its physical position. Both practical algorithms are online, that is, they find the next element to access as the search proceeds. The first one is an approximate algorithm which minimizes the access cost disregarding the goodness of the problem partitioning. The second one is a heuristic algorithm, whose quality depends on its ability to estimate the final search cost, and therefore it can be tuned by recording statistics of previous runs. We present an application for our algorithms related to text retrieval. When a text collection is large it demands specialized indexing techniques for efficient access. One important type of index is the suffix array, where data access is provided through an indirect binary search on the text stored in magnetic disk or optical disk. Under this cost model we prove that the optimal algorithm cannot perform better than Ω(1/ log n) times the standard binary search. We also prove that the approximate strategy cannot, on average, perform worse than 39% over the optimal one. We confirm the analytical results with simulations, showing improvements between 34% (optimal) and 60% (online) over standard binary search for both magnetic and optical disks. Received February 13, 1997; revised May 27, 1998. 相似文献

20.

Fast Four‐Way Parallel Radix Sorting on GPUs

Linh Ha Jens Krüger Cláudio T. Silva 《Computer Graphics Forum》2009,28(8):2368-2378

Efficient sorting is a key requirement for many computer science algorithms. Acceleration of existing techniques as well as developing new sorting approaches is crucial for many real‐time graphics scenarios, database systems, and numerical simulations to name just a few. It is one of the most fundamental operations to organize and filter the ever growing massive amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware‐optimized parallel implementation of the radix sort algorithm that results in a significant speed up over existing sorting implementations. We outperform all known General Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution. 相似文献