首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Randomized algorithms are algorithms that employ randomness in their solution method. We show that the performance of randomized algorithms is less affected by factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds. A major reason is that the mapping of randomized algorithms onto multiprocessors involves very little scheduling or communication overhead. Furthermore, reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm. We present results of an extensive simulation done on a multiprocessor simulator, running a randomized branch-and-bound algorithm. The particular case we consider is the knapsack problem, due to its ease of formulation. We observe the largest speedups in precisely those problems that take large amounts of time to solve. This work has been supported by the U.S. Army Research Office under Contract No. DAAG 29-85-K-0236.  相似文献   

2.
We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters; latency and bandwidth.

We design YPRAM algorithms for solving several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts, reasonably accurately, the actual known performances of several basic parallel models — PRAM, hypercube, mesh and tree — when solving these problems.  相似文献   


3.
The backtrack search problem involves visiting all the nodes of an arbitrary binary tree given a pointer to its root subject to the constraint that the children of a node are revealed only after their parent is visited. We present a fast, deterministic backtrack search algorithm for a p-processor COMMON CRCW-PRAM, which visits any n-node tree of height h in time O((n/p+h)(logloglogp)2). This upper bound compares favourably with a natural Ω(n/p+h) lower bound for this problem. Our approach embodies novel, efficient techniques for dynamically assigning tree-nodes to processors to ensure that the work is shared equitably among them.  相似文献   

4.
A. Chin 《Algorithmica》1994,12(2-3):170-181
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm.We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust.  相似文献   

5.
In this paper we give a parallel algorithm for line-segment intersection reporting in the plane. It runs in timeO(((n +k) logn log logn)/p) usingp processors on a concurrent-read-exclusive-write (CREW)-PRAM, wheren is the number of line segments,k is the number of intersections, andp n +k.This work was supported by the DFG, SFB 124, TP B2, VLSI Entwurfsmethoden und Parallelität.  相似文献   

6.
    
Interval orders are partial orders defined by having interval representations. It is well known that a transitively oriented digraph G is an interval order iff its (undirected) complement G¯ is chordal. We investigate parallel algorithms Tot the following scheduling problem: Given a system consisting of a set T of n tasks (each requiring unit execution time) and an interval order ≺ over T, and given m identical parallel processors, construct an optimal (i.e., minimal length) schedule for (T, ≺).

Our algorithm is based on a subroutine for computing so-called scheduling distances, i.e., the minimal number of time steps needed to schedule all those tasks succeeding some given task t and preceding some other task t 1. For a given interval order with n tasks, these scheduling distances can be computed using n 3 processors and O(log2n) time on a CREW-PRAM. We then give an incremental version of the scheduling distance algorithm, which can be used to compute the empty slots in an optimal schedule. From these, we derive the optimal schedule, using no more resources than for the initial scheduling distance computation and considerably improving on previous work by Sunder and He.

The algorithm can also be extended to handle task systems which, in addition to interval order precedence constraints, have individual deadlines and/or release times for the tasks. Our algorithm is the first NC-algorithm for this problem. As another application, it also provides NC-algorithms for some graph problems on interval graphs (which are NP-complete in general).  相似文献   

7.
    
We present a cost-optimal parallel algorithm for the maximum matching problem on bipartite permutation graphs on an EREW PRAM. Previously, Chen and Yesha have dealt with this problem. Their solution relies on Dekel and Sahni's matching algorithm for convex bipartite graphs, which runs in O(log2 n) time using O(n) processors. Given a permutation diagram, our algorithm runs in O(log n) time by using O(n/log n) processors. Our method starts with an easily understood greedy algorithm. We define a nontrivial binary operation which is associative and equivalent to the greedy algorithm. Thus parallel prefix can be applied to the problem.  相似文献   

8.
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.This work was supported by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

9.
We consider the problem of sorting n integers when the elements are drawn from the restricted domain [1...n]. A new deterministic parallel algorithm for sorting n integers is obtained. Its running time is O(lognlog(n/logn)) using n/logn processors on EREW (exclusive read exclusive write) PRAM (parallel random access machine). Also, our algorithm was modified to become optimal when we use processors. This algorithm belongs to class EP (Efficient, Polynomial fast).  相似文献   

10.
We present a randomized EREW PRAM algorithm to find a minimum spanning forest in a weighted undirected graph. On an n -vertex graph the algorithm runs in o(( log n) 1+ ɛ ) expected time for any ɛ >0 and performs linear expected work. This is the first linear-work, polylog-time algorithm on the EREW PRAM for this problem. This also gives parallel algorithms that perform expected linear work on two general-purpose models of parallel computation—the QSM and the BSP.  相似文献   

11.
12.
Fork95 is an imperative parallel programming language intended to express algorithms for synchronous shared memory machines (PRAMs). It is based on ANSI C and offers additional constructs to hierarchically divide processor groups into subgroups and manage shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it supports locally asynchronous computation where desired by the programmer. We present a one pass compiler, fcc, which compiles Fork95 and C programs to the SB-PRAM machine. The SB-PRAM is a lock-step synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with a physically shared memory and uniform memory access time. We examine three important types of parallel computation frequently used for the parallel solution of real-world problems. While farming and parallel divide-and-conquer are directly supported by Fork95 language constructs, pipelining can be easily expressed using existing language features; an additional language construct for pipelining is not required.  相似文献   

13.
Parallel updates of minimum spanning trees (MSTs) have been studied in the past. These updates allowed a single change in the underlying graph, such as a change in the cost of an edge or an insertion of a new vertex. Multiple update problems for MSTs are concerned with handling more than one such change. In the sequential case multiple update problems may be solved using repeated applications of an efficient algorithm for a single update. However, for efficiency reasons, parallel algorithms for multiple update problems must consider all changes to the underlying graph simultaneously. In this paper we describe parallel algorithms for updating an MST whenk new vertices are inserted or deleted in the underlying graph, when the costs ofk edges are changed, or whenk edge insertions and deletions are performed. For multiple vertex insertion update, our algorithm achieves time and processor bounds ofO(log n·logk) and nk/(logn·logk), respectively, on a CREW parallel random access machine. These bounds are optimal for dense graphs. A novel feature of this algorithm is a transformation of the previous MST andk new vertices to a bipartite graph which enables us to obtain the above-mentioned bounds.  相似文献   

14.
In this paper we discuss methods for predicting the performance of any formulation of randomized parallel search, and propose a new performance prediction method that is based on obtaining an accurate estimate of the k-processor run-time distribution. We show that the k-processor prediction method delivers accurate performance predictions and demonstrate the validity of our analysis on several robot motion planning problems.  相似文献   

15.
The vertex updating problem for a minimum spanning tree (MST) is defined as follows: Given a graphG=(V, E G) and an MSTT forG, find a new MST forG to which a new vertexz has been added along with weighted edges that connectz with the vertices ofG. We present a set of rules that produce simple optimal parallel algorithms that run inO(lgn) time usingn/lgn EREW PRAM processors, wherenV¦. These algorithms employ any valid tree-contraction schedule that can be produced within the stated resource bounds. These rules can also be used to derive simple linear-time sequential algorithms for the same problem. The previously best-known parallel result was a rather complicated algorithm that usedn processors in the more powerful CREW PRAM model. Furthermore, we show how our solution can be used to solve the multiple vertex updating problem: Update a given MST whenk new vertices are introduced simultaneously. This problem is solved inO(lgk·lgn) parallel time using (k·n)/(lgk·lgn) EREW PRAM processors. This is optimal for graphs having (kn) edges.Part of this work was done while P. Metaxas was with the Department of Mathematics and Computer Science, Dartmouth College.  相似文献   

16.
Spatial regularity amidst a seemingly chaotic image is often meaningful. Many papers in computational geometry are concerned with detecting some type of regularity via exact solutions to problems in geometric pattern recognition. However, real-world applications often have data that is approximate, and may rely on calculations that are approximate. Thus, it is useful to develop solutions that have an error tolerance.

A solution has recently been presented by Robins et al. [Inform. Process. Lett. 69 (1999) 189–195] to the problem of finding all maximal subsets of an input set in the Euclidean plane that are approximately equally-spaced and approximately collinear. This is a problem that arises in computer vision, military applications, and other areas. The algorithm of Robins et al. is different in several important respects from the optimal algorithm given by Kahng and Robins [Patter Recognition Lett. 12 (1991) 757–764] for the exact version of the problem. The algorithm of Robins et al. seems inherently sequential and runs in O(n5/2) time, where n is the size of the input set. In this paper, we give parallel solutions to this problem.  相似文献   


17.
S. Sunder  Xin He 《Algorithmica》1996,16(3):243-262
We present a parallel algorithm for solving the minimum weighted completion time scheduling problem for transitive series parallel graphs. The algorithm takesO(log2 n) time withO(n 3) processors on a CREW PRAM, wheren is the number of vertices of the input graph. This is the first NC algorithm for solving the problem.Research supported in part by NSF Grants CCR-9011214 and CCR-9205982.  相似文献   

18.
Randomized incremental construction of Delaunay and Voronoi diagrams   总被引:8,自引:0,他引:8  
In this paper we give a new randomized incremental algorithm for the construction of planar Voronoi diagrams and Delaunay triangulations. The new algorithm is more on-line than earlier similar methods, takes expected timeO(ngn) and spaceO(n), and is eminently practical to implement. The analysis of the algorithm is also interesting in its own right and can serve as a model for many similar questions in both two and three dimensions. Finally we demonstrate how this approach for constructing Voronoi diagrams obviates the need for building a separate point-location structure for nearest-neighbor queries.Leonidas Guibas and Micha Sharir wish to acknowledge the generous support of the DEC Systems Research Center in Palo Alto, California, where some of this work was carried out. Donald Knuth has been supported by NSF Grant CCR-86-10181. Micha Sharir has been supported by NSF Grant CCR-89-01484, ONR Grant N00014-K-87-0129, the U.S.-Israeli Binational Science Foundation, and the Fund for Basic Research administered by the Israeli Academy of Sciences.  相似文献   

19.
The Dandelion-like codes are eight bijections between labeled trees and strings of node labels. The literature contains optimal sequential algorithms for these bijections, but no parallel algorithms have been reported. In this paper the first parallel encoding and decoding algorithms for Dandelion-like codes are presented. Namely, a unique encoding algorithm and a unique decoding algorithm, which when properly parameterized, can be used for all Dandelion-like codes, are designed. These algorithms are optimal in the sequential setting. The encoding algorithm implementation on an EREW PRAM is optimal, while the efficient implementation of the decoding algorithm requires concurrent reading.  相似文献   

20.
A new general parallel algorithmic technique for computations on trees is presented. In particular, it provides the firstn/logn processor,O(logn)-time deterministic EREW PRAM algorithm for expression tree evaluation. The technique solves many other tree problems within the same complexity bounds.Richard Cole was supported in part by NSF Grants DCR-84-01633 and CCR-8702271, ONR Grant N00014-85-K-0046 and by an IBM faculty development award. Uzi Vishkin was supported in part by NSF Grants NSF-CCR-8615337 and NSF-DCR-8413359, ONR Grant N00014-85-K-0046, by the Applied Mathematical Science subprogram of the office of Energy Research, U.S. Department of Energy under Contract DE-AC02-76ER03077 and the Foundation for Research in Electronics, Computers and Communication, administered by the Israeli Academy of Sciences and Humanities.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号