首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Randomized algorithms are algorithms that employ randomness in their solution method. We show that the performance of randomized algorithms is less affected by factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds. A major reason is that the mapping of randomized algorithms onto multiprocessors involves very little scheduling or communication overhead. Furthermore, reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm. We present results of an extensive simulation done on a multiprocessor simulator, running a randomized branch-and-bound algorithm. The particular case we consider is the knapsack problem, due to its ease of formulation. We observe the largest speedups in precisely those problems that take large amounts of time to solve. This work has been supported by the U.S. Army Research Office under Contract No. DAAG 29-85-K-0236.  相似文献   

2.
We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters; latency and bandwidth.

We design YPRAM algorithms for solving several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts, reasonably accurately, the actual known performances of several basic parallel models — PRAM, hypercube, mesh and tree — when solving these problems.  相似文献   


3.
The backtrack search problem involves visiting all the nodes of an arbitrary binary tree given a pointer to its root subject to the constraint that the children of a node are revealed only after their parent is visited. We present a fast, deterministic backtrack search algorithm for a p-processor COMMON CRCW-PRAM, which visits any n-node tree of height h in time O((n/p+h)(logloglogp)2). This upper bound compares favourably with a natural Ω(n/p+h) lower bound for this problem. Our approach embodies novel, efficient techniques for dynamically assigning tree-nodes to processors to ensure that the work is shared equitably among them.  相似文献   

4.
Binary images can be compressed efficiently using context‐based statistical modeling and arithmetic coding. However, this approach is fully sequential and therefore additional computing power from parallel computers cannot be utilized. We attack this problem and show how to implement the context‐based compression in parallel. Our approach is to segment the image into non‐overlapping blocks, which are compressed independently by the processors. We give two alternative solutions about how to construct, distribute and utilize the model in parallel, and study the effect on the compression performance and execution time. We show by experiments that the proposed approach achieves speedup that is proportional to the number of processors. The work efficiency exceeds 50% with any reasonable number of processors. Copyright © 2002 John Wiley & Sons, Ltd.  相似文献   

5.
A. Chin 《Algorithmica》1994,12(2-3):170-181
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm.We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust.  相似文献   

6.
In this paper we give a parallel algorithm for line-segment intersection reporting in the plane. It runs in timeO(((n +k) logn log logn)/p) usingp processors on a concurrent-read-exclusive-write (CREW)-PRAM, wheren is the number of line segments,k is the number of intersections, andp n +k.This work was supported by the DFG, SFB 124, TP B2, VLSI Entwurfsmethoden und Parallelität.  相似文献   

7.
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.This work was supported by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

8.
9.
We consider the problem of sorting n integers when the elements are drawn from the restricted domain [1...n]. A new deterministic parallel algorithm for sorting n integers is obtained. Its running time is O(lognlog(n/logn)) using n/logn processors on EREW (exclusive read exclusive write) PRAM (parallel random access machine). Also, our algorithm was modified to become optimal when we use processors. This algorithm belongs to class EP (Efficient, Polynomial fast).  相似文献   

10.
We present a randomized EREW PRAM algorithm to find a minimum spanning forest in a weighted undirected graph. On an n -vertex graph the algorithm runs in o(( log n) 1+ ɛ ) expected time for any ɛ >0 and performs linear expected work. This is the first linear-work, polylog-time algorithm on the EREW PRAM for this problem. This also gives parallel algorithms that perform expected linear work on two general-purpose models of parallel computation—the QSM and the BSP.  相似文献   

11.
基于种群迭代搜索的智能优化算法在农业、交通、工业等很多领域都取得了广泛的应用.但是该类算法迭代寻优的特点使其求解效率通常较低,很难应用到大规模、高维或实时性要求较高的复杂优化问题中.随并行分布式技术的发展,国内外很多学者开始着手研究智能优化算法的并行化.本文首要介绍了并行智能优化算法的基本概念;其次从协同机制、并行模型以及硬件结构3个维度综述了几类常见的并行智能优化算法,详细分析阐述了它们优点及不足;最后对并行智能优化算法的未来研究进行了展望.  相似文献   

12.
Fork95 is an imperative parallel programming language intended to express algorithms for synchronous shared memory machines (PRAMs). It is based on ANSI C and offers additional constructs to hierarchically divide processor groups into subgroups and manage shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it supports locally asynchronous computation where desired by the programmer. We present a one pass compiler, fcc, which compiles Fork95 and C programs to the SB-PRAM machine. The SB-PRAM is a lock-step synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with a physically shared memory and uniform memory access time. We examine three important types of parallel computation frequently used for the parallel solution of real-world problems. While farming and parallel divide-and-conquer are directly supported by Fork95 language constructs, pipelining can be easily expressed using existing language features; an additional language construct for pipelining is not required.  相似文献   

13.
Parallel updates of minimum spanning trees (MSTs) have been studied in the past. These updates allowed a single change in the underlying graph, such as a change in the cost of an edge or an insertion of a new vertex. Multiple update problems for MSTs are concerned with handling more than one such change. In the sequential case multiple update problems may be solved using repeated applications of an efficient algorithm for a single update. However, for efficiency reasons, parallel algorithms for multiple update problems must consider all changes to the underlying graph simultaneously. In this paper we describe parallel algorithms for updating an MST whenk new vertices are inserted or deleted in the underlying graph, when the costs ofk edges are changed, or whenk edge insertions and deletions are performed. For multiple vertex insertion update, our algorithm achieves time and processor bounds ofO(log n·logk) and nk/(logn·logk), respectively, on a CREW parallel random access machine. These bounds are optimal for dense graphs. A novel feature of this algorithm is a transformation of the previous MST andk new vertices to a bipartite graph which enables us to obtain the above-mentioned bounds.  相似文献   

14.
The vertex updating problem for a minimum spanning tree (MST) is defined as follows: Given a graphG=(V, E G) and an MSTT forG, find a new MST forG to which a new vertexz has been added along with weighted edges that connectz with the vertices ofG. We present a set of rules that produce simple optimal parallel algorithms that run inO(lgn) time usingn/lgn EREW PRAM processors, wherenV¦. These algorithms employ any valid tree-contraction schedule that can be produced within the stated resource bounds. These rules can also be used to derive simple linear-time sequential algorithms for the same problem. The previously best-known parallel result was a rather complicated algorithm that usedn processors in the more powerful CREW PRAM model. Furthermore, we show how our solution can be used to solve the multiple vertex updating problem: Update a given MST whenk new vertices are introduced simultaneously. This problem is solved inO(lgk·lgn) parallel time using (k·n)/(lgk·lgn) EREW PRAM processors. This is optimal for graphs having (kn) edges.Part of this work was done while P. Metaxas was with the Department of Mathematics and Computer Science, Dartmouth College.  相似文献   

15.
In this paper we discuss methods for predicting the performance of any formulation of randomized parallel search, and propose a new performance prediction method that is based on obtaining an accurate estimate of the k-processor run-time distribution. We show that the k-processor prediction method delivers accurate performance predictions and demonstrate the validity of our analysis on several robot motion planning problems.  相似文献   

16.
S. Sunder  Xin He 《Algorithmica》1996,16(3):243-262
We present a parallel algorithm for solving the minimum weighted completion time scheduling problem for transitive series parallel graphs. The algorithm takesO(log2 n) time withO(n 3) processors on a CREW PRAM, wheren is the number of vertices of the input graph. This is the first NC algorithm for solving the problem.Research supported in part by NSF Grants CCR-9011214 and CCR-9205982.  相似文献   

17.
That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the forseeable future. The current generation of parallel hardware prominently features distributed memory and high‐performance interconnection networks—very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed‐memory machines—from embarassingly parallel problems to sorting and numerical analysis. In contrast it is known that for other classes of problem PRAM‐style shared‐memory simulation on a distributed‐memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines—theoretical and actual—in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256‐processor Cray T3D, an 8‐node IBM SP/2 and a 4‐node shared‐memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct‐mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd.  相似文献   

18.
Spatial regularity amidst a seemingly chaotic image is often meaningful. Many papers in computational geometry are concerned with detecting some type of regularity via exact solutions to problems in geometric pattern recognition. However, real-world applications often have data that is approximate, and may rely on calculations that are approximate. Thus, it is useful to develop solutions that have an error tolerance.

A solution has recently been presented by Robins et al. [Inform. Process. Lett. 69 (1999) 189–195] to the problem of finding all maximal subsets of an input set in the Euclidean plane that are approximately equally-spaced and approximately collinear. This is a problem that arises in computer vision, military applications, and other areas. The algorithm of Robins et al. is different in several important respects from the optimal algorithm given by Kahng and Robins [Patter Recognition Lett. 12 (1991) 757–764] for the exact version of the problem. The algorithm of Robins et al. seems inherently sequential and runs in O(n5/2) time, where n is the size of the input set. In this paper, we give parallel solutions to this problem.  相似文献   


19.
Randomized incremental construction of Delaunay and Voronoi diagrams   总被引:8,自引:0,他引:8  
In this paper we give a new randomized incremental algorithm for the construction of planar Voronoi diagrams and Delaunay triangulations. The new algorithm is more on-line than earlier similar methods, takes expected timeO(ngn) and spaceO(n), and is eminently practical to implement. The analysis of the algorithm is also interesting in its own right and can serve as a model for many similar questions in both two and three dimensions. Finally we demonstrate how this approach for constructing Voronoi diagrams obviates the need for building a separate point-location structure for nearest-neighbor queries.Leonidas Guibas and Micha Sharir wish to acknowledge the generous support of the DEC Systems Research Center in Palo Alto, California, where some of this work was carried out. Donald Knuth has been supported by NSF Grant CCR-86-10181. Micha Sharir has been supported by NSF Grant CCR-89-01484, ONR Grant N00014-K-87-0129, the U.S.-Israeli Binational Science Foundation, and the Fund for Basic Research administered by the Israeli Academy of Sciences.  相似文献   

20.
The Dandelion-like codes are eight bijections between labeled trees and strings of node labels. The literature contains optimal sequential algorithms for these bijections, but no parallel algorithms have been reported. In this paper the first parallel encoding and decoding algorithms for Dandelion-like codes are presented. Namely, a unique encoding algorithm and a unique decoding algorithm, which when properly parameterized, can be used for all Dandelion-like codes, are designed. These algorithms are optimal in the sequential setting. The encoding algorithm implementation on an EREW PRAM is optimal, while the efficient implementation of the decoding algorithm requires concurrent reading.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号