20 similar articles found; search took 15 ms.
In this paper we present deterministic parallel algorithms for the coarse-grained multicomputer (CGM) and bulk synchronous parallel (BSP) models for solving the following well-known graph problems: (1) list ranking, (2) Euler tour construction in a tree, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, and (7) 2-edge connectivity and biconnectivity (testing and component computation). The algorithms require O(log p) communication rounds with linear sequential work per round (p = number of processors, N = total input size). Each processor creates, during the entire algorithm, messages of total size O(log(p) (N/p)). The algorithms assume that the local memory per processor (i.e., N/p) is larger than p^ε, for some fixed ε > 0. Our results imply BSP algorithms with O(log p) supersteps, O(g log(p) (N/p)) communication time, and O(log(p) (N/p)) local computation time. It is important to observe that the number of communication rounds/supersteps obtained in this paper is independent of the problem size, and grows only logarithmically with respect to p. With growing problem size, only the sizes of the messages grow but the total number of messages remains unchanged. Due to the considerable protocol overhead associated with each message transmission, this is an important property. The result for Problem (1) is a considerable improvement over those previously reported. The algorithms for Problems (2)-(7) are the first practically relevant parallel algorithms for these standard graph problems. Received July 5, 2000; revised April 16, 2001.

Sibeyn 《Algorithmica》2008,32(3):345-363
Abstract. It is hard to achieve good speed-ups for parallel list ranking on distributed-memory machines because the problem requires a substantial number of communication rounds, each incurring some start-up delay. For input sizes N that are very large in comparison with the number of processors P these start-up costs can be amortized to a certain extent. For modest N/P values, so far the best approach was the basic pointer-jumping approach. In this paper a novel algorithm, one-by-one cleaning, is presented. It has the unique property that the routing consists of O(P) rounds in which each processing unit (PU) sends a packet to only one other PU. Pointer jumping requires a logarithmic number of rounds in which each PU sends a packet to all other PUs. Because the constants are small, and the internal work performed is less than that of pointer jumping, one-by-one cleaning is about twice as fast, which is demonstrated by comparing the performance of implementations of both algorithms on the Intel Paragon.
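For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of basic pointer jumping, simulated sequentially. It is not the paper's one-by-one cleaning algorithm, and it does not model the distribution of nodes over PUs; the function name and the convention that the tail points to itself are assumptions made for the example.

```python
def list_rank_pointer_jumping(succ):
    """Rank each node of a linked list by its distance to the tail.

    succ[i] is the successor of node i; the tail points to itself
    (an assumed convention). Each pass of the while loop models one
    synchronous parallel round of pointer jumping.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail has rank 0; every other node is one hop from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every pointer has reached the tail (a fixed point).
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Read only from the old arrays, as a synchronous round would.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


if __name__ == "__main__":
    # List 0 -> 1 -> 2 -> 3 -> 4, with node 4 as the tail.
    print(list_rank_pointer_jumping([1, 2, 3, 4, 4]))
```

Each round doubles the distance a pointer spans, which is why the number of rounds is logarithmic in the list length.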

The range tree is a fundamental data structure for multidimensional point sets, and, as such, is central in a wide range of geometric and database applications. In this paper we describe the first nontrivial adaptation of range trees to the parallel distributed memory setting (BSP-like models). Given a set of n points in d-dimensional Cartesian space, we show how to construct on a coarse-grained multicomputer a distributed range tree T in time O(s/p + T_c(s,p)), where s = n log^{d-1} n is the size of the sequential data structure and T_c(s,p) is the time to perform an h-relation with h = Θ(s/p). We then show how T can be used to answer a given set Q of m = O(n) range queries in time O((s log m)/p + T_c(s,p)) and O((s log m)/p + T_c(s,p) + k/p), where k is the number of results to be reported. These parallel construction and search algorithms are both highly efficient, in that their running times are the sequential time divided by the number of processors, plus a constant number of parallel communication rounds. Received June 1, 1997; revised March 10, 1998.

Abstract. We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, Ó Dúnlaing, and Yap, which runs in O(log^2 n) time using O(n) processors. We obtain this result by using a new "two-stage" random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size "blow-up" that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection).

Given two strings A and B of lengths n_a and n_b, respectively, the All-substrings Longest Common Subsequence (ALCS) problem obtains, for any substring B' of B, the length of the longest string that is a subsequence of both A and B'. The sequential algorithm for this problem takes O(n_a n_b) time and O(n_b) space. We present a parallel algorithm for the ALCS problem on the Coarse-Grained Multicomputer (BSP/CGM) model with p < √n_a processors, that takes O(n_a n_b/p) time, O(log p) communication rounds and O(n_b √n_a) space per processor. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) problem that finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm in the literature for the LCS and ALCS problems.
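To make the problem statement concrete, here is a minimal sketch of the textbook dynamic program for the basic LCS length, using one row of the table (quadratic time, space linear in the length of B), followed by a brute-force ALCS that simply applies it to every substring of B. Neither function is the paper's algorithm: the brute force is far slower than the sequential ALCS bound quoted above, and both function names are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Runs in O(len(a) * len(b)) time and O(len(b)) space by keeping
    only the previous row of the dynamic-programming table.
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the best subsequence ending before both characters.
                cur.append(prev[j - 1] + 1)
            else:
                # Drop one character from either string.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(b)]


def alcs_brute_force(a, b):
    """ALCS straight from the definition: one LCS per substring b[i:j]."""
    return {(i, j): lcs_length(a, b[i:j])
            for i in range(len(b))
            for j in range(i + 1, len(b) + 1)}


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(alcs_brute_force("ab", "ba"))
```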

Das  Loui 《Algorithmica》2008,31(4):530-547
Abstract. Updating a minimum spanning tree (MST) is a basic problem for communication networks. In this paper we consider single node deletions in MSTs. Let G = (V,E) be an undirected graph with n nodes and m edges, and let T be the MST of G. For each node v in V, the node replacement for v is the minimum weight set of edges R(v) that connect the components of T-v. We present a sequential algorithm and a parallel algorithm that find R(v) for all v in V simultaneously. The sequential algorithm takes O(m log n) time, but only O(m α(m,n)) time when the edges of E are presorted by weight. The parallel algorithm takes O(log^2 n) time using m processors on a CREW PRAM.

Consider a network of processors modeled by an n-vertex directed graph G = (V,E). Assume that the communication in the network is synchronous, i.e., occurs in discrete "rounds," and in every round every processor is allowed to pick one of its neighbors and to send it a message. A set of terminals T ⊆ V of size |T| = k is given. The telephone k-multicast problem requires computing a schedule with a minimal number of rounds that delivers a message from a given single processor, which generates the message, to all the processors of T. The processors of V\T may be left uninformed. The telephone multicast is a basic primitive in distributed computing and computer communication theory. In this paper we devise an algorithm that constructs a schedule with O(log k · b* + k^{1/2}) rounds for the directed k-multicast problem, where b* is the value of the optimum solution. This is the first algorithm with a non-trivial approximation guarantee for this problem. We show that our algorithm for the directed multicast problem can be used to derive an algorithm with a similar ratio for the directed Steiner poise problem, that is, the problem of constructing an arborescence that spans a collection T of terminals and has the minimum poise.

Design and Implementation of a Practical Parallel Delaunay Algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper describes the design and implementation of a practical parallel algorithm for Delaunay triangulation that works well on general distributions. Although there have been many theoretical parallel algorithms for the problem, and some implementations based on bucketing that work well for uniform distributions, there has been little work on implementations for general distributions. We use the well-known reduction of 2D Delaunay triangulation to finding the 3D convex hull of points on a paraboloid. Based on this reduction we developed a variant of the Edelsbrunner and Shi 3D convex hull algorithm, specialized for the case when the point set lies on a paraboloid. This simplification reduces the work required by the algorithm (number of operations) from O(n log^2 n) to O(n log n). The depth (parallel time) is O(log^3 n) on a CREW PRAM. The algorithm is simpler than previous O(n log n) work parallel algorithms, leading to smaller constants. Initial experiments using a variety of distributions showed that our parallel algorithm was within a factor of 2 in work from the best sequential algorithm. Based on these promising results, the algorithm was implemented using C and an MPI-based toolkit. Compared with previous work, the resulting implementation achieves significantly better speedups over good sequential code, does not assume a uniform distribution of points, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for the IBM SP2, Cray T3D, SGI Power Challenge, and DEC AlphaCluster. Received June 1, 1997; revised March 10, 1998.
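The reduction mentioned in this abstract rests on a standard fact: lifting each point (x, y) to (x, y, x^2 + y^2) turns the "is d inside the circumcircle of a, b, c" test into a 3D orientation test against the plane through the lifted points. The sketch below illustrates only that lifting step; it is not the paper's convex hull algorithm, and the function names are assumptions made for the example.

```python
def lift(p):
    """Map a 2D point onto the paraboloid z = x^2 + y^2."""
    x, y = p
    return (x, y, x * x + y * y)


def orient3d(a, b, c, d):
    """Determinant of the 3x3 matrix whose rows are a-d, b-d, c-d."""
    ax, ay, az = (a[i] - d[i] for i in range(3))
    bx, by, bz = (b[i] - d[i] for i in range(3))
    cx, cy, cz = (c[i] - d[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b, c.

    Assumes a, b, c are given in counterclockwise order. The test is
    positive exactly when the lifted d lies below the plane through
    the lifted a, b, c.
    """
    return orient3d(lift(a), lift(b), lift(c), lift(d)) > 0


if __name__ == "__main__":
    a, b, c = (0, 0), (4, 0), (0, 4)   # circumcircle centered at (2, 2)
    print(in_circumcircle(a, b, c, (1, 1)))
    print(in_circumcircle(a, b, c, (5, 5)))
```

With integer coordinates the determinant is exact; with floating-point input a robust predicate would be needed, which this sketch does not attempt.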

K. Diks  A. Pelc 《Algorithmica》2000,28(1):37-50
We consider broadcasting among n processors, f of which can be faulty. A fault-free processor, called the source, holds a piece of information which has to be transmitted to all other fault-free processors. We assume that the fraction f/n of faulty processors is bounded by a constant γ < 1. Transmissions are fault free. Faults are assumed to be of the crash type: faulty processors do not send or receive messages. We use the whispering model: pairs of processors communicating in one round must form a matching. A fault-free processor sending a message to another processor becomes aware of whether this processor is faulty or fault free and can adapt future transmissions accordingly. The main result of the paper is a broadcasting algorithm working in O(log n) rounds and using O(n) messages of logarithmic size, in the worst case. This is an improvement of the result from [17] where O((log n)^2) rounds were used. Our method also gives the first algorithm for adaptive distributed fault diagnosis in O(log n) rounds. Received May 1997; revised May 1998.

Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O(log^3 n) time using O(n/log^2 n) processors on the EREW PRAM and using n processors on the hypercubes. For the case of proper interval graphs, our algorithm runs in O(log n) time using O(n) processors if the input intervals are not given already sorted and using O(n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for the proper interval case takes O(log n log log n) time for unsorted input and O(log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs. Received November 20, 1995; revised September 3, 1998.

Xiaotie Deng  Binhai Zhu 《Algorithmica》1999,24(3-4):270-286
We present a randomized algorithm for computing the Voronoi diagram of line segments using coarse-grained parallel machines. Operating on P processors, for any input of n line segments, this algorithm performs O((n log n)/P) local operations per processor, O(n/P) messages per processor, and O(1) communication phases, with high probability for n = Ω(P^{3+ε}). Received June 1, 1997; revised March 10, 1998.

The block-cyclic data distribution is commonly used to organize array elements over the processors of a coarse-grained distributed memory parallel computer. In many scientific applications, the data layout must be reorganized at run-time in order to enhance locality and reduce remote memory access overheads. In this paper we present a general framework for developing array redistribution algorithms. Using this framework, we have developed efficient algorithms that redistribute an array from one block-cyclic layout to another. Block-cyclic redistribution consists of index set computation, wherein the destination locations for individual data blocks are calculated, and data communication, wherein these blocks are exchanged between processors. The framework treats both these operations in a uniform and integrated way. We have developed efficient and distributed algorithms for index set computation that do not require any interprocessor communication. To perform data communication in a conflict-free manner, we have developed direct, indirect, and hybrid algorithms. In the direct algorithm, a data block is transferred directly to its destination processor. In an indirect algorithm, data blocks are moved from source to destination processors through intermediate relay processors. The hybrid algorithm is a combination of the direct and indirect algorithms. Our framework is based on a generalized circulant matrix formalism of the redistribution problem and a general purpose distributed memory model of the parallel machine. Our algorithms sustain excellent performance over a wide range of problem and machine parameters. We have implemented our algorithms using MPI, to allow for easy portability across different HPC platforms. Experimental results on the IBM SP-2 and the Cray T3D show superior performance over previous approaches. When the block size of the cyclic data layout changes by a factor of K, the redistribution can be performed in O(log K) communication steps. This is true even when K is a prime number. In contrast, previous approaches take O(K) communication steps for redistribution. Our framework can be used for developing scalable redistribution libraries, for efficiently implementing parallelizing compiler directives, and for developing parallel algorithms for various applications. Redistribution algorithms are especially useful in signal processing applications, where the data access patterns change significantly between computational phases. They are also necessary in linear algebra programs, to perform matrix transpose operations. Received June 1, 1997; revised March 10, 1998.
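As a small illustration of what index set computation involves, the sketch below uses the usual one-dimensional block-cyclic mapping (element i belongs to block i // block, and blocks are dealt to processors round-robin) to list which elements move between which processors when the block size changes. It is a naive enumeration written for clarity, not the circulant-matrix framework of the paper; all function names and the dictionary-based plan are assumptions made for the example.

```python
def block_cyclic_owner(i, block, procs):
    """Processor that owns global element i in a block-cyclic layout."""
    return (i // block) % procs


def local_index(i, block, procs):
    """Position of global element i inside its owner's local array."""
    return (i // (block * procs)) * block + i % block


def redistribution_plan(n, old_block, new_block, procs):
    """Map each (source, destination) pair to the global indices it moves."""
    plan = {}
    for i in range(n):
        src = block_cyclic_owner(i, old_block, procs)
        dst = block_cyclic_owner(i, new_block, procs)
        plan.setdefault((src, dst), []).append(i)
    return plan


if __name__ == "__main__":
    # 8 elements on 2 processors, block size changing from 1 to 2.
    for pair, indices in sorted(redistribution_plan(8, 1, 2, 2).items()):
        print(pair, indices)
```

Each processor can evaluate these formulas for its own elements without communicating, which is the property the abstract highlights for index set computation.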

Abstract. In this paper two problems on the class of k-trees, a subclass of the class of chordal graphs, are considered: the fast reordering problem and the isomorphism problem. An O(log^2 n) time parallel algorithm for the fast reordering problem is described that uses O(nk(n-k)/log n) processors on a CRCW PRAM, proving membership in the class NC for fixed k. An O(nk(k+1)!) time sequential algorithm for the isomorphism problem is obtained, representing an improvement over the O(n^2 k(k+1)!) algorithm of Sekharan (the second author) [10]. A parallel version of this sequential algorithm is presented that runs in O(log^2 n) time using O((nk((k+1)!+n-k))/log n) processors, improving on a parallel algorithm of Sekharan for the isomorphism problem [10]. Both the sequential and parallel algorithms use a concept introduced in this paper called the kernel of a k-tree.

V. Kumar 《Algorithmica》2001,30(3):406-417
We consider the problem of colouring a family of n arcs of a circle. This NP-complete problem, which occurs in routing and network design problems, is modelled as a 0-1 integer multicommodity flow problem. We present an algorithm that routes the commodities in the network by augmenting the network with some extra edges which correspond to extra colours. The algorithm, which relies on probabilistic techniques such as randomized rounding and path selection, is a randomized approximation algorithm which has an asymptotic performance ratio of 1+1/e (approximately 1.37) except when the minimum number of colours required is very small (O(ln n)). This is an improvement over the best previously known result [7], which is a deterministic approximation algorithm with a performance ratio of 3/2. The substantial improvement is valuable, for instance in wavelength allocation strategies in communication networks where bandwidth is a precious resource. Received October 25, 1998; revised August 26, 1999, and April 17, 2000.

We consider the distributed complexity of the stable matching problem (a.k.a. "stable marriage"). In this problem, the communication graph is undirected and bipartite, and each node ranks its neighbors. Given a matching of the nodes, a pair of unmatched nodes is called blocking if they prefer each other to their assigned match. A matching is called stable if it does not induce any blocking pair. In the distributed model, nodes exchange messages in each round over the communication links, until they find a stable matching. We show that if messages may contain at most B bits each, then any distributed algorithm that solves the stable matching problem requires $\Omega(\sqrt{n/B\log n})$ communication rounds in the worst case, even for graphs of diameter O(log n), where n is the number of nodes in the graph. Furthermore, the lower bound holds even if we allow the output to contain $O(\sqrt n)$ blocking pairs, and if a pair is considered blocking only if they like each other much more than their assigned match.
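The blocking-pair condition defined in this abstract can be checked centrally in a few lines. The sketch below is only a checker for the definition, not a distributed algorithm and not part of the paper's lower-bound argument; the data layout (preference lists as Python lists, a symmetric match dictionary, string node names) and the rule that an unmatched node prefers any neighbor are assumptions made for the example.

```python
def blocking_pairs(prefs, match):
    """Return all blocking pairs of a matching.

    prefs[u] lists u's neighbors from most to least preferred and is
    assumed symmetric (v in prefs[u] iff u in prefs[v]).
    match[u] is u's partner; unmatched nodes are absent or map to None.
    """
    def prefers(u, v):
        # True if u would rather be with v than with its current partner.
        partner = match.get(u)
        if partner is None:
            return True  # assumed: any neighbor beats being unmatched
        return prefs[u].index(v) < prefs[u].index(partner)

    pairs = []
    for u in prefs:
        for v in prefs[u]:
            # u < v reports each undirected pair once.
            if u < v and match.get(u) != v and prefers(u, v) and prefers(v, u):
                pairs.append((u, v))
    return pairs


if __name__ == "__main__":
    prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"],
             "w1": ["m1", "m2"], "w2": ["m1", "m2"]}
    unstable = {"m1": "w2", "w2": "m1", "m2": "w1", "w1": "m2"}
    print(blocking_pairs(prefs, unstable))
```

A matching is stable exactly when this function returns an empty list.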

We consider the problem of generating random permutations with uniform distribution. That is, we require that for an arbitrary permutation π of n elements, with probability 1/n! the machine halts with the i-th output cell containing π(i), for 1 ≤ i ≤ n. We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n^{1+o(1)}) processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm outperforms each of the previously known algorithms for the exclusive write PRAMs. The common and novel feature of both our algorithms is first to design a suitable random switching network generating a permutation and then to simulate this network on the PRAM model in a fast way. Received November 1996; revised March 1997.

We present a deterministic algorithm running in space O(log^2 n / log log n) solving the connectivity problem on strongly unambiguous graphs. In addition, we present an O(log n) time-bounded algorithm for this problem running on a parallel pointer machine. Received February 1997, and in revised form August 1997, and in final form February 1998.

We consider multimessage multicasting over the n-processor complete (or fully connected) static network when the forwarding of messages is allowed. We present an efficient algorithm that constructs for every degree d problem instance a communication schedule with total communication time at most 2d, where d is the maximum number of messages that each processor may send (or receive). Our algorithm consists of two phases. In the first phase a set of communications are scheduled to be carried out in d time periods in such a way that the resulting problem is a multimessage unicasting problem of degree d. In the second phase we generate a communication schedule for this problem by reducing it to the Makespan Openshop Preemptive Scheduling problem, which can be solved in polynomial time. The final schedule is the concatenation of the communication schedules for each of these two phases. For 2 ≤ l ≤ d, we present an algorithm to generate a communication schedule with total communication time at most ⌊(2 - 1/l)d⌋ + 1, for problem instances where each processor needs to send messages to at most ld destinations. We also discuss multimessage multicasting for dynamic networks. Received September 22, 1997; revised August 29, 1998.

In this paper, we present randomized algorithms for selection on the hypercube. We identify two variants of the hypercube, namely, the sequential model and the parallel model. In the sequential model, any node at any time can handle only communication along a single incident edge, whereas in the parallel model a node can communicate along all its incident edges at the same time. We specify three variations of the parallel model and present optimal randomized algorithms on all three versions of the parallel model. In particular, we show that selection on an input of size n can be performed on a p-node hypercube in time O((n/p) + log p) with high probability, on any of the three versions of the parallel model. This result is important in view of a lower bound that implies that selection needs Ω((n/p) log log p + log p) time on a p-node sequential hypercube. We modify our selection algorithm to run on the sequential hypercube, in which case it runs in an expected time nearly matching this lower bound. For the special case when n = p, our selection algorithm runs in an optimal O(log n) time on the sequential hypercube. Our algorithms are very simple and are most likely to perform well in practice.

Klein  Netzer  Lu 《Algorithmica》2008,35(4):321-345
Abstract. We address the problem of detecting race conditions in programs that use semaphores for synchronization. Netzer and Miller showed that it is NP-complete to detect race conditions in programs that use many semaphores. We show in this paper that it remains NP-complete even if only two semaphores are used in the parallel programs. For the tractable case, i.e., using only one semaphore, we give two algorithms for detecting race conditions from the trace of executing a parallel program on p processors, where n semaphore operations are executed. The first algorithm determines in O(n) time whether a race condition exists between any two given operations. The second algorithm runs in O(np log n) time and outputs a compact representation from which one can determine in O(1) time whether a race condition exists between any two given operations. The second algorithm is near-optimal in that the running time is only O(log n) times the time required simply to write down the output.
