首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We consider the problem of routing and sorting ond-dimensionaln×...× mesh connected computers. Each of the processing units initially holdsk packets. We present randomized algorithms that solve these problems with (1+o(1))·max{2·d·n,k·n/2} communication steps. On a torus these problems are solved twice as fast. Thus we match the bisection bound up to lower-order terms, for allk≥4·d. Earlier algorithms required some additional Θ(n) steps or more, and were more complicated. With 2·d·n extra steps our algorithm can also route in the cut-through routing model.  相似文献   

2.
We study the renaming problem in a fully connected synchronous network with Byzantine failures. We show that when the original namespace of the processors is unbounded, this problem cannot be solved in an a priori bounded number of rounds for , where n is the size of the network and t is the number of failures. On the other hand, for n > 3t, we present a Byzantine renaming algorithm that runs in O(lg n) rounds. In addition, we present a fast, efficient strong renaming algorithm for n > t, which runs in rounds, where N 0 is the value of the highest identifier among all the correct processors.  相似文献   

3.
Givenn numbersa 0,a 1,...,a n –1, it is required to compute all sums of the forma 0+a 1+...+a i , fori=0, 1,...,n–1. This problem arises in many applications and is trivial to solve sequentially in O(n) time. Besides its practical importance, the problem gains an additional theoretical interest in parallel computation. A technique known asrecursive doubling allows all sums to be computed in O(logn) time on a model of computation wheren processors communicate through aninverse perfect suffle interconnection network. In this paper we show how the problem can be solved on a simple network, namely abinary tree of processors. In addition, we show how to extend our solution to obtain an optimal-cost algorithm. The algorithm usesp processors and runs in O((n/p)+logp) time, for a cost of O(n+p logp). This cost is optimal whenp logp=O(n). Finally, two applications of our results are illustrated, namely job scheduling with deadlines and the knapsack problem.This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grants A0282 and A3336.  相似文献   

4.
Summary.  We consider agreement and leader election on asynchronous complete networks when the processors are reliable, but some of the channels are subject to failure. Fischer, Lynch, and Paterson have already shown that no deterministic algorithm can solve the agreement problem on asynchronous networks if any processor fails during the execution of the algorithm. Therefore, we consider only channel failures. The type of channel failure we consider in this paper is Byzantine failure, that is, channels fail by altering messages, sending false information, forging messages, losing messages at will, and so on. There are no restrictions on the behavior of a faulty channel. Therefore, a faulty channel may act as an adversary who forges messages on purpose to prevent the successful completion of the algorithm. Because we assume an asynchronous network, the channel delays are arbitrary. Thus, the faulty channels may not be detectable unless, for example, the faulty channels cause garbage to be sent. We present the first known agreement and leader election algorithm for asynchronous complete networks in which the processors are reliable but some channels may be Byzantine faulty. The algorithm can tolerate up to [n−22] faulty channels, where n is the number of processors in the network. We show that the bound on the number of faulty channels is optimal. When the processors terminate their corresponding algorithms, all the processors in the network will have the same correct vector, where the vector contains the private values of all the processors. Received: May 1994/Accepted: July 1995  相似文献   

5.
Given a string of lengthn, this short paper first presents anO(1)-time parallel algorithm for finding all initial palindromes and periods of the string on ann×n reconfigurable mesh (RM). Then, under the same cost (= time × the number of processors =O(n 2)), we provide a partitionable strategy when the RM doesn’t offer sufficient processors; this overcomes the hardware limitation and is very suitable for VLSI implementation. Prof. Chung was supported in part by the National Science Council of R. O. C. under contracts NSC87-2213-E011-001 and NSC87-2213-E011-003.  相似文献   

6.
Efficient Scheduling of Strict Multithreaded Computations   总被引:1,自引:0,他引:1  
In this paper we study the problem of efficiently scheduling a wide class of multithreaded computations, called strict; that is, computations in which all dependencies from a thread go to the thread's ancestors in the computation tree. Strict multithreaded computations allow the limited use of synchronization primitives. We present the first fully distributed scheduling algorithm which applies to any strict multithreaded computation. The algorithm is asynchronous, on-line, and follows the work-stealing paradigm. We prove that our algorithm is efficient not only in terms of its memory requirements and its execution time, but also in terms of its communication complexity. Our analysis applies to both shared and distributed memory machines. More specifically, the expected execution time of our algorithm is O(T 1 /P + hT ∈fty ) , where T 1 is the minimum serial execution time, T ∈fty is the minimum execution time with an infinite number of processors, P is the number of processors, and h is the maximum ``distance' in the computation tree between any two threads that need to communicate. Furthermore, the total space required during the execution is O(S 1 P) , where S 1 is the space required by a serial computer to execute the computation, while the expected communication cost incurred by our algorithm is O(PhT ∈fty (1+n d ) S max ) , where n d is the maximum number of dependencies entering any thread and S max is the largest amount of storage needed for the execution of any specific thread of the computation. Our communication complexity bound is the first nontrivial bound ever proved for the model of strict parallel programming. Received January 20, 1999, and in revised form March 26, 1999, and November 30, 1999.  相似文献   

7.
The problem is posed: find an algorithm which for any given n-dimensional relation R ? A1 × A2 × ? × An, defined on a set family A = { A1, A2, ?, Anrcub;, n = 1,2, ?, determines all functional dependences between disjoint subsets of A which are embedded in R. A solution algorithm is presented, a theorem is proved that allows a simplification in the algorithm, and an efficient computer implementation (available through the General Systems Depository) is demonstrated.  相似文献   

8.
Trees are a useful data type, but they are not routinely included in parallel programming systems, in part because their irregular structure makes partitioning and scheduling difficult. We present a method for algebraically constructing implementations of tree skeletons, high-level homomorphic operations that execute in parallel. Many computations on binary trees can be performed inO(logn) parallel time usingnprocessors, even taking account of communication costs. We extend these results to trees with arbitrary and variable degree. Then we show that it is possible to implement a distributed version of homomorphisms on binary trees, takingO(n/p+ log2p) parallel time onp < nprocessors, for trees of any skew and taking full account of communication costs. Under slightly stronger restrictions on the underlying functions, this can be improved toO(n/p+ logp). Furthermore, the technique for deriving distributed versions is algebraic, allowing the automatic generation of code for SPMD and data-parallel architectures.  相似文献   

9.
We give anO(log4 n)-timeO(n 2)-processor CRCW PRAM algorithm to find a hamiltonian cycle in a strong semicomplete bipartite digraph,B, provided that a factor ofB (i.e., a collection of vertex disjoint cycles covering the vertex set ofB) is computed in a preprocessing step. The factor is found (if it exists) using a bipartite matching algorithm, hence placing the whole algorithm in the class Random-NC. We show that any parallel algorithm which can check the existence of a hamiltonian cycle in a strong semicomplete bipartite digraph in timeO(r(n)) usingp(n) processors can be used to check the existence of a perfect matching in a bipartite graph in timeO(r(n)+n 2 /p(n)) usingp(n) processors. Hence, our problem belongs to the class NC if and only if perfect matching in bipartite graphs belongs to NC. We also consider the problem of finding a hamiltonian path in a semicomplete bipartite digraph.  相似文献   

10.
This paper describes an algorithm to compute the law of the Lie algebra \mathfrakgn\mathfrak{g}_{n} associated with the Lie group G n , formed of all the n×n upper-unitriangular matrices. The goal of this paper is to show the algorithm that computes the law of \mathfrakgn\mathfrak{g}_{n} and its implementation using the symbolic computation package MAPLE. In addition, the complexity of the algorithm is described.  相似文献   

11.
K. Diks  A. Pelc 《Algorithmica》2000,28(1):37-50
We consider broadcasting among n processors, f of which can be faulty. A fault-free processor, called the source, holds a piece of information which has to be transmitted to all other fault-free processors. We assume that the fraction f/n of faulty processors is bounded by a constant γ<1 . Transmissions are fault free. Faults are assumed to be of the crash type: faulty processors do not send or receive messages. We use the whispering model: pairs of processors communicating in one round must form a matching. A fault-free processor sending a message to another processor becomes aware of whether this processor is faulty or fault free and can adapt future transmissions accordingly. The main result of the paper is a broadcasting algorithm working in O( log n) rounds and using O(n) messages of logarithmic size, in the worst case. This is an improvement of the result from [17] where O ((log n) 2 ) rounds were used. Our method also gives the first algorithm for adaptive distributed fault diagnosis in O( log n) rounds. Received May 1997; revised May 1998.  相似文献   

12.
Summary.  In this paper, we prove a lower bound on the number of rounds required by a deterministic distributed protocol for broadcasting a message in radio networks whose processors do not know the identities of their neighbors. Such an assumption captures the main characteristic of mobile and wireless environments [3], i.e., the instability of the network topology. For any distributed broadcast protocol Π, for any n and for any Dn/2, we exhibit a network G with n nodes and diameter D such that the number of rounds needed by Π for broadcasting a message in G is Ω(D log n). The result still holds even if the processors in the network use a different program and know n and D. We also consider the version of the broadcast problem in which an arbitrary number of processors issue at the same time an identical message that has to be delivered to the other processors. In such a case we prove that, even assuming that the processors know the network topology, Ω(n) rounds are required for solving the problem on a complete network (D=1) with n processors. Received: August 1994 / Accepted: August 1996  相似文献   

13.
《国际计算机数学杂志》2012,89(11):1373-1383
In this study, a circularization network flow problem with m?+?n?+?2 nodes and (m?+?1)(n?+?1) arcs is described as a planar four-index transportation problem of order 1?×?m?×?n?×?1. Construction and several algebraic characterizations of the planar four-index transportation problem of order 1?×?m?×?n?×?1 are investigated using the generalized inverse and singular value decomposition of its coefficient matrix. The results are compared with some results we obtained on the transportation problem with m sources and n destinations. It is shown that these problems can be solved in terms of eigenvectors of the matrices J m and J n , where J m is a m?×?m matrix whose entries are 1.  相似文献   

14.
A parallel heuristic algorithm for traffic control problems in three-stage connecting networks is presented in this paper. A three-stage connecting network consists of an input crossbar switching stage, an intermediate crossbar switching stage, and an output crossbar switching stage. The goal of our algorithm is to quickly and efficiently find a conflict-free switching assignment for communication demands through the network. The algorithm requires n2 × m processing elements for the network composed of n input/output switches and m intermediate switches, where it runs not only on a sequential machine, but also on a parallel machine with maximally n2 × m processors. The algorithm was verified by 1100 simulation runs with the network size from 102 × 7 to 502 × 27. The simulation results show that the algorithm can find a solution in nearly constant time with n2 × m processors.  相似文献   

15.
Rapid computation of the QR factorization of a matrix is fundamental to many scientific and engineering problems. The paper presents a family of algorithms parameterized by the number of processors available P, arithmetic grain aggregation parameters g1, g2, …, gP, and communication grain aggregation parameter h, which computer the QR factorization of a matrix A ∈ Cm × n with minimal latency. The approach is particularly well suited for dedicated distributed memory architectures such as linear arrays of INMOS Transputers, Texas Instruments C40s or Analog Devices 21060s.  相似文献   

16.
This paper considers the Byzantine agreement problem in a completely connected network of anonymous processors. In this network model the processors have no identifiers and can only detect the link through which a message is delivered. We present a polynomial-time agreement algorithm that requires 3(nt)t/(n−2t)+4 rounds, where n>3t is the number of processors and t is the maximal number of faulty processors that the algorithm can tolerate. We also present an early-stopping variant of the algorithm.  相似文献   

17.
A parallel algorithm for transforming an n × n symmetric matrix to tridiagonal form is described. The algorithm implements Givens rotations on a square array of n × n processors in such a way that the transformation can be performed in time O(n log n). The processors require only nearest-neighbor communication. The reduction to tridiagonal form could be the first step in the parallel solution of the symmetric eigenvalue problem in time O(n log n).  相似文献   

18.
Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets.O(log3 n) running time algorithms usingO(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich(3) presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set inO(logn loglogn) time usingO(n) processors. Recently Merks(4) described an algorithm for triangulating point sets which runs inO(logn) time usingO(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs inO(log2 n) time usingO(n/logn) processors. The algorithm hasO(n logn) as the product of the running time and the number of processors; i.e., an optimal speed-up.  相似文献   

19.
The problem of routing packets onn 1×...×n r mesh-connected arrays or grids of processors is studied. The focus of this paper is on permutation routing where each processor contains exactly one packet initially and finally. A slight modification of permutation routing called balanced routing is also discussed. For two-dimensional grids a determinisitc routing algorithm is given forn×n meshes where each processor has a buffer of size f(n) < n. It needs 2n + O(n/f(n)) steps on grids without wrap-arounds. Hence, it is asymptoticaliy nearly optimal, and as good as randomized algorithms routing data only with high probability. Furthermore, it is demonstrated that onr-dimensional cubes of processors permutation routing can be performed asymptotically by (2r–2)n steps, which is faster than the running times of so-far known randomized algorithms and of deterministic algorithms.Partially supported by Siemens AG, München.  相似文献   

20.
A new scheme for the deterministic simulation of PRAMs in VLSI   总被引:3,自引:0,他引:3  
A deterministic scheme for the simulation of (n, m)-PRAM computation is devised. Each PRAM step is simulated on a bounded degree network consisting of a mesh-of-trees (MT) of siden. The memory is subdivided inn modules, each local to a PRAM processor. The roots of the MT contain these processors and the memory modules, while the otherO(n 2) nodes have the mere capabilities of packet switchers and one-bit comparators. The simulation algorithm makes a crucial use of pipelining on the MT, and attains a time complexity ofO(log2 n/log logn). The best previous time bound wasO(log2 n) on a different interconnection network withn processors. While the previous simulation schemes use an intermediate MPC model, which is in turn simulated on a bounded degree network, our method performs the simulation directly with a simple algorithm.This work has been supported in part by Ministero della Pubblica Istruzione of Italy under a research grant.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号