首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We consider the problem of broadcasting multiple messages from one processor to many processors in the k-port model for message-passing systems. In such systems, processors communicate in rounds, where in every round, each processor can send k messages to k processors and receive k messages from k processors In this paper, we first present a simple and practical algorithm based on variations of k complete k-ary trees. We then present an optimal algorithm up to an additive term of one for this problem for any number of processors, any number of messages, and any value for k  相似文献   

2.
We consider the problem of computing Byzantine Agreement in a synchronous network with n processors, each with a private random string, where each pair of processors is connected by a private communication line. The adversary is malicious and non-adaptive, i.e., it must choose the processors to corrupt at the start of the algorithm. Byzantine Agreement is known to be computable in this model in an expected constant number of rounds. We consider a scalable model where in each round each uncorrupt processor can send to any set of log n other processors and listen to any set of log n processors. We define the loss of an execution to be the number of uncorrupt processors whose output does not agree with the output of the majority of uncorrupt processors. We show that if there are t corrupt processors, then any randomised protocol which has probability at least 1/2 + 1/ logn of loss less than requires at least f rounds. This also shows that lossless protocols require both rounds, and for at least one uncorrupt processor to send messages during the protocol.  相似文献   

3.
We consider the Continuously Delivery Message Dissemination (CDMD) problem over the n processor single-port complete (all links are present and are bi-directional) static network with the multicasting communication primitive. This problem has been shown to be NP-complete even when all messages have equal length. For the CDMD problem we present an efficient approximation algorithm to construct a message routing schedule with total communication time at most 3.5d, where d is the total length of the messages that each processor needs to send or receive. The algorithm takes O(qn) time, where n is the number of processors and q is the total number of messages that the processors receive.  相似文献   

4.
Reliability is an important research topic in distributed computing systems consisting of a large number of processors. To achieve reliability, the fault-tolerance scheme of the distributed computing system must be revised. This kind of problem is known as a Byzantine agreement (BA) problem. It requires all fault-free processors to agree on a common value, even if some components are corrupt. Consequently, there have been significant studies of this agreement problem in distributed systems. However, the traditional BA protocols focus on running ⌊(n−1)/3⌋+1 rounds of message exchange continuously to make each fault-free processor reach an agreement. In other words, since having a large number of messages results in a large protocol overhead, those protocols are inefficient and unreasonable, especially for some network environments which have large number of processors. In this study, we propose a novel and efficient protocol to reduce the number of messages. Our protocol can collect, compare and replace the received values to find the reliable processors and replace the values sent by the unreliable processors. Subsequently, each processor can agree on a common value through three rounds of message exchange. Furthermore, the proposed protocol can use the minimum number of messages to tolerate the maximum number of faulty components in a distributed system.  相似文献   

5.
In this paper we discuss the problem of computing a multidimensional integral on a MIMD distributed-memory multiprocessor. Adaptive quadrature is known as a good approach to the problem of achieving accuracy and reliability while attempting to minimize the number of function evaluations. The implementation makes use of dynamical data structures able to manage subinterval partition. On a distributed-memory multiprocessor, each processor is able to execute code and to manipulate data structures in its own local memory only, and data are sent from one processor to another one by explicit message-passing. Efficient implementation of an adaptive algorithm for the multidimensional quadrature on a parallel computer is quite difficult, because of the need for continuous information exchange between processors. Our algorithm is based on a global adaptive strategy which dynamically balances the workload and reduces the data communication between processors in order to use the message-passing environment efficiently. The results and timings for several tests are given.  相似文献   

6.
There are a number of models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r+λ-1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, (1) techniques for measurement of the value of λ on a given machine, (2) creating efficient broadcast algorithms that get the latency h and the number of nodes n as parameters and (3) creating efficient global combine algorithms for parallel machines with λ which is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Inter Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning, for example, a properly tuned broadcast improves the known implementation by more than 20%  相似文献   

7.
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k⩾1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth  相似文献   

8.
Consider a network of processors modeled by an n-vertex directed graph G = (V,E). Assume that the communication in the network is synchronous, i.e., occurs in discrete "rounds," and in every round every processor is allowed to pick one of its neighbors, and to send him a message. A set of terminals T ⫅ V of size |T| = k is given. The telephone k-multicast} problem requires computing a schedule with a minimal number of rounds that delivers a message from a given single processor, that generates the message, to all the processors of T. The processors of V\T may be left uninformed. The telephone multicast is a basic primitive in distributed computing and computer communication theory. In this paper we devise an algorithm that constructs a schedule with O(log k · b* + k1/2) rounds for the directed k-multicast} problem, where b* is the value of the optimum solution. This is the first algorithm with a non-trivial approximation guarantee for this problem. We show that our algorithm for the directed multicast problem can be used to derive an algorithm with a similar ratio for the directed Steiner poise problem, that is, the problem of constructing an arborescence that spans a collection T of terminals and has the minimum poise.  相似文献   

9.
K. Diks  A. Pelc 《Algorithmica》2000,28(1):37-50
We consider broadcasting among n processors, f of which can be faulty. A fault-free processor, called the source, holds a piece of information which has to be transmitted to all other fault-free processors. We assume that the fraction f/n of faulty processors is bounded by a constant γ<1 . Transmissions are fault free. Faults are assumed to be of the crash type: faulty processors do not send or receive messages. We use the whispering model: pairs of processors communicating in one round must form a matching. A fault-free processor sending a message to another processor becomes aware of whether this processor is faulty or fault free and can adapt future transmissions accordingly. The main result of the paper is a broadcasting algorithm working in O( log n) rounds and using O(n) messages of logarithmic size, in the worst case. This is an improvement of the result from [17] where O ((log n) 2 ) rounds were used. Our method also gives the first algorithm for adaptive distributed fault diagnosis in O( log n) rounds. Received May 1997; revised May 1998.  相似文献   

10.
A new approach to the design of collective communication operations in wormhole-routed mesh networks is described. The approach extends the concept of dominating sets in graph theory by accounting for the relative distance-insensitivity of the wormhole switching strategy and by taking advantage of a multiport communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. Collective communication operations are defined in terms of sets of extended dominating nodes (EDNs). The nodes in a set of EDNs can deliver (receive) messages to (from) a different, larger set of nodes in a single message-passing step under dimension-ordered wormhole routing and without channel contention among messages. The EDN model can be applied to different collective operations in 2D and 3D mesh networks. The authors focus on EDN-based broadcast and global combine operations. Performance evaluation results are presented that confirm the advantage of this approach over other methods  相似文献   

11.
We consider the gossip problem in a synchronous message-passing system. Participating processors are prone to omission failures, that is, a faulty processor may fail to send or receive a message. The gossip problem in the fault-tolerant setting is defined as follows: every correct processor must learn the initial value of any other processor, unless the other one is faulty; in the latter case either the initial value or the information about the fault must be learned. We develop two efficient algorithms that solve the gossip problem in time O(logn), where n is the number of processors in the system. The first one is an explicit algorithm (i.e., constructed in polynomial time) sending O(nlogn+f2) messages, and the second one reduces the message complexity to O(n+f2), where f is the upper bound on the number of faulty processors.  相似文献   

12.
Efficient interprocessor communication is crucial to increasing the performance of parallel computers. In this paper, a special framework is developed on thegeneralized hypercube, a network that is currently receiving considerable attention. Using this framework as the basic tool, a number of spanning subgraphs with special properties to fit various communication needs are constructed on the network. The importance of these spanning subgraphs is demonstrated with the development of optimal algorithms for four fundamental communication problems, namely, theone-to-allandall-to-all broadcastingand theone-to-allandall-to-all scattering. Broadcastingis the distribution of the same group of messages from a source processor to all other processors, andscatteringis the distribution of distinct groups of messages from a source processor to each other processor. We consider broadcasting and scattering from a single processor of the network (one-to-all broadcasting and scattering) and simultaneously from all processors of the network (all-to-all broadcasting and scattering). For the all-to-all broadcasting and scattering algorithms, a special technique is developed on the generalized hypercube so that messages originating at individual nodes are interleaved in such a manner that no two messages contend for the same edge at any given time. The communication problems are studied under thestore-and-forward, all-portcommunication model. Lower bounds are derived for the above problems under the stated assumptions, in terms of time and number of message transmissions, and optimal algorithms are designed.  相似文献   

13.
Various contiguous and noncontiguous processor allocation policies have been proposed for mesh-connected multicomputers. Contiguous allocation suffers from high external processor fragmentation because it requires that the processors allocated to a parallel job be contiguous and have the same topology as the multicomputer. The goal of lifting the contiguity condition in noncontiguous allocation is reducing processor fragmentation. However, this can increase the communication overhead because the distances traversed by messages can be longer, and messages from different jobs can interfere with each other by competing for communication resources. The extra communication overhead depends on how the allocation request is partitioned and mapped to free processors. In this paper, we investigate a new class of noncontiguous allocation schemes for two-dimensional mesh-connected multicomputers. These schemes are different from previous ones in that request partitioning is based on the submeshes available for allocation. The available submeshes selected for allocation to a job are such that a high degree of contiguity among their processors is achieved. The proposed policies are compared to previous noncontiguous policies using detailed simulations, where several common communication patterns are considered. The results show that the proposed policies can reduce the communication overhead and improve performance substantially.  相似文献   

14.
We consider the distributed complexity of the stable matching problem (a.k.a. “stable marriage”). In this problem, the communication graph is undirected and bipartite, and each node ranks its neighbors. Given a matching of the nodes, a pair of unmatched nodes is called blocking if they prefer each other to their assigned match. A matching is called stable if it does not induce any blocking pair. In the distributed model, nodes exchange messages in each round over the communication links, until they find a stable matching. We show that if messages may contain at most B bits each, then any distributed algorithm that solves the stable matching problem requires ${\Omega(\sqrt{n/B\log n})}We consider the distributed complexity of the stable matching problem (a.k.a. “stable marriage”). In this problem, the communication graph is undirected and bipartite, and each node ranks its neighbors. Given a matching of the nodes, a pair of unmatched nodes is called blocking if they prefer each other to their assigned match. A matching is called stable if it does not induce any blocking pair. In the distributed model, nodes exchange messages in each round over the communication links, until they find a stable matching. We show that if messages may contain at most B bits each, then any distributed algorithm that solves the stable matching problem requires W(?{n/Blogn}){\Omega(\sqrt{n/B\log n})} communication rounds in the worst case, even for graphs of diameter O(log n), where n is the number of nodes in the graph. Furthermore, the lower bound holds even if we allow the output to contain O(?n){O(\sqrt n)} blocking pairs, and if a pair is considered blocking only if they like each other much more then their assigned match.  相似文献   

15.
《国际计算机数学杂志》2012,89(11):1609-1619
The Array redistribution problem is the heart of a number of applications in parallel computing. This paper presents a message combining approach for scheduling runtime array redistribution of one-dimensional arrays. The important contribution of the proposed scheme is that it eliminates the need for local data reorganization, as noted by Sundar in 2001; the blocks destined for each processor are combined in a series of messages exchanged between neighbouring nodes, so that the receiving processors do not need to reorganize the incoming data blocks before storing them to memory locations. Local data reorganization is of great importance, especially in networks where there is no direct communication between all nodes (like tori, meshes, and trees). Thus, a block must travel through a number of relays before reaching the target processor. This requires a higher number of messages generated, therefore, a higher number of data permutations within the memory of each target processor should be made to assure correct data order. The strategy is based on a relation between groups of communicating processor pairs called superclasses.  相似文献   

16.
We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.  相似文献   

17.
18.
A new randomized Byzantine agreement algorithm is presented. This algorithm operates in a synchronous system of n processors, at most t of which can fail. The algorithm reaches agreement in 0(t/log n) expected rounds and O(n2tf/log n) expected message bits independent of the distribution of processor failures. This performance is further improved to a constant expected number of rounds and O(n2) message bits if the distribution of processor failures is assumed to be uniform. In either event, the algorithm improves on the known lower bound on rounds for deterministic algorithms. Some other advantages of the algorithm are that it requires no cryptographic techniques, that the amount of local computation is small, and that the expected number of random bits used per processor is only one. It is argued that in many practical applications of Byzantine agreement, the randomized algorithm of this paper achieves superior performance.  相似文献   

19.
This paper presents a new Byzantine agreement protocol that toleratest processor faults usingO(t·logt) processors,t + 1 rounds,O(t 2 +o·t) total message bits (whereo is the number of processors that must decide), and messages of maximum size 1 bit. It is the first Byzantine agreement protocol that is simultaneously optimal in rounds, message bits, and message size. The new Byzantine agreement protocol is actually a protocol for the (slightly) more general Byzantine relay problem—a problem which we formulate in this paper. The Byzantine relay protocol is the result of a general recursive construction. Each step of the construction combines two smaller (in terms of number of faults tolerated) Byzantine relay protocols into one larger Byzantine relay protocol. The base case is a collection of very simple Byzantine relay protocols, each tolerant of a small constant number of processor faults. A key new feature of the protocol construction technique presented in this paper is that it does not add unproductive overhead rounds: given two constituent protocols that are optimal in the number of rounds, the composite protocol that is constructed is also optimal in the number of rounds.The work of Jennifer Welch was supported in part by NSF Grant CCR-9010730 and an IBM Faculty Development Award. This work was done while she was at the University of North Carolina at Chapel Hill.  相似文献   

20.
In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2B-2, 2B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms. The bottom-up algorithm requires many fewer messages and results in less blocking due to synchronization than top-down algorithms. Additionally, for a given cost ratio of computation to communication the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号