Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
The theory of worm routing (rather than packet routing) has recently attracted increased attention as an abstraction of the underlying communication mechanisms in many parallel machines. Routing the worms in the hot potato style is a desired form of communication in high-speed optical interconnection networks. In this work, we develop a simple method for the design of parallel hot potato worm routing algorithms. Our basic approach is to simulate known packet routing algorithms, so that in each step worms are moved around instead of packets. By plugging in known results for packet routing, we get the fastest (so far) deterministic batch worm routing algorithms. Although the results are given for permutation routing on the mesh and the hypercube, the general method can be applied to many other networks and to more general communication patterns as well. Moreover, once better routing algorithms are found for the underlying network, the worm routing algorithms improve as well.

2.
One of the most critical factors for the lifetime and operability of ad-hoc and sensor networks is the limited amount of available energy. In this respect, minimizing the interference in the network (i.e., the overlapping of signals at network nodes) certainly has a positive effect, because it reduces the number of conflicting transmissions and thus results in an overall saving of energy. Along this direction, in this paper we study the computational hardness of several interference minimization problems which arise while supporting some classic network communication patterns such as broadcasting (one-to-all), gossiping (all-to-all), and symmetric gossiping (symmetric all-to-all). In particular, concerning the non-approximability results, we prove that for any of the above communication patterns, the prominent problem of minimizing the maximum interference experienced by any node in the network is hard to approximate within better than a logarithmic factor, unless NP admits slightly superpolynomial time algorithms. On the positive side, we show that any approximation algorithm for the problem of minimizing the total transmission power assigned to the nodes in order to guarantee any of the above communication patterns can be transformed, while maintaining the same performance ratio, into an approximation algorithm for the problem of minimizing the total interference experienced by all the nodes in the network.

3.
We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP-type model. We first describe a simple version which requires, with high probability, $\log(3p) + \log\ln(n) = \tilde{O}(\log p + \log\log n)$ communication rounds (h-relations with $h = \tilde{O}(n/p)$) and $\tilde{O}(n/p)$ local computation. We then outline an improved version that requires, with high probability, only $r \le (4k+6)\log(\tfrac{2}{3}p) + 8 = \tilde{O}(k \log p)$ communication rounds, where $k = \min\{i \ge 0 \mid \ln^{(i+1)} n \le (\tfrac{2}{3}p)^{2i+1}\}$. Note that $k < \ln^*(n)$ is an extremely small number. For n and $p \ge 4$, the value of k is at most 2. Hence, for a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For $n \le 1{,}500{,}000$ and $4 \le p \le 2048$, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. For n ≤ 10010100 and $4 \le p \le 2048$, the number of communication rounds in our algorithm is bounded, with high probability, by 118; and we conjecture that the actual number of communication rounds required will not exceed 50. Our algorithm has a considerably smaller number of communication rounds than the list ranking algorithm used in Reid-Miller’s empirical study of parallel list ranking on the Cray C-90.(1) To our knowledge, Reid-Miller’s algorithm(1) was the fastest list ranking implementation so far. Therefore, we expect that our result will have considerable practical relevance.
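
For readers unfamiliar with the problem: list ranking asks, for every node of a linked list, how far it is from the end of the list. The sketch below illustrates the problem with the classic pointer-jumping technique, simulated sequentially in Python. It is not the randomized BSP algorithm of the abstract above; the function name `list_rank` and the use of `-1` as the tail marker are assumptions made for illustration only.

```python
def list_rank(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the index of the successor of node i, or -1 for the tail.
    Returns (rank, rounds), where rank[i] is the number of links between
    node i and the tail, and rounds counts the synchronous jumping rounds.
    """
    n = len(succ)
    nxt = list(succ)
    # Initially each node knows only the distance to its direct successor.
    rank = [0 if nxt[i] == -1 else 1 for i in range(n)]
    rounds = 0
    while any(j != -1 for j in nxt):
        # One synchronous round: every node reads the old arrays only,
        # which mimics all processors acting in parallel.
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):
            j = nxt[i]
            if j != -1:
                new_rank[i] = rank[i] + rank[j]  # add the distance covered by j
                new_nxt[i] = nxt[j]              # jump over j
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


# The list 3 -> 0 -> 2 -> 1, with node 1 as the tail.
ranks, rounds = list_rank([2, -1, 1, 0])
print(ranks, rounds)
```

Each round doubles the distance a pointer spans, so the number of rounds grows logarithmically with the list length in this simple scheme.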

4.
The effect which the representation of the data (matrices and vectors) has on the communication patterns of preconditionings for exploitation of massively parallel architectures is discussed. Preconditioned iterative methods are used to solve the sparse linear systems generated by discretizations of partial differential equations in many areas of science and engineering. The preconditionings considered are based on nested incomplete factorization with approximate tridiagonal inverses using a two color line ordering of the discretization grid. These preconditionings can be described in terms of vector-vector to vector operations of dimension equal to half the total number of grid points.

5.
Collective communication operations (CCOs) are one of the most powerful tools for parallel processing on distributed memory architectures. From the theoretical viewpoint there has been a major effort in the design of optimal algorithms for these operations, especially for massive parallel processors (MPPs). However, in spite of the increasing availability of MPPs, there are just a few limited experimental checks of the different theories, so the assessment of their real value is not easy. The aim of the present paper is to address such issues for the most common CCOs, considering practical algorithms that can be included in a generic communication library. The main result is a new algorithm for building a quasi-optimal broadcast tree that is much simpler than, and as efficient as, previously available algorithms. To investigate the advantages and drawbacks of the proposed algorithms, a large set of experimental data has been collected on an IBM SP2 parallel system. The data demonstrate the efficiency of our approach in a number of interesting cases. Finally, all the experimental results have been related to the model used in designing the algorithms. © 1998 John Wiley & Sons, Ltd.

6.
The critical problem in creating practical online SIMD mesh routing algorithms is to minimize both the number of communication steps and the size and complexity of the queues required at each PE (processing element). Currently, the best available algorithms for likely array sizes require 16n routing steps with queue size 1; if priority queues of size 2q − 1 are allowed, the number of routing steps required is reduced to 14n/q + 2n. We present an algorithm (the MGRA), based on wormhole routing, that has routed a large number of communication patterns (all patterns tried besides a synthetically constructed worst case) in 5n routing steps with a FIFO queue of size 2. We also show that the MGRA can be modified for meshes with broadcast buses and reconfigurable broadcast buses to route in a similar number of routing steps but with a queue size of 1. A second algorithm (the CGRA) uses reconfigurable broadcast buses in implementing cut-through routing. Using the CGRA, sparse patterns are routed in a small constant number of communication steps. We prove that the MGRA has bad worst case performance, but also show that a randomizing preprocessing step can improve the predictability of the original result. Finally, we show how performance scales with changing inter- and intra-PE path widths.
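
To make the step counts quoted in the abstract above concrete, the following sketch simply evaluates the three expressions (16n, 14n/q + 2n, and 5n) for a given mesh parameter n. The function names are hypothetical, and the 5n figure is the count the abstract reports for the patterns tried, not a worst-case bound.

```python
def steps_queue_size_one(n):
    """Routing steps quoted for the best prior algorithms with queue size 1."""
    return 16 * n


def steps_priority_queues(n, q):
    """Routing steps quoted when priority queues of size 2q - 1 are allowed."""
    return 14 * n / q + 2 * n


def steps_mgra_observed(n):
    """Routing steps reported for the MGRA on the patterns tried (FIFO queue of size 2)."""
    return 5 * n


n, q = 32, 2  # example values chosen only for illustration
print(steps_queue_size_one(n))      # 16 * 32
print(steps_priority_queues(n, q))  # 14 * 32 / 2 + 2 * 32, with queues of size 3
print(steps_mgra_observed(n))       # 5 * 32
```

For these example values the priority-queue variant already needs fewer steps than the queue-size-1 baseline, and the reported MGRA count is lower still.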

7.
Parallelizing sparse Simplex algorithms is one of the most challenging problems in computational science. We implemented the revised Simplex algorithm with LU decomposition on the Touchstone Delta and the iPSC/2. Because of very sparse matrices and very heavy communication, the ratio of computation to communication is extremely low. It becomes necessary to carefully select parallel algorithms, partitioning patterns, and communication optimization to achieve a reasonable speedup. Satisfactory performance has been obtained for a class of LP problems with high n/m ratios.

8.
《Computer Communications》2001,24(15-16):1618-1625
This paper presents two network architectures with associated routing and multicast algorithms for improved performance under multicasting traffic conditions. A conditionally nonblocking network, referred to as a Clos network, forms the basis for the development of efficient multicast communication networks. The Clos network is first analyzed under multicast traffic conditions for blocking and multicast overflow probability. The analysis determines the overflow probability under two different multicast distribution assumptions. The first distribution assumes all packets request the same number of copies and the second distribution uses a random number of requested copies. An analysis of an extension of the presented network to multiplexed parallel planes of a network shows a significant improvement in network performance, particularly in the carried traffic load, when compared with previously published multicast architectures using different buffering strategies.

9.
Ke Yi  Qin Zhang 《Algorithmica》2013,65(1):206-223
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U={1,…,u}. For a given 0≤φ≤1, the φ-heavy hitters are those elements of A whose frequency in A is at least φ|A|; the φ-quantile of A is an element x of U such that at most φ|A| elements of A are smaller than x and at most (1−φ)|A| elements of A are greater than x. Suppose the elements of A are received at k remote sites over time, and each of the sites has a two-way communication channel to a designated coordinator, whose goal is to track the set of φ-heavy hitters and the φ-quantile of A approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost O(k/ε·log n) for both problems, where n is the total number of items in A, and ε is the approximation error. This substantially improves upon the previously known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the φ-quantiles for all 0≤φ≤1.
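
The two statistics defined in the abstract above can be illustrated with a small centralized sketch: an element is a heavy hitter if its frequency is at least a fraction phi of |A|, and a phi-quantile is an element with at most phi·|A| items below it and at most (1 − phi)·|A| items above it. This computes the exact values on a single machine and says nothing about the distributed tracking algorithms of the paper; the function names, and the choice of picking the quantile from the sorted multiset, are assumptions for illustration.

```python
from collections import Counter


def heavy_hitters(items, phi):
    """Return the elements whose frequency is at least phi * len(items)."""
    counts = Counter(items)
    threshold = phi * len(items)
    return {x for x, c in counts.items() if c >= threshold}


def quantile(items, phi):
    """Return one valid phi-quantile of a non-empty multiset.

    Picking the element at sorted position floor(phi * n) guarantees that
    at most phi * n items are smaller and at most (1 - phi) * n are greater.
    """
    ordered = sorted(items)
    n = len(ordered)
    idx = min(int(phi * n), n - 1)  # clamp so that phi = 1 stays in range
    return ordered[idx]


A = [1, 1, 1, 2, 2, 3, 4, 5, 5, 5]
print(heavy_hitters(A, 0.25))  # elements occurring at least 2.5 times
print(quantile(A, 0.5))        # a median of A
```

In the distributed setting described in the abstract, the coordinator would have to maintain approximations of these values without seeing all of A, which is where the communication cost arises.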

10.
Traditionally parallel compilers have targeted a standard message passing communication library when generating communication code (e.g. PVM, MPI). The standard message passing model dynamically reserves communication resources for each message. For regular, repeating communication patterns, a static communication resource reservation model can be more efficient. By reserving resources once for many communication exchanges, the communication startup time is better amortized. Plus, with a global view of communication, the static model has a wider choice of routes. While the static resource reservation model can be a more efficient communication target for the compiler, this model reveals the problems of scheduling use of limited communication resources. This paper uses the abstraction of a communication resource to define two resource management problems and presents three algorithms that can be used by the compiler to address these problems. Initial measures of the effectiveness of these algorithms are presented from two programs for an $8 \times 8$ iWarp system. © 1997 by John Wiley & Sons, Ltd.

11.
Genetic algorithms (GAs) have been applied to solve the 2-page crossing number problem successfully, but since they work with one global population, the search time and space are limited. Parallelisation provides an attractive prospect to improve the efficiency and solution quality of GAs. This paper investigates the complexity of parallel genetic algorithms (PGAs) based on two evaluation measures: computation time to communication time and population size to chromosome size. Moreover, the paper unifies the framework of PGA models with the function PGA (subpopulation size, cluster size, migration period, topology), and explores the performance of PGAs for the 2-page crossing number problem.

12.
This paper presents algorithms for reducing the communication overhead for parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase called possible-placement analysis, and a transformation phase called communication selection. The fundamental idea of possible-placement analysis is to find all possible points for insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the “best” place for inserting the communication and determines if pipelining or blocking of communication should be performed. The framework has been implemented in the EARTH-McCAT optimizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel processor. These experiments show that the communication optimization can provide performance improvements of up to 16% over the unoptimized benchmarks.

13.
The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. Many scientific algorithms, which require nearest neighbor communication in a lattice space, are modeled by a task graph with the properties of a simple or enhanced grid. The two most frequently used scheduling models for grids are the unit execution time-zero communication delay (UET) and the unit execution time–unit communication time (UET-UCT). In this paper we introduce an enhanced model of the n-dimensional grid by adding extra diagonal edges and allowing unequal boundaries for each dimension. For this generalized grid topology we establish the optimal makespan for both cases of UET/UET-UCT grids. Then we give a closed formula that calculates the minimum number of processors required to achieve the optimal makespan. Finally, we propose a low-complexity optimal time and processor scheduling strategy for both cases.

14.
A scalable framework for mobile real-time group communication services is developed in this paper. Examples of possible applications of this framework are mobile social networks, mobile conference calls, mobile instant messaging services, and mobile multi-player on-line games. A key requirement for enabling a real-time group communication service is the tight constraint imposed on the call delivery delay. Since establishing such a communication service for a group of independent mobile users under a tight delay constraint is NP-hard, a two-tier architecture is proposed that can meet the delay constraint imposed by the real-time service requirement for many independent mobile clients in a scalable manner. This goal is achieved by a two-dimensional partition of the space, first by organization and then geographically. Both the time and memory complexity associated with the location management of N mobile users are O(N) under the proposed framework, while a distributed scheme requires O(N^2) for both time and memory complexity.

15.
We present randomized and deterministic algorithms for many-to-one packet routing on an n-node two-dimensional mesh under the store-and-forward model. We consider the general instance of many-to-one routing where each node is the source (resp., destination) of ℓ (resp., k) packets, for arbitrary values of ℓ and k. All our algorithms run in optimal time and use queues of only constant size at each node to store packets in transit. The randomized algorithms, however, are simpler to implement. Our result closes a gap in the literature, where time-optimal algorithms using constant-size queues were known only for the special cases ℓ=1 and ℓ=k.

16.
A parallel algorithm for constructing k-valued fault-tolerant diagnostic tests is described. This algorithm combines two algorithms, viz. a parallel algorithm for constructing an irredundant implication matrix designed to distinguish objects from different patterns and a parallel algorithm for constructing irredundant h-fold column coverings. The IMSLOG intelligent instrumental software (IIS), on the basis of which we construct intelligent systems for various disciplines, is described. To ensure fault tolerance, a sufficient condition is applied for constructing diagnostic tests that tolerate a given number of measurement (entry) errors in the values of the characteristic features of the object under investigation. Suggestions for further research are given.

17.
《Computer Networks》2007,51(2):426-438
This paper proposes two techniques to generate test sequences to check the conformance of an implementation of a feature-rich communication system to its specification, as well as to detect the interactions between the features of the system. The concepts of color span and feasible combination of features are introduced to measure the extent and possibility of the interactions between different features. Several algorithms are proposed to produce an approximate minimum-cost and minimum color span tour of the transition graph of a finite-state machine. Test generation using the proposed algorithms for the SIP-based Internet telephony end system and for the Link Management Protocol is reported.

18.
Crossed cubes are an important class of hypercube variants. This paper addresses how to embed a family of disjoint multi-dimensional meshes into a crossed cube. We prove that for $n \ge 4$ and $1 \le m \le \lfloor n/2 \rfloor - 1$, a family of $2^m$ disjoint k-dimensional meshes of size $2^{t_1} \times 2^{t_2} \times \cdots \times 2^{t_k}$ each can be embedded in an n-dimensional crossed cube with unit dilation, where $\max_{1 \le i \le k}\{t_i\} \ge n - 2m - 1$. This result means that a family of mesh-structured parallel algorithms can be executed on the same crossed cube efficiently and in parallel. Our work extends some recently obtained results.

19.
The reconfigurable array with slotted optical buses (RASOB) has recently received a lot of attention from the research community. In this paper, we first discuss the reconfiguration methods and communication capabilities of the RASOB architecture. Then, we use this architecture for the implementation of efficient sorting algorithms on the 1D RASOB and the 2D RASOB. Our parallel sorting algorithm on the 1D RASOB is based on an efficient divide-and-conquer scheme. It sorts N data items using N processors in O(k) communication cycles, where k is the size of the data items to be sorted in bits. We further develop a parallel sorting algorithm on the 2D RASOB based on the sorting algorithm on the 1D RASOB in conjunction with the well-known Rotatesort algorithm. Similarly, this algorithm sorts N data items on a 2D RASOB of size N in O(k) communication cycles. These sorting algorithms are much more efficient than state-of-the-art sorting algorithms on reconfigurable arrays of processors with electronic buses using the same number of processors.

20.