首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The work performed by a parallel algorithm is the product of its running time and the number of processors it requires. This paper presents work-efficient (or cost-optimal) routing algorithms to determine the switch settings for realizing permutations on rearrangeable symmetrical networks such as Benes and the reduced Ω NΩN-1. These networks have 2n-1 stages with N=2n inputs/outputs, each stage consisting of N/2 crossbar switches of size (2×2). Previously known parallel routing algorithms for a rearrangeable network with N inputs determine the states of all switches recursively in O(n) iterations using N processors. Each iteration determines the switch settings of at most two stages of the network and requires at least O(n) time on a computer of N processors, regardless of the type of its interconnection network. Hence, the work of any previously known parallel routing algorithm equals at least O(Nn2) for setting up all the switches of a rearrangeable network. The new routing algorithms run on a computer of p processors, 1⩽p⩽N/n, and perform work O(Nn). Moreover, because the range of p is large, the new routing algorithms do not have to be changed in case some processors become faulty  相似文献   

2.
刘信新  陈鲲 《计算机工程》2010,36(12):107-09
现有的广播算法一般采用分层的方法构建近似的最多叶子最短生成树作为广播树。分析此类算法存在的不足,提出利用分支限界的思想建立最多叶子最短生成树引导广播操作的方法。分析和仿真结果表明,与基于分层的广播算法相比,基于分支限界法的广播算法具有更低的转发比且不增加广播树的深度,能更有效地节省带宽和能量资源。  相似文献   

3.
M.G.  A.A.  M.A.  K.   《Journal of Systems Architecture》2008,54(10):919-928
A torus network has become increasingly important to multicomputer design because of its many features including scalability, low bandwidth and fixed degree of nodes. A multicast communication is a significant operation in multicomputer systems and can be used to support several other collective communication operations. This paper presents an efficient algorithm, TTPM, to find a deadlock-free multicast wormhole routing in two-dimensional torus parallel machines. The introduced algorithm is designed such that messages can be sent to any number of destinations within two start-up communication phases; hence the name Torus Two Phase Multicast (TTPM) algorithm. An efficient routing function is developed and used as a basis for the introduced algorithm. Also, TTPM allows some intermediate nodes that are not in the destination set to perform multicast functions. This feature allows flexibility in multicast path selection and therefore improves the performance. Performance results of a simulation study on torus networks are discussed to compare TTPM algorithm with a previous algorithm.  相似文献   

4.
Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new-generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The nCUBE-2, a wormhole-routed hypercube multicomputer, provides hardware support for broadcast and a restricted form of multicast in which the destinations form a subcube. However, the broadcast routing algorithm adopted in the nCUBE-2 is not deadlock-free. In this paper, four multicast wormhole routing strategies for 2-D mesh multicomputers are proposed and studied. All of the algorithms are shown to be deadlock-free. These are the first deadlock-free multicast wormhole routing algorithms ever proposed. A simulation study has been conducted that compares the performance of these multicast algorithms under dynamic network traffic conditions in a 2-D mesh. The results indicate that a dual-path routing algorithm offers performance advantages over tree-based, multipath, and fixed-path algorithms  相似文献   

5.
介绍了一个称为环网维度气泡流控(TDBFC)的新型流控策略和称为环网维度气泡路由(TADBR)算法的新型自适应路由算法.在Bubble流控和DBFC流控的基础上设计了适合于环网的维度气泡流控.在环网中,如果采用TDBFC流控策略,设计的TADBR自适应路由算法可实现无死锁的最短距离的路由.对于以上结论,提供了详细的证明.最后,介绍了自行设计的模拟工具RingNetSim,该模拟器实现了TDBFC流控策略和TADBR算法.在RingNetSim上分析了TADBR算法的性能,结果显示环网维度气泡路由算法拥有较好的性能.  相似文献   

6.
Wavelength-division multiplexing (WDM) optical networks provide huge bandwidth by allowing multiple data streams transmitted simultaneously along the same optical fiber, with each stream assigned a distinct wavelength. A key issue on WDM optical networks is to minimize the number of wavelengths for communications. All-to-all broadcast (gossiping) is a fundamental communication application on computer/communication networks. It is known that the minimum numbers of wavelengths for realizing gossiping in one-hop of optical routing on the ring and the two-dimensional torus of N nodes are cN/sup 2/ and cN/sup 3/2/, c /spl ap/ 1/8, respectively. These numbers can be too large even for moderate values of N. One approach to reduce the number of wavelengths is to realize gossiping in multihops of routing. We give routing algorithms which realize gossiping in k-hops (k /spl ges/ 2) by O(N/sup 1+1/k/) wavelengths on the ring, O(N/sup 1+1/(2k)/) wavelengths on the 2D torus, and O(N/sup 1+1/(3k)/) wavelengths on the 3D torus on a simple multihop routing model. We also discuss the multihop routing for gossiping on a merge model. We give the upper bounds on the numbers of wavelengths for gossiping in two-hops and three-hops for the ring, 2D torus, and 3D torus on the merge model.  相似文献   

7.
As the VLSI technology makes large crossbar switches affordable, Clos networks have become a feasible option of large interconnection networks. However, to make these networks practical and useful, efficient routing algorithms need to be developed. This paper will develop and study several randomized routing algorithms for Clos networks. The algorithms are based on the idea that if the first column of Clos is set to some configuration somehow, then the resulting network becomes self-routed using the destination addresses. Each of the randomized algorithms sets the first column to a configuration selected by a random process. The algorithms are then self-routed and take no computation time to set the switches. Probabilistic analysis and simulation measurements of the communication delay of permutation routing are conducted. It is shown that the communication delay of any permutation is 3–6 cycles in networks of up to 1024 processors. Although other routing algorithms route arbitrary permutations in one cycle over Clos/Benes networks and 2 cycles over δ networks, these algorithms take prohibitively large times to compute the appropriate switch settings, while our randomized algorithms are self-routed and spend NO time on computing the switch settings. This makes our algorithms superior to any universal nonrandomized routing algorithm for Clos/Benes networks or δ networks. The speed, universality and ease of implementation of our randomized algorithms make Clos networks highly attractive for large parallel computer systems.  相似文献   

8.
All-to-all personalized exchange is one of the most dense collective communication patterns and occurs in many important applications in parallel computing. Previous all-to-all personalized exchange algorithms were mainly developed for hypercube and mesh/torus networks. Although the algorithms for a hypercube may achieve optimal time complexity, the network suffers from unbounded node degrees and thus has poor scalability in terms of I/O port limitation in a processor. On the other hand, a mesh/torus has a constant node degree and better scalability in this aspect, but the all-to-all personalized exchange algorithms have higher time complexity. In this paper, we propose an alternative approach to efficient all-to-all personalized exchange by considering another important type of networks, multistage networks, for parallel computing systems. We present a new all-to-all personalized exchange algorithm for a class of unique-path, self-routable multistage networks. We first develop a generic method for decomposing all-to-all personalized exchange patterns into some permutations which are realizable in these networks, and then present a new all-to-all personalized exchange algorithm based on this method. The newly proposed algorithm has O(n) time complexity for an n×n network, which is optimal for all-to-all personalized exchange. By taking advantage of fast switch setting of self-routable switches and the property of a single input/output port per processor in a multistage network, we believe that a multistage network could be a better choice for implementing all-to-all personalized exchange due to its shorter communication latency and better scalability  相似文献   

9.
This paper consists of two parts. In the first part, two new algorithms for deadlock- and livelock-free wormhole routing in the torus network are presented. The first algorithm, called Channels, is for the n-dimensional torus network. This technique is fully-adaptive minimal, that is, all paths with a minimal number of hops from source to destination are available for routing, and needs only five virtual channels per bidirectional link, the lowest channel requirement known in the literature for fully-adaptive minimal worm-hole routing. In addition, this result also yields the lowest buffer requirement known in the literature for packet-switched fully-adaptive minimal routing. The second algorithm, called 4-Classes, is for the bidimensional torus network. This technique is fully-adaptive minimal and requires only eight virtual channels per bidirectional link. Also, it allows for a highly parallel implementation of its associated routing node. In the second part of this paper, four worm-hole routing techniques for the two-dimensional torus are experimentally evaluated using a dynamic message injection model and different traffic patterns and message lengths  相似文献   

10.
All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. It is one of the most dense communication patterns. In this paper, we consider this problem in a torus of any dimension with the wormhole-routing capability. We propose complete exchange algorithms that use optimal numbers of phases (if each side of the tori is a multiple of eight) or asymptotically optimal numbers of phases (otherwise). Interestingly, in order to achieve this, we only make weak assumptions-that a node is capable of sending and receiving at most one message at a time, and the network is capable of supporting the dimension-ordered (or e-cube) minimum routing  相似文献   

11.
Double-loop networks are widely used in computer networks. In this paper, we present an optimal message routing algorithm and an optimal fault-tolerant message routing algorithm for weighted bidirectional double-loop networks. The algorithms presented are novel, and they do not use routing tables. After a precalculation of O(log N) steps to determine network parameters, the algorithms can route messages using constant time at each node along the route. The algorithm presented can route messages in the presence of up to three faulty nodes or links. The fault-tolerant routing algorithm guarantees an optimal route in the presence of one node failure.  相似文献   

12.
We investigate a network routing problem where a probabilistic local broadcast transmission model is used to determine routing. We discuss this model's key features, and note that the local broadcast transmission model can be viewed as soft handoff for an ad-hoc network. We present results showing that an index policy is optimal for the routing problem. We extend the network model to allow for control of transmission type, and prove that the index nature of the optimal routing policy remains unchanged. We present three distributed algorithms which compute an optimal routing policy, discuss their convergence properties, and demonstrate their performance through simulation.  相似文献   

13.
拓扑结构和路由算法是影响多级交换网络性能的重要因素.在比较多种多级互连拓扑属性的基础上,提出将3D Torus结构应用于大规模交换网络设计.然后针对3D Torus交换网络中报文路由面临的两个关键问题:多路径负载均衡和报文保序,提出一种基于维序的多路径路由算法DMR(dimension-order—based multi—path routing).该算法可在保证报文顺序的同时在多条路径上平衡负载,提高交换网络吞吐率.最后通过模拟验证了算法的性能,并与维序路由和随机路由算法进行了比较.模拟结果表明,DMR算法的性能优于维序路由算法,能够达到随机路由算法性能水平,同时具有随机路由算法所不具备的报文保序特性.  相似文献   

14.
We study packet routing problems, in which we are given a set of N packets which will be sent on preselected paths with congestion C and dilation D. For store-and-forward routing, in which nodes have buffers for packets in transit, there are routing algorithms with a performance that matches the lower bound Ω(C+D). Motivated from optical networks, we study hot-potato routing in which the nodes are bufferless. Due to the lack of buffers, in hot-potato routing the packets may be delayed more than in store-and-forward routing. An interesting question is how much is the performance of routing algorithms affected by the absence of buffers. Here, we answer this question for the class of leveled networks, in which the nodes are partitioned into L+1 distinct levels. We present a randomized hot-potato routing algorithm for leveled networks, which routes the packets in O((C + L) ln9 (LN)) time with high probability. For routing problems with dilation Ω(L), and where N is a polynonial in L, this bound is within polylogarithmic factors of the lower bound Ω(C+L). Our algorithm demonstrates that for such routing problems the benefit from using buffers is no more than polylogarithmic; thus, hot-potato routing is an efficient way to route packets in leveled networks. In hot-potato routing, due to the lack of buffers, the packets may not be able to remain on their preselected paths during the course of routing (while in store-and-forward routing the packets remain on their preselected paths). However, in our algorithm the actual path that each packet follows contains its original preselected path; thus the lower bound Ω(C+L) is also a lower bound for the new paths. Our algorithm is distributed, that is, routing decisions are taken locally at each node while packets are routed in the network. To our knowledge, this is the first hot-potato algorithm designed and analyzed, in terms of congestion and dilation, for leveled networks.  相似文献   

15.
In this paper, we develop algorithms in order of efficiency for all-to-all broadcast problem in an N=2n-node n-dimensional faulty SIMD hypercube, Qn, with up to n-1 node faults. The algorithms use a property of a certain ordering of dimensions. Our analysis includes startup time (α) and transfer time (β). We have established the lower bound for such an algorithm to be nα+(2N-3)Lβ in a faulty hypercube with at most n-1 faults (each node has a value of L bytes). Our best algorithm requires 2nα+2NLβ and is near-optimal. We develop an optimal algorithm for matrix multiplication in a faulty hypercube using all-to-all broadcast and compare the efficiency of all-to-all broadcast approach with broadcast approach and global sum approach for matrix multiplication. The algorithms are congestion-free and applicable in the context of available hypercube machines  相似文献   

16.
In this paper, we propose an efficient multipath multicast routing algorithm in wormhole-routed 2D torus networks. We first introduce a hamiltonian cycle model for exploiting the feature of torus networks. Based on this model, we find a hamiltonian cycle in torus networks. Then, an efficient multipath multicast routing algorithm with hamiltonian cycle model (mulitpath-HCM) is presented. The proposed multipath multicast routing algorithm utilizes communication channels more uniformly in order to reduce the path length of the routing messages, making the multicasting more efficient. Simulation results show that the multicast latency of the proposed multipath-HCM routing algorithm is superior to that of fixed and dual-path routing algorithms.  相似文献   

17.
认知无线Mesh 网络中QoS 约束的组播路由算法   总被引:2,自引:0,他引:2  
邝祝芳  陈志刚 《软件学报》2012,23(11):3029-3044
对认知无线Mesh网络中满足QoS约束的联合组播路由及频谱分配问题进行研究,提出了一个针对该问题的求解框架,包括问题描述、解决方案的表示、适应度函数以及频谱分配算法.基于两种具有代表性的智能计算方法:遗传算法、模拟退火,提出了两种满足端到端延迟约束的组播路由及频谱分配算法GA-MRSA和SA-MRSA.这两种算法追求的目标是最小化组播树信道冲突总数,并且在获得较低的信道冲突数的情况下,还能占用较少的信道.仿真结果表明,所提出的两种算法能够达到预期目标,获得较低的信道冲突总数.  相似文献   

18.
The current standard for intra-domain network routing, Open Shortest Path First (OSPF), suffers from a number of problems-the tunable parameters (the weights) are hard to optimize, the chosen paths are not robust under changes in traffic or network state, and some network links are over-used at the expense of others. We present prototypical scenarios that illustrate these problems. Then we propose several variants of a protocol to eliminate or alleviate them and demonstrate the improvements in performance under those scenarios. We also prove that these protocols never perform significantly worse than OSPF and show that for at least a limited class of network topologies, it is possible to find efficiently the optimal weight settings. Some of the problems with OSPF are well known; indeed, there are several routing protocols that perform better than OSPF in routing quality (i.e., in terms of congestion, delay, etc.). OSPF’s popularity persists in part because of its efficiency with respect to several resource bounds. In contrast, many competing protocols that provide routing superior to OSPF are computationally prohibitive. Motivated by this consideration, we designed our protocols not only to achieve better routing quality than OSPF, but also to use resources in amount comparable with OSPF with respect to offline broadcast communication, size of and time to compute routing tables, packet delivery latency, and packet header structure and size.  相似文献   

19.
The critical problem in creating practical online SIMD mesh routing algorithms is to minimize both the number of communication steps and the size and complexity of the queues required at each PE (processing element). Currently, the best available algorithms for likely array sizes require 16n routing steps with queue size 1; if priority queues of size 2q − 1 are allowed, the number of routing steps required is reduced to 14n/q + 2n. We present an algorithm (the MGRA), based on wormhole routing, that has routed a large number of communication patterns (all patterns tried besides a synthetically constructed worst case) in 5n routing steps with a FIFO queue of size 2. We also show that the MGRA can be modified for meshes with broadcast buses and reconfigurable broadcast buses to route in a similar number of routing steps but with a queue size of 1. A second algorithm (the CGRA) uses reconfigurable broadcast buses in implementing cut-through routing. Using the CGRA, sparse patterns are routed in a small constant number of communication steps. We prove that the MGRA has bad worst case performance, but also show that a randomizing preprocessing step can improve the predictability of the original result. Finally, we show how performance scales with changing inter- and intra-PE path widths.  相似文献   

20.
A hierarchical torus network (HTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 3D-torus networks that are hierarchically interconnected for higher-level networks. The static network performance of the HTN and its dynamic communication performance using the popular dimension-order routing algorithm have already been evaluated and shown to be superior to the performance of other conventional and hierarchical interconnection networks. In this paper, we propose a link-selection algorithm for efficient use of physical links of the HTN, while keeping the link-selection algorithm as simple as the dimension-order routing algorithm. We also prove that the proposed algorithm for the HTN is deadlock-free using three virtual channels. We evaluate the dynamic communication performance of an HTN using dimension-order routing and link-selection algorithms under various traffic patterns. We find that the dynamic communication performance of an HTN using the link-selection algorithm is better than when the dimension-order routing algorithm is used.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号