
1.

On the use of virtual channels in networks of workstations with irregular topology

Silla F. Duato J. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(8):813-828

Networks of workstations are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect workstations using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Recently, we proposed two methodologies for the design of adaptive routing algorithms for networks with irregular topology, as well as fully adaptive routing algorithms for these networks. These algorithms increase throughput considerably with respect to previously existing ones, but require the use of at least two virtual channels. In this paper, we propose a very efficient flow control protocol to support virtual channels when link wires are very long and/or have different lengths. This flow control protocol relies on the use of channel pipelining and control flits. Control traffic is minimized by assigning physical bandwidth to virtual channels until the corresponding message blocks or it is completely transmitted. Simulation results show that this flow control protocol performs as efficiently as an ideal network with short wires and flit-by-flit multiplexing. The effect of additional virtual channels per physical channel has also been studied, revealing that the optimal number of virtual channels varies with network size. The use of virtual channel priorities is also analyzed. The proposed flow control protocol may increase short message latency, due to long messages monopolizing channels and hindering the progress of short messages. Therefore, we have analyzed the impact of limiting the number of flits (block size) that a virtual channel may forward once it gets the link. Simulation results show that limiting the maximum block size causes the overall network performance to decrease.
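
Illustrative note: the link-assignment policy described in this abstract (a virtual channel keeps the physical link until its message is completely transmitted, optionally capped by a maximum block size) can be pictured with a toy arbiter. This is only a sketch under stated assumptions, not the authors' protocol: the function name `schedule_link`, the round-robin hand-over, and the one-flit-per-cycle model are all hypothetical, and message blocking, channel pipelining, and control flits are not modelled.

```python
from collections import deque

def schedule_link(messages, max_block=None):
    """Toy link arbiter (hypothetical model, not the paper's protocol).

    `messages` holds one flit count per virtual channel. The channel that
    owns the link sends one flit per cycle until its message is finished
    or, if `max_block` is set, until it has sent `max_block` flits in a
    row; ownership then passes to the next waiting channel.
    Returns the per-cycle sequence of channel indices using the link.
    """
    remaining = list(messages)
    ready = deque(i for i, n in enumerate(remaining) if n > 0)
    order = []
    while ready:
        vc = ready.popleft()
        burst = remaining[vc] if max_block is None else min(max_block, remaining[vc])
        order.extend([vc] * burst)
        remaining[vc] -= burst
        if remaining[vc] > 0:
            ready.append(vc)  # block limit reached: rejoin the back of the queue
    return order

# A 4-flit message on channel 0 and a 1-flit message on channel 1.
print(schedule_link([4, 1]))               # channel 0 holds the link to the end
print(schedule_link([4, 1], max_block=2))  # the short message gets in earlier
```

With the block limit, the short message is served sooner, but the link changes owner more often; in this toy model each change of owner stands in for the extra control traffic that the abstract says is being minimized.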

2.

A theory of deadlock-free adaptive multicast routing in wormhole networks

Duato J. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(9):976-987

A theory for the design of deadlock-free adaptive routing algorithms for wormhole networks, proposed by the author (1991, 1993), supplies sufficient conditions for an adaptive routing algorithm to be deadlock-free, even when there are cyclic dependencies between channels. Also, two design methodologies were proposed. Multicast communication refers to the delivery of the same message from one source node to an arbitrary number of destination nodes. A tree-like routing scheme is not suitable for hardware-supported multicast in wormhole networks because it produces many headers for each message, drastically increasing the probability of a message being blocked. A path-based multicast routing model was proposed by Lin and Ni (1991) for multicomputers with 2D-mesh and hypercube topologies. In this model, messages are not replicated at intermediate nodes. This paper develops the theoretical background for the design of deadlock-free adaptive multicast routing algorithms. This theory is valid for wormhole networks using the path-based routing model. It is also valid when messages with a single destination and multiple destinations are mixed together. The new channel dependencies produced by messages with several destinations are studied. Also, two theorems are proposed, developing conditions to verify that an adaptive multicast routing algorithm is deadlock-free, even when there are cyclic dependencies between channels. As an example, the multicast routing algorithms of Lin and Ni are extended, so that they can take advantage of the alternative paths offered by the network.
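
Illustrative note: in path-based multicast a single message visits its destinations one after another instead of being replicated. A common way to picture this on a 2D mesh is to label the nodes along a snake-like Hamiltonian path and split the destination set into those labelled above the source and those labelled below it. The sketch below assumes that labelling; the abstract does not spell it out, and the function names are hypothetical.

```python
def hamiltonian_label(x, y, width):
    """Snake-order label of node (x, y) in a mesh with `width` columns.

    Even rows are numbered left to right, odd rows right to left, so
    consecutive labels always belong to neighbouring nodes.
    """
    return y * width + (x if y % 2 == 0 else width - 1 - x)

def split_destinations(source, destinations, width):
    """Split destinations into two visiting orders relative to the source.

    Returns (high, low): `high` lists destinations with larger labels in
    ascending label order, `low` lists those with smaller labels in
    descending label order. Each list would be carried by one message.
    """
    s = hamiltonian_label(source[0], source[1], width)
    labeled = [(hamiltonian_label(x, y, width), (x, y)) for x, y in destinations]
    high = [node for lab, node in sorted(labeled) if lab > s]
    low = [node for lab, node in sorted(labeled, reverse=True) if lab < s]
    return high, low

high, low = split_destinations((1, 1), [(0, 0), (3, 1), (2, 2), (0, 1)], width=4)
print(high, low)
```

The sketch covers only the destination ordering; the deadlock-freedom conditions that the paper develops are not modelled here.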

3.

Wait-free deflection routing of long messages

Kucera L. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(5):476-488

In order to obtain the lowest possible latency, routing algorithms should try to avoid a message waiting for resources (network links) blocked by other messages or multiplexing of more messages over one physical channel. This requirement becomes especially important in the case of long messages. The only type of protocols able to guarantee wait-free routing under heavy load are algorithms based on deflection (also called nonminimal adaptive or hot potato) routing. This paper deals with problems connected with the use of deflection algorithms. In contrast to the case of nonadaptive or partially (e.g., minimal) adaptive routing, it is very infrequent that an unrestricted deflection routing becomes deadlocked and, similarly, livelock is not a serious problem. On the other hand, there is another phenomenon, called a deflection jam, that limits throughput of deflection algorithms used to route long messages. It has been observed for many deflection heuristics, interconnection network topologies, and both virtual cut-through and wormhole routing. A deflection jam is a sudden and persistent saturation of a network that sometimes occurs after a very long period of undisturbed communication. This paper describes events that trigger this saturation, which suggest ways to design improved and stable deflection routing algorithms.

4.

An effective methodology to improve the performance of the up*/down* routing algorithm

Sancho J.C. Robles A. Duato J. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(8):740-754

Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Most NOWs are arranged as a switch-based network and provide mechanisms for discovering the network topology. Hence, they provide support for both regular and irregular topologies, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow nonminimal paths, increasing latency and wasting resources. We propose and evaluate a simple and effective methodology to compute up*/down* routing tables. The new methodology is based on computing a depth-first search (DFS) spanning tree on the network graph that decreases the number of routing restrictions with respect to the breadth-first search (BFS) spanning tree used by the traditional methodology. Additionally, we propose different heuristic rules for computing the spanning trees to improve the efficiency of up*/down* routing. Evaluation results for several different topologies show that computing the up*/down* routing tables by using the new methodology increases throughput by a factor of up to 2.48 in large networks with respect to the traditional methodology, and also reduces latency significantly.
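
Illustrative note: the sketch below shows the traditional BFS-based up*/down* rule that this abstract takes as its baseline, not the DFS-based methodology the authors propose. It assumes the usual convention that a hop is "up" when it moves to a node closer to the root (ties broken by the lower node ID) and that a legal route never takes an "up" hop after a "down" hop. The function names and the example graph are hypothetical.

```python
from collections import deque

def bfs_levels(adj, root):
    """Distance of every reachable node from `root` in the BFS spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def is_legal_up_down(path, level):
    """True if `path` uses zero or more up hops followed by zero or more down hops."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        up = (level[v], v) < (level[u], u)  # closer to root, ties by lower ID
        if up and gone_down:
            return False  # a down -> up transition is forbidden
        if not up:
            gone_down = True
    return True

# Four switches in a ring: 0-1, 0-2, 1-3, 2-3, with switch 0 as the root.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
level = bfs_levels(adj, 0)
print(is_legal_up_down([1, 0, 2], level))  # up then down
print(is_legal_up_down([1, 3, 2], level))  # down then up
```

In this small ring, the route through switch 3 is forbidden even though it is as short as the route through the root, which hints at the routing restrictions the abstract refers to.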

5.

Fully adaptive minimal deadlock-free packet routing in hypercubes, meshes, and other networks: algorithms and simulations

Pifarre G.D. Gravano L. Felperin S.A. Sanz J.L.C. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):247-263

This paper deals with the problem of packet-switched routing in parallel machines. Several new routing algorithms for different interconnection networks are presented. While the new techniques apply to a wide variety of networks, routing algorithms will be shown for the hypercube, the two-dimensional mesh, and the shuffle-exchange. Although the new techniques are designed for packet routing, they can be used alternatively for virtual cut-through routing models. The techniques presented for hypercubes and meshes are fully-adaptive and minimal. A fully-adaptive and minimal routing is one in which all possible minimal paths between a source and a destination are of potential use at the time a message is injected into the network. Minimal paths followed by messages ultimately depend on the local congestion encountered in each node of the network. All of the new techniques are completely free of deadlock situations.
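
Illustrative note: on a hypercube, "all possible minimal paths" has a compact meaning. Every dimension in which the current and destination addresses differ offers a minimal next hop, and a fully adaptive minimal router may choose any of them according to local congestion. The sketch below shows only that choice; it is not the paper's algorithm, it does not implement any deadlock-avoidance mechanism, and the `load` table is a hypothetical stand-in for local congestion information.

```python
def minimal_next_hops(current, dest, dims):
    """Neighbours of `current` that lie on some minimal path to `dest`.

    Nodes are integers whose bits are hypercube coordinates; flipping any
    bit in which `current` and `dest` differ moves one hop closer.
    """
    diff = current ^ dest
    return [current ^ (1 << d) for d in range(dims) if diff & (1 << d)]

def route(src, dest, dims, load):
    """Follow a minimal path, picking the least-loaded candidate at each hop.

    `load` maps node -> congestion estimate (assumed; missing nodes count as 0).
    """
    path = [src]
    while path[-1] != dest:
        candidates = minimal_next_hops(path[-1], dest, dims)
        path.append(min(candidates, key=lambda n: load.get(n, 0)))
    return path

print(minimal_next_hops(0b000, 0b101, 3))  # two minimal choices from node 0
print(route(0b000, 0b101, 3, {0b001: 9}))  # avoids the congested neighbour
```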

6.

An adaptive and minimal fault-tolerant routing algorithm based on the region of minimal paths in 2D meshes

陈文斌 杨小帆 苏伟 唐荣旺 曾智 《计算机科学》 (Computer Science) 2006,33(7):292-294

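The mesh is one of the most popular network topologies in parallel and distributed processing. How to design optimal fault-tolerant routing algorithms in the presence of faults has long been a hot research topic. This paper studies the minimal routing problem in two-dimensional meshes under the faulty block model and gives a necessary and sufficient condition for the existence of a minimal path. Based on the concept of the region of minimal paths (RMP), an adaptive minimal fault-tolerant routing algorithm is proposed. If an RMP exists between the source node and the destination node, adaptive minimal fault-tolerant routing is performed within the RMP; otherwise, multi-phase minimal fault-tolerant routing is used. The main idea is to ensure, as far as possible, that the routing algorithm follows shortest paths in the presence of faults. Because only local information about each node is required, the algorithm is distributed.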
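
Illustrative note: whether a minimal path exists between two mesh nodes in the presence of faulty nodes can be checked by brute force with a small dynamic program over the bounding rectangle of source and destination. The sketch below is a centralized reference check using global fault information; it is not the necessary and sufficient condition or the distributed RMP algorithm of the paper, and the function name and fault sets are hypothetical.

```python
def minimal_path_exists(src, dst, faulty):
    """True if a shortest (monotone) path from src to dst avoids all faulty nodes.

    Nodes are (x, y) tuples; `faulty` is a set of such tuples. A minimal
    path only ever steps toward dst in x or in y, so a node is reachable
    when one of its two predecessors inside the rectangle is reachable.
    """
    (sx, sy), (dx, dy) = src, dst
    step_x = 1 if dx >= sx else -1
    step_y = 1 if dy >= sy else -1
    if src in faulty or dst in faulty:
        return False
    reachable = {src}
    for x in range(sx, dx + step_x, step_x):
        for y in range(sy, dy + step_y, step_y):
            if (x, y) in faulty or (x, y) == src:
                continue
            if (x - step_x, y) in reachable or (x, y - step_y) in reachable:
                reachable.add((x, y))
    return dst in reachable

block = {(1, 1), (1, 2), (2, 1), (2, 2)}   # a 2x2 faulty block
wall = {(1, 0), (1, 1), (1, 2), (1, 3)}    # a faulty block spanning the rectangle
print(minimal_path_exists((0, 0), (3, 3), block))  # can go around the block
print(minimal_path_exists((0, 0), (3, 3), wall))   # every minimal path is cut
```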

7.

LCFAA: a low-cost fully adaptive routing algorithm

刘燕 孙利民 YANG Xiao-Dong 《计算机研究与发展》 (Journal of Computer Research and Development) 1999,36(3):331-336

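In massively parallel processing (MPP) systems, the routing algorithm plays an important role in interconnection network communication performance and in overall system performance. Adaptive routing algorithms offer good flexibility, high channel utilization, and strong fault tolerance, but they are difficult to implement and have therefore been realized in only a few MPP systems so far. This paper proposes LCFAA, a low-cost, deadlock-free, fully adaptive minimal wormhole routing algorithm for mesh networks. The algorithm requires few virtual channels and is characterized by low cost and high adaptivity. The paper proves that the algorithm is deadlock-free, livelock-free, and fully adaptive, and verifies it by simulation.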

8.

Network Performance Under Bimodal Traffic Loads

《Journal of Parallel and Distributed Computing》1995,28(1):43-64

In actual multicomputer networks, communications consist of hybrid traffic in which messages exhibit a variety of sizes. However, to date, most studies on network performance are based on traffic loads of uniformly-sized messages. We investigate the performance of wormhole-routed networks under bimodal traffic distributions, a mix of short and long messages. Our studies show that the presence of long messages degrades network performance for short messages dramatically, qualitatively changing network behavior. We present an analytical model for wormhole-routed networks which not only models network performance under uniformly sized message loads more accurately than existing models, but also can be extended to support bimodal traffic distributions. The model is validated against detailed simulation of routing networks, over a variety of message size distributions and message lengths. In virtually all cases, the model accurately predicts both network throughput and average message latency to within 8%. Because the impact of long messages can be severe, we consider three techniques (packetization, virtual lanes, and adaptive routing) to alleviate their impact. Packetization reduces the blocking time of long messages, improving network performance in most cases. Virtual lanes and adaptive routing together provide sufficient routing freedom to eliminate much of the blocking, producing performance comparable or even superior to that produced by packetization. Together, all three techniques are complementary, providing robust performance over a variety of traffic mixes and message sizes.

9.

Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks

Gravano L. Pifarre G.D. Berman P.E. Sanz J.L.C. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(12):1233-1251

This paper consists of two parts. In the first part, two new algorithms for deadlock- and livelock-free wormhole routing in the torus network are presented. The first algorithm, called Channels, is for the n-dimensional torus network. This technique is fully-adaptive minimal, that is, all paths with a minimal number of hops from source to destination are available for routing, and needs only five virtual channels per bidirectional link, the lowest channel requirement known in the literature for fully-adaptive minimal wormhole routing. In addition, this result also yields the lowest buffer requirement known in the literature for packet-switched fully-adaptive minimal routing. The second algorithm, called 4-Classes, is for the bidimensional torus network. This technique is fully-adaptive minimal and requires only eight virtual channels per bidirectional link. Also, it allows for a highly parallel implementation of its associated routing node. In the second part of this paper, four wormhole routing techniques for the two-dimensional torus are experimentally evaluated using a dynamic message injection model and different traffic patterns and message lengths.

10.

Adaptive-trail routing and performance evaluation in irregular networks using cut-through switches

Wenjian Qiao Ni L.M. Rokicki T. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(11):1138-1158

Cut-through switching promises low latency delivery and has been used in new generation switches, especially in high speed networks demanding low communication latency. The interconnection of cut-through switches provides an excellent network platform for high speed local area networks (LANs). For cost and performance reasons, irregular topologies should be supported in such a switch-based network. Switched irregular networks are truly incrementally scalable and have the potential to be reconfigured to adapt to the dynamics of network traffic conditions. Due to the arbitrary topologies of networks, it is critical to develop an efficient deadlock-free routing algorithm. A novel deadlock-free adaptive routing algorithm called adaptive-trail routing is proposed to allow irregular interconnection of cut-through switches. The adaptive routing algorithm is based on two unidirectional adaptive trails constructed from two opposite unidirectional Eulerian trails. Some heuristics are suggested in terms of the selection of Eulerian trails, the avoidance of long routing paths, and the degree of adaptivity. Extensive simulation experiments are conducted to evaluate the performance of the proposed and two other routing algorithms under different topologies and traffic workloads.

11.

The impact of virtual channel allocation on the performance of deterministic wormhole-routed k-ary n-cubes

S. Loucif M. Ould-Khaoua 《Simulation Modelling Practice and Theory》2002,10(8)

Virtual channels yield significant improvement in the performance of wormhole-routed networks as they can greatly reduce message blocking over network resources. K-ary n-cubes with deterministic routing have been widely analysed using analytical modelling tools. Most existing models, however, have either entirely ignored the effects of virtual channel multiplexing or have not considered the impact of virtual channel allocation on message latency. This paper discusses two different organisations of virtual channels in k-ary n-cubes, resulting in two deterministic routing algorithms. It then proposes an analytical model to compute message latency for the two routing algorithms. The proposed model is used in a case study to demonstrate the sensitivity of network latency to the way virtual channels are allocated to messages.
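
Illustrative note: deterministic routing in a k-ary n-cube is usually pictured as dimension-order routing, in which a message corrects its coordinates one dimension at a time and takes the shorter way around each ring. The sketch below assumes that scheme and bidirectional rings; it shows only the path a message follows, not the two virtual channel organisations or the analytical latency model discussed in the abstract. The function name is hypothetical.

```python
def dimension_order_route(src, dst, k):
    """Deterministic dimension-order path in a k-ary n-cube (assumed scheme).

    `src` and `dst` are coordinate tuples with entries in range(k).
    Dimensions are corrected in increasing order; within each ring the
    message travels in the direction with fewer hops (ties go to +1).
    """
    cur = list(src)
    path = [tuple(cur)]
    for d in range(len(cur)):
        offset = (dst[d] - cur[d]) % k          # hops needed going in +1 direction
        step = 1 if offset <= k - offset else -1
        while cur[d] != dst[d]:
            cur[d] = (cur[d] + step) % k
            path.append(tuple(cur))
    return path

# In a 4-ary 2-cube, going from x=0 to x=3 is one hop through the wrap-around link.
print(dimension_order_route((0, 0), (3, 1), k=4))
```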

12.

A survey of routing algorithm for mesh Network-on-Chip

Yue WU Chao LU Yunji CHEN 《Frontiers of Computer Science》2016,10(4):591-601

With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which brings tough challenges such as bandwidth, scalability, and power to on-chip interconnection. Against this background, Network-on-Chip (NoC) has been proposed and is gradually replacing traditional on-chip interconnections such as the shared bus and crossbar. For the convenience of physical layout, mesh is the most used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network, and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoC. According to whether network information is taken into consideration in routing decisions, routing algorithms of NoC can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but lacks adaptiveness, while adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing algorithms, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy, and features of routing algorithms of NoC are introduced. Then the importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light upon the future work of NoC routing algorithms.

13.

Balancing buffer utilization in meshes using a "restricted area" concept

Po-Jen Chuang Juei-Tang Chen Yue-Tsuen Jiang 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(8):814-827

Adaptive routing and virtual channels are used to increase routing adaptivity in wormhole-routed two-dimensional meshes. But increasing channel buffer utilization without considering even distribution of the traffic loads tends to cause congestion in the most adaptive routing area. To avoid such traffic congestion, a concept of the restricted area is proposed. The proposed restricted area, defined to be a part of the network where message transmission concentrates, can be located following the region of adaptivity. By properly guiding message routing inside and outside the area, we are able to achieve more balanced buffer utilization and to reduce traffic congestion accordingly. The performance of several routing algorithms with or without using the restricted area is simulated and evaluated under various traffic loads and distribution patterns. The results indicate that routing algorithms with the restricted areas yield constantly larger throughput and smaller latency than routing algorithms without using the concept.

14.

A general methodology for direction-based irregular routing algorithms

R. Moraveji H. Sarbazi-Azad A.Y. Zomaya 《Journal of Parallel and Distributed Computing》2010

This paper presents a general methodology for generating deadlock-free routing algorithms for irregular networks. Constructing a spanning tree on the given network, assigning directions to the network channels, creating deadlock-free zones, and specifying a logical sequence of the produced deadlock-free zones are the four fundamental steps that the proposed methodology takes to generate deadlock-free and connected routing algorithms. By applying the proposed methodology with two known labeling methods we have generated six irregular routing algorithms: three of them are novel routing algorithms and three of them (the Up/Down, Left/Right, and L-turn routing algorithms) have already been proposed in the literature. Extensive simulation experiments have been performed considering various network topologies, different network sizes (considering different network nodes and network channels), various message lengths, a variety of spanning tree roots, and a wide range of message (traffic) generation rates. Simulation results show that the six routing algorithms can be divided into three pairs. Routing members of each pair show similar behavior in terms of message latencies and saturation generation rates. However, it is worth noting that for a given topology the performance of the six routing algorithms may be totally different and it mainly depends on the network topology.

15.

Compressionless wormhole routing: an analysis for hypercube with virtual channels

A. Khonsari 《Computers & Electrical Engineering》2004,30(1):45-60

Several recent studies have shown that adaptive routing algorithms based on deadlock recovery have performance characteristics superior to those based on deadlock avoidance. Most of these studies, however, have relied on software simulation due to the lack of analytical modelling tools. In an effort towards filling this gap, this paper presents a new analytical model of compressionless routing in wormhole-routed hypercubes. This routing algorithm exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. The advantages of compressionless routing include deadlock-free adaptive routing with no extra virtual channels, simple router design, and order-preserving message transmission. The proposed analytical model computes message latency by determining the message transmission time, blocking delay at each router, multiplexing delay at each network channel, and waiting time in the source before entering the network. The validity of the model is demonstrated by comparing analytical results with those obtained through simulation experiments.

16.

Logic-Based Distributed Routing for NoCs

《Computer Architecture Letters》2008,7(1):13-16

The design of scalable and reliable interconnection networks for multicore chips (NoCs) introduces new design constraints like power consumption, area, and ultra low latencies. Although 2D meshes are usually proposed for NoCs, heterogeneous cores, manufacturing defects, hard failures, and chip virtualization may lead to irregular topologies. In this context, efficient routing becomes a challenge. Although switches can be easily configured to support most routing algorithms and topologies by using routing tables, this solution does not scale in terms of latency and area. We propose a new circuit that removes the need for using routing tables. The new mechanism, referred to as Logic-Based Distributed Routing (LBDR), enables the implementation in NoCs of many routing algorithms for most of the practical topologies we might find in the near future in a multicore chip. From an initial topology and routing algorithm, a set of three bits per switch output port is computed. By using a small logic block, LBDR mimics (demonstrated by evaluation) the behavior of routing algorithms implemented with routing tables. This result is achieved both in regular and irregular topologies. Therefore, LBDR removes the need for using routing tables for distributed routing, thus enabling flexible, fast and power-efficient routing in NoCs.

17.

FRoots: A Fault Tolerant and Topology-Flexible Routing Technique

Theiss I. Lysne O. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(10):1136-1150

Existing solutions for fault-tolerant routing in interconnection networks either work for only one given regular topology, or require slow and costly network reconfigurations that do not allow full and continuous network access. In this paper, we present FRoots, a routing method for fault tolerance in topology-flexible network technologies. Our method is based on redundant paths, and can handle single dynamic faults without sending control messages other than those that are needed to inform the source nodes of the failing component. Used in a mode with local rerouting, the source nodes need not be informed and no control messages are necessary for the network to stay connected despite a single fault. In fault-free networks under nonuniform traffic our routing method performs comparably to, or even better than, topology-specific routing algorithms in regular networks like meshes and tori. FRoots does not require any other features in the switches or end nodes than a flexible routing table and a modest number of virtual channels. For that reason, it can be directly applied to several present-day technologies like InfiniBand and Advanced Switching.

18.

Globally Adaptive Load-Balanced Routing on Tori (total citations: 1; self-citations: 0; citations by others: 1)

《Computer Architecture Letters》2004,3(1):2-2

We introduce a new method of adaptive routing on k-ary n-cubes, Globally Adaptive Load-Balance (GAL). GAL makes global routing decisions using global information. In contrast, most previous adaptive routing algorithms make local routing decisions using local information (typically channel queue depth). GAL senses global congestion using segmented injection queues to decide the directions to route in each dimension. It further load balances the network by routing in the selected directions adaptively. Using global information, GAL achieves the performance (latency and throughput) of minimal adaptive routing on benign traffic patterns and performs as well as the best obliviously load-balanced routing algorithm (GOAL) on adversarial traffic.

19.

Routing Function and Deadlock Avoidance in a Star Graph Interconnection Network

《Journal of Parallel and Distributed Computing》1994,22(2):216-228

In this paper the properties of paths in a star graph are investigated through the analysis of the corresponding star transposition tree. The general algebraic expression for all shortest paths between any two nodes (routing function) is found, and it is shown that every shortest path consists of a number of subpaths which can be combined in an arbitrary order or even mutually nested. Further, due to the known routing function the deadlock problem is solved using the method of virtual channels. A minimal deterministic routing algorithm is developed which recognizes the structure of the path by extracting subpaths and allows optimal adaptive management of virtual channels. Finally, based on the sufficient number of virtual channels, the minimal fully adaptive routing algorithm is suggested which offers an opportunity to reroute the message a number of times, while maintaining the shortest path between two nodes.
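
Illustrative note: in a star graph the nodes are permutations of 1..n and each link swaps the first symbol with the symbol at some other position. A commonly described greedy rule for reaching the identity permutation is: if the first symbol is not 1, swap it into its home position; otherwise swap it with any symbol that is still out of place. The sketch below follows that rule as an assumption; it is not the algebraic routing function or the virtual channel scheme derived in the paper, and the function name is hypothetical. Routing between two arbitrary nodes can be reduced to this case by relabelling symbols.

```python
def route_to_identity(perm):
    """Greedy path from `perm` to the identity permutation in a star graph.

    `perm` is a tuple containing 1..n. Each hop swaps position 0 with one
    other position. Returns the list of visited permutations.
    """
    p = list(perm)
    path = [tuple(p)]
    while True:
        first = p[0]
        if first != 1:
            i = first - 1  # home position of the symbol currently in front
        else:
            misplaced = [j for j in range(1, len(p)) if p[j] != j + 1]
            if not misplaced:
                return path  # identity reached
            i = misplaced[0]  # any misplaced position would do; this choice is arbitrary
        p[0], p[i] = p[i], p[0]
        path.append(tuple(p))

print(route_to_identity((2, 1, 3)))  # one swap is enough
print(route_to_identity((1, 3, 2)))  # symbol 1 must first leave the front
```

The free choice among misplaced positions in the second branch corresponds loosely to the abstract's remark that the subpaths of a shortest path can be combined in an arbitrary order.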

20.

On the design of a high-performance adaptive router for CC-NUMA multiprocessors

Puente V. Gregorio J.-A. Beivide R. Izu C. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(5):487-501

This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to fully adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between the network performance and hardware cost. The outcome of this research is a high-performance adaptive router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation contemplates both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered as a suitable candidate to implement packet interchange in next generations of CC-NUMA multiprocessors.