首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
多播路由算法对互连网络的通信性能和多处理机系统性能的发挥起着重要作用。针对基三分层互连网络,在权衡性能、成本和实现的基础上,提出一种基于树的受限多播路由算法TRMA。该算法充分利用基三分层互连网络的层次特性和节点编码中所含的网络拓扑信息实现消息路由,算法设计简单,易于硬件实现。和其他基于树的多播路由算法相比,TRMA算法不需要源节点在发送消息前构建多播树,并将多播树的信息存放在消息中,大大降低了源节点的工作负载,提高整个系统的性能。通过仿真比较了TRMA和基于单播的多播路由算法,结果表明TRMA具有较低的网络延迟和较小的网络流量。  相似文献   

2.
A new processor interconnection topology, the n-omega, is proposed in this paper. The n-omega has a fixed degree of 8 and is shown to have a diameter smaller than log2P, where P is the number of processors in the network. An elegant routing mechanism which is a simple extension to the destination tag control of the omega network is presented. The n-omega possesses good fault-tolerant properties and is amenable to easy partitioning. The proposed topology is compared against some existing distributed-memory multiprocessor networks.  相似文献   

3.
The authors introduce a multiprocessor interconnection network, known as cube-connected Mobius ladders, which has an inherently deadlock-free routing strategy and hence has none of the buffering and computational overhead required by deadlock-avoidance message passing algorithms. The basic network has a diameter φ of 4n-1 for n2n+2 nodes and has a fixed node degree of 4. The network can be interval routed in two stages and can be represented as a Cayley graph. This is the only practical fixed degree topology of size O(2φ) which has an inherently deadlock-free routing strategy, making it ideally suited for medium and large sized transputer networks  相似文献   

4.
The Journal of Supercomputing - Multistage interconnection networks are used as a medium for interconnecting processors and memories in multiprocessor systems. Multistage interconnection networks...  相似文献   

5.
高速互连网络是高性能计算系统的重要组成部分.随着网络规模需求的扩大,如何搭建更大规模的网络是高速互连网络拓扑结构设计的关键.因此,提出一种新型层次化的拓扑结构Paleyfly(PF),其结合了Paley图强正则的特性和Random Regular(RR)图支持任意规模大小的特点.相比其他新型高速互连网络拓扑结构,Paleyfly能够有效解决在路由芯片端口数受限的背景下,Dragonfly(DF)可扩展性受限、Fat tree(Ft)物理成本高、RR结构物理布局难、路由表规模大等问题.同时,根据强正则属性在路由策略上负载均衡的优势,提出了4种路由策略来解决网络的拥塞问题.最后,通过模拟器实验比较分析PF结构与其他拓扑结构及PF结构不同路由策略的性能,验证了PF结构在不同规模以及不同通信模式配置下网络延迟优于RR结构.  相似文献   

6.
Multistage interconnection networks (MINs) have been widely used in multiprocessor systems, and recently they have been adopted as a way to construct ATM switches for broadband networks. In such systems, the fault-tolerant ability is an important issue. Many researchers have proposed ways to enhance the reliability of MINs, among them a low-cost and efficient way is to use multipass routing schemes in MINs in which thedynamic full access(DFA) property exists. The performance of multipass routing, however, has been largely ignored by researchers in the past. In this paper, we show that multipass routing may degrade the system performance if the communication loads are not well balanced among processors; congestion may appear in some processors and the useful communication bandwidth is badly affected. We propose methods to design DFA routing schemes that are load-balanced and thus can utilize system resources (i.e., the bandwidth) more efficiently.  相似文献   

7.
Surmann  H. Ungering  A.P. 《Micro, IEEE》1995,15(4):40-48
The simulation and optimization of fuzzy systems with neural networks or genetic algorithms on general-purpose processors require fast implementations of fuzzy rule-based systems. We present several adapted implementation concepts that precisely analyze the fuzzy algorithm and are based on lookup tables, optimized rule processing, and digital pulse-duration modulation. These concepts allow general-purpose processors to produce solutions quicker than the second generation of special fuzzy processors  相似文献   

8.
The work performed by a parallel algorithm is the product of its running time and the number of processors it requires. This paper presents work-efficient (or cost-optimal) routing algorithms to determine the switch settings for realizing permutations on rearrangeable symmetrical networks such as Benes and the reduced Ω NΩN-1. These networks have 2n-1 stages with N=2n inputs/outputs, each stage consisting of N/2 crossbar switches of size (2×2). Previously known parallel routing algorithms for a rearrangeable network with N inputs determine the states of all switches recursively in O(n) iterations using N processors. Each iteration determines the switch settings of at most two stages of the network and requires at least O(n) time on a computer of N processors, regardless of the type of its interconnection network. Hence, the work of any previously known parallel routing algorithm equals at least O(Nn2) for setting up all the switches of a rearrangeable network. The new routing algorithms run on a computer of p processors, 1⩽p⩽N/n, and perform work O(Nn). Moreover, because the range of p is large, the new routing algorithms do not have to be changed in case some processors become faulty  相似文献   

9.
As transistor sizes shrink, interconnects represent an increasing bottleneck for chip designers. Several groups are developing new interconnection methods and system architectures to cope with this trend. New architectures require new methods for high-level application mapping and hardware/software codesign. We present high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessor systems-on-chip that are streamlined for one or more digital signal processing applications. That is, we seek to synthesize an application-specific interconnect topology. We show that flexible interconnect topologies utilizing low-hop communication between processors offer advantages for reduced power and latency. We show that existing multiprocessor scheduling algorithms can deadlock if the topology graph is not strongly connected, or if a constraint is imposed on the maximum number of hops allowed for communication. We detail an efficient algorithm that can be used in conjunction with existing scheduling algorithms for avoiding this deadlock. We show that it is advantageous to perform application scheduling and interconnect synthesis jointly, and present a probabilistic scheduling/interconnect algorithm that utilizes graph isomorphism to pare the design space.  相似文献   

10.
胖树是最重要的互连网络拓扑结构之一。针对胖树拓扑结构,已经提出了多种路由算法,其中OSRM被证明是一种最优化的路由算法,但是所有算法都忽略了网络链路故障的易诊断性。为此,提出一种对OSRM改进的新型路由算法BT-OSRM。该算法定义了节点间的大小关系并通过比较节点大小而从OSRM路由路径与其反向路径中选择路由路径。此外,还针对常用的2级和3级胖树结构,分别详细给出了BT-OSRM2和BT-OSRM3路由算法。理论分析表明,BT OSRM路由算法不但继承了OSRM路由算法无死锁、负载均衡和性能最优等优点,而且保证了任意两节点间的路由路径具有原路返回特性,从而提高了网络故障链路的易诊断性。  相似文献   

11.
片上网络互连拓扑综述   总被引:1,自引:0,他引:1  
随着器件、工艺和应用技术的不断发展,片上多处理器已经成为主流技术,而且片上多处理器的规模越来越大、片内集成的处理器核数目越来越多,用于片内处理器核及其它部件之间互连的片上网络逐渐成为影响片上多处理器性能的瓶颈之一。片上网络的拓扑结构定义网络内部结点的物理布局和互连方法,决定和影响片上网络的成本、延迟、吞吐率、面积、容错能力和功耗等,同时影响网络路由策略和网络芯片的布局布线方法,是片上网络研究中的关键之一。对比了不同片上网络的拓扑结构,分析了各种结构的性能,并对未来片上网络拓扑研究提出建议。  相似文献   

12.
Communication between processors is one of the most important issues in parallel and distributed systems. The authors study the communication aspects of a well known multiprocessor structure, the cube-connected cycles (CCC). Only nonoptimal routing algorithms and bounds on the diameter of restricted subclasses of the CCC have been presented in earlier work. The authors present an optimal routing algorithm for the general CCC, with a formal proof of its optimality. Based on this routing algorithm, they derive the exact network diameter for the general CCC  相似文献   

13.
Optimally fault-tolerant partial-connection multiple-bus networks and their fault-tolerant routing algorithms are presented in this paper. The proposed networks are scalable and provide flexibility in the choice of network parameters determining construction cost, system performance, and fault tolerance, given a fixed number of processors. In this design, when performance begins to fall due to contention, the simple addition of a bus can improve performance without adding costly processors or changing the whole topology, as required for other multiple-bus designs. Also, in situations requiring high reliability, for a fixed number of processors, excellent fault tolerance can be obtained  相似文献   

14.
As the VLSI technology makes large crossbar switches affordable, Clos networks have become a feasible option of large interconnection networks. However, to make these networks practical and useful, efficient routing algorithms need to be developed. This paper will develop and study several randomized routing algorithms for Clos networks. The algorithms are based on the idea that if the first column of Clos is set to some configuration somehow, then the resulting network becomes self-routed using the destination addresses. Each of the randomized algorithms sets the first column to a configuration selected by a random process. The algorithms are then self-routed and take no computation time to set the switches. Probabilistic analysis and simulation measurements of the communication delay of permutation routing are conducted. It is shown that the communication delay of any permutation is 3–6 cycles in networks of up to 1024 processors. Although other routing algorithms route arbitrary permutations in one cycle over Clos/Benes networks and 2 cycles over δ networks, these algorithms take prohibitively large times to compute the appropriate switch settings, while our randomized algorithms are self-routed and spend NO time on computing the switch settings. This makes our algorithms superior to any universal nonrandomized routing algorithm for Clos/Benes networks or δ networks. The speed, universality and ease of implementation of our randomized algorithms make Clos networks highly attractive for large parallel computer systems.  相似文献   

15.
段新明  武继刚  张大坤 《计算机科学》2012,39(2):115-117,153
在应用于大规模并行计算机的互连网络的设计中,容错问题是其中的一个关键问题和难点问题。提出了一种基于Torus虫孔交换网络的容错路由算法,这一算法使用了矩形故障模型,无论故障区域大小多少和如何分布,算法始终是无死锁的,而且具有足够的自适应性,只要故障节点没有断开网络的连接,算法就能够通过选路使消息绕过故障区域,保持路由的连通性。同时,算法仅需要使用3个额外的虚拟通道。最后算法在不同故障率的Torus网络中进行了仿真实验,结果显示这一算法具有良好的平滑降级使用的特性。  相似文献   

16.
Array Processors with Pipelined Buses (APPBs) are hybrid optical-electronic multiprocessor architectures in which message-pipelined optical buses are used for interprocessor communications. Presented in this paper is a structural variation of the basic APPB which utilizes optical switches to provide the capability of switching messages between buses without their being relayed by intermediate processors. Such switching capability eliminates the optical-electronic-optical signal conversion due to message relays and offers improved communication efficiency. We discuss routing issues, evaluate bandwidth improvement, and present efficient communications including matrix transpose, binary tree routing, and perfect shuffle which take advantage of the switching capability.  相似文献   

17.
This paper presents a novel technique for routing in wormhole-switched multiprocessor interconnection networks with clustered configuration. The network model used here consists of a set of clusters interfaced through a common central network. We assume that the central network and the clusters use independent algorithms to route messages between their internal nodes. A technique for deriving a global routing algorithm based on the local algorithms is presented, which allows the transfer of messages between any pair of nodes in the network. This proposed method is shown to be deadlock-free with two virtual channels. The clustered network model and the proposed routing technique can be used to enhance the fault tolerance capability of existing routing algorithms. In particular, we describe fault-tolerant routing methods for meshes, which can tolerate any arbitrary fault distribution without disabling connected healthy nodes  相似文献   

18.
一类层次双环网络的构造及其路由算法   总被引:1,自引:0,他引:1       下载免费PDF全文
高效互联网络的拓扑结构一直是人们关注的热点问题。提出了一类层次双环互联网络HDRN(k),给出了HDRN(k)网络的构造方法,研究了它的性质,并且通过与相关网络的比较,证实了HDRN(k)具有好的连接性、短的直径以及简单的拓扑结构,是一种实用的互联网络。另外,讨论了HDRN(k)网络的路由性质,设计了点点路由和Broadcast路由算法,证明了这两种路由算法的通信效率与层次环网络上对应算法的通信效率相比均有明显的提高。综上所述,HDRN(k)是一种具有良好拓扑性质的新型互联网络。  相似文献   

19.
This paper extends research into rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As a foundation for a shared-memory non-uniform access bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture where the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching parallel application communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped to the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of processor direct to cache (or memory) over a bus. The latency of communication between parallel threads does not degrade parallel performance or limit the graininess of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture which is designed to obtain maximum advantage from the overlapping-cluster characteristic, and also allows dynamic workload migration without moving the instructions or data. Scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload allocation schemes, that take advantage of the overlapping-cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and justify their use as a foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bi-directional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies to rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth modeling technique, and are compared with the results of simulation.  相似文献   

20.
High-speed interconnection networks are essential elements for different high-performance parallel-computing systems. One of the most common interconnection network topologies is the fat-tree, whose advantages have turned it into the favorite topology of many interconnect designers. One of these advantages is the possibility of using simple but efficient routing algorithms, like the recently proposed deterministic routing algorithm referred to as DET, which offers similar (or better) performance than Adaptive Routing while reducing complexity and guaranteeing in-order packet delivery. However, as other deterministic routing proposals, DET cannot react when packets intensely contend for network resources, leading to the appearance of Head-of-Line (HoL) blocking which spoils network performance. In this paper, we describe and evaluate a simple queue scheme that efficiently reduces HoL-blocking in fat-trees using the DET routing algorithm, without significantly increasing switch complexity and required silicon area. Additionally, we propose an implementation of OBQA in a feasible switch architecture.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号