1.

A testbed for evaluation of fault-tolerant routing in multiprocessor interconnection networks

Vaidya A.S., Das C.R., Sivasubramaniam A. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(10):1052-1066

This paper presents a comprehensive evaluation testbed for interconnection networks and routing algorithms using real applications. The testbed is flexible enough to implement any network topology and fault-tolerant routing algorithm, and it allows the system architect to study the cost versus performance trade-offs for a range of network parameters. We illustrate its use with one fault-tolerant algorithm and analyze the performance of four shared memory applications under different fault conditions. We also show how the testbed can be used to drive future research in fault-tolerant routing algorithms and architectures by proposing and evaluating novel architectural enhancements to the network router, called path selection heuristics (PSH). We propose three such schemes, and the Least Recently Used (LRU) PSH is shown to give the best performance in the presence of faults.
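The abstract does not say how the LRU path selection heuristic is realised in the router. As a minimal sketch, assuming a router that must choose among several admissible, non-faulty output channels for a message, an LRU heuristic can keep the channels in recency order and pick the least recently used one. The class `LRUPathSelector` and the port names are hypothetical.

```python
from collections import OrderedDict


class LRUPathSelector:
    """Hypothetical LRU path-selection heuristic for one router.

    Among the admissible, non-faulty output channels for a message, pick the
    channel used least recently, then mark it as the most recently used.
    """

    def __init__(self, channels):
        # Key order doubles as recency order: first key = least recently used.
        self._recency = OrderedDict((ch, None) for ch in channels)
        self.faulty = set()

    def select(self, admissible):
        """Return the least recently used admissible, non-faulty channel, or None."""
        candidates = set(admissible) - self.faulty
        for ch in self._recency:  # least to most recently used
            if ch in candidates:
                self._recency.move_to_end(ch)  # becomes most recently used
                return ch  # return at once, so the iteration is not resumed
        return None  # every admissible channel is faulty


router = LRUPathSelector(["N", "E", "S", "W"])
router.faulty.add("E")
print(router.select(["N", "E"]))       # N
print(router.select(["N", "E", "S"]))  # S (N was just used)
print(router.select(["N", "S"]))       # N
```

Spreading consecutive messages over the least recently used alternatives is one plausible reason such a heuristic helps around faults; the sketch makes no claim about the paper's hardware design.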

2.

An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori

《Computer Architecture Letters》2004,3(1):3-3

In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and supports a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destinations. To allow deadlock-free minimal adaptive routing, the methodology requires only one additional virtual channel (for a total of three), even for tori. Evaluation results for a 4 × 4 × 4 torus network show that the methodology is 5-fault tolerant. Indeed, for up to 14 link failures, the percentage of fault combinations supported is higher than 99.96%. Additionally, network throughput degrades by less than 10% when three random link faults are injected without disabling any node. In contrast, a mechanism similar to the one proposed for BlueGene/L, which disables some network planes, would degrade network throughput by 79%.
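A minimal sketch of the intermediate-node idea, assuming deterministic dimension-order routing on each leg. The methodology above routes both legs adaptively and adds a virtual channel for deadlock freedom; neither is modelled here. The helpers `dor_path` and `route_with_intermediate` are hypothetical: the second tries the direct route first and otherwise searches for an intermediate node whose two legs avoid every faulty link, preferring the shortest total path.

```python
from itertools import product

K = 4  # assumed radix: a 4 x 4 x 4 torus, matching the evaluation above


def dor_path(src, dst, k=K):
    """Links used by minimal X-then-Y-then-Z routing in a k-ary 3-cube torus.

    Each link is a frozenset of its two end nodes.
    """
    links, cur = [], list(src)
    for dim in range(3):
        delta = (dst[dim] - cur[dim]) % k
        step = 1 if delta <= k // 2 else -1  # go the shorter way around the ring
        while cur[dim] != dst[dim]:
            nxt = cur.copy()
            nxt[dim] = (cur[dim] + step) % k
            links.append(frozenset((tuple(cur), tuple(nxt))))
            cur = nxt
    return links


def route_with_intermediate(src, dst, faulty_links, k=K):
    """Return (intermediate, hops); intermediate is None if the direct route works.

    Returns None when neither the direct route nor any single intermediate works.
    """
    direct = dor_path(src, dst, k)
    if faulty_links.isdisjoint(direct):
        return None, len(direct)
    best = None
    for mid in product(range(k), repeat=3):
        if mid in (src, dst):
            continue
        legs = dor_path(src, mid, k) + dor_path(mid, dst, k)
        if faulty_links.isdisjoint(legs) and (best is None or len(legs) < best[1]):
            best = (mid, len(legs))
    return best


faulty = {frozenset(((0, 0, 0), (1, 0, 0)))}  # one failed X link
print(route_with_intermediate((0, 0, 0), (2, 0, 0), faulty))  # ((3, 0, 0), 2)
```

Here the failed link blocks the direct X route, so the sketch picks intermediate node (3, 0, 0) and reaches the destination the other way around the X ring, still in two hops.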

3.

VTFTR: A Deadlock-Free Fault-Tolerant Routing Algorithm for High-Dimensional Fat Trees

刘博阳, 胡舒凯, 施得君, 卢宏生 《Computer Engineering》2022,48(12):38

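With the rapid growth in the scale of high-performance computing systems in recent years, the reliability of high-performance interconnection networks has become an increasingly important issue. The high-dimensional fat tree is a network topology that combines the advantages of fat trees and multidimensional torus networks; owing to its good scalability and network performance, it has broad application prospects in the exascale era. However, research on fault-tolerant routing algorithms for high-dimensional fat trees is still limited, and their reliability problems urgently need to be addressed. To improve the fault tolerance of the high-dimensional fat-tree topology in high-performance interconnection networks, and thereby further improve the operating efficiency of the corresponding supercomputing systems, this paper proposes VTFTR, a fault-tolerant routing algorithm for leaf-switch failures in high-dimensional fat trees. The algorithm combines the ideas of the turn model and virtual-channel switching: by strictly controlling the turns that packets make between fault-free paths and fault-tolerant paths, it achieves deadlock-free fault tolerance in high-dimensional fat trees using a small number of fault-tolerant virtual channels and extra hops. Experimental results show that, with a single fault, the fault-tolerant paths of VTFTR are 2 to 4 hops shorter than those of the compared algorithms. In a 4,096-node network with 10 faulty leaf switches, across different distributions of the faulty switches, the algorithm maintains connectivity among all fault-free nodes at the cost of a 1.4% to 2.0% drop in throughput.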

4.

Fault-Tolerant Wormhole Routing in Mesh Networks

段新明, 杨愚鲁 《Computer Science》2007,34(11):29-31

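Fault tolerance is an important issue in interconnection network design. This paper proposes a new fault-tolerant routing algorithm and applies it to Mesh networks that use wormhole switching. Because it imposes few routing restrictions, the algorithm is highly adaptive and can maintain routing connectivity and deadlock freedom in Mesh networks with a variety of fault regions. Because it uses a minimal number of virtual channels, the algorithm requires very few buffer resources, which makes it well suited to building low-cost fault-tolerant interconnection networks. Because the decision to route around faulty nodes is based on local fault information, the algorithm makes routing decisions quickly and is easy to implement in an interconnection network. Finally, network simulation experiments show that the algorithm degrades gracefully.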

5.

The W-Network: A low-cost fault-tolerant multistage interconnection network for fine-grain multiprocessing

Kevin B. Theobald 《Concurrency and Computation》1996,8(6):415-428

Large-scale multiprocessors require an efficient interconnection network to achieve good performance. This network, like the rest of the system, should be fault-tolerant (able to continue operating even when there are hardware failures). This paper presents the W-Network, a low-cost fault-tolerant multistage interconnection network (MIN) that is well suited to a large multiprocessor running fine-grain parallel programs. It tolerates all single faults without any increase in latency or decrease in bandwidth following a fault, because it behaves just like the fault-free network even when there is a single fault. It requires only one extra port per chip, which makes it practical for a VLSI implementation. In addition, extra ports can be added for replacing faulty processors with spares.

6.

Online Distributed Fault-Tolerance Method for Interconnect Resources of a Three-Dimensional Reconfigurable Array

王敏, 王友仁, 张砦 《Application Research of Computers》2013,30(8):2360-2363

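A novel three-dimensional reconfigurable array architecture is designed, and an online distributed fault-tolerance method for its interconnect resources is studied. The system consists of identical functional cells and switch blocks arranged in a three-dimensional structure; faults in the interconnect wires are located by applying test vectors online, and faulty wires are self-repaired hierarchically. A four-bit adder/subtractor circuit is used as a design example to verify the functionality and fault-tolerance capability of the reconfigurable array. Experimental results show that the method achieves fault tolerance effectively, with low time overhead, strong fault-tolerance capability, and high resource utilization.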

7.

On the Construction of Fault-Tolerant Cube-Connected Cycles Networks

《Journal of Parallel and Distributed Computing》1995,25(1):98-106

This paper presents a new approach to tolerating edge faults and node faults in Cube-Connected Cycles (CCC) networks in a worst-case scenario. Our constructions of fault-tolerant CCC networks are obtained by adding extra edges to the CCC. The main objective is to reduce the cost of the fault-tolerant network by minimizing the degree of the network. Specifically, we have two main results. (i) We have created a fault-tolerant CCC that can tolerate any single fault, either a node fault or an edge fault. When the dimension of the CCC is odd, the degree of the fault-tolerant graph is 4. In the even case, there is a single node per cycle that is of degree 5 and the rest are of degree 4. (ii) We have created a fault-tolerant CCC, where every node has degree y + 2, which can tolerate any 2y − 1 cube-edge faults. Our constructions are extremely efficient for the case of edge faults: they result in healthy CCC networks that utilize all of the processors.
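For reference, the base CCC graph that the constructions above augment can be built in a few lines. In this sketch the node naming is an assumption: node (w, i) sits at position i on the cycle that replaces hypercube vertex w, and it is joined to its two cycle neighbours and, by a cube edge, to (w XOR 2^i, i). The check confirms that every node of the unaugmented graph has degree 3; the extra edges that raise the degree to 4 or 5 (or y + 2) in the fault-tolerant versions are not reproduced.

```python
from collections import defaultdict


def ccc(d):
    """Build the d-dimensional Cube-Connected Cycles graph (d >= 3) as an adjacency map.

    Node (w, i): w is a d-bit hypercube address, i is the position on w's cycle.
    """
    adj = defaultdict(set)

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for w in range(2 ** d):
        for i in range(d):
            link((w, i), (w, (i + 1) % d))   # cycle edge
            link((w, i), (w ^ (1 << i), i))  # cube edge along dimension i
    return adj


g = ccc(3)
num_edges = sum(len(nbrs) for nbrs in g.values()) // 2
print(len(g), num_edges, {len(nbrs) for nbrs in g.values()})  # 24 36 {3}
```

A d-dimensional CCC therefore has d·2^d nodes of degree 3, which is the starting point the fault-tolerant constructions try to exceed by as little as possible.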

8.

A multipath network with cross links

《Journal of Parallel and Distributed Computing》1988,5(2):185-193

A fault-tolerant multistage interconnection network, called the H-network, and a fault-tolerant control algorithm for this network are introduced. The novel feature of this network lies in its design, which has connections not only between switching elements of successive stages but also between switching elements of the same stage. The control algorithm is a simple modification of the destination tag algorithm, but it provides for fault tolerance and is dynamic in nature. It is shown that this design technique is effective in reducing hardware, improving fault tolerance, and giving better performance than other fault-tolerant networks with comparable hardware cost.
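The control algorithm above modifies the classic destination-tag scheme. As a point of reference only, the sketch below shows plain destination-tag routing in an Omega network, an assumed baseline topology rather than the H-network: after each perfect shuffle, a 2x2 switch reads one destination bit, most significant first, and sends the message to its upper output for 0 or its lower output for 1. The H-network's intra-stage cross links and fault handling are not modelled, and `omega_route` is a hypothetical name.

```python
def omega_route(src, dst, n):
    """Destination-tag routing through an n-stage Omega network with 2**n ports.

    Returns the line index after each stage; the last entry always equals dst.
    """
    size = 1 << n
    pos, trace = src, []
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & (size - 1)  # perfect shuffle = rotate left
        bit = (dst >> (n - 1 - stage)) & 1                  # destination tag, MSB first
        pos = (pos & ~1) | bit                              # switch: 0 = upper, 1 = lower output
        trace.append(pos)
    return trace


print(omega_route(2, 5, 3))  # [5, 2, 5]
```

Because every switch setting depends only on the destination address, the scheme needs no routing tables; a fault-tolerant variant has to add some way of leaving this single fixed path when a switch or link on it fails.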

9.

FRoots: A Fault Tolerant and Topology-Flexible Routing Technique

Theiss I., Lysne O. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(10):1136-1150

Existing solutions for fault-tolerant routing in interconnection networks either work only for one given regular topology, or require slow and costly network reconfigurations that do not allow full and continuous network access. In this paper, we present FRoots, a routing method for fault tolerance in topology-flexible network technologies. Our method is based on redundant paths and can handle single dynamic faults without sending control messages other than those needed to inform the source nodes of the failing component. When used in a mode with local rerouting, the source nodes need not be informed, and no control messages are necessary for the network to stay connected despite a single fault. In fault-free networks under nonuniform traffic, our routing method performs comparably to, or even better than, topology-specific routing algorithms in regular networks such as meshes and tori. FRoots requires no features in the switches or end nodes other than a flexible routing table and a modest number of virtual channels. For that reason, it can be applied directly to several present-day technologies such as InfiniBand and Advanced Switching.

10.

Improved extra group network: a new fault-tolerant multistage interconnection network

Fathollah Bistouni, Mohsen Jahanshahi 《The Journal of supercomputing》2014,69(1):161-199

Supersystems can provide enough computational power to solve complex problems in real time. In such systems, computational parallelism is obtained from multiple processors. Multistage interconnection networks (MINs) play a vital role in the performance of these multiprocessor systems. This paper introduces a new fault-tolerant MIN named the improved extra group network (IEGN). IEGN is derived from the existing extra group network (EGN), a regular multipath network with limited fault tolerance. IEGN provides four times as many paths between any source–destination pair as EGN. The performance of IEGN has been evaluated in terms of permutation capability, fault tolerance, reliability, path length, and cost. It is also shown that IEGN achieves better fault tolerance, reliability, path length, and cost-effectiveness than known networks, namely EGN, the augmented baseline network, the augmented shuffle-exchange network, the fault-tolerant double tree, the Benes network, and the Replicated MIN.

11.

Design of 4-disjoint gamma interconnection network layouts and reliability analysis of gamma interconnection Networks

S. Rajkumar, Neeraj Kumar Goyal 《The Journal of supercomputing》2014,69(1):468-491

Multistage interconnection networks (MINs) are widely used for reliable data communication in tightly coupled large-scale multiprocessor systems. High reliability of MINs can be achieved using fault tolerance techniques, generally by providing disjoint paths through multiple connectivity options. The gamma interconnection network (GIN) is a class of fault-tolerant MINs that provides alternate paths for source–destination node pairs. Various 2-disjoint and 3-disjoint GIN architectures have been presented in the literature. In this paper, two new designs of 4-disjoint-path multistage interconnection networks, called 4-disjoint gamma interconnection networks (4DGIN-1 and 4DGIN-2), are proposed. The proposed 4DGINs provide four disjoint paths for each source–destination pair and can tolerate three switch/link failures in the intermediate interconnection layers. The proposed designs are highly reliable GINs with higher fault tolerance than other gamma networks at low cost. Terminal pair reliabilities of the proposed designs and of various other 2-disjoint and 3-disjoint GINs are evaluated, analyzed, and compared; the proposed designs achieve higher reliability values.
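Why a fourth disjoint path raises reliability can be illustrated with an idealised model, assuming each path is a series chain of independent components and the disjoint paths act in parallel, so that R_path = r^h and R = 1 − (1 − R_path)^p, with optional shared source and destination switches in series. This is a simplified sketch, not the terminal pair reliability evaluation used in the paper; the function name and parameters are hypothetical.

```python
def terminal_pair_reliability(r, hops_per_path, paths, r_end=None):
    """Idealised terminal pair reliability for `paths` fully disjoint paths.

    Each path is a series chain of `hops_per_path` independent components of
    reliability r, and the paths are in parallel. If r_end is given, a shared
    source switch and a shared destination switch (reliability r_end each)
    are placed in series with the parallel block.
    """
    r_path = r ** hops_per_path
    r_parallel = 1 - (1 - r_path) ** paths
    return r_parallel if r_end is None else r_end * r_parallel * r_end


print([round(terminal_pair_reliability(0.9, 3, p), 4) for p in range(1, 5)])
# [0.729, 0.9266, 0.9801, 0.9946]
```

With r = 0.9 and three components per path, going from one to four disjoint paths raises the idealised reliability from 0.729 to about 0.9946; in a real GIN the shared end switches then dominate, which the `r_end` term lets you explore.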

12.

Fault-tolerant, real-time communication in FDDI-based networks

Biao Chen, Kamat S., Wei Zhao 《Computer》1997,30(4):83-90

The Fiber Distributed Data Interface (FDDI), the first high-speed network to meet the Safenet standard's bandwidth requirements, needs help to meet Safenet's fault tolerance requirement. Researchers have proposed a number of FDDI-based network architecture designs for improving fault tolerance. An architecture called FBRN (FDDI-Based Reconfigurable Network) provides enhanced fault tolerance by using (a) multiple FDDI networks to connect hosts, and (b) efficient fault detection and network configuration algorithms. To provide fault-tolerant real-time communication with the FBRN architecture, users must manage network resources properly. We sought to accomplish this by using a fault-tolerant, real-time management mechanism with online and offline components. We focused on achieving high performance by designing efficient and effective online and offline management algorithms that work around multiple faults.

13.

Characterization of spatial fault patterns in interconnection networks

M. Hoseiny Farahabady, F. Safaei, A. Khonsari, M. Fathy 《Parallel Computing》2006,32(11-12):886

Parallel computers, such as multiprocessor systems-on-chip (MPSoCs), multicomputers, and cluster computers, consist of hundreds or thousands of processing units and components (such as routers, channels, and connectors) connected via an interconnection network, and collectively these may undergo high failure rates. Therefore, such systems must be equipped with fault-tolerant mechanisms to ensure that they keep running in a degraded mode. Normally, faulty components are coalesced into fault regions, which are classified into two major categories: convex and concave regions. In this paper, we propose the first solution to calculate the probability of occurrence of common fault patterns in torus and mesh interconnection networks, including both convex (|-shaped, □-shaped) and concave (L-shaped, T-shaped, +-shaped, H-shaped) regions. These results play a key role in studying, in particular, the performance of routing algorithms proposed for interconnection networks under faulty conditions.
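Fault regions are formed by coalescing adjacent faulty nodes. A minimal sketch of that step, assuming 4-neighbour adjacency in a 2-D mesh and treating a region as convex when it exactly fills its bounding rectangle (line- or block-shaped) and as concave otherwise (for example L-, T-, +- or H-shaped). The function name is hypothetical, and the paper's probability calculation is not reproduced.

```python
from collections import deque


def fault_regions(faulty):
    """Group faulty mesh nodes (x, y) into 4-connected regions and label each one.

    Assumption: a region is 'convex' when it exactly fills its bounding
    rectangle, and 'concave' otherwise.
    """
    faulty, seen, regions = set(faulty), set(), []
    for start in sorted(faulty):
        if start in seen:
            continue
        region, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first flood fill over faulty neighbours
            x, y = queue.popleft()
            region.append((x, y))
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in faulty and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        xs, ys = [p[0] for p in region], [p[1] for p in region]
        area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
        regions.append((sorted(region), "convex" if len(region) == area else "concave"))
    return regions


faults = {(1, 1), (1, 2), (1, 3), (5, 5), (5, 6), (6, 5)}
print([label for _, label in fault_regions(faults)])  # ['convex', 'concave']
```

The first region is a straight line of three nodes; the second is an L-shape that fills only three of the four cells of its bounding box.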

14.

Backstepping Time-Delay Decentralized Fault-Tolerant Control for Reconfigurable Manipulators

李元春, 陆鹏, 赵博 《Control and Decision》2012,27(3):446-450

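For actuator faults in reconfigurable manipulator systems with model parameter uncertainty, a fault-tolerant control method combining backstepping design with time-delay control is proposed. Following the basic idea of backstepping, the method uses neural networks to compensate for the parameter-uncertainty terms and interconnection terms in the subsystem dynamic models. The approximation capability of time-delay control is used to compensate for actuator faults, so that fault-tolerant control takes effect promptly when a fault occurs. The method does not require online fault diagnosis, and simulation results demonstrate the effectiveness of the proposed control method.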

15.

Fault-Tolerant Wormhole Routing with 2 Virtual Channels in Meshes

Ji-Peng Zhou 《Journal of Computer Science and Technology》2005,20(6):822-830

In wormhole-switched meshes, a reliable routing algorithm must be deadlock-free and fault-tolerant. Many routing algorithms can tolerate a large number of faults enclosed in rectangular blocks or special convex regions; none of them, however, can handle two convex fault regions at a distance of two using only two virtual networks. In this paper, a fault-tolerant wormhole routing algorithm is presented that tolerates disjoint convex fault regions separated by a distance of two or more, which contain no nonfaulty nodes and do not prohibit any routing as long as the nodes outside the fault regions remain connected in the mesh network. Processors along the boundaries of different fault regions may overlap. The proposed algorithm, which routes messages with the X-Y routing algorithm in fault-free regions, can tolerate convex fault-connected regions with only two virtual channels per physical channel, and it is deadlock- and livelock-free. The algorithm can easily be extended to adaptive routing.
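In fault-free regions the algorithm above falls back on X-Y routing. A minimal sketch of that baseline in a 2-D mesh, with a hypothetical helper that reports the first faulty node the route would reach, which is where a detour around a convex fault region would have to begin. The detour itself and the two-virtual-channel deadlock-avoidance scheme are not modelled.

```python
def xy_route(src, dst):
    """Nodes visited by X-Y (dimension-order) routing in a 2-D mesh, src and dst included."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:  # correct the X offset first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:  # then the Y offset
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def first_fault(path, faulty):
    """First faulty node the X-Y route would reach, or None if the route is fault-free."""
    return next((node for node in path if node in faulty), None)


route = xy_route((0, 0), (2, 1))
print(route)                                  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(first_fault(route, {(2, 0), (3, 0)}))  # (2, 0)
```

Because X-Y routing never turns from Y back to X, it is deadlock-free in a fault-free mesh on its own; the extra virtual channel in the algorithm above is what keeps detours around fault regions from breaking that property.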

16.

Distributed Dynamic Fault-Tolerant Routing in Fat Trees

胡农达, 王达伟, 孙凝晖 《Chinese Journal of Computers》2010,33(10)

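Ultra-large-scale interconnection networks for cloud computing place greater demands on network fault tolerance, which has become an important issue for interconnection networks. To ensure high network availability and high performance, this paper proposes a distributed dynamic fault-tolerant routing method based on the fat-tree topology. The method achieves distributed dynamic fault tolerance in fat-tree networks by introducing a link-failure message propagation mechanism together with a dynamic fault-tolerant routing algorithm driven by link-failure information. Compared with existing methods, it adds neither network hardware nor routing path length, and it offers high execution efficiency and high performance. Experimental results show that in a fat tree built from m-port switches, the method can tolerate any m/2−1 failed links and, with high probability, combinations of more failed links, while maintaining high network performance.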
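The m/2−1 bound has a simple intuition in the smallest case. Assume a two-level fat tree of m-port switches with m leaf switches, each having one up-link to every one of the m/2 spine switches. A route between leaves a and b through spine s needs both links (a, s) and (b, s), and each failed link removes at most one spine for a given leaf pair, so m/2 − 1 failures always leave at least one spine. The sketch checks this exhaustively for small m; it illustrates only the counting argument, not the paper's link-failure propagation or routing algorithm, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def pick_spine(src_leaf, dst_leaf, m, failed_links):
    """First spine whose links to both leaves are alive, or None.

    A link is the tuple (leaf, spine); spines are numbered 0 .. m // 2 - 1.
    """
    for spine in range(m // 2):
        if (src_leaf, spine) not in failed_links and (dst_leaf, spine) not in failed_links:
            return spine
    return None


def tolerates(m, num_failures):
    """True if every leaf pair stays connected under every set of num_failures failed links."""
    leaves, spines = range(m), range(m // 2)
    all_links = [(leaf, s) for leaf in leaves for s in spines]
    for failed in combinations(all_links, num_failures):
        failed = set(failed)
        for a, b in permutations(leaves, 2):
            if pick_spine(a, b, m, failed) is None:
                return False
    return True


print(tolerates(8, 3), tolerates(8, 4))  # True False
```

With m = 8, any three failed links are survivable, while four failures can cut every up-link of one leaf, which matches the m/2 − 1 worst-case figure in this simplified setting.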

17.

A Hybrid Search Algorithm for Fault-Tolerant Priority Assignment

李俊, 曹万华, 阳富民, 涂刚, 卢炎生, 罗威 《Journal of Computer Research and Development》2007,44(11):1912-1919

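In real-time systems, a task's failure to produce correct results in time can have catastrophic consequences, so fault tolerance is critical to the effectiveness and reliability of real-time systems. Based on schedulability analysis using worst-case response time computation, a hybrid search algorithm for fault-tolerant priority assignment is proposed. By allowing alternate tasks to run at either higher or lower priority levels, the algorithm effectively improves the fault tolerance of the system. Experimental tests show that, compared with the best-known algorithms of the same kind, it is more effective at improving system fault tolerance.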
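The schedulability analysis mentioned above rests on worst-case response times. A minimal sketch of the standard fixed-priority response-time recurrence, R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j, iterated to a fixed point, assuming independent periodic tasks with deadlines no longer than their periods and no blocking. The paper's treatment of alternate tasks and its priority search are not reproduced, and the task set is made up for illustration.

```python
import math


def response_time(tasks, i):
    """Worst-case response time of task i under fixed-priority preemptive scheduling.

    tasks: list of (C, T, D) = (WCET, period, deadline), ordered from highest
    to lowest priority. Returns R_i, or None if the iteration exceeds D_i.
    """
    c_i, _, d_i = tasks[i]
    r = c_i
    while True:
        # Interference from every higher-priority task released within r.
        r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j, _ in tasks[:i])
        if r_next > d_i:
            return None  # deadline miss: task i is not schedulable at this priority
        if r_next == r:
            return r  # fixed point reached
        r = r_next


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]  # hypothetical (C, T, D) triples
print([response_time(tasks, i) for i in range(len(tasks))])  # [1, 3, 10]
```

A priority-assignment search can call such a test for each candidate ordering; giving an alternate task a higher or lower priority simply changes which tasks appear in `tasks[:i]`.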

18.

The deflection self-routing Delta network: a dynamically fault-tolerant high-radix multistage interconnection network

Jae-Hyun Park 《The Journal of supercomputing》2011,55(3):432-447

High-radix multistage interconnection networks are popular interconnection technologies for parallel supercomputers and cluster computers. In this paper, we present a new dynamically fault-tolerant high-radix multistage interconnection network that uses fully adaptive self-routing. To devise fully adaptive self-routing that recovers from misrouting around link faults in such a network, we introduce an abstract algebraic analysis of the topological structure of the high-radix Delta network. The presented interconnection network provides multiple paths by using all the links of all the stages of the network. We also present a mathematical analysis of the reliability of the interconnection network for quantitative comparison with other networks. The MTTF of the proposed 64×64 network is 2.2 times greater than that of the cyclic Banyan network. The hardware cost of the proposed network is half that of the cyclic Banyan network and the 2D ring-Banyan network.

19.

Fault-tolerant control systems design via subdivision of parameter region

Xiaozheng JIN, Guanghong YANG 《Journal of Control Theory and Applications》2009,7(2):127-133

This paper presents a linear matrix inequality (LMI) approach to solve the fault-tolerant control (FTC) problem for actuator faults. The range of actuator faults is treated as a parameter region and subdivided into several subregions to achieve a desired performance specification. Based on the integral quadratic constraint (IQC) approach, a passive fault-tolerant controller for the whole fault region and multiple fault-tolerant controllers for the individual fault subregions are designed to guarantee stability and to improve the performance of the FTC system, respectively. When a fault occurs, the subregion controller corresponding to the parameters estimated by the fault detection and isolation (FDI) process is selected to ensure stability and optimal performance of the closed-loop system. The case of incorrect estimation is also considered by comparing the performance index of the switched controller with that of the passive fault-tolerant controller. The proposed design technique is finally evaluated with a simulation example.

20.

Fault-tolerant hamiltonian cycles and paths embedding into locally exchanged twisted cubes

Weibei FAN, Jianxi FAN, Zhijie HAN, Peng LI, Yujie ZHANG, Ruchuan WANG 《Frontiers of Computer Science》2021,15(3):153104

The computer interconnection network is the foundation of the information society, and communication algorithms are the key to information exchange. Finding interconnection networks with simple routing algorithms and high fault tolerance is a prerequisite for realizing various communication algorithms and protocols. Today, complex interconnection networks can be built using very large scale integration (VLSI) technology. The locally exchanged twisted cube, denoted by the (s + t + 1)-dimensional LeTQ_s,t, combines the merits of the exchanged hypercube and the locally twisted cube. LeTQ_s,t has been proved to have many desirable properties for interconnection networks, such as fewer edges, lower overhead, and smaller diameter. Embeddability is an important measure of the performance of interconnection networks. We mainly study the fault-tolerant Hamiltonian properties of a faulty locally exchanged twisted cube, LeTQ_s,t − (f_v + f_e), with faulty vertices f_v and faulty edges f_e. First, we prove that LeTQ_s,t can tolerate up to s − 1 faulty vertices and edges when embedding a Hamiltonian cycle, for s ≥ 2, t ≥ 3, and s ≤ t. Furthermore, we prove that there is a Hamiltonian path between any two distinct fault-free vertices in a faulty LeTQ_s,t with up to s − 2 faulty vertices and edges. That is, LeTQ_s,t is (s − 1)-Hamiltonian and (s − 2)-Hamiltonian-connected. These results are proved to be optimal: LeTQ_s,t has at most (s − 1)-fault-tolerant Hamiltonicity and (s − 2)-fault-tolerant Hamiltonian connectivity.