7 results found (search time: 15 ms)
1.
2.
3.
The problem of tolerating faulty nodes in hypercubes has been studied by many researchers, either by using spares or by reconfiguration. Algorithms for tolerating faulty nodes and links in hypercubes are presented. The algorithms are based on using general spanning trees (GST), complete unbalanced spanning trees (CUST), and balanced spanning trees (BST) for reconfiguring the hypercube to avoid faulty nodes and links. The algorithms contain two phases: the first constructs the spanning tree, and the second reconfigures the hypercube should a faulty node be detected. The reconfiguration process consists of two basic steps. First, the faulty node is disconnected from the spanning tree. Then, a new spanning tree is constructed by reconnecting the children of the faulty node to the spanning tree. One hundred percent single fault correction (avoidance) and almost 100 percent fault correction (avoidance) of double and triple faults are achieved by the proposed algorithms for hypercubes having a dimension of n⩾6. Simulation results for the algorithm under more than three faults are also presented. For any k faulty nodes (1⩽k⩽2n-1), the reconfiguration algorithm may be applied k times to avoid these k faulty nodes, as long as no combination of any two of the faults results in a blocking situation. The proposed reconfiguration algorithms tolerate all possible single-link faults. The reconfiguration algorithms are extended to tolerate (k⩽n-1) multiple faults, causing a blocking situation, with a backtracking
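The two reconfiguration steps described in this abstract (disconnect the faulty node, then reconnect its children) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain breadth-first spanning tree rooted at node 0 rather than a GST, CUST, or BST, it does not handle a faulty root or link faults, and it includes no backtracking. The names `build_spanning_tree`, `subtree`, and `reconfigure` are hypothetical.

```python
from collections import deque


def neighbors(node, n):
    """Hypercube neighbors of `node`: flip each of the n address bits."""
    return [node ^ (1 << i) for i in range(n)]


def build_spanning_tree(n, root=0):
    """Phase 1 (assumed BFS tree): returns {node: parent}; the root maps to None."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in neighbors(u, n):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def subtree(parent, top):
    """All nodes in the subtree rooted at `top`, including `top` itself."""
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    seen, stack = set(), [top]
    while stack:
        u = stack.pop()
        seen.add(u)
        stack.extend(children.get(u, []))
    return seen


def reconfigure(parent, faulty, n, root=0):
    """Phase 2: detach `faulty`, then reattach each of its children.

    Returns a new {node: parent} map, or None if some child cannot be
    reattached (treated here as a blocking situation).
    """
    if faulty == root:
        raise ValueError("this sketch does not handle a faulty root")
    # Step 1: disconnect the faulty node; its children become orphans.
    new_parent = {v: p for v, p in parent.items() if v != faulty}
    orphans = [v for v, p in new_parent.items() if p == faulty]
    for child in orphans:
        new_parent[child] = None
    # Step 2: reconnect each orphan to a healthy neighbor still tied to the root.
    attached = subtree(new_parent, root)
    pending = list(orphans)
    progress = True
    while pending and progress:
        progress = False
        for child in list(pending):
            candidates = [w for w in neighbors(child, n) if w in attached]
            if candidates:
                new_parent[child] = candidates[0]
                attached |= subtree(new_parent, child)
                pending.remove(child)
                progress = True
    return new_parent if not pending else None


if __name__ == "__main__":
    tree = build_spanning_tree(4)
    repaired = reconfigure(tree, faulty=1, n=4)
    print(repaired)
```

Under these assumptions, applying `reconfigure` once per fault mirrors the abstract's idea of running the reconfiguration k times for k faulty nodes; a `None` result stands in for a blocking situation.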
4.
Failures in computer systems can often be traced to software anomalies of various kinds. In many scenarios, it might be difficult, infeasible, or unprofitable to carry out extensive debugging to find the cause of anomalies and remove them. In other cases, taking corrective actions may lead to undesirable service downtime. In this article, we propose an alternative approach to cope with the problem of software anomalies in cloud‐based applications, and we present the design of a distributed autonomic framework that implements our approach. It exploits the elastic capabilities of cloud infrastructures, and relies on machine learning models, proactive rejuvenation techniques, and a new load balancing approach. By putting together all these elements, we show that it is possible to improve both availability and performance of applications deployed to heterogeneous cloud regions and subject to frequent failures. Overall, our study demonstrates the viability of our approach, thus opening the way towards its adoption, and encouraging further studies and practical experiences to evaluate and improve it.
5.
This study addresses the use of fault injection for explicitly removing design/implementation faults in complex fault-tolerance algorithms and mechanisms (FTAM), viz., fault-tolerance deficiency faults. A formalism is introduced to represent the FTAM by a set of assertions. This formalism enables an execution tree to be generated, where each path from the root to a leaf of the tree is a well-defined formula. The set of well-defined formulas constitutes a useful framework that fully characterizes the test sequence. The input patterns of the test sequence (fault and activation domains) are then determined to cover specific structural criteria over the execution tree (activation of proper sets of paths). This provides a framework for generating a functional deterministic test for programs that implement complex FTAM. This methodology has been used to extend a debugging tool aimed at testing fault tolerance protocols developed by BULL France. It has been applied successfully to the injection of faults in the inter-replica protocol that supports the application-level fault-tolerance features of the architecture of the ESPRIT-funded Delta-4 project. The results of these experiments are analyzed in detail. In particular, even though the target protocol had been independently verified formally, the application of the proposed testing strategy revealed two fault-tolerance deficiency faults
6.
Shurbanov, Vladimir; Avresky, Dimiter; Mehra, Pankaj; Watson, W. The Journal of Supercomputing, 2002, 22(2): 161-173
This paper investigates the performance implications of several end-to-end flow control protocols in clusters based on the ServerNet® system-area network. The Static Window (SW) and Packet Pair (PP) flow control protocols are studied. Based on them, the Simplified Packet Pair (SPP) and the Alternating Static Window (ASW) protocols are defined. It has previously been proven that PP is stable for store-and-forward networks based on Rate Allocation Servers. The applicability of PP to wormhole-routing networks is studied. Simulation results for the performance characteristics are obtained and evaluated. It is shown that if high throughput is desired, ASW is the best method for controlling the average latency. On the other hand, if low throughput is acceptable, SPP can be applied to maintain low latencies.
7.