Similar Articles
20 similar articles found.
1.
The cost of recovery in message logging protocols
Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex tradeoff when choosing a message logging protocol for fault tolerance. On the one hand, optimistic protocols can provide fast failure-free execution and good performance during recovery, but are complex to implement and can create orphan processes. On the other hand, orphan-free protocols either risk being slow during recovery (e.g. sender-based pessimistic and causal protocols) or incur a substantial overhead during failure-free execution (e.g. receiver-based pessimistic protocols). To address this tradeoff, we propose hybrid logging protocols, a new class of orphan-free protocols. We show that hybrid protocols perform within 2% of causal logging during failure-free execution and within 2% of receiver-based logging during recovery.
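A minimal sketch (our illustration, not the paper's hybrid protocol) contrasting the two orphan-free logging styles whose tradeoff the abstract describes; class and method names are assumptions for exposition:

```python
class SenderBasedLogger:
    """Sender keeps each payload in its own volatile memory: cheap
    while failure-free, but a recovering process must collect its
    messages back from every sender."""
    def __init__(self):
        self.log = []                            # (dest, seq, payload)

    def send(self, dest, seq, payload, deliver):
        self.log.append((dest, seq, payload))    # log before sending
        deliver(dest, seq, payload)

    def replay_for(self, dest):
        # Called during dest's recovery to re-supply its messages.
        return [(seq, p) for d, seq, p in self.log if d == dest]


class ReceiverBasedLogger:
    """Receiver writes each message to stable storage before handling
    it: higher failure-free overhead, but recovery is purely local."""
    def __init__(self):
        self.stable = []                         # stands in for disk

    def receive(self, seq, payload, handler):
        self.stable.append((seq, payload))       # synchronous log first
        handler(payload)

    def recover(self, handler):
        for _, payload in self.stable:           # replay in order
            handler(payload)
```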

2.
In the rollback recovery of large-scale long-running applications in a distributed environment, pessimistic message logging protocols enable failed processes to recover independently, though at the expense of logging every message synchronously during fault-free execution. In contrast, coordinated checkpointing protocols avoid message logging, but they scale poorly, with a sharply increasing coordinating overhead as the system grows. With the aim of achieving efficient rollback recovery by trading off logging overhead and coordinating overhead, this paper suggests partitioning the system into clusters, and then presents a scheme to implement the conversion between these overheads. Using the proposed conversion, coordination can be introduced to reduce the unbearable logging overhead found in some systems, whereas proper logging can be employed to alleviate the unacceptable coordinating overhead in others. Furthermore, heuristics are introduced to address the issue of how to partition the system into clusters in order to speed up the recovery process and to improve recovery efficiency. Performance evaluation results indicate that our scheme can lower the overall system overhead effectively. Copyright © 2008 John Wiley & Sons, Ltd.

3.
A common approach to fault-tolerant software DSM is to take checkpoints with message logging. Our remote logging has low overhead because each node saves the coherence-related data into the memory of a remote node through a high-speed system area network. For a more lightweight fault-tolerant DSM, in this paper we focus mainly on eliminating shared-memory checkpointing during failure-free execution. Each node independently takes checkpoints of its execution state and non-shared data only. When a node fails, it regenerates its pages from the remote copies in live nodes. In order to reconstruct pages efficiently, we also introduce an XOR-diffing technique. The diff logs, which are created by XOR operations during failure-free execution, can be applied to any version of the remote copies, either backward or forward, for recovery. Our scheme reduces the checkpointing overhead and also alleviates the imbalance in execution times among nodes due to independent checkpointing. This research is supported by KISTEP under the National Research Laboratory program.
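A minimal sketch of the XOR-diffing idea: because XOR is its own inverse, a single diff log can patch a page copy either forward or backward during recovery. The page size and names are illustrative assumptions, not the paper's code:

```python
PAGE_SIZE = 4096                     # illustrative page size

def xor_diff(a: bytes, b: bytes) -> bytes:
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

# Applying a diff is the same XOR, so one log entry patches a remote
# copy in either direction.
apply_diff = xor_diff

old = bytes(PAGE_SIZE)               # page before a write
new = bytearray(old)
new[42] = 0xFF                       # page after the write
diff = xor_diff(old, bytes(new))

assert apply_diff(old, diff) == bytes(new)   # roll forward
assert apply_diff(bytes(new), diff) == old   # roll backward
```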

4.
A checkpoint of a process involved in a distributed computation is said to be useful if it is part of a consistent global checkpoint. In this paper, we present a quasi-synchronous checkpointing algorithm that makes every checkpoint useful. We also present an efficient asynchronous recovery algorithm based on the checkpointing algorithm. The checkpointing algorithm allows the processes to take checkpoints asynchronously and also forces the processes to take additional checkpoints in order to make every checkpoint useful. The recovery algorithm can handle concurrent failures of multiple processes. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. Messages are only selectively logged to cope with various types of message abnormalities that arise due to rollback, which results in low message-logging overhead. Unlike some existing algorithms, our algorithm does not use vector timestamps for tracking dependency between checkpoints and hence incurs low message overhead during failure-free operation. Moreover, a process can asynchronously identify garbage checkpoints (checkpoints that are no longer required for the purpose of recovery) and delete them from stable storage.
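A minimal sketch (our illustration, not the paper's algorithm) of the consistency condition behind useful checkpoints: a global checkpoint is inconsistent exactly when some message is an orphan, i.e. received before the receiver's checkpoint but sent after the sender's. The event-index encoding is an assumption for exposition:

```python
def is_consistent(cut, messages):
    """cut[p]  = index of p's last event inside its checkpoint.
    messages = (sender, send event index, receiver, receive event index)."""
    for s, send_i, r, recv_i in messages:
        if send_i > cut[s] and recv_i <= cut[r]:
            return False              # orphan: received "from the future"
    return True

msgs = [("p0", 3, "p1", 5)]
print(is_consistent({"p0": 4, "p1": 6}, msgs))   # True: send inside the cut
print(is_consistent({"p0": 2, "p1": 6}, msgs))   # False: orphan message
```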

5.
We define a simple variant of the Byzantine agreement (BA) problem, called the Failure Discovery (FD) problem, that, roughly speaking, amounts to reaching BA provided that no failures are discovered. We show how a protocol for FD can be extended to one for BA, with no message overhead in the failure-free runs. We also show that, for so-called benign failures, if the FD protocol satisfies an additional property, the message-preserving extension to a BA protocol can be accomplished with minimal time overhead in the failure-free runs. Our results show that FD is a useful building block for BA; indeed, it has been used in this way in a companion paper (Hadzilacos and Halpern, 1993). The work of V. Hadzilacos was supported, in part, by a grant from the Natural Sciences and Engineering Research Council of Canada.

6.
Many distributed algorithms require knowledge of the causal relationships between events. Examples include optimistic recovery protocols, distributed debugging systems, and causal distributed shared memory. Determining causal relationships can be difficult, however, because there is no global clock and local clocks cannot be perfectly synchronized. Vector time is a useful abstraction for capturing the causal relationships between events and, unlike Lamport's logical clocks, allows identification of concurrent events. Drawbacks of vector time include transmission and logging overhead, since the size of a vector clock is linear in the number of processes. This paper presents a technique to reduce these overheads for applications that dynamically create and destroy processes and log event information with attached vector timestamps. The reduction in logging overhead comes at the expense of a more complicated timestamp comparison protocol and more sophisticated data structures for maintaining vector time. Distributed process recovery mechanisms and debugging systems that require “on-the-fly” causality information can benefit directly from the proposed technique.
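A minimal sketch of the standard Fidge/Mattern vector-time rules the abstract builds on (not the paper's optimized variant): each process keeps one counter per process, merges on receive, and two timestamps are concurrent when neither dominates the other.

```python
class VectorClock:
    """One counter per process; standard Fidge/Mattern update rules."""
    def __init__(self, pid, n):
        self.pid, self.v = pid, [0] * n

    def local_event(self):
        self.v[self.pid] += 1

    def send(self):
        # Timestamp piggybacked on the outgoing message: O(n) integers,
        # which is exactly the overhead the paper sets out to reduce.
        self.local_event()
        return list(self.v)

    def receive(self, ts):
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.local_event()

def happened_before(u, v):
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    # Unlike Lamport clocks, vector time can identify concurrency.
    return not happened_before(u, v) and not happened_before(v, u)

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
ts = p0.send()                  # p0 is now [1, 0]
p1.receive(ts)                  # p1 is now [1, 1]
```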

7.
Determining the order relationship between events of a distributed computation is a fundamental problem in distributed systems, with applications in many areas including debugging, visualization, checkpointing and recovery. Fidge/Mattern’s vector-clock mechanism captures the order relationship using a vector of size N in a system consisting of N processes. As a result, it incurs message and space overhead of N integers. Many distributed applications use synchronous messages for communication. It is therefore natural to ask whether it is possible to reduce the timestamping overhead for such applications. In this paper, we present a new approach for timestamping messages and events of a synchronously ordered computation, that is, when processes communicate using synchronous messages. Our approach depends on decomposing the edges of the communication topology into mutually disjoint edge groups, such that each edge group forms either a star or a triangle. We show that, to accurately capture the order relationship between synchronous messages, it is sufficient to use one component per edge group in the vector instead of one component per process. Timestamps for events are only slightly bigger than timestamps for messages. Many common communication topologies such as ring, grid and hypercube can be decomposed into edge groups, resulting in almost 50% improvement in both space and communication overheads. We prove that the problem of computing an optimal edge decomposition of a communication topology is NP-complete in general. We also present a heuristic algorithm for computing an edge decomposition whose size is within a factor of two of the optimal. We prove that, in the worst case, it is not possible to timestamp messages of a synchronously ordered computation using a vector containing fewer than components when N ≥ 2. Finally, we show that messages in a synchronously ordered computation can always be timestamped in an offline manner using a vector of size at most . An earlier version of this paper appeared in the 2002 Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS). The author V. K. Garg was supported in part by the NSF Grants ECS-9907213, CCR-9988225, and an Engineering Foundation Fellowship. This work was done while the author C. Skawratananond was a Ph.D. student at the University of Texas at Austin.
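A naive greedy sketch of the edge-decomposition idea (our illustration, not the paper's factor-two heuristic): repeatedly pick the vertex covering the most remaining edges and group its incident edges into a star, so all messages in one group can share a single vector component.

```python
def star_decomposition(edges):
    """Greedily group edges into stars (all edges sharing a center)."""
    remaining = {frozenset(e) for e in edges}
    groups = []
    while remaining:
        degree = {}
        for e in remaining:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        center = max(degree, key=degree.get)   # busiest vertex first
        star = {e for e in remaining if center in e}
        groups.append((center, star))
        remaining -= star
    return groups

ring = [(i, (i + 1) % 6) for i in range(6)]    # 6-process ring topology
print(len(star_decomposition(ring)))           # 3 groups vs 6 processes
```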

8.
Distance labeling schemes are composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute the distance between any two vertices directly from their labels (without using any additional information). As applications for distance labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes. The current paper considers the problem on dynamic trees, and proposes efficient distributed schemes for it. The paper first presents a labeling scheme for distances in the dynamic tree model, with amortized message complexity O(log² n) per operation, where n is the size of the tree at the time the operation takes place. The protocol maintains O(log² n)-bit labels. This label size is known to be optimal even in the static scenario. A more general labeling scheme is then introduced for the dynamic tree model, based on extending an existing static tree labeling scheme to the dynamic setting. The approach fits a number of natural tree functions, such as distance, separation level, and flow. The main resulting scheme incurs an overhead of an O(log n) multiplicative factor in both the label size and amortized message complexity in the case of dynamically growing trees (with no vertex deletions). If an upper bound on n is known in advance, this method yields a different tradeoff, with an O(log² n / log log n) multiplicative overhead on the label size but only an O(log n / log log n) overhead on the amortized message complexity. In the fully dynamic model the scheme also incurs an increased additive overhead in amortized communication, of O(log² n) messages per operation.
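A minimal sketch of a static tree distance-labeling scheme in this spirit, built on centroid decomposition (an assumption on our part; the paper's dynamic protocols are more involved): each vertex stores O(log n) (centroid, distance) pairs, and the decoder computes a distance from the two labels alone.

```python
from collections import deque

def bfs(adj, src, alive):
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w in alive and w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def find_centroid(adj, alive):
    n = len(alive)
    def size(u, p):      # subtree size within the current component
        return 1 + sum(size(w, u) for w in adj[u] if w in alive and w != p)
    u, p = next(iter(alive)), None
    while True:          # walk toward the heaviest side until balanced
        heavy = next((w for w in adj[u]
                      if w in alive and w != p and size(w, u) > n // 2), None)
        if heavy is None:
            return u
        u, p = heavy, u

def build_labels(adj):
    labels = {v: [] for v in adj}
    stack = [set(adj)]
    while stack:
        alive = stack.pop()
        c = find_centroid(adj, alive)
        for v, d in bfs(adj, c, alive).items():
            labels[v].append((c, d))          # one pair per level
        rest = alive - {c}
        while rest:                           # recurse on each component
            comp, frontier = set(), [next(iter(rest))]
            while frontier:
                u = frontier.pop()
                comp.add(u)
                frontier += [w for w in adj[u] if w in rest and w not in comp]
            stack.append(comp)
            rest -= comp
    return labels

def distance(label_u, label_v):
    # dist(u, v) = min over shared centroids c of d(u, c) + d(c, v)
    du, dv = dict(label_u), dict(label_v)
    return min(du[c] + dv[c] for c in du.keys() & dv.keys())

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # a 5-node path
labels = build_labels(adj)
assert distance(labels[0], labels[4]) == 4
```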

9.
Efficient checkpointing and resumption of multicomputer applications is essential if multicomputers are to support time-sharing and the automatic resumption of jobs after a system failure. We present a checkpointing scheme that is transparent, imposes overhead only during checkpoints, requires minimal message logging, and allows for quick resumption of execution from a checkpointed image. Furthermore, the checkpointing algorithm allows each processor p to continue running the application being checkpointed except during the time that p is actively taking a local snapshot, and requires no global stop or freeze of the multicomputer. Since checkpointing multicomputer applications poses requirements different from those posed by checkpointing general distributed systems, existing distributed checkpointing schemes are inadequate for multicomputer checkpointing. Our checkpointing scheme makes use of special properties of wormhole routing networks to satisfy this new set of requirements.

10.
We consider the problem of designing a minimum-cost access network to carry traffic from a set of endnodes to a core network. Trunks are available in K types, reflecting economies of scale: a trunk type with a high initial overhead cost has a low cost per unit bandwidth, and a trunk type with a low overhead cost has a high cost per unit bandwidth. We formulate the problem as an integer program. We first use a primal-dual approach to obtain a solution whose cost is within O(K²) of optimal. Typically the value of K is small. This is the first combinatorial algorithm with an approximation ratio that is polynomial in K and independent of the network size and the total traffic to be carried. We also explore linear program rounding techniques and prove a better approximation ratio of O(K). Both bounds are obtained under weak assumptions on the trunk costs. Our primal-dual algorithm is motivated by the work of Jain and Vazirani on facility location [7]. Our rounding algorithm is motivated by the facility location algorithm of Shmoys et al. [12].
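A tiny numeric illustration of that cost structure (our numbers, purely hypothetical): for each trunk type k with overhead σ_k and per-unit cost δ_k, the cost of carrying load b on one edge is the minimum over k of σ_k + δ_k·b, so the best trunk type changes with the load.

```python
TRUNKS = [            # (overhead sigma_k, per-unit cost delta_k)
    (0.0, 10.0),      # cheap to install, expensive per unit
    (50.0, 2.0),
    (200.0, 0.5),     # expensive to install, cheap per unit
]

def cheapest_cost(bandwidth):
    # Cost of carrying `bandwidth` units across a single edge: pick
    # the trunk type minimizing overhead + unit cost * load.
    return min(sigma + delta * bandwidth for sigma, delta in TRUNKS)

for b in (1, 20, 500):
    print(b, cheapest_cost(b))   # the winning trunk type grows with load
```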

11.
In fault-tolerant computing, the causal message logging approach provides on-demand stable logging and enables the independent recovery of nodes. It imposes the requirement that the dependency between non-deterministic events must be known to the nodes. The dependency information is disseminated through messages. A central problem in traditional causal message logging is that, if the message logging progress of nodes cannot be effectively tracked, redundant dependency information will be piggybacked on the messages. In this paper, a dependency-mining-based approach is proposed. It detects the message logging progress of nodes from the dependency between non-deterministic events alone, without needing to piggyback an additional dependency vector or dependency matrix on each message.

12.
A Causal-Logging-Based Rollback Recovery Algorithm for Mobile Computing Environments
Because mobile hosts are unreliable and wireless network connections are fragile, fault-tolerance mechanisms for mobile computing systems are of great importance. For highly autonomous mobile hosts, which may roam across cells and disconnect from the network at any time, asynchronous rollback recovery is an important fault-tolerance technique. None of the existing rollback recovery algorithms for mobile computing environments fully achieves consistent asynchronous rollback recovery. Based on causal message logging, this paper proposes a new rollback recovery algorithm for mobile computing environments: an antecedence graph records the message dependencies between hosts, and the asynchronous checkpoints, sender-based volatile message logs, and antecedence graphs are all stored and processed on the mobile support stations, providing a transparent fault-tolerance service for mobile hosts and completely eliminating the effects that dependencies would otherwise impose among them. The consistency of the system is proved formally. Simulation results show that, while the rollback overhead is minimized, the communication and storage overheads during failure-free execution are also reduced significantly.

13.
Message logging is an attractive solution for providing fault tolerance to message-passing applications because it is more scalable than coordinated checkpointing. Sender-based message logging is a well-known optimization that allows the message payload to be saved in the sender's memory, so that only message reception events have to be logged reliably by an event logger. This paper proposes solutions to further improve message logging protocol scalability. In existing work on message logging, the event logger has always been considered a centralized process. We propose a distributed event logger that takes advantage of multi-core processors: event logger processes execute in parallel with application processes, leveraging the volatile memory of the nodes to save events reliably. We also propose combining our distributed event logger with O2P, an active optimistic message logging protocol that uses a gossip-based protocol to disseminate information on new stable events. Our distributed event logger and O2P are implemented in the Open MPI library. Our results show the following: (i) distributed event logging improves message logging protocol scalability, and (ii) using O2P with a distributed event logger provides an efficient and scalable fault-tolerant solution for message-passing applications. Copyright © 2011 John Wiley & Sons, Ltd.
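A minimal sketch (illustrative classes, not the Open MPI implementation) of the division of labor described above: payloads stay in sender memory, while only small reception determinants go to the event logger.

```python
class EventLogger:
    """Stands in for the (centralized or distributed) event logger;
    the list plays the role of reliably replicated volatile memory."""
    def __init__(self):
        self.determinants = []

    def log(self, determinant):
        self.determinants.append(determinant)


class Sender:
    def __init__(self):
        self.payload_log = {}                    # (dest id, ssn) -> payload

    def send(self, dest, ssn, payload):
        self.payload_log[(id(dest), ssn)] = payload   # sender-based logging
        dest.receive(self, ssn, payload)


class Receiver:
    def __init__(self, event_logger):
        self.event_logger, self.rsn = event_logger, 0

    def receive(self, sender, ssn, payload):
        self.rsn += 1
        # Only this small determinant (which message got which delivery
        # order) must be logged reliably; the payload itself stays in
        # the sender's memory.
        self.event_logger.log((id(sender), ssn, self.rsn))
```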

14.
A Coordinated Checkpointing Algorithm with O(n) Message Complexity
Coordinated checkpointing and rollback recovery is an effective fault-tolerance technique that is widely used in clusters and other parallel/distributed computer systems. To further reduce the time and space overhead of coordinated checkpointing, this paper proposes a coordinated checkpointing algorithm based on message counting. The algorithm makes no FIFO assumption about the underlying message channels, and it reduces the complexity of the control messages introduced during the synchronization phase from the usual O(n²) to O(n), effectively improving the efficiency and scalability of the system.
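A toy illustration (ours, not the paper's algorithm) of why message counting helps coordination: from n counter reports, i.e. O(n) control messages, a coordinator can tell that no application message is still in transit, instead of flushing every one of the O(n²) channels with markers.

```python
def quiescent(reports):
    """reports[p] = (messages sent by p, messages received by p),
    as counted locally by each process since the last checkpoint."""
    total_sent = sum(s for s, _ in reports.values())
    total_received = sum(r for _, r in reports.values())
    return total_sent == total_received        # nothing left in transit

print(quiescent({"p0": (3, 2), "p1": (2, 3)}))   # True: cut is consistent
print(quiescent({"p0": (3, 2), "p1": (2, 2)}))   # False: 1 message in transit
```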

15.
In a distributed computing system, message logging is widely used to make nodes recoverable. To reduce the piggyback overhead of traditional causal message logging, we present a zoning causal message logging approach in this paper. The crux of the approach is to control the propagation of dependency information: the nodes in the system are divided into zones, and, through a message fragment mechanism, the dependency information of a node is visible only within its zone. Simulation results show that the piggyback overhead of the proposed approach is lower than that of traditional causal message logging.

16.
A Low-Cost Transparent Checkpointing Recovery Protocol for Mobile Computing
Checkpointing recovery protocols in mobile computing systems face many problems that differ from those in traditional distributed systems. Among the existing checkpointing recovery mechanisms for mobile computing, approaches based on establishing globally consistent checkpoints cannot ensure independent recovery from failures, while message logging schemes based on m-MSS-m communication require messages exchanged between mobile hosts to be forwarded through the mobile support stations. This paper proposes a message-logging-based fault-tolerance protocol that supports direct communication between mobile hosts (m-m), together with the corresponding algorithm and a proof of its correctness. Compared with m-MSS-m communication, m-m communication helps reduce channel contention and message delivery latency. Simulation results show that the proposed protocol incurs a lower failure-free overhead and a shorter recovery time than traditional protocols.

17.
Existing rollback recovery fault-tolerance techniques suffer from synchronization constraints and blocking, and their time overhead grows sharply as the number of nodes in the system increases. To address this, a low-overhead rollback recovery method based on concurrency mining is proposed. A strategy of tracking message dependencies along with message passing removes the synchronization constraints of message logging; process workloads are analyzed to mine their concurrency; an implementation framework for the concurrent execution of process workloads is constructed; and data caching and multi-threading are used to execute the workloads within a process concurrently, reducing the failure recovery overhead. Experimental results on three NAS NPB2.3 benchmark programs show that the method reduces the checkpointing overheads from 0.63 s, 3.19 s and 1.21 s to 0.18 s, 0.67 s and 0.19 s, respectively, and the logging overhead ratios from 13.4%, 3.5% and 18.3% to 0.7%, 0.1% and 1.0%, respectively.

18.
The perfectly synchronized round-based model provides the powerful abstraction of crash-stop failures with atomic and synchronous message delivery. This abstraction makes distributed programming very easy. We describe a technique to automatically transform protocols devised in the perfectly synchronized round-based model into protocols for the crash, send-omission, general-omission or Byzantine models. Our transformation is achieved using a round shifting technique with a constant time complexity overhead. The overhead depends on the target model: crashes, send omissions, general omissions or Byzantine failures. Rather surprisingly, we show that no other automatic non-uniform transformation from a weaker model, say the traditional crash-stop model (with no atomic message delivery), onto an even stronger model than the general-omission one, say the send-omission model, can provide better time complexity in failure-free executions.

19.
Causal message-logging protocols have several attractive properties: they introduce no blocking, send no additional messages over those sent by the application, and never create orphans. Causal message logging, however, does require the causal effects of the deliveries of messages to be tracked. The information concerning causality tracking is piggybacked on application messages, and the amount of such information can become large. In this paper we study the cost of tracking causality in causal message-logging protocols. One can track causality as accurately as possible, but doing so requires piggybacking a considerable amount of additional information. One can reduce the amount of piggybacked information on each message by reducing the accuracy of causality tracking; but then, causal message logging may piggyback the reduced amount of information on more messages. We specify six different methods of tracking causality, each representing a natural choice based on the specification of causal message logging. We describe how these six methods can be implemented and compare them in terms of how large a piggyback load they impose. This load depends on the application that is using causal message logging. We characterize some applications for which a given method has the smallest piggyback load, and study using simulation the size of the piggyback load for two different models of applications.
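A minimal sketch (ours, not one of the paper's six methods) of where the piggyback load comes from: every determinant not yet known to be stably logged must ride on each outgoing message.

```python
class CausalProcess:
    def __init__(self):
        self.determinants = {}        # determinant id -> determinant
        self.stable = set()           # ids known to be logged stably

    def on_deliver(self, det_id, determinant, piggybacked):
        # Record our own new determinant plus everything the sender
        # was still responsible for propagating.
        self.determinants[det_id] = determinant
        self.determinants.update(piggybacked)

    def outgoing_piggyback(self):
        # The load being measured: all determinants not yet stable.
        return {i: d for i, d in self.determinants.items()
                if i not in self.stable}

    def on_stable_notice(self, det_ids):
        self.stable.update(det_ids)   # e.g. gossiped by the event logger
```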

20.
Basic message processing tasks, such as well-formedness checking and grammar validation, common in Web service messaging, can be off-loaded from the service providers' own infrastructures. The traditional way to alleviate the overhead caused by these tasks is to use firewalls and gateways. However, these single-processing-point solutions do not scale well. To enable effective off-loading of common processing tasks, we introduce the Prefix Automata SyStem (PASS), a middleware architecture which distributively processes the XML payloads of Web service SOAP messages during their routing towards Web servers. PASS is based on a network of automata, where PASS-nodes independently but cooperatively process parts of the SOAP message XML payload. PASS allows autonomous and pipelined in-network processing of XML documents, where parts of a large message payload are processed by various PASS-nodes in tandem or simultaneously. The non-blocking, non-wasteful, and autonomous operation of the PASS middleware is achieved by relying on the prefix nature of basic XML processing tasks, such as well-formedness checking and DTD validation. These properties ensure minimal distributed processing management overhead. We present necessary and sufficient conditions for outsourcing XML document processing tasks to PASS, as well as guidelines for rendering suitable applications PASS-processable. We demonstrate the advantages of migrating XML document processing, such as well-formedness checking, DTD parsing, and filtering, to the network via event-driven simulations.
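A minimal sketch of why well-formedness checking is a prefix task that PASS-style nodes can run on message fragments: an incremental parser (here Python's expat) accepts the payload chunk by chunk and can reject a malformed stream without seeing the rest of the document. The tag names are illustrative:

```python
import xml.parsers.expat

def make_checker():
    parser = xml.parsers.expat.ParserCreate()
    def feed(chunk, final=False):
        try:
            parser.Parse(chunk, final)
            return True               # the prefix seen so far is well-formed
        except xml.parsers.expat.ExpatError:
            return False              # reject mid-stream, before the end
    return feed

feed = make_checker()
print(feed("<env><body>hi"))              # True: a viable prefix
print(feed("</body></env>", final=True))  # True: the document completes
```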

