首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In fault-tolerant computing, the approach of causal message logging provides on-demand stable logging and enables the independent recovery of nodes. It imposes the requirement that the dependency between non-deterministic events needs to be known for nodes. The dependency information is disseminated through messages. A central problem in traditional causal message logging is that if the message logging progress of nodes cannot be effectively tracked, some redundant dependency information will be piggybacked on the messages. In this paper, a dependency mining based approach is proposed. It tries to detect the message logging progress of nodes only from the dependency between non-deterministic events, without needing to piggyback an additional dependency vector or dependency matrix on each message.  相似文献   

2.
Casual message-logging protocols have several attractive properties: they introduce no blocking, send no additional messages over those sent by the application, and never create orphans. Causal message logging, however, does require the casual effects of the deliveries of messages to be tracked. The information concerning causality tracking is piggybacked on application messages, and the amount of such information can become large. In this paper we study the cost of tracking causality in causal message-logging protocols. One can track causality as accurately as possible, but to do so requires piggybacking a considerable amount of additional information. One can reduce the amount of piggybacked information on each message by reducing the accuracy of causality tracking. But then, causal message logging may piggyback the reduced amount of information on more messages. We specify six different methods of tracking causality, each representing a natural choice based on the specification of causal message logging. We describe how these six methods can be implemented and compare them in terms of how large of a piggyback load they impose. This load depends on the application that is using causal message logging. We characterize some applications for which a given method has the smallest piggyback load, and study using simulation the size of the piggyback load for two different models of applications. Received: July 1999 / Accepted: July 2001  相似文献   

3.
Summary. This paper formulates necessary and sufficient conditions on the information required for enforcing causal ordering in a distributed system with asynchronous communication. The paper then presents an algorithm for enforcing causal message ordering. The algorithm allows a process to multicast to arbitrary and dynamically changing process groups. We show that the algorithm is optimal in the space complexity of the overhead of control information in both messages and message logs. The algorithm achieves optimality by transmitting the bare minimum causal dependency information specified by the necessity conditions, and using an encoding scheme to represent and transmit this information. We show that, in general, the space complexity of causal 0message ordering in an asynchronous system is , where is the number of nodes in the system. Although the upper bound on space complexity of the overhead of control information in the algorithm is , the overhead is likely to be much smaller on the average, and is always the least possible. Received: January 1996 / Accepted: February 1998  相似文献   

4.
This paper presents three garbage collection schemes for causal message logging with independent checkpointing. The first scheme allows each process to autonomously remove useless log information in its volatile storage by piggybacking only some additional information without requiring any extra message and forced checkpoint. Additionally, it supports faster output commit than traditional schemes. The second scheme enables each process to remove a part of log information in the storage if more empty space is required. It reduces the number of processes participating in the garbage collection by using the size of the log information of each process. The third scheme is a hybrid scheme having the advantages of the two proposed schemes. Simulation results show that the third scheme significantly reduces the garbage collection overhead compared with the traditional schemes regardless of specific communication patterns of distributed applications.  相似文献   

5.
Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applications have chosen pessimistic logging because it allows fast and localized recovery. The price that they must pay, however, is the high failure-free overhead. In this paper, we introduce the concept of K-optimistic logging where K is the degree of optimism that can be used to fine-tune the trade-off between failure-free overhead and recovery efficiency. Traditional pessimistic logging and optimistic logging then become the two extremes in the entire spectrum spanned by K-optimistic logging. Our results generalize several previously known protocols.Our approach is to prove that only dependencies on those states that may be lost upon a failure need to be tracked on-line, and so transitive dependency tracking can be performed with a variable-size vector. The size of the vector piggy-backed on a message then indicates the number of processes whose failures may revoke the message, and K corresponds to the upper bound on the vector size. Furthermore, the parameter K is dynamically tunable in response to changing system characteristics.  相似文献   

6.
一种基于移动计算环境的因果日志卷回恢复算法   总被引:2,自引:0,他引:2  
由于移动节点的不可靠和无线网络连接的脆弱性,研究移动计算系统容错机制具有重要意义.对可以跨区移动、随时可以与网络断开的自治性很强的移动节点来说,异步的卷回恢复是一种重要的容错手段.现有的移动计算环境下的卷回恢复算法都无法完全实现一致的异步卷回恢复.基于因果消息日志,提出一种新的移动计算环境的卷回恢复算法:通过先行图来记录节点间的消息依赖关系,将异步检查点、基于发送方的暂存消息日志和先行图全部在移动支持站上存储和处理,为移动节点提供一种透明的容错服务,完全消除依赖关系在移动节点之间造成的影响.用形式化的方法证明了系统的一致性.仿真结果表明,在卷回开销达到最低的同时,也显著降低了无错运行时的通信和存储开销.  相似文献   

7.
The cost of recovery in message logging protocols   总被引:1,自引:0,他引:1  
Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex tradeoff when choosing a message logging protocol for fault tolerance. On the one hand, optimistic protocols can provide fast failure-free execution and good performance during recovery, but are complex to implement and can create orphan processes. On the other hand, orphan-free protocols either risk being slow during recovery (e.g. sender-based pessimistic and causal protocols) or incur a substantial overhead during failure-free execution (e.g. receiver-based pessimistic protocols). To address this tradeoff, we propose hybrid logging protocols, which are a new class of orphan-free protocols. We show that hybrid protocols perform within 2% of causal logging during failure-free execution and within 2% of receiver-based logging during recovery  相似文献   

8.
现有的回卷恢复容错技术存在同步约束和阻塞问题,其时间开销随系统节点规模的增大而剧增。为此,提出一种基于并发性发掘的低开销回卷恢复实现方法。利用消息传递附带跟踪消息依赖的策略解除消息日志中的同步约束,解析进程负载以发掘进程负载的并发性,构建进程负载并发执行的实现架构,采用数据缓存策略和多线程技术实现进程内部各负载的并发执行,以降低故障恢复开销。3个NASNPB2.3标准性能检测程序的实验结果表明,该方法可使检查点开销从0.63S、3.19S、1.21S分别降低到0.18S、O.67S、0.19S,日志开销率从13.4%、3.5%、18.3%分别降低到0.7%、0.1%、1.0%。  相似文献   

9.
A common approach to fault-tolerant software DSM is to take checkpoints with message logging. Our remote logging has low overhead because each node saves the coherence-related data into the memory of a remote node through a high-speed system area network. For more lightweight fault-tolerant DSM, in this paper, we mainly focused on eliminating shared memory checkpointing during failure-free execution. Each node independently takes the checkpoints of execution states and non-shared data only. When a node fails, it regenerates its pages from the remote copies in live nodes. In order to efficiently reconstruct pages, we also introduced a XOR-diffing technique. The diff logs, which have been created by XOR operations during failure-free execution, can be applicable to any version of remote copies either backward or forward for recovery. Our scheme reduces the checkpointing overhead and also alleviates the imbalance in execution times among nodes due to independent checkpointing. This research is supported by KISTEP under the National Research Laboratory program.  相似文献   

10.
In the rollback recovery of large‐scale long‐running applications in a distributed environment, pessimistic message logging protocols enable failed processes to recover independently, though at the expense of logging every message synchronously during fault‐free execution. In contrast, coordinated checkpointing protocols avoid message logging, but they are poor in scalability with a sharply increased coordinating overhead as the system grows. With the aim of achieving efficient rollback recovery by trading off logging overhead and coordinating overhead, this paper suggests a partitioning of the system into clusters, and then presents a scheme to implement the conversion between these overheads. Using the proposed conversion, coordination can be introduced to reduce the unbearable logging overhead found in some systems, whereas proper logging can be employed to alleviate the unacceptable coordinating overhead in others. Furthermore, heuristics are introduced to address the issue of how to partition the system into clusters in order to speed up the recovery process and to improve recovery efficiency. Performance evaluation results indicate that our scheme can lower the overall system overhead effectively. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

11.
The decentralized peer-to-peer (P2P) technique has been widely used to implement scalable file sharing systems. It organizes nodes in a system into a structured or unstructured network. The advantages of the unstructured P2P systems are that they have lower maintenance complexity and can better adapt to node heterogeneity as well as network dynamics. However, the search process in unstructured systems is not as efficient as in structured P2P systems because the same search message may go through a node multiple times. To facilitate the complex search and improve the search efficiency, we propose a novel approach of assigning identifications to nodes in an unstructured system. Our method can prevent a node from receiving duplicate search messages and retain the low maintenance overhead for the system. The performance evaluations demonstrate that the proposed approach can improve the search efficiency of unstructured P2P systems while keeping the maintenance overhead at a comparable or even lower level, compared with the traditional unstructured systems.  相似文献   

12.
一种面向移动计算的低代价透明检查点恢复协议   总被引:2,自引:0,他引:2       下载免费PDF全文
移动计算系统中的检查点恢复协议面临着许多与传统分布式系统所不同的问题.在目前已出现的支持移动计算的检查点恢复机制中,基于建立全局一致的检查点的方法不能确保错误的独立恢复;基于m-MSS-m通信的消息日志方法其移动站之间交换的消息需通过移动基站的转发.提出了一种基于消息日志的支持移动站之间直接通信(m-m)的容错协议并给出了相应的算法及正确性证明.与m-MSS-m通信相比,m-m通信有利于降低信道冲突;减少消息传递延迟.仿真结果表明,所设计的协议比传统协议具有更小的无错误状态下引入负载和错误恢复时间.  相似文献   

13.
An optimal causal message ordering algorithm for asynchronous distributed systems was proposed by Kshemkalyani and Singhal and its optimality was proven theoretically. For a system of n processes, although the space complexity of this algorithm was shown to be O(n/sup 2/) integers, it was expected that the actual space overhead would be much less than n/sup 2/. It is difficult to determine the behavior of this algorithm by a theoretical analysis. We measure the overheads of two different implementations of the optimal causal message ordering algorithm via simulation under a wide range of system conditions. The optimal algorithm is seen to display significantly less message space overhead and log space overhead than the canonical Raynal-Schiper-Toueg algorithm.  相似文献   

14.
Many distributed algorithms require knowledge of the causal relationships between events. Examples include optimistic recovery protocols, distributed debugging systems, and causal distributed shared memory. Determining causal relationships can be difficult, however, because there is no global clock and local clocks cannot be perfectly synchronized. Vector time is a useful abstraction for capturing the causal relationships between events and, unlike Lamport's logical clocks, allows identification of concurrent events. Some drawbacks of vector time include transmission and logging overhead, since the size of a vector clock is linear in the number of processes. This paper presents a technique to reduce these overheads for applications that dynamically create and destroy processes and log event information with attached vector timestamps. The reduction in logging overhead comes at the expense of a more complicated timestamp comparison protocol and more sophisticated data structures for maintaining vector time. Distributed process recovery mechanisms and debugging systems that require “on-the-fly” causality information can benefit directly from the proposed technique  相似文献   

15.
Due to its low latency,byte-addressable,non-volatile,and high density,persistent memory (PM) is expected to be used to design a high-performance storage system.However,PM also has disadvantages such as limited endurance,thereby proposing challenges to traditional index technologies such as B+ tree.B+ tree is originally designed for dynamic random access memory (DRAM)-based or disk-based systems and has a large write amplification problem.The high write amplification is detrimental to a PM-based system.This paper proposes WO-tree,a write-optimized B+ tree for PM.WO-tree adopts an unordered write mechanism for the leaf nodes,and the unordered write mechanism can reduce a large number of write operations caused by maintaining the entry order in the leaf nodes.When the leaf node is split,WO-tree performs the cache line flushing operation after all write operations are completed,which can reduce frequent data flushing operations.WO-tree adopts a partial logging mechanism and it only writes the log for the leaf node.The inner node recognizes the data inconsistency by the read operation and the data can be recovered using the leaf node information,thereby significantly reducing the logging overhead.Furthermore,WO-tree adopts a lock-free search for inner nodes,which reduces the locking overhead for concurrency operation.We evaluate WO-tree using the Yahoo!Cloud Serving Benchmark(YCSB) workloads.Compared with traditional B+ tree,wB-tree,and Fast-Fair,the number of cache line flushes caused by WO-tree insertion operations is reduced by 84.7%,22.2%,and 30.8%,respectively,and the execution time is reduced by 84.3%,27.3%,and 44.7%,respectively.  相似文献   

16.
将智能手机设备加入基于非结构化P2P网络的资源共享系统中能够满足人们对资源共享的多样化、便利性、高频性、实时性、高效性等要求,但是该系统网络规模的扩张和网络节点互异性的加大,必将导致系统资源搜索效率的降低、冗余信息的剧增以及网络更加不稳定。为了解决这些问题,文中设计了一种改进的基于节点兴趣和Q-learning的资源搜索机制。首先将节点根据兴趣相似度进行兴趣聚类,划分兴趣集,然后根据兴趣集中节点的能力值构建兴趣树,该结构避免了消息环路的产生,极大地降低了冗余信息;在资源搜索中,兴趣树内采用洪泛算法转发消息,兴趣树之间采用基于Q-learning的消息转发机制,不断强化最可能获取目标资源的路径,查询消息优先在这些路径上传播。另外,针对“热点”资源问题,设计了自适应热点资源索引机制,减少了重复路径搜索,进一步减少了冗余消息量;针对节点失效的问题,给出了根节点冗余机制和捎带检测的策略方法,分别解决了根节点失效和普通节点失效导致的兴趣树的不完整性问题,分析表明该方法能够减少消息冗余量。仿真实验结果表明,与GBI-BI算法和Interest CN算法相比,所提搜索算法能够提高命中率,缩短响应时间,减少冗余信息,具有较好的综合性能,最终解决了由于智能手机设备加入P2P网络导致的资源搜索效率下降、网络流量开销大的问题。  相似文献   

17.
王准  陈俊亮 《计算机学报》1998,21(8):730-737
消息日志是用于多进程、分布式系统中状态恢复的一种方法。本文针对传统的消息日志方法仅仅适用于确定性进程的局限性,提出一种新的消息日志思想,充分考虑到不确定性的存在在容错方面的积极作用,主张在满足应用进程一致性语义的基础上,在一定程度上允许不确定性现象的存在。从而以新的角度看待单一进程和分布式并发系统中存在的不确定性所带来的状态重建不能完全复原的问题。这样,消息日志亦能适用于某些不满足确定性条件的进程  相似文献   

18.
本文将物理层认证技术应用于智能电表系统中,物理层消息身份认证方案利用物理层信道响应的时空唯一性,进行发送信息的认证,其可利用物理层的同步头作为消息认证的信道响应提前源,避免了额外的开销和上层的复杂计算,同时物理层消息身份认证方案同传统的消息身份认证机制结合,在实现快速认证的同时最小化包传输开销,通过在IEEE802.15.4g标准下的分析和仿真,验证我们提出的方案可以满足智能电表实时控制的要求.  相似文献   

19.
A decentralized Grid resource discovery solution is presented in this paper under predefined resource taxonomy, in which information nodes with the same type of registered resources are organized together to form resource information communities (RIC), and efficient navigation between different communities is achieved by a DHT P2P based bootstrap network. Periodical topology maintenance communications are used to piggyback and disseminate popular data in bootstrap network to achieve better load balance. The performance of RIC-based Grid resource discovery is evaluated by simulation under different cases, and overhead is also studied.  相似文献   

20.
A desired attribute in safety-critical embedded real-time systems is a system time and event synchronization capability on which predictable communication can be established. Focusing on bus-based communication protocols, we present a novel, efficient, and low-cost start-up and restart synchronization approach for TDMA environments. This approach utilizes information about a node's message length that forms a unique sequence to achieve synchronization such that communication overhead can be avoided. We present a fault-tolerant initial synchronization protocol with a bounded start-up time. The protocol avoids start-up collisions by deterministically postponing retries after a collision. We also present a resynchronization strategy that incorporates recovering nodes into synchronization.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号