1.

A simple and fast asynchronous consensus protocol based on a weak failure detector (cited by: 2; self-citations: 0; citations by others: 2)

Michel Hurfin Michel Raynal 《Distributed Computing》1999,12(4):209-223

Summary. The Consensus problem is a fundamental paradigm for fault-tolerant asynchronous systems. It abstracts a family of problems known as Agreement (or Coordination) problems. Any solution to consensus can serve as a basic building block for solving such problems (e.g., atomic commitment or atomic broadcast). Solving consensus in an asynchronous system is not a trivial task: it has been proven (1985) by Fischer, Lynch and Paterson that there is no deterministic solution in asynchronous systems which are subject to even a single crash failure. To circumvent this impossibility result, Chandra and Toueg have introduced the concept of unreliable failure detectors (1991), and have studied how these failure detectors can be used to solve consensus in asynchronous systems with crash failures. This paper presents a new consensus protocol that uses a failure detector of the class . Like previous protocols, it is based on the rotating coordinator paradigm and proceeds in asynchronous rounds. Simplicity and efficiency are the main characteristics of this protocol. From a performance point of view, the protocol is particularly efficient when, whether failures occur or not, the underlying failure detector makes no mistake (a common case in practice). From a design point of view, the protocol is based on the combination of three simple mechanisms: a voting mechanism, a small finite state automaton which manages the behavior of each process, and the possibility for a process to change its mind during a round. Received: August 1997 / Accepted: March 1999
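
As a rough illustration of the rotating coordinator paradigm mentioned in this abstract, the sketch below picks a coordinator for each asynchronous round by cycling through the process identifiers. The function names, the 1-based numbering of rounds and processes, and the modulo rule itself are assumptions made for illustration; the abstract does not state the assignment rule the protocol actually uses.

```python
def coordinator_of(round_number: int, n: int) -> int:
    """Return the id (1..n) of the coordinator of a 1-based round (assumed rule)."""
    if n < 1 or round_number < 1:
        raise ValueError("n and round_number must be positive")
    # Cycle through the processes: round 1 -> p1, round 2 -> p2, ..., round n+1 -> p1.
    return ((round_number - 1) % n) + 1


def coordinators(n: int, rounds: int) -> list[int]:
    """List the coordinators of rounds 1..rounds for a system of n processes."""
    return [coordinator_of(r, n) for r in range(1, rounds + 1)]
```

With three processes, `coordinators(3, 7)` cycles through processes 1, 2, 3 and wraps around, so every process is given the coordinator role again and again.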

2.

Consensus-based fault-tolerant total order multicast

Fritzke U. Jr Ingels P. Mostefaoui A. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(2):147-156

While total order broadcast (or atomic broadcast) primitives have received a lot of attention, this paper concentrates on total order multicast to multiple groups in the context of asynchronous distributed systems in which processes may suffer crash failures. “Multicast to Multiple Groups” means that each message is sent to a subset of the process groups composing the system, distinct messages possibly having distinct destination groups. “Total Order” means that all message deliveries must be totally ordered. This paper investigates a consensus-based approach to solve this problem and proposes a corresponding protocol to implement this multicast primitive. This protocol is based on two underlying building blocks, namely, uniform reliable multicast and uniform consensus. Its design is characterized by the following two properties. The first is a minimality property: only the sender of a message and the processes of its destination groups have to participate in the total order multicast of the message. The second is a locality property: no execution of a consensus has to involve processes belonging to distinct groups (i.e., consensus is executed on a “per group” basis). This locality property is particularly useful when one is interested in using the total order multicast primitive in large-scale distributed systems. In addition to a correctness proof, an improvement that reduces the cost of the protocol is also suggested.

3.

Open consensus

Romain Boichat Svend Frølund Rachid Guerraoui 《Concurrency and Computation》2001,13(14):1215-1245

This paper presents the abstraction of open consensus and argues for its use as an effective component for building reliable agreement protocols in practical asynchronous systems where processes and links can crash and recover. The specification of open consensus has a decoupled, on‐demand and re‐entrant flavour that makes its use very efficient, especially in terms of forced logs, which are known to be major sources of overhead in distributed systems. We illustrate the use of open consensus as a basic building block to develop a modular, yet efficient, total‐order broadcast protocol. Finally, we describe our Java implementation of our open‐consensus abstraction and we support our efficiency claims with some practical performance measures. Copyright © 2001 John Wiley & Sons, Ltd.

4.

Early stopping in Global Data Computation

Delporte-Gallet C. Fauconnier H. Helary J.-M. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(9):909-921

The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system where processes can crash, but which is equipped with a perfect failure detector. This protocol requires that processes execute asynchronous computation rounds. The number of rounds is upper bounded by min(f+2, t+1, n), where n, t, and f represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when t ...
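
The upper bound on the number of rounds stated in this abstract, min(f+2, t+1, n), can be evaluated directly. The sketch below only computes that expression; the function name `max_rounds` and the sanity check on the arguments are assumptions added for illustration and say nothing about how the protocol itself works.

```python
def max_rounds(n: int, t: int, f: int) -> int:
    """Evaluate the round bound min(f+2, t+1, n) quoted in the abstract.

    n: total number of processes
    t: maximum number of processes that can crash
    f: number of processes that actually crash
    """
    # Assumed sanity check: actual crashes cannot exceed the tolerated
    # maximum, which in turn cannot exceed the number of processes.
    if not (n >= 1 and 0 <= f <= t <= n):
        raise ValueError("expected n >= 1 and 0 <= f <= t <= n")
    return min(f + 2, t + 1, n)
```

For example, with n = 10 and t = 4 the bound is 2 rounds in a failure-free run (f = 0) and 5 rounds when all four tolerated crashes occur (f = 4), which is the sense in which the protocol stops early.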

5.

Solving vector consensus with a wormhole (cited by: 1; self-citations: 0; citations by others: 1)

Neves N.F. Correia M. Verissimo P. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(12):1120-1131

This paper presents a solution to the vector consensus problem for Byzantine asynchronous systems augmented with wormholes. Wormholes prefigure a hybrid distributed system model, embodying the notion of an enhanced part of the system with "good" properties otherwise not guaranteed by the "normal" weak environment. A protocol built for this type of system runs in the asynchronous part, where f out of n ≥ 3f+1 processes might be corrupted by malicious adversaries. However, sporadically, processes can rely on the services provided by the wormhole for the correct execution of simple operations. One of the nice features of this setting is that it is possible to keep the protocol completely time-free and, in addition, to circumvent the FLP impossibility result by hiding all time-related assumptions in the wormhole. Furthermore, from a performance perspective, it leads to the design of a protocol with a good time complexity.
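
Reading the resilience condition in this abstract as n ≥ 3f+1, the largest number of corrupted processes a system of n processes can accommodate is floor((n-1)/3). The sketch below only works through that arithmetic; the function names are hypothetical and nothing here models the protocol or the wormhole.

```python
def satisfies_bound(n: int, f: int) -> bool:
    """Check the resilience condition n >= 3f + 1."""
    return n >= 3 * f + 1


def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 still holds."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3
```

For example, 4 processes accommodate one corrupted process, and a second one only becomes tolerable once the system reaches 7 processes.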

6.

A Non-Forced-Write Atomic Commit Protocol for Cluster File Systems

Shao Bingqing Zhang Junwei Zheng Caiping Zhang Hao Liu Zhenjun Xu Lu 《Journal of Computer Science and Technology》2014,29(2):303-315

Distributed metadata consistency is one of the critical issues of metadata clusters in distributed file systems. Existing methods to maintain metadata consistency generally need several forced log write operations. Since synchronous disk I/O is very inefficient, the average response time of metadata operations is greatly increased. In this paper, an asynchronous atomic commit protocol (ACP) named Dual-Log (DL) is presented. It does not need any forced log write operations. Optimized for distributed metadata operations involving only two metadata servers, DL mutually records the redo log in the counterpart metadata server by transferring it through the low-latency network. A crashed metadata server can redo the metadata operation with the redundant redo log. Since the latency of the network is much lower than the latency of disk I/O, DL can improve the performance of the distributed metadata service significantly. The prototype of DL is implemented based on a local journal. Its performance is compared with that of two widely used protocols, EP and S2PC-MP. The results show that the average response time of distributed metadata operations is reduced by about 40%-60%, and that the recovery time is only 1 second with 10,000 uncompleted distributed metadata operations.

7.

Distributed agreement in tile self-assembly

Aaron Sterling 《Natural computing》2011,10(1):337-355

Laboratory investigations have shown that a formal theory of fault-tolerance will be essential to harness nanoscale self-assembly as a medium of computation. Several researchers have voiced an intuition that self-assembly phenomena are related to the field of distributed computing. This paper formalizes some of that intuition. We construct tile assembly systems that are able to simulate the solution of the wait-free consensus problem in some distributed systems. (For potential future work, this may allow binding errors in tile assembly to be analyzed, and managed, with positive results in distributed computing, as a “blockage” in our tile assembly model is analogous to a crash failure in a distributed computing model.) We also define a strengthening of the “traditional” consensus problem, to make explicit an expectation about consensus algorithms that is often implicit in distributed computing literature. We show that solution of this strengthened consensus problem can be simulated by a two-dimensional tile assembly model only for two processes, whereas a three-dimensional tile assembly model can simulate its solution in a distributed system with any number of processes.

8.

Synchronous atomic broadcast for redundant broadcast channels (cited by: 4; self-citations: 3; citations by others: 1)

Flaviu Cristian 《Real-Time Systems》1990,2(3):195-212

We propose a synchronous atomic broadcast protocol for distributed real-time systems based on redundant broadcast channels. The protocol can tolerate a finite number f of concurrent processor crash failures, channel adapter performance failures and channel omission failures. Its message cost is optimal: when no failures occur only f+1 messages are sent per broadcast. The cost implications of providing tolerance to other failure classes are also investigated.

9.

Computing global functions in asynchronous distributed systems with perfect failure detectors

Helary J.-M. Hurfin M. Mostefaoui A. Raynal M. Tronel F. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(9):897-909

A Global Data is a vector with one entry per process. Each entry must be filled with an appropriate value provided by the corresponding process. Several distributed computing problems amount to computing a function on a global data. This paper proposes a protocol to solve such problems in the context of asynchronous distributed systems where processes may fail by crashing. The main problem that has to be solved lies in computing the global data and in providing each noncrashed process with a copy of it, despite the possible crash of some processes. To be consistent, the global data must contain, at least, all the values provided by the processes that do not crash. This defines the Global Data Computation (GDC) problem. To solve this problem, processes execute a sequence of asynchronous rounds during which they construct, in a decentralized way, the value of the global data and eventually each process gets a copy of it. To cope with process crashes, the protocol uses a perfect failure detector. The proposed protocol has been designed to be time efficient: it allows early decision. Let t be the maximum number of processes that may crash, t ...

10.

Reconfigurable distributed storage for dynamic networks

Gregory Chockler Seth Gilbert Vincent Gramoli Peter M. Musial Alex A. Shvartsman 《Journal of Parallel and Distributed Computing》2009

This paper presents a new algorithm for implementing a reconfigurable distributed shared memory in an asynchronous dynamic network. The algorithm guarantees atomic consistency (linearizability) in all executions in the presence of arbitrary crash failures of the processing nodes, message delays, and message loss. The algorithm incorporates a classic quorum-based algorithm for read/write operations, and an optimized consensus protocol, based on Fast Paxos for reconfiguration, and achieves the design goals of: (i) allowing read and write operations to complete rapidly and (ii) providing long-term fault-tolerance through reconfiguration, a process that evolves the quorum configurations used by the read and write operations. The resulting algorithm tolerates dynamism. We formally prove our algorithm to be correct, we present its performance and compare it to existing reconfigurable memories, and we evaluate experimentally the cost of its reconfiguration mechanism.
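
Quorum-based read/write algorithms of the kind this abstract refers to rest on one property: any two quorums share at least one node, so a read quorum always reaches a node that took part in the latest completed write. The sketch below checks that property for majority quorums. Majority quorums are an assumption chosen for illustration; the abstract does not say which quorum system the algorithm uses, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Sequence


def majority_quorums(nodes: Sequence[int]) -> list[frozenset[int]]:
    """Enumerate every minimal majority (more than half) of the given nodes."""
    size = len(nodes) // 2 + 1
    return [frozenset(q) for q in combinations(nodes, size)]


def all_pairs_intersect(quorums: Iterable[frozenset[int]]) -> bool:
    """True if every pair of distinct quorums has a node in common."""
    return all(a & b for a, b in combinations(list(quorums), 2))
```

With five nodes, every majority has three members, so two majorities always overlap in at least one node. Two disjoint two-node sets, by contrast, fail the check.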

11.

Global data computation in chordal rings

Xianbing Wang Yong Meng Teo 《Journal of Parallel and Distributed Computing》2009

Existing Global Data Computation (GDC) protocols for asynchronous systems are round-based algorithms designed for fully connected networks. In this paper, we discuss GDC in asynchronous chordal rings, a non-fully connected network. The virtual links approach to solve the consensus problem may be applied to GDC for non-fully connected networks, but it incurs high message overhead. To reduce the overhead, we propose a new non-round-based GDC protocol for asynchronous chordal rings with perfect failure detectors. The main advantage of the protocol is that there is no notion of rounds. Every process creates two messages initially, with one message traversing in a clockwise direction and visiting each and every process in the chordal ring. The second message traverses in a counterclockwise direction. When there is a direct connection between two processes, a message is sent directly. Otherwise, the message is sent via virtual links. When the two messages return, the process decides according to the information maintained by the two messages. The perfect failure detector of a process need only detect the crash of neighboring processes, and the crash information is disseminated to all other processes. Analysis and comparison with two virtual links approaches show that our protocol reduces message complexity significantly.

12.

Cover2

《Parallel and Distributed Systems, IEEE Transactions on》2007,18(4):c2-c2

Unreliable failure detectors are abstract devices that, when added to asynchronous distributed systems, enable solving distributed computing problems (e.g., consensus) that otherwise would be impossible to solve in these systems. This paper focuses on two classes of failure detectors defined by Chandra and Toueg, namely, the classes denoted ◇P (eventually perfect) and ◇S (eventually strong). Both classes include failure detectors that eventually detect permanently all process crashes, but while the failure detectors of ◇P eventually make no erroneous suspicions, the failure detectors of ◇S are only required to eventually not suspect a single correct process. Informally, in a one-shot agreement problem, a new problem instance is created each time the processes propose new values to be decided on (e.g., consensus is one-shot). In such a context, this paper addresses the following question related to the comparative power of these classes, namely: "Are there one-shot agreement problems that can be solved in asynchronous distributed systems with reliable links but prone to process crash failures augmented with ◇P, but cannot be solved when those systems are augmented with ◇S?" Surprisingly, the paper shows that the answer to this question is "no." An important consequence of this result is that ◇P cannot be the weakest class of failure detectors that enables solving one-shot agreement problems in unreliable asynchronous distributed systems.

13.

A protocol to achieve independence in constant rounds

Gennaro R. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(7):636-647

Independence is a fundamental property needed to achieve security in fault-tolerant distributed computing. In practice, distributed communication networks are neither fully synchronous nor fully asynchronous, but rather loosely synchronized. By this, we mean that in a communication protocol, messages at a given round may depend on messages from other players at the same round. These possible dependencies among messages create problems if we need n players to announce independently chosen values. This task is called simultaneous broadcast. In this paper, we present the first constant round protocol for simultaneous broadcast in a reasonable computation model (which includes a common shared random string among the players). The protocol is provably secure under general cryptographic assumptions. In the process, we develop a new and stronger formal definition for this problem. Previously known protocols for this task required either O(log n) or expected constant rounds to complete (depending on the computation model considered).

14.

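A distributed deadlock detection scheme for mobile agent systems

Meng Xuejun Zhang Huanguo 《Journal of Chinese Computer Systems》2007,28(2):274-278

Traditional distributed deadlock solutions are not suitable for mobile agent systems (MAS), in which entities move freely through the network. This paper describes a distributed deadlock algorithm for mobile agent systems that uses dedicated agents to detect and resolve deadlocks. The scheme is characterized by locality of reference, topology independence, fault tolerance, and asynchronous operation. A Stochastic Petri Net model is built, and simulation experiments are used to compare the scheme's performance with that of the Diffusion Computation algorithm.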

15.

Simple and efficient oracle-based consensus protocols for asynchronous Byzantine systems

Friedman R. Mostefaoui A. Raynal M. 《Dependable and Secure Computing, IEEE Transactions on》2005,2(1):46-56

This paper is on the consensus problem in asynchronous distributed systems where (up to f) processes (among n) can exhibit a Byzantine behavior, i.e., can deviate arbitrarily from their specification. One way to solve the consensus problem in such a context consists of enriching the system with additional oracles that are powerful enough to cope with the uncertainty and unpredictability created by the combined effect of Byzantine behavior and asynchrony. This paper presents two kinds of Byzantine asynchronous consensus protocols using two types of oracles, namely, a common coin that provides processes with random values and a failure detector oracle. Both allow the processes to decide in one communication step in favorable circumstances. The first is a randomized protocol for an oblivious scheduler model that assumes n > 6f. The second one is a failure detector-based protocol that assumes n > tif. These protocols are designed to be particularly simple and efficient in terms of communication steps, the number of messages they generate in each step, and the size of messages. So, although they are not optimal in the number of Byzantine processes that can be tolerated, they are particularly efficient when we consider the number of communication steps they require to decide and the number and size of the messages they use. In that sense, they are practically appealing.

16.

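Research on an intrusion-tolerant atomic multicast protocol based on the TTCB

Zhou Hua Meng Xiangru Zhang Li 《Application Research of Computers》2008,25(7):2174-2176

This paper studies the architecture of distributed intrusion-tolerant systems based on the Trusted Timely Computing Base (TTCB). To address the agreement problem in such systems, it implements an intrusion-tolerant atomic multicast protocol that uses TTCB services. Provided that the total number of processes is greater than twice the number of malicious processes, the protocol satisfies the agreement requirement on correct results and thereby achieves intrusion tolerance. Finally, the correctness of the protocol algorithm is proved.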

17.

Coordinator log transaction execution protocol

James W. Stamos Flaviu Cristian 《Distributed and Parallel Databases》1993,1(4):383-408

The coordinator log transaction execution protocol proposed in this paper centralizes logging on a per transaction basis and exploits piggybacking to provide the semantics of a distributed atomic commit at a minimal cost. The protocol eliminates two rounds of messages (one phase) from the presumed commit protocol and dramatically reduces the number of log forces needed for distributed atomic commit. We compare the coordinator log transaction execution protocol to existing protocols, explain when it is desirable, and discuss how it affects the write ahead log protocol and the database crash recovery algorithm. Recommended by: Tamer Ozsu

18.

A group membership algorithm with a practical specification

Franceschetti M. Bruck J. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(11):1190-1200

Presents a solvable specification and gives an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result by building a weaker, yet nontrivial specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models.

19.

Adaptive broadcast by fault-tolerant spanning tree switching

Sushanta Karmakar Arobinda Gupta 《Journal of Parallel and Distributed Computing》2010

Adaptation is a desirable requirement in a distributed system as it helps the system to perform efficiently under different environments. For many problems, more than one protocol exists, such that one protocol performs better in one environment while the other performs better in another. In such cases, adaptive distributed systems can be designed by dynamically switching between the protocols as the environment changes. Distributed protocol switching is also important for performance enhancement, or fault-tolerance of a distributed system. In this work, we illustrate distributed protocol switching by providing a distributed algorithm for adaptive broadcast that dynamically switches from a BFS tree to a DFS tree. The proposed switching algorithm can also handle arbitrary crash failures. It ensures that switching eventually terminates in spite of failures and the desired tree (DFS tree) results as the output. We also investigate the properties that can be guaranteed on the delivery of broadcast messages under specific failure conditions. We show that under no failure, each broadcast message is eventually correctly delivered to all the nodes in spite of switching. Under arbitrary crash fault, we ensure that switching eventually terminates with the desired tree as the broadcast topology. We also investigate the specific delivery guarantees that can be provided when a single crash fault happens, both during switching and when no switching is in progress.

20.

Scaled consensus for asynchronous high‐order discrete‐time multiagent systems

Yuhua Cheng Quan Zhou Libing Bai Xilin Zhang Gen Qiu 《International Journal of Robust and Nonlinear Control》2020,30(1):443-456

This paper investigates the distributed scaled consensus problem of multiple agents with high‐order dynamics under the asynchronous setting, where each agent measures its neighbors' information at certain discrete time instants according to its own clock rather than over the whole discrete process, and all agents' clocks are independent of each other. It is assumed that the communication topology can switch arbitrarily and that the information transfer between agents has a time‐varying delay. Under the designed asynchronous distributed control protocol, it is shown that agents with the same scale reach a common final state, while agents with different scales reach different final states. Moreover, an effective parameter selection strategy is presented for the large number of gain parameters in high‐order multiagent systems, based on novel model transformation techniques. Simulation examples are provided to demonstrate the high‐order scaled consensus performance of the agents under the asynchronous setting.