期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Failure detection and consensus in the crash-recovery model 总被引：2，自引：0，他引：2

Marcos Kawazoe Aguilera Wei Chen Sam Toueg 《Distributed Computing》2000,13(2):99-125

Summary. We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice – those with no failures or failure detector mistakes. In such runs, consensus is achieved within time and with 4 n messages, where is the maximum message delay and n is the number of processes in the system. Received: May 1998 / Accepted: November 1999 相似文献

2.

Non-blocking atomic commit in asynchronous distributed systems with failure detectors

Rachid Guerraoui 《Distributed Computing》2002,15(1):17-25

This paper addresses the Non-Blocking Atomic Commit (NB-AC) problem in asynchronous distributed systems augmented with failure detectors. We first show that, in these systems, NB-AC and Consensus are incomparable. Roughly speaking, there is a failure detector that solves NB-AC but not Consensus and a failure detector that solves Consensus but not NB-AC. Then we introduce the Anonymously Perfect failure detector . We show that, to solve NB-AC, is necessary (while is not), whereas is sufficient when a majority of the processes are correct. We draw from our results some observations on the practical solvability of NB-AC. Received: August 2000 / Accepted: May 2001 相似文献

3.

Cover2

《Parallel and Distributed Systems, IEEE Transactions on》2007,18(4):c2-c2

Unreliable failure detectors are abstract devices that, when added to asynchronous distributed systems, enable solving distributed computing problems (e.g., consensus) that otherwise would be impossible to solve in these systems. This paper focuses on two classes of failure detectors defined by Chandra and Toueg, namely, the classes denoted diamP (eventually perfect) and diamS (eventually strong). Both classes include failure detectors that eventually detect permanently all process crashes, but while the failure detectors of diamP eventually make no erroneous suspicions, the failure detectors of diamS are only required to eventually not suspect a single correct process. Informally, in a one-shot agreement problem, a new problem instance is created each time the processes propose new values to be decided on (e.g., consensus is one-shot). In such a context, this paper addresses the following question related to the comparative power of these classes, namely: "Are there one-shot agreement problems that can be solved in asynchronous distributed systems with reliable links but prone to process crash failures augmented with op, but cannot be solved when those systems are augmented with diamS?" Surprisingly, the paper shows that the answer to this question is "no." An important consequence of this result is that diamP cannot be the weakest class of failure detectors that enables solving one-shot agreement problems in unreliable asynchronous distributed systems 相似文献

4.

Possibility and impossibility results in a shared memory environment

Gadi Taubenfeld Shlomo Moran 《Acta Informatica》1996,33(1):1-20

We focus on unreliable asynchronous shared memory model which support only atomic read and write operations. For such a model we provide a necessary condition for the solvability of problems in the presence of multiple undetectable crash failures. Also, by using game-theoretical notions, a necessary and sufficient condition is provided, for the solvability of problems in the presence of multiple undetectable initial failures (i.e., processes may fail only prior to the execution). Our results imply that many problems such as consensus, choosing a leader, ranking, matching and sorting are unsolvable in the presence of a single crash failure, and that variants of these problems are solvable in the presence of a single crash failure, and that variants of these problems are solvable in the presence of t−1 crash failures but not in the presence of t crash failures. We show that a shared memory model can simulate various message passing models, and hence our impossibility results hold also for those message passing models. Our results extend and generalize previously known impossibility results for various asynchronous models. Received: October 26, 1990/November 28, 1994 相似文献

5.

Synchronous Byzantine quorum systems 总被引：2，自引：0，他引：2

Rida A. Bazzi 《Distributed Computing》2000,13(1):45-52

Summary. Quorum systems have been used to implement many coordination problems in distributed systems such as mutual exclusion, data replication, distributed consensus, and commit protocols. Malkhi and Reiter recently proposed quorum systems that can tolerate Byzantine failures; they called these systems Byzantine quorum systems and gave some examples of such quorum systems. In this paper, we propose a new definition of Byzantine quorums that is appropriate for synchronous systems. We show how these quorums can be used for data replication and propose a general construction of synchronous Byzantine quorums using standard quorum systems. We prove tight lower bounds on the load of synchronous Byzantine quorums for various patterns of failures and we present synchronous Byzantine quorums that have optimal loads that match the lower bounds for two failure patterns. Received: June 1998 / Accepted: August 1999 相似文献

6.

Global data computation in chordal rings

Xianbing Wang Yong Meng Teo 《Journal of Parallel and Distributed Computing》2009

Existing Global Data Computation (GDC) protocols for asynchronous systems are round-based algorithms designed for fully connected networks. In this paper, we discuss GDC in asynchronous chordal rings, a non-fully connected network. The virtual links approach to solve the consensus problem may be applied to GDC for non-fully connected networks, but it incurs high message overhead. To reduce the overhead, we propose a new non-round-based GDC protocol for asynchronous chordal rings with perfect failure detectors. The main advantage of the protocol is that there is no notion of rounds. Every process creates two messages initially, with one message traversing in a clockwise direction and visiting each and every process in the chordal ring. The second message traverses in a counterclockwise direction. When there is direct connection between two processes, a message is sent directly. Otherwise, the message is sent via virtual links. When the two messages return, the process decides according to the information maintained by the two messages. The perfect failure detector of a process need only detect the crash of neighboring processes, and the crash information is disseminated to all other processes. Analysis and comparison with two virtual links approaches show that our protocol reduces message complexity significantly. 相似文献

7.

An adaptive collect algorithm with applications

Hagit Attiya Arie Fouren Eli Gafni 《Distributed Computing》2002,15(2):87-96

Summary. In a shared-memory distributed system, n independent asynchronous processes communicate by reading and writing to shared variables. An algorithm is adaptive (to total contention) if its step complexity depends only on the actual number, k, of active processes in the execution; this number is unknown in advance and may change in different executions of the algorithm. Adaptive algorithms are inherently wait-free, providing fault-tolerance in the presence of an arbitrary number of crash failures and different processes' speed. A wait-free adaptive collect algorithm with O(k) step complexity is presented, together with its applications in wait-free adaptive alg orithms for atomic snapshots, immediate snapshots and renaming. Received: August 1999 / Accepted: August 2001 相似文献

8.

Anonymous asynchronous systems: the case of failure detectors

François Bonnet Michel Raynal 《Distributed Computing》2013,26(3):141-158

Due to the multiplicity of loci of control, a main issue distributed systems have to cope with lies in the uncertainty on the system state created by the adversaries that are asynchrony, failures, dynamicity, mobility, etc. Considering message-passing systems, this paper considers the uncertainty created by the net effect of asynchrony and process crash failures in systems where the processes are anonymous (i.e., processes have no identity and locally execute the same algorithm). Trivially, agreement problems such as consensus, that cannot be solved in non-anonymous asynchronous systems prone to process failures, cannot be solved either if the system is anonymous. The paper investigates failure detectors that allow processes to circumvent this impossibility. It has several contributions. It first presents four failure detectors (denoted AP, ${\overline{AP}}$ , AΩ, and AΣ) and show that they are the “identity-free” counterparts of perfect failure detectors, eventual leader failure detectors, and quorum failure detectors, respectively. AΣ is new and showing that AΣ and Σ have the same computability power in a non-anonymous system is not trivial. The paper also shows that the notion of failure detector reduction is related to the computation model. Then, the paper presents and proves correct a uniform anonymous consensus algorithm based on the failure detector pair (AΩ, AΣ) (“uniform” means here that not only processes have no identity, but no process is aware of the total number of processes). This new algorithm is not a simple “straightforward extension” of an algorithm designed for non-anonymous systems. To benefit from AΣ, it uses a novel broadcast facility which encapsulates an AΣ-based message exchange pattern that provides the processes with an interesting intersection property on the set of messages they have exchanged. Finally, the paper discusses the notions of failure detector hierarchy, weakest failure detector for anonymous consensus, and the implementation of identity-free failure detectors in anonymous systems. 相似文献

9.

Atomic broadcast in asynchronous crash-recovery distributed systems and its use in quorum-based replication

Rodrigues L. Raynal M. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(5):1206-1217

Atomic broadcast is a fundamental problem of distributed systems: It states that messages must be delivered in the same order to their destination processes. This paper describes a solution to this problem in asynchronous distributed systems in which processes can crash and recover. A consensus-based solution to atomic broadcast problem has been designed by Chandra and Toueg for asynchronous distributed systems where crashed processes do not recover. We extend this approach: it transforms any consensus protocol suited to the crash-recovery model into an atomic broadcast protocol suited to the same model. We show that atomic broadcast can be implemented requiring few additional log operations in excess of those required by the consensus. The paper also discusses how additional log operations can improve the protocol in terms of faster recovery and better throughput. To illustrate the use of the protocol, the paper also describes a solution to the replica management problem in asynchronous distributed systems in which processes can crash and recover. The proposed technique makes a bridge between established results on weighted voting and recent results on the consensus problem. 相似文献

10.

Early consensus in an asynchronous system with a weak failure detector 总被引：2，自引：0，他引：2

André Schiper 《Distributed Computing》1997,10(3):149-157

Summary. Consensus is one of the most fundamental problems in the context of fault-tolerant distributed computing. The problem consists, given a set Ω of processes having each an initial value v _i, in deciding among Ω on a common value v. In 1985, Fischer, Lynch and Paterson proved that the consensus problem is not solvable in an asynchronous system subject to a single process crash. In 1991, Chandra and Toueg showed that, by augmenting the asynchronous system model with a well defined unreliable failure detector, consensus becomes solvable. They also give an algorithm that solves consensus using the ◊? failure detector. In this paper we propose a new consensus algorithm, also using the ◊? failure detector, that is more efficient than the Chandra-Toueg consensus algorithm. We measure efficiency by introducing the notion of latency degree, which defines the minimal number of communication steps needed to solve consensus. The Chandra-Toueg algorithm has a latency degree of 3 (it requires at least three communication steps), whereas our early consensus algorithm requires only two communication steps (latency degree of 2). We believe that this is an interesting result, which adds to our current understanding of the cost of consensus algorithms based on ◊?. Received: April 1995 / Accepted: October 1996 相似文献

11.

Access cost for asynchronous Byzantine quorum systems

Rida A. Bazzi 《Distributed Computing》2001,14(1):41-48

Summary. Quorum systems have been used to implement many coordination problems in distributed systems. In this paper, we study the cost of accessing quorums in asynchronous systems. We formally define the asynchronous access cost of quorum systems and argue that the asynchronous access cost and not the size of a quorum is the right measure of message complexity of protocols using quorums in asynchronous systems. We show that previous quorum systems proposed in the literature have a very high asynchronous access cost. We propose a reformulation of the definition of Byzantine quorum systems that captures the requirement for non-blocking access to quorums in asynchronous systems. We present new Byzantine quorum systems with low asynchronous access cost whose other performance parameters match those of the best Byzantine quorum systems proposed in the literature. In particular, we present a construction for the disjoint failure pattern that outperforms previously proposed systems for that pattern. Received: September 1999 / Accepted: September 2000 相似文献

12.

Asynchronous bounded lifetime failure detectors

Roy Friedman 《Information Processing Letters》2005,94(2):85-91

A failure detector provides processes with a single primitive that, each time it is invoked, returns to the invoking process information related to failures. This research note extends failure detectors by allowing processes to invoke an additional primitive whose effect is to limit the time scope of some properties offered by the failure detector. This simple addition makes possible to weaken the definition of failure detector classes without weakening their power. Two distributed computing problems are used to illustrate the benefit of such an approach, namely the consensus problem and the construction of an atomic register. 相似文献

13.

Fault-tolerance and self-stabilization: impossibility results and solutions using self-stabilizing failure detectors

JOFFROY BEAUQUIER SYNNÖVE KEKKONEN-MONETA 《International journal of systems science》2013,44(11):1177-1187

This paper focuses on protocols that are simultaneously resilient to permanent failures (crash faults) and transient failures (memory and message corruption). First, we show that asynchronous round-based and fault-tolerant protocols cannot be transformed into protocols that are simultaneously fault-tolerant and self-stabilizing (ftss), as is otherwise possible in the synchronous mode of computation. Secondly, we show that it is impossible to find the number of processes (i.e. the size) on a family of networks, as it has been proven for the ring network. Finally, we present a ftss protocol for solving ring size by assuming that each process accesses a failure detector. We also propose two self-stabilizing implementations for the failure detector that differ in their degree of tolerance to transient failures. 相似文献

14.

Computable obstructions to wait-free computability

John Havlicek 《Distributed Computing》2000,13(2):59-83

相似文献

15.

Synchronous atomic broadcast for redundant broadcast channels 总被引：4，自引：3，他引：1

Flaviu Cristian 《Real-Time Systems》1990,2(3):195-212

We propose a synchronous atomic broadcast protocol for distributed real-time systems based on redundant broadcast channels. The protocol can tolerate a finite number f of concurrent processor crash failures, channel adapter performance failures and channel omission failures. Its message cost is optimal: when no failures occur only f+1 messages are sent per broadcast. The cost implications of providing tolerance to other failure classes are also investigated. 相似文献

16.

Byzantine quorum systems 总被引：12，自引：0，他引：12

Dahlia Malkhi Michael Reiter 《Distributed Computing》1998,11(4):203-213

Summary. Quorum systems are well-known tools for ensuring the consistency and availability of replicated data despite the benign failure of data repositories. In this paper we consider the arbitrary (Byzantine) failure of data repositories and present the first study of quorum system requirements and constructions that ensure data availability and consistency despite these failures. We also consider the load associated with our quorum systems, i.e., the minimal access probability of the busiest server. For services subject to arbitrary failures, we demonstrate quorum systems over servers with a load of , thus meeting the lower bound on load for benignly fault-tolerant quorum systems. We explore several variations of our quorum systems and extend our constructions to cope with arbitrary client failures. Received: October 1996 / Accepted June 1998 相似文献

17.

Consensus-based fault-tolerant total order multicast

Fritzke U. Jr Ingels P. Mostefaoui A. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(2):147-156

While total order broadcast (or atomic broadcast) primitives have received a lot of attention, this paper concentrates on total order multicast to multiple groups in the context of asynchronous distributed systems in which processes may suffer crash failures. “Multicast to Multiple Groups” means that each message is sent to a subset of the process groups composing the system, distinct messages possibly having distinct destination groups. “Total Order” means that all message deliveries must be totally ordered. This paper investigates a consensus-based approach to solve this problem and proposes a corresponding protocol to implement this multicast primitive. This protocol is based on two underlying building blocks, namely, uniform reliable multicast and uniform consensus. Its design characteristics lie in the two following properties. The first one is a minimality property, more precisely, only the sender of a message and processes of its destination groups have to participate in the total order multicast of the message. The second property is a locality property: No execution of a consensus has to involve processes belonging to distinct groups (i.e., consensus is executed on a “per group” basis). This locality property is particularly useful when one is interested in using the total order multicast primitive in large-scale distributed systems. In addition to a correctness proof, an improvement that reduces the cost of the protocol is also suggested 相似文献

18.

Early stopping in Global Data Computation

Delporte-Gallet C. Fauconnier H. Helary J.-M. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(9):909-921

The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system where processes can crash, but equipped with a perfect failure detector. This protocol requires that processes execute asynchronous computation rounds. The number of rounds is upper bounded by min(f+2, t+1, n), where n, t, and f represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when t相似文献

19.

On the power of shared object types to implement one-resilient Consensus

Wai-Kau Lo Vassos Hadzilacos 《Distributed Computing》2000,13(4):219-238

Abstract. In this paper we study the ability of shared object types to implement Consensus in asynchronous shared-memory systems where at most one process may crash. More specifically, we consider the following question: Let and be a set of object types that can be used to solve one-resilient Consensus among n processes. Can always be used to solve one-resilient Consensus among n - 1 processes? We prove that for n = 3 the answer is negative, even if consists only ofdeterministic types. (This strengthens an earlier result by the first author proving the same fact for nondeterministic types.) We also prove that, in contrast, for the answer to the above question is affirmative. Received: July 1997 / Accepted: May 2000 相似文献

20.

Implementing the Omega failure detector in the crash-recovery failure model

Cristian Martín Mikel Larrea Ernesto Jiménez 《Journal of Computer and System Sciences》2009,75(3):178-189

Unreliable failure detectors are mechanisms providing information about process failures, that allow to solve several problems in asynchronous systems, e.g., Consensus. A particular failure detector, Omega, provides an eventual leader election functionality. This paper addresses the implementation of Omega in the crash-recovery failure model. We first propose an algorithm assuming that processes are reachable from the correct process that crashes and recovers a minimum number of times. Then, we propose two algorithms which assume only that processes are reachable from some correct process. Besides this, one of the algorithms requires the membership to be known a priori, while the other two do not. 相似文献