首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 578 毫秒
1.
Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information. We present an abstraction, denoted U{\Upsilon} , that provides very little information about failures. In every run of the distributed system, U{\Upsilon} eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, U{\Upsilon} still captures non-trivial failure information. We show that U{\Upsilon} is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving n + 1-process consensus using n-process consensus. We show that U{\Upsilon} is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as U{\Upsilon} does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction Uf{\Upsilon^f} that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.  相似文献   

2.
The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek for a failure detector that is sufficient and necessary to “boost” the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100–109, 1995) that a certain failure detector, denoted Ω n , is sufficient to boost the power of a type T from n to k, and it was conjectured that Ω n was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ω n is necessary to boost the power of T from n to n + 1. Then we go a step further and show that Ω n is also the weakest to boost the power of (n + 1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra et al. in J ACM 43(4):685–722, 1996). As a corollary, we show that Ω t is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power t. This paper is a revised and extended version of a paper that appeared in the Proceedings of the 17th International Symposium on Distributed Computing (DISC 2003), entitled “On failure detectors and type boosters.”  相似文献   

3.
The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (?S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (?P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever.  相似文献   

4.
Determining the “weakest” failure detectors is a central topic in solving many agreement problems such as Consensus, Non-Blocking Atomic Commit and Election in asynchronous distributed systems. So far, this has been studied extensively for several such fundamental problems. It is stated that Perfect Failure Detector P is the weakest failure detector to solve the Election problem with any number of faulty processes. In this paper, we introduce Modal failure detector M and show that to solve Election, M is the weakest failure detector to solve election when the number of faulty processes is less than ⌈n/2⌉. We also show that it is strictly weaker than P.
Sung Hoon ParkEmail:
  相似文献   

5.
This paper considers the fault-tolerant quorum-based mutual exclusion problem in a message-passing asynchronous system and determines a failure detector to solve the problem. This failure detector, which we call the modal failure detector star, and which we denote by M ?, is strictly weaker than the perfect failure detector P but strictly stronger than the eventually perfect failure detector ?P. The paper shows that at any environment, the problem is solvable with M ?. In addition, we make an analysis of our algorithm performance in terms of the number of messages and synchronization delay.  相似文献   

6.
Failure detectors have been shown to be a very useful mechanism to solve the consensus problem in the crash failure model, for which a number of communication-efficient algorithms have been proposed. In this paper we deal with the definition, implementation and use of communication-efficient failure detectors in the general omission failure model, where processes can fail by crashing and by omitting messages when sending and/or receiving. We first define a new failure detector class for this model in terms of completeness and accuracy properties. Then we propose an algorithm that implements a failure detector of the proposed class in a communication-efficient way, in the sense that only a linear number of links are used to send messages forever. We also explain how the well-known consensus algorithm of Chandra and Toueg can be adapted in order to use the proposed failure detector.  相似文献   

7.
has been shown to be the weakest realistic failure detector class needed to solve the consensus problem in an asynchronous distributed system prone to f<n process crashes in which communication is by message-passing. However, the only protocol that is known to meet this bound is based on three layers of protocol construction, and is therefore not efficient. This paper presents a surprisingly simple and very efficient direct message-passing implementation of a -based consensus protocol, and proves its correctness.  相似文献   

8.
Existing consensus protocols suffer from slowdowns caused by the failures of processes and the mistakes made by the underlying oracles. In this paper, we propose two novel techniques to circumvent such slowdowns in failure-detector-based consensus protocols. The first technique guarantees the Round-Zero-Degradation (RZD) property (an extension of the Zero-Degradation property) in order to avoid the slowdown caused by a failed coordinator process. The second technique, named “Look-Ahead”, helps speed up the execution of the consensus protocol by making use of the messages delivered before their receivers enter the corresponding phases or rounds. The first technique is effective only when the underlying failure detector makes no or few mistakes, while the second technique always works well regardless of the performance of the failure detector. Moreover, Look-Ahead is a general technique and can be applied to consensus protocols based on any kind of oracle. By applying the two proposed techniques, several consensus protocols are developed. The simulation results show that the RZD technique is effective even if the error rate of the failure detector reaches about 15%, while the Look-Ahead technique can always improve the performance in all cases.  相似文献   

9.
It is conjectured that the only way a failure detector (FD) can help solving n-process tasks is by providing k-set consensus for some ${k\in\{1,\ldots,n\}}$ among all the processes. It was recently shown by Zieli??ski that any FD that allows for solving a given n-process task that is unsolvable read-write wait-free, also solves (n ? 1)-set consensus. In this paper, we provide a generalization of Zieli??ski??s result. We show that any FD that solves a colorless task that cannot be solved read-write k-resiliently, also solves k-set consensus. More generally, we show that every colorless task ${\mathcal{T}}$ can be characterized by its set consensus number: the largest ${k\in\{1,\ldots,n\}}$ such that ${\mathcal{T}}$ is solvable (k ? 1)-resiliently. A task ${\mathcal{T}}$ with set consensus number k is, in the failure detector sense, equivalent to k-set consensus, i.e., a FD solves ${\mathcal{T}}$ if and only if it solves k-set consensus. As a corollary, we determine the weakest FD for solving k-set consensus in every environment, i.e., for all assumptions on when and where failures might occur.  相似文献   

10.
Early consensus in an asynchronous system with a weak failure detector   总被引:2,自引:0,他引:2  
Summary.  Consensus is one of the most fundamental problems in the context of fault-tolerant distributed computing. The problem consists, given a set Ω of processes having each an initial value v i , in deciding among Ω on a common value v. In 1985, Fischer, Lynch and Paterson proved that the consensus problem is not solvable in an asynchronous system subject to a single process crash. In 1991, Chandra and Toueg showed that, by augmenting the asynchronous system model with a well defined unreliable failure detector, consensus becomes solvable. They also give an algorithm that solves consensus using the ◊? failure detector. In this paper we propose a new consensus algorithm, also using the ◊? failure detector, that is more efficient than the Chandra-Toueg consensus algorithm. We measure efficiency by introducing the notion of latency degree, which defines the minimal number of communication steps needed to solve consensus. The Chandra-Toueg algorithm has a latency degree of 3 (it requires at least three communication steps), whereas our early consensus algorithm requires only two communication steps (latency degree of 2). We believe that this is an interesting result, which adds to our current understanding of the cost of consensus algorithms based on ◊?. Received: April 1995 / Accepted: October 1996  相似文献   

11.
Failure detection and consensus in the crash-recovery model   总被引:2,自引:0,他引:2  
Summary. We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice – those with no failures or failure detector mistakes. In such runs, consensus is achieved within time and with 4 n messages, where is the maximum message delay and n is the number of processes in the system. Received: May 1998 / Accepted: November 1999  相似文献   

12.
This paper presents a simple proof that shows that the quorum failure detector class (denoted Σ) is the weakest failure detector class required to implement an atomic read/write register in an asynchronous message-passing system prone to an arbitrary number of process crashes. This proof is based on a new reduction algorithm in which all the variables are bounded.  相似文献   

13.
Due to the multiplicity of loci of control, a main issue distributed systems have to cope with lies in the uncertainty on the system state created by the adversaries that are asynchrony, failures, dynamicity, mobility, etc. Considering message-passing systems, this paper considers the uncertainty created by the net effect of asynchrony and process crash failures in systems where the processes are anonymous (i.e., processes have no identity and locally execute the same algorithm). Trivially, agreement problems such as consensus, that cannot be solved in non-anonymous asynchronous systems prone to process failures, cannot be solved either if the system is anonymous. The paper investigates failure detectors that allow processes to circumvent this impossibility. It has several contributions. It first presents four failure detectors (denoted AP, ${\overline{AP}}$ , , and ) and show that they are the “identity-free” counterparts of perfect failure detectors, eventual leader failure detectors, and quorum failure detectors, respectively. is new and showing that and Σ have the same computability power in a non-anonymous system is not trivial. The paper also shows that the notion of failure detector reduction is related to the computation model. Then, the paper presents and proves correct a uniform anonymous consensus algorithm based on the failure detector pair (, ) (“uniform” means here that not only processes have no identity, but no process is aware of the total number of processes). This new algorithm is not a simple “straightforward extension” of an algorithm designed for non-anonymous systems. To benefit from , it uses a novel broadcast facility which encapsulates an -based message exchange pattern that provides the processes with an interesting intersection property on the set of messages they have exchanged. Finally, the paper discusses the notions of failure detector hierarchy, weakest failure detector for anonymous consensus, and the implementation of identity-free failure detectors in anonymous systems.  相似文献   

14.
Summary. The Consensus problem is a fundamental paradigm for fault-tolerant asynchronous systems. It abstracts a family of problems known as Agreement (or Coordination) problems. Any solution to consensus can serve as a basic building block for solving such problems (e.g., atomic commitment or atomic broadcast). Solving consensus in an asynchronous system is not a trivial task: it has been proven (1985) by Fischer, Lynch and Paterson that there is no deterministic solution in asynchronous systems which are subject to even a single crash failure. To circumvent this impossibility result, Chandra and Toueg have introduced the concept of unreliable failure detectors (1991), and have studied how these failure detectors can be used to solve consensus in asynchronous systems with crash failures. This paper presents a new consensus protocol that uses a failure detector of the class . Like previous protocols, it is based on the rotating coordinator paradigm and proceeds in asynchronous rounds. Simplicity and efficiency are the main characteristics of this protocol. From a performance point of view, the protocol is particularly efficient when, whether failures occur or not, the underlying failure detector makes no mistake (a common case in practice). From a design point of view, the protocol is based on the combination of three simple mechanisms: a voting mechanism, a small finite state automaton which manages the behavior of each process, and the possibility for a process to change its mind during a round. Received: August 1997 / Accepted: March 1999  相似文献   

15.
Solving agreement problems deterministically, such as consensus and k-set agreement, in asynchronous distributed systems prone to an unbounded number of process failures has been shown to be impossible. To circumvent this impossibility, unreliable failure detectors for the crash failure model have been widely studied. These are oracles that provide information on failures. The exact nature of such information is defined by a set of abstract properties that a particular class of failure detectors satisfy. The weakest class of such failure detectors that allow to solve consensus is Ω. This paper considers failure detector classes from the literature that solve k-set agreement in the crash failure model, and studies their relative power. It shows that the family of failure detector classes (1 ≤ xn), and (0 ≤ y ≤ n), can be “added” to provide a failure detector of the class Ω z (1 ≤ z ≤ n, a generalization of Ω). It also characterizes the power of such an “addition”, namely, , can construct Ω z iff y + z > t, and can construct Ω z iff x + z > t + 1, where t is the maximum number of processes that can crash in a run. As an example, the paper shows that, while allows solving 2-set agreement (but not consensus) and allows solving t-set agreement (but not (t − 1)-set agreement), a system with failure detectors of both classes can solve consensus for any value of t. More generally, the paper studies the failure detector classes , and Ω z , and shows which reductions among these classes are possible and which are not. The paper also presents a message-passing Ω k -based k-set agreement protocol and shows that Ω k is not enough to solve (k − 1)-set agreement. In that sense, it can be seen as a step toward the characterization of the weakest failure detector class that allows solving the k-set agreement problem. An extended abstract of this paper has appeared in the proceedings of PODC 2006 [20]. This work has been supported partially by a grant from LAFMI (Franco-Mexican Lab in Computer Science), the European Network of Excellence ReSIST and PAPIIT-UNAM.  相似文献   

16.
This paper presents a family of agreement problems called Managed Agreement, which is parameterized by the number of aristocrat nodes in the system; NBAC is a special case of this family when all nodes are aristocrats while Consensus is a special case of this family when there are no aristocrats. The paper also presents a parameterized family of failure detectors F(A) such that F(A) is the weakest failure detector class that enables solving Managed Agreement with a set A of aristocrats in an asynchronous environment.  相似文献   

17.
Cover2     
Unreliable failure detectors are abstract devices that, when added to asynchronous distributed systems, enable solving distributed computing problems (e.g., consensus) that otherwise would be impossible to solve in these systems. This paper focuses on two classes of failure detectors defined by Chandra and Toueg, namely, the classes denoted diamP (eventually perfect) and diamS (eventually strong). Both classes include failure detectors that eventually detect permanently all process crashes, but while the failure detectors of diamP eventually make no erroneous suspicions, the failure detectors of diamS are only required to eventually not suspect a single correct process. Informally, in a one-shot agreement problem, a new problem instance is created each time the processes propose new values to be decided on (e.g., consensus is one-shot). In such a context, this paper addresses the following question related to the comparative power of these classes, namely: "Are there one-shot agreement problems that can be solved in asynchronous distributed systems with reliable links but prone to process crash failures augmented with op, but cannot be solved when those systems are augmented with diamS?" Surprisingly, the paper shows that the answer to this question is "no." An important consequence of this result is that diamP cannot be the weakest class of failure detectors that enables solving one-shot agreement problems in unreliable asynchronous distributed systems  相似文献   

18.
在有故障发生的情况下,使用不可靠的故障检测器无法解决无阻塞原子提交问题。本文减弱无阻塞原子提交问题的非平凡性条件,得到一个较弱的问题,再用一个扩展的心跳故障检测器在包含进程故障和链路故障的异步消息传递系统中解决弱化后的问题。  相似文献   

19.
We address the question of the weakest failure detector to circumvent the impossibility of $(2n-2)$ -renaming in a system of up to $n$ participating processes. We derive that in a restricted class of eventual failure detectors there does not exist a single weakest oracle, but a weakest family of oracles $\zeta _n$ : every two oracles in $\zeta _n$ are incomparable, and every oracle that allows for solving renaming provides at least as much information about failures as one of the oracles in $\zeta _n$ . As a by product, we obtain one more evidence that renaming is strictly easier to solve than set agreement.  相似文献   

20.
A failure detector provides processes with a single primitive that, each time it is invoked, returns to the invoking process information related to failures. This research note extends failure detectors by allowing processes to invoke an additional primitive whose effect is to limit the time scope of some properties offered by the failure detector. This simple addition makes possible to weaken the definition of failure detector classes without weakening their power. Two distributed computing problems are used to illustrate the benefit of such an approach, namely the consensus problem and the construction of an atomic register.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号