
1.

Non-blocking atomic commit in asynchronous distributed systems with failure detectors

Rachid Guerraoui 《Distributed Computing》2002,15(1):17-25

This paper addresses the Non-Blocking Atomic Commit (NB-AC) problem in asynchronous distributed systems augmented with failure detectors. We first show that, in these systems, NB-AC and Consensus are incomparable. Roughly speaking, there is a failure detector that solves NB-AC but not Consensus and a failure detector that solves Consensus but not NB-AC. Then we introduce the Anonymously Perfect failure detector . We show that, to solve NB-AC, is necessary (while is not), whereas is sufficient when a majority of the processes are correct. We draw from our results some observations on the practical solvability of NB-AC. Received: August 2000 / Accepted: May 2001

2.

Computing global functions in asynchronous distributed systems with perfect failure detectors

Helary J.-M. Hurfin M. Mostefaoui A. Raynal M. Tronel F. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(9):897-909

A Global Data is a vector with one entry per process. Each entry must be filled with an appropriate value provided by the corresponding process. Several distributed computing problems amount to compute a function on a global data. This paper proposes a protocol to solve such problems in the context of asynchronous distributed systems where processes may fail by crashing. The main problem that has to be solved lies in computing the global data and in providing each noncrashed process with a copy of it, despite the possible crash of some processes. To be consistent, the global data must contain, at least, all the values provided by the processes that do not crash. This defines the Global Data Computation (GDC) problem. To solve this problem, processes execute a sequence of asynchronous rounds during which they construct, in a decentralized way, the value of the global data and eventually each process gets a copy of it. To cope with process crashes, the protocol uses a perfect failure detector. The proposed protocol has been designed to be time efficient: it allows early decision. Let t be the maximum number of processes that may crash, t

3.

Quorum-based mutual exclusion in asynchronous distributed systems with unreliable failure detectors

Sung-Hoon Park Seon-Hyong Lee 《The Journal of supercomputing》2014,67(2):469-484

This paper considers the fault-tolerant quorum-based mutual exclusion problem in a message-passing asynchronous system and determines a failure detector to solve the problem. This failure detector, which we call the modal failure detector star, and which we denote by M*, is strictly weaker than the perfect failure detector P but strictly stronger than the eventually perfect failure detector ◇P. The paper shows that in any environment, the problem is solvable with M*. In addition, we analyze the performance of our algorithm in terms of the number of messages and synchronization delay.

4.

Algorithms for distributed termination detection

Friedemann Mattern 《Distributed Computing》1987,2(3):161-175

The termination problem for distributed computations is analyzed in the general context of asynchronous communication. In the underlying computational model it is assumed that messages take an arbitrary but finite time and do not necessarily obey the FIFO rule. Time diagrams are used as a graphic means of representing the overall communication scheme, giving a clear insight into the difficulties involved (e.g., lack of global state or time, inconsistent time cuts) and suggesting possible solutions. Several efficient algorithms for the solution of the termination problem are presented. They are all based on the idea of message counting but have a number of different characteristics. The methods are discussed and compared with other known solutions.

Friedemann Mattern received the Diploma in computer science from the University of Bonn, West Germany, in 1983. He is now a research scientist in the Department of Computer Science at the University of Kaiserslautern and is currently completing his Ph.D. His primary research interests include distributed algorithms, programming language design, and compiler construction. The author can be reached by electronic mail via mattern @ incas.uucp or mattern % uklirb.uucp @ Germany.csne.

This work has been supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the SFB124 research project VLSI-Design and Parallelism.
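The message-counting idea mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: the names `Counters`, `wave_totals`, and `terminated` are hypothetical, and the sketch assumes that each process keeps monotone counters of basic messages sent and received, and that a second control wave starts only after the first has completed. Because the counters are read at different moments (an inconsistent time cut), a single wave with equal totals proves nothing; requiring two consecutive waves to report the same, balanced totals is one way to rule out messages that were still in transit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Counters:
    sent: int      # basic messages this process has sent so far
    received: int  # basic messages this process has received so far


def wave_totals(snapshot: List[Counters]) -> Tuple[int, int]:
    """Sum the per-process counters collected by one control wave."""
    total_sent = sum(c.sent for c in snapshot)
    total_received = sum(c.received for c in snapshot)
    return total_sent, total_received


def terminated(first_wave: List[Counters], second_wave: List[Counters]) -> bool:
    """Declare termination only if both waves report identical, balanced totals.

    Assumption (not from the abstract): the second wave was started after
    the first one finished, and counters never decrease.
    """
    s1, r1 = wave_totals(first_wave)
    s2, r2 = wave_totals(second_wave)
    return s1 == r1 == s2 == r2


if __name__ == "__main__":
    quiet = [Counters(sent=2, received=1), Counters(sent=1, received=2)]
    print(terminated(quiet, quiet))  # both waves balanced and unchanged
```

If any process sends or receives a message between the two waves, the totals differ and the check is simply repeated with a later pair of waves.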

5.

Mutual exclusion in asynchronous systems with failure detectors

《Journal of Parallel and Distributed Computing》2005,65(4):492-505

6.

The Doomsday distributed termination detection protocol

M. J. Livesey R. Morrison D. S. Munro 《Distributed Computing》2007,19(5-6):419-431

Distributed termination detection (DTD) algorithms are important since they detect globally stable states in distributed computations. Here we introduce a new DTD mechanism, the Doomsday protocol together with its proof of correctness. Doomsday is generic since it forms the basis for a number of new and existing DTD algorithms for which the correctness proof may be reused. The paper describes the Doomsday protocol, provides its formal proof, derives one new DTD algorithm and shows how other hitherto unrelated algorithms, Dijkstra–Scholten, Task Balancing and Credit Recovery, can be derived from the protocol. The paper concludes by examining various properties of the protocol in the context of existing DTD algorithms. This work was supported in part by Visiting Fellowship grant EPSRC GR/R84481/01 “The Doomsday Protocol” and by Australian Research Council ARC Linkage International Grant LX0349049 “Extending a Family of Garbage Collectors”.

7.

An efficient delay-optimal distributed termination detection algorithm

Nihar R. Mahapatra Shantanu Dutt 《Journal of Parallel and Distributed Computing》2007

Distributed termination detection is a fundamental problem in parallel and distributed computing and numerous schemes with different performance characteristics have been proposed. These schemes, while being efficient with regard to one performance metric, prove to be inefficient in terms of other metrics. A significant drawback shared by all previous methods is that, on most popular topologies, they take Ω(P) time to detect and signal termination after its actual occurrence, where P is the total number of processing elements. Detection delay is arguably the most important metric to optimize, since it is directly related to the amount of idling of computing resources and to the delay in the utilization of results of the underlying computation. In this paper, we present a novel termination detection algorithm that is simultaneously optimal or near-optimal with respect to all relevant performance measures on any topology. In particular, our algorithm has a best-case detection delay of Θ(1) and a finite optimal worst-case detection delay on any topology equal in order terms to the time for an optimal one-to-all broadcast on that topology (which we accurately characterize for an arbitrary topology). On k-ary n-cube tori and meshes, the worst-case delay is Θ(D), where D is the diameter of the target topology. Further, our algorithm has message and computational complexities of Θ(MD + P) in the worst case and, for most applications, Θ(M + P) in the average case, the same as other message-efficient algorithms, and an optimal space complexity of Θ(P), where M is the total number of messages used by the underlying computation. We also give a scheme using counters that greatly reduces the constant associated with the average message and computational complexities, but does not suffer from the counter-overflow problems of other schemes. Finally, unlike some previous schemes, our algorithm does not rely on first-in first-out (FIFO) ordering for message communication to work correctly.

8.

Distributed termination detection algorithm for distributed computations

《Information Processing Letters》1986,22(6):311-314

A fully distributed and symmetric algorithm for solving the distributed termination problem is presented along with its correctness arguments. The algorithm does not make use of time-stamps and clock-synchronization and is very simple.

9.

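A dynamic failure detection service for distributed systems based on grey prediction

田东 毛太平 吴长泽 《计算机工程与设计》 (Computer Engineering and Design) 2007,28(24):5915-5918

Existing failure detection services do not effectively meet the needs of distributed systems, so a dynamic failure detection service suited to distributed systems is designed. Based on the characteristics of distributed systems and on a definition of the distributed system model, a dynamic failure detection service architecture is proposed. By combining a heartbeat strategy with the grey prediction method, a dynamic heartbeat mechanism is designed, together with a prediction model and a dynamic prediction strategy, and a failure detection algorithm for distributed systems based on this dynamic heartbeat mechanism is proposed. Finally, simulation experiments verify the correctness and effectiveness of the algorithm.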
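To make the combination of heartbeats and grey prediction more concrete, here is a minimal sketch that uses the standard GM(1,1) grey model to forecast the next heartbeat inter-arrival time from recent observations. The abstract does not give the paper's actual prediction model or strategy, so everything here is an assumption for illustration: the function names `gm11_predict_next` and `next_timeout`, the fixed safety margin, and the requirement of at least four positive samples are hypothetical choices.

```python
import math
from typing import Sequence


def gm11_predict_next(intervals: Sequence[float]) -> float:
    """Forecast the next value of a positive series with a GM(1,1) grey model."""
    n = len(intervals)
    if n < 4:
        raise ValueError("this sketch expects at least four samples")

    # Accumulated generating series x1[k] = intervals[0] + ... + intervals[k].
    x1 = []
    running = 0.0
    for value in intervals:
        running += value
        x1.append(running)

    # Background values z[k] and the least-squares fit of x0[k] = -a*z[k] + b.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(intervals[1:])
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    if abs(a) < 1e-12:
        return b  # no trend detected: the model degenerates to a constant

    c = intervals[0] - b / a

    def x1_hat(k: int) -> float:
        # Time-response function of the fitted model (k is zero-based).
        return c * math.exp(-a * k) + b / a

    # The forecast of the original series is the difference of accumulated values.
    return x1_hat(n) - x1_hat(n - 1)


def next_timeout(intervals: Sequence[float], margin: float) -> float:
    """Hypothetical timeout rule: predicted inter-arrival time plus a safety margin."""
    return gm11_predict_next(intervals) + margin


if __name__ == "__main__":
    observed_ms = [100.0, 110.0, 121.0, 133.1]
    print(next_timeout(observed_ms, margin=20.0))
```

A monitor would suspect a process only when no heartbeat arrives within `next_timeout(...)` of the previous one, so the timeout follows trends in network delay instead of staying fixed.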

10.

On the performance of distributed Neyman-Pearson detection systems

Ming Xiang Junwei Zhao 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2001,31(1):78-83

The performance of a distributed Neyman-Pearson detection system is considered. We assume that the decision rules of the sensors are given and that decisions from different sensors are mutually independent conditioned on both hypotheses. The purpose of decision fusion is to improve the performance of the overall system, and we ask under what conditions better performance can be achieved at the fusion center, and under what conditions it cannot. We assume that the probabilities of detection and false alarm of the sensors can be different. By comparing the probability of detection at the fusion center with that of each of the sensors, with the probability of false alarm at the fusion center constrained to equal that of the sensor, we give conditions for better performance to be achieved at the fusion center.
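As a rough illustration of how fusion-center performance follows from the sensors' probabilities under conditional independence, the sketch below evaluates a simple counting rule: the fusion center decides "target present" when at least k sensors do. This fixed k-out-of-n rule and the function name `k_out_of_n` are assumptions made for the example. The paper instead constrains the fusion false-alarm probability to equal a sensor's, which this sketch does not attempt.

```python
from itertools import product
from typing import Sequence


def k_out_of_n(probs: Sequence[float], k: int) -> float:
    """Probability that at least k independent sensors decide 1.

    probs[i] is the probability that sensor i decides 1 under the hypothesis
    of interest: pass detection probabilities to get the fused detection
    probability, or false-alarm probabilities to get the fused false-alarm
    probability.
    """
    total = 0.0
    for outcome in product((0, 1), repeat=len(probs)):
        if sum(outcome) >= k:
            p = 1.0
            for bit, q in zip(outcome, probs):
                p *= q if bit else (1.0 - q)
            total += p
    return total


if __name__ == "__main__":
    pd = [0.9, 0.8]    # illustrative sensor detection probabilities
    pf = [0.05, 0.10]  # illustrative sensor false-alarm probabilities
    for k, name in ((1, "OR"), (2, "AND")):
        print(name, k_out_of_n(pd, k), k_out_of_n(pf, k))
```

With these illustrative numbers the OR rule raises both the detection and the false-alarm probability and the AND rule lowers both, which is why comparing fused and single-sensor performance is only meaningful at a matched false-alarm level.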

11.

Towards the construction of distributed detection programs, with an application to distributed termination

Jean-Michel Hélary Michel Raynal 《Distributed Computing》1994,7(3):137-147

Summary. Methodological design of distributed programs is necessary if one is to master the complexity of parallelism. The class of control programs, whose purpose is to observe or detect properties of an underlying program, plays an important role in distributed computing. The detection of a property generally rests upon consistent evaluations of a predicate; such a predicate can be global, i.e. involve states of several processes and channels of the observed program. Unfortunately, in a distributed system, the consistency of an evaluation cannot be trivially obtained. This is a central problem in distributed evaluations. This paper addresses the problem of distributed evaluation, used as a basic tool for solution of general distributed detection problems. A new evaluation paradigm is put forward, and a general distributed detection program is designed, introducing the iterative scheme of guarded waves sequence. The case of distributed termination detection is then taken to illustrate the proposed methodological design.

Jean-Michel Hélary is currently professor of Computer Science at the University of Rennes, France. He received a first Ph.D. degree in Numerical Analysis in 1968, then another Ph.D. degree in Computer Science in 1988. His research interests include distributed algorithms and protocols, especially the methodological aspects. He is a member of an INRIA research group working at IRISA (Rennes) on distributed algorithms and applications. Professor Jean-Michel Hélary has published several papers on these subjects, and is co-author of a book with Michel Raynal. He serves as a PC member in an international conference.

Michel Raynal is currently professor of Computer Science at the University of Rennes, France. He received the Ph.D. degree in 1981. His research interests include distributed algorithms, operating systems, protocols and parallelism. He is the head of an INRIA research group working at IRISA (Rennes) on distributed algorithms and applications. Professor Michel Raynal has organized several international conferences and has served as a PC member in many international workshops, conferences and symposia. Over the past 9 years, he has written 7 books that constitute an introduction to distributed algorithms and distributed systems (among them: Algorithms for Mutual Exclusion, the MIT Press, 1986, and Synchronization and Control of Distributed Programs, Wiley, 1990, co-authored with J.M. Hélary). He is currently involved in two European Esprit projects devoted to large scale distributed systems.

This work was supported by French Research Program C³ on Parallelism and Distributed Computing. An extended abstract has been presented to ISTCS '92 [12].

12.

Deadlock detection in distributed systems

Singhal M. 《Computer》1989,22(11):37-48

The author describes a series of deadlock detection techniques based on centralized, hierarchical, and distributed control organizations. The point of view is that of practical implications. An up-to-date and comprehensive survey of deadlock detection algorithms is presented, their merits and drawbacks are discussed, and their performances (delays as well as message complexity) are compared. Related issues that require further research, such as correctness of the algorithms, performance of the algorithms, and deadlock resolution, are examined.

13.

Distributed termination detection for dynamic systems

D.M. Dhamdhere Sridhar R. Iyer E. Kishore Kumar Reddy 《Parallel Computing》1997,22(14):2025-2045

A symmetric algorithm for detecting the termination of a distributed computation is presented. The algorithm does not require global information concerning the system and does not assume any communication features, barring finite delays in the delivery of messages. It permits dynamic creation and destruction of processes participating in the computation, and also permits destruction of a process by external processes, such as the OS kernel. It also provides for external processes spontaneously joining an ongoing computation. Proofs of safety and liveness are provided.

14.

Inheritance on processes, exemplified on distributed termination detection

Kristine Stougaard Thomsen 《International journal of parallel programming》1987,16(1):17-52

15.

A knowledge-theoretic analysis of uniform distributed coordination and failure detectors

Joseph Y. Halpern Aleta Ricciardi 《Distributed Computing》2005,17(3):223-236

It is shown that, in a precise sense, if there is no bound on the number of faulty processes in a system with unreliable but fair communication, Uniform Distributed Coordination (UDC) can be attained if and only if a system has perfect failure detectors. This result is generalized to the case where there is a bound t on the number of faulty processes. It is shown that a certain type of generalized failure detector is necessary and sufficient for achieving UDC in a context with at most t faulty processes. Reasoning about processes' knowledge as to which other processes are faulty plays a key role in the analysis.

Received: 15 June 2000, Accepted: 15 April 2004, Published online: 26 July 2004

A preliminary version of this paper appeared in the 18th ACM Symposium on Principles of Distributed Computing, 1999, pp. 73-82. Work supported in part by NSF under grants IRI-96-25901 and CTC-0208535, by ONR under grant N00014-02-1-0455, by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the ONR under grants N00014-97-0505 and N00014-01-1-0795, and by a Guggenheim and a Fulbright Fellowship. Sabbatical support from CWI and the Hebrew University of Jerusalem is also gratefully acknowledged.

16.

Derivation of a termination detection algorithm for distributed computations

Edsger W. Dijkstra W.H.J. Feijen A.J.M. van Gasteren 《Information Processing Letters》1983,16(5):217-219

17.

On adaptive detectors for two-input systems

Kanefsky M. Thomas J. 《Automatic Control, IEEE Transactions on》1965,10(4):427-433

18.

Termination detection for dynamically distributed systems with non-first-in-first-out communication

Ten-Hwang Lai 《Journal of Parallel and Distributed Computing》1986,3(4)

We propose a new algorithm for detecting termination of distributed systems. The algorithm works correctly whether the system is static or dynamic, whether the interprocess communication is synchronous or asynchronous, and whether the communication channels are first-in-first-out or not. The algorithm requires, in the worst case, O(nm) control messages in all, where n is the number of processes in the system and m is the total number of messages transmitted during the operation of the system. After the system terminates, the algorithm is able to detect the termination using only O(n) control messages; it is optimal if the system concerned is static.

19.

The derivation of graph marking algorithms from distributed termination detection protocols

《Science of Computer Programming》1988,10(2):107-137

We show that on-the-fly garbage collection algorithms can be obtained by transforming distributed termination detection protocols. Virtually all known on-the-fly garbage collecting algorithms are obtained by applying the transformation. The approach leads to a novel and insightful derivation of, e.g., the concurrent garbage collection algorithms of Dijkstra et al. and of Hudak and Keller. The approach also leads to several new, highly parallel algorithms for concurrent garbage collection. We also analyze a garbage collecting system due to Hughes from our current perspective.

20.

Anonymous asynchronous systems: the case of failure detectors

François Bonnet Michel Raynal 《Distributed Computing》2013,26(3):141-158

Due to the multiplicity of loci of control, a main issue distributed systems have to cope with lies in the uncertainty on the system state created by the adversaries that are asynchrony, failures, dynamicity, mobility, etc. Considering message-passing systems, this paper considers the uncertainty created by the net effect of asynchrony and process crash failures in systems where the processes are anonymous (i.e., processes have no identity and locally execute the same algorithm). Trivially, agreement problems such as consensus, that cannot be solved in non-anonymous asynchronous systems prone to process failures, cannot be solved either if the system is anonymous. The paper investigates failure detectors that allow processes to circumvent this impossibility. It has several contributions. It first presents four failure detectors (denoted AP, $\overline{AP}$, AΩ, and AΣ) and shows that they are the “identity-free” counterparts of perfect failure detectors, eventual leader failure detectors, and quorum failure detectors, respectively. AΣ is new and showing that AΣ and Σ have the same computability power in a non-anonymous system is not trivial. The paper also shows that the notion of failure detector reduction is related to the computation model. Then, the paper presents and proves correct a uniform anonymous consensus algorithm based on the failure detector pair (AΩ, AΣ) (“uniform” means here that not only processes have no identity, but no process is aware of the total number of processes). This new algorithm is not a simple “straightforward extension” of an algorithm designed for non-anonymous systems. To benefit from AΣ, it uses a novel broadcast facility which encapsulates an AΣ-based message exchange pattern that provides the processes with an interesting intersection property on the set of messages they have exchanged. Finally, the paper discusses the notions of failure detector hierarchy, weakest failure detector for anonymous consensus, and the implementation of identity-free failure detectors in anonymous systems.