首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 125 毫秒
1.
Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays. Flaviu Cristian is a computer scientist at the IBM Almaden Research Center in San Jose, California. He received his PhD from the University of Grenoble, France, in 1979. After carrying out research in operating systems and programming methodology in France and working on the specification, design, and verification of fault-tolerant software in England, he joined IBM in 1982. Since then he has worked in the area of fault-tolerant distributed systems and protocols. He has participated in the design and implementation of a highly available distributed system prototype at the Almaden Research Center, has reviewed and consulted for several fault-tolerant distributed system designs, both in Europe and the American divisions of IBM, and is now a technical leader in the design of a new Air Traffic Control System for the US which must satisfy very stringent availability requirements.  相似文献   

2.
Modelling knowledge and action in distributed systems   总被引:1,自引:0,他引:1  
We present a formal model that captures the subtle interaction between knowledge and action in distributed systems. We view a distributed system as a set ofruns, where a run is a function from time toglobal states and a global state is a tuple consisting of anenvironment state and alocal state for earch process in the system. This model is a generalization of those used in many previous papers.Actions in this model are associated with functions from global states to global states. Aprotocol is a function from local states to actions. We extend the standard notion of a protocol by definingknowledge-based protocols, ones in which a process' actions may depend explicitly on its knowledge. Knowledge-based protocols provide a natural way of describing how actions should take place in a distributed system. Finally, we show how the notion of one protocolimplementing another can be captured in our model. Joseph Y. Halpern received a B.Sc. in mathematics from the University of Toronto in 1975, and a Ph.D. in mathematics from Harvard University in 1981. In between, he spent two years as the head of the Mathematics Department at Bawku Secondary School, in Ghana. After a year as a visiting scientist at MIT, he joined IBM in 1982. He is currently the manager of the Mathematics and Related Computer Science Department at the IBM Almaden Research Center, and a consulting professor in the Computer Science Department at Stanford. His major research interests are reasoning about knowledge, distributed computation, and logics of programs. He was program chairman and organizer of the first conference of Theoretical Aspects of Reasoning About Knowledge, program chairman of the Fifth ACM Symposium on Principles of Distributed Computing, and was the co-recipient (with Ronald Fagin) of the MIT Publisher's Prize for the Best Paper Paper at the 1985 International Joint Conference on Artificial Intelligence. Ronald Fagin is manager of the Foundations of Computer Science group at the IBM Almaden Research Center. He received his B.A. degree in mathematics from Dartmouth College in 1967 and his Ph.D. in mathematics, specializing in mathematical logic, from the University of California at Berkeley in 1973. He joined IBM in 1973 at the Thomas J. Watson Research Center. In 1975, he transferred to the San Jose Research Laboratory (now the IBM Almaden Research Center) where most of his research has centered on applications of logic to computer science. In particular, he has done research on the theory of relational databases and, more recently, on theories of knowledge and belief. He has received three IBM Outstanding Innovation Awards for his contributions to relational database theory, extendible hashing, and reasoning about knowledge. He was co-recipient (with Joe Halpern) of the MIT Press Publisher's Prize for the Best Paper at the 1985 International Joint Conference on Artificial Interlligence.Some material in this paper appeared in preliminary form in Halpern and Fagin (1985). An abridged version of the paper appeared in Vogt F (ed) Proceeding of Concurrency 88 (Lecture Notes in Computer Science Vol. 335) Springer-Verlag, 1988, pp 18–32  相似文献   

3.
This paper proposes the integration of internal and external clock synchronization by a combination of a fault-tolerant distributed algorithm for clock state correction with a central algorithm for clock rate correction. By means of hardware and simulation experiments it is shown that this combination improves the precision of the global time base in a distributed single cluster system while reducing the need for high-quality oscillators. Simulation results have shown that the rate-correction algorithm contributes not only in the internal clock synchronization of a single cluster system, but it can be used for external clock synchronization of a multi-cluster system with a reference clock. Therefore, deployment of the rate-correction algorithm integrates internal and external clock synchronization in one mechanism. Experimental results show that a failure in the clock rate correction will not hinder the distributed fault-tolerant clock state synchronization algorithm, since the state correction operates independently from the rate correction. The paper introduces new algorithms and presents experimental results on the achieved improvements in the precision measured in a time-triggered system. Results of simulation experiments of the new algorithms in single-cluster and multi-cluster configurations are also presented. Hermann Kopetz (Fellow, IEEE) received the Ph.D. degree in physics ísub auspiciis praesidentis from the University of Vienna, Vienna, Austria, in 1968. He was Manager of the Computer Process Control Department at Voest Alpine, Linz, Austria, and Professor of Computer Process Control, Technical University of Berlin, Berlin, Germany. He is currently Professor of Real-Time Systems, Vienna University of Technology, Vienna, Austria, and a Visiting Professor at the University of California, Irvine, and the University of California, Santa Barbara. In 1993, he was offered a position as Director of the Max Planck Institute, Saarbrcken, Germany. Prof. Kopetz is the key architect of the Time-Triggered Architecture. Astrit Ademaj (IEEE member) received the Dipl-Ing. degree (1995) at the University of Prishtina, Kosova, and a doctoral degree (2003) in computer science from the Technical University of Vienna. He is currently working as Assistant Professor at the Technical University of Vienna and as a Visiting Lecturer at the University of Prishtina. His research interests are design and validation of communication systems for safety-critical and real-time applications. He is a member of the IEEE Computer Society. Alexander Hanzlik received a diploma (1995) and a doctoral degree (2004) in computer science from the Technical University of Vienna. From 1995 to 1998, he was concerned with voice communication system design for air traffic control for the Service de Navigation Aérienne (STNA). Since 1998, his focus is on embedded systems in the fields of telecommunication, automation and process control. Since 2001, Dr. Hanzlik is a member of the Real-Time Systems Group and works as a research assistant at the Technical University of Vienna. His main research activities deal with fault-tolerant clock synchronization in distributed systems and simulation. Currently, he is working on SIDERA, a simulation model for time-triggered, dependable real-time architectures.  相似文献   

4.
Summary We show how synchronized clocks can be realized in a distributed system as a byproduct of a common communication paradigm where processors periodically perform broadcasts. Our approach decouples theprecision concern of clock synchronization—limiting how much correct clocks can differ from each other—from theaccuracy concern—limiting the rate at which any correct clock may drift from real time. Given a system that guarantees only precision, we develop a protocol whereby high accuracy can be achieved on demand. In this manner, the lazy protocol we obtain incurs the cost of high accuracy only when needed while keeping the basic synchronization procedure extremely simple and cheap. Rogério Drummond is Associate Professor of Computer Science at the Universidade de Campinas (Unicamp), Brazil. He received his Ph.D. in computer science from Cornell University in 1986. He has previously worked on distributed fault-tolerant computing, such as the present paper. Currently, he heads the A_HAND project which aims to provide an object-oriented distributed programming environment for the development of very large software systems. Özalp Babaolu is Professor of Computer Science at the University of Bologna, Italy. His research interests include distributed algorithms, fault tolerance and parallel computing. He received a BS in electrical engineering from George Washington University, Washington, D.C. in 1976. From the University of California, Berkeley, he received a MS in 1977 and a Ph.D. in 1981, both in computer science. While at Berkeley, he designed and implemented the virtual memory extensions to BSD Unix. From 1981 to 1987 he was on the faculty at the Department of Computer Science, Cornell University.Partial support for this work was provided by the National Science Foundation under Grant DCR-86-01864, AT&T under a Foundation Grant, the Commission of the European Communities under the ESPRIT Programme Basic Research Action Number 3092 (Predictably Dependable Computing Systems) and the Italian Ministry of University and Research. Drummond was partially supported through a Fellowship from the CAPES Agency of the Government of Brazil  相似文献   

5.
Many algorithms in distributed systems assume that the size of a single message depends on the number of processors. In this paper, we assume in contrast that messages consist of a single bit. Our main goal is to explore how the one-bit translation of unbounded message algorithms can be sped up by pipelining. We consider two problems. The first is routing between two processors in an arbitrary network and in some special networks (ring, grid, hypercube). The second problem is coloring a synchronous ring with three colors. The routing problem is a very basic subroutine in many distributed algorithms; the three coloring problem demonstrates that pipelining is not always useful. Amotz Bar-Noy received his B.Sc. degree in Mathematics and Computer Science in 1981, and his Ph.D. degree in Computer Science in 1987, both from the Hebrew University of Jerusalem, Israel. Between 1987 and 1989 he was a post-doctoral fellow in the Department of Computer Science at Stanford University. He is currently a visiting scientist at the IBM Thomas J. Watson Research Center. His current research interests include the theoretical aspects of distributed and parallel computing, computational complexity and combinatorial optimization. Joseph (Seffi) Naor received his B.A. degree in Computer Science in 1981 from the Technion, Israel Institute of Technology. He received his M.Sc. in 1983 and Ph.D. in 1987 in Computer Science, both from the Hebrew University of Jerusalem, Israel. Between 1987 and 1988 he was a post-doctoral fellow at the University of Southern California, Los Angeles, CA. Since 1988 he has been a post-doctoral fellow in the Department of Computer Science at Stanford University. His research interests include combinatorial optimization, randomized algorithms, computational complexity and the theoretical aspects of parallel and distributed computing. Moni Naor received his B.A. in Computer Science from the Technion, Israel Institute of Technology, in 1985, and his Ph.D. in Computer Science from the University of California at Berkeley in 1989. He is currently a visiting scientist at the IBM Almaden Research Center. His research interests include computational complexity, data structures, cryptography, and parallel and distributed computation.Supported in part by a Weizmann fellowship and by contract ONR N00014-85-C-0731Supported by contract ONR N00014-88-K-0166 and by a grant from Stanford's Center for Integrated Systems. This work was done while the author was a post-doctoral fellow at the University of Southern California, Los Angeles, CAThis work was done while the author was with the Computer Science Division, University of California at Berkeley, and Supported by NSF grant DCR 85-13926  相似文献   

6.
时钟同步算法的分析和比较   总被引:2,自引:0,他引:2  
在许多分布式实时系统中.,要求整个分布式系统上的各个处理器时钟彼此同步,因而就要采取各种手段进行同步的处理。时钟同步算法保证了空间上分散的处理器时钟彼此同步。该文研究了当今基于软件实现的忍受故障的几种时钟同步算法:确定性、概率型和统计型同步算法并进行特性分析。本文提出了结构化分析的方法,有助于帮助分布式系统的设计者选择最合适的算法结构、系统硬件构成、故障模型、时钟同步质量等。  相似文献   

7.
Three main parameters characterize the efficiency of algorithms that solve the Consensus Problem: the ratio between the total number of processors and the maximum number of faulty processors (n andt, respectively), the number of rounds, and the upper bound on the size of any message. In this paper we present a trade-off between the number of faulty processors and the number of rounds by exhibiting a family of algorithms in which processors communicate by one-bit messages. Letk be a positive integer and lets=t 1/k . The family includes algorithms where the number of processors is less than , and the number of rounds is less than . This family is based on a very simple algorithm with the following complexity: (2t+1)(t+1) processors,t+1 rounds, and one-bit message size. Amotz Bar-Noy received the B.A. degree in mathematics and computer science in 1981, and the Ph.D. degree in computer science in 1987, both from the Hebrew University of Jerusalem, Israel. He was a post-doctoral fellow at Stanford University, California, from October 1987 to September 1989. Since October 1989 he has been a visiting scientist at IBM T.J. Watson Research Center, Yorktown Heights, New York. His research interests include communication networks, distributed algorithms, parallel algorithms and fault-tolerant computing. Danny Dolev received a B.Sc. in physics from the Hebrew University, Jerusalem, in 1971, an M.Sc. in applied mathematics from the Weizmann Institute of Science, Israel, in 1973, and Ph.D. in computer science in 1979. After two years as a post doctoral fellow at Stanford and a year as a visiting scientist at IBM, he joined the Hebrew University, Jerusalem, in 1982, and IBM Almaden Research Center at 1987. His major research interests are distributed computing, reliability of distributed systems, and algorithms.A preliminary version of this paper appeared in the Aegean Workshop on Computing (AWOC), pp 380–390, 1988. This work was carried out while Dr. Bar-Noy was visiting Stanford University. Supported in part by a Weizmann fellowship, by contract ONR N00014-88-K-0166, and by a grant of Stanford's Center for Integrated Systems  相似文献   

8.
Appraising fairness in languages for distributed programming   总被引:1,自引:0,他引:1  
The relations among various languages and models for distributed computation and various possible definitions of fairness are considered. Natural semantic criteria are presented which an acceptable notion of fairness should satisfy. These are then used to demonstrate differences among the basic models, the added power of the fairness notion, and the sensitivity of the fairness notion to irrelevant semantic interleavings of independent operations. These results are used to show that from the considerable variety of commonly used possibilities, only strong process fairness is appropriate forCSP if these criteria are adopted. We also show that under these criteria, none of the commonly used notions of fairness are fully aceptable for a model with an n-way synchronization mechanism. The notion of fairness most often mentioned for Ada is shown to be fully acceptable. For a model with nonblockingsend operations, some variants of common fairness definitions are appraised, and two are shown to satisfy the suggested criteria. Krzysztof R. Apt was born in 1949 in Poland. Received his Ph.D. in 1974 from Polish Academy of Sciences in Warsaw in mathematical logic. From 1974 until 1981 worked at various scientific institutions in the Netherlands and from 1981 until 1987 at C.N.R.S. in Paris, France. Spent 1985 as a visiting scientist at IBM Research Centre in Yorktown Heights, U.S.A. Currently holding an Endowed Professorship at the Department of Computer Sciences at the University of Texas at Austin; also a senior research scientist at the Centre for Mathematics and Computer Science in Amsterdam, the Netherlands. His research interests include program correctness and semantics, methodology of distributed computing, use of logic as a programming language and non-standard forms of reasoning. He has served on editorial boards of a number of journals and program committees of numerous conferences in computer science. Lectured in a dozen countries on four continents. Also, he has run two marathons and crossed Sumatra on a bicycle. Shmuel Katz received his B.A. in Mathematics and English Literature from U.C.L.A., and his M.Sc. and Ph.D. in Computer Science (1976) from the Weizmann Institute in Rehovot, Israel. From 1976 to 1981 he was a researcher at the IBM Israel Scientific Center. Presently, he is a Senior Lecturer in the Computer Science Department at the Technion in Haifa, Israel. In 1977–78, he visited for a year at the University of California, Berkeley, and in 1984–85 was at the University of Texas at Austin. He has also been a consultant for the MCC Software Technology Program. His research interests include the methodology of programming, specification methods, program verification and semantics, distributed programming, data structures, and programming languages. Nissim Francez received his B.A. in Mathematics and Philosophy from the Hebrew University in Jerusalem, and his M.Sc. and Ph.D. in computer science (1976) from the Weizmann Institute of Science, Rehovot, Israel. In 1976–77 he spent a postdoctoral year at Queen's university, Belfast, where he was introduced by C.A.R. Hoare to CSP. In 1977–78 he was an assistant professor at USC, Los Angeles. From 1978 he is with the Computer Science Department at the Technion. In 1982–83 he was on a sabbatical leave at IBM T.J. Watson Research Center. He has been a consultant for MCC's software technology program, working on multiparty activities in distributed systems. He had summer appointments in Harvard University, IBM T.J. Watson Research Center, Utrecht University, CWI (Amsterdam) and at MCC. He also served in several program committees. His research interests include program verification and the semantics of programming languages, mainly for concurrent and distributed programming. Is also interested in logic programming and recursive query evaluation and in compiler constration. He is the author of the first book onFairness. Unfortunately, he is incapable of Marathon running...  相似文献   

9.
时钟同步技术是分布式系统中非常活跃的研究领域之一,由于大多数分布式系统实际上是不同步的,因此需要采用容错时钟同步算法确保消息通信的有界延迟,而基于假设检验的时钟同步技术可以避免因错失对两个高概率不同步时钟进行同步调整而造成系统不正常使用的情况。该文讨论了时钟同步的假设检验问题。除了假设检验、两类错误概率,还给出了概率最小时钟偏差、时钟同步概率等概念。在时钟偏差的统计分布特性近似于服务正态分布的假设条件之下,提出了基于非中心t分布的时钟同步假设检验方案。最后,基于服务器和客户端之间双向消息通信传输模式,给出时钟偏差的估计和检验样本。  相似文献   

10.
Reliable Broadcast is a mechanism by which a processor in a distributed system disseminates a value to all other processors in the presence of both communication and processor failures. Protocols to achieve Reliable Broadcast are at the heart of most fault-tolerant applications. We characterize the execution time of Reliable Broadcast protocols as a function of the communication model. This model includes familiar communication structures such as fully-connected point-to-point graphs, linear chains, rings, broadcast networks (such as Ethernet) and buses. We derive a parameterized protocol that implements Reliable Broadcast for any member within this class. We obtain lower bound results that show the optimality of our protocols. The lower bound results identify a time complexity gap between systems where processors may only fail to send messages, and systems where processors may fail both to send and to receive messages. The tradeoffs that our results reveal between performance, resiliency and network cost offer many new alternatives previously not considered in designing fault-tolerant systems. Özalp Babaoglu is Associate Professor in the Department of Computer Science at Cornell University, Ithaca, New York. His research interests include distributed systems, fault tolerance, performance evaluation and modeling. He received a BS in electrical engineering from George Washington University, Washington, D.C. in 1976. From the University of California, Berkeley, he received a MS in 1977 and a PhD in 1981, both in computer science. While at Berkeley, he designed and implemented the virtual memory extensions to VAX Unix that came to be known as 3. Obsd. Pat Stephenson is a Doctoral Candidate in the Computer Science Department at Cornell University, Ithaca, New York. His research interests include distributed systems and fault tolerance. In 1983, he received a B.A. (Mod.) in computer science from Trinity College, Dublin, Ireland. He received his MS in computer science from Cornell in 1986. He is currently working on new tradeoffs in the design of fault-tolerant algorithms. Rogério Drummond is Assistant Professor in the Computer Science Department at the Universidade Estadual de Campinas (UNICAMP), São Paulo, Brazil. His interests include distributed computing, fault tolerance and operating systems. He received a BS and a MS in computer science from the Universidade Estadual de Campinas in 1978 and 1980, respectively. In 1986 he received a PhD in computer science from Cornell University. He is currently working on integrated environments for the development of software and hardware.Partial support for this work was provided by the National Science Foundation under Grants DCR-86-01864 and MCS-82-10356 and AT&T under a Foundation GrantSupported partially by the Defense Advanced Research Projects Agency (DoD) under ARPA order 5378, Contract MDA 903-85-C-0124, and partially by an IBM graduate fellowship. The views, opinions and findings contained in this report are solely those of the authors and should not be construed as an official Department of Defense position, policy, or decisionSupported partially by the CAPES and CNPq agencies of the Ministry of Education of Brazil  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号