
1.

Developing fault-tolerant distributed loops

A.A. Farrag 《Information Processing Letters》2010,111(2):97-101

Distributed loops are highly regular structures that have been applied to the design of many locally distributed systems. This family of networks includes many important configurations, such as rings and circulant graphs. In this paper, we examine the problem of extending a distributed loop so as to tolerate any number of node failures. We study this problem when the parameters that define the loop are given numerically as constants, or symbolically as variables. Our results indicate that the (fault-tolerant) solutions obtained are efficient.
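As a rough illustration of the structures this abstract mentions, the sketch below builds a circulant graph (node i is linked to i ± s mod n for each offset s; a ring is the case with the single offset 1) and checks whether the surviving nodes stay connected after some nodes fail. The function names are hypothetical, and plain connectivity is used here only as a simple stand-in criterion; it is not the extension construction studied in the paper.

```python
from collections import deque


def circulant_neighbors(n, offsets):
    """Adjacency sets of a circulant graph on nodes 0..n-1.

    Node i is linked to (i + s) % n and (i - s) % n for every offset s.
    With offsets == [1] this is an ordinary ring.
    """
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for s in offsets:
            for j in ((i + s) % n, (i - s) % n):
                if j != i:
                    adj[i].add(j)
    return adj


def is_connected_after_failures(adj, failed):
    """Breadth-first search over the surviving nodes only.

    Assumption for this sketch: "tolerating" a failure set simply means
    the remaining nodes can still reach one another.
    """
    alive = [v for v in adj if v not in failed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in failed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(alive)


ring = circulant_neighbors(6, [1])
loop = circulant_neighbors(6, [1, 2])
print(is_connected_after_failures(ring, {0, 3}))  # ring splits in two
print(is_connected_after_failures(loop, {0, 3}))  # extra chords keep it whole
```

The extra offset adds chords that bypass a failed node, which is the intuition behind using loops with more than one offset when failures must be tolerated.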

2.

Developing fault-tolerant distributed loops

《Information Processing Letters》2011,111(2):97-101

Distributed loops are highly regular structures that have been applied to the design of many locally distributed systems. This family of networks includes many important configurations, such as rings and circulant graphs. In this paper, we examine the problem of extending a distributed loop so as to tolerate any number of node failures. We study this problem when the parameters that define the loop are given numerically as constants, or symbolically as variables. Our results indicate that the (fault-tolerant) solutions obtained are efficient.

3.

Task allocation in fault-tolerant distributed systems

Joseph A. Bannister Kishor S. Trivedi 《Acta Informatica》1983,20(3):261-281


4.

Symbolic simulation of synchronous programs

David Garriou 《Electronic Notes in Theoretical Computer Science》2002,65(5)


5.

Reconciling fault-tolerant distributed computing and systems-on-chip

Matthias Függer Ulrich Schmid 《Distributed Computing》2012,24(6):323-355

Classic distributed computing abstractions do not match well the reality of digital logic gates, which are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits: Massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate-level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness & performance analysis of a distributed fault-tolerant clocking approach for Systems-on-Chip (SoCs). Starting out from a “classic” distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, and rigorously prove its correctness and derive analytic expressions for worst case performance metrics like synchronization precision and clock frequency. Rather than on absolute delay values, both the algorithm’s correctness and the achievable synchronization precision depend solely on the ratio of certain path delays. Since these ratios can be mapped directly to placement & routing constraints, there is typically no need for changing the algorithm when migrating to a faster implementation technology and/or when using a slightly different layout in an SoC.

6.

Ensuring fault-tolerant computations in distributed control systems

V. I. Klepikov 《Automation and Remote Control》2013,74(12):2112-2121

This paper demonstrates that the fault tolerance of distributed control systems (DCSs) can be improved by scheduling processes that represent functional segments, with guaranteed operation of the mechanisms of process reexecution and parallel execution based on checkpoints. Moreover, we suggest a methodological approach to assessing the fault tolerance level of DCSs, which proceeds from the probabilistic modeling of systems having the time-triggered architecture (TTA). Finally, we derive numerical formulas for qualitative and quantitative estimation of the fault tolerance level for different modifications of DCSs at the design stage.

7.

Error masking in computer programs

Janusz Laski Wojciech Szermer Piotr Luczycki 《Software Testing, Verification and Reliability》1995,5(2):81-105

Programming faults are defined in the framework of a program proof outline. A component C in program P is faulty if P cannot be proved correct with the current implementation of C but it can be proved using the design specification for C, which defines the role of C in the overall design of P. A programming error is a state that violates the implementation specification of C. The error is masked if the design specification is satisfied by the incorrect state. Given a passing test t, the probability of error masking is 1–s(t), where s(t) is the sensitivity of t. Dynamic mutation testing (DMT), a Monte Carlo method, is used to estimate s(t). DMT is extended to estimate the probability of the existence of a hidden fault in C for a passing test suite. That probability can be viewed as a metric that measures the quality of the passing test suite.
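The relation between sensitivity and masking stated in this abstract can be illustrated with a small Monte Carlo sketch. The abstract does not spell out how DMT samples mutations, so the code makes a loud assumption: s(t) is approximated as the fraction of random perturbations of the state produced by component C that change the final output for test t. The names `estimate_sensitivity`, `component`, `rest`, and `mutate` are hypothetical.

```python
import random


def estimate_sensitivity(component, rest, mutate, t, trials=10_000, seed=0):
    """Monte Carlo estimate of s(t) under the assumption stated above.

    component: maps the test input t to the state right after C.
    rest:      maps that state to the program's final output.
    mutate:    returns a randomly perturbed (different) state.
    """
    rng = random.Random(seed)
    correct_state = component(t)
    expected = rest(correct_state)
    detected = 0
    for _ in range(trials):
        # A perturbation is "detected" when it changes the final output.
        if rest(mutate(correct_state, rng)) != expected:
            detected += 1
    return detected / trials


# Toy program: C triples the input, the rest of P only looks at parity.
DELTAS = [d for d in range(-5, 6) if d != 0]
s = estimate_sensitivity(
    component=lambda x: 3 * x,
    rest=lambda state: state % 2,
    mutate=lambda state, rng: state + rng.choice(DELTAS),
    t=7,
)
print("sensitivity:", s, "masking probability:", 1 - s)
```

In this toy case only odd perturbations flip the parity, so roughly 6 of every 10 mutations are visible at the output and the rest are masked.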

8.

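Implementation of a communication system for distributed fault-tolerant computers

金信苗 (Jin Xinmiao) 《微计算机信息》 2006,22(23):263-265

This paper describes a practical communication system for distributed fault-tolerant computers, designed and implemented according to the Distributed Dynamic Change Communication Pattern Protocol (DDCCPP). The communication system provides a reliable hard core for the overall system, and results from actual operation are satisfactory.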

9.

Specifications of distributed programs

Barbara Liskov William Weihl 《Distributed Computing》1986,1(2):102-118

This paper discusses informal specifications of distributed programs, that is, programs that reside at nodes connected by a network. Such programs often have performance requirements, such as high availability and concurrency, that make it difficult to specify their behavior. These requirements often have an effect on the functional behavior of a program, forcing designers to change their initial expectations. In this paper we show how to give user-oriented specifications of the functional behavior of programs with such requirements. We propose a structure for specifications that distinguishes expected and desirable effects from undesirable ones. We believe that this distinction is an important one for both users and implementers of a system, and that it makes the specifications easier to understand. We illustrate our approach by giving example specifications of several distributed programs that have been described in the literature.

10.

Symbolic bounded synthesis

Rüdiger Ehlers 《Formal Methods in System Design》2012,40(2):232-262

Synthesizing finite-state systems from full linear-time temporal logic (LTL) is an ambitious way to tackle the challenge of constructing correct-by-construction systems. One particularly promising approach in this context is bounded synthesis, originally proposed by Schewe and Finkbeiner, which in turn builds upon Safraless synthesis, as described by Kupferman and Vardi. Previous implementations of these approaches performed the computation either in an explicit way or used symbolic data structures other than binary decision diagrams (BDDs). In this paper, we reconsider BDDs as a state space representation and use them as the data structure for bounded synthesis. The key to this construction is the application of two novel optimisation techniques that decrease the number of state bits in such a representation significantly. The first technique uses signalling bits to connect sub-games representing the safety and non-safety parts of the specification. The second technique is based on a closer analysis of the step of building a safety game from a universal automaton and uses a sufficient condition to remove some so-called counters from the state space of the game.

11.

Symbolic predictive analysis for concurrent programs

Chao Wang Sudipta Kundu Rhishikesh Limaye Malay Ganai Aarti Gupta 《Formal Aspects of Computing》2011,23(6):781-805

Predictive analysis aims at detecting concurrency errors during runtime by monitoring a concrete execution trace of a concurrent program. In recent years, various models based on the happens-before causality relations have been proposed for predictive analysis. However, these models often rely on only the observed runtime events and typically do not utilize the program source code. Furthermore, the enumerative algorithms they use for verifying safety properties in the predicted traces often suffer from the interleaving explosion problem. In this paper, we introduce a precise predictive model based on both the program source code and the observed execution events, and propose a symbolic algorithm to check whether a safety property holds in all feasible permutations of events of the given trace. Rather than explicitly enumerating and checking the interleavings, our method conducts the search using a novel encoding and symbolic reasoning with a satisfiability modulo theory solver. We also propose a technique to bound the number of context switches allowed in the interleavings during the symbolic search, to further improve the scalability of the algorithm.

12.

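Design and implementation of a fault-tolerant system for distributed computing clusters

万玮 (Wan Wei) 杨志义 (Yang Zhiyi) 《计算机工程与设计》 2005,26(10):2811-2813,2816

To improve the reliability of distributed computing cluster systems and strengthen their fault tolerance, so that a system keeps running stably and normally when local errors occur, a fault-tolerant system model is established. The model adopts a two-level fault-tolerance mechanism: node-level fault tolerance and task-level fault tolerance. The model lays a foundation for further research on fault tolerance in distributed computing cluster systems.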

13.

Nagging: A scalable fault-tolerant paradigm for distributed search

Alberto Maria Segre Sean Forman Giovanni Resta Andrew Wildenberg 《Artificial Intelligence》2002,140(1-2):71-106

This paper describes nagging, a technique for parallelizing search in a heterogeneous distributed computing environment. Nagging exploits the speedup anomaly often observed when parallelizing problems by playing multiple reformulations of the problem or portions of the problem against each other. Nagging is both fault tolerant and robust to long message latencies. In this paper, we show how nagging can be used to parallelize several different algorithms drawn from the artificial intelligence literature, and describe how nagging can be combined with partitioning, the more traditional search parallelization strategy. We present a theoretical analysis of the advantage of nagging with respect to partitioning, and give empirical results obtained on a cluster of 64 processors that demonstrate nagging's effectiveness and scalability as applied to A* search, αβ minimax game tree search, and the Davis–Putnam algorithm.

14.

A fault-tolerant distributed subcube management scheme for hypercube multicomputer systems

Chen Y.-L. Liu J.-C. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(7):766-772

This paper proposes a fault-tolerant distributed subcube management scheme for hypercube multicomputer systems. Gracefully degradable subcube management is supported by a data structure, called the distributed subcube table (DST), and a fault-tolerant broadcast protocol, called the reliably synchronized broadcast (RSB). In an $n$-dimensional hypercube, DST is the collection of $2^n$ local subcube tables (LSTs), $\mathrm{DST}=\{\mathrm{LST}_0, \mathrm{LST}_1, \ldots, \mathrm{LST}_{2^n-1}\}$, where $\mathrm{LST}_x$ is a bit-mapped table assigned to $N_x$, a fault-free node whose address is $x$. $\mathrm{LST}_x$, $\forall x$, is $n+1$ bits long, and it records the status (free/busy) of certain subcubes adjacent to $N_x$. The RSB diagnoses and avoids faults during interprocessor communication to prevent faulty nodes from being allocated for job execution. In addition to possessing a fault-tolerant design, our scheme can also achieve comparable or better performance than existing centralized schemes, as verified by extensive simulation.

15.

Combination of clock-state and clock-rate correction in fault-tolerant distributed systems

Hermann Kopetz Astrit Ademaj Alexander Hanzlik 《Real-Time Systems》2006,33(1-3):139-173

This paper proposes the integration of internal and external clock synchronization by a combination of a fault-tolerant distributed algorithm for clock state correction with a central algorithm for clock rate correction. By means of hardware and simulation experiments it is shown that this combination improves the precision of the global time base in a distributed single cluster system while reducing the need for high-quality oscillators. Simulation results have shown that the rate-correction algorithm not only contributes to the internal clock synchronization of a single cluster system, but can also be used for external clock synchronization of a multi-cluster system with a reference clock. Therefore, deployment of the rate-correction algorithm integrates internal and external clock synchronization in one mechanism. Experimental results show that a failure in the clock rate correction will not hinder the distributed fault-tolerant clock state synchronization algorithm, since the state correction operates independently from the rate correction. The paper introduces new algorithms and presents experimental results on the achieved improvements in the precision measured in a time-triggered system. Results of simulation experiments of the new algorithms in single-cluster and multi-cluster configurations are also presented.
Hermann Kopetz (Fellow, IEEE) received the Ph.D. degree in physics “sub auspiciis praesidentis” from the University of Vienna, Vienna, Austria, in 1968. He was Manager of the Computer Process Control Department at Voest Alpine, Linz, Austria, and Professor of Computer Process Control, Technical University of Berlin, Berlin, Germany. He is currently Professor of Real-Time Systems, Vienna University of Technology, Vienna, Austria, and a Visiting Professor at the University of California, Irvine, and the University of California, Santa Barbara. In 1993, he was offered a position as Director of the Max Planck Institute, Saarbrücken, Germany. Prof. Kopetz is the key architect of the Time-Triggered Architecture. Astrit Ademaj (IEEE member) received the Dipl.-Ing. degree (1995) at the University of Prishtina, Kosova, and a doctoral degree (2003) in computer science from the Technical University of Vienna. He is currently working as Assistant Professor at the Technical University of Vienna and as a Visiting Lecturer at the University of Prishtina. His research interests are design and validation of communication systems for safety-critical and real-time applications. He is a member of the IEEE Computer Society. Alexander Hanzlik received a diploma (1995) and a doctoral degree (2004) in computer science from the Technical University of Vienna. From 1995 to 1998, he was concerned with voice communication system design for air traffic control for the Service de Navigation Aérienne (STNA). Since 1998, his focus has been on embedded systems in the fields of telecommunication, automation and process control. Since 2001, Dr. Hanzlik has been a member of the Real-Time Systems Group and works as a research assistant at the Technical University of Vienna. His main research activities deal with fault-tolerant clock synchronization in distributed systems and simulation. Currently, he is working on SIDERA, a simulation model for time-triggered, dependable real-time architectures.

16.

Symbolic decision procedure for termination of linear programs (cited by: 2 in total; self-citations: 0; citations by others: 2)

Bican Xia Lu Yang Naijun Zhan Zhihai Zhang 《Formal Aspects of Computing》2011,23(2):171-190

Tiwari proved that the termination of a class of linear programs is decidable in Tiwari (Proceedings of CAV’04. Lecture notes in computer science, vol 3114, pp 70–82, 2004). The decision procedure proposed therein depends on the computation of Jordan forms. Thus, people may draw a wrong conclusion from this procedure if they simply apply floating-point computation to compute Jordan forms. In this paper, we first use an example to explain this problem, and then present a symbolic implementation of the decision procedure. The rounding error problem is therefore avoided. Moreover, we also show that the symbolic decision procedure is as efficient as the numerical one given in Tiwari (Proceedings of CAV’04. Lecture notes in computer science, vol 3114, pp 70–82, 2004). The complexity of the former is $\max\{O(n^6), O(n^{m+3})\}$, while that of the latter is $O(n^{m+3})$, where $n$ is the number of variables of the program and $m$ is the number of its Boolean conditions. In addition, for the case when the characteristic polynomial of the assignment matrix is irreducible, we design a more efficient symbolic algorithm whose complexity is $\max\{O(n^6), O(mn^3)\}$.

17.

Model-checking multi-threaded distributed Java programs

Scott D. Stoller 《International Journal on Software Tools for Technology Transfer (STTT)》2002,4(1):71-91

State-space exploration is a powerful technique for verification of concurrent software systems. Applying it to software systems written in standard programming languages requires powerful abstractions (of data) and reductions (of atomicity), which focus on simplifying the data and control, respectively, by aggregation. We propose a reduction that exploits a common pattern of synchronization, namely, the use of locks to protect shared data structures. This pattern of synchronization is particularly common in concurrent Java programs, because Java provides built-in locks. We describe the design of a new tool for state-less state-space exploration of Java programs that incorporates this reduction. We also describe an implementation of the reduction in Java PathFinder, a more traditional state-space exploration tool for Java programs. Published online: 2 October 2002. Present address: Computer Science Dept., SUNY at Stony Brook, Stony Brook, NY 11794-4400, USA. The author gratefully acknowledges the support of ONR under Grants N00014-99-1-0358 and N00014-01-1-0109 and the support of NSF under Grant CCR-9876058.

18.

Privacy masking distributed saddle-point algorithm for dynamic economic dispatch

Xu Kaihui Li Jueyou Chen Guo 《Neural computing & applications》2023,35(11):8109-8123

Neural Computing and Applications - In smart grids, the goal of the dynamic economic dispatch problem (DEDP) is to obtain the optimal dispatch schedule for each generating unit in a set of periods...

19.

Monitoring and debugging distributed realtime programs

Paul S. Dodd Chinya V. Ravishankar 《Software》1992,22(10):863-877

In this paper we describe the design and implementation of an integrated monitoring and debugging system for a distributed real-time computer system. The monitor provides continuous, transparent monitoring capabilities throughout a real-time system's lifecycle with bounded, minimal, predictable interference by using software support. The monitor is flexible enough to observe both high-level events that are operating system- and application-specific, as well as low-level events such as shared variable references. We present a novel approach to monitoring shared variable references that provides transparent monitoring with low overhead. The monitor is designed to support tasks such as debugging realtime applications, aiding real-time task scheduling, and measuring system performance. Since debugging distributed real-time applications is particularly difficult, we describe how the monitor can be used to debug distributed and parallel applications by deterministic execution replay.

20.

Handling timing errors in distributed programs

Gordon A.J. Finkel R.A. 《IEEE Transactions on Software Engineering》1988,14(10):1525-1535

The authors describe a tool called TAP, which is designed to aid the programmer in discovering the causes of timing errors in running programs. TAP is similar to a postmortem debugger, using the history of interprocess communication to construct a timing graph, a directed graph where an edge joins node x to node y if event x directly precedes event y in time. The programmer can then use TAP to look at the graph to find the events that occurred in an unacceptable order. Because of the nondeterministic nature of distributed programs, the authors feel a history-keeping mechanism must always be active so that bugs can be dealt with as they occur. The goal is to collect enough information at run time to construct the timing graph if needed. Since it is always active, this mechanism must be efficient. The authors also describe experiments run using TAP and report the impact that TAP's history-keeping mechanism has on the running time of various distributed programs.
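The timing graph described in this abstract can be sketched from a communication history as follows. This is not TAP's actual data format or algorithm: the record layout `(event_id, process, kind, msg)` and the function names are hypothetical, and "directly precedes" is assumed to mean either consecutive events on the same process or a send and its matching receive.

```python
from collections import defaultdict


def build_timing_graph(history):
    """Build a directed graph with an edge x -> y when x directly precedes y.

    history: iterable of (event_id, process, kind, msg) records, where kind
    is "send", "recv" or "local". Each process's events must appear in
    local order; events of different processes may be interleaved freely.
    """
    graph = defaultdict(set)
    last_on_process = {}   # process -> most recent event on that process
    send_of = {}           # msg -> send event already seen
    waiting_recv = {}      # msg -> recv event logged before its send
    for event_id, process, kind, msg in history:
        graph[event_id]    # make sure the node exists even without edges
        prev = last_on_process.get(process)
        if prev is not None:
            graph[prev].add(event_id)
        last_on_process[process] = event_id
        if kind == "send":
            send_of[msg] = event_id
            if msg in waiting_recv:
                graph[event_id].add(waiting_recv.pop(msg))
        elif kind == "recv":
            if msg in send_of:
                graph[send_of[msg]].add(event_id)
            else:
                waiting_recv[msg] = event_id
    return graph


def precedes(graph, x, y):
    """True if y is reachable from x, i.e. x is ordered before y."""
    stack, seen = [x], set()
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w == y:
                return True
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


history = [
    ("a", "P1", "send", "m1"),
    ("c", "P2", "recv", "m1"),
    ("b", "P1", "local", None),
    ("d", "P2", "send", "m2"),
    ("e", "P3", "recv", "m2"),
]
g = build_timing_graph(history)
print(precedes(g, "a", "e"), precedes(g, "b", "c"))
```

A pair of events for which neither `precedes(g, x, y)` nor `precedes(g, y, x)` holds is unordered by the recorded history, which is where timing-dependent behaviour can hide.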