
1.

A distributed recovery block approach to fault-tolerant execution of application tasks in hypercubes

Kim K.H. Kavianpour A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(1):104-111

An approach to fault-tolerant execution of real-time application tasks in hypercubes is proposed. The approach is based on the distributed recovery block (DRB) scheme and does not require special hardware mechanisms in support of fault tolerance. Each task is assigned to a pair of processors forming a DRB computing station for execution in a dual-redundant and self-checking mode. Assignment of all tasks in an application in such a form is called the full DRB mapping. The DRB scheme was developed as an approach to uniform treatment of hardware and software faults with the effect of fast forward recovery. However, if the system developer is concerned with hardware fault possibilities only, then forming DRB stations becomes a mechanical process not burdening the application software designer in any way. A procedure for converting an efficient nonredundant task-to-processor mapping into an efficient full DRB mapping is presented.
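A minimal sketch of the recovery-block idea behind a DRB station, assuming (the abstract does not spell this out) that one node runs a primary try block, its partner runs an alternate try block, and an acceptance test selects the first acceptable result. The names `run_drb_station`, `primary`, `alternate`, and `acceptance_test` are hypothetical, and the two nodes are simulated sequentially in one process rather than on a pair of hypercube processors.

```python
from typing import Callable, List, Optional


def run_drb_station(
    primary: Callable[[float], float],
    alternate: Callable[[float], float],
    acceptance_test: Callable[[float, float], bool],
    task_input: float,
) -> Optional[float]:
    """Return the first result that passes the acceptance test, or None."""
    # In a real station both nodes would compute concurrently; here each
    # try block is simply called in turn (an assumption of this sketch).
    candidates: List[Optional[float]] = []
    for try_block in (primary, alternate):
        try:
            candidates.append(try_block(task_input))
        except Exception:
            candidates.append(None)  # a crashed try block yields no result

    # Forward recovery: the partner's result is already available, so a
    # rejected primary result needs no rollback and re-execution.
    for result in candidates:
        if result is not None and acceptance_test(task_input, result):
            return result
    return None


def faulty_sqrt(x: float) -> float:
    return -1.0  # deliberately wrong, to model a faulty primary


def good_sqrt(x: float) -> float:
    return x ** 0.5


def sqrt_is_acceptable(x: float, r: float) -> bool:
    return r >= 0 and abs(r * r - x) < 1e-9


station_output = run_drb_station(faulty_sqrt, good_sqrt, sqrt_is_acceptable, 9.0)
```

Here the primary's result fails the acceptance test, so the station delivers the alternate's result without recomputing anything.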

2.

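Design of a software fault-tolerance system based on RTEMS

李小群 《计算机应用研究》2009,26(3):911-913

Computer systems operating in space are exposed to high-altitude radiation, which can trigger a variety of anomalies or errors that lead to failures. To improve system reliability while minimizing the impact on real-time performance, effective fault tolerance is required. This paper studies and analyzes fault detection and fault recovery for nodes and application software, proposes several flexible and effective software fault-tolerance strategies and designs, and designs and implements a system prototype on a four-node multi-computer hardware architecture running the RTEMS operating system. Experimental results show that the approach effectively improves the reliability of embedded real-time systems.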

3.

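A method for modeling and analyzing the software energy consumption of embedded real-time systems

祝义 肖芳雄 周航 张广泉 《计算机研究与发展》2014,51(4):848-855

As research on low energy consumption in embedded real-time systems deepens, software energy consumption has become a major factor affecting such systems, and its study is moving toward quantitative analysis. Because embedded real-time systems lack effective methods for modeling and analyzing software energy consumption, a process-algebra-based method is proposed. Timed communicating sequential processes are extended with price information to obtain priced timed communicating sequential processes; the power consumption of embedded real-time system instructions is mapped to prices in this formalism, which is then used to model software energy consumption and analyze it quantitatively. The proposed optimal-path algorithm checks the modeling results for instruction power satisfiability and computes the reachable path with the lowest current energy consumption. The method can substantially improve the accuracy of software energy calculation and analysis for embedded real-time systems, and its results support quantitative analysis and optimized design of software energy consumption.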
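To illustrate what computing a lowest-energy reachable path can look like, here is a small sketch that treats the model as a graph whose edges carry non-negative energy prices and runs Dijkstra's algorithm over it. This is a stand-in, not the paper's optimal-path algorithm, which the abstract does not describe; the graph, the state names, and `min_energy_path` are all hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Each state maps to a list of (next_state, energy_price) transitions.
Graph = Dict[str, List[Tuple[str, float]]]


def min_energy_path(
    graph: Graph, start: str, goal: str
) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: cheapest path from start to goal, or None."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        energy, state, path = heapq.heappop(queue)
        if state == goal:
            return energy, path
        if state in settled:
            continue  # a cheaper route to this state was already expanded
        settled.add(state)
        for next_state, price in graph.get(state, []):
            if next_state not in settled:
                heapq.heappush(queue, (energy + price, next_state, path + [next_state]))
    return None  # goal is unreachable


# Hypothetical instruction-level model: prices are illustrative only.
model: Graph = {
    "s0": [("s1", 2.0), ("s2", 5.0)],
    "s1": [("s2", 1.0), ("s3", 6.0)],
    "s2": [("s3", 2.0)],
    "s3": [],
}
best = min_energy_path(model, "s0", "s3")
```

With these illustrative prices, the cheapest route goes through `s1` and `s2` at a total price of 5.0, beating the direct alternatives at 7.0 and 8.0.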

4.

Exploiting Redundancies to Enhance Schedulability in Fault-Tolerant and Real-Time Distributed Systems

Wei Luo Xiao Qin Xian-Chun Tan Ke Qin Manzanares A. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2009,39(3):626-639

In the past decades, distributed systems have been widely applied to real-time applications, most of which have fault-tolerance requirements to assure high reliability. Due to the stringent space constraints of real-time systems, the issue of schedulability becomes a major concern in the design of fault-tolerant and real-time distributed systems. Most existing real-time and fault-tolerant scheduling algorithms, which are based on the primary-backup scheme for periodic real-time tasks, introduce unnecessary redundancies by aggressively using active-backup copies. To solve this problem, we propose two novel fault-tolerant techniques, which are seamlessly integrated with fixed-priority-based scheduling algorithms. These techniques leverage redundancies to enhance schedulability in fault-tolerant and real-time distributed systems. Our fault-tolerant techniques make use of the primary-backup scheme to tolerate permanent hardware failures. The first technique (referred to as Tercos) terminates the execution of active-backup copies when the corresponding primary copies are successfully completed. Tercos is designed to reduce schedule lengths in fault-free scenarios to enhance schedulability by virtue of executing portions of active-backup copies in passive forms. The second technique (referred to as Debus) uses a deferred-active-backup scheme to further minimize schedule lengths to improve the schedulability performance. Debus schedules active-backup copies as late as possible, while terminating active-backup copies when their primary copies are completed. Experimental results show that, compared with existing algorithms in the literature, Tercos can significantly improve schedulability by up to 17.0% (with an average of 9.7%). Furthermore, empirical results reveal that Debus can enhance schedulability over Tercos by up to 12% (with an average of 7.8%).

5.

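Application of software fault-tolerance techniques in a video surveillance system

包剑 冀常鹏 李义杰 《微机发展》2005,15(7):108-110

Modern video surveillance systems are highly integrated, and accurate, timely transmission and processing of all kinds of data is one of their prominent requirements. A solution for video surveillance systems is implemented using software fault-tolerance techniques. The paper examines the basic principles and characteristics of software fault tolerance, whose basic methods are recovery blocks (RB), N-version programming (NVP), and exception handling, along with the supporting techniques needed to implement it. Applying software fault tolerance in a video surveillance system can effectively ensure system reliability. The proposed scheme can be extended to related control systems.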
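As a rough illustration of N-version programming, the sketch below runs several independently written versions on the same input and accepts an output only when a strict majority of the versions agree on it. It is a generic voter, not the scheme used in the paper; `nvp_vote` and the three toy versions are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence


def nvp_vote(
    versions: Sequence[Callable[[int], Hashable]], task_input: int
) -> Optional[Hashable]:
    """Return the output agreed on by a strict majority of versions, else None."""
    outputs: List[Hashable] = []
    for version in versions:
        try:
            outputs.append(version(task_input))
        except Exception:
            continue  # a crashed version casts no vote
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    # The majority is taken over all versions, including crashed ones.
    return value if votes > len(versions) / 2 else None


def square_by_multiplication(x: int) -> int:
    return x * x


def square_by_power(x: int) -> int:
    return x ** 2


def square_with_bug(x: int) -> int:
    return x + x  # faulty version: doubles instead of squaring


voted = nvp_vote([square_by_multiplication, square_by_power, square_with_bug], 3)
```

The faulty version is outvoted two to one, so the fault is masked rather than detected and recovered from, which is the key contrast with the recovery-block approach.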

6.

Reliability growth modeling and optimal release policy under fuzzy environment of an N-version programming system incorporating the effect of fault removal efficiency

P. K. Kapur Anshu Gupta P.C. Jha 《International Journal of Automation and Computing》2007,4(4):369-379

Failure of a safety critical system can lead to big losses. Very high software reliability is required for automating the working of systems such as aircraft controller and nuclear reactor controller software systems. Fault-tolerant software is used to increase the overall reliability of software systems. Fault tolerance is achieved using fault-tolerant schemes such as fault recovery (recovery block scheme), fault masking (N-version programming (NVP)), or a combination of both (hybrid scheme). Such software incorporates the ability of the system to survive even a failure. Many researchers in the field of software engineering have done excellent work to study the reliability of fault-tolerant systems. Most of them consider the stable system reliability. Few attempts have been made in reliability modeling to study the reliability growth of an NVP system. Recently, a model was proposed to analyze the reliability growth of an NVP system incorporating the effect of fault removal efficiency. In this model, a proportion of the number of failures is assumed to be a measure of fault generation, while an appropriate measure of fault generation should be the proportion of faults removed. In this paper, we first propose a testing efficiency model incorporating the effect of imperfect fault debugging and error generation. Using this model, a software reliability growth model (SRGM) is developed to model the reliability growth of an NVP system. The proposed model is useful for practical applications and can provide measures of debugging effectiveness and of the additional workload or skilled professionals required. It is very important for a developer to determine the optimal release time of the software to improve its performance in terms of competition and cost. In this paper, we also formulate the optimal software release time problem for a 3VP system under a fuzzy environment and discuss the fuzzy optimization technique for solving the problem with a numerical illustration.

7.

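Fault diagnosis and dynamic optimal fault-tolerant control for systems with dual time delays (total citations: 2, self-citations: 0, citations by others: 2)

李娟 叶若红 《控制理论与应用》2008,25(6):1021-1026

For linear time-delay systems with both state and control delays, the problems of fault diagnosis and optimal fault-tolerant control are studied for sensor and actuator faults that cannot be measured directly. First, based on a linear transformation of the time-delay system, an optimal fault-tolerant control law under faults is designed using the Riccati matrix equation and the Sylvester equation, and the existence and uniqueness of this control law are proved. Then, by constructing a new reduced-order state observer for the fault-augmented system, real-time online fault diagnosis and system state observation are achieved, resolving the physical unrealizability of the optimal fault-tolerant control. Finally, a physically realizable dynamic optimal fault-tolerant control law is given using the fault diagnosis results. A simulation example verifies the feasibility and effectiveness of the fault diagnosis method and the dynamic optimal fault-tolerant control method.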

8.

Fault-tolerant model predictive control of a direct methanol-fuel cell system with actuator faults

《Control Engineering Practice》2017

This paper investigates fault-tolerant model predictive control (MPC) of a direct methanol fuel cell (DMFC) system with several faults in the methanol feeding pump. An active FTMPC strategy with a hierarchical structural design is developed. The focus here is on fault detection and isolation (FDI) and the implementation of fault-tolerant strategies within the control algorithm. To this end, a model-based FDI scheme with virtual sensors is first developed by means of the real-time diagnosis of fault occurrence during operation. Thereby, several faults in the methanol pump are characterized and the information integrated into the MPC algorithm in each fault case. Strategies are presented to reconfigure the active fault-tolerant MPC to keep the DMFC system stable in case of a feeding failure. Moreover, economic, stability and lifetime characteristics are also integrated into the active fault-tolerant MPC. The proposed FDI and FTMPC scheme is tested experimentally in a DMFC test rig with a 5-cell DMFC stack to demonstrate the effectiveness and robustness of the designed approach. Several fault scenarios with the FTMPC are shown. Particularly in the case of fuel cells, fault tolerance is necessary to meet the goals of long-lasting system stability and efficiency.

9.

An Overview of the Integrated Formalism RT-Z

Carsten Sühl 《Formal Aspects of Computing》2002,13(2):94-110

We present an integration of the formal specification languages Z and timed CSP, called RT-Z, incorporating their combined strengths in a coherent frame. To cope with complex systems, RT-Z is equipped with structuring constructs built on top of the integration, because both Z and timed CSP lack appropriate facilities. The formal semantics of RT-Z, based on the denotational semantics of Z and timed CSP, is a prerequisite for preciseness and mathematical rigour. RT-Z is intended to be used in the requirements definition and design phases of the system and software development process. The envisaged application area is the development of real-time embedded systems. Received September 2000 / Accepted in revised form June 2001

10.

Foundations of a new software engineering method for real-time systems

Isabelle Perseil Laurent Pautet 《Innovations in Systems and Software Engineering》2008,4(3):195-202

The design of a fault-tolerant distributed, real-time, embedded system with safety-critical concerns requires the use of formal languages. In this paper, we present the foundations of a new software engineering method for real-time systems that enables the integration of semiformal and formal notations. This new software engineering method is mostly based upon the “COntinuuM” co-modeling methodology that we have used to integrate architecture models of real-time systems (Perseil and Pautet in 12th International conference on engineering of complex computer systems, ICECCS, IEEE Computer Society, Auckland, pp 371–376, 2007) (so we call it “Method C”), and a model-driven development process (ISBN 978-0-387-39361-2 in: From model-driven design to resource management for distributed embedded systems, Springer, chap. MDE benefits for distributed, real time and embedded systems, 2006). The method will be tested in the design and development of integrated modular avionics (IMA) frameworks, with DO178, DO254, DO297, and MILS-CC requirements.

11.

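System performance modeling and performance analysis based on POOSL

刘付娥 葛宁 周祖成 《微计算机信息》2007,23(16):295-298

This paper introduces the basic semantics and syntax of the Parallel Object-Oriented Specification Language (POOSL) and its associated modeling tools. By modeling and analyzing a basic packet-switching system, it illustrates the basic approach to performance modeling and performance analysis of hardware/software systems with POOSL.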

12.

Copilot: monitoring embedded systems

Lee Pike Nis Wegmann Sebastian Niller Alwyn Goodloe 《Innovations in Systems and Software Engineering》2013,9(4):235-255

Runtime verification (RV) is a natural fit for ultra-critical systems that require correct software behavior. Due to the low reliability of commodity hardware and the adversity of operational environments, it is common in ultra-critical systems to replicate processing units (and their hosted software) and incorporate fault-tolerant algorithms to compare the outputs, even if the software is considered to be fault-free. In this paper, we investigate the use of software monitoring in distributed fault-tolerant systems and the implementation of fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler that generates monitors for distributed real-time systems, and we discuss two case studies in which Copilot-generated monitors were used to detect onboard software and hardware faults and monitor air-ground data link messaging protocols.

13.

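A preemption-threshold fault-tolerant scheduling algorithm that allows the fault-tolerant priority to be raised

丁万夫 郭锐锋 秦承刚 刘娴 郭凤钊 《软件学报》2011,22(12):2894-2904

Based on a software fault-tolerance model, a preemption-threshold fault-tolerant scheduling algorithm that allows the fault-tolerant priority to be raised is proposed (extended fault-tolerant fixed-priority with preemption threshold, FT-FPPT*). The algorithm can further improve a system's fault-tolerance capability in cases where neither the preemptive fault-tolerant scheduling algorithm (fault-tolerant fixed-priority preemptive, FT-FPP) nor the preemption-threshold fault-tolerant scheduling algorithm (fault-tolerant fixed-priority with preemption threshold, FT-FPPT) can do so. To find the best priority assignment for the tasks in a system, an optimal priority-configuration search algorithm (priority assignment search algorithm, PASA) is proposed, based on schedulability analysis using worst-case task response times. Analysis and experiments show that, compared with FT-FPPT, FT-FPPT* effectively improves the fault-tolerance capability of hard real-time systems.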
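For readers unfamiliar with schedulability analysis based on worst-case response time, the sketch below implements the classic fixed-point response-time iteration for fully preemptive fixed-priority tasks, R = C + Σ ⌈R / T_j⌉ · C_j over higher-priority tasks j. It deliberately omits fault recovery and preemption thresholds, so it is not FT-FPPT* or PASA; `response_times` and the sample task sets are hypothetical, and deadlines are assumed equal to periods.

```python
from typing import List, Optional, Tuple


def response_times(tasks: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Worst-case response time per task, or None if it misses its deadline.

    tasks: (wcet, period) pairs listed from highest to lowest priority.
    """
    results: List[Optional[int]] = []
    for index, (wcet, period) in enumerate(tasks):
        higher_priority = tasks[:index]
        response = wcet
        while True:
            # Interference: each higher-priority task releases
            # ceil(response / period_j) jobs in a window of length response.
            interference = sum(
                -(-response // hp_period) * hp_wcet  # integer ceiling division
                for hp_wcet, hp_period in higher_priority
            )
            updated = wcet + interference
            if updated > period:
                results.append(None)  # deadline (= period) is exceeded
                break
            if updated == response:
                results.append(response)  # fixed point reached
                break
            response = updated
    return results


schedulable = response_times([(1, 4), (2, 6), (3, 12)])
overloaded = response_times([(2, 4), (3, 5)])
```

For the first task set the iteration converges to response times of 1, 3, and 10, all within their periods; in the second set the lower-priority task's response time grows past its period of 5, so it is reported as unschedulable.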

14.

Experience with modularity in Consul

Shivakant Mishra Larry L. Peterson Richard D. Schlichting 《Software》1993,23(10):1059-1075

The use of modularity in the design and implementation of complex software simplifies the development process, as well as facilitating the construction of customized configurations. This paper describes our experience using modularity in Consul, a communication substrate used for constructing fault-tolerant distributed programs. First, Consul is presented as a case study of how modularity is feasible in both the design and the implementation of such systems. Secondly, general lessons about modularity in fault-tolerant systems based on our experience with Consul are given. Issues that are addressed include deciding how the system is divided into various modules, dealing with problems that result when protocols are combined, and ensuring that the underlying object infrastructure provides adequate support. The key observation is that the modularization process is most affected by dependencies between modules, both direct dependencies caused by one module explicitly using another's operation and indirect dependencies where one module is affected by another without direct interaction. Although our observations are based on designing and implementing Consul, the lessons are applicable to any fault-tolerant distributed system.

15.

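A fault-tolerant scheduling algorithm for heterogeneous distributed real-time simulation systems (total citations: 1, self-citations: 0, citations by others: 1)

刘云生 张童 张传富 查亚兵 《软件学报》2006,17(10):2040-2047

Heterogeneous distributed real-time simulation systems are a special class of real-time systems. Their fault-tolerance problem is studied using an improved SP (spare processor) fault-tolerance model, the checkpoint-based spare processor (CSP) model. First, two propositions are put forward based on the characteristics of simulation systems; these form the basis for the subsequent work. Next, the worst-case response time of simulation tasks is analyzed using Markov chains, and schedulability analysis rules for simulation tasks are proposed. Finally, based on the CSP fault-tolerance model and these schedulability analysis rules, the fault-tolerant scheduling algorithm CSP-RTFT for heterogeneous distributed real-time simulation systems is proposed. Simulation results show that, compared with the SP-model-based algorithm SP-RTFT, the algorithm achieves better stability and a higher task acceptance rate; its drawback is lower resource utilization than algorithms under the PB model.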

16.

Architectural support for designing fault-tolerant open distributed systems

Hariri S. Choudhary A. Sarikaya B. 《Computer》1992,25(6):50-62

An overview of the main techniques for designing fault-tolerant software and hardware systems is provided. The important features of the building blocks (computers, memories, buses, etc.) that can support an efficient implementation of fault-tolerant open distributed systems (FTODSs) are identified. Taking into account the features of these building blocks, an organization for FTODS is proposed. A distributed voting algorithm and a two-level hierarchy for permanent memory are key elements in this scheme. The algorithms needed for transferring files, for synchronizing the concurrent activities of the computing modules, and for recovery are ISO standard protocols. Low-level voting and recovery algorithms that can run as a layer of software above the operating system make the open distributed system an attractive environment for applying fault-tolerant techniques.

17.

Timed Automata Patterns

Dong Jin Song Hao Ping Qin Shengchao Sun Jun Yi Wang 《IEEE Transactions on Software Engineering》2008,34(6):844-859

Timed Automata have proven to be useful for specification and verification of real-time systems. System design using Timed Automata relies on explicit manipulation of clock variables. A number of automated analyzers for Timed Automata have been developed. However, Timed Automata lack composable patterns for high-level system design. Logic-based specification languages like Timed CSP and TCOZ are well suited for presenting compositional models of complex real-time systems. In this work, we define a set of composable Timed Automata patterns based on hierarchical constructs in timed enriched process algebras. The patterns facilitate hierarchical design of complex systems using Timed Automata. They also allow a systematic translation from Timed CSP/TCOZ models to Timed Automata so that analyzers for Timed Automata can be used to reason about TCOZ models. A prototype has been developed to support system design using Timed Automata patterns or, if given a TCOZ specification, to automate the translation from TCOZ to Timed Automata.

18.

Intelligent fault-tolerant control using adaptive and learning methods

《Control Engineering Practice》2002,10(8):801-817

Stimulated by the growing demand for improving system performance and reliability, fault-tolerant system design has been receiving significant attention. This paper proposes a new fault-tolerant control methodology using adaptive estimation and control approaches based on the learning capabilities of neural networks or fuzzy systems. On-line approximation-based stable adaptive neural/fuzzy control is studied for a class of input–output feedback linearizable time-varying nonlinear systems. This class of systems is large enough so that it is not only of theoretical interest but also of practical applicability. Moreover, the fault-tolerance ability of the adaptive controller has been further improved by exploiting information estimated from a fault-diagnosis unit designed by interfacing multiple models with an expert supervisory scheme. Simulation examples for a fault-tolerant jet engine control problem are given to demonstrate the effectiveness of the proposed scheme.

19.

The Real-Time Process Algebra (RTPA)

Yingxu Wang 《Annals of Software Engineering》2002,14(1-4):235-274

The real-time process algebra (RTPA) is a set of new mathematical notations for formally describing system architectures, and static and dynamic behaviors. It is recognized that the specification of software behaviors is a three-dimensional problem known as: (i) mathematical operations, (ii) event/process timing, and (iii) memory manipulations. Conventional formal methods in software engineering were designed to describe the 1-D (type (i)) or 2-D (types (i) and (iii)) static behaviors of software systems via logic, set and type theories. However, they are inadequate to address the 3-D problems in real-time systems. A new notation system that is capable of describing and specifying the 3-D real-time behaviors, the real-time process algebra (RTPA), is developed in this paper to meet the fundamental requirements in software engineering. RTPA is designed as a coherent software engineering notation system and a formal engineering method for addressing the 3-D problems in software system specification, refinement, and implementation, particularly for real-time and embedded systems. In this paper, the RTPA meta-processes, algebraic relations, system architectural notations, and a set of fundamental primary and abstract data types are described. On the basis of the RTPA notations, a system specification method and a refinement scheme of RTPA are developed. Then, a case study on a telephone switching system is provided, which demonstrates the expressive power of RTPA on formal specification of both software system architectures and behaviors. RTPA elicits and models 32 algebraic notations, which are the common core of existing formal methods and modern programming languages. The extremely small set of formal notations has been proven sufficient for modeling and specifying real-time systems, their architecture, and static/dynamic behaviors in real-world software engineering environments.

20.

An approach for machine-assisted verification of Timed CSP specifications

Thomas Göthel Sabine Glesner 《Innovations in Systems and Software Engineering》2010,6(3):181-193

The real-time process calculus Timed CSP is capable of expressing properties such as deadlock-freedom and real-time constraints. It is therefore well-suited to model and verify embedded software. However, proofs about Timed CSP specifications are not ensured to be correct since comprehensive machine-assistance for Timed CSP is not yet available. In this paper, we present our formalization of Timed CSP in the Isabelle/HOL theorem prover, which we have formulated as an operational coalgebraic semantics together with bisimulation equivalences and coalgebraic invariants. This allows for semi-automated and mechanically checked proofs about Timed CSP specifications. Mechanically checked proofs enhance confidence in verification because corner cases cannot be overlooked. We additionally apply our formalization to an abstract specification with real-time constraints. This is the basis for our current work, in which we verify a simple real-time operating system deployed on a satellite. As this operating system has to cope with arbitrarily many threads, we use verification techniques from the area of parameterized systems for which we outline their formalization.