Similar literature
 20 similar documents found
1.
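Research on a chip-level online autonomous fault-tolerance method for reconfigurable electronic systems   Total citations: 4 (self-citations: 2, citations by others: 2)
Traditional fault-tolerant designs for stuck-at faults in reconfigurable electronic system chips often rely on centralized control, which leads to long test times, low hardware resource utilization, and heavy dependence on an external controller. This paper therefore designs a reconfigurable cell array with distributed autonomous fault tolerance: each cell performs cyclic detection by comparing the output of its internal lookup table (LUT) with a reference value, and repairs a faulty LUT using redundant storage cells. A 4-bit parallel multiplier is used for simulation-based verification. The experimental results show that, compared with existing designs, the autonomous fault-tolerant design method for the new reconfigurable array has lower hardware overhead, shorter repair time, and stronger fault tolerance, and that its design complexity is independent of array size.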
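The detect-and-repair loop in this abstract (compare each lookup-table output with a reference value, then fall back on redundant storage) can be illustrated with a small sketch. This is not the paper's circuit: the `Cell` class, its two-bank layout, and the method names are hypothetical, chosen only to show the idea in software.

```python
class Cell:
    """Hypothetical cell: one LUT backed by a working bank and one redundant bank."""

    def __init__(self, truth_table):
        self.reference = list(truth_table)                    # expected LUT outputs
        self.banks = [list(truth_table), list(truth_table)]   # working + redundant storage
        self.active = 0                                       # index of the bank in use

    def read(self, addr):
        """Return the LUT output stored at addr in the active bank."""
        return self.banks[self.active][addr]

    def self_test(self):
        """One detection cycle; returns True if the cell is healthy afterwards."""
        while True:
            if all(self.read(a) == ref for a, ref in enumerate(self.reference)):
                return True
            if self.active + 1 >= len(self.banks):
                return False          # no redundant storage left
            self.active += 1          # switch to the redundant bank and re-check


cell = Cell([0, 0, 0, 1])      # 2-input AND gate as a 4-entry LUT
cell.banks[0][3] = 0           # inject a stuck-at-0 fault in the working bank
healthy = cell.self_test()     # detects the mismatch and switches banks
```

Because each cell checks and repairs itself, no central controller is involved, which is the property the abstract attributes to the distributed design.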

2.
The rapid increase in the use of microprocessor-based systems in critical areas, where failures imply risks to human lives, to the environment or to expensive equipment, has significantly increased the need for dependable systems that are able to detect, tolerate and eventually correct faults. The verification and validation of such systems is frequently performed via fault injection, using various forms and techniques. However, as electronic devices get smaller and more complex, controllability and observability issues, and sometimes real-time constraints, make it harder to apply most conventional fault injection techniques. This paper proposes a fault injection environment and a scalable methodology to assist the execution of real-time fault injection campaigns, providing enhanced performance and capabilities. Our proposed solutions are based on the use of common and customized on-chip debug (OCD) mechanisms, present in many modern electronic devices, with the main objective of enabling the insertion of faults in microprocessor memory elements with minimum delay and intrusiveness. Different configurations were implemented, ranging from basic Commercial Off-The-Shelf (COTS) microprocessors equipped with real-time OCD infrastructures to improved solutions based on modified interfaces and dedicated OCD circuitry that enhance fault injection capabilities and performance. All methodologies and configurations were evaluated and compared with respect to performance gain and silicon overhead.

3.
Real-time control applications involve the interaction of multiple components. As systems become more complex their reliability tends to decrease; hence, fault tolerance must be incorporated to keep reliability within specified levels. This paper proposes a novel reconfiguration mechanism inspired by mechanisms that take place during the embryonic development of living beings. An example illustrates that the rapid low-level fault recovery of the embryonic system makes it a promising approach for real-time control applications.

4.
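A graph-oriented method for dynamic configuration and fault tolerance of distributed software   Total citations: 1 (self-citations: 0, citations by others: 1)
Song Yi, Liu Yunchao. 《计算机应用》 (Computer Applications), 2003, 23(12): 37-41
This paper proposes a new method that supports fault tolerance in component-based distributed software through dynamic configuration. The method adopts the graph-oriented programming (GOP) model, describing the architecture of the entire distributed software as a logical graph; dynamic configuration of the system is carried out by executing a set of operations predefined on the graph. Applying such dynamic configuration when a fault or exception is detected supports the system's fault tolerance. The paper describes the method's basic model, system structure, and a CORBA-based prototype implementation.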

5.
A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remarkable system of interacting cells and organs that protect the body from invasion and maintain reliable operation even in the presence of invading bacteria or viruses. This paper seeks to address the field of electronic hardware fault tolerance from an immunological perspective with the aim of showing how novel methods based upon the operation of the immune system can both complement and create new approaches to the development of fault detection mechanisms for reliable hardware systems. In particular, it is shown that by use of partial matching, as prevalent in biological systems, high fault coverage can be achieved with the added advantage of reducing memory requirements. The development of a generic finite-state-machine immunization procedure is discussed that allows any system that can be represented in such a manner to be "immunized" against the occurrence of faulty operation. This is demonstrated by the creation of an immunized decade counter that can detect the presence of faults in real time.
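One way to picture partial matching on a finite state machine is a negative-selection scheme: valid state transitions form the "self" set, and detectors are bit strings that partially match no self string. The sketch below makes several assumptions not stated in the abstract: transitions are encoded as 8-bit strings, matching uses an r-contiguous-bits rule, and the threshold `R = 6` is arbitrary. All function names are hypothetical.

```python
from itertools import product

BITS = 4  # bits used to encode one counter state


def encode(cur, nxt):
    """Encode a state transition as an 8-character bit string."""
    return f"{cur:04b}{nxt:04b}"


# "Self": the valid transitions of a decade counter (0 -> 1 -> ... -> 9 -> 0).
SELF = {encode(s, (s + 1) % 10) for s in range(10)}


def matches(detector, string, r):
    """Partial match: True if both strings agree in at least r contiguous positions."""
    run = 0
    for d, s in zip(detector, string):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, r):
    """Negative selection: keep every candidate that partially matches no self string."""
    candidates = ("".join(bits) for bits in product("01", repeat=2 * BITS))
    return [c for c in candidates if not any(matches(c, s, r) for s in self_set)]


def is_faulty(transition, detectors, r):
    """Flag a transition if any detector partially matches it."""
    return any(matches(d, transition, r) for d in detectors)


R = 6  # assumed matching threshold
DETECTORS = generate_detectors(SELF, R)
```

By construction no valid transition is ever flagged. A single detector can cover several invalid transitions, which is where the memory saving of partial matching comes from; the price is that some invalid transitions may go undetected, depending on `R`.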

6.
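A new three-dimensional reconfigurable array structure is designed, and an online distributed fault-tolerance method for its interconnect resources is studied. The system is composed of identical functional cells and switch blocks arranged in a three-dimensional structure; faults on interconnect lines are located by applying test vectors online, and faulty lines are self-repaired layer by layer. A 4-bit adder/subtractor circuit is used as a design example to verify the functionality and fault tolerance of the reconfigurable array. The experimental results show that the method achieves fault tolerance effectively, with low time overhead, strong fault tolerance, and high resource utilization.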

7.
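Dynamic configuration and fault tolerance of component-based distributed software   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper proposes a new structured method that supports fault tolerance in component-based distributed software through dynamic configuration. Using a graph-oriented programming model, the software architecture of component-based distributed software is represented as a logical graph, which can be refined into an explicit object and distributed across the network. Dynamic configuration of the software is achieved by executing a series of operations defined on the graph, and fault tolerance is supported by dynamically reconfiguring the software when an error occurs. The paper describes the method's basic model, system structure, and a prototype implementation on CORBA.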

8.
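Design and reliability analysis of a self-repairing TMR system based on evolvable hardware   Total citations: 2 (self-citations: 0, citations by others: 2)
Combining evolvable hardware with the traditional triple modular redundancy (TMR) fault-tolerant design approach, this paper proposes a design method for a self-repairing TMR system with online self-repair capability. The system has multiple levels of fault tolerance and repair: overall it uses TMR, which automatically detects a faulty module; every module uses component backup, so a module fault can be repaired quickly by switching components; and every component within a module can in turn be repaired through evolution. The system therefore has stronger fault tolerance and higher reliability. It is verified on a 2-bit multiplier with on-chip triple modular redundancy. Finally, a reliability model of the system is given, a formula for computing its reliability is derived, and the system's reliability is analyzed theoretically. The results show that the system can effectively repair stuck-at faults and has a longer service life and higher reliability.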
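As background for the TMR layer, the sketch below shows plain majority voting over three module outputs and the textbook reliability of a TMR system with a perfect voter, 3R² − 2R³. It is a baseline only: the reliability formula derived in the paper for the self-repairing system is not reproduced here, and the function names and the injected fault are illustrative assumptions.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 vote over three module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_module(outputs):
    """Index of the module whose output differs from the vote, or None.

    Assumes at most one of the three modules is faulty.
    """
    voted = majority(*outputs)
    for i, out in enumerate(outputs):
        if out != voted:
            return i
    return None


def tmr_reliability(r):
    """Textbook TMR reliability with a perfect voter: at least 2 of 3 modules work."""
    return 3 * r**2 - 2 * r**3


# Three copies of a 2-bit multiplier computing 3 * 2; the third has one flipped output bit.
outputs = [3 * 2, 3 * 2, (3 * 2) ^ 0b0001]
voted = majority(*outputs)             # the vote masks the single faulty module
faulty = disagreeing_module(outputs)   # index of the module to repair
```

In a self-repairing design, the index returned by `disagreeing_module` would be the trigger for the repair steps the abstract describes, such as switching to a backup component.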

9.
Next generation distributed real-time systems will be complex high-performance environments containing applications with a flexible structure, integrating a large number of heterogeneous nodes characterized by multiple, decoupled software units scattered all over the distributed environment; they are expected to offer data-intensive capabilities by merging the processing power of large numbers of nodes. These systems will exhibit increased dynamic behavior, undergoing frequent reconfigurations or state transitions that result, among other causes, from the changing nature of the processed data. Handling the dynamics of these systems in real time is a complex problem that requires imposing some bounds on the structure of the system in order to achieve timely response not only during normal operation but also in the event of reconfigurations. In this paper, we present an approach to achieve real-time reconfiguration in distributed real-time service-based systems modeled as graphs. A reconfiguration requires searching for a new schedulable/valid solution or state in a complete system graph that contains all tentative solutions; each of these solutions has to undergo a schedulability analysis to determine whether it is valid; if the system graph is too complex, the overall time required for the schedulability check can be exponential with respect to the number of services and service implementations; this may lead to an unbounded reconfiguration time. We present an approach to reduce the complexity of the system graphs so that a summarizing graph that contains valid solutions is analyzed instead of the complete system graph. We have implemented this mechanism inside the iLAND service reconfiguration and composition components to validate the proposed concepts and ideas; the reduction of the solution space with the presented approach is very high, which dramatically decreases the computation time of the reconfiguration process.

10.
Partial reconfiguration capabilities must be exploited to obtain the maximum benefits from dynamically reconfigurable FPGAs. Partial reconfiguration process management still faces a set of open problems that have thus far made it impossible to take full advantage of partial and dynamic reconfiguration. The work presented in this article proposes a novel architecture, development workflow, and modelling approach for dynamically reconfigurable systems management using an object model that offers a global solution. This solution is built on a system-level middleware that provides the infrastructure and tools for communication between different components in heterogeneous embedded systems. Several experiments were performed to test and evaluate each part of our proposed solution, and the obtained results are presented. These results demonstrate the excellent performance of our proposed solution.

11.
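Wang Yizhuo, Chen Xu, Ji Weixing, Su Yan, Wang Xiaojun, Shi Feng. 《软件学报》 (Journal of Software), 2016, 27(7): 1789-1804
Task-parallel programming models have become the mainstream of parallel programming; they improve the performance of parallel computers by exploiting task parallelism. This paper proposes a task-parallel programming model with fault-tolerance support, which integrates fault-tolerance techniques into the task-parallel programming model to improve system reliability while preserving performance. The model takes the task as the basic unit of scheduling, execution, error detection, and recovery, and implements fault-tolerance support at the application level. It uses a Buffer-Commit computation model to detect and recover from transient errors, application-level diskless checkpointing to recover from permanent errors of the node-failure type, and a fault-tolerant work-stealing task scheduling strategy to achieve dynamic load balancing. Experimental results show that the model provides fault tolerance against hardware errors at low performance overhead.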

12.
Circuit complexities reduce overall reliability and mean-time-between-failure rates of today's very large processing arrays. Our integrated, three-level hierarchy of reconfiguration methods provides reasonable levels of fault tolerance for such systems. Operating in a completely distributed fashion, the hierarchy does not require that any components be fault free. It significantly improves array reliability by using a combination of transient fault rollback techniques and local and global reconfiguration algorithms.

13.
In highly automated aerospace and industrial systems where maintenance and repair cannot be carried out immediately, it is crucial to design control systems capable of ensuring desired performance when taking into account the occurrence of faults/failures on a plant/process; such a control technique is referred to as fault tolerant control (FTC). A control system possessing such fault tolerance capability is referred to as a fault tolerant control system (FTCS). The objective of FTC is to maintain system stability and keep the current performance of the system close to the desired performance in the presence of system component and/or instrument faults; in certain circumstances a reduced performance may be acceptable. Various control design methods have been developed in the literature with the aim of modifying or accommodating baseline controllers which were originally designed for systems operating under fault-free conditions. The main objective of this article is to develop a novel FTCS design method, which incorporates both reliability and dynamic performance of the faulty system in the design of an FTCS. Once a fault has been detected and isolated, the reconfiguration strategy proposed in this article will find possible structures of the faulty system that best preserve pre-specified performances based on on-line calculated system reliability and associated costs. The new reconfigured controller gains will also be synthesised and finally the optimal structure that has the ‘best’ control performance with the highest reliability will be chosen for control reconfiguration. The effectiveness of this work is illustrated by a heating system benchmark used in a European project entitled intelligent Fault Tolerant Control in Integrated Systems (IFATIS EU-IST-2001-32122).

14.
Dynamic software product lines (DSPLs) propose elaborated design and implementation principles for engineering highly configurable runtime-adaptive systems in a sustainable and feature-oriented way. For this, DSPLs add to classical software product lines (SPL) the notions of (1) staged (pre-)configurations with dedicated binding times for each individual feature, and (2) continuous runtime reconfigurations of dynamic features throughout the entire product life cycle. Especially in the context of safety- and mission-critical systems, the design of reliable DSPLs requires capabilities for accurately specifying and validating arbitrary complex constraints among configuration parameters and/or respective reconfiguration options. Compared to classical SPL domain analysis which is usually based on Boolean constraint solving, DSPL validation, therefore, further requires capabilities for checking temporal properties of reconfiguration processes. In this article, we present a comprehensive approach for modeling and automatically verifying essential validity properties of staged reconfiguration processes with complex binding time constraints during DSPL domain engineering. The novel modeling concepts introduced are motivated by (re-)configuration constraints apparent in a real-world industrial case study from the automation engineering domain, which are not properly expressible and analyzable using state-of-the-art SPL domain modeling approaches. We present a prototypical tool implementation based on the model checker SPIN and present evaluation results obtained from our industrial case study, demonstrating the applicability of the approach.

15.
A. Egan, D. Kutz, D. Mikulin, R. Melhem, D. Moss. 《Software》, 1999, 29(4): 379-395
Even though real‐time systems have the stringent constraint of completing tasks before their deadlines, many existing real‐time operating systems do not implement fault tolerance capabilities. In this paper we summarize a fault‐tolerant real‐time scheduling policy for dynamic tasks with ready times and deadlines. Our focus in this paper is the implementation, which includes fault‐tolerant scheduling, re‐scheduling, and recovery mechanisms in the FT‐RT‐Mach operating system, a fault‐tolerant version of RT‐Mach. A real‐time train control application is then implemented using the FT‐RT‐Mach operating system. Copyright © 1999 John Wiley & Sons, Ltd.

16.
As transistor integration in today's embedded systems scales up and chip feature sizes shrink, on-chip communication has become a challenging issue in VLSI design. Networks on chip have emerged as a promising technology for tackling on-chip communication constraints. Reliability has likewise become a salient problem: because on-chip elements cannot be accessed once they fail, some level of embedded fault tolerance is required. In this paper, an innovative technique is presented that provides fault tolerance in on-chip networks against single and multiple permanent switch failures. The experimental results obtained by system simulation in a SystemC TLM environment are validated against a mathematical analysis of system reliability that we extend in this paper, and they demonstrate the substantial reliability enhancement of this approach. In addition to the system improvement, silicon area overhead is calculated using VHDL low-level simulation and Orion synthesis.

17.
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique which is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We exemplify this with the RTHT target tracking benchmark and the ABF beamforming benchmark.

18.
This work presents a modular approach for the dynamic modeling and efficient simulation of complex robot systems composed of multiple robots constrained by multiple concurrent contacts. The modular nature of the algorithm enables existing open-chain models for individual robots and other mechanisms to be incorporated without significant reprogramming, while a general contact model allows both holonomic and nonholonomic constraints in the system. An example is provided to illustrate the algorithm's modularity and demonstrate its application. In addition to the development of the dynamic equations, this paper will discuss the implementation of the simulation algorithm in detail, including issues of computational complexity.

19.
Diagnosis of continuous valued systems in transient operating regions   Total citations: 1 (self-citations: 0, citations by others: 1)
The complexity of present day embedded systems (continuous processes controlled by digital processors), and the increased demands on their reliability motivate the need for monitoring and fault isolation capabilities in the embedded processors. This paper develops monitoring, prediction, and fault isolation methods for abrupt faults in complex dynamic systems. The transient behavior in response to these faults is analyzed in a qualitative framework using parsimonious topological system models. Predicted transient effects of hypothesized faults are captured in the form of signatures that specify future faulty behavior as higher order time-derivatives. The dynamic effects of faults are analyzed by a progressive monitoring scheme until transient analysis mechanisms have to be suspended in favor of steady state analysis. This methodology has been successfully applied to monitoring of the secondary sodium cooling loop of a fast breeder reactor.

20.
Distributed computer systems based on a multimicrocomputer structure offer the best preconditions to improve the reliability of a system and to realize fault tolerance. Basic fault-tolerant system (BFS), described in this paper, is the implementation of a fault-tolerant multimicrocomputer system. The architecture of BFS is a partially meshed ring structure, based on previous work. This kind of ring structure is appropriate for system monitoring and reconfiguration mechanisms.
