Similar literature
1.
This study addresses the use of fault injection for explicitly removing design/implementation faults in complex fault-tolerance algorithms and mechanisms (FTAM), viz., fault-tolerance deficiency faults. A formalism is introduced to represent the FTAM by a set of assertions. This formalism enables an execution tree to be generated, where each path from the root to a leaf of the tree is a well-defined formula. The set of well-defined formulas constitutes a useful framework that fully characterizes the test sequence. The input patterns of the test sequence (fault and activation domains) are then determined to cover specific structural criteria over the execution tree (activation of proper sets of paths). This provides a framework for generating a functional deterministic test for programs that implement complex FTAM. This methodology has been used to extend a debugging tool aimed at testing fault tolerance protocols developed by BULL France. It has been applied successfully to the injection of faults in the inter-replica protocol that supports the application-level fault-tolerance features of the architecture of the ESPRIT-funded Delta-4 project. The results of these experiments are analyzed in detail. In particular, even though the target protocol had been independently verified formally, the application of the proposed testing strategy revealed two fault-tolerance deficiency faults.

2.
Processor cores embedded in systems-on-a-chip (SoCs) are often deployed in critical computations, and when affected by faults they may produce dramatic effects. When hardware hardening is not cost-effective, software implemented hardware fault tolerance (SIHFT) can be a solution to increase SoCs’ dependability, but it increases the time for running the hardened application, as well as the memory occupation. In this paper we propose a method that eliminates the memory overhead, by exploiting a new approach to instruction hardening and control flow checking. The proposed method hardens an application online during its execution, without the need for introducing any change in its source code, and is non-intrusive, since it does not require any modification in the main processor’s architecture. The method has been tested with two widely used architectures: a microcontroller and a RISC processor, and proven to be suitable for hardening SoCs against transient faults and also for detecting permanent faults.

3.
4.
High fault tolerance for transient faults and low-power consumption are key objectives in the design of critical embedded systems. Systems like smart cards, PDAs, wearable computers, pacemakers, defibrillators, and other electronic gadgets must not only be designed for fault tolerance but also for ultra-low-power consumption due to limited battery life. In this paper, a highly accurate method of estimating fault tolerance in terms of mean time to failure (MTTF) is presented. The estimation is based on circuit-level simulations (HSPICE) and uses a double exponential current-source fault model. Using counters, it is shown that the transient fault tolerance and power dissipation of low-power circuits are at odds and allow for a power/fault-tolerance tradeoff. Architecture and circuit level fault tolerance and low-power techniques are used to demonstrate and quantify this tradeoff. Estimates show that incorporation of these techniques results either in a design with an MTTF of 36 years and power consumption of 102 μW or a design with an MTTF of 12 years and power consumption of 20 μW. Depending on the criticality of the system and the power budget, certain techniques might be preferred over others, resulting in either a more fault tolerant or a lower power design, at the sacrifice of the alternative objective.
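
The double exponential current-source fault model named in this abstract is commonly written as I(t) = I0 * (exp(-t/tau_fall) - exp(-t/tau_rise)), with tau_fall > tau_rise. A minimal Python sketch of that pulse shape follows; the function names, parameter names, and numeric values are illustrative assumptions and are not taken from the paper.

```python
import math

def double_exp_current(t, i0, tau_fall, tau_rise):
    """Injected current at time t (seconds) for a double exponential pulse.

    i0, tau_fall and tau_rise are hypothetical parameters; the pulse is
    zero before the particle strike at t = 0.
    """
    if t < 0:
        return 0.0
    return i0 * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

def peak_time(tau_fall, tau_rise):
    """Time at which the pulse peaks, obtained by setting dI/dt = 0."""
    return (tau_fall * tau_rise / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)

if __name__ == "__main__":
    # Illustrative values only: 1 mA scale, 200 ps fall, 50 ps rise.
    i0, tau_fall, tau_rise = 1e-3, 200e-12, 50e-12
    tp = peak_time(tau_fall, tau_rise)
    print(f"peak at {tp:.3e} s, current {double_exp_current(tp, i0, tau_fall, tau_rise):.3e} A")
```

In a circuit-level simulation, a source of this shape would typically be attached to the struck node; the sketch only evaluates the waveform itself.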

5.
This paper presents an approach for increasing the lifetime of systems implemented on SRAM-based FPGAs, by introducing fault tolerance properties enabling the system to autonomously manage the occurrence of both transient and permanent faults. On the basis of the foreseen mission time and application environment, the designer is supported in the implementation of a system able to reconfigure itself, either by reloading the correct configuration in case of transient faults, or by relocating part of the functionality in presence of permanent faults. The result is a system implementation offering good performance and correct functionality even when faults occur. The proposed approach is evaluated in a case study to highlight the overall characteristics of the final implementation.

6.
Due to the wide range of critical applications and resource constraints, sensor nodes can give unexpected responses, which lead to various kinds of faults in sensor nodes and to failures in wireless sensor networks. Many research studies focus only on fault diagnosis, and comparatively few have addressed fault diagnosis together with fault tolerance in sensor networks. This paper reports a complete study of both aspects and presents a fault tolerance approach using regressional learning with fault diagnosis in wireless sensor networks. The proposed method diagnoses the different types of faulty nodes, such as hard permanent, soft permanent, intermittent, and transient faults, with better detection accuracy. The proposed method then follows a fault tolerance phase in which the values of a faulty sensor node are predicted from the data sensed by its fault-free neighbors. The experimental evaluation of the fault tolerance module shows promising results, with R² of more than 0.99. For periodic faults such as intermittent faults, the proposed method also predicts the possible occurrence time and duration of the fault in the faulty node, so that fault tolerance can be applied during that particular time period for better performance of the network.

7.
Autonomic workflow execution in the grid   Total citations: 1, self-citations: 0, citations by others: 1
Mobile agents are being leveraged in both workflow management and grid computing contexts. The convergence of these two research streams supports execution in the grid where tasks are allowed to vary in their level of interdependence. The result is an expansion of grid applications beyond those which consist of homogeneous computations decomposed and performed in parallel to those which support the parallel execution of sequences of interdependent tasks that constitute a workflow. However, grid computation of critical workflows requires that the grid platform exhibits the autonomic characteristic of self-healing in order to ensure workflow execution. To address this issue, in this work, we first develop a model for dynamic fault tolerance technique selection, which can be embedded generically in a mobile agent workflow management system. We then augment an existing architecture for flexible fault tolerance in the grid with our model, thus allowing the system to optimally configure its fault tolerance mechanisms through awareness of the computational environment. The result is a foundation for autonomic workflow management in the grid.

8.
Cost minimization and execution-time reduction have become the most important issues in today’s real-time embedded systems. Meanwhile, for DSP (Digital Signal Processing) applications running on embedded systems, the loops inside them are the most critical part for performance optimization. To optimize the loop iteration patterns, we need to schedule the loop execution order. Due to the uncertainties in the execution time of tasks, we model the varied execution times of tasks as random variables and propose a novel data graph model, called HPDFG (Heterogeneous Probabilistic Data-Flow Graph), to model DSP applications on embedded systems. A novel algorithm, LSHAPE, is proposed to minimize the cost and satisfy the timing constraints. First, we use data mining methods to estimate the probability distribution of the execution time variables. Second, we rotate the loops in the application to explore different possible execution patterns. Finally, we combine list scheduling and dynamic programming to generate a near-optimal task allocation and a core-mode assignment. Experimental results demonstrate the effectiveness of our algorithm. Our approach can handle loops efficiently.

9.
Real-time computer systems are often used in harsh environments, such as aerospace, and in industry. Such systems are subject to many transient faults while in operation. Checkpointing enables a reduction in the recovery time from a transient fault by saving intermediate states of a task in a reliable storage facility, and then, on detection of a fault, restoring from a previously stored state. The interval between checkpoints affects the execution time of the task. Whereas inserting more checkpoints and reducing the interval between them reduces the reprocessing time after faults, checkpoints have associated execution costs, and inserting extra checkpoints increases the overall task execution time. Thus, a trade-off between the reprocessing time and the checkpointing overhead leads to an optimal checkpoint placement strategy that optimizes certain performance measures. Real-time control systems are characterized by a timely, and correct, execution of iterative tasks within deadlines. The reliability is the probability that a system functions according to its specification over a period of time. This paper reports on the reliability of a checkpointed real-time control system, where any errors are detected at the checkpointing time. The reliability is used as a performance measure to find the optimal checkpointing strategy. For a single-task control system, the reliability equation over a mission time is derived using the Markov model. Detecting errors at the checkpointing time makes the reliability jitter with the number of checkpoints. This forces the need to apply other search algorithms to find the optimal number of checkpoints. By considering the properties of the reliability jittering, a simple algorithm is provided to find the optimal checkpoints effectively. Finally, the reliability model is extended to include multiple tasks by a task allocation algorithm.
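
The trade-off between reprocessing time and checkpointing overhead described in this abstract can be illustrated with a deliberately simplified model. The sketch below is not the paper's reliability model: it assumes Poisson-distributed transient faults, equal-length segments, errors detected only at the checkpoint that ends a segment, and it minimizes expected completion time rather than reliability. All names and numbers are hypothetical.

```python
import math

def expected_time(work, n, cost, rate):
    """Expected completion time when `work` is split into n equal segments.

    Each segment ends with a checkpoint costing `cost`. A segment hit by a
    fault (Poisson rate `rate`) is detected only at its checkpoint and is
    then re-executed from the previous checkpoint.
    """
    segment = work / n + cost               # time for one attempt at a segment
    p_ok = math.exp(-rate * segment)        # probability the attempt is fault free
    return n * segment / p_ok               # geometric number of attempts per segment

def best_checkpoint_count(work, cost, rate, max_n=1000):
    """Brute-force search for the n that minimizes expected completion time."""
    return min(range(1, max_n + 1),
               key=lambda n: expected_time(work, n, cost, rate))

if __name__ == "__main__":
    work, cost, rate = 100.0, 1.0, 0.01     # illustrative values only
    n = best_checkpoint_count(work, cost, rate)
    print(n, expected_time(work, n, cost, rate))
```

With too few checkpoints the re-execution term dominates, and with too many the checkpoint cost dominates, so the minimum lies in between; a brute-force search is used here because the abstract notes that the objective may not vary smoothly with the number of checkpoints.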

10.
唐奇明  许勇 《现代电子技术》2012,35(15):138-141
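With the continuing development and application of Internet of Vehicles technology, the growing adoption of GPRS, and the rise of embedded systems, vehicle-network communication terminals are becoming increasingly intelligent. Using the Android operating system and the MD231 GPRS module, with the S3C6410 embedded processor at its core, a communication terminal is designed for remote GPRS transmission of vehicle fault parameters, supporting both data processing and remote transmission. The terminal enables real-time monitoring of vehicle status; when a fault occurs, repairs can be carried out accurately on the basis of the fault data, reducing vehicle breakdown time.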

11.
A software methodology for detecting hardware faults in VLIW data paths   Total citations: 1, self-citations: 0, citations by others: 1
The proposed methodology aims to achieve processor data paths for VLIW architectures able to autonomously detect transient and permanent hardware faults while executing their applications. The approach, carried out on the compiled application software, provides the introduction of additional instructions for controlling the correctness of the computation with respect to failures in one of the data path functional units. The advantage of a software approach to hardware fault detection is interesting because it allows one to apply it only to the critical applications executed on the VLIW architecture, thus not causing a delay in the execution of noncritical tasks. Furthermore, by exploiting the intrinsic redundancy of this class of architectures no hardware modification is required on the data path so that no processor customization is necessary.

12.
13.
Fault injection spot-checks computer system dependability   Total citations: 2, self-citations: 0, citations by others: 2
Computer-based systems are expected to be more and more dependable. For that, they have to operate correctly even in the presence of faults, and this fault tolerance of theirs must be thoroughly tested by the injection of faults both real and artificial. Users should start to request reports from manufacturers on the outcomes of such experiments, and on the mechanisms built into systems to handle faults. To inject artificial physical faults, fault injection offers a reasonably mature option today, with SWIFI (software-implemented fault injection) tools being preferred for most applications because of their flexibility and low cost. To inject software bugs, although some promising ideas are being researched, no established technique yet exists. In any case, establishing computer system dependability benchmarks would make tests much easier and enable comparison of results across different machines.

14.
Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable, voting on redundant processors executing multiple copies of the same task, is described. The processors which make up such voted systems are subjected not only to independently occurring permanent and transient failures, but also to correlated transients brought about by electromagnetic interference from the operating environment. To counteract these transients, checkpointing and time redundancy are required, in addition to processor redundancy. This work analyzes the use of time and device redundancy in systems subject to correlated failure. The tradeoffs in checkpoint placement in such a system are found to be considerably different from those for non-redundant systems without real-time constraints. The authors compare fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.
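
Voting on redundant processors, as described in this abstract, can be sketched as a strict-majority voter. The Python below is a generic illustration and not the authors' design; the function name and the sample outputs are hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, else None.

    `outputs` is a list with one (hashable) result per redundant processor.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

if __name__ == "__main__":
    # One replica hit by an independent transient fault: the fault is masked.
    print(majority_vote([42, 42, 7]))
    # A correlated transient corrupts two replicas identically: the wrong
    # value wins the vote, so processor redundancy alone is not enough.
    print(majority_vote([7, 7, 42]))
    # Three-way disagreement: no majority, so a retry from a checkpoint
    # (time redundancy) would be needed.
    print(majority_vote([1, 2, 3]))
```

The second and third cases show why the abstract pairs processor redundancy with checkpointing and time redundancy when transients are correlated.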

15.
周龙  何怡刚 《现代电子技术》2006,29(10):121-123
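The procedure of the multi-frequency transfer function method for identifying component parameters in analog circuits is given, and the solvability of the fault diagnosis equations is analyzed. On this basis, solving the diagnosis equations is converted into a nonlinear function optimization problem, which is then solved with an improved genetic algorithm. A worked example shows that the method simplifies the solution of the fault diagnosis equations and speeds up the location of faulty components in circuits with tolerances, and that it has some practical value.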

16.
This paper presents a novel asynchronous design approach for multiple input multiple output (MIMO) satellite communication (SatCom) systems. One of the main challenges for MIMO SatCom systems is that they are prone to transient faults that are typically attributable to radiation hazards. Hence, instead of using conventional synchronous circuits, we conceive our design using asynchronous circuits, since they inherently have a high tolerance to transient faults. Additionally, we adopt an accelerated dual paths (ADP) design in our system. By carefully arranging the data flow between the two paths, the ADP design approach can help to further accelerate the asynchronous system and increase the reliability of the system by circumventing transient-fault-induced delay, as well as tolerating latch-ups and other permanent faults. The numerical results show that this design approach provides promising results. For example, the proposed design can decrease the delay overhead of the entire system from 43.5% to 19.8% at the fault rate of 400/clock cycle.

17.
Radiation-induced faults in digital systems have started gathering major attention in recent years due to increasing reliability concerns for future technologies. For future technologies, multiple transient faults (MTF) originating from a single radiation hit are expected to occur more frequently. Further, due to continuous massive scaling in device geometry, a particle with moderate linear energy transfer (LET) values is expected to affect more than one module/device during a strike. Additionally, the incessant escalation in operating speed with the evolution of technology has increased the likelihood of multi-cycle transient (MCT) faults in digital circuits. This calls for novel solutions for concurrently tackling multi-cycle transient and multiple transient fault resiliency at a higher design abstraction level, such as the behavioral level. This paper proposes a novel approach for generating simultaneous multi-cycle transient and multiple transient fault resilient designs during high level synthesis (HLS) of application specific datapath processors using the framework of dual modular redundancy. Results of the proposed approach on benchmarks indicated generation of low cost MCT–MTF resilient designs during HLS within acceptable runtime.

18.
In this paper, various circuit and system level design challenges for nanometer-scale devices and single-electron transistors are discussed, with an emphasis on functional robustness and fault tolerance. A set of general guidelines is identified for the design of very high-density digital systems using inherently unreliable and error-prone devices. The fundamental principles of a highly regular, redundant, and scalable design approach based on fixed-weight neural networks and multiple-valued logic are presented. It is demonstrated that the proposed design technique offers significantly improved immunity to permanent and transient faults occurring at the transistor level, and that it results in graceful degradation of circuit performance in response to device failures.

19.
We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing constraints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

20.
This paper discusses fault tolerance in discrete-time dynamic systems, such as finite-state controllers or computer simulations, with focus on the use of coding techniques to efficiently provide fault tolerance to linear finite-state machines (LFSMs). Unlike traditional fault tolerance schemes, which rely heavily (particularly for dynamic systems operating over extended time horizons) on the assumption that the error-correcting mechanism is fault free, we are interested in the case when all components of the implementation are fault prone. The paper starts with a paradigmatic fault tolerance scheme that systematically adds redundancy into a discrete-time dynamic system in a way that achieves tolerance to transient faults in both the state transition and the error-correcting mechanisms. By combining this methodology with low-complexity error-correcting coding, we then obtain an efficient way of providing fault tolerance to k identical unreliable LFSMs that operate in parallel on distinct input sequences. The overall construction requires only a constant amount of redundant hardware per machine (but sufficiently large k) to achieve an arbitrarily small probability of overall failure for any prespecified (finite) time interval, leading in this way to a lower bound on the computational capacity of unreliable LFSMs.
