Similar documents
1.
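This paper describes a method that, with minimal reliance on hardware redundancy, uses information redundancy, software redundancy, and time redundancy, aided by artificial-intelligence tools, to dynamically and automatically diagnose and tolerate run-time faults on a microcomputer expansion interface.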

2.
In this paper we propose a hybrid solution to ensure the correctness of results when deploying several applications with different safety requirements on a single multi-core-based system. The proposed solution is based on lightweight hardware redundancy, implemented using smart watchdogs and voter logic, combined with software redundancy. Two software redundancy techniques are used: the first is software temporal triple modular redundancy, used for tasks with low criticality and no real-time requirements. The second is triple modular redundancy, assisted by a hardware voter, for tasks with high criticality and real-time requirements. A hypervisor places each task in an independent resource partition, ensuring that no functional interference occurs. The proposed solution has been evaluated through hardware and software fault injection on two hardware platforms, featuring a dual-core processor and a quad-core processor respectively. Results show that the proposed architecture achieves a high level of fault tolerance.
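The abstract only names the techniques. As a rough illustration of software temporal triple modular redundancy, the sketch below runs the same task three times in sequence and takes a bitwise 2-out-of-3 majority vote of the results. The function names (`temporal_tmr`, `majority3`) and the choice of bitwise voting on integer results are assumptions for illustration, not the authors' implementation; in the paper the voter for high-criticality tasks is additionally backed by hardware.

```python
from typing import Callable


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: an output bit is set iff at least two inputs set it."""
    return (a & b) | (a & c) | (b & c)


def temporal_tmr(task: Callable[[], int]) -> tuple[int, bool]:
    """Run `task` three times in sequence (time redundancy) and vote on the results.

    Returns the voted value and a flag that is True when the three runs disagreed,
    i.e. a transient fault was detected (and masked if only one run was wrong).
    """
    r1, r2, r3 = task(), task(), task()
    voted = majority3(r1, r2, r3)
    disagreement = not (r1 == r2 == r3)
    return voted, disagreement


# Usage: simulate a transient fault that flips one bit during the second run.
runs = iter([0b1010, 0b1110, 0b1010])
value, fault_seen = temporal_tmr(lambda: next(runs))
```

Bitwise voting masks any single faulty run; the disagreement flag lets a supervisor log or escalate faults that would otherwise be silently masked.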

3.
A census of Tandem system availability between 1985 and 1990   Total citations: 1, self-citations: 0, citations by others: 1
A census of customer outages reported to Tandem has been taken; it shows a clear improvement in the reliability of hardware and maintenance. It indicates that software is now the major source of reported outages (62%), followed by system operations (15%). This is a dramatic shift from the statistics for 1985. Even after discounting systematic underreporting of operations and environmental outages, the conclusion is clear: hardware faults and hardware maintenance are no longer a major source of outages. As the other components of the system become increasingly reliable, software necessarily becomes the dominant cause of outages. Achieving higher availability requires improvements in software quality and software fault tolerance, simpler operations, and tolerance of operational faults.

4.
This paper presents a quantitative reliability analysis of a system designed to tolerate both hardware and software faults. The system achieves integrated fault tolerance by implementing N-version programming (NVP) on redundant hardware. The analysis considers unrelated software faults, related software faults, transient hardware faults, permanent hardware faults, and imperfect coverage. The overall model is a Markov model in which the states of the Markov chain represent the long-term evolution of the system structure. For each operational configuration, a fault-tree model captures the effects of software faults and transient hardware faults on the task computation. The software fault model is parameterized using experimental data from a recent implementation of an NVP system using the current design paradigm. The hardware model is parameterized using typical failure rates for hardware faults and coverage parameters. The authors' results show that it is important to consider both hardware and software faults in the reliability analysis of an NVP system, since these estimates vary with time. Moreover, the error detection and recovery function is extremely important to fault-tolerant software: system unreliability can be reduced by several orders of magnitude if this function is performed promptly.
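In an NVP system the outputs of independently developed versions are adjudicated, typically by majority vote. Because diverse versions computing floating-point results rarely agree bit for bit, a voter usually compares results within a tolerance. The following is a minimal sketch of such an inexact majority voter; the function name `nvp_vote`, the absolute tolerance `eps`, and the policy of returning `None` when no majority exists are illustrative assumptions, not the model analyzed in the paper.

```python
import math
from typing import Optional, Sequence


def nvp_vote(results: Sequence[float], eps: float = 1e-6) -> Optional[float]:
    """Inexact majority voter for N-version programming.

    Returns a value agreed on (within absolute tolerance `eps`) by a strict
    majority of the N versions, or None when no such majority exists, i.e. an
    error is detected but cannot be masked and must be handled by recovery.
    """
    n = len(results)
    for candidate in results:
        agreeing = [r for r in results
                    if math.isclose(r, candidate, rel_tol=0.0, abs_tol=eps)]
        if 2 * len(agreeing) > n:
            # Report the mean of the agreeing group as the adjudicated output.
            return sum(agreeing) / len(agreeing)
    return None
```

For example, `nvp_vote([1.0, 1.0000001, 5.0])` accepts the two close results and outvotes the third, while `nvp_vote([1.0, 2.0, 3.0])` returns `None`. Tolerance-based agreement is not transitive, which is one reason real adjudicators are designed with care for each output type.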

5.
Fault tolerance is the ability of a system to perform its function reliably in the presence of faulty hardware or software components. For a system to have this property, many separate issues are involved: fault confinement, fault detection, fault masking, retry, diagnosis, reconfiguration, recovery, restart, repair, and reintegration. These issues are discussed and applied to two well-known fault-tolerant distributed systems.

6.
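Research on Fault-Tolerance Techniques for Integrated Avionics Core Processing Systems   Total citations: 1, self-citations: 0, citations by others: 1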
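Current avionics systems are highly integrated and modular. The core processing computer system that serves as their foundational platform is a real-time distributed computer system that integrates data processing, signal processing, and image processing; its reliability requirements are very high, so fault-tolerance techniques must be used. After introducing the hardware and software architecture of real-time distributed computer systems, the paper focuses on fault-tolerance techniques, proposes a hierarchical fault-tolerance management strategy, and uses concrete examples to illustrate methods for fault monitoring, fault handling, and system reconfiguration.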

7.
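Using an expert-diagnosis approach based on wavelet neural networks, this paper proposes the concept and method of an expert diagnosis system for high-voltage equipment based on the decomposition products of sulfur hexafluoride (SF6) gas. Hardware and software are applied together to the diagnosis of high-voltage equipment, analyzing the concentrations of gas decomposition products to diagnose equipment faults and predict future ones. The paper describes the hardware composition and software implementation of the expert diagnosis system and highlights the advantages of the method through comparison with other methods commonly used in industry. Finally, analysis of a series of experimental data demonstrates the reliability and practicality of the expert diagnosis system.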

8.
This paper describes the software fault tolerance scheme t/(n-1)-variant programming (t/(n-1)-VP), which is based on a particular system diagnosis technique used in hardware and thereby has some special advantages, including a simplified adjudication mechanism and an enhanced capability for tolerating faults. The dependability of the t/(n-1)-VP architecture is evaluated and then compared with two similar schemes: N-version programming (NVP) and N self-checking programming (NSCP). The comparison shows that t/(n-1)-VP is a viable addition or alternative to present techniques. Much of the classical dependability analysis of software fault tolerance approaches has focused on the simplest architectural examples, which tolerate only single software faults, without considering tolerance of multiple and/or related faults. The results obtained from such analyses are thus restricted. The dependability evaluation in this paper deals with more complicated and general software redundancy: various architectures tolerating two or more faults. It is no surprise that we reached new conclusions: both t/(n-1)-VP and the NVP scheme can tolerate some related faults between software variants; in general, t/(n-1)-VP has higher reliability, whereas NVP is better from the safety viewpoint.

9.
This paper describes an experimental tool to evaluate and support the development of fault-tolerant machines designed for aerospace motor drives. Aerospace applications essentially involve safety-critical systems, which should be able to overcome hardware or software faults and therefore need to be fault tolerant. One way of achieving this is to introduce variable degrees of redundancy into the system by duplicating one or all of the operations within the system itself. For motor drives, multiphase machines, such as multiphase brushless dc machines, are considered good candidates for the design of fault-tolerant aerospace motor drives. This paper introduces a multiphase two-level inverter using a flexible and reliable field-programmable gate-array/digital-signal-processor controller for data acquisition, motor control, and fault monitoring, in order to study the fault tolerance of such systems.

10.
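This paper describes the application of a hierarchical distributed computer inspection and control system to a motor-vehicle safety-performance inspection line. It presents the system's functional features, its hardware and software design, and the measures taken to ensure stable and reliable operation. It also introduces the real-time, fault-tolerant scheduling software for multi-computer communication.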

11.
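SynqNet technology and LabVIEW are applied to the motion control system of a fine-adjustment Stewart platform. The paper introduces the hardware structure and control principle of the Stewart platform, designs LabVIEW-based control software for the platform, and focuses on the main functional modules of the control software and their implementation. Experiments show that the system achieves fairly satisfactory results in control accuracy, stability, fault tolerance, data processing, and human-machine interaction.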

12.
This paper presents an innovative two-processor computer architecture developed for the data processing unit (DPU) of the Magnetospheric Imaging Instrument (MIMI) on board the Cassini spacecraft mission to Saturn. The main advantages of this architecture are its high performance, its reliability, and its intelligence. The high performance is justified by the following: 1) an optimum combination of two powerful Harris RTX 2010 processors; 2) two independent main bus structures used for communication between the processors and the various instrument interfaces and subsystems; 3) two additional local buses on each processor board, used to speed up the on-board operations of the processors; 4) a high-speed interprocessor communication port. The high reliability is justified by the following: 1) the simplicity of the hardware/software structures; 2) fault tolerance capabilities; 3) the capability for in-flight hardware/software reconfiguration by ground command. Moreover, the on-board intelligence is justified by the following: 1) sophisticated fault protection, data handling, and instrument control software; 2) intelligent interfaces [implemented using field programmable gate arrays (FPGAs)]; 3) the capability for autonomous in-flight hardware/software reconfiguration in case of an unrecoverable failure in one processor. These advantages make this architecture the best choice for the DPU of the complex, sophisticated MIMI scientific instrument, compared with the traditional master-slave (low reliability, single point of failure) and common shared bus (low performance, hardware and software complexity) architectures.

13.
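Research on a Triple-Redundant Flight Control Computer Architecture and Its Reliability   Total citations: 10, self-citations: 1, citations by others: 9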
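Redundancy architecture design is an effective way to address the reliability of flight control computers. To meet the special reliability and fault-tolerance requirements of high-reliability flight control computer systems, a new redundancy architecture for a triple-redundant flight control computer is proposed. The paper describes the redundancy design method of the flight control computer, designs the overall hardware and software framework of the redundant computer, and analyzes the reliability of the scheme with the Markov method, obtaining the effects of fault coverage and failure rate on the overall reliability of the flight control computer. It concludes that the proposed redundancy architecture is feasible.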
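The abstract does not give the Markov model itself. The sketch below shows, under stated assumptions, how a continuous-time Markov chain can expose the effect of fault coverage c and per-channel failure rate λ on a triple-redundant computer. Hypothetical model: state 0 has three healthy channels; a channel failure is detected and isolated with probability c (moving to state 1, two healthy channels) or brings the system down with probability 1 - c; in state 1 any further failure is fatal. The state structure, the function names, and the uniformization solver are illustrative choices, not the authors' model. With c = 1 the result reduces to the classic TMR reliability 3e^(-2λt) - 2e^(-3λt).

```python
import math


def transient_probs(Q, pi0, t, tol=1e-12, max_terms=100_000):
    """Transient probabilities pi(t) = pi0 * exp(Q t) of a CTMC, by uniformization.

    Q is the generator matrix (rows sum to zero), pi0 the initial distribution.
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate
    if q == 0.0:
        return list(pi0)
    if q * t > 700.0:
        raise ValueError("q*t too large for this simple sketch (exp underflow)")
    # Embedded discrete-time chain: P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    v = list(pi0)                 # pi0 * P^k, starting with k = 0
    weight = math.exp(-q * t)     # Poisson probability of 0 jumps
    result = [weight * x for x in v]
    mass = weight
    k = 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k       # Poisson probability of k jumps
        mass += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


def triplex_reliability(lam, c, t):
    """P(system still operational at time t) for the hypothetical 3-state model."""
    Q = [
        [-3 * lam, 3 * lam * c, 3 * lam * (1 - c)],  # 3 healthy channels
        [0.0, -2 * lam, 2 * lam],                    # 2 healthy channels (degraded)
        [0.0, 0.0, 0.0],                             # system failed (absorbing)
    ]
    p = transient_probs(Q, [1.0, 0.0, 0.0], t)
    return p[0] + p[1]
```

For example, `triplex_reliability(1e-4, 0.95, 10.0)` evaluates a 10-hour mission with λ = 1e-4 per hour per channel and 95% coverage; sweeping c shows how strongly the result depends on coverage, which is the kind of sensitivity the abstract reports.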

14.
In this paper, we focus on reliability, one of the most fundamental and important challenges in the nanoelectronics environment. For a processor architecture based on unreliable nanoelectronic devices, fault tolerance schemes are required to ensure the basic correctness of any computation. Since any fault tolerance approach demands redundancy in the form of either time or hardware, reliability needs to be considered in conjunction with performance and hardware tradeoffs. We propose a new computational model for nanoelectronics-based processor architectures that provides flexible fault tolerance to deal with high and time-varying fault rates. The model guarantees the correctness of instruction executions while dynamically balancing hardware and performance overheads. The correctness of every instruction is confirmed by multiple execution instances through a hybrid hardware-time redundancy approach. To achieve high system performance, multiple unconfirmed computation branches are exploited speculatively. The growth in hardware resources that these speculative computations entail is controlled so that hardware utilization is balanced between the two competing goals of performance and fault tolerance. In addition, we examine the impact of other nanoelectronic characteristics, such as the need to localize interconnections and the regularity of nanofabric structures, on the proposed computational model. We set up an experimental framework to validate the effectiveness of the proposed scheme and to investigate multiple tradeoff points within the proposed approach. Simulation data confirm that the proposed computational model provides flexible fault tolerance under a wide range of fault occurrence rates, while guaranteeing high system performance and efficient utilization of hardware resources.

15.
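The paper proposes a software fault-localization method that combines hardware fault-localization techniques with genetic algorithm theory to help testers localize software faults in a shorter time.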

16.
17.
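Reliability Analysis of the Core Service Platform of the Next Generation Network   Total citations: 2, self-citations: 0, citations by others: 2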
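Based on stochastic Petri nets, a method is proposed for computing the reliability of the NGN core service platform. Reliability models are built and analyzed for the hardware environments of a dual-machine hot-standby fault-tolerant system and an N+1 mutual-backup fault-tolerant system. A software-module fault-tolerance method is proposed, together with an expression for software-module reliability under this method; the reliability of the service software is then modeled and analyzed from the perspective of software modules. Finally, an example analyzes how the hardware system and the service software of the NGN core service platform cooperate to meet service reliability requirements.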

18.
Wang Jing, Rong Jinye, Zhou Jiqin, Yu Hang, Shen Jiao, Zhang Weigong. Acta Electronica Sinica (电子学报), 2018, 46(10): 2534-2538
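Existing fault-injection methods for fault-tolerant computers lack support for the single-event fault models that occur frequently in the space environment. To address this, the paper proposes a hardware/software co-simulation and fault-injection technique based on backplane technology. Multi-bit single-event fault models are designed according to the characteristics of register components and of memory components, respectively, and a hardware/software cooperative fault-injection scheme is implemented at the register-transfer level, in which faults are generated by software and injected into the hardware design. This avoids modifying the hardware design code to generate faults, which would compromise the integrity of the system. Fault-injection experiments on a Leon2 core show that the platform provides an automated, non-intrusive, low-overhead fault-injection and reliability-evaluation solution for processor fault-tolerance design.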
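The abstract does not specify the fault models in detail. As a minimal software-side sketch, the code below implements two plausible multi-bit single-event upset models: for a register, a strike flips a run of adjacent bits within one word; for memory, it flips the same bit position in several consecutive words. Both models, the function names, and their parameters are assumptions for illustration; the paper injects faults into an RTL design through a co-simulation backplane, which this sketch does not reproduce.

```python
import random


def flip_adjacent_bits(word: int, width: int, start: int, count: int) -> int:
    """Register fault model: flip `count` adjacent bits of a `width`-bit word from bit `start`."""
    if start < 0 or count < 1 or start + count > width:
        raise ValueError("bit range lies outside the register")
    mask = ((1 << count) - 1) << start
    return (word ^ mask) & ((1 << width) - 1)


def inject_register_mbu(word: int, width: int, max_bits: int, rng: random.Random) -> int:
    """Flip a random run of 1..max_bits adjacent bits in a register value."""
    count = rng.randint(1, max_bits)
    start = rng.randint(0, width - count)
    return flip_adjacent_bits(word, width, start, count)


def inject_memory_mbu(memory: list, width: int, max_words: int, rng: random.Random):
    """Memory fault model (assumed): one strike flips the same bit in 1..max_words
    consecutive words. Mutates `memory` in place; returns (first_addr, count, bit)."""
    count = rng.randint(1, min(max_words, len(memory)))
    first = rng.randint(0, len(memory) - count)
    bit = rng.randrange(width)
    for addr in range(first, first + count):
        memory[addr] ^= 1 << bit
    return first, count, bit
```

Seeding `random.Random` makes each injection campaign reproducible, which matters when comparing fault-tolerance designs against the same fault list.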

19.
This paper discusses fault tolerance in discrete-time dynamic systems, such as finite-state controllers or computer simulations, with a focus on the use of coding techniques to efficiently provide fault tolerance to linear finite-state machines (LFSMs). Unlike traditional fault tolerance schemes, which rely heavily, particularly for dynamic systems operating over extended time horizons, on the assumption that the error-correcting mechanism is fault free, we are interested in the case in which all components of the implementation are fault prone. The paper starts with a paradigmatic fault tolerance scheme that systematically adds redundancy to a discrete-time dynamic system in a way that achieves tolerance to transient faults in both the state transition and the error-correcting mechanisms. By combining this methodology with low-complexity error-correcting coding, we then obtain an efficient way of providing fault tolerance to k identical unreliable LFSMs that operate in parallel on distinct input sequences. The overall construction requires only a constant amount of redundant hardware per machine (but sufficiently large k) to achieve an arbitrarily small probability of overall failure for any prespecified (finite) time interval, thereby leading to a lower bound on the computational capacity of unreliable LFSMs.
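The paper's construction uses low-complexity error-correcting codes. As a much simpler, illustrative stand-in for the idea of adding redundancy to a linear finite-state machine, the sketch below attaches a single parity bit to a binary LFSM q[t+1] = A q[t] ⊕ B x[t] over GF(2). The parity of the next state is computed along a separate path from the current state and input, using the precomputed row vectors 1ᵀA and 1ᵀB, so a transient fault that flips an odd number of state bits is detected. The class and parameter names are hypothetical, and a single parity bit only detects errors; it neither corrects them nor catches even-weight errors.

```python
def parity(v: int) -> int:
    """Parity (sum mod 2) of the bits of v."""
    return bin(v).count("1") & 1


def mat_vec(rows: list, v: int) -> int:
    """GF(2) matrix-vector product; row r of the matrix is the bitmask rows[r]."""
    out = 0
    for r, row in enumerate(rows):
        out |= parity(row & v) << r
    return out


class ParityCheckedLFSM:
    """Hypothetical LFSM q' = A q XOR B x with a redundant parity bit for fault detection."""

    def __init__(self, A: list, B: list, q0: int = 0):
        self.A, self.B = A, B
        self.q = q0
        self.p = parity(q0)
        # Row vectors 1^T A and 1^T B (XOR of all rows), used by the parity path.
        self.a_par = 0
        for row in A:
            self.a_par ^= row
        self.b_par = 0
        for row in B:
            self.b_par ^= row

    def step(self, x: int, fault_mask: int = 0) -> bool:
        """Advance one step; `fault_mask` XORs a simulated transient fault into the new state.

        Returns True if the new state is consistent with the separately computed parity.
        """
        q_next = mat_vec(self.A, self.q) ^ mat_vec(self.B, x) ^ fault_mask
        # parity(A q XOR B x) == parity((1^T A) & q) XOR parity((1^T B) & x) over GF(2)
        p_next = parity(self.a_par & self.q) ^ parity(self.b_par & x)
        self.q, self.p = q_next, p_next
        return parity(q_next) == p_next
```

For instance, with A a 3-bit rotation and B injecting the input into bit 0, fault-free steps always pass the check, while a single flipped state bit is flagged at the step where it occurs. Once the corrupted state is used in the next update, the parity path recomputes a consistent parity, so detection must trigger recovery immediately, which is one reason the paper relies on error correction instead.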

20.
Zhang Ping. Changjiang Information & Communications (长江信息通信), 2021, 34(3): 121-123
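In the design of missile-borne embedded software, the various failure modes must be considered and targeted software fault tolerance must be designed. Software fault-tolerance design includes information fault tolerance, time fault tolerance, and structural fault tolerance. In real-time systems, external factors such as interference during interface communication can cause sporadic faults in which the communication data are abnormal. For this failure mode, two software fault-tolerance schemes are designed on top of information fault tolerance, and their risks are analyzed. The feasibility and effectiveness of both methods have been verified by testing in practical engineering applications.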
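The abstract does not detail the two schemes. As a generic illustration of information fault tolerance against sporadic communication faults, the sketch below validates each received frame with a CRC and a range check and, on a bad frame, reuses the last good value for a bounded number of consecutive frames before reporting a persistent fault. The frame layout (a 2-byte signed big-endian value followed by a CRC-16 computed with Python's `binascii.crc_hqx`), the class name, and the hold limit are assumptions for illustration, not the paper's design.

```python
import binascii
from typing import Optional


class CheckedReceiver:
    """Hypothetical receiver: CRC check, range check, bounded hold-last-good-value."""

    def __init__(self, lo: int, hi: int, max_holds: int = 3):
        self.lo, self.hi = lo, hi
        self.max_holds = max_holds
        self.last_good: Optional[int] = None
        self.holds = 0

    def receive(self, frame: bytes) -> Optional[int]:
        payload, crc = frame[:-2], frame[-2:]
        # CRC-CCITT (polynomial 0x1021) with initial value 0xFFFF over the payload.
        ok = len(frame) >= 3 and binascii.crc_hqx(payload, 0xFFFF) == int.from_bytes(crc, "big")
        if ok:
            value = int.from_bytes(payload, "big", signed=True)
            ok = self.lo <= value <= self.hi  # plausibility (range) check
        if ok:
            self.last_good, self.holds = value, 0
            return value
        # Sporadic fault: reuse the last good value for a bounded number of frames.
        self.holds += 1
        if self.last_good is not None and self.holds <= self.max_holds:
            return self.last_good
        return None  # persistent fault: report upward for higher-level handling


def make_frame(value: int) -> bytes:
    """Build a test frame in the assumed layout: 2-byte signed value + 2-byte CRC."""
    payload = value.to_bytes(2, "big", signed=True)
    return payload + binascii.crc_hqx(payload, 0xFFFF).to_bytes(2, "big")
```

Bounding the number of held values keeps a sporadic glitch from disturbing the control loop while ensuring that a persistent link failure is still reported rather than masked indefinitely.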
